FAQ archive (Object-Oriented Programming)

From Spivey's Corner
Jump to: navigation, search

These questions relate to the 2013 edition of the course, which used Java in place of Scala.

New questions

The sample write-up has diff listings at the end where the differences ae highlighted. How did you do that?

I wrote the text using plain TeX and my unpublished collection of macros – though LaTeX would do just as well – and made a PDF file using pdftex. Then I made difference listings using Mercurial, converted them into Postscript using a unix command called enscript, and transformed the Postcript into PDF using ps2pdf. Finally, the write-up and the listings were concatenated using pdfunite into a single PDF file. To get the highlighting, give enscript the flag -Ediffu.

Beware: pdfunite is a dangerous program invoked with the command pdfunite file1.pdf file2.pdf ... filen.pdf outfile.pdf. Omitting to name the outfile results in overwriting one of the input files.

The term 'priority queue' seems to imply that the entries apear in a certain order. Is it possible to specify a priority queue in terms of an abstract state that is an unordered collection, such as a set or bag?

Yes, it is. Here's how to think about it – if I told you that the queue contained entries 4, 3, 1 and 6 and asked you what delMin would return, you could tell me: 1; and you could tell me that afterwards the queue would contain 3, 4, and 6. You don't need to list the entries in a certain order to answer these questions, so it's quite possible to model the abstract state as a set or, if we allow a number to be in the queue multiple times, a bag.

There's no need to keep track of the order the elements were inserted by adding them to the end of a sequence, and in fact it would spoil our specification if you did that, because the specification would say "keep track of the order", but many implementations – such as one using a heap – would not keep track of it. The upshot would be that the implementation, though behaviourally correct, could not be described with an abstraction function, since distinct abstract states would be represented by the same concrete state.

Or perhaps you had in mind that the sequence you use as the abstract state would be sorted, amounting to an abstract invariant. But note that sorted sequences without repretitions are in one-to-one correspondence with finite sets, or with repetition they can be put into correspondence with finite bags. If two ADT specifications have abstract state spaces that are in one-to-one correspondence, that makes them essentially the same specification, satisfied by the same implementations. We should choose between them on the basis of which is simplest, and it seems to me that bags are simpler than sequences with an (abstract) invariant. So either way, there's no real need to use an ordered structure to model the abstract state.

Now we can implement the ADT by storing the elements as a sequence, either letting elements appear more-or-less in the order they were inserted, and searching for the minimum element each time delMin is called (the order might be destroyed if delMin doesn't spend extra time preserving it); or we might choose to make the sequence ordered, implementing insert as the inner loop of insertion sort, and delMin by deleting and returning the first element. These are both valid implementations, and the abstraction function in each case is to take the bag of integers that appear somewhere in the sequence.

Can anything be said in favour of the Java collections framework?

Yes, quite a lot. I don't find all of it entirely convincing, however.

Problems with problems

What is problem 1.2 asking?

Problem 1.1 is to extend the program that uses an array of characters, so that it also deletes the 'c's from the input.

Problem 1.2 is to extend the program that uses a Text object in the same way. The 'horizontal lines' were missing from the program in the handout, but they surround the part between reading the input and writing the output.

The lectures explain data refinement in terms of an abstraction function from concrete states to abstract states, but in Problem 2.6(a) no such function exists. Is it OK to replace the abstraction function by a relation?

The rules for implementing an ADT using an abstraction function are relatively simple, but they are not fully general, and Problem 2.6 uncovers a number of data type implementations that are apparently correct but cannot be justified by an abstraction function. There is a formulation of data refinement that replaces the abstraction function by a relation, but the rules are more complicated. Roughly speaking, if we imagine both the concrete and the abstract data types carrying out the same sequence of operations, then the states should begin related and stay related, and states that are related should produce the same results for the operations. Even this formulation, however, is insufficient to justify all abstract data type implementations that are in fact correct, but the subtleties are too involved to go into here.

I caught sight over my tutor's shoulder of the model answer for Problem 2.6(c), and it says that the abstraction function is given by S = [0..n]. Isn't this making an equation between a set S and a sequence [0..n]?

I meant to say that S is equal to the set {0, 1, 2, ..., n}. Let's agree to write that as {0..n} if you prefer.

Dark corners of Ewoks

What misfeatures does Ewoks have?

Some of the things that any usable text editor should do, but Ewoks doesn't:

  • Display TAB characters in some sensible way.
  • Deal with lines that are too wide for the screen. (At the moment, the extra characters overwrite each other in the rightmost column of the screen, so all you see is the last of them.)
  • Allow anything but a window of fixed size – 80 characters by 24 lines.
  • Save the file somewhere when the editor crashes.

(Feel free to add to this list!)

Can you explain what is going on with Undoable.reflectAction in Ewoks1?

In Ewoks0, each command that is bound to a key is associated with a little class that implements the interface Keymap.Command<Editor> by calling an appropriate method on the editor object: for example, the Ctrl-G key is bound to the beep method by writing

keymap.register(Keymap.ctrl('G'), new Keymap.Command<Editor>() {
    public void command(Editor editor) {
        editor.beep(); 
    }
});

This makes an instance of an anonymous inner class and associates it with the appropriate key in the keymap.

In Ewoks1, the proper thing is to bind each key to an instance of some subclass of Undoable.AbstractAction, because that class implements Keymap.Command<Editor> in a way that interacts properly with the Undo system. So we could write this to bind Ctrl-G to beep:

keymap.register(Keymap.ctrl('G'), new Undoable.AbstractAction<Editor>() {
    public Undoable.Scrap execute(Editor editor) { 
        editor.beep(); 
        return null; 
    }
});

A command that changes the buffer would return a non-null Undoable.Scrap by calling a method of the editor object, like this:

keymap.register(Keymap.ctrl('T'), new Undoable.AbstractAction<Editor>() {
    public Undoable.Scrap execute(Editor editor) { 
        return editor.transposeCommand(); 
    }
});

It gets boring, however, to make all these little classes when the only thing that varies is the name of the Editor method to call: in these two cases, beep and transposeCommand. So I have provided a method editorAction that takes a string and, using Java's reflection API, dynamically constructs an object that can call that method. The result is that the two command registrations written above can be rewritten as

keymap.register(Keymap.ctrl('G'), editorAction("beep"));
keymap.register(Keymap.ctrl('T'), editorAction("transposeCommand"));

It's the fact that beep returns void and transposeCommand returns Undoable.Scrap that causes the null return value to be inserted in one case and not the other. The one thing that can go wrong with this is that the type-checking is no longer done when the program is compiled, but when it starts up and the reflection mechanism tries to find methods with the right types whose names are given as strings. This can cause the editor to crash right at the start.

You're welcome to investigate the details of how the reflection stuff is implemented if you wish, but I warn you, it isn't pretty. All you need to know, however, is that editorAction gives a concise way of creating the appropriate Command objects without a lot of fuss. By the way, insertAction(ch), moveAction(dir) and deleteAction(dir) are also defined using reflection, and for them you need only follow the pattern of existing calls.

The code for display update keeps a Text object called line and fetches lines by calling editor.fetchline(n, line). Why doesn't it just fetch each line as a string via Text.getString?

Doing so would involve transiently allocating on each display update about as much space as is needed for a full screen of text, and that amounts to about 4K of storage. I've no idea whether that would have a significant effect on the editor's running time, but I wanted to illustrate that the amount of storage turned over by the program can be kept under control without breaking encapsulation boundaries. With the existing design, space for just one line of text is allocated and reused for each line of the screen, and used again each time the screen is updated, dramamtically reducing the turnover of heap space.

In detail, EdBuffer.fetchline is a delegate method for PlaneText.fetchline. It causes the following sequence of method calls:

  • text.fetchline(n, line) works out the start and length of the line and calls ...
  • text.getRange(start, nchars, line), which clears line and calls ...
  • line.insertRange(0, text, start, nchars), which moves the gap in line and grows its buffer, then calls ...
  • text.getChars(start, nchars, line.buffer, 0), which uses System.arraycopy at most twice to fetch the needed portion of the text into line.buffer[0..nchars).

Although, according to the Java rules, instances of Text have access to each other's private instance variables, this sequence of calls avoids exploiting this. The inner loop is in the calls of System.arraycopy, so the overhead of the method chain is moderate. Note that text.getChars is implemented in a way that does not move the gap in text, so that the amount of copying is restricted to the one line of text that is returned in line.

Lab assistance

I am currently doing exercise 6 of lab 2, which asks us to create an Emacs-like interactive, incremental search using the mini-buffer. I have come to a problem, where I am supposed to create a new class inheriting its behaviour from the MiniBuffer. The actual problem appears when I rewrite any method which uses private variables (e.g, readString or commandLoop), so I am no longer able to access these variables (e.g, completer, display or keymap) in my child class. Should I write getter and setter methods for the MiniBuffer class? Or should I change the accessibility of the instance from private to protected?

Inheriting from MiniBuffer is not the answer here: try instead making a class that uses an instance of MiniBuffer but has its own command loop and keymap. You can post a MiniBuffer on the screen and modify its contents without invoking its command loop, as I'm sure you'll find.

General

What is this course about?

The course is about ways of organising medium to large programs that makes them easier to understand, and specifically the techniques for doing this that are called Object-Oriented Programming and supported by programming languages such as Java.

So what are the main ideas?

There are three main ideas: encapsulation, that programs should be built from objects that satisfy a coherent specification and hide the details of how that specification is implemented; object identity, that each object has a distinct identity, so that multiple objects can co-exist and interact with each other; and polymorphism, that objects that implement the same interface can be used interchangeably to give a variety of behaviours.

What will we learn to do?

Well, as a sideline, you'll learn to work with programs written in Java. But the main focus will be on learning to design, discuss and think about the design of larger programs. That's a more important skill, because it transcends any particular programming language.

So it's a course about Java really?

Weren't you listening to the previous answer? No, far from it. We'll spend little time on details of Java syntax, and ignore some parts of the language and almost all of the (very extensive) libraries, unless using them fits with our immediate purpose. Actually, I think it's a waste of time to go over programming language syntax in lectures, because people pick up the detailed rules of a language faster by copying small examples than by being told the rules explicitly.

What are the prerequisites for the course?

This is a second-level course in programming, so I will be assuming that participants are fluent at programming in a high-level language such as C, Pascal or perhaps Java, to the level where they are comfortable with defining and using subroutines or functions or procedures. The course will not be suitable for those who do not have at least this level of programming experience.

We'll also be using some notation from Functional Programming in specifying the behaviour of objects, but that will be easy enough to pick up without prior experience.

I thought OOP was all about building programs with a GUI. Why are we starting with a dumb editor?

There's always been a close link between OOP and GUI programming, right from the early days when GUI's were first developed for SmallTalk. But building a GUI necessarily involves using a pretty complicated library, and I didn't want that complication at the start of the course. The editor provides us with an example of a non-trivial program that we can see and understand all of, but perhaps not all at once.

Why Java? Why not Scala?

"Functional programming combines the flexibility and power of abstract mathematics with the intuitive clarity of abstract mathematics."
XKCD #1270

I agree and accept that Scala provides better notation for many things, and for someone who knows what they are doing, allows object oriented programs to be written more concisely, with less content-free boilerplate. Indeed, recent changes and proposed changes to Java make it more like Scala in a number of ways, but are annoyingly constrained by the need to work seamlessly with old code. (For an example of this, look at the rules for resolving overloading, which stretch for 13 pages in the language manual, and basically say, "try to resolve overloading without taking generics into account, and if that doesn't work, try again with generics." The goal being to ensure that the meaning of old programs is not changed by the introduction of generics.)

So I would say that both Java and Scala are complicated in different ways, and the difference is that we can avoid or ignore the complications by sticking to a simple subset of Java, whereas the rules of Scala impinge (I feel) on every program we would write. By the end of the course, I'm pretty sure we will be tiring of the long-windedness of Java and wanting something better: in short, we'll be ready for Scala. But I think that's entirely healthy, and better than having to confront an unnecessarily complex language from the start.

When all said and done, nobody is going to think you less employable because you've picked up a bit of Java during your degree course.

Why Java? Why not C++?

I think Ken Thompson and others have answered that one better than I can.

Why Mercurial? Why not Git?

For our purposes, Mercurial and Git are pretty much equivalent. I prefer Mercurial partly because the commands needed in day-to-day work are a bit simpler, partly because the community surrounding Mercurial is less ... dogmatic, but mostly because of the existence of TortoiseHG. I can do version control entirely from the command line if I have to, but it requires mental effort that I'd rather spend on other things. Having a THG window open gives a useful real-time picture of what's going on.

Many books about OOP don't mention ADTs and design-by-contract much. Why is that? Are they really so important?

There are several related reasons for this.

  • First, for a wide range of applications, there's no need to invent new abstract data types, and it's enough to use a fixed collection of existing ones – like the collections classes that Java provides – implementing sequences and sets and mappings. I maintain that there are, nevertheless, plenty of problems where introducing a new ADT, perhaps with a slightly different set of operations, is a productive thing to do. Applications programmers might be content to program with a fixed menu of abstractions, but as computer scientists we might also want to study the way the abstractions are implemented, and the criteria by which we can judge whether a proposed implementation is correct.
  • Second, specifying an ADT and reasoning about implementations of it require a bit of discrete mathematics that can present cultural problems.
  • Third, UML doesn't really provide a way of describing how classes implement ADTs – or if it does, then this requires parts of UML (such as the 'object constraint language' OCL) that aren't widely studied or applied. If designs are expressed in a fixed language, then the limitations of that language determine the range of things about the designs that can be said.

You don't seem keen on UML. Why not?

I took a course in it once, but I only got a D-minus.

Clarifications

Why are mementos and scraps separate?

Generally speaking, there's no absolute need. But we want to encapsulate in EdBuffer the list of components – apart from the text – of the editor state, and we achieve that putting the code for creating and restoring mementos inside that class. When the incidental state of the EdBuffer is a single integer, it seems wasteful to wrap it in a Memento object; but as the list of state components kept by EdBuffer grows and evolves, mementos will gain more fields. We want to insulate the rest of the undo mechanism and the implementation of editor commands from these details. We could make mementos capture the text too, and the only reason we don't do that is that keeping a separate, immutable copy of the text before and after each change would take an unreasonable amount of storage. Apart from the text, the restorable state of the editor amounts to a few integers, and they take a trivial amount of space, and we can afford to accumulate thousands of them. Mementos and scraps are really different, because a memento represents a point in time, but a scrap represents a change between two points in time.

In the Text ADT, isn't the length of the text fixed?

No: both the insert and delete operations change the length of the text, increasing or decreasing it. The length operation will return a different result after these operations from what it returned beforehand. So we have specified an ADT where the length of the stored text can increase without limit; we will be making implementations where there is no prior bound on the length of the text, though in practice of course there will be a physical limit. However, if we can store 109 characters before running out of memory, surely that will be good enough.

Can a class extend more than one other class? What about implementing several interfaces? I think I saw something like that in the Ewoks code, but I can't be sure.

No, in Java each class can inherit from only one superclass using extends, but it can implement more than one interface using implements: it just has to fulfil all the promises of methods that were made by each of the interfaces.

Perhaps you are thinking of this delicious morsel, taken from Ewoks1/Undoable.java:

/** An undoable action that can also be used as a keyboard command */
public static abstract class AbstractAction<T extends Undoable<T>>
	implements Action<T>, Keymap.Command<T> {
    /** Perform the action as a keyboard command */
    public void command(T target) {
	target.perform(this);
    }

    /** Execute the command, returning an undo scrap or null */
    public abstract Scrap execute(T target);
}

Translating into English: if T is a subclass of Undoable<T>, then AbstractAction<T> is a class that implements both the interface Action<T> (by providing an execute method) and the interface Keymap.Command<T> (by providing a command method). The execute method remains abstract here, to be implemented in subclasses that extend AbstractAction, but an implementation of command is given: it invokes the perform method on the target, passing the action itself to be performed. The effect is that when the action is bound to a key, pressing the key submits the action to the undo mechanism via perform, and the undo mechanism invokes the action itself via whatever execute method is defined in the concrete subclass. It's because we assume the target has a perform method that we require the class T to be a subclass of Undoable<T>. Phew.

In short: this is a class that implements two interfaces. It is abstract, so it is designed to be completed by making subclasses that provide an implementation of the execute method, but it does not itself extend any class other than Object. The use of extends places a constraint on the parameter type T.

I've heard dire warnings about redefining the hashcode method in classes that have their own definition of equals. What's all that about? And if it's so important, why do you need to be reminded to mention it?

By default, x.equals(y) is true only if x and y are the same object. In immutable classes, it can make sense to redefine equals so that x.equals(y) is true if x and y represent the same abstract value; for example, equals is redefined in Java's built-in String class to test whether two strings contain the same characters, even if they are different objects. For sanity's sake, there are several principles that should be followed in redefining equals:

  1. Equality should be an equivalence relation: reflexive, symmetric and transitive.
  2. Two objects considered equal should have identical observable properties: for example, two paints are not equal if the getName() method returns "Deep Purple" for one and "Mauve" for the other.
  3. If instances of the class are ever to be used as keys in a hash table (as immutable objects might), then it is important that the hashcode method has the proper relationship with the equals method. In particular, two objects considered equal should return the same result from hashcode. Hash tables depend for their correctness on the principle that any key is unequal to all the other keys that have a different hash code.

Principle (3) means that you shouldn't redefine equals in a class without also redefining hashcode, unless you are prepared to bear the risk that you or your successors will create a hash table with instances of the class as keys and plunge themselves into a debugging nightmare.

Among the many goals that I'm aiming to pursue in the course, I can't give a very high priority to giving a complete account of everything a Java programmer needs to know and all the pitfalls he or she needs to avoid.

I'm looking at old exam questions about Undo, and they mention the importance of cloning. What's that about?

In previous editions of the course, I used an approach to Undo where each Command object would, when executed, store as internal state the information needed to undo or redo the execution. In this scheme, the Undo history is a stack or double-stack of executed commands. But because executing a command modifies its internal state, it's necessary to make a copy of the command object that's bound to a key before executing it and putting it on the stack: otherwise the stack might contain multiple references to a command object that's been executed multiple times, and the internal undo information will become confused. The commands bound to keys become prototypes that are cloned before being used. (We may or may not have time to mention cloning in this edition of the course.)

By contrast, our approach is to represent each undoable command by a method that returns a freshly created undo scrap, and to keep these scraps on the undo stack. The scraps are quite separate from the command objects that are bound to keys, but now the command objects are immutable and don't need to be cloned. This solution involves an additional family of classes – scraps in addition to commands – but achieves (I think) a better separation of concerns. If designed carefully, the concrete undo scrap classes are fairly reusable.

Assessment

How will the course be assessed?

For undergraduates, there will be an assessed practical exercise over the Christmas vacation, and also a conventional sit-down exam. The practical exercise will build on the lab work you have done during term, and will be issued on Friday of eighth week. The exercise will require you to design and implement a programming task and write a report on the design decisions you made. You will have until the end of fourth week of Hilary Term to complete the exercise and submit a listing and report. The written paper will be set at the same time as the other Part A papers in June.

M.Sc. students will complete the same practical exercise as undergraduates, and will also answer a question paper in the same style as other M.Sc. assessments. It is assumed that M.Sc. students are available for academic work and will have access to computing facilities during the vacation. They will be required to submit both the practical report and their answers to the question paper at the beginning of Hilary Term, together with their other assignments.

For both undergraduates and M.Sc. students, the assessed practical will count for 35% of the marks, and the written exam for 65% of the marks.

This is a change from last year, when the assessment for both undergraduates and M.Sc. students consisted of a single practical exercise.

If the lab work is part of the assessment, will help from demonstrators still be available?

During term, demonstrator help will be available in the lab exactly as usual, and the demonstrators will provide help and advice in competing all parts of the lab work. They will not know which parts of the lab work are more or less relevant to the Christmas practical exercise. Once the exercise is issued, your work on it must be entirely independent, and no further demonstrator help will be provided, neither with the relevant parts of the lab work from term, nor with the new elements introduced as part of the assessment.

Why won't the demonstrator sign off my practicals?

We're not doing signing-off for this course. You're welcome to show the demonstrator your progress, and he or she will record your attendance at the lab sessions, purely so that tutors can know through the usual channels if you haven't appeared for at least some of the sessions. But the practical work will be assessed by means of the report you submit after Christmas, and there's no need for you to produce reports or for us to assign marks during the term. That's also the reason why the lab exercises don't have identified 'optional' parts that you need to do for an S+. Because we're not awarding S or S+ marks, there's no need for that.

Trivia

Why 'Berlin School remix'?

'Berlin School' is a style of electronic music that rose to prominence during the 1970s; it is particularly associated with the early albums of the German band Tangerine Dream, such as Phaedra (1974), Rubycon (1975) and the live album Ricochet (also 1975). It contrasts with the 'Düsseldorf School' of bands like Kraftwerk. There is no real link with the content of the course, except that the lecturer rather likes it.

Why the pictures of perpetual motion machines?

It's nice, I think, to break up the text with pictures of some kind. I suppose I wanted to suggest subliminally that those who seek a methodology for program development that takes no notice of the established mathematics of programming are as irrational as those who try to design mechanisms that produce more energy than they consume, while ignoring the known laws of physics.