Note: I've just migrated to a different physical server to run Spivey's Corner,
with a new architecture, a new operating system, a new version of PHP, and an updated version of MediaWiki.
Please let me know if anything needs adjustment! – Mike

Frequently asked questions (Imperative Programming)

Copyright © 2024 J. M. Spivey
Jump to navigation Jump to search

Some questions from previous years have been moved to the FAQ archive.

Administrivia

Will there be any examples classes this year?

Sorry, no. Circumstances have so conspired that I am giving four lectures a week this term, so there just isn't time for optional extras.

General

Why choose Oberon?

Here's a note that I wrote when I first put this course together.

Why not C?

Because C does not catch run-time errors, making the process of debugging extremely frustrating. Yes, you may say 'I'm a clever so-and-so, and I can debug a program from the single message Segmentation fault.' But we aren't all as clever as you, and I'd like us all to be on an even footing. Also because expressing common constructions like pointer-linked structures in C involves language weirdness (such as typedef declarations) that I'd rather avoid, and because the whole business of pointer arithmetic, which is what spoils array bounds checking, also suggests that machine-level messing about is necessary to writing good programs. This article contains more reasons to avoid C, if more be needed.

Why not Python? Why not Lua? Why not Ruby?

All these languages are (in a way) too high-level for us. What they have in common is the idea of using dictionaries as a universal data type, rather than using data types that have a direct representation on a computer. We already know Haskell (or a fragment of it anyway), so what's wanted is a relatively simple, low-level language where we can study the concrete implementation of programs, with a simple relationship to what happens when the program runs. Python and Ruby are complicated languages that bring a lot of baggage with them, and give the impression that such complications are somehow necessary. Besides, none of those languages are typed, and types can be a real help.

Why not X?

Enough, already! The fact is, I get to choose, and I quite like Oberon, even if certain features of it (such as the upper-case keywords) are a bit irritating. There's a compact description of the language, and a decent book with plenty of examples. And it isn't a moving target; I can't stand the 'used to be the bees knees, but is deprecated in version 2.5' business that comes with Python. If a language feature can be changed, then that's a sign that it wasn't necessary in the first place, and it should never have been included. (I'll say it again: everything that you learn with Oberon can be applied in other languages.)

In the program for printing numbers in decimal, couldn't we have used logarithms to compute the appropriate value of n such that t < 10n?

Yes, we could: in fact I could have written n := ENTIER(Math.Ln(t) / Math.Ln(10)) + 1. But that raises the question how the Math.Ln function in the library works. Perhaps it uses a loop like the one on the handout to reduce its argument to a small range: in which case the program taken as a whole is not very different from the one we wrote. Or more likely, it exploits the hardware representation of floating-point numbers to separate the exponent and mantissa, so that in Math.Ln(t) we write

t = m * 2y,

so that

ln t = ln m + y * ln 2

and then compute ln m by using a Chebyshev polynomial on the range [1, 2). Even in this case, the hardware that normalizes the representation of floating point numbers is doing the job that is described by our loop. – Mike

When you use a bound function to prove that a loop terminates, does the bound function have to reach zero?

No, it doesn't: all we want to know is that we can put an upper bound on the number of iterations. In many cases, we hope the loop will terminate quickly, but can't be sure; nevertheless, we can be sure that the loop will terminate eventually. Think about a program that's searching an array a[0..n) for (say) the first occurrence of 3. We hope the program will stop as soon as it finds a 3, but we can be sure it will make at most n iterations, so we use something like nk for the bound function.

It's also unnecessary to ensure that the bound function decreases by exactly 1 in each iteration. Sometimes, we can write a loop with a natural bound function that reduces much more quickly, as in

k := n;
WHILE k > 0 DO
  k := k DIV 2
END

where the bound function k halves in each iteration. That's OK for proving the loop terminates, but the initial value of the bound function no longer gives a tight bound on the running time of the program: in this case, the running time is O(log n), but the initial value of the bound function is n.

We do have to ensure that the bound function is an integer, however. The same program but with a real-valued variable x and the assignment x := x/2 would not (in theory) terminate, because we could go on halving x forever without it reaching zero. In practice, x would reduce until it was the smallest non-zero floating point number, and then it would become zero in computer arithmetic. So the loop would stop after many iterations because floating-point arithmetic is inexact. But that's not so nice to think about.

In the program for QuickSort, you defined your own string comparison function, "to make the results fairer." Why?

The program compared strings character by character using a procedure written in Oberon, instead of using the built-in comparison operator s < t. There's no great difference between these: if you write s < t in your Oberon program (where s and t are arrays of characters), then in essence that is calling a subroutine, written in C as it happens, that does compare the strings character by character. Because the C subroutine behind < is compiled in advance with optimisation, it may go a bit faster than the Oberon procedure that's partly compiled quickly but sub-optimally when the program starts. But the real reason for writing out the comparison explicitly was so that we could see how many times the loop was executed compared with, say, the number of recursive calls to QuickSort.

How does Haskell represent a list of lists?

In the lecture, I suggested that Haskell used essentially the same data structure as us to represent a list of pairs (x, y). That is: a chain of records, each containing an x, a y and a pointer to the next record. That isn't really true, because in Haskell, the ordered pair (x, y) is an object in itself, and instead of n records to represent a list of ordered pairs of length n, Haskell uses 2n records, with each element of the list represented by a record with two elements: a pointer to an ordered pair object, and a pointer to the next element of the list.

With this picture, it becomes easy to see how Haskell represents a list of lists: it is a chain of records, each containing a pointer to one of the lists and a pointer to the next record in the list of lists. Each of the individual lists is then also a chain of records.

Oberon

Why do some words appear in bold type and others in small capitals in your printed Oberon programs?

Some of the words (like ARRAY) that appear in upper case in the ASCII syntax of Oberon are reserved words of the language, and cannot – even in theory -- be used as identifiers in Oberon programs; these appear in bold-face (like array) in my notes. Other words (like INTEGER) are not reserved, but are ordinary identifiers that are pre-defined as part of the initial environment for Oberon programs. Though no sane programmer would want to do it, these identifiers can be redefined in your program, so you can say

TYPE INTEGER = REAL;

if you are so minded. These pre-defined identifiers are shown in (italic) small caps (like INTEGER), because ordinary upper case is too SHOUTY. That rule does not seem right for names that are a single upper-case letter, so I write ARRAY N OF INTEGER for what is ASCII-fied as ARRAY N OF INTEGER.

Why do programs in the book use INC(i) where you write i := i+1?

The two forms mean exactly the same thing, but I think using i := i+1 is simpler, since := and + can do many other jobs too. It's possible that Wirth's original Oberon compiler generated better code for INC(i), but that is not true of the compiler we are using.

Is it true that a string constant in double quotes is null terminated, (like in C), but a single character in double quotes is not null terminated?

There is no difference between the meaning of single and double quotes in Oberon, except that a string in double quotes can contain a single quote character, and vice versa. Whether the string contains one character or more than one, it is always null terminated. However, a string constant containing one character can be used to denote that single character. In this case, the null terminator is discarded. Oberon 2 report, section 3, paragraph 4. Also section 9.1.

Does Oberon support pointer arithmetic, i.e., incrementing the pointer by one points to the next object in contiguous memory, taking into account the size of the object that the pointer points to via its type?

No, definitely not. But open array types serve many of the same purposes, whilst maintaining bounds checking and permitting safe garbage collection.

If Oberon has pointers, why can't VAR parameters be simulated by passing pointers, as is done in C?

Oberon does have pointers, but there is no way in the language of taking the address of a variable and making that into a pointer. In fact, the only pointer values that exist are those created by calling the NEW procedure. This ensures that the lifetime of the object that is pointed to is always as long as the lifetime of the variable that contains the pointer. Also, passing an array as a VAR parameter means also passing the size of the array, so that array subscripts inside the procedure can be checked against the array size. This is a part of the meaning of the parameter mechanism in Oberon and other modern languages that's not automatically reflected when they are simulated by pointers in C.

What does In.Int do if the input does not contain a valid integer?

In.Int expects to read an optional sequence of white-space characters, an optional minus sign, a sequence of decimal digits and a terminating non-digit. If it fails to do so, either because it reads a character it is not expecting, or because it reaches the end of the file, then it sets the Boolean variable In.Done to false. End of file can be detected explicitly by calling Files.Eof(Files.stdin). This behaviour of In.Int is consistent with what is described in Section 7.1 of the Oberon book.

What do DIV and MOD do in Oberon if the operands are negative?

If q = x DIV y and r = x MOD y and y > 0, then it is guaranteed that x = q * y + r and 0 ≤ r < y; that much is promised by the Oberon-2 language definition. The Oxford implementation deals with negative values of y by promising that the relationship between DIV and MOD continues to hold, and that (k * x) DIV (k * y) = x DIV y also holds for all non-zero values of k, including k = −1. For example:

x y x DIV y x MOD y
7 3 2 1
-7 3 -3 2
7 -3 -3 -2
-7 -3 2 -1

Other programming languages treat negative arguments differently. – Mike

How do I type the uparrow symbol that's associated with pointers, as in p^.key?

The uparrow has lost its tail on modern displays, and sadly looks like this: p^.key. It's obtainable with Shift-6 on my keyboard. Perhaps the tail was lost as part of a confusion with the circumflex accent. Indeed Wikipedia says:

The up-arrow character was codified as character 5Ehex in the original 1963 version of the ASCII standard. The 1965 ECMA-6 standard replaced the up-arrow with the caret (and the left-arrow with the underscore). Two years later, the second revision of ASCII followed suit, partly due to the need for a circumflex as a diacritical mark in some languages.

Thankfully, the uparrow can be elided whenever it is followed by a dot (for field selection) or a square bracket (for subscripting), so you can write p.key or (if p is a pointer to an array) p[i] in place of p^.key or p^[i].

Are all BOOLEAN variables initialised to FALSE and all pointer variables initialised to NIL?

The Oberon-2 report makes no guarantee about initialization of variables, and a compiler may complain if it detects that a variable is used without first being assigned a value. On the other hand, it may make things easier for the garbage collector if no uninitialized pointer variables exist, and the easiest way to do that is for the runtime system to zero out all storage when it is allocated. This also makes the results of a buggy program more reproducible. When programming, it's sometimes convenient to exploit the fact that global arrays are initially zeroed, even if that makes the program less portable to other Oberon implementations.

The Oxford compiler

How do you get the compiler to produce the message, "Boy, are you inept"?

Write a module that imports itself, and enjoy the reference to a real classic of computer games.

How do you get the compiler to produce the message, "Your code sucks, but I am compiling it anyway"?

Sadly, the answer to this question is lost in the mists of time.

I notice that the compiler has an undocumented flag -rsb that seems to do something interesting. May we use it in our lab exercises?

The flag makes the compiler accept programs where the keywords and predefined identifiers are spelt in lower case. Of course you may use it if you prefer; the price you pay is that you will have to convert the code from the lab kit to match.

Is the compiler functional within Windows 7? I have installed it, but I have no way of launching the compiler.

Sorry, I've not used a Windows 7 machine myself: I'm still persisting with Windows XP (working backwards, that must be Windows 5, mustn't it?[1]). What you need to do is find the 'Command Prompt' window, if that still exists as it did in Windows 6 (= Vista); there you can type text commands and use the compiler just as you can under Unix. In a brand-new installation of XP, I think Command Prompt can be found in Accessories under the Start button. – Mike

Apparently, there's no longer a Start button entry for Command Prompt in Windows 7. A page on the Microsoft site explains how to find it nevertheless. After doing so, you could try making a shortcut on your desktop, and maybe moving it into the Start button folder tree. – Mike again

When I run the compiler on Windows with obc Fac.m, I get a message saying "aout.exe: Permission denied". What's going on?

Probably you are running the compiler in a directory that's protected against creating files; this can happen if, for example, you try to run it in the directory C:\Program Files\Oxford Oberon-2 Compiler where the compiler itself is installed. Much better is to create a fresh directory such as C:Documents and Settings\<your name>\ip1. Copy any program you want to compile into that directory, then run the compiler from there. If you just use obc Fac.m and not obc -o fac Fac.m, then (on Windows, at least) it puts its output in a file called aout.exe in the current directory.

Does the file fac.exe (or whatever) that's output by the compiler contain a native machine-code program?

Yes and no. On Windows, it contains two parts glued together: one of them is a translation of your Oberon program into a compact bytecode, different from Java bytecode but similar in purpose. The other part is either an interpreter for the bytecode, or more commonly a runtime translator that will compile the bytecode into native machine code just before running it. The debugger is based around a modified bytecode interpreter.

On unix, the file a.out produced by the compiler contains only the bytecode, preceded by a one-line command that invokes the bytecode interpreter or JIT translator when the program is invoked from the shell.

How can I use Oberon on Mac? When I try to run the program through the terminal window, this happens:

$ ./crib Christmas message
-bash: ./crib: /ecslab/mike/obc-2.9.2/lib/obc/obxj: 
        bad interpreter: No such file or directory

I'm wondering just what has happened here. Your program contains a reference to

/ecslab/mike/obc-2.9.2/lib/obc/obxj

which is where the Oberon runtime system is found on the lab machines. So I'm guessing you copied your work across from the lab machines and tried to run the program on the Mac without compiling it again. If (after installing the Mac version of OBC) you compile your program with the usual command

$ obc -o crib Crib.m

then it should work properly. – Mike 10:16, 8 February 2011 (UTC)

Trivia

Can't you find the expansion of (1+1/n)^n using Wolfram alpha?

Yes you can: just enter series (1+1/n)^n at n = infinity. But there are prizes for anyone who writes their own program to calculate the same result.

Why don't you like syntax highlighting?

Oh, I have nothing much against it in general. But I find people often set it up to be far too intrusive. Imagine reading a novel where all the nouns were in magenta and all the verbs were in dull green. Besides, syntax highlighting only works for languages that are known to the editor, and I often find myself editing files that are in all sorts of different languages, some of them invented for a specific purpose. Yes, you could set up syntax highlighting for them, or modify the colours that the editor uses, but I just don't think it's worth the effort for most purposes.

That said, how do I set up the Gnome text editor or Notepad++ to highlight Oberon code properly?

See this note.

Why does numbering of array elements start from zero?

Professor Dijkstra has a good explanation (local copy).

What does actually happen in this program, where x is a variable of type REAL?

x := 1.0;
WHILE x # 0.0 DO
  x := x / 2.0
END

I took your program and added some instrumentation to print out the value of x in each iteration, as a real number but also in hexadecimal and with the sign, exponent and mantissa parts separated. Here's the code: RealHalve.m. When the program runs, the first few lines are what we expect:

0.500000 = 3f000000 = 0 7e 000000 
0.250000 = 3e800000 = 0 7d 000000 
0.125000 = 3e000000 = 0 7c 000000 
0.0625000 = 3d800000 = 0 7b 000000 

We're getting a half, a quarter, an eighth, and so on. The machine representation of 1/8 (for example) is arrived at as follows:

1/8 = 1.000000 * 2-3

We take the 1.000000, suppress the leading 1, and write down the fractional part (as a binary fraction) to form the mantissa: that explains why the mantissa is shown as 000000, a 23-bit number written in hex. Now take the exponent -3, add 127 to it, and get 124, or 7c in hexadecimal; this forms the 8-bit exponent field. And because the number is positive, the sign bit is zero.

So far, every line has the mantissa shown as zero, and the exponent decreases by 1 from each line to the next. This pattern carries on for quite a while, until we reach

4.70198E-38 = 01800000 = 0 03 000000 
2.35099E-38 = 01000000 = 0 02 000000 
1.17549E-38 = 00800000 = 0 01 000000

That last line is 2-126, the smallest positive normalized floating point number. You might expect that the next step would be zero, but the floating point format has a trick up its sleeve: it can represent still smaller numbers at the cost of reducing the accuracy. The next step is in fact

5.87747E-39 = 00400000 = 0 00 400000

This number has an exponent field of zero, and that indicates that the leading 1 is no longer suppressed in the mantissa; the number is 0.5 * 2-126, and the next few numbers are 0.25 * 2-126, 0.125 * 2-126, and so on. In each case, the fraction is written as a 23-bit binary fraction with the binary point at the left, so the mantissas are 200000, 1000000, 080000, and so on:

2.93874E-39 = 00200000 = 0 00 200000 
1.46937E-39 = 00100000 = 0 00 100000 
7.34684E-40 = 00080000 = 0 00 080000 

This pattern of denormal numbers with a single 1 bit in the mantissa continues, until we reach

5.60519E-45 = 00000004 = 0 00 000004 
2.80260E-45 = 00000002 = 0 00 000002 
1.40130E-45 = 00000001 = 0 00 000001 

This is the smallest denorm, equal to 2-149. Note that the number of significant digits in the number gets smaller and smaller; the next representable number larger than 1.40130E-45 is in fact the number 2.80260E-45 shown in the line above, so if we rounded a number that lay between them to either one of these numbers, we would be likely to lose all the significant decimal digits in it.

And the next line shows zero:

0.00000 = 00000000 = 0 00 000000

All this is in single-precision floating point. If we declared x: LONGREAL instead, then the whole story would be longer, because double-precision floating point numbers have more bits in both the exponent and the mantissa fields. Oh, and none of this is special to Oberon: the same experiment gives the same results in any language using floating-point hardware that conforms to the standard IEEE-754.

Lab exercises

Is it possible to do the lab exercises on a Windows machine?

Mostly, yes, though it's necessary to work around the lack on Windows of some common tools that are provided under unix. I've written a page of notes that gives some hints.

In Lab 2, whenever I try and run tester, by typing ./tester 5, I get an error message, "-bash: tester: Permission Denied". What does this mean and how can I stop it happening?

In order for the script to run, the execute permission bit must be set. Use the shell command chmod +x tester to set the bit: when you list the file with ls -l tester, you should see the permission bits -rwxr-xr-x, or something like that, where at least the fourth character (user execute) is an x.

Isn't the problem of Lab 3 the same as the 'minimal edit distance' problem studied in the algorithms course? Why do we not use the same algorithm?

Yes, the problem is the same. The algorithm studied in the other course takes time [math]\displaystyle{ O(N^2) }[/math] in all cases, where \(N\) is the length of the text. Our algorithm (with a good implementation of the longest ascending subsequence problem) takes time \(O(m \log m)\), where \(m\) is the number of matches between the two files. In the worst case, \(m = N^2\), but in practice \(m\) is closer to \(N\), so our algorithm takes a time that is between \(O(N^2 \log N)\) and \(O(N \log N)\), and neither algorthim is better than the other under all circumstances. (Modern implementations of diff use yet another algorithm that runs in time \(O(ND)\), where \(D\) is the edit distance: see the reading list.)

The instructions for Lab 3 refer to problem 2.7. What is this problem 2.7?

Following some reorganisation, that problem is now number 2.104 in the collection of 'extra problems' listed on the problem sheets page. It's not on the main sequence of problems any more, because much the same material will be spelled out in Lecture 8.


<references>