Implementing a recursive subroutine

Copyright © 2017–2023 J. M. Spivey
Revision as of 00:53, 15 September 2022 by Mike (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

In C, pointer-linked binary trees can be represented using this datatype:

typedef struct node *tree;

struct node {
    int data;
    tree left, right;
};

A variable t declared with

tree t;

stores a pointer to a node; the data field of that node can be written t->data, an abbreviation for (*t).data, in which the * denotes the passage from a pointer to the structure it accesses, and the .data denotes selecting the data field of that record. (In Java, Scala and similar languages – and in Hakell too – the operation written * here is implicit.) Each node in the tree will be a structure with three words, one for the data and one each for the two child pointers, with data at offset 0 in the sructure, and the two pointers left and right at offsets 4 and 8 respectively. If a tree is built using memory that is dynamically allocated using the library function malloc, it's likely each node will be bigger than this; it may be prefixed by a header that says how large the structure is, so that it can be released by free, and it may be padded to a larger size in order to reduce the number of different sizes the malloc has to deal with. None of that need concern us, because a pointer to the node is the address of its first field, and that field will be data.

Given that type definition, we can write the following recursive function that adds up the integer labels in a tree.

int sum(tree t) 
{
    if (t == NULL)
        return 0;
    else
        return t->data + sum(t->left) + sum(t->right);
}

That's the same function we would write in Haskell with someting like

sum :: Tree -> Int
sum (Node x lt rt) = x + sum lt + sum rt
sum Null = 0

How can we write the function in assembly language?

The heart of the subroutine will be a sequence containing two recursive calls to the same routine, with their results added together and added also to the value of t->data. The key insight is that whatever values must survive the recursive calls sum(t->left) and sum(t->right) must not live in the registers r0 to r3 across those calls. In this subroutine, we can idenfify two such values: the parameter t, a pointer that arrives in r0, but can't stay there when the parameters of the recursive calls are in their turn computed into the same register; and the growing total where we add together the data field of t and the results of the recursive calls. Those results also appear in r0 after the calls, but must be moved from there quickly.

So, in spite of the fact that the whole routine uses (as will emerge) use no more than three registers, we must make two of them be in the range r4 to r7. We'll use r4 to hold on to the incoming parameter t, and r5 to compute the total that will become the result. We should adopt the following skeleton for our subroutine.

sum:
    push {r4, r5, lr}           @ Save registers

    <body of sum>

    pop {r4, r5, pc}            @ Restore and return

ARM coding conventions demand that we save registers on the stack as the very first action of the subroutine, and restore them right at the end. We need to preserve the incoming values of r4 and r5, which in the case of recursive calls will be working data for the outer instance of the function. We also need to preserve the return address from the lr register, because lr will be trashed by subsequent subroutine calls. That value will be restored into pc at the moment the suroutine returns.

For the body of the subroutine, we start with an if-then-else structure that tests whether the argument is zero. representing the NULL pointer.

<body of sum>=
    cmp r0, #0                  @ Test if t == NULL
    bne sum_calc                @ If not, go to calculate

    movs r0, #0                 @ Empty tree has sum 0
    b sum_exit                  @ Finished

sum_calc:
    <calculate sum into r0>

sum_exit:

We're left with the hard part, calculating the sum of a non-trivial tree. After saving the pointer t in r4 and initialising r5 with the data field of the structure, we write two recursive calls. For each one, we fetch a pointer to the relevant subtree into r0, invoke the sum subroutine recursively, and add the result of the recursive call (which is in r0 when it returns) to the value in r5. We can sneakily put the result of the final addition back into r0 ready to return it ourselves.

The addressing mode written [r4, #4] is especially useful here, because it allows us to add together the address of a node (in r4) and the offset 4 between the start of the node and its left field, as part of the instruction that fetches the value of that field.

<calculate sum into r0>=
    movs r4, r0                 @ Save t in r4
    ldr r5, [r4, #0]            @ Get data field in r5

    ldr r0, [r4, #4]            @ Get left subtree in r0
    bl sum                      @ Recusively compute its sum
    adds r5, r5, r0             @ Add to total

    ldr r0, [r4, #8]            @ Get right subtree in r0
    bl sum                      @ Second recursive call
    adds r0, r5, r0             @ Put grand total in t0

There's no particular advantage in fetching the data field at the start, rather than adding it at the end, but no special disadvantage either, given that r5 is ours to use throughout the subroutine.

The code could be optimised a bit, at the expense of moving away from a style that is understood by the debugger, by moving in the push and pop instructions so that they surround only the part of the code where r4 and r5 are used, namely the fragment <calculate sum into r0>. This saves the time needed to save and restore these registers when our task is merely to answer that the sum of the empty tree is 0. But what is not needed (and not an improvement) is to surround each recursive call with its own push/pop pair. Saving registers at the top and restoring them at the bottom makes the registers available to us throughout the subroutine; if we call other subroutines (including making recursive calls) then it's the responsibility of the subroutine we call to preserve the values we are keeping in those registers. There is a possible style where registers are saved on the stack around each call, but that's not the ARM style, and it's not generally as efficient. Details: if we move the push/pop pair inwards, then no registers have been saved before the then part of the routine, so it should end with bx lr rather than a pop instruction.

I have to say for completeness that the sequence

    cmp r0, #0                  @ Test if t == NULL
    bne sum_calc                @ If not, go to calculate
    movs r0, #0                 @ Empty tree has sum 0

can also be optimised a bit: if we reach the third instruction, then we know r0 contains 0 already, so we don't have to set it to 0, and that instruction can be omitted. That feels wrong, because the first 0 is a null pointer, and the second 0 is zero as a number. But the computer doesn't know the difference!

It's also the case that a C compiler is free to reorder the members of a structure and insert padding between them, so that there's no guarantee that the layout we've assumed in our assembly language program will actually match the one chosen by the C compiler. That might matter if we want to write a tree-processing program partly in C and partly in assembly language. However, although the compiler has the freedom to pad and reorder structures, actual compilers rarely do so. The potential gains in efficiency are small, and reordering structures gets in the way of using them to interface with hardware, with operating systems that are compiled differently, and (as in our case) subroutines hand-written in assembly language.