Lecture 14 – Context switching (Digital Systems)

From Spivey's Corner

To implement an operating system like Phōs, a vital ingredient is a mechanism that switches the processor from running one client process to running another. Such a context switch might happen, for example, in response to an interrupt: the processor will switch from running some background process to running the device driver to which the interrupt is connected. This association with interrupts gives a clue to how context switching can be implemented in general.

With the help of the "supervisor call" instruction svc, we can arrange that every entry to the operating system is via an interrupt. When an interrupt happens, whether caused by a hardware device or by svc, some of the processor state is saved on the stack; we will extend this with a small fragment of assembly language that saves the remaining state: registers r4–r7 and (in case anyone uses them) r8–r11. Knowing just the sp value is then enough to restart the process, first with a little assembly language fragment that restores registers r4–r11, and then by means of the normal interrupt return mechanism, which restores the remaining state and runs the process again.


The new ingredient is that between entry to the operating system and exit back to client processes, we will change the stack pointer. The operating system will keep a table (details later) that contains much information about each process, but in particular contains the saved stack pointer of each process that is not running. When a context switch happens, the operating system saves the sp value of the process that is being suspended, and retrieves the sp value of the process that is resuming. The interrupt return mechanism then activates the new process.
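The bookkeeping just described can be modelled in a few lines of ordinary C that run on any host. This is a minimal sketch, not the Phōs table: the names `struct proc`, `ptable`, `running` and `switch_context` are hypothetical (only the field name `p_sp` echoes the code shown later), and round robin is just one possible choice of the next process.

```c
#include <stdint.h>

#define NPROCS 3                /* hypothetical number of processes */

/* Hypothetical process-table entry: only the saved stack pointer is modelled. */
struct proc {
    uint32_t *p_sp;             /* saved sp while the process is suspended */
};

static struct proc ptable[NPROCS];
static int running = 0;         /* index of the process that was running */

/* Save the sp of the process being suspended, pick the next process
   (round robin, purely as an example policy) and return its saved sp. */
uint32_t *switch_context(uint32_t *sp)
{
    ptable[running].p_sp = sp;
    running = (running + 1) % NPROCS;
    return ptable[running].p_sp;
}
```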


Implementing this is a bit fiddly – the kind of thing that causes great rejoicing when it actually works – but the task is helped by the fact that the Cortex-M0 has a second stack pointer, so that (after a little configuration) there's one stack pointer psp for use by a running process, and another msp for use by the operating system. That means (a) that the operating system has its own stack, and doesn't need to steal stack space from either process; and (b) that we can use subroutines in the context switch code without worrying about messing up the process stacks.

Here is the code that implements system calls (from mpx-m0.s):

svc_handler:
    push {lr}               @ Push lr on main stack
    bl isave                @ Complete saving of state; r0 = stack pointer
    bl system_call          @ Perform system call; r0 = system_call(r0)
    bl irestore             @ Restore saved state from r0
    pop {pc}                @ Return to new thread

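The lr value that is saved at the top and popped into pc at the bottom is the magic value that the processor places in lr on entry to an interrupt handler; loading it into pc triggers the return-from-interrupt mechanism. It must be saved because the bl instructions in between overwrite lr. It is saved on the 'main' stack belonging to the operating system, using the main stack pointer msp (at this point sp is an alias of msp).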
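On ARMv6-M this magic value is called EXC_RETURN, and only three values are valid: 0xFFFFFFF1 (return to Handler mode using msp), 0xFFFFFFF9 (return to Thread mode using msp) and 0xFFFFFFFD (return to Thread mode using psp). The function below classifies them; its name and the enum are made up for illustration. Since processes run in Thread mode on psp, the value saved by svc_handler should normally be 0xFFFFFFFD.

```c
#include <stdint.h>

/* Hypothetical names for what an EXC_RETURN value tells the processor. */
enum ret_kind { RET_INVALID, RET_HANDLER_MSP, RET_THREAD_MSP, RET_THREAD_PSP };

/* Classify an ARMv6-M EXC_RETURN value (the lr value on handler entry). */
enum ret_kind decode_exc_return(uint32_t lr)
{
    switch (lr) {
    case 0xFFFFFFF1u: return RET_HANDLER_MSP;  /* back to another handler */
    case 0xFFFFFFF9u: return RET_THREAD_MSP;   /* thread mode, main stack */
    case 0xFFFFFFFDu: return RET_THREAD_PSP;   /* thread mode, process stack */
    default:          return RET_INVALID;      /* not a valid return value */
    }
}
```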

The helper routines isave and irestore implement the saving of that part of the state that is not saved by the hardware on interrupt. Here is code for isave:

isave:   
    mrs r0, psp             @ Get thread stack pointer
    subs r0, #16
    stm r0!, {r4-r7}        @ Save low regs on thread stack
    mov r4, r8              @ Copy from high to low
    mov r5, r9
    mov r6, r10
    mov r7, r11
    subs r0, #32
    stm r0!, {r4-r7}        @ Save high regs in thread stack
    subs r0, #16
    bx lr                   @ Return process sp

In this code, the first instruction fetches the process stack pointer into r0: the mrs instruction (Move to Register from Special) can be used to access several special registers inside the processor, and has a counterpart called msr that we'll use later. Also new here is the stm instruction that (like push) stores multiple registers in consecutive words of memory. Unlike push, it can use a low register of our choice to hold the address where the values are stored, and it updates that register (that's what the ! means) by the size of the data saved, in this case 16 bytes. Annoyingly, however, unlike push it increments the address rather than decrementing it, and that's the reason for the three instructions that adjust r0 by multiples of 16. Also a bit irritating is the fact that stm cannot store from the high registers, so we must laboriously move r8–r11 into r4–r7 before saving them. We are really earning our money here.
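To check the pointer arithmetic, here is a host-side C model of isave. The process stack is an array of words, `stm` is imitated by a helper that stores four words at increasing addresses and returns the advanced pointer (the write-back that `!` requests), and the registers are a plain struct; `struct regs`, `stm4` and `isave_model` are hypothetical stand-ins for the hardware. The result is the value isave leaves in r0: eight words (32 bytes) below the original psp.

```c
#include <stdint.h>

/* Hypothetical snapshot of the registers that isave must save. */
struct regs { uint32_t r4, r5, r6, r7, r8, r9, r10, r11; };

/* Model of "stm rN!, {a,b,c,d}": store at increasing addresses and
   return the incremented pointer. */
static uint32_t *stm4(uint32_t *p, uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    p[0] = a; p[1] = b; p[2] = c; p[3] = d;
    return p + 4;
}

/* Model of isave: psp points at the hardware-saved part of the frame. */
uint32_t *isave_model(uint32_t *psp, const struct regs *r)
{
    uint32_t *r0 = psp;
    r0 -= 4;                                      /* subs r0, #16 */
    r0 = stm4(r0, r->r4, r->r5, r->r6, r->r7);    /* r0 is back at psp */
    r0 -= 8;                                      /* subs r0, #32 */
    r0 = stm4(r0, r->r8, r->r9, r->r10, r->r11);  /* r0 = psp - 4 words */
    r0 -= 4;                                      /* subs r0, #16 */
    return r0;                                    /* psp - 8 words */
}
```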

You can guess the implementation of irestore: it is a bit easier because the ldm instruction that's dual to stm does go in the direction we want, and it ends with msr psp, r0 to set the process stack pointer.
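In the same host-side style (again with hypothetical names), irestore can be modelled as two simulated `ldm` instructions that read the frame at increasing addresses, followed by returning the value that `msr psp, r0` would install. Like stm, ldm cannot reach the high registers, so the real routine loads the saved r8–r11 into r4–r7 first and moves them up with mov; the model hides that shuffle.

```c
#include <stdint.h>

/* Hypothetical snapshot of the registers that irestore reloads. */
struct regs { uint32_t r4, r5, r6, r7, r8, r9, r10, r11; };

/* Model of irestore: sp points at the saved r8 (word 0 of the frame).
   Fills *r and returns the value to be written to psp. */
uint32_t *irestore_model(uint32_t *sp, struct regs *r)
{
    /* First "ldm r0!, {r4-r7}": words 0-3 hold r8-r11, which the real
       code then copies to the high registers with mov. */
    r->r8 = sp[0]; r->r9 = sp[1]; r->r10 = sp[2]; r->r11 = sp[3];
    sp += 4;
    /* Second "ldm r0!, {r4-r7}": words 4-7 hold r4-r7 themselves. */
    r->r4 = sp[0]; r->r5 = sp[1]; r->r6 = sp[2]; r->r7 = sp[3];
    sp += 4;
    /* "msr psp, r0": psp now points at the hardware-saved part. */
    return sp;
}
```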

And here is the layout of the frame that is pushed by hardware and software onto the stack of the suspended process; the numbers on the left are word offsets from the saved stack pointer:

--------------------------------------
15  PSR  Status register
14  PC   Program counter
13  LR   Link register
12  R12
11  R3
10  R2           (Saved by hardware)
 9  R1
 8  R0   Process argument
--------------------------------------
 7  R7   
 6  R6
 5  R5
 4  R4           (Saved manually)
 3  R11
 2  R10
 1  R9
 0  R8   <-- Stack pointer
--------------------------------------

It's right to save this information on the process stack, because the register values are specific to the process, and will be needed again precisely when the process is resumed. The address at which r8 has been saved, at the bottom of the frame, is the one that is recorded as the stack pointer of the suspended process.

The function system_call is written in C, and is the entry point of the operating system. Its heading is

unsigned *system_call(unsigned *sp),

showing that it is passed as a parameter the process stack pointer of the process that is being suspended, and returns as a result the stack pointer for the process to be resumed. As we'll see later, the system_call function can decode the state of the suspended process to find out what system call (such as yield(), send(), receive()) was requested, and what its parameters were. It's up to the operating system to decide (as a matter of policy) which process should get to run next. What we're concentrating on for the moment is the mechanism by which that policy is put into effect. If the operating system wants the currently executing process to carry on, it can simply return the same stack pointer it received, and the context switching mechanism will then return to the same process that it suspended when the call was made.
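As a hedged sketch of how the suspended state might be decoded, the C below names the word offsets from the frame layout table and reads saved registers through the sp that system_call receives. Two things are assumptions rather than details of Phōs: recovering the call number from the svc instruction itself (in the 16-bit Thumb encoding, `svc #n` is the halfword 0xDF00 | n, and the saved pc points just past it), and the toy policy of replying in the saved r0 and letting the caller continue. The names `frame_slot`, `svc_number` and `toy_system_call` are hypothetical.

```c
#include <stdint.h>

/* Word offsets from the saved sp, matching the frame layout table. */
enum frame_slot {
    F_R8 = 0, F_R9, F_R10, F_R11, F_R4, F_R5, F_R6, F_R7,
    F_R0 = 8, F_R1, F_R2, F_R3, F_R12, F_LR, F_PC, F_PSR,
    FRAME_WORDS                          /* 16 words in all */
};

/* Thumb "svc #n" is the halfword 0xDF00 | n; the saved pc points at the
   instruction after it, so the svc is the halfword just before. */
int svc_number(const uint16_t *after_svc)
{
    uint16_t insn = after_svc[-1];
    if ((insn & 0xFF00u) != 0xDF00u)
        return -1;                       /* not an svc instruction */
    return insn & 0xFFu;
}

/* Toy policy: read the first argument (saved r0), put a reply in the r0
   slot, which the hardware reloads into r0 on return, and resume the
   caller by returning the same sp. */
uint32_t *toy_system_call(uint32_t *sp)
{
    uint32_t arg = sp[F_R0];
    sp[F_R0] = arg + 1;                  /* hypothetical reply value */
    return sp;                           /* same process carries on */
}
```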

The explanation above captures well enough what happens once a program is running, but any explanation will be unsatisfying if it doesn't reveal what happens when the system starts. There are two aspects to this: how each process starts, and how the entire system starts.

The operating system resumes each process by the return-from-interrupt mechanism, and that applies also to the time when the process starts. To provide for this, when the process is set up it is given a fake interrupt frame on its stack. This is done in the function start in phos.c, which depends on the frame layout shown above.

  • r0 contains the integer argument that will be passed to the process.
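  • pc contains the address of the function body. The LSB of this address (which is set in a Thumb function pointer) should be cleared: returning from an interrupt with it set gives a result that is UNPREDICTABLE (ARM manual, page B1-201), though in practice this causes no problems.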
  • The value of psr doesn't matter much, but it should have the bit set that indicates Thumb mode. (There's a great sense of relief when you finally get such details right.)
  • The value of lr determines what happens if the process body should ever return. By setting it to the address of exit, we arrange for the process to terminate cleanly in this case.
  • Other registers can have arbitrary values, so it's safe to leave them as zero.

These values, saved on the initial stack for the process, ensure that when the process is first activated by the return-from-interrupt mechanism, it starts to run the process body with the supplied argument.
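The fake frame can be sketched in host-side C in the spirit of the function start in phos.c, though not copied from it: the name `init_frame`, its parameters, and passing code addresses as plain 32-bit integers are assumptions made so that the example runs anywhere. The Thumb bit in the saved psr is bit 24 (0x01000000), and the pc value has its LSB cleared as the list recommends.

```c
#include <stdint.h>

#define FRAME_WORDS 16          /* 8 software-saved + 8 hardware-saved words */
#define THUMB_BIT   0x01000000u /* T bit (bit 24) of the psr */

/* Build a fake interrupt frame at the top of a new process stack and
   return the sp to record for the process. Offsets follow the frame
   layout: 0-7 hold r8-r11 and r4-r7, 8 is r0, 13 lr, 14 pc, 15 psr. */
uint32_t *init_frame(uint32_t *stack_top, uint32_t body, uint32_t arg,
                     uint32_t exit_addr)
{
    uint32_t *sp = stack_top - FRAME_WORDS;
    for (int i = 0; i < FRAME_WORDS; i++)
        sp[i] = 0;                      /* other registers: don't care */
    sp[8]  = arg;                       /* r0: argument to the body */
    sp[13] = exit_addr;                 /* lr: where the body returns to */
    sp[14] = body & ~1u;                /* pc: entry point, LSB cleared */
    sp[15] = THUMB_BIT;                 /* psr: Thumb state */
    return sp;
}
```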

We also want to know how the whole system starts. The first process to run is a special process IDLE, belonging to the operating system itself, that will later become the process that runs when there is no other process ready to run. It contains an infinite loop with the only wfe instruction in the whole system.

Following a call to phos_init() to initialise the operating system's tables, and a call to the user-supplied init() function to create processes, the operating system calls a function phos_start() to get the ball rolling.

void phos_start(void) {
    current = idle_proc;
    setstack(current->p_sp);

    yield();                    // Pick a real process to run

    // Idle only runs again when there's nothing to do.
    while (1) {
        pause();                // Wait for an interrupt
    }
}

There's an assembly-language helper setstack that sets the psp register to the (empty) stack for the idle process, then starts using psp as the stack pointer. After that, what was the only thread of execution becomes the idle process: its first action is to use yield() to enter the operating system and allow the other processes to run. The operating system comes back here only when all those other processes temporarily have nothing to do.

The helper routine setstack() uses a couple of special instructions to do its job: msr allows values to be moved into special registers in the processor like psp. Setting the control register to 2 enables the use of psp as the stack pointer, and the isb instruction is there to ensure that no instructions in the pipeline use the old stack pointer.

setstack:
    msr psp, r0             @ Set up the stack
    movs r0, #2             @ Use psp for stack pointer
    msr control, r0
    isb                     @ Drain the pipeline
    bx lr
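The rule that decides which physical stack pointer the name sp refers to is small enough to state as a function. This is a model, not code that touches the hardware, and its names are hypothetical: in Handler mode sp is always msp, while in Thread mode it is psp when bit 1 of the control register (SPSEL) is set, which is what writing 2 does.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_SPSEL 0x2u   /* bit 1 of the control register */

enum stack_ptr { USE_MSP, USE_PSP };

/* Which stack pointer does "sp" mean? Exception handlers always use msp;
   thread-mode code uses psp once SPSEL has been set. */
enum stack_ptr active_sp(bool handler_mode, uint32_t control)
{
    if (handler_mode)
        return USE_MSP;
    return (control & CONTROL_SPSEL) ? USE_PSP : USE_MSP;
}
```

This is why svc_handler can use push and bl freely: it runs in Handler mode on msp, while the interrupted process keeps its own stack in psp.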

Context

More than any other code shown in this course, the context switch code shown here is architecture-dependent. Every machine capable of supporting multitasking makes it possible to save the machine state in this way, and restore it in order to revive a suspended process. Using the interrupt mechanism uniformly to enter the operating system is a very common approach.

Context switching on a simple machine like a microcontroller is itself quite simple. But on a more complex machine with an MMU, it is a more far-reaching operation. Each process has its own mapping from a virtual address space to physical addresses in the RAM that makes it appear to the process that it alone occupies the memory of the machine, and prevents any process from interfering with the memory occupied by others. When control is transferred from one process to another, or from a process to the operating system, this mapping must also be updated. Simpler machines may require each process to occupy a contiguous region of RAM, but more sophisticated machines divide the memory into pages that may be arbitrarily arranged to form the memory belonging to a process, and some of these pages may be stored not in RAM but on disk, to be retrieved by the operating system when the process needs access to them. All this adds up to quite a lot of state that must be saved and restored at a context switch.

