Lecture 4 – Number representations (Digital Systems)
Signed and unsigned numbers
So far we have used only non-negative numbers, corresponding to the C type unsigned. We can pin down the properties of this data type by defining a function bin(a) that maps an n-bit vector a to an integer:
$$\mathrm{bin}(a) = \sum_{0 \le i < n} a_i \, 2^i$$
(where n = 32). What is the binary operation ⊕ that is implemented by the add instruction? It would be nice if always
$$\mathrm{bin}(a \oplus b) = \mathrm{bin}(a) + \mathrm{bin}(b),$$
but unfortunately that's not possible owing to the limited range $0 \le \mathrm{bin}(a) < 2^n$. All we can promise is that
$$\mathrm{bin}(a \oplus b) \equiv \mathrm{bin}(a) + \mathrm{bin}(b) \pmod{2^n},$$
and since each bit vector maps to a different integer mod $2^n$, this equation exactly specifies the result of ⊕ in each case.
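The definition of bin and the wrap-around behaviour of ⊕ can be tried out with a minimal sketch. It assumes a bit vector is modelled as a Python list with element i holding bit i (least significant first); the names to_bits, bin_value and add_mod are illustrative and do not come from the lecture.

```python
N = 32  # word size assumed in the text

def to_bits(x, n=N):
    """Return the n-bit vector for x reduced mod 2**n; element i is bit i."""
    return [(x >> i) & 1 for i in range(n)]

def bin_value(a):
    """bin(a): the sum of a[i] * 2**i over all bit positions i."""
    return sum(bit << i for i, bit in enumerate(a))

def add_mod(a, b):
    """The operation ⊕: the unique bit vector congruent to bin(a) + bin(b) mod 2**n."""
    n = len(a)
    return to_bits((bin_value(a) + bin_value(b)) % (1 << n), n)

all_ones = to_bits(2**32 - 1)   # the largest unsigned value
one = to_bits(1)
print(bin_value(add_mod(all_ones, one)))   # 0: the sum wraps around
```

The last line shows why plain equality cannot hold: the true sum 2^32 is outside the representable range, so only the congruence survives.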
More commonly used than the unsigned type is the type int of signed integers. Here the interpretation of bit vectors is different: we define twoc(a) by
$$\mathrm{twoc}(a) = \sum_{0 \le i < n-1} a_i \, 2^i - a_{n-1} \, 2^{n-1}.$$
Taking n = 8 for a moment:
    a            bin(a)   twoc(a)
    -----------------------------
    0000 0000       0        0
    0000 0001       1        1
    0000 0010       2        2
    ...
    0111 1111     127      127  = 2^7 - 1
    1000 0000     128     -128  = -2^7
    1000 0001     129     -127
    ...
    1111 1110     254       -2
    1111 1111     255       -1
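Note that the leading bit is 1 for negative numbers. So we see $-2^{n-1} \le \mathrm{twoc}(a) < 2^{n-1}$, and
$$\mathrm{twoc}(a) = \mathrm{bin}(a) - a_{n-1} \, 2^n.$$
So $\mathrm{twoc}(a) \equiv \mathrm{bin}(a) \pmod{2^n}$. Therefore if $\mathrm{bin}(a \oplus b) \equiv \mathrm{bin}(a) + \mathrm{bin}(b) \pmod{2^n}$, then also $\mathrm{twoc}(a \oplus b) \equiv \mathrm{twoc}(a) + \mathrm{twoc}(b) \pmod{2^n}$, and the same addition circuit can be used for both signed and unsigned arithmetic – a telling advantage for this two's complement representation.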
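The claim that one adder serves both interpretations can be checked with a short sketch. It assumes n = 8 as in the table above, and models a bit pattern as a Python integer in the range 0 to 255; the function names twoc and add are illustrative.

```python
N = 8  # small word size, matching the table above

def twoc(x, n=N):
    """Signed reading of the n-bit pattern x: subtract 2**n when the top bit is set."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x

def add(x, y, n=N):
    """A single adder for both readings: add, then keep the low n bits."""
    return (x + y) & ((1 << n) - 1)

a, b = 0b11111110, 0b00000011   # 254 and 3 unsigned, -2 and 3 signed
s = add(a, b)
print(s, twoc(s))               # 1 1: 257 mod 256 unsigned, and -2 + 3 signed
```

The same output pattern is the right answer under both readings, up to a multiple of 2^8.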
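How can we negate a signed integer? If we compute ā such that $\bar{a}_i = 1 - a_i$, then we find
$$\mathrm{twoc}(\bar{a}) = \sum_{0 \le i < n-1} (1 - a_i) \, 2^i - (1 - a_{n-1}) \, 2^{n-1} = -\mathrm{twoc}(a) + (2^{n-1} - 1) - 2^{n-1} = -\mathrm{twoc}(a) - 1.$$
So to compute −a, negate each bit, then add 1. This works for every number except $-2^{n-1}$, which like 0 gets mapped to itself.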
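The negate-then-add-1 rule, including its one exceptional case, can be sketched as follows. The sketch assumes n = 8 and models bit patterns as Python integers; twoc and negate are illustrative names.

```python
N = 8
MASK = (1 << N) - 1

def twoc(x):
    """Signed reading of an N-bit pattern."""
    return x - (1 << N) if x >> (N - 1) else x

def negate(x):
    """Flip every bit, then add 1, keeping the low N bits."""
    return ((x ^ MASK) + 1) & MASK

for x in (0b00000101, 0b00000000, 0b10000000):
    print(twoc(x), "->", twoc(negate(x)))
# prints 5 -> -5, then 0 -> 0, then -128 -> -128
```

The last case shows the exception: +128 is not representable in 8 bits, so −128 maps to itself.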
Context
This two's-complement binary representation of numbers is typical of every modern computer: the vital factor is that the same addition circuit can be used for both signed and unsigned numbers. Historical machines used different number representations: for example, in business computing most programs used to do a lot of input and output and only a bit of arithmetic, and there it was better to store numbers in decimal (or binary-coded decimal) and use special circuitry for decimal addition and subtraction, rather than go to the trouble of converting all the numbers to binary on input and back again to decimal on output, something that would require many slow multiplications and divisions.
Floating point arithmetic has standardised on a sign-magnitude representation because two's-complement does not simplify things in a significant way. Sign-magnitude makes negating a number as simple as flipping the sign bit, but it does mean that there are two representations of zero – +0 and −0 – and that can be a bit confusing.
Comparisons and condition codes
To compare two signed numbers a and b, we can compute a ⊖ b and look at the result:
- If a ⊖ b = 0, then a = b.
- If a ⊖ b is negative (sign bit = 1), then it could be that a < b, or maybe b < 0 < a and the subtraction overflowed. For example, if a = 100 and b = −100, then a − b = 200 ≡ −56 (mod 256) in 8 bits, so a ⊖ b appears negative when the true result is positive.
- Similar examples show that, with a < 0 < b, the result of a ⊖ b can appear positive when the true result is negative. In each case, we can detect overflow by examining the signs of a and b and seeing if they are consistent with the sign of the result a ⊖ b.
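The overflow example in the list above can be reproduced with a small sketch. It assumes 8-bit words modelled as Python integers, and it uses one common way of phrasing the sign-consistency test: overflow occurs when a and b differ in sign and the result differs in sign from a. The helper names are illustrative.

```python
N = 8
MASK = (1 << N) - 1
SIGN = 1 << (N - 1)

def to_word(v):
    """Encode a (possibly negative) integer as an N-bit pattern."""
    return v & MASK

def twoc(x):
    """Signed reading of an N-bit pattern."""
    return x - (1 << N) if x & SIGN else x

def sub(a, b):
    """a ⊖ b on N-bit patterns."""
    return (a - b) & MASK

def sub_overflows(a, b):
    """True if a and b differ in sign and the result's sign differs from a's."""
    r = sub(a, b)
    return bool((a ^ b) & (a ^ r) & SIGN)

a, b = to_word(100), to_word(-100)
print(twoc(sub(a, b)), sub_overflows(a, b))   # -56 True
```

The result looks negative, but the overflow test reveals that its sign cannot be trusted.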
The cmp instruction computes a ⊖ b, throws away the result, and sets four condition code bits:
- N – the sign bit of the result
- Z – whether the result is zero
- V – overflow as explained above
- C – the carry output of the subtraction
A subsequent conditional branch can test these bits and branch if an appropriate condition is satisfied. A total of 14 branch tests are implemented.
    equality     signed             unsigned             miscellaneous
    --------     ------             --------             -------------
    beq*  Z      blt  N!=V          blo = bcc*  !C       bmi*  N
    bne*  !Z     ble  Z or N!=V     bls  Z or !C         bpl*  !N
                 bgt  !Z and N=V    bhi  !Z and C        bvs*  V
                 bge  N=V           bhs = bcs*  C        bvc*  !V
The conditions marked * test individual condition code bits, and the others test meaningful combinations of bits. For example, the instruction blt tests whether a < b held in the preceding comparison, and that is so if either the subtraction did not overflow and the N bit is 1, or the subtraction did overflow and the N bit is 0 – in other words, if N ≠ V. All the other signed comparisons use combinations of the same ideas: it's pointless to memorise all the combinations.
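To see how the flag combinations in the table line up with ordinary comparisons, here is a minimal simulation. It assumes 8-bit words for readability (the real machine uses 32), models the subtraction as a + ~b + 1, and uses the illustrative names cmp_flags, CONDITIONS and taken; it is a sketch of the idea, not a description of the hardware.

```python
N_BITS = 8
MASK = (1 << N_BITS) - 1
SIGN = 1 << (N_BITS - 1)

def cmp_flags(a, b):
    """Flags set by comparing the N_BITS-bit patterns of a and b."""
    a &= MASK
    b &= MASK
    total = a + (b ^ MASK) + 1            # a + ~b + 1, kept untruncated
    r = total & MASK
    return {
        "N": bool(r & SIGN),              # sign bit of the result
        "Z": r == 0,                      # result is zero
        "C": bool(total >> N_BITS),       # carry out of the top bit
        "V": bool((a ^ b) & (a ^ r) & SIGN),  # signed overflow
    }

# Branch conditions copied from the table (signed and unsigned columns).
CONDITIONS = {
    "beq": lambda f: f["Z"],
    "bne": lambda f: not f["Z"],
    "blt": lambda f: f["N"] != f["V"],
    "ble": lambda f: f["Z"] or f["N"] != f["V"],
    "bgt": lambda f: not f["Z"] and f["N"] == f["V"],
    "bge": lambda f: f["N"] == f["V"],
    "blo": lambda f: not f["C"],
    "bls": lambda f: f["Z"] or not f["C"],
    "bhi": lambda f: not f["Z"] and f["C"],
    "bhs": lambda f: f["C"],
}

def taken(branch, a, b):
    """Would `branch` be taken after comparing a with b?"""
    return bool(CONDITIONS[branch](cmp_flags(a, b)))

print(taken("blt", 100, -100))   # False: 100 < -100 does not hold
print(taken("blo", 100, -100))   # True: as unsigned patterns, 100 < 156
```

The same pair of bit patterns gives opposite answers for blt and blo, because the two branches read the flags under different interpretations of the operands.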
For comparisons of unsigned numbers, it's useful to work out what happens to the carries when we do a subtraction. For example, in 8 bits we can perform the subtraction 32 − 9 like this, adding together 32, the bitwise complement of 9, and an extra 1:
       32     0010 0000
       ~9     1111 0110
                      1
            -----------
            1 0001 0111
The result should be $23_{10} = 10111_2$, but as you can see, if the addition is performed in 9 bits there is a leading 1 bit that becomes the C flag. If we subtract unsigned numbers a − b in this way, the C flag is 1 exactly if a ≥ b, and this gives the basis for the unsigned conditional branches bhs (= Branch if Higher or Same), etc.
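The worked subtraction can be replayed in a few lines. This sketch assumes 8-bit values held in Python integers, and the name sub_with_carry is illustrative.

```python
N = 8
MASK = (1 << N) - 1

def sub_with_carry(a, b):
    """Return (result, C) for a - b computed as a + ~b + 1 on N-bit values."""
    total = a + (b ^ MASK) + 1    # up to N + 1 bits wide
    return total & MASK, total >> N

print(sub_with_carry(32, 9))    # (23, 1): carry out, since 32 >= 9
print(sub_with_carry(9, 32))    # (233, 0): no carry out, since 9 < 32
```

In the second call the 8-bit result 233 is the true difference −23 taken mod 256, and the clear C flag signals that a borrow was needed.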
The cmp instruction has the sole purpose of setting the condition codes, and throws away the result of the subtraction. But the subs instruction, which saves the result of the subtraction in a register, also sets the condition codes: that's the meaning of the s suffix on the mnemonic. (The big ARMs have both subs that does set the condition codes, and sub that doesn't.) Other instructions also set the condition codes in a well-defined way. If the codes are set at all, Z always indicates if the result is zero, and N is always equal to the sign bit of the result.
- In an adds instruction, the C flag is set to the carry-out from the addition, and the V flag indicates if there was an overflow, with a result whose sign is inconsistent with the signs of the operands.
- In a shift instruction like lsrs, the C flag is set to the last bit shifted out. That means we can divide the number in r0 by 2 with the instruction lsrs r0, r0, #1 and test whether the original number was even with a subsequent bcc instruction. Shift instructions don't change the V flag.
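The flag behaviour of these two instructions can be imitated with a short sketch. It assumes 32-bit registers modelled as Python integers and returns the flags alongside the result; the Python functions adds and lsrs are illustrative stand-ins named after the instructions, and only the C and V flags are modelled.

```python
N = 32
MASK = (1 << N) - 1
SIGN = 1 << (N - 1)

def adds(a, b):
    """Return (result, C, V) for an N-bit addition."""
    a &= MASK
    b &= MASK
    total = a + b
    r = total & MASK
    c = total >> N                              # carry out of the top bit
    v = int(bool((a ^ r) & (b ^ r) & SIGN))     # operands agree in sign, result differs
    return r, c, v

def lsrs(x, shift):
    """Logical shift right by 1 <= shift <= N bits; return (result, C)."""
    x &= MASK
    carry = (x >> (shift - 1)) & 1              # last bit shifted out
    return x >> shift, carry

half, c = lsrs(13, 1)
print(half, c)                  # 6 1: C is set, so 13 was odd and bcc is not taken
print(adds(0x7FFFFFFF, 1)[1:])  # (0, 1): no carry out, but signed overflow
```

The second line shows that C and V are independent: adding 1 to the largest positive signed value overflows without producing a carry.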
Context
The four status bits NZCV are almost universal in modern processor designs. The exceptions are machines like the MIPS and DEC Alpha that have no status bits, but allow the result of a comparison to be computed into a register: on the MIPS, the instruction slt r2, r3, r4 sets r2 to 1 if r3 < r4 and to zero otherwise; this can be followed by a conditional branch that tests the value of r2.
Is overflow possible in unsigned subtraction, and how can we test for it?
Overflow happens when the mathematically correct result bin(a) − bin(b) is not representable as bin(c) for any bit-vector c. If bin(a) >= bin(b) then the difference bin(a) − bin(b) is non-negative and no bigger than bin(a), so it is representable. It's only when bin(a) < bin(b), so the difference is negative, that the mathematically correct result is not representable. We can detect this case by looking at the carry bit: C = 0 if and only if the correct result is negative.
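The argument above can be checked exhaustively for a small word size. This sketch assumes 8-bit values and the same a + ~b + 1 model of subtraction used earlier in the lecture; the name usub is illustrative.

```python
N = 8
MASK = (1 << N) - 1

def usub(a, b):
    """Unsigned a - b as a + ~b + 1; return (result, C)."""
    total = a + (b ^ MASK) + 1
    return total & MASK, total >> N

# For every pair of 8-bit values: C = 0 exactly when a < b, and when
# C = 1 the truncated result equals the true difference.
for a in range(1 << N):
    for b in range(1 << N):
        result, c = usub(a, b)
        assert (c == 0) == (a < b)
        if c == 1:
            assert result == a - b

print("checked", (1 << N) ** 2, "pairs")   # checked 65536 pairs
```

So a test for unsigned overflow after a subtraction amounts to a branch on carry clear.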
Floating point: A representation of real numbers with a sign, a mantissa, and an exponent that allows the number to be scaled by a power of two. Typical machines have special registers for holding floating point numbers and special instructions for loading, storing and comparing them and performing arithmetic operations on them. In this course, we ignore these machine features for simplicity because they add complexity without presenting any really new problems from the point of view of compiling.
Condition codes: Four bits, N, Z, C and V, in the processor status word that indicate the result of a comparison or other arithmetic operation. Briefly, N indicates whether the result of the operation was negative, Z indicates whether it was zero, C is the value of the carry-out bit from the ALU, and V indicates whether the operation overflowed, yielding a result that was different in sign from what could be predicted from the inputs to the operation. A comparison is treated like a subtraction as far as setting the condition codes is concerned. After the condition codes have been set, a subsequent conditional branch instruction can test them, and make a branch decision based on a boolean combination of their values. All ten arithmetic comparisons (equal, not-equal, and less-than, less-than-or-equal, greater-than, and greater-than-or-equal for both signed and unsigned representations) can be represented in this way. When a process is interrupted, the condition codes must be saved and restored as part of the processor state, in case the interrupt came between a comparison and a subsequent conditional branch.