Talk:The story of a bug report (Compilers)
The statement tmp := i MOD 2 generates the following Keiko code:
LDLW -4
CONST 2
MOD
STLW -8
Examining the trace, we can watch the process of translating this into machine code. First, a high-level view:
31: LDLW -4
+ LOCAL -4
+ LOADW
32: PUSH 2
33: MOD
--- STW I1, V2, -4
--- MOV I0, 2
--- STW I0, V2, -8
--- SUB I0, V2, 8
--- PREP 1
--- ARG I0
--- CALL 0x804ba50
34: STLW -8
+ LOCAL -8
+ STOREW
--- LDW I0, V2, -4
--- STW I0, V3, -8
The JIT translator is reading bytecode instructions such as 31: LDLW -4 (where 31 is the offset in the code) and expanding them into more primitive instructions, shown as + LOCAL -4 and + LOADW, etc., before translating these into Thunder instructions, shown as lines beginning with ---. The MOD operation is translated into a subroutine call, because it takes a bit of care to get the result mathematically right, and that care can be taken by writing a subroutine (in C) that does the job. So the big sequence of Thunder instructions stores the arguments of that procedure call in the stack, using the Thunder register V2 as a base register for addressing the stack; after that, it calls a subroutine at address 0x804ba50, and that's the subroutine that calculates MOD. After the subroutine, there's a pair of instructions that fetch the result of the MOD (by again addressing the stack relative to V2), and store it in the local variable tmp (by addressing relative to V3).
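To see why MOD needs "a bit of care", here is a minimal sketch in Python (the actual C subroutine is not shown on this page). It assumes that "mathematically right" means the floor convention, where a nonzero result takes the sign of the divisor, and it contrasts that with the truncating remainder that C's % operator gives. The names trunc_rem and floor_mod are made up for illustration.

```python
def trunc_rem(a: int, b: int) -> int:
    """Remainder as C's % computes it: the quotient is truncated toward zero."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - q * b


def floor_mod(a: int, b: int) -> int:
    """Adjust the truncating remainder so a nonzero result has the divisor's sign.

    Assumption: this floor convention is what the MOD subroutine implements.
    """
    r = trunc_rem(a, b)
    if r != 0 and (r < 0) != (b < 0):
        r += b
    return r


print(trunc_rem(-7, 2), floor_mod(-7, 2))
```

The two functions differ only when the operands have opposite signs and the division is inexact; that extra test and adjustment is the sort of thing that is easier to write once in a subroutine than to expand inline at every MOD.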
This use of two registers, V2 and V3, to address the stack comes about because the procedure has an open array parameter that must be copied into the stack frame, and the amount of space needed for this is not known at compile time. So V3 points, as it always does, to the frame head, and V2 points to a part of the stack frame beyond the variable-length copy of the parameter, where outgoing arguments are assembled, as in this call.
You'll note that the initial operations LOCAL -4 / LOADW and PUSH 2 appear to result in no code, and the operations LOCAL -8 / STOREW generate two instructions, one a load (LDW) and the other a store (STW). The delayed load comes about because the translation of MOD notes that the result can be found in the stack frame, but doesn't generate code to fetch it from there into a register; it's when the translation of STOREW wants the value in a register for the STW instruction that it gets moved.
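The idea of noting where a value can be found, and generating code only when it is needed in a register, can be sketched with a toy translator. This is a hypothetical model in Python, not the real JIT: the class and method names are invented, register allocation is reduced to always using I0, and it translates the simpler statement tmp := i rather than the MOD example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Addr:
    """The address base+offset, not yet computed into a register."""
    base: str
    offset: int


@dataclass(frozen=True)
class Load:
    """The word stored at base+offset, not yet loaded."""
    base: str
    offset: int


@dataclass(frozen=True)
class Const:
    """A known constant, not yet placed in a register."""
    value: int


class ToyTranslator:
    """Hypothetical model of deferred code generation; not the real JIT."""

    def __init__(self):
        self.stack = []   # descriptions of the values on the evaluation stack
        self.code = []    # Thunder-style instructions emitted so far

    def local(self, offset):
        # LOCAL n: just note that the stack top is the address V3+n.
        self.stack.append(Addr("V3", offset))

    def loadw(self):
        # LOADW: note that the stack top is the contents of that address.
        addr = self.stack.pop()
        self.stack.append(Load(addr.base, addr.offset))

    def push(self, value):
        # PUSH n: note the constant; no instruction is needed yet.
        self.stack.append(Const(value))

    def move_to_reg(self, item, reg):
        # Only now is an instruction emitted to materialise the value.
        if isinstance(item, Load):
            self.code.append(f"LDW {reg}, {item.base}, {item.offset}")
        elif isinstance(item, Const):
            self.code.append(f"MOV {reg}, {item.value}")
        else:
            raise ValueError(f"cannot move {item!r} to a register")
        return reg

    def storew(self):
        # STOREW: the value must be in a register for the STW instruction.
        addr = self.stack.pop()
        reg = self.move_to_reg(self.stack.pop(), "I0")
        self.code.append(f"STW {reg}, {addr.base}, {addr.offset}")


# Translate tmp := i as LOCAL -4, LOADW, LOCAL -8, STOREW.
t = ToyTranslator()
t.local(-4)
t.loadw()
emitted_before_store = list(t.code)   # still empty: nothing generated yet
t.local(-8)
t.storew()
print(t.code)
```

As in the trace, LOCAL and LOADW emit nothing by themselves; both the load and the store appear only when STOREW is translated.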
We can see a bit more detail about this by adding in the next level of debugging information, where lines of the form <0> = ... show where the translator thinks the values of stack items can be found. So, reading the first few lines, the instruction LOCAL -4 makes the stack top contain the address V3-4, and a subsequent LOADW makes the stack top contain a value [LOADW -4(V3)] that can be obtained by loading from that address. No actual LDW instruction is generated at this stage. Next, the constant 2 is pushed on top, and the translator just notes that. When the time comes to store the arguments in the stack frame before calling the MOD subroutine, these values are moved into registers. Even then, no LDW instruction is generated for i, because the translator knows that the value of i is already to be found in register I1; we'll see how in a moment.
31: LDLW -4
+ LOCAL -4
<0> = [ADDR -4(V3)]
+ LOADW
<0> = [LOADW -4(V3)]
32: PUSH 2
<1> = const 2
33: MOD
<0> = reg I1
--- STW I1, V2, -4
<0> = stackw -4
--- MOV I0, 2
<1> = reg I0
--- STW I0, V2, -8
<1> = stackw -8
--- SUB I0, V2, 8
--- PREP 1
--- ARG I0
--- CALL 0x804ba50
34: STLW -8
+ LOCAL -8
<1> = [ADDR -8(V3)]
+ STOREW
<1> = [LOADW -8(V3)]
--- LDW I0, V2, -4
<0> = reg I0
--- STW I0, V3, -8
Regs: I1(0) = [LOADW -4(V3)] V2(1001)
31: LDLW -4
+ LOCAL -4
<0> = [ADDR -4(V3)] (-4/1)
Regs: I1(0) = [LOADW -4(V3)] V2(1001)
+ LOADW
<0> = [LOADW -4(V3)] (-4/1)
Regs: I1(0) = [LOADW -4(V3)] V2(1001)
32: PUSH 2
<1> = const 2 (-8/1)
Regs: I1(0) = [LOADW -4(V3)] V2(1001)
33: MOD
move_to_reg(0: [LOADW -4(V3)])
Hit I1
<0> = reg I1 (-4/1)
--- STW I1, V2, -4
--- movl -4(EDI), ECX
<0> = stackw -4 (-4/1)
move_to_reg(1: const 2)
Kill I0
--- MOV I0, 2
--- movl EAX, #2
Cache I0 = const 2
<1> = reg I0 (-8/1)
--- STW I0, V2, -8
--- movl -8(EDI), EAX
<1> = stackw -8 (-8/1)
Killregs
Kill I0
--- SUB I0, V2, 8
--- movl EAX, EDI
--- sub EAX, #8
--- PREP 1
--- ARG I0
--- push EAX
--- CALL 0x804ba50
--- call ...
--- add ESP, #4
Regs: V2(1001)
34: STLW -8
+ LOCAL -8
<1> = [ADDR -8(V3)] (-8/1)
Regs: V2(1001)
+ STOREW
<1> = [LOADW -8(V3)] (-8/1)
move_to_reg(0: stackw -4)
Kill I0
--- LDW I0, V2, -4
--- movl EAX, -4(EDI)
<0> = reg I0 (-4/1)
--- STW I0, V3, -8
--- movl -8(EBP), EAX
Unalias([LOADW -8(V3)])
Cache I0 = [LOADW -8(V3)]
Regs: I0(0) = [LOADW -8(V3)] V2(1000)
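The Hit, Kill, Cache, Killregs and Unalias lines in this trace suggest a register cache that remembers what each register is known to hold. The following Python sketch is one guess at how such a cache might behave; the class RegCache and its methods are hypothetical, and the rule that a call forgets every cached value is an assumption read off the Regs lines before and after instruction 33, not something taken from the translator's source.

```python
class RegCache:
    """Hypothetical register cache: maps each register to a description of
    the value it is known to hold, or None if nothing is known."""

    def __init__(self, regs):
        self.contents = {reg: None for reg in regs}

    def find(self, value):
        # A "hit": some register already holds the wanted value.
        for reg, held in self.contents.items():
            if held == value:
                return reg
        return None

    def kill(self, reg):
        # The register is about to be overwritten; forget what it held.
        self.contents[reg] = None

    def kill_all(self):
        # Assumed here: nothing in the cache is trusted across a call.
        for reg in self.contents:
            self.contents[reg] = None

    def cache(self, reg, value):
        # Record that the register now holds the described value.
        self.contents[reg] = value

    def unalias(self, value):
        # The memory described by `value` has changed; drop stale copies.
        for reg, held in self.contents.items():
            if held == value:
                self.contents[reg] = None


# Replay the cache events of the trace, using the trace's own descriptions.
regs = RegCache(["I0", "I1"])
regs.cache("I1", "[LOADW -4(V3)]")         # state on entry to instruction 31
hit = regs.find("[LOADW -4(V3)]")          # I1 already holds i: no LDW needed
regs.kill("I0")
regs.cache("I0", "const 2")                # after MOV I0, 2
regs.kill_all()                            # around the CALL
after_call = regs.find("[LOADW -4(V3)]")   # the value of i is no longer cached
regs.unalias("[LOADW -8(V3)]")             # tmp is about to be overwritten
regs.cache("I0", "[LOADW -8(V3)]")         # I0 now holds the new value of tmp
print(hit, after_call, regs.contents)
```

In this model the hit on I1 is what saves the LDW for i at instruction 33, and the final Unalias followed by Cache leaves I0 recorded as holding tmp, matching the last Regs line of the trace.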