Easy addressing on the AMD64

From Spivey's Corner

Consider this C program:

int a[10];
int g(int x) { return a[x]; }

If you've written in 32-bit assembly language for the x86, you can probably work out how the body of g could be compiled. (Let's agree to turn off PIC, even if it's the default on the OS.) According to the calling convention, the argument x arrives on the stack, so we need a load instruction, written movl, to fetch it into a register, and another load with scaling to get the element of a. Unlike a RISC machine, the x86 allows the whole address of a to appear as part of the addressing mode. In the AT&T syntax:

g:
   movl 4(%esp), %eax        8b 44 24 04
   movl a(,%eax,4), %eax     8b 04 85 00 00 00 00
   ret                       c3
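
In C terms, the operand a(,%eax,4) names the address of a plus four times the index. Here is a minimal sketch of that calculation, assuming a 4-byte int as on the x86; the helper elem_addr is made up for illustration and is not something the compiler generates.

```c
#include <assert.h>
#include <stdint.h>

/* The sketch assumes 4-byte ints, matching the scale factor 4 in the listing. */
static_assert(sizeof(int) == 4, "scale factor of 4 assumes 4-byte int");

int a[10];

/* Address denoted by a(,%eax,4): the address of a plus the index times 4.
   (Hypothetical helper, for illustration only.) */
uintptr_t elem_addr(int x) {
    return (uintptr_t)a + (uintptr_t)x * 4u;
}

/* The function from the article, unchanged. */
int g(int x) { return a[x]; }
```

For any valid index i, elem_addr(i) should equal (uintptr_t)&a[i].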

Compiling with gcc -S -O2 -fno-pic elides the frame pointer for this leaf routine. For entertainment, we can examine the machine code shown at the right. The first movl instruction has

  • an opcode of 8b for load or move-to-register,
  • a ModR/M byte of 44 = 01|000|100 = (1, EAX, 4) that means the result goes in the EAX register, there is an 8-bit displacement, and there is an SIB byte.
  • an SIB byte of 24 = 00|100|100 = (0, 4, ESP) that means the addressing mode is ESP+d8.
  • a displacement d8 of 4.

It seems a shame that eliding the frame pointer makes us use SP-relative addressing, which is awkwardly encoded.

The other movl instruction makes better use of the encoding. It has:

  • an opcode of 8b again.
  • a ModR/M byte of 04 = 00|000|100 = (0, EAX, 4) that puts the result in the EAX register, specifies no displacement of its own, and expects an SIB byte.
  • an SIB byte of 85 = 10|000|101 = (2, EAX, 5) that means the addressing mode is EAX<<2+d32: when the mod field is 0, a base field of 5 means "no base register, 32-bit displacement follows".
  • a displacement that the linker will fill in with the address of a.
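
The 2|3|3 split used for the ModR/M and SIB bytes above can be sketched in C. This is only an illustration of the bit-field layout, not a disassembler; the names split233 and check_examples are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of a ModR/M or SIB byte, split 2|3|3 from the top bit down. */
struct fields {
    unsigned hi;   /* bits 7-6: mod (ModR/M) or scale (SIB) */
    unsigned mid;  /* bits 5-3: reg (ModR/M) or index (SIB) */
    unsigned lo;   /* bits 2-0: r/m (ModR/M) or base (SIB)  */
};

struct fields split233(uint8_t b) {
    struct fields f = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return f;
}

/* Check the four bytes decoded by hand in the text. */
void check_examples(void) {
    struct fields f;
    f = split233(0x44); assert(f.hi == 1 && f.mid == 0 && f.lo == 4);
    f = split233(0x24); assert(f.hi == 0 && f.mid == 4 && f.lo == 4);
    f = split233(0x04); assert(f.hi == 0 && f.mid == 0 && f.lo == 4);
    f = split233(0x85); assert(f.hi == 2 && f.mid == 0 && f.lo == 5);
}
```

Calling check_examples() should pass silently, agreeing with the triples (1, EAX, 4), (0, 4, ESP), (0, EAX, 4) and (2, EAX, 5), where EAX is register number 0 and ESP is register number 4.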

So far, so good.

Now, what happens if we compile for the AMD64 = x86_64 = whatever? Here is the code:

g:
    movslq %edi, %rdi
    movl a(,%rdi,4), %eax
    ret
  0:	48 63 ff             	movslq %edi,%rdi
  3:	8b 04 bd 00 00 00 00 	mov    0x0(,%rdi,4),%eax
  a:	c3                   	retq   
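
The new instruction here is movslq, which widens the 32-bit argument in %edi to 64 bits with sign extension so that it can serve as an index register in a 64-bit address. A rough C picture of the difference between sign and zero extension, with function names invented for the example:

```c
#include <assert.h>
#include <stdint.h>

/* What movslq %edi, %rdi computes: the sign bit is copied into the upper 32 bits. */
int64_t sign_extend32(int32_t x) {
    return (int64_t)x;
}

/* For contrast: zero extension fills the upper 32 bits with zeros. */
uint64_t zero_extend32(int32_t x) {
    return (uint64_t)(uint32_t)x;
}

/* The two agree for non-negative values and differ for negative ones. */
void check_extension(void) {
    assert(sign_extend32(5) == 5 && zero_extend32(5) == 5);
    assert(sign_extend32(-1) == -1);
    assert(zero_extend32(-1) == UINT64_C(0xffffffff));
}
```

Sign extension is the one that matches C's treatment of a signed int index.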

Compiled again without -fno-pic:

g:
.LFB0:
    .cfi_startproc
    leaq a(%rip), %rax
    movslq %edi, %rdi
    movl (%rax,%rdi,4), %eax
    ret
    .cfi_endproc


  0:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax        # 7 <g+0x7>
  7:	48 63 ff             	movslq %edi,%rdi
  a:	8b 04 b8             	mov    (%rax,%rdi,4),%eax
  d:	c3                   	retq
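
In the lea above, the ModR/M byte 05 = 00|000|101 selects RIP-relative addressing in 64-bit mode: the operand address is the address of the next instruction plus a signed 32-bit displacement. A small sketch of that sum, with the function names made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Target of a RIP-relative operand: the address of the following instruction
   (instruction address + length) plus the sign-extended 32-bit displacement. */
uint64_t rip_target(uint64_t insn_addr, unsigned insn_len, int32_t disp32) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}

/* The lea sits at offset 0 and is 7 bytes long; with the displacement still
   zero in the object file, the target is offset 7, which is what objdump
   prints as "7 <g+0x7>". */
void check_rip_target(void) {
    assert(rip_target(0, 7, 0) == 7);
}
```

The zero displacement is a placeholder, to be filled in once the final position of a relative to the code is known.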