Readings (from BOH3)
Section 2.1 (you can skip 2.1.7 on bitwise operations and 2.1.9 on shift operations for now)
Section 2.2 (don't sweat the B2Uw notation for now, though all it's doing is formally defining the conversion from strings of bits to integers)
Section 2.3 on integer arithmetic (we'll save floating point, in 2.4, for later)
Study guide
Linus Torvalds on ARM: https://www.realworldtech.com/forum/?threadid=183440&curpostid=183486
highbyte.c
Write a function endian(int x) that converts x from little-endian format to big-endian, and vice-versa. In other words, if the bytes of x are b0b1b2b3, then the function returns b3b2b1b0. If x = 0x0102a3b4, then the result of endian(x) is 0xb4a30201.
On a little-endian machine (such as x86-64), the function is the equivalent of ntohl(), declared in <arpa/inet.h>.
There are two general approaches:
1. Byte manipulation: cast &x to (unsigned char *). Now you have an array of four bytes, which you can easily reorder.
2. Numeric manipulation: use & and shifts to extract the bytes, and then reassemble them:
unsigned b0 = x & 0xff;    // unsigned, so that b0 << 24 cannot overflow into the sign bit
unsigned b1 = (x & (0xff << 8)) >> 8;
...
result = (b0 << 24) + (b1 << 16) + ...
Test your function with some examples.
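As a check on the two approaches above, here is a minimal sketch of each. It assumes a 32-bit value and uses uint32_t rather than int (a choice made here, not the signature given in the exercise) so that shifts involving the top byte are well defined. The names endian_bytes and endian_shifts are made up for illustration.

```c
#include <stdint.h>

/* Approach 1: view x as an array of four bytes and copy them in reverse order. */
uint32_t endian_bytes(uint32_t x) {
    uint32_t result;
    unsigned char *src = (unsigned char *)&x;
    unsigned char *dst = (unsigned char *)&result;
    for (int i = 0; i < 4; i++)
        dst[i] = src[3 - i];
    return result;
}

/* Approach 2: extract each byte with a shift and a mask, then reassemble. */
uint32_t endian_shifts(uint32_t x) {
    uint32_t b0 = x & 0xff;          /* lowest-order byte */
    uint32_t b1 = (x >> 8) & 0xff;
    uint32_t b2 = (x >> 16) & 0xff;
    uint32_t b3 = (x >> 24) & 0xff;  /* highest-order byte */
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}
```

Both should map 0x0102a3b4 to 0xb4a30201, and applying either one twice should give back the original value.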
C: see c.html
Table on p 60: revisit two's-complement
take the low 31 bits as a positive number. Subtract the high-order bit, appropriately shifted (2^31 for 32-bit numbers).
Alternatively, if the sign bit is set on x, then ~x is nonnegative, and x + ~x is all 1 bits: 2^32 - 1 as an unsigned value, which is -1 as a signed value. So x = -(~x) - 1.
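The two ways of reading a two's-complement bit pattern can be compared directly. This is a sketch for 32-bit patterns only; the function names tc_by_weight and tc_by_complement are hypothetical, and the arithmetic is done in int64_t so that nothing overflows.

```c
#include <stdint.h>

/* Method 1: low 31 bits as a positive number, minus the sign bit weighted by 2^31. */
int64_t tc_by_weight(uint32_t bits) {
    int64_t low = bits & 0x7fffffffu;
    int64_t high = (bits >> 31) & 1u;
    return low - high * ((int64_t)1 << 31);
}

/* Method 2: if the sign bit is set, use x = -(~x) - 1. */
int64_t tc_by_complement(uint32_t bits) {
    if (bits >> 31) {
        uint32_t flipped = ~bits;      /* sign bit is now clear, so this is nonnegative */
        return -(int64_t)flipped - 1;
    }
    return (int64_t)bits;              /* sign bit clear: value is just the bits */
}
```

The two functions should agree on every input; 0xffffffff is a good first test, since both should give -1.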
processor history: (intel_history.pdf slide from BOH2)
core i7: 0.78 billion transistors
core i7-Sandy Bridge: 1.17 billion transistors
core i7-Haswell: 1.4 billion transistors
x86-64 (sometimes called "x64")
Wednesday: look at mstore.c/mstore.s
The register file.
See also BOH3, p 180.
64-bit | 32-bit | 16-bit | 8-bit | notes |
%rax | %eax | %ax | %al | Often used for function return value |
%rbx | %ebx | %bx | %bl | |
%rcx | %ecx | %cx | %cl | arg4 |
%rdx | %edx | %dx | %dl | arg3 |
%rsi | %esi | %si | %sil | "index" register, arg2 |
%rdi | %edi | %di | %dil | arg1; see BOH p 245 |
%rbp | %ebp | %bp | %bpl | "base pointer", not always needed |
%rsp | %esp | %sp | %spl | stack pointer; some special hardware implications |
%r8 | %r8d | %r8w | %r8b | arg5; r8-r15 were added with x86-64 |
%r9 | %r9d | %r9w | %r9b | arg6 |
%r10 | %r10d | %r10w | %r10b | |
%r11 | %r11d | %r11w | %r11b | |
%r12 | %r12d | %r12w | %r12b | |
%r13 | %r13d | %r13w | %r13b | |
%r14 | %r14d | %r14w | %r14b | |
%r15 | %r15d | %r15w | %r15b | |
%rip | the instruction pointer; not directly available |
Each register has the 64-bit version shown, and also a 32-bit version formed (for the first eight) by changing r to e (eg %eax), a 16-bit version formed by dropping the r entirely (%ax), and an 8-bit version (%al, l for "low byte").
The low-byte forms of %rdi, %rsi, %rbp and %rsp are %dil, %sil, %bpl and %spl respectively.
For %r8 through %r15, the 32-bit form is, eg, %r8d, the 16-bit form is %r8w, and the byte is %r8b.
In the 8086 CPU, only ax through dx had high/low byte versions, hence the inconsistent naming. The ax through dx registers were named "accumulator, base, counter and data", though those names meant nothing. The si and di registers accessed memory in conjunction with segment registers, which was an awkward workaround for expanding memory beyond 64KB.
In the demo file signs.s, there is an instruction movsbl %dil, %eax. The movsbl means "move with sign extension from byte to long", and %eax is the 32-bit version of %rax. But %dil can be confusing; it is, as above, the 8-bit version of %rdi.
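To see where such an instruction might come from, here is a hypothetical C function (not the actual source of signs.s, which is not shown here). The single argument arrives in %rdi, only its low byte %dil is meaningful, and the int return value goes in %eax. With gcc on x86-64 one would expect the body to be essentially a movsbl %dil, %eax, though the exact output depends on the compiler and flags.

```c
/* Returning a signed char as an int requires sign extension from 8 to 32 bits. */
int byte_to_int(signed char c) {
    return c;
}
```

For example, a byte holding -1 (0xff) becomes the int -1 (0xffffffff), not 255.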
ATT v Intel notation
The register notation above is in ATT notation.
In ATT style, mov A,B moves data from A to B. In Intel, the direction is reversed (like A = B).
ATT uses an instruction suffix: q for quad-word, l for 32-bit long word, etc. Intel infers this from the operands.
gcc -Og -S mstore.c
register use
gcc -c mstore.s: creates mstore.o (alternatively, use "gcc -c mstore.c")
objdump -d mstore.o
mult2.c
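For reference, the BOH3 version of this example looks roughly like the following. This assumes the course files mstore.c and mult2.c match the book, which is not confirmed here; the two files are shown together in one block for convenience.

```c
/* mult2.c (assumed to match the BOH3 example) */
long mult2(long a, long b) {
    long s = a * b;
    return s;
}

/* mstore.c (assumed to match the BOH3 example) */
void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);   /* x arrives in %rdi, y in %rsi */
    *dest = t;              /* dest arrives in %rdx; t comes back in %rax */
}
```

When reading the gcc -Og -S output, the thing to watch is how dest is kept somewhere safe across the call to mult2.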
Operand formats (cf BOH3 p 181)
Move instructions (and most others) can have at most one memory operand; memory-to-memory moves are disallowed.
immediate | movl $13, %eax | move decimal 13 into %eax; hex immediate operands use the $0xdeadbeef format |
immediate src, indirect dest | movl $0x25,(%rdi) | move 37 to the address pointed to by %rdi |
memory absolute | movl 0xdeadbeef,%eax | seldom used, except to probe certain fixed addresses |
memory indirect | movq (%rbx),%rax | copy memory pointed to by %rbx to %rax |
base+displacement | movq 100(%rbx),%rax | copy memory 100 bytes past where %rbx points to %rax |
indexed | movq (%rbx,%rdi),%rax | memory at address %rbx+%rdi |
indexed with displacement | movq 16(%rbx,%rdi),%rax | memory at address %rbx+%rdi+16 |
scaled indirect | movl (,%rdi,4),%eax | memory at address 4*%rdi. Rare. Note comma. |
scaled indexed | movl (%rbx,%rdi,4),%eax | memory at address %rbx + 4*%rdi. Common with arrays. |
scaled indirect with displacement | | |
scaled indexed with displacement | | |
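All of the memory operand forms in the table are special cases of one address calculation: displacement + base + index*scale. The sketch below spells that out in C. The function effective_address is a made-up helper for illustration, not anything the hardware or assembler exposes. The second function shows the usual source of the scaled indexed form; with gcc on x86-64 one would expect something like movl (%rdi,%rsi,4),%eax for it, though the exact output depends on compiler and flags.

```c
#include <stdint.h>

/* General form D(base, index, scale): address = D + base + index*scale.
   Parts omitted from an operand are passed as 0 (or scale 1). */
uint64_t effective_address(int64_t disp, uint64_t base, uint64_t index, unsigned scale) {
    return (uint64_t)disp + base + index * scale;
}

/* Indexing an int array: the address of a[i] is a + 4*i. */
int get_elem(int *a, long i) {
    return a[i];
}
```

For example, with base 0x1000, index 3, scale 4 and displacement 16, the address is 0x1000 + 12 + 16 = 0x101c.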
Friday
Data move instructions, p 182:
movb/movw/movl/movq
How much data is being moved?
Registers as targets
Variations on moving a byte (or word) to word/long/quad, zero-extended: movzbw, movzbl, movzbq, movzwl, movzwq
But there is no movzlq! (Rather than worrying about why, make sure you understand the pattern). (In fact, the effect of movzlq can be achieved by movl: any instruction that writes a 32-bit register also sets the upper 32 bits of the corresponding 64-bit register to zero.)
movs does the same but with sign-extension. There is a movslq.
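In C, these extensions happen implicitly whenever a narrower integer is converted to a wider one; whether zero- or sign-extension is used depends on whether the source type is unsigned or signed. The sketch below uses made-up function names, and the instruction mentioned in each comment is only what one would typically expect from gcc on x86-64, not a guarantee.

```c
#include <stdint.h>

/* byte -> quad, zero-extended (the movzbq pattern) */
uint64_t zero_extend_byte(uint8_t b) { return b; }

/* long -> quad, zero-extended: no movzlq needed, a plain movl suffices */
uint64_t zero_extend_long(uint32_t v) { return v; }

/* long -> quad, sign-extended (the movslq pattern) */
int64_t sign_extend_long(int32_t v) { return v; }
```

The same 32 bits, 0xffffffff, give 0x00000000ffffffff through zero_extend_long but 0xffffffffffffffff through sign_extend_long.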
Practice Problem 3.2, p 185.
Determine, from looking at the operands, whether the moves below should be movb, movw, movl or movq:
mov_ %eax, (%rsp)
mov_ (%rax), %dx
mov_ $0xFF, %bl
mov_ (%rsp,%rdx,4), %dl
mov_ (%rdx), %rax
mov_ %dx, (%rax)
movb %dl,%al
movsbq, movzbq %dl,%rax
Practice 3.5: decode(long *xp, long *yp, long *zp)
// xp is in %rdi, yp is in %rsi, zp is in %rdx (this is the standard allocation for arg1, arg2, arg3)
movq (%rdi), %r8 // what is moved?
movq (%rsi), %rcx
movq (%rdx), %rax
movq %r8, (%rsi)
movq %rcx, (%rdx)
movq %rax, (%rdi)
Why don't we move (%rdi) directly to (%rsi)? (two reasons)
To what extent can these instructions be reordered?
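One way to check your reading of the assembly is to write the C it corresponds to. The following is a sketch of one possible C source for decode; the local variable names x, y and z are made up, and each comment shows the instruction the line is meant to correspond to.

```c
void decode(long *xp, long *yp, long *zp) {
    long x = *xp;   /* movq (%rdi), %r8  */
    long y = *yp;   /* movq (%rsi), %rcx */
    long z = *zp;   /* movq (%rdx), %rax */
    *yp = x;        /* movq %r8, (%rsi)  */
    *zp = y;        /* movq %rcx, (%rdx) */
    *xp = z;        /* movq %rax, (%rdi) */
}
```

All three loads happen before any of the stores, which matters if two of the pointers refer to the same location.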