Week 6 notes

Comp 264-002, Spring 2019, MWF, 11:30-12:20, Cuneo 218

Readings (from BOH3)

Chapter 1 (though we haven't covered much of section 9 yet, on concurrency)

Section 2.1 (you can skip 2.1.7 for now, on bitwise operations, and 2.1.9 on shift operations)
Section 2.2 (don't sweat the B2U_w notation for now; all it does is formally define the conversion from bit strings to unsigned integers)
Section 2.3 on integer arithmetic (we'll save floating point, in 2.4, for later)

Study guide

Linus Torvalds on ARM: https://www.realworldtech.com/forum/?threadid=183440&curpostid=183486


Programming assignment 2

Write a function endian(int x) that converts x from little-endian format to big-endian, and vice-versa. In other words, if the bytes of x are b0b1b2b3, then the function returns b3b2b1b0. If x = 0x0102a3b4, then the result of endian(x) is 0xb4a30201.

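On a little-endian machine such as x86-64, the function is the equivalent of ntohl(), declared in <arpa/inet.h>. (On a big-endian machine ntohl() does nothing, because network byte order is already big-endian.)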

There are two general approaches:

1. Byte manipulation: cast &x to (unsigned char *). Now you have an array of four bytes, which you can easily reorder.

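2. Numeric manipulation: use & and shifts to extract the bytes, and then reassemble them. Working on an unsigned copy of x avoids sign-extension on right shifts and overflow on left shifts:
    unsigned ux = x;                  // unsigned copy: every shift is well-defined
    unsigned b0 = ux & 0xff;          // lowest-order byte
    unsigned b1 = (ux >> 8) & 0xff;   // next byte up
    result = (b0 << 24) | (b1 << 16) | ...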

Test your function with some examples.
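A minimal sketch of both approaches, assuming a 32-bit int as on x86-64. The names endian_bytes and endian_shift are hypothetical (the assignment asks for a single endian(int x)). The shift version works on an unsigned copy of x so that right shifts don't drag in copies of the sign bit and left shifts can't overflow a signed int.

```c
/* Approach 1: view the int as an array of bytes and reverse them.
   Assumes sizeof(int) == 4. */
int endian_bytes(int x) {
    int result;
    unsigned char *src = (unsigned char *) &x;
    unsigned char *dst = (unsigned char *) &result;
    for (int i = 0; i < 4; i++)
        dst[i] = src[3 - i];          /* byte i of result = byte 3-i of x */
    return result;
}

/* Approach 2: extract each byte with shifts and masks, then reassemble.
   Working on an unsigned copy keeps every shift well-defined. */
int endian_shift(int x) {
    unsigned ux = (unsigned) x;
    unsigned b0 = ux & 0xff;          /* lowest-order byte  */
    unsigned b1 = (ux >> 8) & 0xff;
    unsigned b2 = (ux >> 16) & 0xff;
    unsigned b3 = (ux >> 24) & 0xff;  /* highest-order byte */
    return (int) ((b0 << 24) | (b1 << 16) | (b2 << 8) | b3);
}
```

For example, printf("%x\n", (unsigned) endian_shift(0x0102a3b4)) prints b4a30201. The byte version gives the same answer on a big-endian machine too, since reversing the bytes in memory reverses them in the value either way. Converting an unsigned value above INT_MAX back to int is implementation-defined in C; gcc keeps the bit pattern.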

C: see c.html

Table on p 60: revisit two's-complement

    Take the low 31 bits as a nonnegative number, then subtract the high-order (sign) bit weighted by 2^31 (for 32-bit numbers).

    Alternatively, if the sign bit is set on x, then ~x is nonnegative, and x + ~x is all 1 bits: 2^32 - 1 read as unsigned, which is -1 in two's complement. So x = -(~x) - 1.
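Both methods can be checked in C. This sketch (the function names are hypothetical) takes the 32 bits as a uint32_t and computes the signed value in 64-bit arithmetic, so that neither 2^31 nor -(~x) can overflow.

```c
#include <stdint.h>

/* Method 1: low 31 bits as a nonnegative value, minus the sign bit
   weighted by 2^31. */
int64_t tc_weighted(uint32_t bits) {
    int64_t low  = bits & 0x7fffffffu;     /* bits 0..30 */
    int64_t sign = (bits >> 31) & 1;       /* bit 31     */
    return low - sign * ((int64_t) 1 << 31);
}

/* Method 2: if the sign bit is set, x = -(~x) - 1. */
int64_t tc_complement(uint32_t bits) {
    if ((bits >> 31) == 0)
        return bits;                        /* sign bit clear: value is the bits */
    uint32_t inv = ~bits;                   /* nonnegative once the sign bit is set */
    return -(int64_t) inv - 1;
}
```

For the bit pattern 0x80000000 both give -2147483648, and for 0xffffffff both give -1.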

Machine code

Processor history (intel_history.pdf slide from BOH2):
    Core i7                  0.78 billion transistors
    Core i7 Sandy Bridge     1.17 billion transistors
    Core i7 Haswell          1.4 billion transistors

x86-64 (sometimes called "x64")

Wednesday: look at mstore.c/mstore.s

x86-64 cheat sheet

The register file.

See also BOH3, p 180.

64-bit  32-bit  16-bit  8-bit   notes
%rax    %eax    %ax     %al     holds the function return value
%rbx    %ebx    %bx     %bl
%rcx    %ecx    %cx     %cl     arg4
%rdx    %edx    %dx     %dl     arg3
%rsi    %esi    %si     %sil    "source index" register; arg2
%rdi    %edi    %di     %dil    "destination index" register; arg1 (see BOH p 245)
%rbp    %ebp    %bp     %bpl    "base pointer"; not always needed
%rsp    %esp    %sp     %spl    stack pointer; some special hardware implications
%r8     %r8d    %r8w    %r8b    arg5; %r8-%r15 were added with x86-64
%r9     %r9d    %r9w    %r9b    arg6
%r10    %r10d   %r10w   %r10b
%r11    %r11d   %r11w   %r11b
%r12    %r12d   %r12w   %r12b
%r13    %r13d   %r13w   %r13b
%r14    %r14d   %r14w   %r14b
%r15    %r15d   %r15w   %r15b

%rip, the instruction pointer, is not directly available as a general-purpose register.

Each register has the 64-bit version shown, plus a 32-bit version formed (for the first eight) by changing r to e (e.g., %eax), a 16-bit version formed by dropping the r entirely (%ax), and an 8-bit version (%al, l for "low byte").

The low-byte forms of %rdi, %rsi, %rbp and %rsp are %dil, %sil, %bpl and %spl respectively.

For %r8 through %r15, the 32-bit form is, eg, %r8d, the 16-bit form is %r8w, and the byte is %r8b.
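One way to picture how the names overlap is to model %rax as a uint64_t and each narrower name as a view of its low-order bits. This is a C model, not what the hardware literally does, and the function names are hypothetical. The write rules it encodes are the x86-64 ones: writing a 32-bit register zeroes the upper 32 bits (the same rule that makes movzlq unnecessary), while writing an 8- or 16-bit register leaves the other bits alone.

```c
#include <stdint.h>

/* Reading a narrower name just looks at the low-order bits. */
uint8_t  read_al(uint64_t rax)  { return (uint8_t)  rax; }   /* %al  */
uint16_t read_ax(uint64_t rax)  { return (uint16_t) rax; }   /* %ax  */
uint32_t read_eax(uint64_t rax) { return (uint32_t) rax; }   /* %eax */

/* Writing %eax replaces the whole 64-bit register: upper 32 bits become 0. */
uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void) rax;                  /* old contents are discarded entirely */
    return v;
}

/* Writing %al or %ax changes only those bits; the rest are preserved. */
uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t) 0xff) | v;
}
uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~(uint64_t) 0xffff) | v;
}
```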

In the 8086 CPU, only ax through dx had high/low byte versions (ah/al, bh/bl, and so on), hence the inconsistent naming. The ax through dx registers were named "accumulator, base, counter and data", though in x86-64 those names mean little; all four are general-purpose. The si and di ("source index" and "destination index") registers accessed memory in conjunction with segment registers, which were an awkward workaround for expanding memory beyond 64KB.

In the demo file signs.s, there is an instruction movsbl %dil, %eax. The mnemonic movsbl means "move with sign-extension, byte to long" (long = 32 bits), and %eax is the 32-bit version of %rax. %dil can be confusing: as above, it is the 8-bit version of %rdi.

AT&T vs Intel notation
    The register notation above is in AT&T notation.
    In AT&T style, mov A,B moves data from A to B. In Intel style, the direction is reversed (like A = B).
    AT&T uses an instruction suffix: q for quad word (64-bit), l for long word (32-bit), etc. Intel infers the size from the operands (or from an explicit size such as DWORD PTR on a memory operand).

mstore.c, mstore.s

    gcc -Og -S mstore.c

    note the register use: which argument arrives in which register

    gcc -c mstore.s:     creates mstore.o (alternatively, use "gcc -c mstore.c")

    objdump -d mstore.o
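For reference, mstore.c is essentially the multstore example from BOH3 chapter 3. In the book, mult2 is defined in a separate file; the mult2 below is a hypothetical stand-in added only so this file links on its own (with it present, the .s output will also contain mult2's code).

```c
/* Hypothetical stand-in; in the book mult2 lives in another file. */
long mult2(long a, long b) {
    return a * b;
}

void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);   /* x arrives in %rdi, y in %rsi */
    *dest = t;              /* dest arrives in %rdx */
}
```

After gcc -Og -S, expect dest to be copied into a callee-saved register (such as %rbx) before the call, because the call to mult2 is allowed to overwrite %rdx.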


Operand formats (cf BOH3 p 181)

Move instructions (and most others) cannot have both operands in memory; memory-to-memory moves are disallowed. (An immediate can be moved directly to memory.)

Form                               Example                      Meaning
immediate                          movl $13, %eax               move decimal 13 into %eax
                                                                (hex immediates are written $0xdeadbeef)
immediate src, indirect dest       movl $0x25, (%rdi)           move 37 to the address in %rdi
memory absolute                    movl 0xdeadbeef, %eax        load from address 0xdeadbeef; seldom used,
                                                                except to probe certain fixed addresses
memory indirect                    movq (%rbx), %rax            copy memory pointed to by %rbx into %rax
base + displacement                movq 100(%rbx), %rax         copy memory 100 bytes past where %rbx points into %rax
indexed                            movq (%rbx,%rdi), %rax       memory at address %rbx + %rdi
indexed with displacement          movq 16(%rbx,%rdi), %rax     memory at address %rbx + %rdi + 16
scaled indirect                    movq (,%rdi,4), %rax         memory at address 4*%rdi. Rare. Note the comma.
scaled indexed                     movq (%rbx,%rdi,4), %rax     memory at address %rbx + 4*%rdi. Common with arrays.
scaled indirect with displacement  movq 8(,%rdi,4), %rax        memory at address 4*%rdi + 8
scaled indexed with displacement   movq 8(%rbx,%rdi,4), %rax    memory at address %rbx + 4*%rdi + 8

The scale factor must be 1, 2, 4, or 8.
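The general form in BOH3 is Imm(rb,ri,s), referring to memory at Imm + R[rb] + R[ri]*s; every row of the table is this form with some parts left out. A sketch of that arithmetic in C (effective_address is a hypothetical helper), plus the kind of C code that typically produces the scaled indexed form:

```c
#include <stdint.h>

/* Effective address for Imm(rb,ri,s): Imm + R[rb] + R[ri]*s.
   Missing parts are passed as 0 (or s = 1). Arithmetic wraps mod 2^64,
   as address arithmetic does on the machine. */
uint64_t effective_address(int64_t imm, uint64_t rb, uint64_t ri, int s) {
    return (uint64_t) imm + rb + ri * (uint64_t) s;
}

/* With a in %rdi and i in %rsi, gcc typically compiles this to the
   scaled indexed form, e.g. movl (%rdi,%rsi,4), %eax (4 = sizeof(int)). */
int get_elem(const int *a, long i) {
    return a[i];
}
```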


Data move instructions, p 182:


How much data is being moved?

Registers as targets

Variations on moving a byte (or word) to word/long/quad, zero-extended:

But there is no movzlq! (Rather than worrying about why, make sure you understand the pattern.) In fact, the effect of movzlq can be achieved by movl, because an instruction that writes a 32-bit register also sets the upper 32 bits of the 64-bit register to zero.

movs does the same but with sign-extension. There is a movslq.
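The same conversions show up whenever C widens a narrower integer type. In this sketch (function names hypothetical), the comments name the instruction each conversion calls for; which one gcc actually emits can vary, e.g. it may use movzbl for a byte-to-quad zero-extension and rely on the upper-32-bit zeroing.

```c
#include <stdint.h>

/* Zero-extension: the new high-order bits are 0 (movzbq / movzbl). */
int64_t zext_byte(uint8_t b)  { return b; }

/* Sign-extension: the new high-order bits copy the sign bit (movsbq). */
int64_t sext_byte(int8_t b)   { return b; }

/* 32-to-64 zero-extension: a plain movl suffices, since writing a
   32-bit register clears the upper 32 bits (hence no movzlq). */
uint64_t zext_int(uint32_t u) { return u; }

/* 32-to-64 sign-extension needs its own instruction (movslq). */
int64_t sext_int(int32_t i)   { return i; }
```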

Practice Problem 3.2, p 185.

Determine, from looking at the operands, whether the moves below should be movb, movw, movl or movq:

mov_    %eax, (%rsp)

mov_    (%rax), %dx

mov_    $0xFF, %bl

mov_    (%rsp,%rdx,4), %dl

mov_    (%rdx), %rax

mov_    %dx, (%rax)

movb %dl, %al
movsbq or movzbq %dl, %rax

exchange.c, exchange.s
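exchange.c is along the lines of the exchange example in BOH3 chapter 3. The comments give the assembly one would expect from gcc -Og, with xp in %rdi, y in %rsi and the return value in %rax; the exact output may differ with compiler version.

```c
/* Return the old value at *xp and store y in its place. */
long exchange(long *xp, long y) {
    long x = *xp;   /* movq (%rdi), %rax : load old value into return register */
    *xp = y;        /* movq %rsi, (%rdi) : store y through xp */
    return x;       /* ret               : result is already in %rax */
}
```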

Practice Problem 3.5: void decode(long *xp, long *yp, long *zp)

// xp is in %rdi, yp is in %rsi, zp is in %rdx (this is the standard allocation for arg1, arg2, arg3)

movq    (%rdi), %r8        // what is moved?
movq    (%rsi), %rcx
movq    (%rdx), %rax
movq    %r8, (%rsi)
movq    %rcx, (%rdx)
movq    %rax, (%rdi)
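One C function consistent with this assembly, offered as a reconstruction rather than the book's own solution: all three loads come before any store, so each value is read before its location is overwritten.

```c
void decode(long *xp, long *yp, long *zp) {
    long x = *xp;   /* movq (%rdi), %r8  */
    long y = *yp;   /* movq (%rsi), %rcx */
    long z = *zp;   /* movq (%rdx), %rax */
    *yp = x;        /* movq %r8, (%rsi)  */
    *zp = y;        /* movq %rcx, (%rdx) */
    *xp = z;        /* movq %rax, (%rdi) */
}
```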

Why don't we move (%rdi) directly to (%rsi)? (two reasons)

To what extent can these instructions be reordered?