Week 7 notes

Comp 264-002, Spring 2019, MWF, 11:30-12:20, Cuneo 218

Readings (from BOH3)

Chapter 1 (though we haven't covered much of section 9 yet, on concurrency)

Section 2.1 (you can skip 2.1.7 for now, on bitwise operations, and 2.1.9 on shift operations)
Section 2.2 (don't sweat the B2Uw notation for now, though all it's doing is formally defining the conversion from strings of bits to integers)
Section 2.3 on integer arithmetic
Section 2.4 on floating-point arithmetic

Section 3.1
Section 3.2
Section 3.3
Section 3.4

Study guide

Machine code

x86-64 cheat sheet

%rax	Often used for function return value
%rbx
%rcx	arg4
%rdx	arg3
%rsi	"index" register, arg2
%rdi	arg1; see BOH p 245
%rbp	"base pointer", not always needed
%rsp	stack pointer; some special hardware implications
%r8	arg5; r8-r15 were added with x86-64
%r9	arg6
%r10
%r11
%r12
%r13
%r14
%r15
%rip	the instruction pointer; not directly available

Operand formats (cf BOH3 p 181)

Move instructions (and most others) can have at most one memory operand; memory-to-memory moves are disallowed. The source can be an immediate, a register or memory; the destination must be a register or memory.

immediate	movl $13, %eax	move decimal 13 into %eax; hex immediate operands use the $0xdeadbeef format
immediate	movl $0x25,(%rdi)	move 37 to address pointed to by %rdi
memory absolute	movl 0xdeadbeef,%eax	seldom used, except to probe certain fixed addresses
memory indirect	movl (%rbx), %eax	copy memory pointed to by %rbx to %eax (movl needs a 32-bit register; use movq with %rax)
base+displacement	movl 100(%rbx),%eax	copy memory 100 bytes past where %rbx points to %eax
indexed	movl (%rbx,%rdi),%eax	memory at address %rbx+%rdi
indexed with displacement	movl 16(%rbx,%rdi), %eax	memory at address %rbx+%rdi+16
scaled indirect	movl (,%rdi,4),%eax	memory at address 4*%rdi. Rare. Note comma.
scaled indexed	movl (%rbx,%rdi,4),%eax	memory at address %rbx + 4*%rdi. Common with arrays.
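scaled indirect with displacement	movl 16(,%rdi,4),%eax	memory at address 4*%rdi + 16
scaled indexed with displacement	movl 16(%rbx,%rdi,4),%eax	memory at address %rbx + 4*%rdi + 16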
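
All of the memory formats above are special cases of one address calculation: displacement + base + index*scale. The sketch below spells that out in C; the function names effective_address and get_element are made up for illustration, and the comment about which addressing mode a compiler picks is a typical outcome, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Address computed by the general operand form D(base, index, scale).
 * Leaving out a piece is the same as passing 0 for it (or 1 for scale). */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, intptr_t disp)
{
    return base + index * scale + (uintptr_t)disp;
}

/* Array indexing in C. With a in %rdi and i in %rsi, a compiler will
 * typically load this with the scaled indexed form (%rdi,%rsi,4),
 * because sizeof(int) is 4 on x86-64. */
int get_element(const int *a, size_t i)
{
    return a[i];
}
```

For example, effective_address(0x1000, 3, 4, 16) corresponds to the operand 16(%rbx,%rdi,4) with %rbx = 0x1000 and %rdi = 3.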

Monday:

Practice 3.5: decode(long *xp, long *yp, long *zp)

// xp is in %rdi, yp is in %rsi, zp is in %rdx (this is the standard allocation for arg1, arg2, arg3)

movq    (%rdi), %r8        // what is moved?
movq    (%rsi), %rcx
movq    (%rdx), %rax
movq    %r8, (%rsi)
movq    %rcx, (%rdx)
movq    %rax, (%rdi)

Why don't we move (%rdi) directly to (%rsi)? (two reasons)

To what extent can these instructions be reordered?
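
One C function consistent with the assembly above, as a sketch: the local variable names x, y and z are made up, and the void return type is an assumption (the assembly shows no meaningful return value). Each statement is annotated with the instruction it corresponds to.

```c
/* Rotates three values: *yp gets the old *xp, *zp gets the old *yp,
 * and *xp gets the old *zp. */
void decode(long *xp, long *yp, long *zp)
{
    long x = *xp;   /* movq (%rdi), %r8  */
    long y = *yp;   /* movq (%rsi), %rcx */
    long z = *zp;   /* movq (%rdx), %rax */
    *yp = x;        /* movq %r8, (%rsi)  */
    *zp = y;        /* movq %rcx, (%rdx) */
    *xp = z;        /* movq %rax, (%rdi) */
}
```

Note that all three loads finish before any of the stores begin, which matters if the pointers might alias.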

Wednesday:

adder2.c

mul37.c; multest.c

mul.c; muul.c; mulskel.c; mul.sh

3.6: Conditional execution (not on midterm)

Condition codes, set by add and sub (but not lea)

CF: unsigned overflow (carry)
OF: signed overflow
ZF: last result was zero
SF: last result was negative

The last two are used in comparison operations:

cmpb
cmpw
cmpl
cmpq

cmp S,D sets the ZF and SF condition codes as if it had calculated D-S (which is sort of backwards in AT&T notation). If ZF is set, then D==S. If SF is set, then D<S, provided the subtraction did not overflow; in general, D<S as signed values exactly when SF != OF. CF and OF are also set; CF is set if D<S via unsigned comparison.

There are also testb through testq, based on D & S instead of D-S.
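
To make the flag rules concrete, here is a small C model of what cmpq S,D does to the four condition codes. This is only a sketch for experimenting: struct flags and cmpq_flags are invented names, and the real flags live in a hardware register, not in a struct.

```c
#include <stdint.h>

struct flags { int zf, sf, cf, of; };

/* Model of cmpq S,D: compute D-S, throw the result away, keep the flags. */
struct flags cmpq_flags(int64_t s, int64_t d)
{
    uint64_t us = (uint64_t)s, ud = (uint64_t)d;
    uint64_t r = ud - us;               /* wraps modulo 2^64, like the hardware */
    struct flags f;
    f.zf = (r == 0);                    /* result was zero: D == S */
    f.sf = (int)(r >> 63);              /* top bit of the result */
    f.cf = (ud < us);                   /* borrow: D < S as unsigned values */
    /* Signed overflow: D and S have different signs, and the sign of
     * the result differs from the sign of D. */
    f.of = (int)(((ud ^ us) & (ud ^ r)) >> 63);
    return f;
}
```

Comparing D = INT64_MIN with S = 1 is a case where SF alone gives the wrong answer: the subtraction overflows, SF is 0, but OF is 1, so SF != OF still reports D<S.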

There is a series of set instructions for setting a one-byte register to 1 if some combination of the condition codes is true. These are rarely used.

By far the most common users of condition codes are the conditional-jump instructions:

jmp label
jmp *operand e.g. jmp *%rax or jmp *(%rbx)
je label
jne label jump if nonzero
js label jump if negative
jns label
jg label greater: D>S, signed comparison
jge label
jl label less: D<S, signed comparison
jle label
ja/jae label
jb/jbe label below: checks for CF. Unsigned comparison, in other words.

If we're writing code for unsigned comparison, we'll use ja/jb. Signed comparison will use js/jg/jl.
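
In C, the choice between the two families comes from the declared types, not from the bits. A minimal sketch (function names invented here; the instruction named in each comment is what a compiler typically picks, not a guarantee):

```c
/* Signed comparison: typically cmp followed by jl (or setl). */
int signed_less(long d, long s)
{
    return d < s;
}

/* Unsigned comparison: typically cmp followed by jb (or setb). */
int unsigned_below(unsigned long d, unsigned long s)
{
    return d < s;
}
```

The same bit pattern gives different answers: -1 is less than 1 as a signed value, but (unsigned long)-1 is the largest unsigned value, so it is not below 1.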

Friday: demo of that. Just finished absdiff_se and jl

Jump instructions are encoded using the relative offset to the destination. This value is added (via signed addition) to the program counter; hence the name PC-relative addressing. The offset can be 1, 2 or 4 bytes.
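
The target calculation can be sketched in C for the one-byte case. The name jump_target is made up for illustration. The point to notice is that the offset is added to the address of the instruction after the jump, since that is what the program counter holds when the jump executes.

```c
#include <stdint.h>

/* Destination of a jump with a one-byte signed offset.
 * next_insn_addr is the address of the instruction following the jump. */
uint64_t jump_target(uint64_t next_insn_addr, int8_t offset)
{
    /* Sign-extend the offset to 64 bits before adding. */
    return next_insn_addr + (uint64_t)(int64_t)offset;
}
```

A two-byte jump at 0x4004db with offset byte 0xf8 (that is, -8) has its next instruction at 0x4004dd, so it goes backward to 0x4004d5.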

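AMD recommends against having a ret instruction be the direct destination of a conditional jump (this is a branch-prediction issue, not a prohibition), so gcc often emits rep; ret in place of a plain ret. The rep prefix has no effect here.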

absdiff_se: does the goto version lead to the same code?

absdiff_cm

Why would conditional moves ever be an improvement?

cmove S,R move if equal (ZF)
cmovne
cmovs negative
cmovg/ge signed greater
cmovl/le
cmova unsigned greater
cmovb

branch prediction
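
The difference between the two strategies can be written out in C. This is a sketch with invented function names, not the absdiff_se and absdiff_cm files themselves. The first version picks one subtraction to run; the second runs both and then selects, which is the shape a compiler can turn into a cmov with no jump to mispredict.

```c
/* Conditional control transfer: only one of the subtractions runs. */
long absdiff_branch(long x, long y)
{
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* Conditional data transfer: both results are computed, then one is
 * selected. The final if is a candidate for a cmovge. */
long absdiff_select(long x, long y)
{
    long rval = y - x;
    long eval = x - y;
    if (x >= y)
        rval = eval;
    return rval;
}
```

The second form only makes sense when both computations are cheap and have no side effects.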

Conditional moves from memory, and segfaults

long cread(long *xp) {
    return (xp ? *xp : 0);
}

This must not use cmov. Why?

if-else

while loops:

dofact.c: there is only one label

while.c: Note that the condition has been moved to the bottom. Does this improve anything? This is an example of the jump-to-middle format.

whilefact.c

whilefact.Og.s: jump-to-middle

whilefact.O1.s: guarded-do example

forfact.c: jump-to-middle.
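
The two loop translations can be written in C using goto, which mirrors the shape of the generated assembly. These are sketches with invented names (they are not the course's whilefact.c or forfact.c), all computing a factorial with the same while loop.

```c
/* The ordinary while loop. */
long fact_while(long n)
{
    long result = 1;
    while (n > 1) {
        result *= n;
        n--;
    }
    return result;
}

/* Jump-to-middle: jump straight to the test at the bottom, so the
 * loop body contains a single conditional jump. */
long fact_jump_to_middle(long n)
{
    long result = 1;
    goto test;
loop:
    result *= n;
    n--;
test:
    if (n > 1)
        goto loop;
    return result;
}

/* Guarded-do: check the condition once up front, then run a do-while. */
long fact_guarded_do(long n)
{
    long result = 1;
    if (n <= 1)
        goto done;
loop:
    result *= n;
    n--;
    if (n > 1)
        goto loop;
done:
    return result;
}
```

All three return the same values; the difference is only in where the test sits and how many jumps run per iteration.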

sdir.c