Week 9 notes

Comp 264-002, Spring 2019, MWF, 11:30-12:20, Cuneo 218

Readings (from BOH3)

Chapter 3:

Section 3.1
Section 3.2
Section 3.3
Section 3.4
Section 3.5

dir assignment

Due Friday March 29 (changed!)

I have given you a file sdir.c that

You are to make two additions:

  1. Include the file size in the output, and also whether the file is a subdirectory (versus a "regular" file). I find the file size for you, using the stat() function, and also set the variable isdirectory. The tricky part is that you have to keep isdirectory and the file size together with the filename when doing the sorting, using a struct.
  2. Make the whole thing recursive: if a file is a subdirectory, print the contents of that subdirectory. Thus, if the current directory has a subdirectory foo, the output might look like this:

apple        7
banana      18
foo: directory
    .: directory
    ..: directory
    foo1.c    381
    foo2.c    476
sdir     18902
sdir.c    2088

I recommend getting stage 1 to work first, before starting on stage 2. You should also make the file sizes line up reasonably. To print a string padded on the right to make up 15 spaces, use printf("%-15s", str). To print a number in a fixed field of width 10, with spaces at the left as needed, use printf("%10d", num) (or "%10ld" if num is a long, as the file size will be).
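The two format strings above can be combined into one call. Here is a minimal sketch; the helper name format_entry() is made up for illustration and is not part of sdir.c. It uses snprintf() so the result lands in a buffer instead of on the screen, and %10ld on the assumption that the size is a long.

```c
#include <stdio.h>

/* Hypothetical helper (not in sdir.c): format one listing line into buf.
   "%-15s" left-justifies the name in a 15-character field;
   "%10ld" right-justifies the size in a 10-character field.
   Returns the number of characters the full line needs. */
int format_entry(char *buf, size_t buflen, const char *name, long size) {
    return snprintf(buf, buflen, "%-15s%10ld", name, size);
}
```

Every line produced this way is 25 characters wide as long as the name is at most 15 characters, which is what makes the sizes line up. Longer names will push the size column to the right.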

To sort the triples of (filename, size, isdirectory), you will need to create a struct. The following will work:

struct fileinfo {
        long filesize;
        int  isdirectory;
        char * filename;
};

You will then have to modify cmpstringp to compare two pointers to struct fileinfo objects, basing the comparison on the filename field. You will also have to build fileinfo objects for each file returned by readdir().

The array files will now become an array of struct fileinfo, and the call to qsort will look something like this (with the new function named cmpfileinfop):

    qsort(files, filecount, sizeof(struct fileinfo), cmpfileinfop);
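Here is a minimal sketch of how the struct, the comparison function and the qsort() call fit together. The wrapper sort_files() is a name I made up for the example; in sdir.c the qsort() call would sit inside printdir().

```c
#include <stdlib.h>
#include <string.h>

struct fileinfo {
    long  filesize;
    int   isdirectory;
    char *filename;
};

/* qsort() hands the comparison function pointers to two array elements.
   Each element here is a struct fileinfo, so we convert the void pointers
   and compare the filename fields. */
static int cmpfileinfop(const void *p1, const void *p2) {
    const struct fileinfo *f1 = p1;
    const struct fileinfo *f2 = p2;
    return strcmp(f1->filename, f2->filename);
}

/* Hypothetical wrapper (not in sdir.c): sort filecount entries by name. */
void sort_files(struct fileinfo *files, size_t filecount) {
    qsort(files, filecount, sizeof(struct fileinfo), cmpfileinfop);
}
```

Because qsort() moves whole structs, the size and the directory flag travel with the filename automatically.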

Once this is working, make printdir() recursive. That is, if, during the printing-out-the-files stage, you come upon a filename fname that is a directory, print fname and ": directory" and then call printdir() with the directory name dname + "/" + fname:

if (files[i].isdirectory) {
    printf(": directory\n");
    char * new_dname = dname + "/" + files[i].filename;    // won't compile as written; use strcat() instead!
    printdir(new_dname, indent+4);
}

That subdirectory will have its contents printed; when the recursive call returns, you resume with printing the original directory (eg with files[i+1]). However, you can't concatenate strings with "+" in C; you will have to use strcpy() followed by two calls to strcat() (see below). You should also increase the indent value, say to indent+4, or indent+8. I wrote a small function cat3() that takes the three strings dname, "/" and the filename, and concatenates them.

Remember: any time you copy a string for long-term storage, you will have to allocate space for the result with malloc(). (You really should then call free() when you're done with that string, but we'll worry about that later.)
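Here is one way a cat3()-style function could look when the result has to outlive the current call. My own cat3() is not reproduced in these notes, so treat the body below as an assumption, not as the reference version.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a cat3()-style helper (assumed implementation).
   Returns a newly malloc'd string holding a followed by b followed by c,
   or NULL if malloc() fails. The caller should eventually free() it. */
char *cat3(const char *a, const char *b, const char *c) {
    size_t len = strlen(a) + strlen(b) + strlen(c) + 1;   /* +1 for '\0' */
    char *result = malloc(len);
    if (result == NULL) return NULL;
    strcpy(result, a);      /* first string starts the buffer */
    strcat(result, b);      /* then append the separator      */
    strcat(result, c);      /* then append the filename       */
    return result;
}
```

A call such as cat3(dname, "/", fname) then gives you a string that stays valid until you free() it.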

The recursive case must ignore directories "." and "..". The simplest strategy is to ignore all directories with names beginning with '.'.
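The test for whether to recurse can be sketched as a one-line function. The name should_recurse() is made up for the example; you can just as well write the condition inline.

```c
struct fileinfo {
    long  filesize;
    int   isdirectory;
    char *filename;
};

/* Hypothetical helper: recurse only into directories whose names do not
   begin with '.', which rules out "." and ".." (and hidden directories). */
int should_recurse(const struct fileinfo *f) {
    return f->isdirectory && f->filename[0] != '.';
}
```

Skipping "." and ".." matters: recursing into either one would never terminate.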

Windows users: The sdir.c program uses opendir()/readdir(). These are not available directly under Windows (they work on Macs and on Linux). However, I have created a header file wdirent.h that makes these available. It has worked for me; it is possible something will need to be fixed though. To compile on Windows, delete the #include <dirent.h> and #include <sys/stat.h>, and replace them with #include "wdirent.h" (the quotation marks instead of <> mean that the file wdirent.h is not in a system location; it is in the same directory as sdir.c). Windows users will, in the recursive part, also have to form the subdirectory name as dname + "\" + fname; that is, with a backslash instead of a slash. But the backslash is the C string escape character, so it will really have to be dname + "\\" + fname.

The recursion does not take a lot of additional work. Don't overthink it.


To check the file size with stat(), you must prepend the current directory name, plus '/', to the filename, at least when the current directory name is not ".". The filename "foo.c" is the same as "./foo.c", but the filename "foo.c" in subdirectory "foodir" must be referred to as "foodir/foo.c". I've updated sdir.c to handle this.

I've also added a check to the stat() call to see if it generates an error. This necessitated a change to wdirent.h, because my earlier version returned void. I also fixed another issue with stat() in wdirent.h, so please switch to the new version.
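Here is a sketch of the stat() call with the directory name prepended and the error check in place. It is not the code from sdir.c: the helper name get_fileinfo(), the 1024-byte buffer and the hard-coded "/" are all assumptions, and it is written for Linux/Mac (on Windows you would go through wdirent.h).

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX declarations */
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: look up dname/fname and report its size and
   whether it is a directory. Returns 0 on success, -1 if the path
   does not fit in the buffer or stat() reports an error. */
int get_fileinfo(const char *dname, const char *fname,
                 long *filesize, int *isdirectory) {
    char path[1024];
    struct stat sb;
    int n = snprintf(path, sizeof path, "%s/%s", dname, fname);
    if (n < 0 || (size_t) n >= sizeof path) return -1;   /* path too long */
    if (stat(path, &sb) != 0) return -1;                 /* stat() failed */
    *filesize = (long) sb.st_size;
    *isdirectory = S_ISDIR(sb.st_mode) ? 1 : 0;
    return 0;
}
```

Note that prepending "./" when dname is "." is harmless, so the same code works at the top level and in subdirectories.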

How to concatenate: for a one-time use, as in the stat() case immediately above and in the recursive call to printdir() further above, the easiest way to concatenate dname, SEPARATOR and filename is to create a temporary buffer on the stack (I've defined SEPARATOR in sdir.c to be "/"; change it to "\\" on Windows):

char buf[500];


strcpy(buf, dname);
strcat(buf, SEPARATOR);
strcat(buf, filename);

You still need malloc() if you won't be done with the buffer by the time you need to save the next string.

Update 2

Here is the sequence of steps:

1. Add the struct fileinfo type.

2. Change the array files to be an array of struct fileinfo

3. Update the files[filecount] references so that files[filecount].filename gets the string filename, and files[filecount].isdirectory and files[filecount].filesize are set appropriately. Also update the printout of the entries in files[] that follows qsort.

4. Update the call to qsort; the size of a component is now sizeof(struct fileinfo).

5. Update the comparison function. Here's the version I gave in class:

int cmpfileinfo(const void *p1, const void *p2) {
           /* The actual arguments to this function are pointers to
              struct fileinfo objects (elements of the files array),
              so cast them before comparing the filename fields */

    const struct fileinfo * sp1 = (const struct fileinfo *) p1;
    const struct fileinfo * sp2 = (const struct fileinfo *) p2;
    return strcmp (sp1->filename, sp2->filename);
}

That should get everything working except the recursive directory. That goes at the point you're printing the entries in files[] after sorting. To check if a directory name is safe, check that filename[0] != '.'. But if you're submitting late, don't worry about this part.

x86 code in the news: hated and hunted (Fabian Wosar)

cmov example

Finish on Wednesday

cmove does not involve branch prediction. There is still a data dependency, but there always is. If we execute

    addq %rdi,%rdx
    addq %rdx,%rdx

then the CPU must keep track of the fact that the value that is supposed to be in %rdx after the first instruction will not be there until the add finishes and the register-writeback finishes. That is, the value in %rdx in the second instruction depends on the %rdi from the first instruction. Similarly, in the instructions

    cmpl    %edi, (%rsi,%rcx,4)     # %edi = cutoff (arg 1), %rsi = A
    cmovge  %r9,%r8                 # if condition, put 1 into %r8

the value that ends up in %r8 depends on the previous value in %r8, the value in %r9, and the value of the condition codes after the cmpl instruction. So cmov can be thought of as a three-operand arithmetic instruction.

Using cmov, there is no need to dump anything from the instruction pipeline.

See BOH3 p 551. Use of the C ternary-if operator

    x>=0 ? x : -x

will often result in cmov code when using gcc.
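Here are two small C functions of the kind that tend to produce cmov. Both are my own illustrations, not code from BOH3, and whether gcc actually emits cmov depends on the gcc version and the optimization level, so check with gcc -O1 -S.

```c
/* Count the elements of A that are >= cutoff. The ternary has no side
   effects in either arm, so the compiler is free to compute both values
   and select one with a conditional move instead of branching. */
long count_ge(int cutoff, const int *A, long n) {
    long count = 0;
    for (long i = 0; i < n; i++) {
        count += (A[i] >= cutoff) ? 1 : 0;
    }
    return count;
}

/* Absolute value, written with the ternary-if operator. */
long absval(long x) {
    return x >= 0 ? x : -x;
}
```

The point of writing it this way is that the result does not depend on predicting which way a branch goes, which helps most when the data is unpredictable.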

    Linus Torvalds on cmove (he's wrong, actually, at least in general)

Chapter 3: procedure calls, floating-point registers

To call a subroutine (procedure or function):

  1. Set up any parameters. The first six go in %rdi, %rsi, %rdx, %rcx, %r8, %r9. Any additional parameters, or parameters too big to fit into a register (eg large structs passed by value), get pushed on the stack.
  2. Get the subroutine's address
  3. push the current address on the stack (actually, the address of the next instruction to be executed)
  4. jump to the new subroutine's address

The procedure may allocate space for its local variables on the stack. Or not, if it can just use the registers.

To return:

  1. The return value should be in %rax
  2. execute ret: pop the return address, and jump to it.
  3. pop any parameters that were pushed on the stack earlier (the caller does this, after the return).

If the subroutine wants to modify any callee-saved registers, it must push them on the stack before changing them, and pop them before exit. Typically, callee-saved registers are pushed before the local variables are allocated. The subroutine may modify any of the caller-saved registers without saving them.

Sometimes a subroutine doesn't need to push anything on the stack: this is possible if all the local variables and parameters can be kept in registers, and the subroutine is a leaf procedure: it does not make any further subroutine calls.

Calling and returning is handled with callq address and ret.

The process for calling subroutines has become more streamlined over the years. The convention on the IBM/370 systems (popular until at least the mid-1990s, and still used in some forms) was to save all 16 registers on entry to any subroutine, whether or not all or even any of the registers would be modified.

If the code within a subroutine applies the address-of & operator to either a parameter or a local variable, then space for that parameter or variable must be allocated on the stack, even if it would otherwise fit into a register.
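A small C illustration of the difference (my own example; the function names are made up). Compile with gcc -O1 -S and compare the two functions. Note that if the compiler inlines increment(), it may be able to keep y in a register after all.

```c
/* The callee needs the ADDRESS of the caller's variable. */
void increment(long *p) {
    *p += 1;
}

/* No address is taken: y can live entirely in a register. */
long no_address_taken(long x) {
    long y = x + 1;
    return y;
}

/* &y is passed to another function, so y needs a real memory
   location, which means a slot in this function's stack frame. */
long address_taken(long x) {
    long y = x;
    increment(&y);
    return y;
}
```

Both functions compute x+1; only the second one has to give y an address.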

Allocation of space on the stack looks like subq $16, %rsp. That subtracts (decimal) 16 from the stack pointer %rsp, creating space on the stack for 16 bytes. If these 16 bytes represent two long ints, x and y, then x is at address 8(%rsp) and y is at (%rsp).

Look at the code fact.c/fact.s. In fact.s, we save %rdi (n) in %rbx, after pushing the old contents of the latter.
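For reference while reading the assembly, here is a sketch of what a recursive factorial looks like in C. I am assuming fact.c has roughly this shape; the actual file may differ in details.

```c
/* Recursive factorial (assumed shape of fact.c). The value of n is
   needed again AFTER the recursive call returns, for the multiply.
   Since %rdi is overwritten to pass n-1, n must be kept somewhere
   that survives the call, such as a callee-saved register. */
long fact(long n) {
    if (n <= 1) return 1;
    return n * fact(n - 1);
}
```

This is why the compiled code has to save the old contents of a callee-saved register before using it to hold n.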

Floating point

These are related to MMX / SSE / AVX / AVX2, with register sets MMX / XMM / YMM, a set of 64/128/256 bit registers for doing multiple operations in parallel. The AVX2 registers are sixteen 256-bit registers %ymm0 through %ymm15, with the low 128-bit halves denoted %xmm0 through %xmm15. When used for scalar floating-point data, only the low 32 bits (for float) or 64 bits (for double) are used.

vmovss: move a 32-bit float
vmovsd: move a double

vcvttss2si: convert float to int, truncating the fractional part
vcvttsd2si: convert double to int
add q at the end to convert to long (64-bit int)

vcvtsi2ss: convert int to float
vcvtsi2sd: convert int to double
add q to convert from long

Typical use: converts long int in %rax to double in %xmm1. The register %xmm1 is listed twice, as second source and as destination.

    vcvtsi2sdq  %rax, %xmm1, %xmm1

Floating-point arithmetic instructions all have two sources, and one destination:

    vaddsd  %xmm1, %xmm2, %xmm3
    vmulsd  %xmm1, %xmm3, %xmm1
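Here is C source that exercises the conversions and the scalar arithmetic. The function names are mine. Which instructions you actually see depends on the flags: with -mavx2 gcc should use the v-prefixed forms listed above, and without it you will likely see the SSE equivalents (cvtsi2sdq, addsd, mulsd).

```c
/* long -> double conversion, then a scalar double multiply and add. */
double scale_and_add(long n, double x, double y) {
    double d = (double) n;      /* conversion of the vcvtsi2sdq kind */
    return d * x + y;           /* multiply and add on doubles       */
}

/* double -> long conversion; C truncates the fractional part,
   which is what the vcvttsd2siq kind of instruction does. */
long truncate_to_long(double x) {
    return (long) x;
}
```

Truncation goes toward zero, so 2.9 becomes 2 and -2.9 becomes -2.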

Java byte code

Binary Bomb

phases 1 and 2

What is the secret phase?

Chapter 5:

combine1, BOH3 p 507

void combine1(vec_ptr v, data_t *result) {
    *result = IDENT;
    for (int i=0; i<vec_length(v); i++) {
        data_t val;
        get_vec_element(v, i, &val);
        *result = *result OP val;
    }
}

With OP=+, this takes ~23 clocks per loop cycle for integers with no optimization flag at all, and ~10 with -O1 (BOH3 p 508).

Next steps:

2. move vec_length() call out of loop: 7 clocks (p 513; combine2())
3. Add data_t *get_vec_start(): does not help at all!  (p 513, combine3())
4. move indirect memory reference out of loop: 1.3 clocks (p 515, combine4())
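Here is a sketch of where the steps above end up, in the spirit of combine4(). The vector type below is a minimal stand-in that I made up so the example compiles on its own; BOH3's vec_ptr has its own definition. The choice of long for data_t, 0 for IDENT and + for OP is likewise just one configuration.

```c
typedef long data_t;
#define IDENT 0
#define OP +

/* Minimal stand-in for the book's vector type (an assumption). */
typedef struct {
    long    len;
    data_t *data;
} vec_rec, *vec_ptr;

long vec_length(vec_ptr v)       { return v->len;  }
data_t *get_vec_start(vec_ptr v) { return v->data; }

void combine4(vec_ptr v, data_t *result) {
    long length  = vec_length(v);      /* step 2: length computed once    */
    data_t *data = get_vec_start(v);   /* step 3: direct array access     */
    data_t acc   = IDENT;              /* step 4: accumulate in a local,  */
    for (long i = 0; i < length; i++) {/*         which can be a register */
        acc = acc OP data[i];
    }
    *result = acc;                     /* one memory write, at the end    */
}
```

The loop body now contains no function calls and no writes through result.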

Why can't the compiler optimizer figure out 2 and 4?

Out-of-order processing and branch prediction / memory access

strlen loop example, BOH3 p 510

void lower1(char * s) {
    for (int i=0; i<strlen(s); i++) {
        if ('A' <= s[i] && s[i] <= 'Z') {
            s[i] += 'a' - 'A';
        }
    }
}

graphs with strlen in and out of loop
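The "strlen out of the loop" version can be sketched as follows. I am calling it lower2 here; it computes the length once instead of on every iteration.

```c
#include <string.h>

/* Same conversion as lower1, but strlen() is called only once.
   lower1 calls strlen() on every iteration, and each call scans the
   whole string, so its total work grows with the square of the length. */
void lower2(char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        if ('A' <= s[i] && s[i] <= 'Z') {
            s[i] += 'a' - 'A';
        }
    }
}
```

Hoisting the call is safe here because changing a letter's case never changes the string's length.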

Why doesn't the optimizer fix this?

Data-flow analysis

Counting 1-bits: value of pre-computed tables.
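Here is a minimal sketch of the table approach, assuming a 256-entry table indexed by one byte. The names are made up for the example, and other table sizes are possible.

```c
#include <stdint.h>

static unsigned char bits_in_byte[256];

/* Fill the table once. The count for b is the count for b with its
   low bit shifted off, plus that low bit. */
void init_bit_table(void) {
    bits_in_byte[0] = 0;
    for (int b = 1; b < 256; b++) {
        bits_in_byte[b] = (unsigned char) (bits_in_byte[b >> 1] + (b & 1));
    }
}

/* Count the 1-bits in a 32-bit word with four table lookups, one per
   byte. init_bit_table() must have been called first. */
int count_ones(uint32_t x) {
    return bits_in_byte[x & 0xff]
         + bits_in_byte[(x >> 8) & 0xff]
         + bits_in_byte[(x >> 16) & 0xff]
         + bits_in_byte[(x >> 24) & 0xff];
}
```

The trade is a one-time setup cost and 256 bytes of memory in exchange for replacing a 32-step loop with four lookups and three adds.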

Loop unrolling
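As a small illustration of the idea, here is a sum loop unrolled by a factor of 2. This is my own example, not one of the BOH3 combine versions.

```c
/* Sum n longs, handling two elements per iteration. Unrolling halves
   the number of loop-condition tests and index updates. */
long sum_unrolled2(const long *a, long n) {
    long acc = 0;
    long i;
    for (i = 0; i + 1 < n; i += 2) {    /* main loop: pairs of elements */
        acc = acc + a[i] + a[i + 1];
    }
    for (; i < n; i++) {                /* cleanup: leftover element if n is odd */
        acc += a[i];
    }
    return acc;
}
```

The cleanup loop is the part that is easy to forget: without it, an odd-length array loses its last element.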