264 Week 14 notes

Comp 264-002, Spring 2019, MWF, 11:30-12:20, Cuneo 218

Readings (from BOH3)

See the study guide on Sakai

Loop Unrolling

2×1 unrolling: add 2 array elements at a time using 1 accumulator
for(i=0; i<MAX; i+=2) {sum = (sum + A[i]) + A[i+1];}

The sums (or multiplications) must still be done sequentially. The best we can achieve is the latency limit.

2×2 unrolling: add 2 array elements at a time using 2 accumulators
for(i=0; i<MAX; i+=2) {sum1 += A[i]; sum2 += A[i+1];}

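Now the operations can be done in parallel, because the update to sum1 does not depend on the update to sum2. This allows bypassing the latency limit, and may allow approaching the throughput limit. The two partial sums are added together after the loop.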
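A minimal sketch of both loops as complete C functions. The function names sum_2x1 and sum_2x2, the long element type, and the cleanup loop for an odd-length array are assumptions added for illustration; they are not from BOH3.

```c
#include <stddef.h>

/* 2x1 unrolling: two elements per iteration, one accumulator.
   Each addition still depends on the previous value of sum. */
long sum_2x1(const long *A, size_t n)
{
    long sum = 0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        sum = (sum + A[i]) + A[i + 1];
    }
    for (; i < n; i++) {        /* leftover element when n is odd */
        sum += A[i];
    }
    return sum;
}

/* 2x2 unrolling: two elements per iteration, two accumulators.
   The updates to sum1 and sum2 are independent of each other. */
long sum_2x2(const long *A, size_t n)
{
    long sum1 = 0, sum2 = 0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        sum1 += A[i];
        sum2 += A[i + 1];
    }
    for (; i < n; i++) {        /* leftover element when n is odd */
        sum1 += A[i];
    }
    return sum1 + sum2;         /* combine the partial sums */
}
```

For integers the two functions return the same value. With floating-point data the 2×2 version adds the elements in a different order, so the result may differ slightly because of rounding.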

Chapter 7: Linking

linking.html

PIC and dynamic libraries; the GOT and the PLT

Most Linux system calls return -1 when an error occurs (library functions that return a pointer typically return NULL, that is, 0, instead). They also set the global variable errno, which can be examined to obtain additional error information. See the errno man page. (Note that some errors, such as segfaults, are not due to syscalls and so don't involve errno.)
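Here is a short sketch of the usual errno pattern, using open() as the failing call. The helper name try_open is hypothetical, chosen only for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to open a file read-only.
   Returns 0 on success, or the errno value on failure. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;  /* save at once: later calls may overwrite errno */
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

Calling try_open() on a path that does not exist should return ENOENT. Note that errno is meaningful only right after a call has reported failure; a successful call is not required to reset it.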

Virtual Memory

Here are some things virtual memory accomplishes:

1. Makes it seem like there is more memory than there really is. If there are more virtual pages than physical pages, the extras are kept on the disk, in swap space (or, on Windows, the pagefile). This is no longer so significant, though seldom-used memory structures do still get "paged out". From this perspective, DRAM acts as a cache for the disk.

2. Keeps processes isolated from one another. This is still a big deal. No process can touch the virtual memory of another process, except in limited cases when that is allowed.

3. Prevents memory fragmentation. With physical memory, if you have something allocated every 4KB, even just a few bytes long, then you cannot create a contiguous 8KB object. But with virtual memory, this is easy. The heap and the stack are far enough apart in virtual memory that they will, in practice, never collide. New pages can be added to either.

4. Allows faster I/O, by supporting mapping instead of copying. Traditionally, the kernel reads data from the hard disk or network interface into a buffer, and then has to copy the data into the user-process address space. But with virtual memory, the memory block containing the buffer can simply be mapped into the user-process address space, with no copying. Generally this is not the default; a program has to ask for it explicitly, for example with mmap().

5. Allows intentional sharing of memory. For example, a certain buffer can be mapped into the address spaces of two different processes, allowing those processes a "window" through which they can communicate.

6. Allows per-page permissions. For example, the permissions "execute or write, but not both", often abbreviated W^X, mean that if a page contains executable code, it must not be writable. If the virtual-memory system encounters a branch to a page without execute permission, then an error occurs. This prevents an attacker from writing the code that they will execute. (It does not prevent so-called "return-to-libc" attacks, where the attacker selects a well-known section of library code to branch to, and that code, because it is not being executed starting at its intended entry point, has some unanticipated consequences.)
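Item 5 can be sketched as follows: a parent and its child share one page through which the child passes a value back. The function name shared_window_demo is hypothetical, and MAP_ANONYMOUS is a Linux/BSD extension rather than strict POSIX, so treat this as a Linux-oriented sketch.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the child wrote into the shared page, or -1 on error. */
int shared_window_demo(void)
{
    /* One int in a page that stays shared across fork() */
    int *window = mmap(NULL, sizeof *window, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (window == MAP_FAILED) {
        return -1;
    }
    *window = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(window, sizeof *window);
        return -1;
    }
    if (pid == 0) {             /* child: write through the window */
        *window = 42;
        _exit(0);
    }

    waitpid(pid, NULL, 0);      /* parent: wait, then read the child's value */
    int seen = *window;
    munmap(window, sizeof *window);
    return seen;
}
```

Because the mapping is MAP_SHARED, both processes' page tables point at the same physical page, so the parent sees the child's write. With MAP_PRIVATE instead, the parent would still see 0.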

fork() and exec()

This is how new processes are created on Unix-based (Mac and Linux) systems. The fork() call creates the new process running the same program as the parent (the child and parent processes know which is which, though, from the return value of fork()). The child sets up the new-process environment, and proceeds to launch the actual program requested via the exec() call.

The fork() system call creates a second process but with the same virtual-address pages, running the same program. The .text pages are the same, and so are, initially, the .data pages. However, the latter have the copy-on-write attribute, meaning that if the child process attempts to modify any data, then the applicable virtual-memory pages are copied first. This makes the fork() call a very lightweight operation.
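The page copying itself is invisible to the program, but its effect can be observed: a write in the child does not change the parent's data. A minimal sketch, where the names counter and cow_demo are hypothetical:

```c
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;         /* lives in the .data section */

/* Returns the parent's value of counter after the child has
   modified its own copy, or -1 on error. */
int cow_demo(void)
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        counter = 99;           /* this write makes the kernel copy the page */
        _exit(counter == 99 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return counter;             /* parent's page was never modified */
}
```

The parent still sees counter equal to 1 after the child exits, because the child's write went to its own private copy of the page.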

The exec() system call then takes a new ELF file, and maps it in (in the sense of virtual memory) to the child address space (which has been prepared by the child after the fork() call). But these pages are not actually loaded into memory until they are needed. When the child process branches to the main() function of the new ELF file, a page fault occurs, and that virtual page is loaded from the disk to memory.
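The fork-then-exec sequence might look like the sketch below. The function name run_program is hypothetical, and exit code 127 for a failed exec follows the shell convention; it is a choice made for this example, not something the system requires.

```c
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at path with the given argument vector in a child
   process. Returns the child's exit status, or -1 on error. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the new program.
           execv() returns only if it failed. */
        execv(path, argv);
        _exit(127);
    }

    /* Parent: wait for the child and report how it exited */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with char *args[] = {"sh", "-c", "exit 7", NULL}, the call run_program("/bin/sh", args) should return 7 on a system that has /bin/sh.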

Shared libraries

Shared libraries depend on virtual memory. For one, virtual memory is what allows the shared-library data pages to be mapped in at a fixed offset relative to the text (code) pages. The GOT/PLT strategy depends on this. For another, virtual memory is what allows the shared-library text pages -- the pages that are actually shared -- to be marked as read-only to the individual processes involved. Finally, it is virtual memory that allows the processes loading the library to share the text pages, while each gets its own copy of the data pages. The text + data looks to each sharing process like an integral unit, but the two parts are handled quite differently.