Comp 150-001, TTh, 11:00-2:00, DH-339

Class 7

Operating Systems

Some of this we discussed before, but the goal here is to look at what chip design brings to OS development.

In the PIP machine, programs get loaded starting at address 0. Once upon a time, real machines were programmed that way: you'd load a deck of punchcards containing your program into the reader, and the machine operator (who was seldom you) would press a button when it was your turn and your program would be loaded in as the program that the machine was working on right then.

Nowadays many programs can be in memory at once, and the starting address is seldom 0. (However, as we shall see below, it is at least possible for all programs to start at address 0, simultaneously. At least it can appear that way.)

Operating systems, for both mainframes and microcomputers, started as a standardized way of doing disk I/O. You might have exclusive access to the CPU in the old days, but you would still be sharing the disk drive, which had a complicated organization on it known as a file system, and your colleagues tended to take an extremely dim view of programs that scribbled randomly on this and wiped out other files. Plus, dealing with disk drives was complicated.

The early DOS acronym stood for "Disk Operating System" (and there was a mainframe DOS decades before the first PC-DOS / MS-DOS came out). Another early acronym was BIOS (Basic Input/Output System), which was essentially a small I/O library written onto a ROM chip. Nowadays the BIOS tends to control only how your machine boots up, and the various clock speeds and clock multipliers and bus assignments; modern OS's do not use the BIOS for I/O once they are running.

Demo: reboot the machine, start the BIOS, and adjust settings so as to render windows unbootable. Hmm.. maybe not.

I/O and Interrupts

When you read from a file on the disk, you get the data "immediately". But at the lowest level of hardware, a disk read means that first the OS (or whatever) identifies the specific data block, by "block number", and then puts that block number into a special I/O register. (Sometimes that just represents a special word of memory.) The disk hardware goes to work (probably putting the block number into its queue, as it probably has several other pending requests), and when the block is available the CPU receives an interrupt signal telling it to stop what it's doing, save the current value of the Program Counter, and run a program at a pre-set memory address called the interrupt handler. The interrupt handler will begin by saving any registers, so that when we return to the original task we can pick up as if nothing had happened.

The interrupt handler may read the data bytes one at a time from the disk controller. Newer systems use DMA (Direct Memory Access): the disk controller writes the disk block directly to a prearranged space in main memory, and the OS then just has to place this block in the correct spot.
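
As a rough sketch of that sequence (Python standing in for kernel code; the names here — dma_buffer, request_block, disk_interrupt_handler — are invented for illustration):

    # A toy model of interrupt-driven disk I/O with DMA.  Everything here
    # (the names, the dictionaries standing in for hardware) is invented;
    # real handlers are kernel code, not Python.

    dma_buffer = {}      # block_number -> data; "the hardware" fills this via DMA
    waiting = {}         # block_number -> the process waiting for that block

    def request_block(block_number, process):
        # The OS side: note who wants the block and mark that process as waiting.
        waiting[block_number] = process
        process['state'] = 'waiting'
        # ...the block number would now be written to the disk's I/O register...

    def disk_interrupt_handler(block_number):
        # Runs when the disk signals completion.  With DMA the data is already
        # in main memory (dma_buffer); just hand it to the right process.
        data = dma_buffer.pop(block_number)
        process = waiting.pop(block_number)
        process['buffer'] = data      # place the block where the process expects it
        process['state'] = 'ready'    # the process can run again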

Disks are organized into concentric tracks; each track is subdivided into sectors. These sectors are the actual blocks read. Access time for a sector is the seek time (the time to move the disk head to the correct track) plus the rotational latency in bringing the disk around. Average seek times have dropped from ~50 ms to 10 ms or lower, but disk rotation speeds have made only modest gains. If a disk rotates at 7200 rpm, that's 120 revolutions per second, or about 8.3 ms per rotation. The average rotational latency is thus ~4 ms.
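
Spelling out that arithmetic as a quick back-of-the-envelope calculation:

    rpm = 7200
    revs_per_sec = rpm / 60.0                       # 120 revolutions per second
    ms_per_rotation = 1000.0 / revs_per_sec         # about 8.3 ms per full rotation
    avg_rotational_latency = ms_per_rotation / 2    # about 4.2 ms: on average, half a turn
    print(ms_per_rotation, avg_rotational_latency)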

If the file you're reading is sequential, the OS can pre-fetch disk blocks in anticipation; but if you're reading from a database, accesses are typically non-sequential and effectively random.

Keyboard interrupts work on a similar basis, except that usually the keyboard-interrupt handler just puts keystrokes as they occur into a designated buffer, until the program needs them.
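
A minimal sketch of that buffering idea, with a deque standing in for the designated buffer (the function names are made up):

    from collections import deque

    key_buffer = deque()            # the designated buffer

    def keyboard_interrupt_handler(ch):
        # called once per keystroke, as it occurs
        key_buffer.append(ch)

    def read_key():
        # called later, whenever the program actually wants input
        return key_buffer.popleft() if key_buffer else None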

Processes

Early DOS ran one program at a time. There were clever system hacks that would allow you to have two programs loaded simultaneously, switchable via a "hot key", and there were also various TSR (Terminate and Stay Resident) programs that would load themselves for the duration of the session, again activated on demand by some special key combination.

But one thing ran at a time.

Processes: allowing more than one program to be loaded, so that the OS could select between them. Early OS's just switched processes when the running process needed to wait for some I/O. At any one time, one process was running and there was a queue of other processes that were ready to run; other processes were waiting for something to happen before they could become ready. (One thing the OS did when the disk driver received a full data block was to move the process that had requested the block from waiting to ready.) Using this terminology, a running process would give up the CPU only when it entered the waiting state (or when it finished). Later systems added a clock interrupt, every 10 ms or so, so that the ready processes would share the CPU in "round-robin" (cyclic) fashion, each getting a "timeslice" in turn.

A part of the OS called the scheduler determines which process gets the next chance to run. Sometimes some processes have higher priority than others; protracted round-robin sharing of the CPU is actually pretty rare. One common stratagem is to raise the priority of processes that have a history of asking for the CPU only in very brief bursts, on the theory that (a) such processes are likely to need only another brief burst, and (b) such processes are likely to be interactive, requiring rapid response.
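
To make the running/ready/waiting terminology and the round-robin idea concrete, here is a toy sketch; the process labels and function names are invented, and no real scheduler looks like this (in particular, it has no priorities):

    from collections import deque

    ready = deque(['A', 'B', 'C'])      # processes ready to run (just labels here)
    waiting = set()                     # processes blocked, e.g. on a disk read

    def clock_interrupt():
        # Every timeslice (~10 ms): take the process at the front, "run" it,
        # then put it at the back of the ready queue.
        running = ready.popleft()
        # ... the process would actually execute here ...
        ready.append(running)
        return running

    def io_complete(process):
        # Called from an I/O interrupt handler: move the process waiting -> ready.
        waiting.discard(process)
        ready.append(process)

    print([clock_interrupt() for _ in range(6)])    # ['A','B','C','A','B','C']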

Protection

Up until Windows NT (circa 1993), most PC operating systems made no use of privilege layers. That is, a rogue process could scribble all over the memory of other processes or of the system, corrupting it and usually causing a system crash. However, larger systems since the 1960s all made use of a two-layer (or more) privilege model. The CPU could be in user mode or supervisor mode. Certain opcodes were privileged, in that they could only be executed in supervisor mode. Typically these included any I/O operations, and any operations that involved memory access outside of your assigned region. In particular, low memory (e.g. the first 64 KB, or the first 1 MB) could only be accessed in supervisor mode.

User programs ran in user mode; supervisor mode could be invoked through a trap instruction. In protected low memory there was a table of trap handlers: addresses of a series of specific functions. The trap table's starting address was built into the chip architecture, but the addresses contained in the table could be arbitrary (though they always pointed into protected memory as well). When your program needed to do something privileged, it would place its data in registers and then invoke the trap instruction with an appropriate parameter. The hardware result of executing the trap instruction in user mode is to (a) switch to supervisor mode, but (b) branch immediately to the designated address in the trap table. In other words, your program could switch to supervisor mode whenever it wanted to, but it couldn't choose what to run once there. The trap-handling functions were always careful to return to user mode just before returning control to the user program.

Thus, execution of a typical read system call would proceed as follows:
    Place the file handle of the file into Register 0
    Invoke the TRAP instruction, with an operand indicating a read
        (Supervisor mode: the trap handler performs the actual disk read,
         places the data where the caller can see it, and switches back to user mode)
    Process that block of data

Thus, the TRAP instruction would look to the user's program like a giant opcode that took care of reading the block of data.

Typically, the OS would return to user mode as soon as possible, perhaps well before returning control to the user program. This was done to minimize risks that somehow the user could "get away with something unintended".
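
A sketch of the trap-table idea, translated loosely into Python (the trap number, handler name, and mode flag are all made up; on real hardware the table holds addresses and the mode switch happens in the CPU itself):

    mode = 'user'

    def sys_read(handle):               # meant to run only in supervisor mode
        assert mode == 'supervisor'
        return "<the data block for handle %d>" % handle

    # The trap table: fixed slot numbers -> handler routines.  User code can
    # choose the slot, but not what the slot points to.
    trap_table = {0: sys_read}

    def trap(number, *args):
        # What the TRAP instruction does: switch mode, branch via the table,
        # and drop back to user mode before control returns to the caller.
        global mode
        mode = 'supervisor'
        try:
            return trap_table[number](*args)
        finally:
            mode = 'user'

    data = trap(0, 42)    # to the user program, this looks like one giant "read" opcode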

Privilege layers allowed the operating system to limit access to files based on "ownership": each file had attributes indicating its "owner" and what "permissions" were granted (in Unix, to the owner, to the group, and to everybody else). The owner could set those permissions. The open system call always verified that the caller had permission to open the file before allowing it. A user-mode program that tried to access the disk directly would discover that the opcode for requesting data from the disk drive was privileged: user-mode programs had to go through the trap mechanism to read from the disk, and the trap mechanism enforced the rules.
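
On a Unix-style system you can peek at these attributes from Python with the standard os and stat modules (stat.filemode needs Python 3.3+; the file name below is just a placeholder for any file that exists on your system):

    import os, stat

    st = os.stat("somefile.txt")          # substitute any existing file
    print(st.st_uid, st.st_gid)           # numeric owner and group IDs
    print(stat.filemode(st.st_mode))      # e.g. -rw-r--r--  (owner / group / everybody)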

Memory protection was also important. An early strategy was to provide a pair of bounds registers. In user mode, you could only access memory (either as instructions or as data) that lay between the bounds registers. (Some systems let all memory be read, but only allowed writes to locations between the bounds registers.) By setting the lower bounds register to 0x100 000, you would thus make the first megabyte (2^20 = 0x100 000) inaccessible from user mode. But you could also have multiple processes, each guaranteed protection from the others. Process A might have memory from 0x100 000 to 0x300 000, and process B might have memory from 0x400 000 to 0x500 000. Before switching to a process, the scheduler (running in supervisor mode) would load that process's bounds registers; upon switching to another process, the scheduler would reload the bounds registers.

(I've suggested here that the bounds registers held the low address and the high address allowed to the process. Real systems might be more likely to have one register (the base register) holding the low address, and then a length register holding the length of the memory segment; the high bound would thus be base+length. See CSI3, Chapter 10, for more information, esp Fig 10.6 on p 334)
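
Here is a sketch of that base/length check (the particular addresses, and raising MemoryError to stand in for a protection fault, are just for illustration):

    BASE   = 0x100000       # this process's base register (the 1 MB mark)
    LENGTH = 0x200000       # its length register, so the segment ends at 0x300000

    def check_access(addr):
        # In user mode, every address the process issues is checked against
        # the bounds; anything outside triggers a protection fault.
        if not (BASE <= addr < BASE + LENGTH):
            raise MemoryError("protection fault at 0x%x" % addr)
        return addr

    check_access(0x180000)         # fine: inside the segment
    try:
        check_access(0x080000)     # below the base, i.e. protected low memory
    except MemoryError as e:
        print(e)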

The core OS thus "lived" in the trap-handler library. This part of the OS is sometimes called the kernel. The OS also has its own libraries, and its own processes (often running in user mode).

Device drivers typically must run in supervisor mode, and are often supplied by third parties. To guard against chaos, some chipsets (notably Intel's newer architectures) provide more than two privilege levels. An intermediate level for device drivers might be allowed direct access to the "bus" (allowing, say, video updates), but would still have to live with the bounds-register memory restrictions, leaving only limited opportunity to crash the system.

Virtual memory

Not all the computer memory you use is necessarily there. Typically there is physical memory, the installed RAM with addresses from 0 up to some maximum (e.g. 2^31-1 for 2 GB, 2^30-1 for 1 GB, or 2^20-1 for 1 MB). However, physical memory is then divided into blocks, or pages, typically 4 KB each (12 bits' worth; 12 bits are needed to specify the position of a byte within a block). Each individual process is allocated however many blocks it needs, and the addresses are "remapped" on the fly so that the "virtual" blocks can be contiguous (or can form two or three widely separated contiguous regions), while the physical blocks can be allocated in any manner.

For 32-bit addressing, the 4KB block size means the low 12 bits are left alone; they refer to the position within the block. The upper 20 bits specify the block, and special hardware replaces the virtual block number with the appropriate corresponding physical block number. (draw picture)
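
In code, the remapping amounts to splitting the address and swapping the block (page) number; a sketch, with an invented page table:

    PAGE_SIZE = 4096        # 2**12 bytes, so 12 bits of offset
    OFFSET_BITS = 12

    # virtual page number -> physical page (frame) number; the contents are invented
    page_table = {0: 7, 1: 3, 2: 12}

    def translate(virtual_addr):
        vpn    = virtual_addr >> OFFSET_BITS        # the upper 20 bits (of a 32-bit address)
        offset = virtual_addr & (PAGE_SIZE - 1)     # the low 12 bits, left alone
        frame  = page_table[vpn]                    # a missing entry would be a page fault
        return (frame << OFFSET_BITS) | offset

    print(hex(translate(0x1ABC)))    # virtual page 1 maps to frame 3, so this prints 0x3abc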

One of the main practical advantages is that a process can ask for memory only as it needs it, all the while other processes are also asking for memory in dribs and drabs, and yet memory does not end up fragmented: each process sees memory added contiguous with its previous allotment.

Once upon a time, pages that had not been used in a while could be written to disk, so as to expand the apparent physical memory by a great deal. However, with RAM so cheap today, this isn't necessarily worth the effort. Demo: look at c:\pagefile.sys under windows.

Virtual memory needs special hardware, called a memory management unit. Once upon a time this would have been an external chip, though MMUs are built into all current Intel offerings. The memory accesses of a given process are interpreted in light of the current page table, which can only be changed by the OS kernel; again, the scheduler loads the process's page table before running that process. Note that with virtual memory, one process can not only not write to the memory of another process, it cannot even see where it is.

There is a good description in CSI3, pp 335-337, with an illustration.

Note that each process typically has three main memory areas: the text area, for the machine code; the stack, which can expand and contract linearly; and the heap, which expands but generally doesn't contract. Typically these three areas are widely separated, with nothing allocated in between.

Buses


There's the PCI bus, and the memory bus. Once upon a time they were one and the same, but the PCI bus (conventional PCI peaks at about 133 MB/s) is generally much slower than memory.

A good picture can be found at http://en.wikipedia.org/wiki/Front-side_bus. The CPU connects to its cache via the very-high-speed "back-side" bus. The front-side bus connects first to the "Northbridge", from which hang the RAM and the AGP graphics slot, and then goes on to the "Southbridge", which handles the PCI bus (and things tied to that, like ethernet).

The Stack

When one function calls another, we need to save the current Program Counter somewhere (representing the address we return to), and then either the caller or the called function must save any registers that will be needed again. One approach is for each function to set aside some fixed chunk of memory for that purpose. However, this disallows recursion: having a function call itself.

The stack is a chunk of memory onto which we can push values, and then later pop them off, in (more-or-less) strict last-in-first-out order. The active end of the stack is typically held in a register called the stack pointer, or SP. The idea is that the caller pushes the PC onto the stack; when the called function is done it pops off the return address and branches to it. Any function parameters are also pushed onto the stack.

In order to allow recursion, each separate "invocation" of a function has to have its local variables in a separate place. This leads to using the stack to hold all local variables as well, in a stack frame. A second register, called the frame pointer, points to the current stack frame (this allows SP to grow further for such things as dynamically created local space).

Example:
    def fact(n):
        if n == 1: return 1
        else: return n * fact(n-1)

Let's follow the evaluation of fact(4), as called from the main python interaction loop. We push the parameter 4 and the main-loop return address.

As n=4, not 1, we then call fact(3), leading us to push the parameter 3 and the return address inside of fact() where we will resume.

This leads to calling fact(2); we push the parameter 2 and the same return address in fact(). The same thing happens when we call fact(1).

But fact(1) doesn't call anything further; it just returns 1. We now return to the fact(2) instance, having evaluated fact(1)=1; we return 2*1.

This returns us to the fact(3) instance, with fact(2)=2. We compute 3*2 = 6, and return that.

This returns us to the fact(4) instance, with fact(3)=6. We compute 4*6=24, and return that now to the python top level.
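
We can mimic that behavior explicitly with a Python list playing the role of the stack; each "frame" here is just a (return-to, n) pair, though real frames also hold saved registers and locals:

    # A hand-run simulation of the calls above, using a plain list as the stack.
    stack = []

    stack.append(('main', 4))     # call fact(4): push the return address and the parameter
    stack.append(('fact', 3))     # fact(4) calls fact(3)
    stack.append(('fact', 2))     # fact(3) calls fact(2)
    stack.append(('fact', 1))     # fact(2) calls fact(1)

    result = 1                    # fact(1) returns 1; each pop below is one return
    while stack:
        return_to, n = stack.pop()
        if n > 1:
            result = n * result
        print("returning", result, "to", return_to)
    # final value of result: 24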

Stacks don't require any special hardware support; any registers can be used for the Stack Pointer and Frame Pointer. However, if interrupts are to be handled by pushing the current state onto the stack, then SP does have to be designated in hardware. Most systems since the 1970s take this approach.

Buffer Overflow Attacks

The classic reference is "Smashing the Stack for Fun and Profit" (Aleph One, Phrack issue 49, 1996): by overflowing a buffer held on the stack, an attacker can overwrite the saved return address and redirect execution.



Python

def collatz(n):
    # count how many steps the 3n+1 process takes to reach 1
    count = 0
    while n != 1:
        if n % 2 == 0: n = n // 2     # // is integer division (works in Python 2 and 3)
        else: n = 3*n + 1
        count += 1
    return count


index versus value: lab 3
    vals = list(map(collatz, range(1, 1000)))   # start at 1: collatz(0) would never terminate
    max(vals)            # the largest count
    vals.index(178)      # where that maximum occurs; add 1 to recover the starting n

map(function, list)
filter(function, list)
    [x for x in list if function(x)]
reduce(function(x,y), list)
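
A few concrete uses, assuming the collatz function above (note that in Python 3, reduce lives in functools, and map/filter return iterators, so we wrap them in list() to see the values):

    from functools import reduce

    nums = range(1, 11)
    print(list(map(collatz, nums)))                    # apply collatz to each value
    print(list(filter(lambda x: x % 2 == 0, nums)))    # keep only the even values
    print([x for x in nums if x % 2 == 0])             # the same thing as a comprehension
    print(reduce(lambda x, y: x + y, nums))            # 55: fold the whole list into one value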