Comp 150-001, TTh, 11:00-2:00, DH-339
Class 7
Operating Systems
Some of this we discussed before, but the goal here is to look at what chip design brings to OS development.
In the PIP machine, programs get loaded starting at address 0. Once
upon a time, real machines were programmed that way: you'd load a deck
of punchcards containing your program into the reader, and the machine
operator (who was seldom you) would press a button when it was your
turn and your program would be loaded in as the program that the machine was working on right then.
Nowadays many programs can be in memory at once, and the starting
address is seldom 0. (However, as we shall see below, it is at least possible for all programs to start at address 0, simultaneously. At least it can appear that way.)
Operating systems, for both mainframes and microcomputers, started as a
standardized way of doing disk I/O. You might have exclusive access to
the CPU in the old days, but you would still be sharing the disk drive,
which had a complicated organization on it known as a file system,
and your colleagues tended to take an extremely dim view of programs
that scribbled randomly on this and wiped out other files. Plus,
dealing with disk drives was complicated.
The early DOS acronym stood for "Disk Operating System" (and there was
a mainframe DOS decades before the first PC-DOS / MS-DOS came out).
Another early acronym was BIOS (Basic Input/Output System), which was
essentially a small I/O library written onto a ROM chip. Nowadays the
BIOS tends to control only how your machine boots up, and the various
clock speeds and clock multipliers and bus assignments; modern OS's do
not use the BIOS for I/O once they are running.
Demo: reboot the machine, start the BIOS, and adjust settings so as to render windows unbootable. Hmm.. maybe not.
I/O and Interrupts
When you read from a file on the disk, you get the data "immediately".
But at the lowest level of hardware, a disk read means that first the
OS (or whatever) identifies the specific data block, by "block number",
and then puts that block number into a special I/O register. (Sometimes
that just represents a special word of memory.) The disk hardware goes
to work (probably putting the block number into its queue, as it
probably has several other pending requests), and when the block is
available the CPU receives an interrupt signal
telling it to stop what it's doing, save the current value of the
Program Counter, and run a program at a pre-set memory address called
the interrupt handler. The
interrupt handler will begin by saving any registers, so that when we
return to the original task we can pick up as if nothing had happened.
The interrupt handler may read the data bytes one at a time from the
disk controller. Newer systems use DMA (Direct Memory Access): the disk
controller writes the disk block directly to a prearranged space in
main memory, and the OS then just has to place this block in the
correct spot.
Disks are organized into concentric tracks; each track is subdivided into sectors. These sectors are the actual blocks read. Access time for a sector is the seek time (the time to move the disk head to the correct track), plus the rotational latency
in bringing the disk around. Average seek times have dropped from ~50ms
to 10 ms or lower, but disk rotation speeds have made only modest
gains. If a disk rotates at 7200 rpm, that's 120 r/sec, or about 8 ms
per rotation. The average rotational latency is thus ~4 ms.
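The arithmetic above is easy to check directly; the numbers here (7200 rpm, a 10 ms average seek) are the ones from the text.

```python
# Rough disk access-time arithmetic for a 7200 rpm drive.
rpm = 7200
rotations_per_sec = rpm / 60.0                  # 120 rotations per second
ms_per_rotation = 1000.0 / rotations_per_sec    # ~8.3 ms per full rotation
avg_rotational_latency = ms_per_rotation / 2    # on average, half a turn: ~4.2 ms

avg_seek = 10.0                                 # ms, the figure quoted above
avg_access = avg_seek + avg_rotational_latency  # ~14.2 ms per random block
print(round(ms_per_rotation, 1), round(avg_rotational_latency, 1), round(avg_access, 1))
```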
If the file you're reading is sequential, the OS can pre-fetch
disk blocks in anticipation, but if you're reading from a database,
then data accesses are typically pseudo-random non-sequential.
Keyboard interrupts work on a similar basis, except that usually the
keyboard-interrupt handler just puts keystrokes as they occur into a
designated buffer, until the program needs them.
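The keyboard-buffer idea can be sketched in a few lines of Python; the function names here (on_keystroke, read_char) are made up for illustration, not a real API.

```python
from collections import deque

# The "interrupt handler" just appends keystrokes as they arrive;
# the program drains the buffer later, at its leisure.
key_buffer = deque()

def on_keystroke(ch):
    """Called 'asynchronously' whenever a key interrupt occurs."""
    key_buffer.append(ch)

def read_char():
    """Called by the program when it actually wants input."""
    return key_buffer.popleft() if key_buffer else None

# Keystrokes arrive while the program is busy doing something else...
for ch in "hi":
    on_keystroke(ch)

# ...and are still waiting when the program gets around to reading them.
print(read_char(), read_char(), read_char())   # h i None
```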
Processes
Early DOS ran one program at a time. There were clever system hacks
that would allow you to have two programs loaded simultaneously,
switchable via a "hot key", and there were also various TSR (Terminate
and Stay Resident) programs that would load themselves for the duration of
the session, again activated on demand by some special key combination.
But one thing ran at a time.
Processes: allowing more than one program to be loaded, so that the OS
could select between them. Early OS's just switched processes when the
running process needed to wait for some I/O. At any one time, one
process was running and there was a queue of other processes that were ready to run; other processes were waiting
for something to happen before they could become ready. (One
thing the OS did when the disk driver received a full data block was to
move the process that had requested the block from waiting to ready). Using this terminology, a running process would give up the CPU only when it entered the waiting state (or if it finished). Later systems added a clock interrupt, every
10 ms or so, that the ready processes would share the CPU in
"round-robin" (cyclic) fashion, each getting a "timeslice" in turn.
A part of the OS called the scheduler
determines which process gets the next chance to run. Sometimes some
processes have higher priority than others; protracted round-robin
sharing of the CPU is actually pretty rare. One common stratagem is to
raise the priority of processes that have a history of only asking for
the CPU for very brief bursts, on the theory that (a) such programs are
likely to only need a brief burst, and (b) such programs are likely to
be interactive, requiring rapid response.
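Here is a toy sketch of round-robin timeslicing; each process's "work" is just a count of remaining time units, and real schedulers obviously track much more state than this.

```python
from collections import deque

# Toy round-robin scheduler: each ready process gets a fixed timeslice
# in turn, and goes to the back of the queue if it still has work left.
def round_robin(burst_times, timeslice):
    ready = deque(burst_times.items())        # (name, remaining) pairs
    order = []                                # which process ran in each slice
    while ready:
        name, remaining = ready.popleft()
        order.append(name)
        remaining -= timeslice
        if remaining > 0:
            ready.append((name, remaining))   # not done: back of the queue
    return order

print(round_robin({"A": 3, "B": 1, "C": 2}, timeslice=1))
# ['A', 'B', 'C', 'A', 'C', 'A']
```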
Protection
Up until Windows NT (circa 1993?), most PC operating systems made no use of privilege layers.
That is, a rogue process could scribble all over the memory of other
processes or the system, corrupting it and usually causing a system
crash. However, larger systems since the 1960's all made use of a
two-layer (or more) privilege model. The CPU could be in user mode or supervisor mode. Certain opcodes were privileged,
in that they could only be executed in supervisor mode. Typically these
included any I/O operations, and any operations that involved memory
access outside of your assigned region. In particular, low memory (eg
the first 64 KB, or the first 1 MB) could only be accessed in
supervisor mode.
User programs ran in user mode; supervisor mode could be invoked through a trap instruction. In protected low memory there was a table of trap handlers:
addresses to a series of specific functions. The trap table starting
address was built into the chip architecture, but the addresses
contained in the table could be arbitrary (though they were always also
in protected memory). When your program needed to do something
privileged, it would place its data in registers, and then invoke the
trap instruction with appropriate parameter. The hardware result of
executing the trap instruction in user mode is to (a) switch to
supervisor mode, but (b) branch immediately to the designated address
in the trap table. In other words, your program could switch to
supervisor mode whenever it wanted to, but it couldn't choose what to run when there. The trap-handling functions were always careful to return to user mode just before returning control to the user program.
Thus, execution of a typical read system call would proceed as follows:
Place file handle of file into Register 0
Call the TRAP instruction, with operand indicating a read.
Supervisor mode:
- verify that the read is allowed
- figure out which disk block it corresponds to
- request the data (perhaps running other processes while we wait)
- wait for the data
- copy the data into the user-program buffer
- return to user mode
- pick up the program where we left off
Process that block of data
Thus, the TRAP instruction would look to the user's program like a giant opcode that took care of reading the block of data.
Typically, the OS would return to user mode as soon as possible,
perhaps well before returning control to the user program. This was
done to minimize risks that somehow the user could "get away with
something unintended".
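The trap-table dispatch described above can be sketched in Python. The table maps trap numbers to handler functions; a user program may pick the trap *number*, but the handlers (and the table itself) live in "protected" OS code. The trap numbers and handler names here are invented for illustration.

```python
TRAP_READ, TRAP_WRITE = 0, 1

def handle_read(regs):
    # verify permission, find the disk block, wait for the data...
    return "data for handle %d" % regs["r0"]

def handle_write(regs):
    return "wrote to handle %d" % regs["r0"]

# The table lives in protected memory; user code cannot alter it.
trap_table = {TRAP_READ: handle_read, TRAP_WRITE: handle_write}

def trap(number, regs):
    """Hardware's-eye view: switch to supervisor mode, branch via the
    table, and drop back to user mode before returning to the caller."""
    handler = trap_table[number]      # user code cannot choose this address
    return handler(regs)

# User program: put the file handle in "register 0", then trap.
print(trap(TRAP_READ, {"r0": 3}))    # data for handle 3
```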
Privilege layers allowed the operating system to limit access to files
based on "ownership": files each had attributes indicating the "owner"
and what "permissions" were granted (in unix, to the owner, the group, and to everybody). The owner could set those privileges. The open
system call always verified that the caller had permission to open the
file before allowing it. A user-mode program that tried to access the
disk directly would discover that the opcode for requesting data from
the disk drive was privileged: user-mode programs had to go through the trap mechanism to read from the disk, and the trap mechanism enforced the rules.
Memory protection was also important. An early strategy was to provide a pair of bounds registers.
In user mode, you could only access memory (either as instructions or
as data) that lay between the bounds registers. (Some systems let all
memory be read, but only allowed writes to locations between the bounds
registers.) By setting the lower bounds register to 0x100 000, you
would thus make the first megabyte (2^20 = 0x100 000)
inaccessible from user mode. But you could also have multiple processes
each guaranteed protection from one another. Process A might have
memory from 0x100 000 to 0x300 000, and process B might have from 0x400 000
to 0x500 000. Before switching to a process, the scheduler (running in
supervisor mode) would load that process's bounds registers; upon
switching to another process, the scheduler would reload the bounds
registers.
(I've suggested here that the bounds registers held the low address and
the high address allowed to the process. Real systems might be more
likely to have one register (the base register) holding the low address, and then a length
register holding the length of the memory segment; the high bound would
thus be base+length. See CSI3, Chapter 10, for more information, esp
Fig 10.6 on p 334)
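The base/length form of the check is simple enough to sketch directly; an access is legal in user mode only if it falls in [base, base+length).

```python
# Sketch of a base/length memory-protection check.
def check_access(addr, base, length, supervisor=False):
    if supervisor:
        return True                  # supervisor mode: no bounds check
    return base <= addr < base + length

BASE, LENGTH = 0x100000, 0x200000    # process A: 0x100000 up to 0x300000
print(check_access(0x150000, BASE, LENGTH))   # True: inside the region
print(check_access(0x000400, BASE, LENGTH))   # False: protected low memory
print(check_access(0x300000, BASE, LENGTH))   # False: one past the end
```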
The core OS thus "lived" in the trap-handler library. This part of the OS is sometimes called the kernel. The OS also has its own libraries, and its own processes (often running in user mode).
Device drivers typically must run in supervisor mode, and are
often supplied by third parties. To guard against chaos, some chipsets
(notably Intel's newer architectures) provide more than two privilege
levels. An intermediate level for device drivers might be allowed
direct access to the "bus", thus allowing, say, video updates, but still
have to live with the bounds-registers memory restrictions, thus
allowing only limited opportunity to crash the system.
Virtual memory
Not all the computer memory you use is necessarily there. Typically there is physical memory, the installed RAM with addresses from 0 up to some maximum (eg 2^31 - 1 for 2 GB, 2^30 - 1 for 1 GB, or 2^20 - 1 for 1 MB). However, physical memory is then divided into blocks, or pages,
typically 4KB each (12 bits' worth; 12 bits are needed to specify the
position of a byte within a block). Each individual process is
allocated however many blocks it needs, and
the addresses are "remapped" on the fly so that the "virtual" blocks
can be contiguous (or can be two or three widely separated contiguous
blocks), but the physical blocks can be allocated in any manner.
For 32-bit addressing, the 4KB block size means the low 12 bits are
left alone; they refer to the position within the block. The upper 20
bits specify the block, and special hardware replaces the virtual block number with the appropriate corresponding physical block number. (draw picture)
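In place of the picture, here is the translation in Python: the low 12 bits pass through untouched, and the upper 20 bits are looked up in a page table (just a dict here, standing in for the MMU's table).

```python
# Sketch of 32-bit virtual-to-physical address translation with 4 KB pages.
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS         # 4096

def translate(vaddr, page_table):
    vpage = vaddr >> OFFSET_BITS     # upper 20 bits: the virtual page number
    offset = vaddr & (PAGE_SIZE - 1) # low 12 bits: left alone
    ppage = page_table[vpage]        # the MMU's lookup (page fault if absent)
    return (ppage << OFFSET_BITS) | offset

page_table = {0: 7, 1: 3}            # virtual pages 0 and 1 -> physical 7 and 3
print(hex(translate(0x00001234, page_table)))   # 0x3234: page 1 -> 3, offset 0x234
```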
One of the main practical advantages is that a process can ask for
memory only as it needs it, all the while other processes are also
asking for memory in dribs and drabs, and yet memory does not end up fragmented: each process sees memory added contiguous with its previous allotment.
Once upon a time, pages that had not been used in a while could be
written to disk, so as to expand the apparent physical memory by a
great deal. However, with RAM so cheap today, this isn't necessarily
worth the effort. Demo: look at c:\pagefile.sys under Windows.
Virtual memory needs special hardware, called a memory management unit.
Once upon a time this would have been an external chip, though MMUs are
built into all current Intel offerings. The memory accesses of a given
process are interpreted in light of the current page table,
which can only be changed by the OS kernel; again, the scheduler loads
the process's page table before running that process. Note that with
virtual memory, not only can one process not write to the memory of
another process, it cannot even see where that memory is.
There is a good description in CSI3, pp 335-337, with an illustration.
Note that each process typically has three main memory areas: the text area, for the machine code, the stack, which can expand and contract linearly, and the heap,
which expands but generally doesn't contract. Typically these three
areas are widely separated, with nothing allocated in between.
Buses
There's the PCI bus, and the memory bus. Once upon a time, they were
one and the same, but classic PCI runs at 33 MHz (about 133 MB/sec
peak), which is generally much slower than memory.
A good picture can be found at http://en.wikipedia.org/wiki/Front-side_bus.
The CPU connects to its cache via the very-high-speed "back-side" bus.
The front-side bus connects first to the "Northbridge", from which hang
the RAM and the AGP graphics slot, and then goes on to the
"Southbridge", which handles the PCI bus (and things tied to that, like
ethernet).

The Stack
When one function calls another, we need to save the current Program
Counter somewhere (representing the address we return to), and then
either the called or the caller must save any registers that must be
reused. One approach is for each function to set aside some chunk of
memory for that purpose. However, this disallows recursion: having a function call itself.
The stack is a chunk of memory onto which we can push values, and then later pop
them off, in (more-or-less) strict last-in-first-out order. The active
end of the stack is typically held in a register called the stack pointer, or SP.
The idea is that the caller pushes the PC onto the stack; when the
called function is done it pops off the return address and branches to
it. Any function parameters are also pushed onto the stack.
In order to allow recursion, each separate "invocation" of a function
has to have its local variables in a separate place. This leads to
using the stack to hold all local variables as well, in a stack frame. A second register, called the frame pointer, points to the current stack frame (this allows SP to grow further for such things as dynamically created local space).
Example:
def fact(n):
    if n == 1: return 1
    else: return n * fact(n-1)
Let's follow the evaluation of fact(4), as called from the main python
interaction loop. We push the parameter 4 and the main-loop return
address.
As n=4, not 1, we then call fact(3), leading us to push the parameter 3
and the return address inside of fact() where we will resume.
This leads to calling fact(2); we push the parameter 2 and the same
return address in fact(). The same thing happens when we call fact(1).
But fact(1) doesn't call anything further; it just returns 1. We now
return to the fact(2) instance, having evaluated fact(1)=1; we return
2*1.
This returns us to the fact(3) instance, with fact(2)=2. We compute 3*2, and return that to the fact(4) instance.
This returns us to the fact(4) instance, with fact(3)=6. We compute 4*6=24, and return that now to the python top level.
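The same computation can be carried out by hand with an explicit stack, which is essentially what the hardware is doing with its frames: push one entry per pending call, then unwind, multiplying as each "frame" pops.

```python
# fact(n) with an explicit stack standing in for the call stack.
def fact_with_stack(n):
    stack = []
    while n != 1:                # the chain of recursive calls: push frames
        stack.append(n)
        n = n - 1
    result = 1                   # fact(1) returns 1...
    while stack:                 # ...then each frame pops and multiplies
        result *= stack.pop()
    return result

print(fact_with_stack(4))        # 24, matching the trace above
```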
Stacks don't require any special hardware support; any registers can be
used for the Stack Pointer and Frame Pointer. However, if interrupts
are to be handled by pushing the current state onto the stack, then SP
does have to be designated in hardware. Most systems since the 1970s
take this approach.
Buffer Overflow Attacks
Smashing the stack for fun and profit.
Python
def collatz(n):
    count = 0
    while n != 1:
        if n % 2 == 0: n = n//2
        else: n = 3*n + 1
        count += 1
    return count
index versus value: lab 3
vals = list(map(collatz, range(1, 1000)))   # start at 1: collatz(0) never reaches 1
max(vals)
vals.index(178)
map (function, list)
filter (function, list)
[x for x in list if function(x)]
reduce(function(x,y), list)
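Quick illustrations of the four forms (note that in Python 3, map and filter return iterators, and reduce has moved into functools):

```python
from functools import reduce   # reduce is in functools in Python 3

nums = [1, 2, 3, 4, 5]

doubled = list(map(lambda x: 2 * x, nums))          # [2, 4, 6, 8, 10]
evens   = list(filter(lambda x: x % 2 == 0, nums))  # [2, 4]
evens2  = [x for x in nums if x % 2 == 0]           # same result as filter
total   = reduce(lambda x, y: x + y, nums)          # ((((1+2)+3)+4)+5) = 15

print(doubled, evens, evens2, total)
```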