Let's take a look at "machine code".
A typical CPU executes a sequence of instructions stored in
memory. Execution proceeds in sequence, unless a JUMP instruction is
encountered. Some instructions take an ARGUMENT: either a memory reference or
a number. Early computers had a single register, or accumulator,
that held the current value; all operations were done to this
accumulator value. (Computers soon added more than one register, but
then bits in each instruction had to be allocated to specify which
register.) Some examples:
LOAD addr
STOR addr
ADD addr          // ditto SUB, MUL, DIV
ADD #5            // add the value 5 to the accumulator
CLEAR             // sometimes AND #0
JMP program_addr
JMZ program_addr  // jump if the result of the previous TEST was zero
These are taken from the Pep/7 machine simulator described in Chapter 7
of Dale & Lewis, with a few simplifications. Instructions are one
byte each, but most are then followed by a two-byte operand: an address (or numeric quantity).
The actual machine code is a sequence of specific numeric bytes, which is extremely difficult to read. However, machine code has a one-to-one correspondence with the symbolic form above. The symbolic form is called assembly code, and a program called an assembler translates it into the actual machine code. One of the tasks of the assembler is to figure out the exact addresses for instructions such as LOAD, ADD, and JMP. Note that the address in a LOAD or ADD is a data address, while the address in a JMP is a program address, that is, the address of one of the machine-code instructions. Generally, when writing assembly code, data addresses are represented symbolically, and program addresses are represented with labels: tags that mark lines in the program.

When a data address is specified in Pep/7, usually the two-byte "word" starting at that address is used; in particular, this applies to LOAD and ADD.
Here is assembly code to evaluate 7*X^2 + A + 37, leaving the result in the accumulator.
LOAD X
MUL X
MUL #7
ADD A
ADD #37
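The effect of those five instructions can be mirrored in Python, with the accumulator as an ordinary variable (the values of X and A here are made-up samples):

```python
# accumulator sketch of the 7*X^2 + A + 37 computation
X, A = 3, 10      # sample values, not part of the original

acc = X           # LOAD X
acc = acc * X     # MUL X   -> X^2
acc = acc * 7     # MUL #7  -> 7*X^2
acc = acc + A     # ADD A
acc = acc + 37    # ADD #37

print(acc)        # 7*3*3 + 10 + 37 = 110
```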
The next example gives the machine code for the following loop, which adds up 5+4+3+2+1:

sum = 0
n = 5
while n > 0:
    sum = sum + n
    n = n - 1
      LOAD #0
      STOR SUM      ;; sum = 0
      LOAD #5
      STOR N        ;; n = 5
LOOP: LOAD N
      JMZ DONE      ;; jump if n = 0
      LOAD SUM      ;; sum = sum + n
      ADD N
      STOR SUM
      LOAD N        ;; n = n - 1
      SUB #1
      STOR N
      JMP LOOP
DONE:
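One way to watch such code run is a tiny interpreter. This is only a sketch: the opcode names and the (opcode, argument) tuples below are inventions for illustration, not the real Pep/7 encoding. Labels become instruction indices.

```python
# minimal sketch of an accumulator machine (hypothetical opcodes, not Pep/7)
def run(program, memory):
    acc, pc = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "LOADI":            # LOAD #n (immediate)
            acc = arg
        elif op == "LOAD":
            acc = memory[arg]
        elif op == "STOR":
            memory[arg] = acc
        elif op == "ADD":
            acc += memory[arg]
        elif op == "SUBI":           # SUB #n (immediate)
            acc -= arg
        elif op == "JMZ":            # jump if accumulator is zero
            if acc == 0:
                pc = arg
        elif op == "JMP":
            pc = arg
    return memory

# the summing loop above; LOOP is index 4, DONE is index 13 (past the end)
prog = [
    ("LOADI", 0), ("STOR", "sum"),          # sum = 0
    ("LOADI", 5), ("STOR", "n"),            # n = 5
    ("LOAD", "n"), ("JMZ", 13),             # LOOP: exit if n = 0
    ("LOAD", "sum"), ("ADD", "n"), ("STOR", "sum"),   # sum = sum + n
    ("LOAD", "n"), ("SUBI", 1), ("STOR", "n"),        # n = n - 1
    ("JMP", 4),
]
mem = run(prog, {})
print(mem["sum"])   # 15
```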
How are we going to store things in a computer?
binary integers: 23 = 10111 = 0001 0111 (filled out to a full byte)
Counting in base 2:
0, 1, 10, 11, 100, 101, 110, 111, 1000, ....
Be sure you understand the pattern here!
Reminder about positional notation: in base 10, 2745 = 2 x 10^3 + 7 x 10^2 + 4 x 10^1 + 5 x 10^0. The same is true in base 2.
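Python can confirm both expansions directly (0b... is Python's notation for a base-2 literal):

```python
# base 10: each digit times the matching power of 10
assert 2745 == 2*10**3 + 7*10**2 + 4*10**1 + 5*10**0

# the same idea in base 2: 10111 is 23
assert 0b10111 == 1*2**4 + 0*2**3 + 1*2**2 + 1*2**1 + 1*2**0 == 23

print("positional notation checks out")
```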
Converting from base 2 to base 10: The easiest way is probably to
identify the positions where there is a 1, figure out the corresponding
power of two, and add them:
 1    0    1    1    1    0
32         8    4    2

So 101110 is 32+8+4+2 = 46.
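Python's built-ins do the same conversion, which is a quick way to check hand work:

```python
# base 2 to base 10, and back
print(int("101110", 2))   # 46
print(bin(46))            # 0b101110
```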
You can use the same technique to convert from base 10 to base 2,
except then you're working out all sorts of powers of 10 in base 2 (ten
is 1010, hundred is 1100100), and then doing a lot of base-2
arithmetic. On the assumption that most of us would prefer to do the
arithmetic part in base 10, there are two basic approaches: the big-end
method and the little-end method.
For the big-end method, find the highest power of 2 that does not exceed your number. Write down that power of 2 and subtract it; keep going. To convert 101, we first note that 128 is too big but 64 = 2^6 will fit. That leaves 101-64 = 37. The next power of 2 that fits is 2^5 = 32; subtracting that leaves 37-32 = 5. The next power of 2 that fits is 2^2 = 4; that leaves 5-4 = 1 as the last digit. We ended up with 101 = 64+32+4+1; in positional notation that's 1100101 (the 0's correspond to 16, 8, and 2).
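Here is the big-end method as a Python sketch (the function name is made up for illustration):

```python
def to_binary_big_end(n):
    # big-end method: find the highest power of 2 that fits, then work down
    if n == 0:
        return "0"
    power = 1
    while power * 2 <= n:
        power = power * 2        # highest power of 2 not exceeding n
    digits = ""
    while power > 0:
        if n >= power:           # this power of 2 fits: write a 1, subtract
            digits += "1"
            n -= power
        else:                    # doesn't fit: write a 0
            digits += "0"
        power = power // 2
    return digits

print(to_binary_big_end(101))    # 1100101
```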
Most people seem to use the big-end method when converting smallish numbers by hand. Computers, however, typically use the little-end method: keep integer-dividing by 2 until you get to zero, and record a 1 for an odd result, or a 0 for an even. The digits you write down will be the binary digits of the result, right to left. Here's the 101 example; the first column is the successive divisions by 2.
101   odd    1
 50   even   0
 25   odd    1
 12   even   0
  6   even   0
  3   odd    1
  1   odd    1
  0   done
Now read off the digits from the bottom up: 1100101.
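The little-end method translates directly into a short Python function (the name to_binary is made up here):

```python
def to_binary(n):
    # little-end method: keep integer-dividing by 2,
    # recording 1 for an odd result and 0 for an even one
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))   # the remainder: 1 if odd, 0 if even
        n = n // 2                  # integer-divide by 2
    digits.reverse()                # read off the digits from the bottom up
    return "".join(digits)

print(to_binary(101))               # 1100101
```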
Why does this work? Consider the same procedure in base 10: divide by 10
at each step, and instead of recording 1 for odd and 0 for even, record
the remainder mod 10. That's just the last digit:

        mod 10
2745      5
 274      4
  27      7
   2      2
   0

Reading from the bottom up gives 2745 back again.
Binary arithmetic
We can add and multiply binary numbers just like base 10.
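For instance, 10111 + 101 works column by column with carries, just as in base 10; Python's 0b literals make it easy to check a hand computation:

```python
a = 0b10111   # 23
b = 0b00101   #  5
print(bin(a + b))   # 0b11100, i.e. 28
print(bin(a * b))   # 0b1110011, i.e. 115
```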
Negative numbers
This is a good time to remember that a byte holding the value 23 doesn't hold 10111; it holds 0001 0111.
How are we going to represent negative numbers? One approach would
be to reserve the leftmost bit to hold the sign, 0 for + and 1 for -.
Thus, -23 would be 1001 0111. Alas, this means that how we add depends
very much on the sign bit. A simpler strategy (for hardware designers)
is called 2's complement: to negate a number, flip
all the bits, and add 1. This turns out to mean that signed addition is
done exactly the same way as unsigned! This is a big win. (Logically,
flipping all the bits means subtracting from 1111 1111; adding 1 means
that we were really subtracting from 1 0000 0000; but the ninth bit
doesn't fit in the byte and is discarded, so we were really subtracting
from 0000 0000; that is, negating.)
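A sketch of 2's complement for a single byte (the function name and the 8-bit width are choices made here for illustration):

```python
def twos_complement(value, bits=8):
    mask = (1 << bits) - 1                 # 1111 1111 for 8 bits
    return ((value ^ mask) + 1) & mask     # flip all the bits, add 1

neg23 = twos_complement(23)
print(format(neg23, "08b"))                # 11101001, i.e. 1110 1001

# signed addition is just unsigned addition: 23 + (-23) wraps around to 0
print((23 + neg23) & 0xFF)                 # 0
```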
Octal and Hex
Some SERIOUSLY heterogeneous lists
z = [[1], [1,2], "hello", 61]
z+z
z*4
[z,z]
[z, [[z]] ]
61 in z
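Here's what those expressions produce when run; printing the lengths makes the structure visible:

```python
z = [[1], [1, 2], "hello", 61]
print(len(z + z))        # 8: concatenation strings the elements together
print(len(z * 4))        # 16
print(len([z, z]))       # 2: a list whose two elements are each z
print(len([z, [[z]]]))   # 2 as well: nesting doesn't flatten anything
print(61 in z)           # True
print([1, 2] in z)       # True: 'in' compares whole elements
print("h" in z)          # False: "h" is inside "hello", not an element of z
```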
list operators
Miller & Ranum p 10, Table 1:

a[i]     indexing; a starts with a[0]
a+b      concatenation
a*3      repetition
x in a   membership
len(a)   length
a[3:6]   slice: from a[3] to a[5], 6-3 = 3 items (again, a starts with a[0])
a[3:], etc.
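A quick demonstration of those operators on a sample list (a and b are made-up examples):

```python
a = [10, 11, 12, 13, 14, 15, 16]
b = [99]
print(a[0], a[3])    # 10 13 -- indexing starts at a[0]
print(a + b)         # concatenation: [10, 11, 12, 13, 14, 15, 16, 99]
print(b * 3)         # repetition: [99, 99, 99]
print(13 in a)       # True
print(len(a))        # 7
print(a[3:6])        # [13, 14, 15] -- 6-3 = 3 items, a[3] through a[5]
print(a[3:])         # [13, 14, 15, 16] -- from a[3] to the end
```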
for
range
conditionals: if, else, elif: Miller & Ranum p 24