Computer Networks Week 5 Corboy Law 522
Read:
Chapter 1: 1.1-1.3, 1.5
Chapter 2: 2.1-2.6, 2.8.2 (wi-fi)
3.1.2: virtual circuit switching
The road not taken by IP.
In VC switching, routers know about end-to-end connections. To send a
packet, a connection needs to be established first. For that
connection, each link is
assigned a "connection ID" (traditionally called the VCI, for Virtual
Circuit Identifier). To send a packet, the host marks the packet with
the VCI assigned to the host--router1 link.
Packets arrive at (and depart from) routers via one of several ports,
which we will assume are numbered beginning at 0. Routers maintain a
connection table indexed by <VCI,port> pairs. As a packet
arrives, its inbound VCI and inbound port are looked up, and this
produces an outbound <VCIout, portout> pair. The VCI field is then rewritten to VCIout, and the packet is sent via portout.
Note that typically there is no source address information included in
the packet (although the sender can be identified from the connection,
which can be identified from the VCI at any point along the connection). Packets are identified by connection, not destination. Any node along the path (including the endpoints) can look up the connection and figure out the endpoints.
Note also that each switch must rewrite the VCI. Datagram switches never rewrite addresses (though they do rewrite hopcount/TTL fields).
Example: construct VC connections between:
A and F
A and E
A and C
B and D
A--S1-----S2--D
   |      |
   |      |
B--S3-----S4----S5---F
   |      |
   C      E
I will use the following VCIs. They are chosen more or less randomly
here, but the requirement is that they be unique to each link. Because
links are generally taken to be bidirectional, a VCI used from S1 to S3
cannot be reused from S3 to S1 until the first connection closes.
A to F: A--4--S1--6--S2--3--S4--8--S5--1--F
Note that this path went via S2
A to E: A--5--S1--6--S3--3--S4--8--E
Note that this path went via S3, the opposite
corner of the square
A to C: A--6--S1--7--S3--3--C
B to D: B--4--S3--8--S1--7--S2--8--D
Demo: construct the <VCI,port> tables from the above.
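As a partial sketch of the demo, here is the table for switch S1 only, built from the four connections listed above. The port numbering (port 0 to A, port 1 to S2, port 2 to S3) is an assumption made for illustration; the notes do not assign port numbers. Entries are installed in both directions, since links are bidirectional.

```python
# Hypothetical port numbering for S1: port 0 -> A, port 1 -> S2, port 2 -> S3.
def add_connection(table, vci_a, port_a, vci_b, port_b):
    """Install both directions of one connection passing through a switch."""
    table[(vci_a, port_a)] = (vci_b, port_b)
    table[(vci_b, port_b)] = (vci_a, port_a)

S1_TABLE = {}
add_connection(S1_TABLE, 4, 0, 6, 1)  # A to F: in from A with VCI 4, out to S2 with VCI 6
add_connection(S1_TABLE, 5, 0, 6, 2)  # A to E: in from A with VCI 5, out to S3 with VCI 6
add_connection(S1_TABLE, 6, 0, 7, 2)  # A to C: in from A with VCI 6, out to S3 with VCI 7
add_connection(S1_TABLE, 8, 2, 7, 1)  # B to D: in from S3 with VCI 8, out to S2 with VCI 7

def forward(table, vci_in, port_in):
    """Look up <VCI,port>; return (vci_out, port_out), or None if no such connection."""
    return table.get((vci_in, port_in))
```

Note that VCI 6 appears twice as an outbound value without conflict, because the two entries are on different links (ports 1 and 2).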
The namespace for VCIs is small, and compact (eg contiguous). Typically
the VCI and port bitfields can be concatenated to produce a
<VCI,Port> index suitable for use as an array index. VCIs are local identifiers. (IP addresses are global identifiers.)
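A minimal sketch of the concatenation idea follows. The field widths (4 bits of VCI, 2 bits of port) are assumptions chosen to keep the table tiny; real switches use wider fields.

```python
VCI_BITS = 4    # assumed width, for illustration only
PORT_BITS = 2   # assumed width, for illustration only

def table_index(vci, port):
    """Concatenate the port and VCI bitfields into one array index."""
    return (port << VCI_BITS) | vci

# One slot per possible <VCI,port> pair; None means "no connection".
table = [None] * (1 << (VCI_BITS + PORT_BITS))
table[table_index(4, 0)] = (6, 1)   # inbound <4, port 0> maps to outbound <6, port 1>

print(table[table_index(4, 0)])
```

Because the namespace is small and contiguous, the lookup is a single array access rather than a search or a hash.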
IP advantages:
- Routers have less state info to manage
- Router crashes and partial connection state loss are not a problem
- Per-connection billing is very difficult
VC advantages:
- connections can get resource guarantees
- smaller headers / faster throughput
- headers are small enough that virtual circuits are efficient even
for voice lines, where the data might be < 100 bytes. (TCP/IP
headers are a minimum of 40 bytes; voice data of 80 bytes would mean
33% header overhead)
3.1.3: source routing
Rarely used in the real world (IP defines source-route options, but they are generally disabled), but a conceptual possibility.
3.3: ATM cell basics (defer?)
rationale for small fixed-size cells:
- minimal queuing delay, at least for high-priority traffic
- simplified hardware design
- simplifies parallelism
- reduced store-and-forward delay
- voice fill-time: voice is generated at 8 bytes / ms, so a 48-byte payload takes 6 ms to fill
- On average, 1/2 of last cell is wasted on padding, so fixed => small is good
ATM (and cell networks in general):
small cells (typically 5 bytes header + 48 bytes data)
virtual circuits; connection-oriented
(28-bit addresses after connection is established)
Switched point-to-point links; some rings
Note ATM mandates no cell reordering!
This is bad for parallelism
No physical b'cast
Forwarding delay & packet size; cut through
loss of 1 cell destroys packet; need reliable medium
Error correction of Shacham & McKenney [1990]
send N cells and then one of all N XOR'ed together
allows recovery from any one lost cell
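The XOR idea can be sketched as follows. This illustrates only the parity trick described above, not the full scheme from the paper; the function names are made up for the example, and all cells are assumed to be the same length.

```python
from functools import reduce

def xor_cells(cells):
    """Byte-wise XOR of equal-length cells."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*cells))

def recover(received, parity):
    """received: the N data cells, with None in place of the single lost cell."""
    if received.count(None) != 1:
        raise ValueError("parity can repair exactly one lost cell")
    present = [c for c in received if c is not None]
    # XOR of the N-1 surviving cells and the parity cell is the missing cell.
    return xor_cells(present + [parity])

cells = [bytes([i]) * 48 for i in (1, 2, 3, 4)]
parity = xor_cells(cells)
damaged = [cells[0], cells[1], None, cells[3]]
print(recover(damaged, parity) == cells[2])
```

The receiver must know which cell was lost (for example from a sequence number); the parity cell by itself only says what the missing bytes were.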
3.3.2: Skim. Segmentation/reassembly and AAL 3/4, AAL 5.
SAR/AAL. AAL 1, 2, 3/4, 5
AAL 3/4: we first define a high-level "wrapper" for an IP packet, called the CS-PDU.
we then chop this into as many 44-byte chunks as are needed; each chunk goes into a 48-byte ATM payload, along with
- 2-bit type: 10 begin new message, 00 continue, 01 end of message, 11 single segment
- 4-bit SEQ number, 0-15, good for catching up to 15 dropped cells
- 10-bit MessageID field
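- 6-bit length field
- CRC-10 checksum.
(together these fields take 32 bits, which is the 4 bytes left over after the 44 bytes of data)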
9 bytes overhead / 44 bytes data: > 20% overhead
AAL 5: CRC is moved to the CS-PDU and promoted to 32-bits.
MID field is discarded (no one used it, anyway)
A bit from the ATM header is used to indicate:
- 1: last cell of the CS-PDU
- 0: any earlier cell
The CS-PDU is chopped into 48-byte chunks, which are then used as the
entire body of each ATM cell. 5 bytes overhead / 48 bytes data: 10%
overhead. Errors are detected by the CS-PDU CRC-32. This also detects
lost cells (impossible with a per-cell CRC!)
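A rough comparison of the two AALs, using only the per-cell figures above (44 versus 48 data bytes per 53-byte cell). This is a simplification: it ignores the CS-PDU wrapper bytes and any padding rules beyond filling out the last cell, and the 1500-byte packet size is just an example.

```python
import math

CELL_SIZE = 53   # 5-byte ATM header + 48-byte payload

def cells_needed(pdu_len, data_per_cell):
    """Cells required when each cell carries data_per_cell bytes of the PDU."""
    return math.ceil(pdu_len / data_per_cell)

def wire_bytes(pdu_len, data_per_cell):
    """Total bytes sent, counting cell headers and last-cell padding."""
    return cells_needed(pdu_len, data_per_cell) * CELL_SIZE

for name, per_cell in (("AAL 3/4", 44), ("AAL 5", 48)):
    print(name, cells_needed(1500, per_cell), wire_bytes(1500, per_cell))
```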
Addressing: VPI/VCI VCI: local use only?
3.3.3: virtual circuits as applied to ATM
store-and-forward of cells v. cut-through for packets
3.4: switching: cut-through v. store-and-forward (not done)
Crossbar switch, other switching fabrics
Chapter 4: IP
4.1: The goal of IP is to connect all the different LANs into one large
"virtual" LAN. To this end, the primary feature offered by the IP layer
is routing and addressing, which go hand-in-hand.
In terms of the "protocol graph", there are several LAN models that lie
below IP, and several end-to-end transport models above. However, there
are in practice no competitors to IP.
The IP network service model is to act like a LAN. That is, there are no acknowledgements; delivery is generally described as best-effort.
IP routing is based on the idea that, at any given host or router, an
IP address can be divided into the network portion and the host
portion. Classically, this IP address division is as follows:
          1st bits   1st byte   byte 1   byte 2   byte 3   byte 4   # nets   # hosts
Class A   0          0-127      net      host     host     host     128      2^24
Class B   10         128-191    net      net      host     host     16384    65536
Class C   110        192-223    net      net      net      host     2^21     256
(The underlying idea here was that there would be a small number of
very large networks, a medium number of institution-sized networks, and
a large number of small networks.)
Routing is then based only on the net
portion of the address. A class-B site would represent only a single
routing-table entry in the outside world; only inside the site would
the host bits be taken into account.
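The classful split can be sketched as below. The function name and the dotted-string return format are choices made for this example; classes D and E (first byte 224 and up) are simply rejected.

```python
def classify(addr):
    """Split a dotted-quad address into (class, net portion, host portion)."""
    parts = addr.split(".")
    first = int(parts[0])
    if first < 128:        # leading bit 0
        cls, net_bytes = "A", 1
    elif first < 192:      # leading bits 10
        cls, net_bytes = "B", 2
    elif first < 224:      # leading bits 110
        cls, net_bytes = "C", 3
    else:
        raise ValueError("not a class A, B, or C address")
    return cls, ".".join(parts[:net_bytes]), ".".join(parts[net_bytes:])

print(classify("200.3.9.5"))
```

A router outside the site needs only the middle element of the result to make its forwarding decision.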
This feature is what gives IP routing its great scalability.
While Ethernet also uses datagram routing, Ethernet routing tables must
be much larger, and are not practical for wide-area routing.
Unlike Virtual Circuit routing, IP routers do not rewrite addresses
(although we will come later to Network Address Translation, or NAT,
where this is done). However, IP routers do need to perform some header updates.
IP header fields and what they do
3.1. Internet Header Format (from RFC 791)
A summary of the contents of the internet header follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                Example Internet Datagram Header
                           Figure 4.
TOS
fragmentation
TTL, protocol, checksum
options
Ethernet headers have no TTL field. Routing cycles are a calamity, as a
result; one-dimensional routing loops (A->B->A) are banned by the
forwarding algorithm. IP needs a way of catching misrouted packets that
would otherwise circulate forever; this is done by decrementing the TTL by 1 at each router, and
then discarding the packet if the TTL reaches 0. Making any change in
the header requires updating the header checksum; this can be done
"algebraically" but it is not hard simply to re-sum the 10 halfwords of
the usual 20-byte header.
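Here is a sketch of the per-router header update, using the simple re-sum approach rather than the "algebraic" incremental update. It assumes a 20-byte header with no options; the function names and the sample header bytes are made up for illustration.

```python
import struct

def internet_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit halfwords, then complemented."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_header(header: bytes):
    """Decrement the TTL and refresh the checksum; return None to discard the packet."""
    h = bytearray(header)
    if h[8] <= 1:                            # byte 8 is the TTL; it would reach 0
        return None
    h[8] -= 1
    h[10:12] = b"\x00\x00"                   # checksum field counts as zero while summing
    struct.pack_into("!H", h, 10, internet_checksum(bytes(h)))
    return bytes(h)

# Illustrative 20-byte header with TTL 64 and a valid checksum.
sample = bytes.fromhex("4500007300004000" "4011b861c0a80001" "c0a800c7")
updated = forward_header(sample)
print(updated[8], internet_checksum(updated))
```

Summing a header that already contains a correct checksum yields 0, which is how a receiver verifies it.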
Fragmentation and Reassembly
If you are trying to interconnect two LANs (as IP does), what else
might be needed besides Routing and Addressing? IP assumes both LANs
are based on 8-bit bytes (something not universally
true in the early days of IP; to this day the RFCs refer to "octets" to
emphasize this requirement). IP also defines bit-order within a byte,
and it is left to the networking hardware to translate properly. Data
bytes are completely transparent.
There is one more feature IP must provide, however: it must accommodate networks for which the maximum packet size, or maximum transmission unit (MTU),
is smaller. Otherwise, if we were using IP to join Token Ring (MTU =
4k) to Ethernet (MTU = 1500), the token-ring packets might be too large
to deliver. (They might not, if the endpoints had been able to
negotiate an appropriate MTU, but this cannot be guaranteed).
So, IP must support fragmentation, and also reassembly. There are a couple of major strategies here: per-link fragmentation and reassembly, where the reassembly is done at the opposite end of the link (like ATM), and path
fragmentation and reassembly, where reassembly is done at the far end
of the path. The latter approach is what is taken by IP, partly because
intermediate routers are too busy to do reassembly (this is as true
today as it was thirty years ago), and partly because IP fragmentation
is seen as the strategy of last resort.
When an IP datagram is fragmented, the IDENT field marks fragments of
the same packet, and the Fragment Offset field marks the start position
of this fragment. Note that the start position can be a number up to 2^16,
the maximum IP packet length, but the FragOffset field has only 13
bits. This is handled by requiring fragments to have sizes a multiple
of 8 (three bits), and left-shifting the FragOffset value by 3 bits
before using it.
Example (where MTUs are excluding the LAN header)
A------MTU 1500---- R1 ------- MTU 1000 -------- R2 ----------MTU 400 ------ B
A sends a packet of 1500 bytes to R1: 20 bytes of IP header and 1480 of data.
R1 fragments into two packets of sizes 20+976 = 996 and 20+504 = 524.
Having 980 bytes of payload in the first fragment would fit, but
violates the divisible-by-eight rule. The first has FragOffset = 0; the
second has FragOffset = 976.
R2 refragments the first fragment into three packets as follows:
- first: size = 20+376=396, FragOffset = 0
- second: size = 20+376=396, FragOffset = 376
- third: size = 20+224 = 244 (note 376+376+224=976), FragOffset = 752.
R2 refragments the second fragment into two:
- first: size = 20+376 = 396, FragOffset = 976+0 = 976
- second: size = 20+128 = 148, FragOffset = 976+376=1352
Note that it would have been more efficient to have fragmented into
four fragments of sizes 376, 376, 376, and 352 in the beginning. Note
also that the packet format is designed to handle fragments of
different sizes easily. The algorithm is based on multiple
fragmentation with single reassembly.
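The arithmetic in this example can be checked with a short sketch. It assumes a 20-byte header with no options and an MTU large enough to carry at least 8 data bytes; the function name and tuple layout are choices made here. Offsets are kept in bytes, as in the example; the 13-bit FragOffset field itself would hold the byte offset divided by 8.

```python
IP_HEADER = 20   # bytes; assumes no IP options

def fragment(offset, length, mtu, more_follow=False):
    """Split `length` data bytes starting at byte `offset` so each piece fits `mtu`.

    Returns (byte_offset, data_length, more_fragments) tuples. `more_follow` is
    True when the piece being split was itself not the last fragment.
    """
    max_data = (mtu - IP_HEADER) // 8 * 8    # largest multiple of 8 that fits
    pieces = []
    pos = 0
    while pos < length:
        size = min(max_data, length - pos)
        is_last = pos + size == length
        pieces.append((offset + pos, size, more_follow or not is_last))
        pos += size
    return pieces

at_r1 = fragment(0, 1480, 1000)
at_r2 = [p for off, size, more in at_r1 for p in fragment(off, size, 400, more)]
for off, size, more in at_r2:
    print("offset", off, "field", off >> 3, "data", size, "more", more)
```

Calling `fragment(0, 1480, 400)` directly gives the four-fragment split mentioned above.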
An Example Reassembly Procedure
(RFC 791)
For each datagram the buffer identifier is computed as the
concatenation of the source, destination, protocol, and
identification fields. If this is a whole datagram (that is both
the fragment offset and the more fragments fields are zero), then
any reassembly resources associated with this buffer identifier
are released and the datagram is forwarded to the next step in
datagram processing.
If no other fragment with this buffer identifier is on hand then
reassembly resources are allocated. The reassembly resources
consist of a data buffer, a header buffer, a fragment block bit
table, a total data length field, and a timer. The data from the
fragment is placed in the data buffer according to its fragment
offset and length, and bits are set in the fragment block bit
table corresponding to the fragment blocks received.
If this is the first fragment (that is the fragment offset is
zero) this header is placed in the header buffer. If this is the
last fragment ( that is the more fragments field is zero) the
total data length is computed. If this fragment completes the
datagram (tested by checking the bits set in the fragment block
table), then the datagram is sent to the next step in datagram
processing; otherwise the timer is set to the maximum of the
current timer value and the value of the time to live field from
this fragment; and the reassembly routine gives up control.
If the timer runs out, the all reassembly resources for this
buffer identifier are released. The initial setting of the timer
is a lower bound on the reassembly waiting time. This is because
the waiting time will be increased if the Time to Live in the
arriving fragment is greater than the current timer value but will
not be decreased if it is less. The maximum this timer value
could reach is the maximum time to live (approximately 4.25
minutes). The current recommendation for the initial timer
setting is 15 seconds. This may be changed as experience with
this protocol accumulates. Note that the choice of this parameter
value is related to the buffer capacity available and the data
rate of the transmission medium; that is, data rate times timer value equals buffer size (e.g., 10Kb/s X 15s = 150Kb).
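A much-simplified sketch of this procedure follows. It is not the RFC's algorithm in full: the timer and the fragment block bit table are omitted, fragments are assumed not to overlap, and completeness is tested by walking the stored pieces from offset 0. The class and method names are invented for the example.

```python
class Reassembler:
    def __init__(self):
        self.buffers = {}   # buffer id -> {"chunks": {offset: data}, "total": int or None}

    def receive(self, buf_id, offset, data, more_fragments):
        """Accept one fragment; return the whole payload once complete, else None.

        buf_id stands for the (source, destination, protocol, identification) tuple.
        """
        if offset == 0 and not more_fragments:      # a whole datagram
            self.buffers.pop(buf_id, None)
            return data
        buf = self.buffers.setdefault(buf_id, {"chunks": {}, "total": None})
        buf["chunks"][offset] = data
        if not more_fragments:                      # last fragment fixes the total length
            buf["total"] = offset + len(data)
        if buf["total"] is None:
            return None
        parts, pos = [], 0
        while pos < buf["total"]:                   # walk the pieces looking for a gap
            chunk = buf["chunks"].get(pos)
            if chunk is None:
                return None
            parts.append(chunk)
            pos += len(chunk)
        del self.buffers[buf_id]
        return b"".join(parts)
```

Fragments may arrive in any order and in any mix of sizes, which is what makes multiple fragmentation with single reassembly workable.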
Finally, any given IP link may
provide its own link-layer fragmentation and reassembly (as ATM links
do). This can be done transparently by the LAN layer (ATM again), or
(less often) with some kind of negotiation by the IP layers.
4.1.4: IP routing
host algorithm: Check IPnet. If it matches our own network, deliver directly via the LAN. Otherwise, send to our designated router.
router:
- Check IPnet. If it matches the network number for any
connected network, deliver via the LAN connected to that interface.
This means somehow looking up the physical address of the destination, and sending it via the interface.
- If there is no match, look up IPnet in the routing table and send to the associated next_hop (which must represent a physically connected neighbor).
- If there was no match in the routing table, and a Default Route is
listed, send it there.
Default routes are hugely important in keeping leaf router tables small.
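The router steps above can be sketched as follows. For brevity every address is assumed to be class C, so the net portion is the first three bytes; the interface names, the extra route, and the default-router address in the usage lines are hypothetical.

```python
def net_of(addr):
    """Net portion of a class C address (assumption: all addresses are class C)."""
    return addr.rsplit(".", 1)[0]

def next_step(dest, connected, routes, default=None):
    """Return ("deliver", interface), ("forward", next_hop), or ("drop", None)."""
    net = net_of(dest)
    if net in connected:            # directly connected: deliver via that LAN
        return ("deliver", connected[net])
    if net in routes:               # otherwise consult the routing table
        return ("forward", routes[net])
    if default is not None:         # finally fall back to the default route
        return ("forward", default)
    return ("drop", None)

connected = {"200.3.9": "eth0", "201.4.6": "eth1"}   # hypothetical interface names
routes = {"202.5.8": "201.4.6.2"}                    # hypothetical routing-table entry
print(next_step("201.4.6.7", connected, routes))
print(next_step("9.9.9.9", connected, routes, default="200.3.9.1"))
```

With a default route present, the routing table of a leaf router can be nearly empty.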
How a packet traverses layers, with headers; routing
A       net 200.3.9       router    net 201.4.6    B
|_________________________|    |___________________|
200.3.9.5       200.3.9.254    201.4.6.1   201.4.6.7
Actual routers might try in order: host-specific, local, net, default
ARP and DHCP
How does a router (or host) find the physical address of a neighbor on
the same network? The Address Resolution Protocol (ARP) is generally
used. Alternatives: polling, link-layer notice, IP-layer notice
Basic ideas: broadcast request/unicast reply, cache
Timeout: used to be ~10 minutes (20 min for early linux),
now is much less (linux 2.4 arp timeout is ~60 seconds)
see: http://www.cs.helsinki.fi/linux/linux-kernel/2002-07/0179.html
ip -s neigh
finer points:
If A arps "where is B?"
1. B always puts A in its cache
2. All hosts with A in their cache update the entry
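A sketch of a cache with these two rules plus a timeout is given below. The 60-second default matches the figure quoted above; the class, its method names, and the addresses used in the example are invented. Times can be passed in explicitly so the behavior is easy to try out.

```python
import time

class ArpCache:
    def __init__(self, timeout=60.0):
        self.timeout = timeout
        self.entries = {}    # ip -> (physical address, time stored)

    def store(self, ip, mac, now=None):
        self.entries[ip] = (mac, time.monotonic() if now is None else now)

    def lookup(self, ip, now=None):
        """Return the cached physical address, or None if absent or timed out."""
        now = time.monotonic() if now is None else now
        entry = self.entries.get(ip)
        if entry is None or now - entry[1] > self.timeout:
            self.entries.pop(ip, None)
            return None
        return entry[0]

    def on_request(self, sender_ip, sender_mac, target_ip, my_ip, now=None):
        """Handle a broadcast "where is target_ip?"; return True if we should reply."""
        # Rule 1: the target always caches the sender.
        # Rule 2: anyone who already has the sender cached refreshes the entry.
        if target_ip == my_ip or sender_ip in self.entries:
            self.store(sender_ip, sender_mac, now)
        return target_ip == my_ip

b = ArpCache()
print(b.on_request("10.0.0.1", "aa:aa", "10.0.0.2", my_ip="10.0.0.2", now=0.0))
print(b.lookup("10.0.0.1", now=1.0))
```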
Self-arp, or gratuitous arp: detects duplicates and announces ethernet address
changes. Send an arp request for yourself (and hope you don't get
answers!)
Flooding:
what if A tries to send 100 packets to B; how many ARPs?
A b'casts, everyone replies & needs to ARP to get A's addr
ARP and networks without b'cast [eg ATM]
Failure in presence of looping
security implications of ARP
proxy arp
detecting sniffers:
To find out if host A is in promiscuous mode, send an ARP "who-has A?" query. Address it not to the broadcast Ethernet address, though, but to some nonexistent Ethernet address.
If promiscuous mode is off, A's NI will reject the packet.
If promiscuous mode is on, A's NI will pass the arp request to A itself, which will probably answer it.
Alas, linux kernels reject at the software level arp queries to physical ethernet addresses other than our own.
BUT: they do respond to faked Ethernet multicast addresses.
Windows: try Ethernet addresses ff:ff:ff:00:00:00 or ff:ff:ff:ff:ff:fe
You can ping A's actual IP address (which requires a separate ping for
each host, and a prior scan to find all the hosts), or try pinging the
IP b'cast address (all 1's in the host part).
DHCP; its predecessors were Reverse ARP (RARP) and BOOTP
You b'cast your Ethernet addr, and hope a DHCP server finds it and
sends you your IP address. Also other helpful startup options!!!
subnet mask
default router
DNS Server
DHCP and servers: who's going to update the DNS entries?
Minimal network config:
ip addr
subnet mask
default router
DNS server
4.3.1: subnets
Just outlined