Comp 343/443
Fall 2011, LT 412, Tuesday 4:15-6:45
Week 5, Sept 27
Read:
Ch 1, sections 1,2,3 and 5
Ch 2, sections 1, 2, 3, 5, 6
Midterm: October 25
Wi-fi backoff (week 4 notes)
3.1.2: virtual circuit switching
The road not taken by IP.
In VC switching, routers know about end-to-end connections. To send a
packet, a connection needs to be established first. For that
connection, each link is
assigned a "connection ID" (traditionally called the VCI, for Virtual
Circuit Identifier). To send a packet, the host marks the packet with
the VCI assigned to the host--router1 link.
Packets arrive at (and depart from) routers via one of several ports,
which we will assume are numbered beginning at 0. Routers maintain a
connection table indexed by <VCI,port> pairs. As a packet
arrives, its inbound VCI and inbound port are looked up, and this
produces an outbound <VCIout, portout> pair. The VCI field is then rewritten to VCIout, and the packet is sent via portout.
Note that typically there is no source address information included in
the packet (although the sender can be identified from the connection,
which can be identified from the VCI at any point along the connection). Packets are identified by connection, not destination. Any node along the path (including the endpoints) can look up the connection and figure out the endpoints.
Note also that each switch must rewrite the VCI. Datagram switches never rewrite addresses (though they do rewrite hopcount/TTL fields).
Example: construct VC connections between:
A and F
A and E
A and C
B and D
A--S1-----S2--D
| |
| |
B--S3-----S4----S5---F
| |
C E
I will use the following VCIs. They are chosen more or less randomly
here, but the requirement is that they be unique to each link. Because
links are generally taken to be bidirectional, a VCI used from S1 to S3
cannot be reused from S3 to S1 until the first connection closes.
A to F: A--4--S1--6--S2--3--S4--8--S5--1--F (this path goes via S2)
A to E: A--5--S1--6--S3--3--S4--8--E
Note that this path went via S3, the opposite
corner of the square
A to C: A--6--S1--7--S3--3--C
B to D: B--4--S3--8--S1--7--S2--8--D
Demo: construct the <VCI,port> tables from the above.
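One way to work the demo above is sketched here in Python. Port numbers are not given in the example, so this sketch keys each table on the neighbor's name in place of a port number; that substitution, the function names, and the treatment of every connection as bidirectional are assumptions made for illustration. The A-to-F path is taken through S4, since in the diagram S2 connects to S4.

```python
from collections import defaultdict

# Each path alternates node, VCI, node, VCI, ..., node.
PATHS = {
    "A-F": ["A", 4, "S1", 6, "S2", 3, "S4", 8, "S5", 1, "F"],
    "A-E": ["A", 5, "S1", 6, "S3", 3, "S4", 8, "E"],
    "A-C": ["A", 6, "S1", 7, "S3", 3, "C"],
    "B-D": ["B", 4, "S3", 8, "S1", 7, "S2", 8, "D"],
}

def build_tables(paths):
    """Return {switch: {(vci_in, neighbor_in): (vci_out, neighbor_out)}}."""
    tables = defaultdict(dict)
    for path in paths.values():
        nodes, vcis = path[0::2], path[1::2]
        for i in range(1, len(nodes) - 1):      # interior nodes are switches
            switch = nodes[i]
            prev_hop, next_hop = nodes[i - 1], nodes[i + 1]
            vci_in, vci_out = vcis[i - 1], vcis[i]
            # One entry per direction, since connections are bidirectional.
            tables[switch][(vci_in, prev_hop)] = (vci_out, next_hop)
            tables[switch][(vci_out, next_hop)] = (vci_in, prev_hop)
    return tables

def forward(tables, src, first_switch, vci):
    """Follow a packet hop by hop, rewriting the VCI, until it leaves the switches."""
    prev, node = src, first_switch
    while node in tables:
        vci, nxt = tables[node][(vci, prev)]
        prev, node = node, nxt
    return node

tables = build_tables(PATHS)
for key, value in sorted(tables["S1"].items()):
    print("S1:", key, "->", value)
print(forward(tables, "A", "S1", 4))   # host A sends with VCI 4
```

Note that the packet carries only the VCI; the destination is recovered entirely from the per-switch tables.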
The namespace for VCIs is small, and compact (eg contiguous). Typically
the VCI and port bitfields can be concatenated to produce a
<VCI,Port> index suitable for use as an array index. VCIs are local identifiers. (IP addresses are global identifiers.)
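The concatenation idea can be sketched as below. The field widths (8 bits of VCI, 3 bits of port) and the names are assumptions chosen for illustration; real hardware would use whatever widths its cell format defines.

```python
VCI_BITS = 8     # assumed width of the VCI field
PORT_BITS = 3    # assumed: at most 8 ports per switch

def table_index(vci, port):
    """Concatenate the port and VCI bitfields into a single array index."""
    if not (0 <= vci < (1 << VCI_BITS) and 0 <= port < (1 << PORT_BITS)):
        raise ValueError("VCI or port out of range")
    return (port << VCI_BITS) | vci

# One slot for every possible <VCI,port> pair; None means "no connection".
table = [None] * (1 << (VCI_BITS + PORT_BITS))

# Hypothetical entry: inbound VCI 6 on port 1 leaves as VCI 3 on port 2.
table[table_index(6, 1)] = (3, 2)

vci_out, port_out = table[table_index(6, 1)]
print(vci_out, port_out)
```

Because the index is a plain array offset, the lookup needs no searching or hashing, which is only practical because the VCI namespace is small.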
IP advantages:
- Routers have less state info to manage
- Router crashes and partial connection state loss are not a problem
- Per-connection billing is very difficult
VC advantages:
- connections can get resource guarantees
- smaller headers / faster throughput
- headers are small enough that virtual circuits are efficient even
for voice lines, where the data might be < 100 bytes. (TCP/IP
headers are a minimum of 40 bytes; voice data of 80 bytes would mean
33% header overhead)
3.1.3: source routing
Never used in the real world, but a conceptual possibility.
ATM cell basics (defer?)
rationale for small fixed-size cells:
- minimal queuing delay, at least for high-priority traffic
- simplified hardware design
- simplifies parallelism
- reduced store-and-forward delay
- voice fill-time: voice is generated at 8 bytes / ms
- On average, 1/2 of last cell is wasted on padding, so fixed => small is good
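The voice fill-time point can be made concrete with a little arithmetic. The sketch below uses the 8 bytes/ms rate from the list above; the 1000-byte packet size is a hypothetical comparison value, not something from the notes.

```python
VOICE_RATE = 8   # bytes per millisecond (64 kbit/s voice)

def fill_time_ms(payload_bytes):
    """Time a voice source needs to produce enough data to fill one payload."""
    return payload_bytes / VOICE_RATE

def average_padding(cell_payload):
    """On average half of the last cell is padding."""
    return cell_payload / 2

for size in (48, 1000):
    print(size, "bytes:", fill_time_ms(size), "ms fill time,",
          average_padding(size), "bytes average padding")
```

A 48-byte payload fills in 6 ms, while the hypothetical 1000-byte packet would delay the first voice sample by 125 ms before it could even be sent.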
ATM (and cell networks in general):
small cells (typically 5 bytes header + 48 bytes data)
virtual circuits; connection-oriented
(28-bit addresses after connection is established)
Switched point-to-point links; some rings
Note ATM mandates no cell reordering!
This is bad for parallelism
No physical b'cast
Forwarding delay & packet size; cut through
loss of 1 cell destroys packet; need reliable medium
Error correction of Shacham & McKenney [1990]
send N cells and then one of all N XOR'ed together
allows recovery from any one lost cell
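The XOR idea can be illustrated as follows. This is a minimal sketch of the parity principle only, assuming the receiver already knows which cell is missing; it is not a reproduction of the published scheme, and the function names are made up for the example.

```python
from functools import reduce

def xor_cells(cells):
    """Bytewise XOR of a list of equal-length cells."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*cells))

def recover(received, parity):
    """Rebuild the single missing cell (marked None) from the others plus parity."""
    missing = received.index(None)
    present = [cell for cell in received if cell is not None]
    repaired = list(received)
    # XOR of all surviving cells and the parity cell equals the lost cell.
    repaired[missing] = xor_cells(present + [parity])
    return repaired

cells = [bytes([i] * 48) for i in (1, 2, 3, 4)]   # N = 4 data cells
parity = xor_cells(cells)                          # the extra (N+1)th cell
damaged = [cells[0], None, cells[2], cells[3]]     # second cell lost
repaired = recover(damaged, parity)
print(repaired == cells)
```

If two cells are lost the XOR no longer determines either one, which is why the scheme recovers from any one lost cell but not more.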
Skim. Segmentation/reassembly and AAL 3/4, AAL 5.
SAR/AAL. AAL 1, 2, 3/4, 5
AAL 3/4: we first define a high-level "wrapper" for an IP packet, called the CS-PDU.
we then chop this into as many 44-byte chunks as are needed; each chunk goes into a 48-byte ATM payload, along with
- 2-bit type: 10 begin new message, 00 continue, 01 end of message, 11 single segment
- 4-bit SEQ number, 0-15, good for catching up to 15 dropped cells
- 10-bit MessageID field
- CRC-10 checksum.
9 bytes overhead / 44 bytes data: > 20% overhead
AAL 5: CRC is moved to the CS-PDU and promoted to 32-bits.
MID field is discarded (no one used it, anyway)
A bit from the ATM header is used to mark the end of each CS-PDU:
- 1: last cell of the CS-PDU
- 0: any other cell (the cell following a last cell starts a new CS-PDU)
The CS-PDU is chopped into 48-byte chunks, which are then used as the
entire body of each ATM cell. 5 bytes overhead / 48 bytes data: 10%
overhead. Errors are detected by the CS-PDU CRC-32. This also detects
lost cells (impossible with a per-cell crc!)
Addressing: VPI/VCI VCI: local use only?
3.3.3: virtual circuits as applied to ATM
store-and-forward of cells v. cut-through for packets
3.4: switching: cut-through v. store-and-forward (not done)
Crossbar switch, other switching fabrics
Section 3.2: IP
4.1: the goal of IP is to connect all the different LANs into one large
"virtual" LAN. To this end, the primary feature offered by the IP layer
is routing and addressing, which go hand-in-hand.
In terms of the "protocol graph", there are several LAN models that lie
below IP, and several end-to-end transport models above. However, there
are in practice no competitors to IP.
The IP network service model is to act like a LAN. That is, there are no acknowledgements; delivery is generally described as best-effort.
IP routing is based on the idea that, at any given host or router, an
IP address can be divided into the network portion and the host
portion. Classically, this IP address division is as follows:
          1st bits  1st byte  byte 1  byte 2  byte 3  byte 4  # nets  # hosts
Class A   0         0-127     net     host    host    host    128     2^24
Class B   10        128-191   net     net     host    host    16384   65536
Class C   110       192-223   net     net     net     host    2^21    256
(The underlying idea here was that there would be a small number of
very large networks, a medium number of institution-sized networks, and
a large number of small networks.)
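The classful division can be expressed as a small function driven by the first byte of the address. This is a sketch for illustration; the function name and return format are assumptions, and classes D and E are simply rejected.

```python
def classful_split(addr):
    """Split a dotted-quad address into (class, net bytes, host bytes)."""
    octets = [int(part) for part in addr.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError("not an IPv4 address: " + addr)
    first = octets[0]
    if first < 128:        # leading bit 0
        cls, net_len = "A", 1
    elif first < 192:      # leading bits 10
        cls, net_len = "B", 2
    elif first < 224:      # leading bits 110
        cls, net_len = "C", 3
    else:
        raise ValueError("class D/E address has no net/host split")
    return cls, octets[:net_len], octets[net_len:]

print(classful_split("147.126.65.47"))
print(classful_split("200.3.9.5"))
```

The point of the scheme is that this function needs no table at all: any host anywhere can find the network portion from the address alone.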
Routing is then based only on the net
portion of the address. A class-B site would represent only a single
routing-table entry in the outside world; only inside the site would
the host bits be taken into account.
This feature is what gives IP routing its great scalability.
While Ethernet also uses datagram routing, Ethernet routing tables must
be much larger, and are not practical for wide-area routing.
Unlike virtual-circuit switches, IP routers do not rewrite addresses
(although we will come later to Network Address Translation, or NAT,
where this is done). However, IP routers do need to perform some header updates.
IP header fields and what they do
3.1. Internet Header Format (from RFC 791)
A summary of the contents of the internet header follows:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Example Internet Datagram Header

                               Figure 4.
TOS
fragmentation
TTL, protocol, checksum
options
Ethernet headers have no TTL field. As a result, routing cycles are a
calamity for Ethernet; one-dimensional routing loops (A->B->A) are
banned by the forwarding algorithm. IP needs a way of catching packets
that would otherwise circulate forever (for example, badly addressed
packets caught in a routing loop); this is done by decrementing the TTL
by 1 at each router, and then discarding the packet if the TTL reaches 0.
Making any change in the header requires updating the header checksum;
this can be done "algebraically" but it is not hard simply to re-sum
the 10 halfwords of the typical 20-byte header.
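The per-router header update can be sketched as follows. The code takes the simple approach of re-summing every 16-bit halfword of a 20-byte header with no options, rather than the "algebraic" incremental update; the function names and the sample addresses are assumptions for illustration.

```python
import struct

def ip_checksum(header):
    """Ones-complement of the ones-complement sum of the 16-bit halfwords."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_header(header):
    """Decrement TTL and recompute the checksum; return None to discard."""
    hdr = bytearray(header)
    if hdr[8] <= 1:                         # TTL (byte 8) would reach 0
        return None
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"                # zero the checksum field first
    struct.pack_into("!H", hdr, 10, ip_checksum(bytes(hdr)))
    return bytes(hdr)

# Version 4, IHL 5, TOS 0, length 40, ident 1, no fragmentation,
# TTL 64, protocol 6 (TCP), checksum 0, then source and destination.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     bytes([200, 3, 9, 5]), bytes([201, 4, 6, 7]))
forwarded = forward_header(sample)
print(forwarded[8], hex(struct.unpack_from("!H", forwarded, 10)[0]))
```

A receiver verifies the header by running the same sum over all halfwords including the checksum field; a correct header yields 0 from ip_checksum.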
Fragmentation and Reassembly
If you are trying to interconnect two LANs (as IP does), what else
might be needed besides Routing and Addressing? IP assumes both LANs
are based on 8-bit bytes (something not universally
true in the early days of IP; to this day the RFCs refer to "octets" to
emphasize this requirement). IP also defines bit-order within a byte,
and it is left to the networking hardware to translate properly. Data
bytes are completely transparent.
There is one more feature IP must provide, however: it must accommodate networks for which the maximum packet size, or maximum transmission unit (MTU),
is smaller. Otherwise, if we were using IP to join Token Ring (MTU =
4k) to Ethernet (MTU = 1500), the token-ring packets might be too large
to deliver. (They might not be, if the endpoints had been able to
negotiate an appropriate MTU, but this cannot be guaranteed.)
So, IP must support fragmentation, and also reassembly. There are a couple of major strategies here: per-link fragmentation and reassembly, where the reassembly is done at the opposite end of the link (like ATM), and path
fragmentation and reassembly, where reassembly is done at the far end
of the path. The latter approach is what is taken by IP, partly because
intermediate routers are too busy to do reassembly (this is as true
today as it was thirty years ago), and partly because IP fragmentation
is seen as the strategy of last resort.
When an IP datagram is fragmented, the IDENT field marks fragments of
the same packet, and the Fragment Offset field marks the start position
of this fragment. Note that the start position can be a number up to 2^16,
the maximum IP packet length, but the FragOffset field has only 13
bits. This is handled by requiring every fragment except the last to
carry a multiple of 8 bytes of data (three bits), and left-shifting the
FragOffset value by 3 bits before using it.
Example (where MTUs are excluding the LAN header)
A------MTU 1500---- R1 ------- MTU 1000 -------- R2 ----------MTU 400 ------ B
A sends a packet of 1500 bytes to R1: 20 bytes of IP header and 1480 of data.
R1 fragments into two packets of sizes 20+976 = 996 and 20+504 = 524.
Having 980 bytes of payload in the first fragment would fit, but
violates the divisible-by-eight rule. The first has FragOffset = 0; the
second has FragOffset = 976.
R2 refragments the first fragment into three packets as follows:
- first: size = 20+376=396, FragOffset = 0
- second: size = 20+376=396, FragOffset = 376
- third: size = 20+224 = 244 (note 376+376+224=976), FragOffset = 752.
R2 refragments the second fragment into two:
- first: size = 20+376 = 396, FragOffset = 976+0 = 976
- second: size = 20+128 = 148, FragOffset = 976+376=1352
Note that it would have been more efficient to have fragmented into
four fragments of sizes 376, 376, 376, and 352 in the beginning. Note
also that the packet format is designed to handle fragments of
different sizes easily. The algorithm is based on multiple
fragmentation with single reassembly.
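The arithmetic in the example can be checked with a short sketch. It assumes a fixed 20-byte IP header with no options and tracks only byte offsets, data lengths and the more-fragments flag; the function name and tuple layout are made up for illustration. The value actually stored in the FragOffset field would be the byte offset divided by 8.

```python
IP_HEADER = 20   # bytes, assuming no options

def fragment(offset, length, mtu, more_follows=False):
    """Split `length` data bytes starting at byte `offset` to fit `mtu`.

    Returns a list of (byte_offset, data_length, more_fragments) tuples.
    `more_follows` is the more-fragments flag of the piece being split.
    """
    max_data = (mtu - IP_HEADER) // 8 * 8    # round down to a multiple of 8
    pieces = []
    while length > 0:
        size = min(max_data, length)
        is_last = (size == length)
        pieces.append((offset, size, more_follows or not is_last))
        offset += size
        length -= size
    return pieces

# First hop: the MTU-1000 link splits the 1480 data bytes in two.
first_hop = fragment(0, 1480, 1000)
# Second hop: the MTU-400 link refragments each piece independently.
second_hop = [piece for off, size, more in first_hop
              for piece in fragment(off, size, 400, more)]
# For comparison: fragmenting for MTU 400 from the start.
direct = fragment(0, 1480, 400)

print(first_hop)
print(second_hop)
print(direct)
```

The two-stage result has five fragments and the direct one four, matching the remark above about efficiency.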
An Example Reassembly Procedure
(RFC 791)
For each datagram the buffer identifier is computed as the
concatenation of the source, destination, protocol, and
identification fields. If this is a whole datagram (that is both
the fragment offset and the more fragments fields are zero), then
any reassembly resources associated with this buffer identifier
are released and the datagram is forwarded to the next step in
datagram processing.
If no other fragment with this buffer identifier is on hand then
reassembly resources are allocated. The reassembly resources
consist of a data buffer, a header buffer, a fragment block bit
table, a total data length field, and a timer. The data from the
fragment is placed in the data buffer according to its fragment
offset and length, and bits are set in the fragment block bit
table corresponding to the fragment blocks received.
If this is the first fragment (that is the fragment offset is
zero) this header is placed in the header buffer. If this is the
last fragment ( that is the more fragments field is zero) the
total data length is computed. If this fragment completes the
datagram (tested by checking the bits set in the fragment block
table), then the datagram is sent to the next step in datagram
processing; otherwise the timer is set to the maximum of the
current timer value and the value of the time to live field from
this fragment; and the reassembly routine gives up control.
If the timer runs out, the all reassembly resources for this
buffer identifier are released. The initial setting of the timer
is a lower bound on the reassembly waiting time. This is because
the waiting time will be increased if the Time to Live in the
arriving fragment is greater than the current timer value but will
not be decreased if it is less. The maximum this timer value
could reach is the maximum time to live (approximately 4.25
minutes). The current recommendation for the initial timer
setting is 15 seconds. This may be changed as experience with
this protocol accumulates. Note that the choice of this parameter
value is related to the buffer capacity available and the data
rate of the transmission medium; that is, data rate times timer value equals buffer size (e.g., 10Kb/s X 15s = 150Kb).
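The bookkeeping in the procedure above can be sketched roughly as follows. This is a simplified illustration, not the RFC's algorithm: it keeps fragments in a dictionary rather than a fragment block bit table, omits the timer, and does not handle overlapping fragments. The class and method names are hypothetical.

```python
class Reassembler:
    def __init__(self):
        # buffer identifier -> {"chunks": {offset: data}, "total": int or None}
        self.buffers = {}

    def add(self, src, dst, proto, ident, offset, more_fragments, data):
        """Accept one fragment; return the whole payload once complete, else None."""
        key = (src, dst, proto, ident)        # the RFC's buffer identifier
        if offset == 0 and not more_fragments:
            self.buffers.pop(key, None)       # whole datagram: release resources
            return data
        buf = self.buffers.setdefault(key, {"chunks": {}, "total": None})
        buf["chunks"][offset] = data
        if not more_fragments:                # last fragment fixes the total length
            buf["total"] = offset + len(data)
        if buf["total"] is None:
            return None
        position = 0
        for off in sorted(buf["chunks"]):     # check for gaps
            if off != position:
                return None
            position += len(buf["chunks"][off])
        if position != buf["total"]:
            return None
        del self.buffers[key]
        return b"".join(buf["chunks"][off] for off in sorted(buf["chunks"]))

# Fragments of a 1480-byte payload, sized as in the earlier example, out of order.
payload = bytes(i % 251 for i in range(1480))
pieces = [(976, 376), (0, 376), (1352, 128), (752, 224), (376, 376)]
reassembler = Reassembler()
result = None
for off, size in pieces:
    more = off + size < len(payload)
    result = reassembler.add("A", "B", 6, 1, off, more, payload[off:off + size])
print(result == payload)
```

Because each fragment carries its own offset, arrival order and differing fragment sizes do not matter to the receiver.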
Finally, any given IP link may
provide its own link-layer fragmentation and reassembly (as ATM links
do). This can be done transparently by the LAN layer (ATM again), or
(less often) with some kind of negotiation by the IP layers.
3.2.4: IP routing
host algorithm: Check IPnet. If it matches our own network, deliver directly via the LAN. Otherwise, send to our designated router.
router:
- Check IPnet. If it matches the network number for any
connected network, deliver via the LAN connected to that interface.
This means somehow looking up the physical address of the destination, and sending it via the interface.
- If there is no match, look up IPnet in the routing table and send to the associated next_hop (which must represent a physically connected neighbor).
- If there was no match in the routing table, and a Default Route is
listed, send it there.
Default routes are hugely important in keeping leaf router tables small.
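The router algorithm above can be sketched in a few lines. For simplicity the sketch assumes every network is class C, so that IPnet is just the first three bytes; the interface names, the 202.5.1 route and the next-hop addresses are hypothetical.

```python
def ip_net(addr):
    """Network portion of an address, assuming class C (first three bytes)."""
    return addr.rsplit(".", 1)[0]

def route(dest, connected, table, default=None):
    """Return (action, target) for a packet addressed to `dest`.

    connected: {network: interface} for directly attached LANs
    table:     {network: next_hop} for other known networks
    default:   next hop to use when nothing else matches, or None
    """
    net = ip_net(dest)
    if net in connected:
        return ("deliver", connected[net])   # local delivery via that LAN
    if net in table:
        return ("forward", table[net])       # send to a connected neighbor
    if default is not None:
        return ("forward", default)          # default route
    return ("drop", None)

connected = {"200.3.9": "eth0", "201.4.6": "eth1"}
table = {"202.5.1": "201.4.6.9"}

print(route("201.4.6.7", connected, table, default="200.3.9.1"))
print(route("202.5.1.20", connected, table, default="200.3.9.1"))
print(route("147.126.65.47", connected, table, default="200.3.9.1"))
```

A leaf router can leave `table` almost empty and rely on `default`, which is why default routes keep its tables small.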
How a packet traverses layers, with headers; routing
A           net 200.3.9           router        net 201.4.6         B
|_______________________________|      |_________________________|
200.3.9.5             200.3.9.254      201.4.6.1         201.4.6.7
Actual routers might try in order: host-specific, local, net, default
ARP and DHCP
DHCP; successor to Reverse ARP (RARP) and BOOTP
Minimal network config:
ip addr
subnet mask
default router
DNS server
3.2.5: subnets
This was the first step away from Class A/B/C routing: a large network (eg a class A or B) could be divided into subnets.
For example, the default assumption within a Class B network such as
Loyola's (147.126.0.0/16) is that any packet can be delivered via the
underlying Ethernet to any internal host. This would require a rather
large switched Ethernet (and any other physical LAN would have to be
Ethernet-compatible at the switch level). What if our site has more
than one physical LAN? Or is really too big for one physical LAN? It
did not take long for the IP world to run into this problem.
Getting a separate IP network number for each subnet is bad for routers, and bad politically.
Subnets support further routing within the IPnet site. "Site" may mean different things, but once the packet has been delivered to the IPnet border router, further routing within the subnet should NOT involve the outside Internet!
Subnets introduce hierarchical routing: first we route to the primary network, then inside that site we route to the subnet, and finally the last hop delivers to the host.
Subnetting in effect moves the IPnet division line rightward.
Shortly we will consider the other case of moving the line to the left.
For now, note that moving the line rightward within your site does not
affect the outside world at all; outside routers are not even aware of
site-internal subnetting.
To implement subnets, we divide our network into physical LANs, and
assign each a "subnet address": an extended version of the IP network
address. The number of bits n in the subnet address we convert to a mask
(n 1-bits followed by 32-n 0-bits); every host and router
on a given physical LAN must know this subnet mask.
subnets versus:
- bridging
- multiple IP networks
Basic example: divide a class-B into 256 "class-C subnets", on byte boundaries.
If a node doesn't know it is on a subnet, and a packet arrives destined
for another subnet, it will assume it can deliver the packet locally
and this will fail. Nodes know they
are on subnets by virtue of having a SUBNET MASK. They then use the
mask to decide where to divide IP addresses into net and host parts, rather than the classA-B-C rule. Each interface (each subnet) can have its own subnet mask.
The (common) net portion of the IP addresses on a subnet is called the subnet address or subnet number.
A subnet mask (of length n) is really a 32-bit word, consisting of n
1-bits followed by 0-bits. If M is such a mask and A is an IP address,
then we can use the bitwise AND of A and M, A & M, to in effect
truncate A after n bits (more precisely, to zero out A after n bits).
Comparing A & M to B&M is to compare the first n bits of A and
B.
The class-based IP-address strategy allowed any host anywhere on the
internet to properly separate any address into its net and host
portions. With subnets, this division point is now allowed to vary; for example, the address 147.126.65.47 divides into 147.126 | 65.47 outside of Loyola, but into 147.126.65 |
47 inside. This means that the net-host division is no longer an
absolute property of addresses, but rather something that depends on
where the packet is on its journey.
As a result, division into net versus host on subnets cannot be done for arbitrary packets (eg outbound packets). However, that division is not in fact what we need. What is
needed is a way to tell if a given address A belongs to the current
subnet, say B; that is, we need to compare the first n bits of A and B,
where n is the length of the subnet mask. We do this by comparing
A&M to B&M, where M is the mask corresponding to n (and "&" is the bitwise AND operator). A&M is not necessarily the same as Anet, for arbitrary A. However, if A&M == B&M, then A also belongs to the subnet, and then A&M is in fact Anet.
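The mask comparison is short enough to write out. The sketch below uses Python's standard ipaddress module only to convert dotted-quad strings to integers; the helper names are made up for illustration.

```python
import ipaddress

def to_int(addr):
    """Dotted-quad string to 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))

def mask(n):
    """32-bit subnet mask: n 1-bits followed by 32-n 0-bits."""
    return (0xFFFFFFFF << (32 - n)) & 0xFFFFFFFF

def same_subnet(a, b, n):
    """True if A & M == B & M, i.e. A and B agree in their first n bits."""
    m = mask(n)
    return (to_int(a) & m) == (to_int(b) & m)

print(hex(mask(24)))
print(same_subnet("147.126.65.47", "147.126.65.1", 24))   # inside Loyola
print(same_subnet("147.126.65.47", "147.126.66.1", 24))   # different subnet
print(same_subnet("147.126.65.47", "147.126.66.1", 16))   # outside view
```

The same pair of addresses compares differently under the /24 and /16 masks, which is the sense in which the net/host division depends on where the packet is.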
Technically, we also need the requirement that given any two subnet
numbers of disjoint subnets, neither is a proper prefix of the other.
This guarantees that if A is an address and B is a subnet number with
mask M (so B = B&M), then if A&M = B, A does not match any other subnet. (Regardless of the net/host division rules, we cannot possibly have subnet 147.126 with host 65.47 and subnet 147.126.65 with different host 47.) (I specify that the subnets are disjoint because 147.126.65/24 is in some sense a proper subset of 147.126/16.)
- Subnets and local delivery
- consequences of misconfigured masks
- variable-sized subnet mask
Example: Division of a class-C into subnets of size 70, 40, 25, and 20.
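One possible division is worked out in the sketch below. It assumes that each subnet reserves two addresses (the all-zeros network address and the all-ones broadcast address) on top of its hosts, and it allocates the largest subnet first so that every block starts on a properly aligned boundary. The network 200.3.9.0/24 and the function name are chosen for illustration only.

```python
import ipaddress

def allocate(network, host_counts):
    """Carve `network` into power-of-two subnets, largest first."""
    base = ipaddress.IPv4Network(network)
    next_addr = int(base.network_address)
    end = int(base.broadcast_address) + 1
    result = []
    for hosts in sorted(host_counts, reverse=True):
        needed = hosts + 2                    # assumed: network + broadcast reserved
        bits = (needed - 1).bit_length()      # smallest b with 2**b >= needed
        size = 1 << bits
        if next_addr + size > end:
            raise ValueError("not enough address space")
        subnet = ipaddress.IPv4Network((next_addr, 32 - bits))
        result.append((hosts, subnet))
        next_addr += size
    return result

for hosts, subnet in allocate("200.3.9.0/24", [70, 40, 25, 20]):
    print(hosts, "hosts:", subnet, "mask", subnet.netmask)
```

Under these assumptions the block sizes are 128, 64, 32 and 32, which exactly fill the 256 addresses, so the four subnets have masks of three different lengths (/25, /26, /27).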