Chapter 4: IP and Routing

The main topic here is how IP works; 4.1 contains an overview of IP itself, 4.2 discusses how IP routers communicate and build their routing tables, and 4.3 extends these ideas to, as the section is titled, the "Global Internet."

We'll omit section 4.4 on multicast.

Section 4.1

Figure 4.2 on page 249 is an important illustration of how IP is used to join different kinds of physical networks (eg Ethernet, FDDI (a kind of token ring), and PPP (point-to-point protocol)). The "I" in IP stands for Internetworking, and that's exactly what IP does: interconnects different physical LANs into one network.

To do this, IP must provide a global mechanism for addressing and routing, so that packets can actually be delivered from any host to any other host. IP addresses are 32 bits, and space for them is shown in the header diagram on page 252 (Fig 4.3). A more detailed discussion of IP addresses begins on page 262, in section 4.1.3. An essential feature of IP addresses is that they consist of a "network" part and a "host" part. 4.1.3 lists the "legacy" mechanism for network/host allocation; the current strategy today is discussed later in 4.3.2. The entire point of this network/host separation is that routers list only the network parts of the destination addresses, which saves space. There are currently around 5 million hosts on the Internet, but only 50,000 or so networks. A routing table of 50,000 entries is (just barely) feasible; a table a hundred times larger is not.

In addition to routing and addressing, IP must also support the transmission of packets that may be too large for some intermediate physical LAN. For example, two token rings with a 4K max packet size can send and receive packets of size 4KB, but if they are connected by an intervening Ethernet that can only forward packets of size 1.5KB, something has to give. IP handles this by supporting fragmentation. This is summarized starting on page 253. The IP approach is awkward and inefficient, and IP fragmentation is seldom used (instead, IP tends to choose rather small packet sizes (eg 512 bytes) to avoid it). However, fragmentation is essential conceptually, in order for IP to be able to support large packets without knowing anything about the intervening networks.

Note that IP is a "best effort" system; there are no acknowledgements or retransmissions or any of that stuff. We ship it off, and hope it gets there. Most of the time, it does. 4.1.2 discusses this "service model" at more length.

In section 4.1.4 the text returns to the details of how IP forwarding works.

4.1.5 discusses ARP, the protocol used if a host knows the desired local IP address and needs to find out the Ethernet address.

4.1.6, which you can skip entirely, discusses automatic IP address assignment to hosts, by a DHCP server.

4.1.7 discusses ICMP, the Internet Control Message Protocol. You should at least skim this, because new ICMP messages will be introduced later in the text.

Skip 4.1.8.

Study questions:

Suppose IP didn't support fragmentation, but instead simply required that all participating physical LANs supported a packet size of at least 512 bytes. How would this change things? (This model is actually relatively close to the truth, since fragmentation is avoided so strenuously.)

How does IP forwarding (section 4.1.4) differ from Ethernet switch forwarding? Both are instances of the datagram-forwarding strategy, presented in 3.1.1.

Suppose you wanted to connect two Ethernet LANs, Eth1 and Eth2. You have two routers, one on each LAN, and a Token Ring link joining them. What would you have to do to get a packet from host A on Eth1 to host B on Eth2? Assume everything has to be done "manually". To where does host A's Ethernet actually deliver the packet, initially?

Exercises:

1    IP addresses come one per interface
11  ARP cache algorithms. Hint: the bad case for (a) is when we send a bunch of packets before the first packet's ARP query comes back. For each of the subsequent packets in the bunch, the Ethernet destination address is not in the cache.
27    IP routers v learning bridges. Ignore (a) versus (b) though; just describe the differences generally.

Section 4.2 - IP routing: distance-vector and shortest-path-first

Technically, forwarding is the name for sending a packet along its path, and routing refers to the process of building and maintaining routing tables. Section 4.1.1 discusses the abstraction used in the rest of this section: we'll ignore hosts and IP networks and simply focus on getting packets from one router to another, destination, router.

4.1.2 discusses the primary "small-scale" algorithm/protocol used today: distance-vector, or RIP (Routing Information Protocol). The normal case proceeds straightforwardly; some abberant cases (and their fixes) are discussed beginning with the last paragraph on page 287. There is some debate today whether or not these fixes have, effectively anyway, completely solved the problem.

Skip the subsections titled "Implementation" (p 288) and "Routing Information Protocol" (p 290).

The Link State algorithm/protocol discussed in 4.2.3 is the primary "large-scale" or "large-site" algorithm used today; it is used internally at most large Internet Providers. Read up to the green text on page 298, through the example; skip the subsection beginning on page 298 titled "The Open Shortest Path First Protocol (OSPF)".

Internet providers share routing information between each other via BGP, discussed in 4.3.3.

Skip 4.2.4 and 4.2.5.

Study Questions

How does IP forwarding differ from Ethernet-switch forwarding? How does IP routing differ from the table-learning mechanism used by Ethernet switches? Are there any similarities?

Exercises

12    Build a routing table using distance-vector
13    Build a routing table using link-state
14    Given the distance-vector routing tables, reconstruct the network topology
15    Slightly different distance-vector problem
19    Split horizon technique for avoiding "slow convergence to infinity"
23    Link-state reliable-flooding issue

Section 4.3 - IP and the Internet

4.3.1 The original IP specification assumed that one IP network number (that is, the network part of an IP address) corresponded to one (and only one) physical LAN. This is fine for a class-C IP network number, where there are only 255 hosts allowed and this is a reasonable number for a single LAN. But for a class-B IP network (eg Loyola, with 147.126.x.y) this leaves one LAN with 216 = 65536 hosts, which is a huge number. Most LANs cannot scale to that large.
Enter subnets, which provides a way to take a class-B IP network and subdivide it into many multiple LANs. Subnetting means that some of the host bits, in some contexts, are interpreted as additional network bits.

Here is the Loyola situation. Loyola has a class-B IP network address, 147.126.x.y. The x.y are the host bits, officially. However, within Loyola, the third byte, x, is used as a subnet number; x=2 and x=3 are used by the Computer Science department, x=1 is used by the computer center, and x=36 is used by the dorm room computers. Internal Loyola routers use the first three bytes in their forwarding tables. Hosts attempt direct Ethernet delivery only if the first three bytes match. Outside Loyola, only the first two bytes are used in routing tables. Another way to look at the situation is that, within Loyola, it looks like Loyola has 255 class-C networks. Outside, it looks like Loyola has a single, class-B, network.

Subnetting does not make more addresses available; bits used for the subnet number come out of the host bits. Subnetting does, however, make network addressing more flexible.

Subnetting, in effect, moves the network/host division point to the right of where it would be based only on the official class-A/B/C rule. A subnet mask is used to keep track of the new position.

4.3.2 In the mid-90's the Internet was running out of IP addresses. Class C allows only 255 hosts; most organizations want more than a class C. There are only 16,000 possible class-B addresses. There are more than 16,000 organizations that want such addresses! An organization can take multiple class-C's instead of a class B, but then the organization appears multiple times in the backbone tables, and the backbone tables become larger. In the mid-90s the backbone tables reached 40-50,000 entries and a crisis loomed. The tables were about as large as they could get.

Around this time, CIDR was introduced. This stands for "Classless internet domain routing", with the emphasis on "classless". The old address classes A, B, and C were (selectively) abandoned. Instead, multiple class C's could be bundled together into a single routing-table entry, via CIDR or "supernets". This strategy is just like subnets, except the network/host division point is being moved to the left. This means that an organization can buy an address block consisting of a bundle of class C's, of exactly the right size (well, block sizes are always a power of 2).

Another use of CIDR has been the rise of provider-based addressing. Loyola is 147.126.0.0; suppose that 147.0.0.0 meant the Chicago area (it doesnt). In this picture, 147.124 might be Northwestern, 147.125 might be DePaul, and 147.127 might be IIT. Real provider-based addressing is done by assigning to internet service providers large (very large) chunks of Class C addresses. The provider then subdivides the address blocks and allocates them to customers; outside the providers' routers, only a single entry (for that provider) need be made in any other organization's routing table. This has proven to be a huge savings, allowing much smaller routing tables. In theory, one only needs one entry in the backbone routing table for each top-level internet service provider.

4.3.3 Within an organization, we saw in 4.2 that one could use either distance-vector or link-state routing. Between organizations, however, neither is practical: the organizations would have to agree to use the same protocol, and these routing protocols just aren't universal. Enter BGP, which is a universal routing protocol, of a sorts.

The tradeoff with BGP is that it does not have any way to specify distance information at all. All it can convey is reachabilitiy. Optimal routes are not even attempted.

RIP uses optimal routes, and optimality serves as a guarantee that there are no routing loops: a route with a loop in it cannot be the shortest one possible! BGP can't guarantee optimality, but loops would still be a problem. So BGP deals with them another way: by exchanging full path information. The paths aren't lists of each router traversed, though, but just of each organization (autonomous system).

Exercises

24    BGP
25    BGP
32    Bridge-routers and subnets
33    subnet allocation
38    Basic CIDR
39    CIDR with more complicated interconnections