LANs, IP, and TCP

LANs, IP, and TCP - A summary

This is a short overview of the three "primary" topics of this course. Read this first, and then you can start working through the specific sections of Peterson & Davie and you'll have some idea of where all this is going.

LANs, or Local Area Networks, are the "physical" networks that provide the connection between machines within, say, a school or corporation. LANs are, as the name says, "local", and so the IP, or "internet protocol", layer provides an abstraction for connecting multiple LANs into, well, the Internet. Or smaller closed "internets", as well. Finally, TCP deals with transport and connections and actually sending data. These three topics are often called "layers"; they constitute the "link layer", the "internetwork layer", and the "transport layer" respectively. Together with the "application layer" (the software you use), these form the "four-layer model" for networks. The LAN (link) layer is often split in two, forming a five-layer model. If you've read about networks at all, though, you've probably encountered the notion of the "seven-layer model". This was a standard developed by various international committees, most notably ISO (as the OSI reference model), and the extra two layers are mostly wishful thinking. In the real world, there are only 4-5 layers.

LANs are complemented by "point to point links" that tie them together. Sometimes the distinction between the two isn't too important; normally, when you connect to an Internet Service Provider via a modem, your connection is considered a point-to-point link. It would, however, be legitimate to view all the ISP's links, and the box to which they all connect, as a single LAN; see "Links, Nodes, and Clouds" in Peterson & Davie (p 5).

The primary LAN today is Ethernet, which comes in different speeds. The original Ethernet ran at the then-absurdly-fast rate of 10Mbps, or about one megabyte per second. So-called "fast Ethernet" runs at 100Mbps, and there is a 1Gbps Ethernet available too (used, at this writing, mostly for backbones). In the early days of Ethernet, a typical installation was one "physical" Ethernet; nowadays it is standard to break a physical Ethernet up into multiple smaller ones with "switches", or bridges, that are active devices that forward traffic from one link to another. The ultimate trend is for Ethernet to become a collection of point-to-point-only links all tied together with switches, further blurring the distinction between LANs and links. We'll see how all this works in the text; section 2.6 covers physical Ethernet while 3.2 discusses switching.
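The "about one megabyte per second" figure for 10Mbps Ethernet can be checked with a little arithmetic. The following is only a rough sketch: the function names are made up for illustration, and it ignores header and framing overhead, so real throughput is somewhat lower.

```python
def megabytes_per_second(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def transmit_time_ms(packet_bytes: int, mbps: float) -> float:
    """Milliseconds needed to put a packet of the given size onto the wire."""
    bits = packet_bytes * 8
    return bits / (mbps * 1_000_000) * 1000


# The three Ethernet speeds mentioned in the text.
for name, rate in [("Ethernet", 10), ("fast Ethernet", 100), ("gigabit Ethernet", 1000)]:
    print(f"{name}: {megabytes_per_second(rate)} MB/s, "
          f"{transmit_time_ms(1000, rate):.3f} ms per 1000-byte packet")
```

So 10Mbps works out to 1.25 megabytes per second before overhead, which is where the "about one megabyte" rule of thumb comes from.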

A once-common alternative to Ethernet is Token Ring, which is actually a generic technique that comes in a wide variety of flavors. IBM token ring is perhaps the best known, and was once upon a time widely used in offices. It ran at 4Mbps, later upgraded to 16Mbps. Token ring performs, in theory, much better under heavy load than Ethernet, and so 4Mbps IBM token ring was heralded as ready to take over from 10Mbps Ethernet. This didn't happen, for two reasons: first, 10Mbps Ethernet is actually about as fast as 6-7Mbps token ring; 4Mbps was a distinct notch slower. Second, and probably more important, IBM's terms for licensing token ring were decidedly more expensive than Xerox's terms for licensing Ethernet; during one formative period in the early 90's a typical Ethernet board cost about $100 and a typical Token Ring board cost $400 (today, fall 2000, you can buy quantity-1 Ethernet boards for about $15, and they come in bulk for as little as $5.) Token ring can interoperate with Ethernet, by the way, through switches.

Another LAN alternative, this time with a radically different basis, is ATM, which stands in theory for Asynchronous Transfer Mode. Because it is brought to you by the telephone companies -- and was originally intended to combine data and voice in one network -- and supports a central-office model of connection rather than the fully "distributed" model for early Ethernet, some folks argue it really stands for A Tariffing Mechanism. Another interpretation is, more bluntly, A Terrible Mistake. You'll learn more about ATM in section 3.3; suffice it to say that ATM does work well as a LAN (and proponents claim it can replace the IP and TCP functionality as well), but that it costs 10 to 100 times more than Ethernet.

LANs work by allowing connected machines to send packets, or short messages, to other hosts connected to the same LAN. Packet sizes are typically about 1Kbyte, although ATM uses 48 bytes of data per cell at the low end and there are others who feel that packets larger than 64Kbytes would be a good thing. A packet consists mostly of data, with a header attached to the front; the header contains various bits (literally) of LAN-specific and delivery-related information. The most important part of the header is a destination address, so the LAN knows where the packet should be delivered. Each host on the LAN has a unique address. How LANs keep track of addresses, and how senders learn new LAN addresses, is one of the things you'll learn about in, well, for Ethernet, section 2.6.
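To make the "header attached to the front of the data" idea concrete, here is a minimal sketch. The header layout (6-byte destination, 6-byte source, 2-byte length) is a hypothetical one chosen for illustration, not the actual Ethernet format, and the two addresses are made up.

```python
import struct

# Hypothetical header layout: 6-byte destination address, 6-byte source
# address, 2-byte data length, all in network byte order with no padding.
HEADER = struct.Struct("!6s6sH")


def build_packet(dst: bytes, src: bytes, data: bytes) -> bytes:
    """Attach the header to the front of the data."""
    return HEADER.pack(dst, src, len(data)) + data


def parse_packet(packet: bytes):
    """Split a packet back into (destination, source, data)."""
    dst, src, length = HEADER.unpack_from(packet)
    data = packet[HEADER.size:HEADER.size + length]
    return dst, src, data


# Two made-up LAN addresses for illustration.
host_a = bytes.fromhex("020000000001")
host_b = bytes.fromhex("020000000002")

packet = build_packet(host_b, host_a, b"hello")
print(len(packet), parse_packet(packet))
```

The receiving side only needs to look at the first few bytes to decide whether the packet is addressed to it; the data itself is opaque to the LAN.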

The main problem with LANs is that they are local. They often come with built-in size/scale limitations, and even if they did not, there are people you'd probably like to communicate with who use some different, incompatible, brand of LAN, or who use the same brand of LAN as you use but between you and them there's some incompatible point-to-point link. Again, the issue of scale is not clear-cut; bridged Ethernet can accommodate entire campuses of up to, say, 100,000 users, and hard-core ATM proponents claim that ATM can scale to support all the world's net traffic (but at a cost that makes this unlikely to say the least). Even if ATM became cost-competitive with Ethernet, it wouldn't do alone until everyone switched, and until that time everyone would still run something else on top (that being IP, below) to permit interoperation, and because of that there would be no incentive for everyone to switch. There's also the issue of address uniqueness; if an address on my LAN is the same as an address on your LAN, then we can't directly interconnect without changing something even though both LANs should work just fine in isolation.

IP

All this brings us to IP, which stands for Internetwork Protocol (or just "Internet Protocol", but at the time IP was created "internet" just meant "internetwork"; there was no modern "Internet"). "Internetworking" means getting different brands and types of LANs (and links) to talk to one another, no more and no less. LANs were the "networks", and "internetworking" just meant "inter-connection of LANs". Nowadays we tend to see large "internetworks" as single "networks" themselves; LANs are seldom regarded as significant standalone entities any more. IP provides a common abstract interface for sending a packet from one machine to another, across a wide range of hodgepodge intermediate LANs and links. Although private IP internets can and do exist, we'll consider only The Internet in this course. IP is the protocol that ties all of the hosts of the Internet together.

IP works by providing two fundamental things. First, it provides an addressing scheme, by which every host on the Internet can receive an address. These addresses are 4-byte numeric quantities; they are, not surprisingly, known as IP addresses. While IP addresses and LAN addresses play similar roles, they are in principle not related numerically. Ethernet LAN addresses, for example, are "burned in" to Ethernet cards at the factory; IP addresses are assigned administratively according to where the machine in question is connected to the network. IP addresses are globally unique, so as to provide a framework for routing, or forwarding, whereby a packet sent from one host to another can be forwarded from one router to the next along a path to its final destination. Routing and addressing are fundamentally intertwined. Normally the four bytes of an IP address are written individually, in decimal, separated by dots; the workstation on my desk has IP address 147.126.2.5 for example.
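The dotted notation is just a human-friendly way of writing a 4-byte number. A small sketch of the conversion in both directions follows; the function names are ours, chosen for illustration.

```python
def ip_to_int(dotted: str) -> int:
    """Convert dotted-decimal notation to the underlying 32-bit number."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a 4-byte dotted address: {dotted!r}")
    return int.from_bytes(bytes(parts), "big")


def int_to_ip(value: int) -> str:
    """Convert a 32-bit number back to dotted-decimal notation."""
    return ".".join(str(b) for b in value.to_bytes(4, "big"))


addr = ip_to_int("147.126.2.5")
print(hex(addr), int_to_ip(addr))
```

In hexadecimal each byte is easy to pick out: 147 is 0x93, 126 is 0x7e, and so on.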

IP addresses are divided into two parts: the network address and the host address. We'll learn more about this division later; at Loyola the first 16 bits denote the network (147.126) and the last 16 bits denote the specific host at Loyola (2.5 for my machine). In principle, every host on the same LAN has the same network portion of the address; the LAN is assigned an IP network address and then the host address bits are assigned by the LAN administrator. Each host on the LAN gets different host bits; this network/host address allocation thus means that every host on the Internet has a unique IP address. If a host moves to another LAN, it gets a new IP address simply because it gets a new network address.
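Splitting an address into its network and host parts is a matter of shifting and masking. The sketch below assumes the 16/16 split used at Loyola; the `net_bits` parameter is our own addition so that other splits can be tried.

```python
import ipaddress


def split_address(dotted: str, net_bits: int = 16):
    """Return (network part, host part) of an IPv4 address as integers.

    net_bits=16 matches the Loyola example in the text; other sites may
    use a different split.
    """
    value = int(ipaddress.IPv4Address(dotted))
    host_bits = 32 - net_bits
    net = value >> host_bits                 # keep the leading net_bits bits
    host = value & ((1 << host_bits) - 1)    # keep the trailing host bits
    return net, host


net, host = split_address("147.126.2.5")
# Print each 16-bit part as two decimal bytes, as in the text.
print(f"network = {net >> 8}.{net & 0xFF}, host = {host >> 8}.{host & 0xFF}")
```

For 147.126.2.5 this gives network 147.126 and host 2.5, matching the description above.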

Now let's look at a simple example of how routing works. Machines can be thought of as either hosts (user machines, with a single network connection) or routers (which do packet-forwarding only, are not directly visible to users, and essentially always have at least two network interfaces, one for each network the router connects). (Machines can be both hosts and routers, but this is tricky.)

Let's start with the sending host S, delivering to a destination host D. The IP header of the packet contains D's IP address (and, for that matter, S's address). First of all, S must determine whether D is on the same LAN as the sender or not. This is done by looking at the network part of the destination address, which we'll denote by D_net. If this net address is the same as S's (that is, is equal numerically to S_net), then S figures D is on the same LAN as itself, and can do direct delivery. It looks up (never mind how) the appropriate LAN address for D, attaches a LAN header to the packet in front of the IP header, and sends the packet straight to D via the LAN.

If, however, S_net and D_net don't match, then S looks up a router, which we'll call R, to use. Most ordinary hosts have only a single router to which they connect, making this choice very simple. (If there are multiple routers, S looks up the best one using the router-table algorithm in the following paragraph.) If you are a dialup user, your router is at the other end of your modem connection. S then forwards the packet to the router, again using direct delivery over the LAN. Note that the IP destination address in the packet remains D in this case, although the LAN destination address will be that of R.
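The sending host's decision can be sketched in a few lines. This assumes the simple case described above: a fixed 16-bit network part and a single default router. The router address 147.126.0.1 is invented for the example.

```python
import ipaddress

NET_BITS = 16  # assumption: the 16-bit network part from the Loyola example


def net_part(addr: str) -> int:
    """Return the network portion of an IPv4 address."""
    return int(ipaddress.IPv4Address(addr)) >> (32 - NET_BITS)


def lan_next_hop(src: str, dst: str, default_router: str) -> str:
    """Return the IP address of the machine S hands the packet to over the LAN.

    The IP destination in the packet stays dst either way; only the LAN
    destination differs.
    """
    if net_part(src) == net_part(dst):
        return dst              # same network: direct delivery to D
    return default_router       # different network: send to the router


print(lan_next_hop("147.126.2.5", "147.126.7.9", "147.126.0.1"))
print(lan_next_hop("147.126.2.5", "10.1.2.3", "147.126.0.1"))
```

The step this leaves out is the "never mind how" part: turning the chosen IP address into a LAN address before the packet is actually sent.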

The router strips off any LAN address from the incoming packet, but leaves the IP address. It extracts the destination D, and then looks at D_net. The router first checks to see if it is on the same LAN as D; recall that the router connects to at least one additional network besides the one for S. If the answer is yes, then the router does direct delivery to the destination, as above. If, on the other hand, D_net is not one to which the router is connected directly, then the router consults its internal routing table consisting of a list of networks each with an associated next-hop address. Next-hop addresses are chosen so that the router can always reach them via direct LAN delivery; generally they are other routers. The router looks up D_net in the table (generally there is a catchall default entry, so the table doesn't have to be huge), and uses direct LAN delivery to get the packet to the corresponding next-hop machine. The packet's IP header remains essentially unchanged, although the router most likely attaches an entirely new LAN header.

The packet continues being forwarded like this, from router to router, until it finally arrives at a router that is connected to D_net; it is then delivered directly to D.
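The router's side of the story can be sketched the same way. Everything here is a toy: the `Router` class, the addresses, and the two-router topology are invented for illustration, and networks are again taken to be the first 16 bits of the address.

```python
def net_of(addr: str) -> str:
    """Network part of an address, assuming a 16-bit split (first two bytes)."""
    return ".".join(addr.split(".")[:2])


class Router:
    def __init__(self, connected, table, default):
        self.connected = set(connected)  # networks reachable by direct delivery
        self.table = dict(table)         # destnet -> next-hop IP address
        self.default = default           # catchall next hop

    def forward(self, dst: str) -> str:
        """Return the IP address to deliver to next over a directly attached LAN."""
        d_net = net_of(dst)
        if d_net in self.connected:
            return dst                   # direct delivery to D itself
        return self.table.get(d_net, self.default)


def trace(routers, first_router: str, dst: str, max_hops: int = 16):
    """Follow a packet from its first router until it reaches dst."""
    path = [first_router]
    current = first_router
    for _ in range(max_hops):
        hop = routers[current].forward(dst)
        path.append(hop)
        if hop == dst:
            return path
        current = hop
    raise RuntimeError("too many hops; probably a routing loop")


# Hypothetical topology: r1 joins 147.126 and 10.0; r2 joins 10.0 and 192.168.
r1 = Router({"147.126", "10.0"}, {"192.168": "10.0.0.2"}, default="10.0.0.2")
r2 = Router({"10.0", "192.168"}, {}, default="10.0.0.1")
routers = {"147.126.0.1": r1, "10.0.0.1": r1, "10.0.0.2": r2}

print(trace(routers, "147.126.0.1", "192.168.3.4"))
```

Note that a router appears under more than one address, one per interface, which is why `r1` is listed twice in `routers`.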

You can find further information in section 4.1.4 on page 264. There's a sample routing table on pages 266 and 267. Or you can just wait until you get to Chapter 4 following along in sequence.

Just how routers build their <destnet,next-hop> tables will be a major topic itself.

TCP

TCP stands for Transmission Control Protocol, and it serves as a "transport" layer for application data. IP packets are sent from one host to another, but the routing is a "best-effort" mechanism, which means packets can and do get lost sometimes. Data that does arrive can arrive out of order. The sending application has to keep track of division into packets; that is, buffering. Finally, IP only supports sending to a specific host; normally, one wants to send to a given application running on that host. Email and web traffic, or two different people's web sessions, should not be commingled! TCP extends IP with the following services:

reliability: TCP numbers each packet, keeps track of which are lost and retransmits them after a timeout, and holds early-arriving out-of-order packets for delivery at the correct time.
connection-oriented: TCP supports connections. Once a TCP "connection" is made, data sent over that connection needs no further addressing.
stream-oriented: The application can write 1 byte at a time, or 100KB at a time; TCP will buffer and/or divide up the data into appropriate sized packets.
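Two of the services listed above, dividing a stream into packets and putting out-of-order arrivals back in order, can be sketched as follows. This is a toy model, not TCP itself: the names are invented, timeouts and retransmission are not modelled, and each packet is numbered by the byte offset of its first byte (real TCP also numbers bytes rather than whole packets).

```python
def segment(data: bytes, max_size: int):
    """Divide a byte stream into (sequence number, chunk) packets."""
    return [(off, data[off:off + max_size]) for off in range(0, len(data), max_size)]


class Reassembler:
    """Receiver side: hold early arrivals, deliver bytes strictly in order."""

    def __init__(self):
        self.next_seq = 0            # offset of the next byte we can deliver
        self.pending = {}            # seq -> chunk that arrived early
        self.delivered = bytearray()

    def receive(self, seq: int, chunk: bytes) -> None:
        if seq < self.next_seq:
            return                   # duplicate of data already delivered
        self.pending[seq] = chunk
        # Deliver as many consecutive chunks as are now available.
        while self.next_seq in self.pending:
            ready = self.pending.pop(self.next_seq)
            self.delivered += ready
            self.next_seq += len(ready)


data = b"abcdefghij"
packets = segment(data, 4)           # the last packet is shorter than the rest

receiver = Reassembler()
for index in [2, 0, 0, 1]:           # out of order, with one duplicate
    receiver.receive(*packets[index])

print(bytes(receiver.delivered))
```

The application on the receiving end sees only the reassembled byte stream, never the individual packets.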

When you launch a web browser, it starts up by opening a TCP connection to the host serving your designated Home Page. Subsequent pages are also viewed by opening TCP connections to the appropriate host, and then sending the appropriate request for data (an HTTP GET or POST request). TCP is ubiquitous, although the realtime performance of TCP is not predictable and so sound/video types of applications tend to use something different.
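What the browser does can be imitated with Python's standard `socket` module. This is a bare-bones sketch, assuming a plain (unencrypted) web server on port 80; the helper names are ours, and a real browser does a great deal more.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP GET request as the bytes sent over the connection."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 5.0) -> bytes:
    """Open a TCP connection, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:            # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_get_request("example.com"))
```

Calling `fetch("example.com")` on a machine with Internet access would return the raw response, headers and all. Notice that once the connection is open, the code never mentions the destination address again: it simply writes to and reads from the connection.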