Intro to computer networks

Comp 343/443, Peter Dordal, April 2006

Packets

Packets are modest-sized buffers of data. Ethernet allows a maximum of 1500 bytes of data; typical TCP/IP packets often hold only 512 bytes of data. While there are proponents of larger packet sizes, larger even than 64KB, it should be noted that the ATM (Asynchronous Transfer Mode) model uses 48 bytes of data per packet. In front of the data is a header (or more likely a series of headers). The header will contain the address of the destination; internal nodes of the network called routers or switches will then make sure that the packet is delivered to the requested destination.

One significant advantage of packets is that the same link can carry, at different times, traffic to different destinations and from different senders. Thus, packets are the key to supporting shared transmission lines. The alternative of a separate line between every pair of machines grows prohibitively complex very quickly.

When a router/switch receives a packet, it (generally) reads in the entire packet before looking at the header to decide to what next node to forward it. This is known as a store-and-forward process, and introduces a delay equal to the time needed to read in the entire packet. For individual packets this forwarding delay can’t be avoided, but if one is sending a long train of packets then by keeping multiple packets en route at the same time one can greatly reduce the significance of the forwarding delay.

Total packet delay is due to the following features:

·        Bandwidth delay, ie to send 1000 Bytes at 20 B/millisecond will take 50 ms.

·        Propagation delay due to the speed of light; ie if you start sending a packet right now on a cable across the US, the first bit won’t arrive at the destination until about 20 ms later. The bandwidth delay then determines how much after that the entire packet will take to arrive.

·        Store-and-forward delay, equal to the sum of the times each router takes to receive a packet. These times in turn depend on the respective link bandwidths.

·        Queuing delay, or waiting in line at busy routers. At bad moments this can exceed 1 sec, though generally is less than 100 ms.

LANs

The simplest network is the LAN (“Local-Area Network”). LANs are the "physical" networks that provide the connection between machines within, say, a school or corporation. LANs are, as the name says, "local", and so the IP, or "internet protocol" layer provides an abstraction for connecting multiple LANs into, well, the Internet. Finally, TCP deals with transport and connections and actually sending data. These three topics are often called "layers"; they consitute the "link layer", the "internetwork layer", and the "transport layer" respectively. Together with the "application layer" (the software you use), these form the "four-layer model" for networks.

The most common type of LAN is Ethernet, due to cost more than anything else. The original Ethernet had a bandwidth of 10 Mbps (megabits per second), though nowadays most Ethernet operates at 100 Mbps and gigabit (1000 Mbps) Ethernet is widely used in server rooms. Wireless LANs are gaining popularity, and may soon supplant wired Ethernet.

Ethernet LANs can be unswitched, where every packet is received by every host and it’s up to the network card in each host to determine if the arriving packet is addressed to that host, or switched, in which intermediate nodes called switches make sure that each packet is delivered only to the host to which it is addressed. Unswitched Ethernet poses a security threat, and “password sniffers” that surreptitiously collected passwords used to be common. A better solution to the eavesdropping problem, however, is regular use of encryption.

On an Ethernet, addresses are six bytes long. The address is burned into each Ethernet card at the time of manufacture; all are (supposedly) unique. The first three bytes determines the manufacturer; the subsequent three bytes are a serial number assigned by that manufacturer. Because addresses are assigned by the hardware, knowing an address doesn’t provide any direct indication of where that address is located on the network. In switched Ethernet, the routers must have a record of every Ethernet address on the network; this becomes unwieldy for large networks. Consider the analogous situation with postal addresses: Ethernet is somewhat like attempting to deliver mail using everyone’s social-security number as address, and providing each postal worker with a large catalog listing each person’s SSN together with their physical location. In contrast, real mail is addressed “hierarchically” using ever-more-precise specifiers: state, city, zipcode, street address, and name / room#. Ethernet, in other words, does not scale well to large sizes. On the other hand, an advantage of having addresses in hardware is that hosts automatically know their own addresses on startup.

There is also a designated broadcast address. A host sending to the broadcast address has its packet received by everyone. This allows a host to contact another host when its hardware address is as yet unknown; typical broadcast queries have forms such as “Will the designated server please answer” or “Will the host with the given IP address please answer”.

Switched Ethernet works quite well, though, up to 10,000-100,000 nodes. Tables of that size are straightforward to manage. Typically a host is entered into the table when it first transmits; a switch notes the packet’s arrival interface and source address and assumes that the same interface is to be used to deliver other packets to that sender. If a host’s address hasn’t been seen yet, Ethernet still has a backup delivery option of broadcast to everyone (like on unswitched Ethernet) and allowing the host Ethernet cards to sort it out. Since this is not generally used for more than one packet (after that, the switches will have learned the correct routing-table entries), security is pretty much a non-issue. The <host,interface> routing table is often easier to think of as <host,next_hop>.

IP - Internet Protocol

To solve the scaling problem with Ethernet, and to allow support for other types of LANs and point-to-point links as well, the Internet Protocol was developed. In the early days, IP networks were considered to be “internetworks” of basic networks (LANs); nowadays users generally ignore LANs and think of the Internet as one large (virtual) network. Perhaps the central issue in the design of IP was to support universal connectivity (everyone can connect to everyone else) in such a way as to allow scaling to enormous size (currently ~108 nodes, although IP should work to 109-1010 nodes), without resulting in unmanageably large routing tables (currently none are larger than 105 nodes, and that seems to be a practical maximum.)

To support universal connectivity, IP provides a global mechanism for addressing and routing, so that packets can actually be delivered from any host to any other host. IP addresses are 4 bytes (32 bits), and are part of the IP header that generally follows the Ethernet header. The Ethernet header only stays with a packet for one hop; the IP header stays with the packet for its entire journey across the Internet. An essential feature of IP addresses isthat they consist of a "network" part and a "host" part. The"legacy" mechanism for network/host allocation was to divide according to ranges by the first byte:

            first byte        network bits     host bits      name                feature
            0-127                  8                    24          class A             few networks, but very very large
            128-191            16                    16          class B             Loyola has (or had) one, net=147.126
            192-223            24                      8          class C             millions of networks, but each one is small

Nowadays the division into network & host is dynamic, and can be made at different positions in the address at different levels of the network.

All hosts with the same network part of the address (same network bits) must be located together (on the same LAN!), and so outside of that site only the network bits are needed to route a packet to the site. The entire point of this network/host separation is that routers list only the network parts of the destination addresses, which saves space. There are currently around 100 million hosts on the Internet, but only 100,000 or so networks. A routing table of 100,000 entries is (just barely) feasible; a table a hundred times larger is not, let alone a thousand times larger. You can think of the network bits as analogous to the “zip code” on postal mail, and the host bits as analogous to the rest of the address. Alternatively, one can think of the network bits as like the area code, and the host bits as like the rest of the phone number. Newer protocols that support different net/host division points at different places in the network allow support for addressing schemes that correspond to, say, zip/street/user, or areacode/exchange/subscriber#.

In addition to routing and addressing, IP must also support the transmission of packets that may be too large for some intermediate physical LAN. For example, two token rings with a 4K max packet size can send and receive packets of size 4KB, but if they are connected by an intervening Ethernet that can only forward packets of size 1.5KB, something has to give. IP handles this by supporting fragmentation. The IP approach is awkward and inefficient, and IP fragmentation is seldom used (instead, IP tends to choose rather small packet sizes (eg 512 bytes) to avoid it). However, fragmentation is essential conceptually,in order for IP to be able to support large packets without knowing anything about the intervening networks.

Note that IP is a "best effort" system; there are no acknowledgements or retransmissions or any of that stuff. We ship it off, and hope it gets there. Most of the time, it does.

Now let's look at a simple example of how IP routing works. Machines can be thought of as either hosts (user machines, with a single network connection) or routers (these do packet-forwarding only, and are not directly visible to users, and essentially always have at least two different network interfaces representing different networks that the router is connecting). (Machines can be both hosts and routers, but this is tricky.)

Let's start with the sending host S, delivering to a destination host D. The IP header of the packet contains D's IP address (and, for that matter, S's address). First of all, S must determine whether D is on the same LAN as the sender or not. This is done by looking at the network part of the destination address, which we'll denote by Dnet. If this net address is the same as S's (that is, is equal numerically to Snet), then S figures D is on the same LAN as itself, and can do direct delivery. It looks up (never mind how) the appropriate LAN address for D, attaches a LAN header to the packet in front of the IP header, and sends the packet straight to D via the LAN.

If, however, Snet and Dnet don't match, then S looks up a router to use. Most ordinary hosts have only a single router to which they connect, making this choice very simple. If you are a dialup, DSL, or cable-modem user, your router is at the other end of your connection. S then forwards the packet to the router, again using direct delivery over the LAN. Note that the IP destination address in the packet remains D in this case, although the LAN destination address will be that of the router.

The router strips off any LAN address from the incoming packet, but leaves the IP address. It extracts the destination D, and then looks at Dnet. The router first checks to see if it is on the same LAN as D; recall that the router connects to at least one additional network besides the one for S. If the answer is yes, then the router does direct delivery to the destination, as above. If, on the other hand, Dnet is not one to which the router is connected directly, then the router consults its internal routing table consisting of a list of networks each with an associated next-hop address. These <net, next_hop> tables compare with switched-Ethernet’s <host,next_hop> tables; the former type will be smaller because there are many fewer nets than hosts. Next-hop addresses are chosen so that the router can always reach them via direct LAN delivery; generally they are other routers. The router looks up Dnet in the table (generally there is a catchall default entry, so the table doesn't have to be huge), and uses direct LAN delivery to get the packet to the corresponding next-hop machine. The packet's IP header remains essentially unchanged, although the router most likely attaches an entirely new LAN header.

The packet continues being forwarded like this, from router to router, until it finally arrives at a router that is connected to Dnet; it is then delivered directly to D.

Just how routers build their <destnet,next-hop> tables is a major topic itself. Small sites have their routers just exchange information with their immediately neighboring routers; tables are built up this way through a sequence of such periodic exchanges. Larger sites (such as large ISPs) add a protocol in which routers propagate information about the state of each link; all routers receive all this information and each one builds and maintains a map of the entire network. The routing table is then constructed (sometimes on demand) from this map. Unrelated organizations exchange information through the Border Gateway Protocol, which allows for a fusion of technical information (which sites are reachable at all, and through where) with “policy” information representing legal or commercial agreements: which outside routers are “preferred”, whose traffic we will carry even if it isn’t to one of our customers, etc.

Before Internet 2, Loyola’s largest router table consisted of maybe a dozen internal routes, plus one “default” route to the outside Internet. Internet 2 is a consortium of research sites with very-high-bandwidth internal interconnections that don’t also carry commercial traffic.

TCP

TCP stands for Transmission Control Program, and it serves as a "transport" layer for application data. IP packets are sent from one host to another, but the routing is a "best-effort" mechanism, which means packets can and do get lost sometimes. Data that does arrive can arrive out of order. The sending application has to keep track of division into packets; that is, buffering. Finally, IP only supports sending to a specific host; normally, one wants to send to a given application running on that host. Email and web traffic, or two different peoples' web sessions, should not be commingled!

TCP extends IP with the following services:

When you enter a site name in a web browser, it starts up by opening a TCP connection to that site; the connection is made to the server’s port 80 (the standard web-traffic port). TCP is ubiquitous, although the realtime performance of TCP is not predictable and so sound/video types of applications tend to use something different.

Sliding-windows is used to keep multiple packets en route at any one time. This minimizes the effect of store-and-forward delays, and propagation delays, as they then only count once for the entire packet stream and not once per packet. The window size represents the number of packets en route at any one time; if the window size is 10, for example, then at any one time 10 packets are out there (probably 5 data packets and 5 returning acknowledgements). As each acknowledgement arrives, the window “slides forward” and the data packet 10 packets ahead is sent. For example, consider the moment when packets 20-29 are in transit. When Ack20 is received, Data30 is sent, and so now 21-30 are in transit. When Ack21 is received, Data31 is sent, so 22-31 are in transit.

The ideal window size is such that it takes one round-trip time to send an entire window, so that the next Ack will always be arriving just as the sender has finished the window. Determining this ideal size, however, is difficult; for one thing, the ideal size varies with network load. As a result, TCP approximates the ideal size. The window size is slowly raised until packet loss occurs, which TCP takes as a sign of network congestion. At that point the window size is reduced to half its previous value, and the slow climb resumes. The effect is a “sawtooth” graph of window size with time, which oscillates (more or less) around the “optimal” window size.

A busy server may have thousands of connections to its port 80 (the web port) and hundreds of connections to port 25 (the email port). Web and email traffic are kept separate by virtue of the different ports used. All those clients, though, are kept separate because each comes from a unique <host,port> pair. A TCP connection is determined by the <host,port> pair at each end; traffic on different connections does not intermingle. That is, there may be multiple independent connections to <www.luc.edu,80>. This is sort of analogous to certain 800 phone numbers, where there can be multiple callers at the same time to the same 800 number. Each call is answered by a different operator (corresponding to a different cpu process), and different calls do not “overhear” each other.

TCP’s automatic acknowledgement system means that data that has to be retransmitted has its delivery delayed until that retransmission succeeds; furthermore, the following data is also delayed. While this works just fine for large file transfers, it doesn’t work quite as well for voice and video. In these forms, timeliness is more important than completeness. A few lost packets mean some voice dropouts (pretty common on cell phones) or flicker/snow on the video screen; both of these are better than stalling out completely. So, as the Internet moves towards more voice (such as Voice over IP, or VOIP, a form of phone service) and video, non-TCP transport mechanisms are an active research agenda.

Firewalls

One problem with having a program on your machine listening on an open TCP port is that bad guys may connect and, using some known flaw in the software on your end, do something bad to your machine. Damage can range from the unintended downloading of personal data to compromise and takeover of your entire machine, making it a distributor of viruses and worms or a steppingstone in later breakins of other machines. An attack known as buffer overflow, where some internal memory location fills with network-supplied data and overflows, overwriting parts of the program itself (that are later executed), is the most common (and most serious, when it works) form of total-compromise attacks.

A firewall is a program to block such outside connections. Generally ordinary workstations don’t ever need to accept connections from the Internet; client machines instead initiate connections to better-protected servers. So blocking incoming connections works pretty well; when necessary (eg for games) certain ports can be selectively unblocked.

The original firewalls were routers. Incoming traffic to certain ports was blocked for servers; for non-servers, often all inbound connections were blocked. This allowed unprotected machines to operate reasonably safely; however, it also meant that client machines could not accept connections no matter what. Nowadays per-machine firewalls are common: you can configure your machine not to accept inbound connections to most (or all) ports regardless of whether software on your machine requests such a connection.

Network Address Translation

Another way of protecting ordinary machines is Network Address Translation, or NAT. This also conserves IP addresses. Instead of assigning each host a “real” IP address, instead just one address is assigned to a border router. Internal machines get “internal” IP addresses, typically of the form 10.x.y.z (addresses beginning with 10 can’t be used on the Internet). Inbound connections to the hidden machines are banned, as with firewalls. When a hidden machine wants to connect to the outside, the NAT router intercepts the connection, allocates a new port, and makes the connection itself using that new port and its own IP address. The remote machine responds, and the NAT box remembers the connection and forwards data to the correct hidden host. Done properly, NAT improves the security of a site. It also allows multiple machines to share a single IP address, and so is popular with home users who have a single broadband connection but who wish to use multiple computers.