Basic introduction to Ethernet

Peter Dordal, Loyola Univ Chicago, 1996-98

Ethernet - Sources: 2.2 of Stevens, or 3.6 of Peterson & Davie, ch. XX of Nemeth, Snyder, Seebass; Digital-Intel-Xerox 1982 standard.

An Ethernet is a broadcast bus, which means that all packets are, at the physical level, broadcast onto the medium and can be seen, theoretically, by all other nodes (as we will see later, this can have significant security implications). The basic medium is a straight conductor, or bus, although this can be modified with repeaters (hubs) into an arbitrary tree structure. If two stations transmit at the same time, the signals will collide, and interfere with one another. Both transmissions fail as a result. In order to minimize collision loss, each station must implement the following:

1. Before transmission, wait for the line to become quiet
2. While transmitting, continually monitor the line for signs that a collision has occurred; if a collision happens, then cease
3. If a collision occurs, use a backoff-and-retransmit strategy

These properties can be summarized with the CSMA/CD acronym: Carrier Sense, Multiple Access, Collision Detect. (Carrier sense means "wait for quiet before transmitting"; there is no literal carrier frequency to be sensed.)

Ethernet comes in version 1 [1980, DEC-Intel-Xerox], version 2 [1982, DIX], and IEEE 802.3. There are some minor electrical differences between these, and one rather substantial packet-format difference. In addition to these, the Berkeley Unix trailing-headers packet format was used for a while; it is now obsolete. The original paper describing Ethernet was Metcalfe and Boggs, 1976.

There are three physical formats for Ethernet cable: thick coax (10base5), thin coax (10base2), and twisted pair (10baseT). Thick coax was the original; economics drove the successive development of the later two. Connections are made into thick coax via taps plus transceivers and into thin coax via T-connectors. Twisted-pair does not allow mid-cable stations; it is only used for single host-to-hub connections. All three formats can interconnect, although only through repeaters, and all use the same 10 megabit/sec transmission speed (a megabit here is 1,000,000 bits, not 2^20 bits). Coax cable must have a terminating resistor at each end; this is implicit in twisted-pair cable.

Here is the format of a typical Ethernet packet (DIX specification)

     +---------------+--------------+----+---------------------+-----+
     |   dest addr   |   src addr   |type|        data         | CRC |
     +---------------+--------------+----+---------------------+-----+

The destination and source addresses are 48-bit quantities; the type is 16 bits, the data length is variable, and the final CRC checksum is 32 bits. The checksum is added by the Ethernet hardware, never by the host software. There is also a preamble in front of the packet, for synchronization: a 64-bit block of alternating 1 and 0 bits, ending in two consecutive 1 bits.

The type field identifies the next higher protocol layer; here are some sample values, in hexadecimal: 8137 = IPX, 0800 = IP, etc.
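
As a rough illustration of the layout above, here is a minimal Python sketch that splits a DIX packet into its fields. It assumes the preamble and CRC have already been stripped by the hardware; the function names and the sample packet (including its source address) are made up for the example.

```python
import struct

def parse_dix_header(frame: bytes):
    """Split a DIX packet (no preamble, no CRC) into dest, src, type, data."""
    if len(frame) < 14:
        raise ValueError("packet is shorter than the 14-byte header")
    # 6-byte destination, 6-byte source, 16-bit type in network byte order
    dest, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dest, src, ethertype, frame[14:]

def format_addr(addr: bytes) -> str:
    """Render a 48-bit address in the usual colon-separated hex form."""
    return ":".join(f"{b:02x}" for b in addr)

# Hypothetical sample: broadcast destination, invented source, type 0800 (IP)
sample = bytes.fromhex("ffffffffffff" "08002b010203" "0800") + b"payload"
dest, src, ethertype, data = parse_dix_header(sample)
print(format_addr(dest), format_addr(src), f"{ethertype:04x}", data)
```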

Each Ethernet board has a (hopefully unique) physical address in ROM; by default any packet sent to this address will be received by the board and passed up to the host system. Packets addressed to other physical addresses will be seen by the board, but ignored (by default). All Ethernet devices also agree on a broadcast address of all 1's: a packet sent to the broadcast address will be delivered to all attached hosts. It is often possible to change the physical address of a given board in software; it is also generally possible to put a given board into promiscuous mode, which means that all packets on the net, no matter what the destination address, are delivered to the attached host. This mode was originally there for diagnostic purposes, but is now best known for the security breach it opens: it is not unusual to find a host with its network board in promiscuous mode and with a process collecting the first 100 bytes (presumably including userid and password) of every telnet connection.

Another category of addresses is multicast.

The first bit of the physical address indicates whether the address is physical or multicast; the second bit is supposed to indicate, in the case of physical addresses, whether the address is supposed to be globally unique or if it is only locally unique. Some proposals for TCP would require that hosts have globally unique "Endpoint IDentifiers", or EIDs; the Ethernet physical address would qualify only if it were globally unique. When Ethernet IDs are assigned by the manufacturer, the first three bytes serve to indicate the manufacturer.
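
The two flag bits can be tested as in the sketch below. It relies on the fact that Ethernet transmits each byte low-order bit first, so the "first bit" on the wire is the least significant bit of the first byte and the "second bit" is the next one up. The function name and the dictionary keys are made up for the example.

```python
def describe_address(addr: bytes) -> dict:
    """Classify a 48-bit Ethernet address by its first two transmitted bits."""
    if len(addr) != 6:
        raise ValueError("Ethernet addresses are 6 bytes long")
    first_byte = addr[0]
    return {
        # first bit on the wire = low-order bit of the first byte
        "multicast": bool(first_byte & 0x01),
        # second bit on the wire: set means only locally unique
        "locally_unique_only": bool(first_byte & 0x02),
        "broadcast": addr == b"\xff" * 6,
        # first three bytes identify the manufacturer (when globally assigned)
        "manufacturer_prefix": addr[:3].hex(),
    }

print(describe_address(bytes.fromhex("08002b010203")))
print(describe_address(bytes.fromhex("ffffffffffff")))
```

Note that the broadcast address, being all 1's, also has the multicast bit set.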

The diameter of an ethernet is the maximum distance between any pair of stations. Note that the actual total length of cable can be much greater than this, if, for example, the topology is a "star" configuration. The maximum allowed diameter, measured in bits, is limited to 232 (a sample "budget" for this is below). This makes the round-trip-time 464 bits. Adding 48 bits for the maximum "jam time" gives 512 bits (64 bytes) as the slot time of an Ethernet. The value of the slot time determines several subsequent aspects of Ethernet. If a station has transmitted for one slot time, then no collision can occur (unless there is a hardware error) for the remainder of that packet. This is because one slot time is enough time for any other station to have realized that the first station has started transmitting, so after that time they will wait for the first station to finish. Thus, after one slot time a station is said to have acquired the network. The slot time is also used as the basic interval for retransmission scheduling, below.

Delay budget:

Here are typical maximum values for the delay due to various components. These are taken from the Digital-Intel-Xerox (DIX) standard of 1982, except that "point-to-point link cable" is replaced by standard cable. The DIX specification allows 1500m of coax with two repeaters and 1000m of point-to-point cable; I've used 2500m of coax + 4 repeaters here, following the later IEEE 802.3 Ethernet specification. I've also simplified some of the more obscure delays. These are one-way delay times, in bits. The maximum path may have four repeaters, and ten transceivers each with drop cable (two per repeater, plus one at each endpoint).

item                  size      delay       explanation
coax                  2500 m    110 bits    23 meters/bit (.77c)
transceiver cables     500 m     25 bits    19.5 meters/bit (.65c)
transceivers                     40 bits    max 10, 4 bits each
repeaters                        25 bits    max 4, 6+ bits each (DIX 7.6.4.1)
encoders                         20 bits    max 10, 2 bits each (generate signal)
Total                           220 bits    should be 232

Some of these are high, but there are also signal rise time delays, sense delays, and timer delays that I've omitted. It works out fairly closely.
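
The arithmetic behind the budget and the slot time can be checked with a few lines of Python. The numbers are copied from the table and from the diameter discussion above; the variable names are just for this sketch.

```python
# One-way delays in bit times, copied from the delay budget table
budget_bits = {
    "coax": 110,
    "transceiver cables": 25,
    "transceivers": 40,
    "repeaters": 25,
    "encoders": 20,
}
one_way_bits = sum(budget_bits.values())

MAX_DIAMETER_BITS = 232   # allowed one-way diameter
JAM_BITS = 48             # maximum jam time

round_trip_bits = 2 * MAX_DIAMETER_BITS
slot_bits = round_trip_bits + JAM_BITS

print("budget total:", one_way_bits, "bits; slack:", MAX_DIAMETER_BITS - one_way_bits)
print("round trip:", round_trip_bits, "bits; slot time:", slot_bits, "bits =",
      slot_bits // 8, "bytes")
```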

If you were designing a new CSMA/CD protocol, you would choose the physical configuration rules (which determine the network diameter and thus the slot time) and the minimum packet size so that slot_time <= min_packet_size, with both measured in bit times. For example, if you took the physical cabling rules for 10 Mbps Ethernet and tried to upgrade the speed tenfold to 100 Mbps, you would need to require a minimum packet size of at least 640 bytes (because the time to send 512 bits has now become the time needed to send 5120 bits). If this were too large, you would have to reduce the network diameter.
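
Here is a small sketch of that scaling rule, assuming the cabling (and hence the 51.2 microsecond slot time) stays fixed while the bit rate changes. The function name is made up; integer arithmetic in nanoseconds is used to avoid floating-point rounding.

```python
SLOT_TIME_NS = 51_200   # 51.2 microseconds, fixed by the cabling rules

def min_packet_bytes(rate_bps: int, slot_time_ns: int = SLOT_TIME_NS) -> int:
    """Smallest packet (in bytes) that takes at least one slot time to send."""
    bits = rate_bps * slot_time_ns // 1_000_000_000
    return (bits + 7) // 8   # round up to whole bytes

print(min_packet_bytes(10_000_000))    # 10 Mbps
print(min_packet_bytes(100_000_000))   # 100 Mbps, same cabling rules
```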

The signal loss in any single segment of cable is limited to 8.5 dB, or about 14% of original signal power. Repeaters will restore the signal to its original strength. The reason for the per-segment length restriction is that Ethernet requires a strict limit on how much the remote signal can be allowed to lose strength. It is possible for a station to detect and reliably read very weak remote signals, but not at the same time that it is transmitting locally. This is exactly what must be done, though, for collision detection: remote signals must arrive with sufficient strength to be heard even while the receiving station is itself transmitting. Note that the per-segment limit, then, has nothing to do with the overall length limit; the latter is set only to ensure that a sender is guaranteed of detecting a collision, even if it sends the minimum-sized packet.
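
The 14% figure matches reading 8.5 dB as a power ratio, which is the assumption made in this short sketch (the function name is made up for the example).

```python
def db_loss_to_power_fraction(loss_db: float) -> float:
    """Fraction of the original signal power remaining after a loss in dB."""
    return 10 ** (-loss_db / 10)

remaining = db_loss_to_power_fraction(8.5)
print(f"{remaining:.1%} of the original power remains")
```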

If a collision is going to occur, it will occur within one slot time of the start of transmission (more exactly, within a time of 2*diameter). This gives time for the initial sender's signal to reach the second participant, and for the collision to return to the initial sender. As a specific example, consider A and B, 5 units apart. A sends "helloworld!" at T=0; B starts sending just as A's message arrives, at T=5. B has listened before transmitting, but A's signal was not yet evident. A doesn't discover the collision until 10 units have elapsed, = 2*distance.

A                   B
|---|---|---|---|---| T=-1 Idle
h---|---|---|---|---| T=0 A begins to send
e---h---|---|---|---| T=1
l---e---h---|---|---| T=2
l---l---e---h---|---| T=3
o---l---l---e---h---| T=4
---o---l---l---e---hh T=4.99 just before collision
w---o---l---l---e---X T=5 COLLISION!
o---w---o---l---X---X T=6
r---o---w---X---X---X T=7
l---r---X---X---X---X T=8
d---X---X---X---X---X T=9
X---X---X---X---X---X T=10 A detects the collision

A corollary is that a station that has transmitted for one slot time without collision is assured that there is no further risk of collision; in this sense such a transmitter is said to have acquired the cable.

The Ethernet minimum packet size is 64 bytes, or one slot time; a station transmitting a packet this size is assured that if a collision were to occur, the sender would detect it (and be able to apply the retransmission algorithm, below). Smaller packets might collide and yet the sender not know it.
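
The minimum-size rule can be stated as a one-line check. This is only a sketch of the worst case described above (the far station begins sending just before the signal reaches it), with lengths and delays in the same units; the function name is made up.

```python
def sender_detects_collision(packet_length: int, one_way_delay: int) -> bool:
    """True if the sender is still transmitting when the worst-case
    collision evidence gets back to it, after 2 * one_way_delay."""
    return packet_length > 2 * one_way_delay

# The A/B example: "helloworld!" is 11 units long and B is 5 units away
print(sender_detects_collision(len("helloworld!"), 5))
# A 512-bit packet on a maximum-diameter (232-bit) Ethernet
print(sender_detects_collision(512, 232))
# A hypothetical 256-bit packet would be too short on the same network
print(sender_detects_collision(256, 232))
```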

Implicit in the delay budget table above is the size of a bit. The speed of propagation in copper is about .77*c, where c=3x10^8 m/sec is the speed of light in vacuum. So, in 1/10 microsecond (the time to send one bit at 10 Mbps), the signal propagates approximately .77*c*1E-7 = 23 meters.

Ethernet packets also have a maximum packet size, of 1500 bytes. This limit is only for the sake of fairness, so one station cannot unduly monopolize the cable. Past "rogue Ethernet" specifications (manufacturer's "value-added" "enhancements") have enlarged the maximum packet size to as much as 4KB. There is no reason, actually, not to do this, except compatibility.

Transmissions

Here is the Ethernet retransmission algorithm:
1. Listen before transmitting ("carrier detect")
2. If line is busy, wait for sender to stop and then wait an additional 9.6 microseconds (96 bits). One consequence of this is that there is always a 96-bit gap between packets, so packets do not run together.
3. Transmit while simultaneously monitoring for collisions
4. If a collision does occur, send the jam signal, and choose a backoff time as follows: For transmission N, 1<=N<=10, [N=0: original attempt], choose k randomly between 0 and 2^N-1. Wait k slot times (k*51.2 microsec; a slot time (the time to send 512 bits) is 51.2 microsec). Then check if the line is idle, waiting if necessary for someone else to finish, and then retry step 3. For 11<=N<=15, choose k randomly between 0 and 1023 (= 2^10-1)
5. If we reach N=16 (16 transmission attempts), give up.
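
The backoff steps above can be sketched in Python as follows. This is an illustration only, not a model of real hardware: carrier sense, the 96-bit gap, and the jam signal are left out, and the try_transmit callback (which reports whether one attempt succeeded) is a stand-in invented for the example.

```python
import random

MAX_ATTEMPTS = 16     # give up when N reaches 16
BACKOFF_LIMIT = 10    # the range of k stops growing after N=10

def backoff_slots(n: int, rng=random) -> int:
    """Slot times to wait before retransmission N, for 1 <= N <= 15."""
    if not 1 <= n < MAX_ATTEMPTS:
        raise ValueError("N must be between 1 and 15")
    exponent = min(n, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)   # k in 0 .. 2^exponent - 1

def send_with_backoff(try_transmit, rng=random):
    """Call try_transmit() until it returns True or 16 attempts have failed.
    Returns the total number of slot times spent backing off, or None if
    the packet was abandoned."""
    waited = 0
    for n in range(MAX_ATTEMPTS):         # n = 0 is the original attempt
        if n > 0:
            waited += backoff_slots(n, rng)
        if try_transmit():
            return waited
    return None

# Hypothetical channel on which each attempt collides with probability 0.5
rng = random.Random(1996)
print(send_with_backoff(lambda: rng.random() > 0.5, rng))
```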

Exponential backoff means that if two hosts have waited for a third to finish and transmit simultaneously, and collide, then when N=1 they have a 50% chance of recollision; when N=2 there is a 25% chance, etc. When N>=10 the maximum wait is 52 milliseconds; without this cutoff the max wait at N=15 would be about 1.7 seconds. Note that, as indicated above in the min-packet-size discussion, this retransmission strategy assumes that the sender detects the collision while it is still sending, so it knows that the packet must be resent.
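
These figures follow directly from the backoff rule; the sketch below recomputes them. It assumes two stations that are both on the same attempt number N, so that a recollision means both picked the same k. The function names are made up.

```python
SLOT_TIME_US = 51.2
BACKOFF_CAP = 10

def max_wait_ms(n: int, cap=BACKOFF_CAP) -> float:
    """Longest possible backoff at attempt N; cap=None disables the cutoff."""
    exponent = n if cap is None else min(n, cap)
    return (2 ** exponent - 1) * SLOT_TIME_US / 1000.0

def recollision_probability(n: int) -> float:
    """Chance that two stations, both at attempt N, choose the same k."""
    return 1.0 / 2 ** min(n, BACKOFF_CAP)

for n in (1, 2, 10, 15):
    print(n, recollision_probability(n), round(max_wait_ms(n), 1), "ms")
print("N=15 without the cutoff:",
      round(max_wait_ms(15, cap=None) / 1000.0, 2), "seconds")
```

The uncapped N=15 value works out to 32767 slot times, or roughly 1.68 seconds.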

Note that this algorithm is not "fair", in the sense that the longer a station has been waiting to send, the lower its priority sinks. Newly transmitting stations with N=0 need not delay at all. The Ethernet capture effect, below, illustrates this unfairness.

Capture effect

Suppose A and B are sending, and each host is fast enough to generate a constantly nonempty queue. Let A and B wait for some third-party packet to finish. Then both transmit, and collide. Suppose A wins, and sends. When A is finished, B tries to transmit again. But A has a second packet, and so A tries too, and the two collide again. A chooses a backoff k between 0 and 1 (inclusive), but since B is on its second attempt it must choose k between 0 and 3. This means A is favored to win. Suppose it does. After that transmission is finished, A and B try yet again: A on its 1st attempt for its 3rd packet, and B on its 3rd attempt for its 1st packet. Now A chooses k<2 but B must choose k<8.

As time goes on, if B fails to "win" a given backoff, its probability of winning the next one is reduced by 1/2. It is quite possible, and does occur in practice, for B to lose all the backoffs until N=16 is reached; once it has lost the first three or four this is in fact quite likely. At this point B simply discards the packet and goes on to the next one with N reset to 1 and k chosen from {0,1}.
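
To see how lopsided the contest becomes, the following sketch enumerates every pair of backoff choices and computes exact odds. It makes a simplifying assumption: whichever station picks the smaller k wins outright, and equal choices mean another collision. The function name is made up for the example.

```python
from fractions import Fraction

def backoff_outcome(n_a: int, n_b: int, cap: int = 10):
    """Return (P(A wins), P(B wins), P(tie)) when A is on attempt n_a
    and B is on attempt n_b, each choosing k uniformly at random."""
    range_a = 2 ** min(n_a, cap)
    range_b = 2 ** min(n_b, cap)
    total = range_a * range_b
    a_wins = sum(1 for ka in range(range_a) for kb in range(range_b) if ka < kb)
    b_wins = sum(1 for ka in range(range_a) for kb in range(range_b) if kb < ka)
    ties = total - a_wins - b_wins
    return Fraction(a_wins, total), Fraction(b_wins, total), Fraction(ties, total)

# A always restarts at attempt 1 with a fresh packet; B keeps climbing
for n_b in range(1, 6):
    a, b, tie = backoff_outcome(1, n_b)
    print(f"B on attempt {n_b}: A wins {a}, B wins {b}, recollision {tie}")
```

Under this model B's chance of winning outright is 1/4 when both are on attempt 1, then 1/8, then 1/16, halving each time B loses.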

SQE: Signal Quality Error: optional test of collision-detection. Available option on some boards; causes trouble with some equipment.

Late Collision errors: some host's network interface is broken, and it transmitted even when it should have been able to detect that someone else was transmitting.

Repeaters and topology

Repeaters change the topology, but not the fundamental constraints. 10baseT (twisted pair ethernet) uses repeaters heavily; a repeater with multiple connections (typically 24) is called a hub. With twisted pair, a device can only connect to the endpoint of the wire. Thus, typically, each host is connected directly to a hub.

The maximum diameter of an Ethernet consisting of multiple segments, joined by repeaters, is constrained by the round-trip-time, and the need to detect collisions before the sender has completed sending.

Errors

Packets can have bits flipped or garbled by electrical noise on the cable; estimates of the frequency with which this occurs range from 1 in 10^4 to 1 in 10^6. Bit errors are not uniformly likely; when they occur, they are likely to occur in bursts. Packets can also be lost in repeaters, although this appears less likely. Packets can be lost due to collisions only if the sending host makes 16 unsuccessful transmission attempts and gives up. Ethernet packets contain a 32-bit CRC error-detecting code to detect bit errors.
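
As a small illustration of error detection, Python's zlib.crc32 uses the same 32-bit generator polynomial as the Ethernet CRC. The sketch below only shows that a flipped bit changes the checksum; how the hardware orders and appends the CRC bits on the wire is not modeled, and the helper name is made up.

```python
import zlib

def crc_matches(payload: bytes, received_crc: int) -> bool:
    """Recompute the CRC-32 over the payload and compare with the received one."""
    return zlib.crc32(payload) == received_crc

data = b"hello, ethernet"
crc = zlib.crc32(data)

# Flip the low-order bit of the first byte, as line noise might
corrupted = bytes([data[0] ^ 0x01]) + data[1:]

print(crc_matches(data, crc))        # True: packet arrived intact
print(crc_matches(corrupted, crc))   # False: single-bit error is detected
```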

Packets can also be misaddressed by the sending host, or, most likely of all, they can arrive at the receiving host at a point when the receiver has no free buffers and thus be dropped.

CSMA persistence

A transmission strategy is said to be nonpersistent if, when the line is busy, the sender waits a randomly selected time.
A strategy is p-persistent if, after waiting for the line to clear, the sender sends with probability p<=1. Ethernet uses 1-persistence. A consequence of 1-persistence is that, if more than one station is waiting for the line to clear, then when the line does clear a collision is certain. Ethernet then handles the resulting collision via the usual exponential backoff. If N stations are waiting to transmit, the time required for one station to win the backoff is linear in N.
An Ethernet broadcast storm is said to occur when there are too many transmission attempts, and most of the available bandwidth is tied up in collisions. A properly functioning Ethernet has an effective bandwidth of as much as 50-70% of the nominal 10Mbps capacity, but attempts to transmit more than this typically result in successfully transmitting a good deal less.

Analysis of Ethernet

In the ALOHA model, stations transmit packets without listening first for a quiet line or monitoring the transmission for collisions (this models the situation of several ground stations transmitting to a satellite; the ground stations are presumed unable to see one another). To model the success rate of ALOHA, assume all the packets are the same size and let T be the slot time, the time to send one packet.

We will assume that a large number N of hosts are transmitting, each at a relatively low rate of s packets/slot time. The average rate of all attempts per slot time is G; we have G = N*s. We will formulate an expression for S, the average rate of successful transmissions per slot, in terms of G.

If two packets overlap during transmission, they are lost. Thus, a successful transmission requires everyone else to be quiet for an interval of 2T. The probability of one station transmitting during an interval of time T is G = N*s; the probability of the remaining N-1 stations all being quiet for an interval of 2*T is (1-s)^(2(N-1)). The probability of a successful transmission is thus
     S  =  N*s*(1-s)^(2(N-1))
       ~=  G*(1-G/N)^(2N)       (using s = G/N, for large N)
       ->  G*exp(-2G)           as N -> infinity.

The function S = G*exp(-2G) has a maximum at G=1/2, S=1/(2e). That is, for one slot time T, Prob(success) = 1/(2e). So, over a time of 2e*T, the expected number of successes is 1; hence, at this maximum attempt rate, we expect about 2e slot times worth of contention between successful transmissions. What happens to the remaining G-S unsuccessful attempts is not addressed by this model; presumably some higher-level mechanism (e.g. backoff) leads to retransmissions.
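
A quick numerical check of these claims, as a sketch: the first function is the limiting formula, the second is the finite-N expression it was derived from (with s = G/N). Function names are made up.

```python
import math

def aloha_throughput(g: float) -> float:
    """Limiting success rate S = G*exp(-2G) for attempt rate G."""
    return g * math.exp(-2 * g)

def aloha_throughput_finite(g: float, n: int) -> float:
    """Finite-N version: N*s*(1-s)^(2(N-1)) with s = G/N."""
    s = g / n
    return n * s * (1 - s) ** (2 * (n - 1))

# Scan G on a grid of 0.01 steps and find where S peaks
best_g = max((i / 100 for i in range(1, 301)), key=aloha_throughput)
print("peak at G =", best_g, "with S =", round(aloha_throughput(best_g), 4))
print("1/(2e)    =", round(1 / (2 * math.e), 4))

# The finite-N formula approaches the limit as N grows
for n in (10, 100, 10000):
    print(n, round(aloha_throughput_finite(0.5, n), 4))
```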

A given S<1/(2e) may be achieved at either of two values of G; that is, a given success rate may be due to a comparable attempt rate or else due to a very high attempt rate with a similarly high failure rate.

The relevance of all this to Ethernet is that during one Ethernet slot time there is no way to detect collisions (they haven't reached the sender yet!) and so the Ethernet contention phase resembles ALOHA with T the Ethernet slot time of (maximally) 51.2 microseconds. Once an Ethernet sender succeeds, however, it continues with a full packet transmission, which is presumably many times longer than T. Thus, on the Ethernet, we see an alternating sequence of contention intervals (which may in fact be relatively quiet) and packet transmissions:

     |-- contention --|------ packet ------|-- contention --|------ packet ------|

The average length of the contention interval, at the maximum throughput calculated above, is 2e slot times (from ALOHA). This is the minimum contention interval; with lower loads the contention interval is longer due to greater idle times and with higher loads the contention interval is longer due to more collisions.

Finally, let P = time to send an entire packet in units of T = average packet size in units of T. P is thus the length of the "packet" phase in the diagram above. The contention phase has length 2e, so the total time to send one packet (contention+packet time) is 2e+P. The useful fraction of this is, of course, P, so the effective maximum bandwidth is P/(2e+P).

At 10 Mbps, T=51.2 microseconds is 512 bits, or 64 bytes. For P=128 bytes = 2*64, the effective bandwidth becomes 2/(2e+2), or 27%. For P=512 bytes = 8*64, the effective bandwidth is 8/(2e+8), or 60%. If the bandwidth rises to 100 Mbps, effective bandwidth plummets because P shrinks by a factor of 10 as measured in units of T.
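
The P/(2e+P) estimate is easy to tabulate. In the sketch below the slot size is a parameter, so the 100 Mbps case can be modeled by assuming the same 51.2 microsecond slot time, which is then 640 bytes long. The function name is made up, and the results are only as good as the ALOHA approximation they come from.

```python
import math

def effective_bandwidth(packet_bytes: float, slot_bytes: float = 64) -> float:
    """Fraction of capacity carrying packets: P/(2e+P), with P in slot times."""
    p = packet_bytes / slot_bytes
    return p / (2 * math.e + p)

for size in (128, 512, 1500):
    print(f"{size:5d} bytes at 10 Mbps:  {effective_bandwidth(size):.0%}")

# Same cabling rules at 100 Mbps: one slot time now holds 640 bytes
print(f"  512 bytes at 100 Mbps: {effective_bandwidth(512, slot_bytes=640):.0%}")
```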

Bridges

Bridges join separate physical ethernets (or ethernets and token rings). A bridge receives packets on one interface and transmits them on the other; collisions are not propagated. On the other hand, a bridge must maintain a queue in case the outbound side is busy; on occasion packets are lost when this queue overflows. A large "virtual" Ethernet built from bridges must not have any loops! (Actually, bridges contain a loop-detection protocol and algorithm.)

Bridges do not forward every packet; each bridge learns address locations as time goes on and does not forward a packet when it knows where the destination is and knows that forwarding isn't necessary. A bridge learns address locations as follows: for each interface, the bridge maintains a table of physical addresses that have appeared as source addresses in packets arriving via that interface. The bridge thus knows that to reach these addresses the packet needs to be sent only via that interface; if a packet arrives from that same interface then it doesn't need to be forwarded at all. When a packet arrives, the bridge examines the source address S. The bridge infers that S is to be reached through the interface by which it arrives, and so enters S into the table for that interface.

Forwarding decisions are then made based on the destination address, D. If D is found in the table for the interface by which the packet arrived, the bridge knows that the packet is local to that interface and does not need to be forwarded at all. If D is found in the table for another interface, the bridge forwards the packet on that interface only. Only if D is not found at all (or if D is the broadcast address!) must the bridge forward the packet on all interfaces.
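
The learning and forwarding rules can be sketched as a small Python class. This is a simplification: it keeps one table mapping each address to an interface (equivalent to one table per interface), never ages out entries, and takes "all interfaces" to mean all interfaces other than the one the packet arrived on. The class and method names are made up for the example.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningBridge:
    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self.table = {}          # physical address -> interface it was seen on

    def handle(self, arrival_interface, src, dest):
        """Learn where src lives, then return the interfaces to forward on."""
        self.table[src] = arrival_interface
        known = self.table.get(dest)
        if dest == BROADCAST or known is None:
            # unknown or broadcast destination: send everywhere else
            return [i for i in self.interfaces if i != arrival_interface]
        if known == arrival_interface:
            return []            # destination is local to the arrival side
        return [known]           # destination is known on one other interface

bridge = LearningBridge([1, 2, 3])
print(bridge.handle(1, "A", "B"))   # B unknown: flood to [2, 3]
print(bridge.handle(2, "B", "A"))   # A was learned on 1: [1]
print(bridge.handle(1, "A", "B"))   # B was learned on 2: [2]
print(bridge.handle(1, "C", "A"))   # A is on the arrival side: []
```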

Once all the bridges have learned where all the hosts are, packet routing becomes optimal. At this point packets are never sent on links unnecessarily; a packet from A to B only travels those links that lie along the (unique) path from A to B. (Paths must be unique because bridged networks cannot have loops).

Bridges have an additional advantage in that traffic that doesn't flow where it doesn't have to flow is much harder to eavesdrop on. On a typical Ethernet, one host configured to receive all packets can listen in on all traffic. Ethernets are notorious for allowing one unscrupulous station to capture, for instance, all passwords in use on the network. On a fully bridged ethernet, a host physically only sees the traffic actually addressed to it; other traffic remains private.

Typical bridges have room for a table with 10^4-10^6 entries; this is large enough to learn about all hosts in even a relatively large organization. A bridged Ethernet can fail when total traffic becomes excessive, but total traffic would drown any network (although other network mechanisms might support higher bandwidth). The main limits specific to bridging are the requirement that the topology must be loop-free (so duplicate paths cannot be provided for redundancy), and all broadcast traffic must always be forwarded everywhere. Eventually broadcast traffic comes to dominate a large bridged ethernet, and the organization must move to a routing architecture.