Basic introduction to Ethernet

Peter Dordal, Loyola Univ Chicago, 1996-97

Ethernet - Sources: 2.2 of Stevens, or 3.6 of Peterson & Davie, ch. XX of Nemeth, Snyder, Seebass;

An Ethernet is a broadcast bus, which means that all packets are, at the physical level, broadcast onto the medium and can be seen, theoretically, by all other nodes (as we will see later, this can have significant security implications). The basic medium is a straight conductor, or bus, although this can be modified with repeaters into an arbitrary tree structure. If two stations transmit at the same time, the signals will collide, and interfere with one another. Both transmissions fail, as a result. In order to minimize collision loss, each station must implement the following:

1. Before transmission, wait for the line to become quiet
2. While transmitting, continually monitor the line for signs that a collision has occurred; if a collision happens, then cease
3. If a collision occurs, use a backoff-and-retransmit strategy

These properties can be summarized with the CSMA/CD acronym: Carrrier Sense, Multiple Access, Collision Detect. (Carrrier sense means "wait for quiet before transmitting"; there is no literal carrier frequency to be sensed.)

Ethernet comes in version 1 [1980, DEC-Intel-Xerox], version 2 [1982, DIX], and IEEE 802.3. There are some minor electrical differences between these, and one rather substantial packet-format difference. In addiiton to these, the Berkeley Unix trailing-headers packet format was used for a while; it is now obsolete. The original paper describing Ethernet was by Metcalfe and Boggs, 1976.

There are three physical formats for Ethernet cable: thick coax (10base5), thin coax (10base2), and twisted pair (10baseT). Thick coax was the original; economics drove the successive development of the later two. Connections are made into thick coax via taps plus transceivers and into thin coax via T-connectors. Twisted-pair does not allow mid-cable stations; it is only used for single host-to-hub connections. All three formats can interconnect, although only through repeaters, and all use the same 10 megabit/sec transmission speed (a megabit is 1,000,000 bits, not 2^20 bits). Coax cable must have a terminating resistor at each end; this is implicit in twisted-pair cable.

Here is the format of a typical Ethernet packet (DIX specification)

     +---------------+--------------+----+---------------------+-----+
     |   dest addr   |   src addr   |type|        data         | CRC |
     +---------------+--------------+----+---------------------+-----+

The destination and source addresses are 48-bit quantities; the type is 16 bits, the data length is variable, and the final CRC checksum is 32 bits. The checksum is added by the ethernet hardware, never by the host software. There is also a preamble: a block of 1 bits followed by a 0, in the front of the packet, for synchronization.

The type field identifies the next higher protocol layer; here are some sample values: 8137 = IPX, 0800 = IP, etc.

Each Ethernet board has a (hopefully unique) physical address in ROM; by default any packet sent to this address will be received by the board and passed up to the host system. Packets addressed to other physical addresses will be seen by the board, but ignored (by default). All Ethernet devices also agree on a broadcast address of all 1's: a packet sent to the broadcast address will be delivered to all attached hosts. It is often possible to change the physical address of a given board in software; it is also generally possible to put a given board into promiscuous mode, which means that all packets on the net, no matter what the destination address, are delivered to the attached host. This mode was originally there for diagnostic purposes, but is now best known for the security breach it opens: it is not unusual to find a host with network board in promiscuous mode and with a process collecting the first 100 bytes (presumably including userid and password) of every telnet connection.

Another category of addresses is multicast.

The first bit of the physical address indicates whether the address is physical or multicast; the second bit is supposed to indicate, in the case of physical addresses, whether the address is supposed to be globally unique or if it is only locally unique. Some proposals for TCP would require that hosts have globally unique "Endpoint IDentifiers", or EIDs; the Ethernet physical address would qualify only if it were globally unique. When Ethernet IDs are assigned by the manufacturer, the first three bytes serve to indicate the manufacturer.

The diameter of an ethernet is the maximum distance between any pair of stations. Note that the actual total length of cable can be much greater than this, if, for example, the topology is a "star" configuration. The maximum allowed diameter, measured in bits, is limited to 232 (a sample "budget" for this is below). This makes the round-trip-time 464 bits. Adding 48 bits for the maximum "jam time" gives 512 bits (64 bytes) as the slot time of an Ethernet. The value of the slot time determines several subsequent aspects of Ethernet. If a station has transmitted for one slot time, then no collision can occur (unless there is a hardware error) for the remainder of that packet. This is because one slot time is enough time for any other station to have realized that the first station has started transmitting, so after that time they will wait for the first station to finish. Thus, after one slot time a station is said to have acquired the network. The slot time is also used as the basic interval for retransmission scheduling, below.

Delay budget:

Here are typical maximum values for the delay due to various components. These are taken from the Digital-Intel-Xerox (DIX) standard of 1982, except that "point-to-point link cable" is replaced by standard cable. The DIX specification allows 1500m of coax with two repeaters and 1000m of point-to-point cable; I've used 2500m of coax + 4 repeaters here, following the later IEEE 802.3 Ethernet specification. I've also simplified someof the more obscure delays. These are one-way delay times, in bits. The maximum path may have four repeaters, and ten transceivers each with drop cable (two per repeater, plus one at each endpoint).

item            size    delay           explanation
coax            2500M   110 bits        23 bits/meter (.77c)
transceiver
cables          500M     25 bits        19.5 bits/meter (.65c)
transceivers             40 bits        max 10, 4 bits each
repeaters                25 bits        max 4,  6+ bits each (DIX 7.6.4.1)
encoders                 20 bits        max 10, 2 bits each
(generate signal)
Total                   220 bits        should be 232

Some of these are high, but there are also signal rise time delays, sense delays, and timer delays that I've omitted. It works out fairly closely.

If you were designing a new CSMA/CD protocol, you would choose the physical configuration rules (which determine the network diameter and thus the slot time) and the minimum packet size so that slot_time <= min_packet_size. For example, if you took the physical cabling rules for 10mbps ethernet and tried to upgrade the speed tenfold to 100mbps, you would need to require a minimum packet size of at least 640 bytes (because the time to send 512 bits has now become the time needed to send 5120 bits). If this were too large, you would have to reduce the network diameter.

The signal loss in any single segment of cable is limited to 8.5 db, or about 14% of original strength. Repeaters will restore the signal to its original strength. The reason for the per-segment length restriction is that Ethernet requires a strict limit on how much the remote signal can be allowed to lose strength. It is possible for a station to detect and reliably read very weak remote signals, but not at the same time that it is transmitting locally. This is exactly what must be done, though, for collision detection: remote signals must arrive with sufficient strength to be heard even while the receiving station is itself transmitting. Note that the per-segment limit, then, has nothing to do with the overall length limit; the latter is set only to ensure that a sender is guaranteed of detecting a collision, even if it sends the minimum-sized packet.

If a collision is going to occur, it will occur within one slot time (more exactly, within 2*diameter) of the network. This gives time for the initial sender's signal to reach the second participant's, and for the collision to return to the initial sender. As a specific example, consider A and B, 5 units apart. A sends "helloworld!" at T=0; B starts sending just as A's message arrives, at T=5. B has listened before transmitting, but A's signal was not yet evident. A doesn't discover the collision until 10 units have elapsed, = 2*distance.

A                   B
|---|---|---|---|---| T=-1 Idle
h---|---|---|---|---| T=0 A begins to send
e---h---|---|---|---| T=1
l---e---h---|---|---| T=2
l---l---e---h---|---| T=3
o---l---l---e---h---| T=4
---o---l---l---e---hh T=4.99 just before collision
w---o---l---l---e---X T=5 COLLISION!
o---w---o---l---X---X T=6
r---o---w---X---X---X T=7
l---r---X---X---X---X T=8
d---X---X---X---X---X T=9
X---X---X---X---X---X T=10 A detects the collision

A corollary is that a station that has transmitted for one slot time without collision is assured that there is no further risk of collision; in this sense such a transmitter is said to have acquired the cable.

The Ethernet minimum packet size is 64 bytes, or one slot time; a station transmitting a packet this size is assured that if a collision were to occur, the sender would detect it (and be able to apply the retransmission algorithm, below). Smaller packets might collide and yet the sender not know it.

Implicit in the delay budget table above is the size of a bit. The speed of propagation in copper is about .77*c, where c=3x10^8 m/sec is the speed of light in vacuum. So, in 1/10 microseconds (the time to send one bit at 10 mbps), the signal propagates approximately .77*c*1E-7 = 23 meters.

Ethernet packets also have a maximum packet size, of 1500 bytes. This limit is only for the sake of fairness, so one station cannot unduly monopolize the cable. Past "rogue Ethernet" specifications (manufacturer's "value-added" "enhancements") have enlarged the maximum packet size to as much as 4k. There is no reason, actually, not to do this.

Here is the Ethernet retransmission algorithm:
1. Listen before transmitting ("carrier detect")
2. If line is busy, wait for sender to stop and then wait an additional 9.6 microseconds (96 bits). One consequence of this is that there is always a 96-bit gap between packets, so packets do not run together.
3. Transmit while simultaneously monitoring for collisions
4. If a collision does occur, send the jam signal, and choose a waiting time as follows: For transmission N, 1<=N<=10, [N=0: original attempt], choose k randomly between 0 and 2^N-1. Wait k slot times (k*51.2 microsec; a slot time (the time to send 512 bits) is 51.2 microsec). Then check if the line is idle, waiting if necessary for someone else to finish, and then retry step 3. For 11<=N<=15, choose k randomly between 0 and 1023 (= 2^10-1)

5. If we reach N=16 (16 transmission attempts), give up.

Exponential backoff means that if two hosts have waited for a third to finish and transmit simultaneously, and collide, then when N=1 they have a 50% chance of recollision; when N=2 there is a 25% chance, etc. When N>=10 the maximum wait is 52 milliseconds; without this cutoff the max wait at N=15 would be 1.5 seconds. Note that, as indicated above in the min-packet-size discussion, this retransmission strategy assumes that the sender detects the collision while
it is still sending, so it knows that the packet must be resent.

Note that this algorithm is not "fair", in the sense that the longer a station has been waiting to send, the lower its priority sinks. Newly transmitting stations with N=0 need not delay at all.

Capture effect

Suppose A and B are sending, and each host is fast enough to generate a constantly nonempty queue. Let A and B wait for some third-party packet to finish. Then both transmit, and collide. Suppose A wins, and sends. When A is finished, B tries to transmit again. But A has a second packet, and so A tries too. A chooses a backoff k between 0 and 1 (inclusive), but since B is on its second attempt it must choose k between 0 and 3. This means A is favored to win. Suppose it does. After that transmission is finished, A and B try yet again: A on its 1st attempt for its 3rd packet, and B on its 3rd attempt for its 1st packet. Now A chooses k<2 but B must choose k<8.

As time goes on, if B fails to "win" a given backoffs, its probability of winning the next one is reduced by 1/2. It is quite possible, and does occur in practice, for B to lose all the backoffs until N=16 is reached. At this point B simply discards the packet and goes on to the next one with N reset to 1 and k chosen from {0,1}.

SQE: Signal Quality Error: optional test of collision-detection. Available option on some boards; causes trouble w. some equipment.

Late Collision errors: some host's network interface is broken, and it transmitted even when it should have been able to detect that someone else was transmitting.

Repeaters and topology

Repeaters change the topology, but not the fundamental constraints. 10baseT (twisted pair ethernet) uses repeaters heavily; a repeater with multiple connections (typically 24) is caled a hub. With twisted pair, a device can only connect to the endpoint of the wire. Thus, typically, each host is connected directly to a hub

physical issues with CSMA/CD: S/N v. S/Local
max length of a single segment is related only to
signal weakening along the length of that segment:
sender must be able to detect collisions occurring
*while* it is sending. This requires a certain minimum
strength for the remote signal.

max length of *multiple* segments, joined by repeaters,
is constrained by the round-trip-time, and the need
to detect collisions before the sender has completed sending.

what perils befall packets: garbled, lost, delayed, misaddressed
Errors: collision, buffer overflow, CRC error

CSMA persistence
A transmission strategy is said to be nonpersistent if, when the line is busy, it waits a random time.
A strategy is p-persistent if, after waiting for the line to clear, the sender sends with probability p<=1. Ethernet uses 1-persistence.
A consequence of 1-persistence is that, if
10 stations waiting for line to clear:
all stations would then transmit at once; collision
would be certain. Exponential backoff would then
come into play. Stations that had waited longer would
have backed off to larger timeouts, and would thus
have *lower* priority than stations making 1st attempts.

stability of exponential backoff; broadcast storms

Analysis of ethernet
ALOHA model: pure contention. Slot time T = time to send packet
If two packets overlap, they are lost.
A successful transmission requires everyone else quiet for 2T.
S = expected # of successful packets/slot
G = expected # of attempts/slot
N hosts, transmitting at rate s packets/slot time
So, G = Ns
Prob of one station transmitting successfully (others quiet for 2T):
S = Ns*(1-s)^2(N-1) = G(1-G/N)^(2N) ->G*exp(-2G)
Max of S v G is at G=0.5, S=1/2e.
That is, for time T, Prob(success) = 1/2e.
So, for time 2e*T, Prob(success) = 1; hence, at this max,
we expect about 2e*T time between successful packets.

A given S may be achieved at either of 2 G's; instability

Ethernet model:
Let T = bit length of net; eg 100 bits for 2500 m at 2.5e8 m/sec
The ALOHA model applies (roughly) during the contention period,
before a packet has acquired the net:

|contention|..packet..|...contention...|packet|contention|..packet..|

Once net is acquired, transmission is uninterrupted.
Avg length of contention, at *max* throughput: 2e*T (from ALOHA)
minimum possible time (at *any* t-put) between packets: avg 2e*T
Let P = time to send entire packet/T = avg packet size in units of T
Total time to send packet: contention time + P = 2e+P
maximum bandwidth: P/(2e+P), not achievable due to instability
Typical P: 100 byte packets, T=25 bytes: P=4, BW=42%
500 byte packets: P=20, BW=78% P=10, BW=67%
100 mbps: BW *plummets* (P shrinks as measured in units of T)

10 mbps: ********** ********** **********
100 mbps: * * * * *
****: packet
(packet & contention-period lengths measured in absolute time)

Bridges and Adaptive Bridges; cycles; scalability
Bridges join separate physical ethernets.
(or ethernets and token rings)
Packets are propagated (unless the address is recognized
as belonging only to one side of the device)
but collisions are *not*.

Bridges learn addresses as follows. A bridge has several interfaces.
For each interface, the bridge maintains a table of physical addrs
known to be reached through that interface. Initally these are empty.
When a packet arrives, the bridge examines the source address S.
The bridge infers that S is reached through the interface by
which it arrives, and so enters S into the table for that interface.

Forwarding decisions are then made based on the destination addr, D.
If D is found in the table for the interface by which the packet
arrived, the bridge knows that the packet is local to that
interface and does *not* need to be forwarded.

If D is found in the table for another interface, the bridge
forwards the packet on that interface only.

If D is not found at all, only then must the packet be
forwarded on all interfaces. Broadcast traffic always
is in this category, too.

Once all the bridges have learned where all the hosts are,
packet routing is optimal. Packets are never sent on links
unnecessarily; a packet from A to B only travels those links
that form the unique path from A to B. (Paths must be unique
because bridged networks cannot have loops).

Limit to total size: total traffic
Limits to size: b'cast, table sizes (10^4 v. 10^6)
Cannot use loop topology
Delay (we don't want packets arriving late)
bridges & security: other parties cannot listen in.