Computer Networks Week 9 Mar 25 Corboy Law 522
TCP sliding windows: mostly the same as "normal" sliding windows, done earlier
- basic outline
- examples, window-size flow control
- fast sender, slow receiver
- fast receiver, slow sender
- Keeping the pipe full: bandwidth×delay! p. 301
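A quick worked example of the bandwidth×delay computation (the link speed, RTT, and packet size are made-up illustration values):

```python
# Keeping the pipe full: the window must cover the bandwidth x delay product.
# Assumed example values (not from any particular link):
bandwidth_bps = 100_000_000              # 100 Mbit/sec link
rtt_sec = 0.040                          # 40 ms round-trip time
bdp_bytes = bandwidth_bps * rtt_sec / 8  # bits -> bytes
packets = bdp_bytes / 1000               # with 1000-byte packets
print(round(bdp_bytes), round(packets))  # 500000 500
```

So with these numbers the sender needs a 500-packet window just to keep the link busy.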
Final-ACK problem:
what if the final ACK is lost? The other side will resend its final
FIN, but there will be no one left to answer! This is solved with the
TIMEWAIT state.
Old late duplicates problem:
Suppose a connection between the same pair of ports is closed and
promptly reopened. Sometime during the first connection, a packet is
delayed (and retransmitted). It finally arrives during the second
connection, at just the right moment that its sequence number fits into
the receive window of the receiver. (Example: ISN1 = 0, delayed packet
seq number = 8000, ISN2 = 5000, receiver is expecting relative sequence
number of 3000 when the old packet arrives.)
TIMEWAIT to the rescue again!
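The arithmetic in the example can be sketched as a window-membership check (the 4096-byte window size and the helper name are assumptions for illustration):

```python
# Does an old segment's sequence number land inside the new connection's
# receive window? Sequence arithmetic is modulo 2^32, as in TCP.
def in_window(seq, expected, wnd=4096, mod=2**32):
    return (seq - expected) % mod < wnd

ISN2 = 5000
expected = ISN2 + 3000   # receiver expects relative sequence number 3000
old_seq = 8000           # delayed segment from the first connection
print(in_window(old_seq, expected))   # True: the stale data would be accepted
```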
Other problems:
Duplicate request problem: How does the server side distinguish between two requests and one request?
Reboot problem: How does one side tell if something received is part of a previous session created before a reboot?
What a connection is: machine state at each endpoint
TCP should handle:
- lost packets
- damaged packets
- reordered packets
- duplicated packets
- widely varying delay
ISN rationale 1: old late duplicates
ISN rationale 2: distinguishing new SYN from dup SYN
From Dalal & Sunshine's original paper on the TCP 3-way handshake:
2-way handshake: can't confirm both ISNs; the second ISN is never acknowledged
4-way handshake:
1 --SYN->
2 <-ACK--
3 <-SYN--
4 --ACK->
This FAILS if the first SYN is very, very old! The ACK at line 2 is ignored
by its receiver. The LHS thinks the SYN on line 3 is a new request, and so
it ACKs it. It would then send its own SYN (on what would be line 5), but
that would be ignored. At this point A and B have different notions of ISNA.
3-way handshake: good
half open:
one side has crashed. This is discovered when the other side sends a
message. It may take a while, if the protocol requires that the other
side simply wait passively.
half closed: one side (but not the other) has sent its FIN.
simultaneous open
Anomalous TCP scenarios
Duplicate SYN (cf duplicate RRQ in the TFTP protocol)
recognized because of the same ISN
Loss of final ACK (cf TFTP)
any resent FIN will receive a RST in response
Old segments arriving for new connection
solved by TIMEWAIT
Sequence number wraparound (WRAPTIME < MSL)
Note WRAPTIME = time to send 4 GB
WRAPTIME = 100 sec => 40 MBytes/sec => >300 Mbits/sec
Client reboots (application restart isn't an issue)
Could an old connection be accepted as new?
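The WRAPTIME figures above can be checked directly (taking the full 32-bit sequence space, 2^32 bytes, as the 4 GB):

```python
# WRAPTIME = time to send one full 32-bit sequence space of data.
SEQ_SPACE = 2**32   # bytes before the sequence numbers wrap

def wraptime_sec(rate_bits_per_sec):
    return SEQ_SPACE / (rate_bits_per_sec / 8)

# Rate at which WRAPTIME falls to 100 seconds: a bit over 300 Mbits/sec.
rate = SEQ_SPACE * 8 / 100
print(round(rate / 1e6))   # 344 (Mbits/sec)
```

Above roughly this rate, wraparound within one MSL becomes a real concern.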
TFTP/WUMP:
TFTP is the standard Trivial File Transfer Protocol; WUMP is a windowing version due to me.
- basic strategy
- dallying
- RRQ/REQ
- port change
- State view: differences between UNLATCHED, LATCHED, and DALLY
Client sends REQ to port 69
Server chooses new data port, eg 2000; sends DATA[1] from it
Client latches on to new port, sends ACK[1] to new port
Server sends DATA[2]
...
Server sends final DATA[N]
Client sends ACK[N]
Client enters Dally state
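The latch-on sequence above, as client-side logic in miniature (the state names follow the notes; the packet representation is invented for illustration, not the real TFTP wire format):

```python
# Illustrative WUMP/TFTP client latching logic.
UNLATCHED, LATCHED, DALLY = "UNLATCHED", "LATCHED", "DALLY"

class Client:
    def __init__(self):
        self.state = UNLATCHED
        self.server_port = None        # learned from the first DATA[1]

    def on_data(self, src_port, block, is_final):
        if self.state == UNLATCHED and block == 1:
            self.server_port = src_port   # latch on to the new data port
            self.state = LATCHED
        if src_port != self.server_port:
            return ("ERROR", src_port)    # wrong source: send ERROR
        if is_final:
            self.state = DALLY            # wait out a possible resent DATA[N]
        return ("ACK", block)

c = Client()
print(c.on_data(2000, 1, False))   # ('ACK', 1): latched to port 2000
print(c.on_data(666, 1, False))    # ('ERROR', 666): rejected
print(c.on_data(2000, 2, True))    # ('ACK', 2): state is now DALLY
```

Note that whichever port sends DATA[1] first wins the latch, which is exactly what scenarios 1 and 6 below exploit.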
TFTP/WUMP scenarios:
1. duplicate REQ
If the first REQ is sent twice (perhaps
due to a client timeout), two child processes start on the server. The
one that the client receives DATA[1] from first is the one the client
will "latch on" to; the other should be sent an ERROR packet.
2. lost final ACK
This is addressed with the DALLY state
3. old late duplicates
This is addressed by having EITHER side (preferably both) choose a new port number for each "connection" (ie transfer).
4. Sequence number wrap: not allowed (but do we check?)
5. Two scenarios where we get something other than requested:
   1. Lost REQ:
      REQ "foo" (received by the server, but the response is delayed)
      <abort>
      REQ "bar" (lost, but now the delayed DATA[1] for "foo" arrives, and is latched onto)
   2. Malicious flood of DATA[1].bad from a bad guy's port (eg 666):
      <---- DATA[1].bad from 666
      <---- DATA[1].bad from 666
      <---- DATA[1].bad from 666
      <---- DATA[1].bad from 666
      REQ "good" ----> server; server creates child for "good"
      <---- DATA[1].bad from 666      LATCH! (client latches to port 666)
      <---- DATA[1].good from server  (too late; not the latched port)
      <---- DATA[1].bad from 666
      <---- DATA[1].bad from 666
      <---- DATA[1].bad from 666
Remote Procedure Call (RPC)
RPC is the name given to any request-reply protocol.
SunRPC
Client sends a REQ
Server responds with REPLY
If no REPLY is received, client retransmits the REQ
Timelines:
Lost REPLY
Lost REQ
A lost REPLY is a problem: the client retransmits the REQ, and in
general it cannot tell whether the server executed the request once or
twice. This is famously known as "at-least-once" semantics,
pejoratively known as "at-least-zero".
The server can not cache REPLY
messages, because it has no idea how long to keep them. If the client
sent a final ACK, the server could keep REPLY messages until the ACK
was received.
SunRPC works well for settings in which requests are idempotent: executing twice is the
same as executing once. Here are a few idempotent operations:
read block 318 of a file
write block 29
return an open-file handle for reading file "foo"
open file "foo" for writing, creating it if necessary
validate user "pld" / password "foo"
On the other hand, some operations are fundamentally non-idempotent:
read (or write) the next block of a file
make new directory "foo"
make new file "foo"; fail if "foo" already exists
lock file "foo"
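The idempotency distinction in miniature (a toy block store; all names are illustrative):

```python
# "write block 29" with explicit data is idempotent; "append" is not.
blocks = {}

def write_block(n, data):   # idempotent: repeating it leaves the same state
    blocks[n] = data

log = []
def append(data):           # non-idempotent: each call changes the state
    log.append(data)

write_block(29, "hello")
write_block(29, "hello")    # retransmitted request: harmless
append("hello")
append("hello")             # retransmitted request: duplicated!
print(blocks[29], len(log)) # hello 2
```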
Perhaps the most widespread use of SunRPC was the Network File System, or
NFS. Sun defined special semantics for many file operations to make as
many operations as possible idempotent. The end result was a system
where, without special programming, a server crash left the clients
just waiting until the server came back up, whereupon they would
resume. But lack of file locks was a problem.
DCE-RPC
This adds the following to the REQ/REPLY model:
client sends PING; server responds WORKING (for long requests)
if the REQ had never been received, the server replies NoCall
if the REPLY had been sent previously, the server retransmits its
cached copy
client sends ACK when it receives REPLY; on receipt of ACK, the
server deletes its cached REPLY
Finally, DCE-RPC allows at most one outstanding REQ at a time, for any
single request-reply channel (called an activity).
The client is responsible for most timeouts. It sends the REQ (usually
via UDP) and sets a timer. When that timer expires, it sends PING.
Eventually it gets a REPLY, and it sends an ACK (which may piggyback on
the next REQ).
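A sketch of that client-side behavior as an event-driven state machine (the message names follow the notes; the class, callbacks, and transport hook are invented for illustration):

```python
# Simplified DCE-RPC client: REQ, then PING on timeout, then ACK on REPLY.
class RpcClient:
    def __init__(self, send):
        self.send = send             # function(msg) that transmits
        self.pending = None          # at most one outstanding REQ per activity

    def call(self, req_id):
        self.pending = req_id
        self.send(("REQ", req_id))   # a real client would also start a timer

    def on_timeout(self):
        if self.pending is not None:
            self.send(("PING", self.pending))   # "are you still working?"

    def on_reply(self, req_id):
        if req_id == self.pending:
            self.pending = None
            self.send(("ACK", req_id))  # lets the server delete its cached REPLY

sent = []
c = RpcClient(sent.append)
c.call(1)
c.on_timeout()    # no REPLY yet: PING instead of resending REQ
c.on_reply(1)
print(sent)       # [('REQ', 1), ('PING', 1), ('ACK', 1)]
```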
Note how all this ensures that each REQ is executed exactly once.
If a client wants multiple outstanding requests, it must open multiple
activity channels. A classic example is NFS writes: physical disk
hardware does not do writes in FIFO order. Instead, it typically
executes writes in "elevator algorithm" order, as the write head scans
first up and then down the disk tracks. Waiting for a single write to
complete would mean that we give up many opportunities for writes later
in the queue to complete first. The solution is to have a dozen, say,
write channels; each client-side write goes to the next free channel.
DCE-RPC also supports its own fragmentation-reassembly strategy, to
support large message sizes. If a large message requires fragmentation
(and most disk IO operations involve 8K or even 16K blocks), then
selective Fragment ACKs (FACKs) are built into DCE-RPC. The client does
not have to wait for the PING timeout to discover that one of the
fragments was lost (though it does
have to wait this long to discover that all fragments have been lost).
Transport Problems
Discussion of how TCP, TFTP, and DCE-RPC deal with lost-final-ACK, old-late-duplicates, duplicate-request, and reboot.
6.2.5: TCP timeout & retransmission
original adaptive retransmission: TimeOut = 2*EstRTT,
EstRTT = α*EstRTT + (1-α)*SampleRTT, for fixed α, 0<α<1 (eg
α=1/2 or α=7/8)
For α≃1 this is very conservative
(EstRTT is slow to change). For α≃0, EstRTT is very volatile.
RTT measurement ambiguity: if a packet is sent twice, is the ACK in
response to the first transmission or the second?
Karn/Partridge algorithm:
on packet loss (and retransmission)
- Double Timeout
- Stop recording SampleRTT
- Use doubled Timeout as EstRTT when things resume
Jacobson/Karels algorithm
for calculating the TimeOut value, where SampleDev = |SampleRTT − EstRTT|:
EstRTT = α*EstRTT + (1-α)*SampleRTT
EstDev = α*EstDev + (1-α)*SampleDev
TimeOut = EstRTT + 4*EstDev
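The Jacobson/Karels update in runnable form (α = 7/8 assumed; the deviation sample is computed against the old EstRTT before the update):

```python
# One round of the Jacobson/Karels timeout computation.
def update(est_rtt, est_dev, sample_rtt, alpha=7/8):
    sample_dev = abs(sample_rtt - est_rtt)          # uses the OLD EstRTT
    est_rtt = alpha * est_rtt + (1 - alpha) * sample_rtt
    est_dev = alpha * est_dev + (1 - alpha) * sample_dev
    return est_rtt, est_dev, est_rtt + 4 * est_dev  # new TimeOut

est_rtt, est_dev = 100.0, 0.0
for s in (100, 100, 180):    # steady samples, then one sudden slow sample
    est_rtt, est_dev, timeout = update(est_rtt, est_dev, s)
print(round(est_rtt), round(timeout))   # 110 150
```

The 4×EstDev term is what makes the timeout jump quickly when the RTT becomes erratic, even before EstRTT itself has moved much.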
TCP timers:
- TimeOut
- 2*MSL Timewait
- persist: sender polls receiver when windowsize = 0
- keepalive
Path MTU Discovery
Covered Week 8
Routinely part of TCP implementations now
Uses the IP DONT_FRAG bit, and the ICMP Frag Needed / DF Set response
Simple packet-based sliding-windows algorithm
receiver-side
window size W
Next packet expected N; window is N ... N+W-1
Generic strategy
We have a pool EA of "early arrivals": packets buffered for future use.
When packet M arrives:
if M<N or M≥N+W, ignore
if M>N, put the packet into EA.
if M=N,
output the packet (packet N)
K=N+1
slide window forward by 1
while (packet K is in EA)
output packet K
slide window forward by 1 (to start at K+1)
There are a couple of details left out.
Specific implementation:
bufs[]: array of size W. We always put packet M into position M % W
As before, N represents the next packet we are waiting for.
At any point between packet arrivals, packet slot N is empty, but some
or all of N+1 .. N+W-1 may be full.
Suppose packet M arrives.
1. M<N or M≥N+W: ignore
2. otherwise, put packet M into bufs[M%W]
3. while (bufs[N % W] has a valid packet) {
write it
N++
}
If M!=N, this loop will do nothing.
But if M==N, we will write packet N and any further saved packets, and slide the window forward.
4. Send a cumulative acknowledgement of all packets up to but not
including (the current value of) N; this is either ACK[N] or ACK[N-1]
depending on protocol wording
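The receiver-side implementation above, directly in Python (W = 4 assumed; delivery and ACKing are simulated rather than done over a network):

```python
W = 4
bufs = [None] * W   # bufs[M % W] holds early-arrival packet M, or None
N = 0               # next packet expected
delivered = []      # stands in for writing packets to the application

def arrive(M, data):
    global N
    if M < N or M >= N + W:
        return                   # outside the window: ignore
    bufs[M % W] = (M, data)      # step 2: buffer packet M at M % W
    while bufs[N % W] is not None:
        delivered.append(bufs[N % W][1])   # step 3: output packet N
        bufs[N % W] = None
        N += 1                   # slide the window forward
    # step 4 would send a cumulative ACK covering everything below N

arrive(1, "b"); arrive(2, "c")   # early arrivals: buffered, not delivered
arrive(0, "a")                   # fills the gap: all three come out
print(delivered, N)              # ['a', 'b', 'c'] 3
```

Note that only packet N itself can occupy slot N % W within the window, so checking the slot for None suffices.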
sender side:
We will assume ACK[M] means all packets <=M have been received (second option immediately above)
W = window size, N = bottom of window
window is N, N+1, ..., N+W-1
init: N=0. Send full windowful of packets 0, 1, ..., W-1
Arrival of Ack[M]:
if M < N or M≥N+W, ignore
otherwise:
set Last = N+W-1 (last packet sent)
set N = M+1.
for (i=Last+1; i<N+W; i++) send packet i
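The sender-side algorithm above in the same runnable style (W = 4 assumed; "sending" just records packet numbers):

```python
W = 4
N = 0                    # bottom of window
sent = list(range(W))    # init: send full windowful, packets 0 .. W-1

def on_ack(M):           # ACK[M] = all packets <= M received
    global N
    if M < N or M >= N + W:
        return           # stale or impossible ACK: ignore
    last = N + W - 1     # last packet sent so far
    N = M + 1            # slide window bottom past the ACKed packets
    for i in range(last + 1, N + W):
        sent.append(i)   # send the newly permitted packets

on_ack(1)                # cumulative ACK for packets 0 and 1
print(N, sent)           # 2 [0, 1, 2, 3, 4, 5]
```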
Some TCP notes
First, if a TCP packet arrives outside the receiver window, the
receiver sends back its current ACK. This is required behavior. We
discard the packet, but we don't completely ignore it.
Second, the TCP window size fluctuates. Thus, the pool EA must be more abstract than simply keeping track of positions modulo W.
Third, TCP senders do not send a full window, ever. TCP has something called "slow start" to prevent this.
Routing
LinkState
Linkstate routing is an alternative to distance-vector. In distance-vector, each node keeps a minimum of network topology information; in linkstate, each node keeps a maximum: a full map of all nodes and all links.
4.2.3: Link-state routing and SPF
Whenever either side of a link notices it has died (or if a node
notices that a new link has become available), it sends out LSPs
(Link State Packets) that "flood" the network. This is called reliable flooding;
note that in general broadcast protocols work poorly with networks that
have even small amounts of topological looping (redundant paths).
Flooding algorithm: new messages are sent on over all links except the
arriving interface. Each node maintains a database of all messages
received. LSPs have sequence numbers, and a message is new if its
sequence number is larger than any seen so far.
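A minimal sketch of the flooding rule on a three-node ring, which has exactly the kind of redundant path that defeats naive broadcast (the class design and names are invented; real reliable flooding also uses per-link ACKs and LSP age fields):

```python
# Forward a new LSP on every link except the arriving one; drop it if
# its sequence number is not larger than any already seen from that origin.
class Node:
    def __init__(self, name):
        self.name = name
        self.links = {}      # neighbor name -> Node
        self.seen = {}       # origin -> highest LSP sequence number seen
        self.received = []   # LSPs accepted into the database

    def flood(self, origin, seq, from_nbr=None):
        if self.seen.get(origin, -1) >= seq:
            return           # old or duplicate LSP: stop here
        self.seen[origin] = seq
        self.received.append((origin, seq))
        for nbr, node in self.links.items():
            if nbr != from_nbr:
                node.flood(origin, seq, from_nbr=self.name)

a, b, c = Node("A"), Node("B"), Node("C")
a.links = {"B": b, "C": c}
b.links = {"A": a, "C": c}
c.links = {"A": a, "B": b}
a.flood("A", 1)
print(b.received, c.received)   # each node records the LSP exactly once
```

The sequence-number check is what stops the LSP from circulating forever around the A-B-C loop.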
It is important that LSP sequence numbers not wrap around. One classic
solution is lollipop sequence-numbering: the sequence space begins with
a one-way linear segment and only then becomes circular, so a node
restarting after a reboot can begin at the bottom of the linear segment
without ambiguity.