Computer Networks Week 9   Mar 25  Corboy Law 522


               
TCP sliding windows: mostly the same as "normal" sliding windows, done earlier

   
Final-ACK problem: what if the final ACK is lost? The other side will resend its final FIN, but there will be no one left to answer! This is solved with the TIMEWAIT state.

Old late duplicates problem: Suppose a connection between the same pair of ports is closed and promptly reopened. Sometime during the first connection, a packet is delayed (and retransmitted). It finally arrives during the second connection, at just the right moment that its sequence number fits into the receive window of the receiver. (Example: ISN1 = 0, delayed packet seq number = 8000, ISN2 = 5000, receiver is expecting relative sequence number of 3000 when the old packet arrives.)

TIMEWAIT to the rescue again!
 
Other problems:

Duplicate request problem: How does the server side distinguish between two requests and one request?

Reboot problem: How does one side tell if something received is part of a previous session created before a reboot?

What a connection is: machine state at each endpoint
TCP should handle:
 
     
ISN rationale 1: old late duplicates
       
ISN rationale 2: distinguishing new SYN from dup SYN
 
From Dalal & Sunshine's original paper on the TCP 3-way handshake:
     
2-way handshake: can't confirm both ISNs; the second ISN is never acknowledged
             
4-way handshake:
            1    --SYN->
            2    <-ACK--
            3    <-SYN--
            4    --ACK->
This FAILS if the first SYN is very old! The ACK at line 2 is ignored by its receiver. The LHS thinks the SYN on line 3 is a new request, and so it acks it. It would then send its own SYN (on what would be line 5), but that would be ignored. At this point A and B have different notions of ISNA.
         
3-way handshake: good
 



half open: one side has crashed. This is discovered when the other side sends a message. It may take a while, if the protocol requires that the other side simply wait passively.

half closed: one side (but not the other) has sent its FIN.

simultaneous open
     
                               

   

Anomalous TCP scenarios

Duplicate SYN           (cf Duplicate RRQ in the TFTP protocol)
                recognized because of same ISN

Loss of final ACK       (cf TFTP)
                while in TIMEWAIT, the receiver of a resent FIN re-sends its ACK; once TIMEWAIT ends, a resent FIN will receive a RST in response

Old segments arriving for new connection
                solved by TIMEWAIT

Sequence number wraparound (WRAPTIME < MSL)
                Note WRAPTIME = time to send 4 GB
                WRAPTIME = 100 sec => ~40 MBytes/sec => >300 Mbits/sec
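The WRAPTIME arithmetic above can be checked directly; this is a small sketch (the function name is made up) computing how long TCP's 32-bit sequence space lasts at a given throughput:

```python
# Sketch: time for TCP's 32-bit (4 GB) sequence space to wrap at a given
# bandwidth. At roughly 40 MBytes/sec (>300 Mbit/s), WRAPTIME falls to
# about 100 seconds, comparable to typical MSL values.
def wraptime_seconds(bytes_per_sec):
    return 2**32 / bytes_per_sec       # seconds to send 4 GB

print(round(wraptime_seconds(40e6)))   # ~107 sec at 40 MBytes/sec
```

If WRAPTIME drops below the MSL, an old segment's sequence number can reappear within a live window, which is exactly the wraparound hazard noted above.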

Client reboots (application restart isn't an issue)
                Could an old connection be accepted as new?
                
 


    
TFTP/WUMP:    
TFTP is the standard Trivial File Transfer Protocol; WUMP is a windowing version due to me.

Client sends REQ to port 69
Server chooses new data port, eg 2000; sends DATA[1] from it
Client latches on to new port, sends ACK[1] to new port
Server sends DATA[2]
...
Server sends final DATA[N]
Client sends ACK[N]
Client enters Dally state
 
TFTP/WUMP scenarios:
        
1.    duplicate REQ
         
If the first REQ is sent twice (perhaps due to a client timeout), two child processes start on the server. The one that the client receives DATA[1] from first is the one the client will "latch on" to; the other should be sent an ERROR packet.
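The client-side "latch" rule can be sketched as follows; the class and method names here are hypothetical, not part of the TFTP specification:

```python
# Sketch of the TFTP/WUMP client "latch" rule: accept DATA[1] from whichever
# server port answers first, and send ERROR to any other port that later
# claims the same transfer (e.g. the second child from a duplicate REQ).
class TftpClient:
    def __init__(self):
        self.server_port = None          # latched transfer port, once known
        self.errors_sent = []

    def on_data1(self, from_port):
        if self.server_port is None:
            self.server_port = from_port # latch on to the first responder
            return "ACK[1]"
        if from_port != self.server_port:
            self.errors_sent.append(from_port)
            return "ERROR"               # reject the other child process
        return "ACK[1]"                  # duplicate DATA[1] from latched port

c = TftpClient()
print(c.on_data1(2000))   # first responder: latched
print(c.on_data1(2001))   # second child: rejected
```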
         
2.    lost final ACK
This is addressed with the DALLY state
                 
         
3.    old late duplicates
This is addressed by having EITHER side (preferably both) choose a new port number for each "connection" (ie transfer).
         
4.    Sequence number wrap: not allowed (but do we check?)
         
5.    Two scenarios where we get something other than requested:
        1. Lost REQ:     REQ "foo" (received but response delayed)
                <abort>
                REQ "bar" (lost, but now delayed DATA1-foo arrives)
                
        2. Malicious flood of DATA[1].bad from a bad-guy's port:
         
                             BAD guy opens port 666
                       <---- DATA[1].bad from 666
                       <---- DATA[1].bad from 666
                       <---- DATA[1].bad from 666
                       <---- DATA[1].bad from 666
                       
                REQ "good" --> server
                        server creates good
 
                       <---- DATA[1].bad from 666    LATCH!
                         
                         
                    <----  server sends DATA[1].good
                     
                       <---- DATA[1].bad from 666
                       <---- DATA[1].bad from 666
                       <---- DATA[1].bad from 666



Remote Procedure Call (RPC)

RPC is the name given to any request-reply protocol.

SunRPC

Client sends a REQ
Server responds with REPLY
If no REPLY is received, client retransmits the REQ

Timelines:
    Lost REPLY
    Lost REQ

A lost packet is a problem: after a timeout and retransmission, the client cannot tell whether the REQ or the REPLY was lost, and so does not know whether the server executed the request once or twice. This is famously known as "at-least-once" semantics, pejoratively known as "at-least-zero".

The server cannot cache REPLY messages, because it has no idea how long to keep them. If the client sent a final ACK, the server could keep REPLY messages until the ACK was received.

SunRPC works well for settings in which requests are idempotent: executing twice is the same as executing once. Here are a few idempotent operations:
    read block 318 of a file
    write block 29
    return an open-file handle for reading file "foo"
    open file "foo" for writing, creating it if necessary
    validate user "pld" / password "foo"

On the other hand, some operations are fundamentally non-idempotent:
    read (or write) the next block of a file
    make new directory "foo"
    make new file "foo"; fail if "foo" already exists
    lock file "foo"
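The contrast between the two lists can be made concrete; this sketch uses a hypothetical in-memory "file server" to show why a duplicate REQ is harmless for one operation and harmful for the other:

```python
# Sketch contrasting an idempotent operation (write block 29) with a
# non-idempotent one (read the *next* block). A duplicate REQ leaves the
# first unchanged but silently advances the second.
class FileServer:
    def __init__(self, nblocks=32):
        self.blocks = [b""] * nblocks
        self.pos = 0                    # per-connection "next block" pointer

    def write_block(self, n, data):     # idempotent: repeating changes nothing
        self.blocks[n] = data

    def read_next_block(self):          # non-idempotent: repeating advances pos
        data = self.blocks[self.pos]
        self.pos += 1
        return data

fs = FileServer()
fs.write_block(29, b"hello")
fs.write_block(29, b"hello")            # duplicate REQ: state unchanged
print(fs.blocks[29])

fs.read_next_block()
fs.read_next_block()                    # duplicate REQ: skips a block!
print(fs.pos)
```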

Perhaps the most widespread use of SunRPC was the Network File System, or NFS. Sun defined special semantics for many file operations to make as many operations as possible idempotent. The end result was a system where, without special programming, a server crash left the clients just waiting until the server came back up, whereupon they would resume. But the lack of file locks was a problem.


DCE-RPC

This adds the following to the REQ/REPLY model:
    client: PING server: WORKING   (for long requests)
    if the REQ had never been received, the server replies NoCall
    if the REPLY had been sent previously, it retransmits its cached copy.

    client sends ACK when it receives REPLY. On receipt of ACK, the server deletes its cached REPLY.

Finally, DCE-RPC allows at most one outstanding REQ at a time, for any single request-reply channel (called an activity).

The client is responsible for most timeouts. It sends the REQ (usually via UDP) and sets a timer. When that timer expires, it sends PING. Eventually it gets a REPLY, and it sends an ACK (which may piggyback on the next REQ).

Note how all this ensures that each REQ is executed exactly once.
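The server side of this exchange can be sketched as below, under the simplifying assumption of one outstanding REQ per activity; the class and message names are illustrative, not DCE-RPC's actual wire format:

```python
# Sketch of DCE-RPC server behavior: cache the REPLY until the client's ACK,
# answer NoCall to a PING for an unseen REQ, and resend the cached REPLY
# for a duplicate REQ (so the handler runs exactly once per request).
class DceRpcServer:
    def __init__(self, handler):
        self.handler = handler
        self.cache = {}                 # activity -> (seq, cached REPLY)

    def on_req(self, activity, seq, arg):
        cached = self.cache.get(activity)
        if cached and cached[0] == seq:
            return cached[1]            # duplicate REQ: resend cached REPLY
        reply = self.handler(arg)       # execute exactly once
        self.cache[activity] = (seq, reply)
        return reply

    def on_ping(self, activity, seq):
        cached = self.cache.get(activity)
        if cached is None or cached[0] != seq:
            return "NoCall"             # REQ never arrived
        return "Working-or-Reply"       # simplified: REQ is known here

    def on_ack(self, activity, seq):
        cached = self.cache.get(activity)
        if cached and cached[0] == seq:
            del self.cache[activity]    # safe to discard the cached REPLY

calls = []
srv = DceRpcServer(lambda x: calls.append(x) or x * 2)
print(srv.on_ping("a1", 1))   # NoCall: REQ not yet seen
print(srv.on_req("a1", 1, 21))
print(srv.on_req("a1", 1, 21))  # duplicate: cached REPLY, handler not rerun
print(calls)                    # handler executed exactly once
srv.on_ack("a1", 1)             # server now frees the cached REPLY
```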

If a client wants multiple outstanding requests, it must open multiple activity channels. A classic example is NFS writes: physical disk hardware does not do writes in FIFO order. Instead, it typically executes writes in "elevator algorithm" order, as the write head scans first up and then down the disk tracks. Waiting for a single write to complete would mean that we give up many opportunities for writes later in the queue to complete first. The solution is to have a dozen, say, write channels; each client-side write goes to the next free channel.

DCE-RPC also supports its own fragmentation-reassembly strategy, to support large message sizes. If a large message requires fragmentation (and most disk IO operations involve 8K or even 16K blocks), then selective Fragment ACKs (FACKs) are built into DCE-RPC. The client does not have to wait for the PING timeout to discover that one of the fragments was lost (though it does have to wait this long to discover that all fragments have been lost).



Transport Problems


Discussion of how TCP, TFTP, and DCE-RPC deal with lost-final-ACK, old-late-duplicates, duplicate-request, and reboot.
    

 

6.2.5: TCP timeout & retransmission

original adaptive retransmission: TimeOut = 2*EstRTT,
EstRTT = α*EstRTT + (1-α)*SampleRTT, for fixed α, 0<α<1 (eg α=1/2 or α=7/8)
For α≃1 this is very conservative (EstRTT is slow to change). For α≃0, EstRTT is very volatile.
        
RTT measurement ambiguity: if a packet is sent twice, is the ACK in response to the first transmission or the second?
Karn/Partridge algorithm: on packet loss (and retransmission), do not update EstRTT from the ACK of a retransmitted packet (the measurement is ambiguous), and double TimeOut on each retransmission (exponential backoff).
 
Jacobson/Karels algorithm for calculating the TimeOut value:
EstRTT = α*EstRTT + (1-α)*SampleRTT
EstDev = α*EstDev + (1-α)*SampleDev, where SampleDev = |SampleRTT - EstRTT|
TimeOut = EstRTT + 4*EstDev
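A minimal sketch of this computation, taking SampleDev = |SampleRTT - EstRTT| and α = 7/8 (the function name and initial values are illustrative):

```python
# Sketch of the Jacobson/Karels TimeOut update. One update step combines
# the new RTT sample into both the smoothed RTT and the smoothed deviation.
def update_rto(est_rtt, est_dev, sample_rtt, alpha=7/8):
    sample_dev = abs(sample_rtt - est_rtt)          # deviation vs old EstRTT
    est_rtt = alpha * est_rtt + (1 - alpha) * sample_rtt
    est_dev = alpha * est_dev + (1 - alpha) * sample_dev
    return est_rtt, est_dev, est_rtt + 4 * est_dev  # TimeOut

# One 180-ms sample against EstRTT=100, EstDev=0:
est_rtt, est_dev, timeout = update_rto(100.0, 0.0, 180.0)
print(est_rtt, est_dev, timeout)   # 110.0 10.0 150.0
```

Note how a volatile RTT inflates EstDev and hence TimeOut, which is the point: the 4*EstDev term buys headroom exactly when samples are noisy.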
                
TCP timers: 
 

Path MTU Discovery
    Covered Week 8
    Routinely part of TCP implementations now
    Uses IP DONT_FRAG bit, and ICMP Frag Needed / DF Set response


 

Simple packet-based sliding-windows algorithm


receiver-side

window size W
Next packet expected N; window is N ... N+W-1
   
Generic strategy
   
We have a pool EA of "early arrivals": packets buffered for future use.
When packet M arrives:
if M<N or M≥N+W, ignore
if M>N, put the packet into EA.

if M=N,
       output the packet (packet N)
        K=N+1
        slide window forward by 1
        while (packet K is in EA)
               output packet K
               slide window forward by 1 (to start at K+1)
       
There are a couple of details left out.

Specific implementation:
   
bufs[]: array of size W. We always put packet M into position M % W
As before, N represents the next packet we are waiting for.

At any point between packet arrivals, packet slot N is empty, but some or all of N+1 .. N+W-1 may be full.
   
Suppose packet M arrives.
   
1. M<N or M≥N+W: ignore
2. otherwise, put packet M into bufs[M%W]
3. while (bufs[N % W] has a valid packet) {
           write it
           N++
       }
       If M!=N, this loop will do nothing.
       But if M==N, we will write packet N and any further saved packets, and slide the window forward.
4. Send a cumulative acknowledgement of all packets up to but not including (the current value of) N; this is either ACK[N] or ACK[N-1] depending on protocol wording
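Steps 1-4 above translate directly into code; this sketch uses the ACK[M] = "all packets <= M received" convention (class and method names are mine):

```python
# Sketch of the receiver algorithm: bufs[] of size W, packet M stored at
# bufs[M % W], in-order packets written out and the window slid forward.
class Receiver:
    def __init__(self, W):
        self.W = W
        self.N = 0                      # next packet expected
        self.bufs = [None] * W
        self.output = []                # packets delivered in order

    def arrive(self, M, data):
        if M < self.N or M >= self.N + self.W:
            return self.ack()           # out of window: just re-ACK
        self.bufs[M % self.W] = data    # step 2
        while self.bufs[self.N % self.W] is not None:   # step 3
            self.output.append(self.bufs[self.N % self.W])
            self.bufs[self.N % self.W] = None
            self.N += 1
        return self.ack()               # step 4

    def ack(self):
        return self.N - 1               # ACK[M]: all packets <= M received

r = Receiver(4)
print(r.arrive(1, "p1"))   # early arrival: buffered, still ACK[-1]
print(r.arrive(0, "p0"))   # fills the gap: both delivered, ACK[1]
print(r.output)
```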
      
sender side:
    We will assume ACK[M] means all packets <=M have been received (second option immediately above)
   
    W = window size, N = bottom of window
    window is N, N+1, ..., N+W-1
   
    init: N=0. Send full windowful of packets 0, 1, ..., W-1
   
Arrival of Ack[M]:
   
    if M < N or M≥N+W, ignore
    otherwise:
        set Last = N+W-1 (last packet sent)
        set N = M+1.
        for (i=Last+1; i<N+W; i++) send packet i
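The sender side above can be sketched the same way; send_packet here is a hypothetical transmit routine (in this sketch it just records what was sent):

```python
# Sketch of the sender: ACK[M] means all packets <= M arrived. On a valid
# ACK we slide the window to M+1 and send the newly permitted packets.
class Sender:
    def __init__(self, W, send_packet):
        self.W = W
        self.N = 0                      # bottom of window
        self.send_packet = send_packet
        for i in range(W):              # init: send a full windowful
            send_packet(i)

    def on_ack(self, M):
        if M < self.N or M >= self.N + self.W:
            return                      # out-of-window ACK: ignore
        last = self.N + self.W - 1      # last packet sent so far
        self.N = M + 1                  # slide window forward
        for i in range(last + 1, self.N + self.W):
            self.send_packet(i)         # transmit newly uncovered packets

sent = []
s = Sender(4, sent.append)
print(sent)        # initial windowful: 0..3
s.on_ack(1)        # window slides to 2..5; packets 4 and 5 go out
print(sent)
```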
       
   
Some TCP notes

First, if a TCP packet arrives outside the receiver window, the receiver sends back its current ACK. This is required behavior. We discard the packet, but we don't completely ignore it.

Second, the TCP window size fluctuates. Thus, the pool EA must be more abstract than simply keeping track of positions modulo W.

Third, TCP senders do not begin by sending a full window; TCP's "slow start" mechanism prevents this.
   

 



Routing

LinkState

Linkstate routing is an alternative to distance-vector. In distance-vector, each node keeps a minimum of network topology. In linkstate, each node keeps a maximum: a full map of all nodes and all links.
  
4.2.3: Link-state routing and SPF

Whenever either side of a link notices it has died (or when a node notices that a new link has become available), it sends out LSP packets (Link State Packets) that "flood" the network. This is called reliable flooding; note that in general broadcast protocols work poorly in networks that have even small amounts of topological looping (redundant paths).

Flooding algorithm: new messages are sent out over all links except the arriving interface. Each node maintains a database of all messages received. LSPs have sequence numbers, and a message is new if its sequence number is larger than any seen so far.
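The "is this LSP new?" test driving the flooding decision can be sketched as below; the class shape and the links mapping (interface name to a send function) are illustrative:

```python
# Sketch of reliable flooding at one node: an LSP is new if its sequence
# number exceeds the highest yet seen from that origin; new LSPs are
# re-flooded on every link except the one they arrived on.
class LspNode:
    def __init__(self, links):
        self.links = links              # iface name -> callable(lsp)
        self.seen = {}                  # origin -> highest seq so far

    def receive(self, lsp, arriving_iface):
        origin, seq = lsp["origin"], lsp["seq"]
        if seq <= self.seen.get(origin, -1):
            return False                # old or duplicate LSP: do not re-flood
        self.seen[origin] = seq
        for iface, send in self.links.items():
            if iface != arriving_iface: # flood on all links but the arrival one
                send(lsp)
        return True

out = []
node = LspNode({"eth0": lambda p: out.append("eth0"),
                "eth1": lambda p: out.append("eth1")})
print(node.receive({"origin": "R1", "seq": 5}, "eth0"))  # new: re-flooded
print(node.receive({"origin": "R1", "seq": 5}, "eth1"))  # duplicate: dropped
print(out)
```

The duplicate check is what keeps flooding from looping forever in a network with redundant paths.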

It is important that LSP sequence numbers not wrap around. lollipop sequence-numbering