Comp [34]43 Week 14: December 3, 2008

BLAST notes
RTP
Active Queue Management
TCP Vegas
RSVP
RPC
LinkState
===========================================================================
BLAST

Scenarios: just get sent frag1 (singly or repeating):
    LAST_FRAG; send SRR
    RETRY; send SRR
    RETRY; send SRR
    RETRY; give up!!!

The handout is confusing as to whether LAST_FRAG gets restarted each time
a new fragment arrives. It should *not*! (Otherwise you either have a
heck of a lot of programming bookkeeping to do, OR you risk having BLAST
wait forever, constantly resetting LAST_FRAG.)
====
Bit N is on in FragMask if the following condition is true:

    if (FragMask & (1 << (N-1)))

===========================================================================
TCP Westwood (v. Reno)

If transit_capacity > cwin_min, then when Reno drops to cwin_min, the
bottleneck link is not saturated until cwin climbs to transit_capacity.
Westwood: on loss, cwin drops to transit_capacity, a smaller reduction.

What about random losses?
    Reno: on random loss, cwin /= 2.
    Westwood: on random loss, drop back to transit_capacity.
        If cwin < transit_capacity, don't drop at all [?]
===============================================================================
===============================================================================
RPC

Remote Procedure Calls (RPC)
Goals for RPC: lookup, grid computing, Sun network file sharing (NFS)

Can we just use TCP? YES, but you'll need code like:

    send(message m):
        if (TCP connection does not exist)
            reconnect it
        send m on the tcp connection

Actually, you'll need to check for failure of the connection *after*
trying to send, too: the server-reboot problem.

Nature of request-reply semantics
At-least-once semantics, idempotency, and statelessness (SunRPC)
client reboot v. server reboot
Timeouts
XDR (eXternal Data Representation) (omitted)

6.3: BLAST, CHAN, SELECT

BLAST is a fragmentation/reassembly protocol.
B: Blast header, C: Chan header, S: Select header, D: data

    BCHDDDD
    BDDDDDD
    BDDDDDD
    BDDDDDD
    BDDDDDD

Think in terms of grid computing.
Why BLAST has selective ACKs.
How CHAN implements ACKs; serialization of CHAN.
How CHAN deals with reboots, lost data:
    Limitations of having REQ[N+1] implicitly acknowledge REPLY[N]
    CID: channel ID: at most one request outstanding per channel;
        consequences if processing is slow
    MID: message ID: messages are numbered serially;
        used as the ack field, more or less
    BID: boot ID: incremented each time the system is booted
        client reboot
        server reboot
Retransmit timer value

Omit: T/TCP (a TCP alternative to RPC)
    Implications of the final ACK
    TIMEWAIT issues: old segments, lost final ACK
    On close, the connection goes into TIMEWAIT for 8*RTO. Why this time?
    T/TCP: add new CCOUNT fields; allow SYN+DATA when the CCOUNT is new; etc.
    The connection may be reopened by the client *within* this time,
        if a new CCOUNT is used.

Serialization issues in RPC (CHAN is *synchronous*) (omit)

SunRPC
NFS; implications of statelessness
NFS stateful operations: probably omit
    rm, mkdir and the server duplicate-request cache
    file locking: the server maintains locks, queries clients if it
        crashes/recovers, keeps the list of clients in a file
    NFS v. Unix semantics for deleting open files
    client-side fix of the open-file-deletion problem
===============================================================================
================================================================
QoS

Issues: playback buffer
Fine-grained (per flow) v. coarse-grained (per category)
Reservations

Integrated Services / RSVP:
Each flow can make a connection with the routers.
Routers maintain SOFT STATE about a connection, not hard state!
Soft state can be refreshed if lost (though with some small probability
of failure).

Token bucket flow specification: token rate r bytes/sec, bucket depth B.
The bucket fills at the specified rate r, and does not get fuller than B.
When a packet of size S needs to be sent, S tokens are taken from the
bucket (B = B-S).
B represents a "burst capacity". B = the size of the queue needed, if the
outbound link rate is r.

Used for input control: if a packet arrives and the bucket is empty, the
packet is discarded, or marked "noncompliant".
Used for shaping: packets wait until there is sufficient capacity. This
is what happens if the outbound link rate is r, and B (thus) represents
the queue capacity.

Simple bandwidth summation; the bucket depth represents the queue
capacity needed for bursts.

Admission control:
    * calculation for when a flow spec can be satisfied
    * noncompliant (with the bucket filter) packets can have lower priority

RESV packets move backwards in a very special way (they are NOT simply
sent from receiver to sender). The PATH message contains the Tspec and
goes from sender to receiver; each router figures out the reverse path.
The RESV packet is then sent along this reverse path by the *receiver*.
Compatible w. multicast.

Problem: too many reservations. And how do we decide who gets to reserve
what? Two models:
    1. Charge $ for reservations.
    2. Anyone can ask for a reservation, but the answer may be "no".
       Maybe there would be a cap on size.
==============
Differentiated Services

Basically just two service classes: high and low (now 3 levels).
Rules on which packets can be "premium": max rate from the border router?
Goal: set some rules on admitting premium packets, and hope that their
total number to any given destination is small enough that we can meet
service targets (not exactly guarantees).

Packets are marked at ingress. This simplifies things.

Example: VOIP. The ISP (not the user!) marks VOIP packets as they enter,
subject to some ceiling, and routes these internally with premium
service. The ISP negotiates with *its* ISP for a total bulk delivery of
premium packets.
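The token-bucket flow specification described above can be sketched in a few lines. This is an illustrative sketch, not code from the notes: the class name, method names, and the use of `time.monotonic` are my own.

```python
import time

class TokenBucket:
    """Token-bucket filter: token rate r (bytes/sec), bucket depth B (bytes).

    The bucket fills at rate r and never holds more than B tokens;
    sending a packet of size S consumes S tokens, so B bounds the burst
    size (and the queue needed if the outbound link rate is r).
    """
    def __init__(self, rate, depth):
        self.rate = rate               # r: tokens (bytes) added per second
        self.depth = depth             # B: maximum bucket contents
        self.tokens = depth            # start with a full bucket
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.depth,
                          self.tokens + self.rate * (now - self.last))
        self.last = now

    def conforms(self, size):
        """Input control: take size tokens if available; otherwise the
        packet is noncompliant (discard it or mark it)."""
        self._refill()
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False
```

This version does input control; a shaper would instead hold the packet in a queue until enough tokens have accumulated.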
One possibility is that the leaf ISPs do use RSVP, but the core runs DS;
packets are DS-marked as they enter the core, based on their RSVP status.

DS field: 6 bits; 3+3 class + drop_precedence.
Two basic strategies: EF and AF.

101 110: "EF", or "Expedited Forwarding": the best service.

Assured Forwarding: 3 bits of Class, 3 bits of Drop Precedence.
Class:
    100: class 4 (best)
    011: class 3
    010: class 2
    001: class 1
Drop Precedence:
    010: don't drop
    100: medium
    110: high

Main thing: the classes each get PRIORITY service, over best-effort.
DS uses the IPv4 TOS field, widely ignored in the past. Routers SHOULD
implement priority queues for the service categories.

Basic idea: get your traffic marked for the appropriate class. Then what?
    000 000: current best-effort status
    xxx 000: traditional IPv4 precedence

PHBs (Per-Hop Behaviors): implemented by all routers.
Only "boundary" routers do traffic policing/shaping/classifying/re-marking
to manage the categories (re-marking is really part of shaping/policing).
=================
EF: Expedited Forwarding

Basically just higher priority: packets should experience low queuing
delay. Maybe not exactly; we may give bulk traffic *some* guaranteed
share.

Functionality depends on ensuring that there is not too much EF traffic.
Basically, we control at the boundary the total volume of EF traffic (eg
to a level that cannot saturate the slowest link), so that we have
plenty of capacity for EF traffic. Then we just handle it at a higher
priority. This is the best service.

EF provides a minimum-rate guarantee. This can be tricky: if we accept
input traffic from many sources, and have four traffic outlets R1, R2,
R3, R4, then we *should* only accept enough EF traffic that any *one*
Ri can handle it. But we might go for a more statistical model, if in
practice 1/4 of the traffic goes to each Ri.
========================
AF: Assured Forwarding

Simpler than EF, but no guarantee. Traffic totals can be higher. There
is an easy way to send more traffic: it is just marked as "out".
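The class and drop-precedence bit patterns listed above combine into the 6-bit DS field. A small sketch of the assembly (the dictionary and function names are mine, not a standard API):

```python
# AF class and drop-precedence bit patterns, as tabulated in the notes.
AF_CLASS  = {4: 0b100, 3: 0b011, 2: 0b010, 1: 0b001}
DROP_PREC = {"low": 0b010, "medium": 0b100, "high": 0b110}

def af_dscp(cls, prec):
    """Assemble the 6-bit DS field: 3 class bits, then 3 drop-precedence bits."""
    return (AF_CLASS[cls] << 3) | DROP_PREC[prec]

EF_DSCP     = 0b101110    # "101 110": Expedited Forwarding, best service
BEST_EFFORT = 0b000000    # 000 000: current best-effort status
```

For example, class 1 with low drop precedence yields 001 010, and EF is 101110 (decimal 46).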
In-out marking: each packet is marked "in" or "out" by the policer.
Actually, we have three precedence levels to use for marking.
The policer *can* be in the end-user network (though "re-policing"
within the ISP, to be sure the original markings were within spec, is
appropriate). But the point is that the end user gets to choose *which*
packets get precedence, subject to some total ceiling.

From RFC 2597:
    The drop precedence level of a packet could be assigned, for
    example, by using a leaky bucket traffic policer, which has as its
    parameters a rate and a size, which is the sum of two burst values:
    a committed burst size and an excess burst size. A packet is
    assigned low drop precedence if the number of tokens in the bucket
    is greater than the excess burst size [ie the bucket is *full*],
    medium drop precedence if the number of tokens in the bucket is
    greater than zero but at most the excess burst size, and high drop
    precedence if the bucket is empty.

Packet mangling to mark the DS bits, plus a goodly number of priority
bands for the drop precedences. (Not sure how to handle the different
classes; they might get classful TBF service.)

Fits nicely with RIO routers: RED with In and Out (or In, Middle, and
Out): each traffic "level" is subject to a different drop threshold.
============================================================================
LinkState: omit in 2008

Alternative to distance-vector:
    dv: keep a MINIMUM of network topology
    linkstate: the maximum!

4.2.3: Link-state routing and SPF
Flooding, SPF
Flooding protocol; LSPs; lollipop sequence-numbering

SPF algorithm (forward search)

Example:      B
            / | \
           A  |  D
            \ | /
              C

    A-B: 5, B-C: 3, C-D: 2, A-C: 10, B-D: 11

Build routes from A to D (P&D do the example from D to A).
At each step:
    (a) take ALL nodes reachable in one hop from the newest member of
        Confirmed, and see if they improve existing routes; if so, add
        them to Tentative.
    (b) Then take the SHORTEST path in Tentative, and move it to
        Confirmed.

    Step  Confirmed                  Tentative
    0     (A,0,-)
    1a    (A,0,-)                    (B,5,B)**, (C,10,C)
    1b    (A,0,-),(B,5,B)            (C,10,C)
    2a    (A,0,-),(B,5,B)            (C,8,B)** (better), (D,16,B) (new)
    2b    ...(B,5,B),(C,8,B)         (D,16,B)
    3a    ...(B,5,B),(C,8,B)         (D,10,B) (better: path A-B-C-D, next hop still B)

Another example:

    A---3---B
    |       |
    12      2
    |       |
    D---4---C

Allows precise or TOS-based metrics (TOS = Type of Service).
Allows multiple paths.
Time to compute routes: O(N log N) for SPF, O(N^2) for DV.
Link-state still requires precise universal link-cost measurements!
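The forward-search table above can be checked mechanically. This is a sketch under my own naming (the function, the `GRAPH` dict, and the use of a heap to hold the Tentative list are illustrative):

```python
import heapq

# Link costs from the first SPF example above.
GRAPH = {
    'A': {'B': 5, 'C': 10},
    'B': {'A': 5, 'C': 3, 'D': 11},
    'C': {'A': 10, 'B': 3, 'D': 2},
    'D': {'B': 11, 'C': 2},
}

def spf(graph, source):
    """Forward-search SPF: returns {node: (cost, next_hop)} from source.

    Step (a): relax the neighbors of the newest Confirmed node into
    Tentative.  Step (b): move the shortest Tentative entry to
    Confirmed.  Stale heap entries for already-Confirmed nodes are
    simply skipped.
    """
    confirmed = {source: (0, None)}
    tentative = []                    # heap of (cost, node, next_hop)
    node = source
    while True:
        # (a) neighbors of the newest member of Confirmed
        cost = confirmed[node][0]
        for nbr, w in graph[node].items():
            if nbr not in confirmed:
                hop = nbr if node == source else confirmed[node][1]
                heapq.heappush(tentative, (cost + w, nbr, hop))
        # (b) shortest entry in Tentative moves to Confirmed
        while tentative:
            c, n, hop = heapq.heappop(tentative)
            if n not in confirmed:
                confirmed[n] = (c, hop)
                node = n
                break
        else:
            return confirmed
```

Running `spf(GRAPH, 'A')` reproduces the final Confirmed set from the table: B at cost 5, C at cost 8 via B, and D at cost 10 via B.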