Comp [34]43 Week 14: December 3, 2008

BLAST notes
RTP
Active Queue Management
TCP Vegas
RSVP
RPC
LinkState
===========================================================================
BLAST

Scenarios: just get sent frag1 (singly or repeating):
    LAST_FRAG; send SRR
    RETRY; send SRR
    RETRY; send SRR
    RETRY; give up!!!

The handout is confusing as to whether LAST_FRAG gets restarted each time
a new fragment arrives. It should *not*! (Otherwise you either have a
heck of a lot of programming bookkeeping to do, OR you risk having BLAST
wait forever, constantly resetting LAST_FRAG.)
====
Bit N is on in FragMask if the following condition is true:

    if (FragMask & (1 << (N-1)))

===========================================================================
TCP Westwood (v. Reno)

If transit_capacity > cwin_min, then when Reno drops to cwin_min, the
bottleneck link is not saturated until cwin climbs to transit_capacity.
Westwood: on loss, cwin drops to transit_capacity, a smaller reduction.

What about random losses?
    Reno: on random loss, cwin /= 2.
    Westwood: on random loss, drop back to transit_capacity.
        If cwin < transit_capacity, don't drop at all [?]
===============================================================================
===============================================================================
RPC

Remote Procedure Calls (RPC)
Goals for RPC: lookup, grid computing, Sun network file sharing (NFS)

Can we just use TCP? YES, but you'll need code like:

    send(message m):
        if (TCP connection does not exist)
            reconnect it
        send m on the tcp connection

Actually, you'll need to check for failure of the connection *after*
trying to send, too: the server-reboot problem.

Nature of request-reply semantics
At-least-once semantics, idempotency, and statelessness (SunRPC)
client reboot v. server reboot
Timeouts
XDR (eXternal Data Representation) (omitted)

6.3: BLAST, CHAN, SELECT

BLAST is a fragmentation/reassembly protocol.
B: Blast header, C: Chan header, S: Select header, D: data

    BCHDDDD
    BDDDDDD
    BDDDDDD
    BDDDDDD
    BDDDDDD

Think in terms of grid computing.
Why BLAST has selective ACKs.
How CHAN implements ACKs; serialization of CHAN.
How CHAN deals with reboots, lost data:
    Limitations of having REQ[N+1] implicitly acknowledge REPLY[N]
    CID: channel ID: at most one request outstanding per channel;
        consequences if processing is slow
    MID: message ID: messages are numbered serially;
        used as the ack field, more or less
    BID: boot ID: incremented each time the system is booted
        client reboot
        server reboot
Retransmit timer value

Omit: T/TCP (a TCP alternative to RPC)
    Implications of the final ACK
    TIMEWAIT issues: old segments, lost final ACK
    On close, the connection goes into TIMEWAIT for 8*RTO. Why this time?
    T/TCP: add new CCOUNT fields; allow SYN+DATA when the CCOUNT is new; etc.
    The connection may be reopened by the client *within* this time,
        if a new CCOUNT is used.

Serialization issues in RPC (CHAN is *synchronous*) (omit)

SunRPC
NFS; implications of statelessness
NFS stateful operations: probably omit
    rm, mkdir and the server duplicate-request cache
    file locking: the server maintains locks, queries clients if it
        crashes/recovers, keeps the list of clients in a file
    NFS v. Unix semantics for deleting open files
    client-side fix of the open-file-deletion problem
===============================================================================
================================================================
QoS

Issues: playback buffer
Fine-grained (per flow) v. coarse-grained (per category)
Reservations

Integrated Services / RSVP:
Each flow can make a connection with the routers.
Routers maintain SOFT STATE about a connection, not hard state!
Soft state can be refreshed if lost (though with some small probability
of failure).

Token bucket flow specification: token rate r bytes/sec, bucket depth B.
The bucket fills at the specified rate r, and does not get fuller than B.
When a packet of size S needs to be sent, S tokens are taken from the
bucket (B = B-S).
B represents a "burst capacity". B = the size of the queue needed, if the
outbound link rate is r.

Used for input control: if a packet arrives and the bucket is empty, the
packet is discarded, or marked "noncompliant".
Used for shaping: packets wait until there is sufficient capacity. This
is what happens if the outbound link rate is r, and B (thus) represents
the queue capacity.

Simple bandwidth summation; the bucket depth represents the queue
capacity needed for bursts.

Admission control:
    * calculation for when a flow spec can be satisfied
    * noncompliant (with the bucket filter) packets can have lower priority

RESV packets move backwards in a very special way (they are NOT simply
sent from receiver to sender). The PATH message contains the Tspec and
goes from sender to receiver; each router figures out the reverse path.
The RESV packet is then sent along this reverse path by the *receiver*.
Compatible w. multicast.

Problem: too many reservations. And how do we decide who gets to reserve
what? Two models:
    1. Charge $ for reservations.
    2. Anyone can ask for a reservation, but the answer may be "no".
       Maybe there would be a cap on size.
==============
Differentiated Services

Basically just two service classes: high and low (now 3 levels).
Rules on which packets can be "premium": max rate from the border router?
Goal: set some rules on admitting premium packets, and hope that their
total number to any given destination is small enough that we can meet
service targets (not exactly guarantees).

Packets are marked at ingress. This simplifies things.

Example: VOIP. The ISP (not the user!) marks VOIP packets as they enter,
subject to some ceiling, and routes these internally with premium
service. The ISP negotiates with *its* ISP for a total bulk delivery of
premium packets.
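The token-bucket flow specification described above can be sketched in a few lines. This is an illustrative sketch, not code from the notes: the class name, method names, and the use of `time.monotonic` are my own.

```python
import time

class TokenBucket:
    """Token-bucket filter: token rate r (bytes/sec), bucket depth B (bytes).

    The bucket fills at rate r and never holds more than B tokens;
    sending a packet of size S consumes S tokens, so B bounds the burst
    size (and the queue needed if the outbound link rate is r).
    """
    def __init__(self, rate, depth):
        self.rate = rate               # r: tokens (bytes) added per second
        self.depth = depth             # B: maximum bucket contents
        self.tokens = depth            # start with a full bucket
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.depth,
                          self.tokens + self.rate * (now - self.last))
        self.last = now

    def conforms(self, size):
        """Input control: take size tokens if available; otherwise the
        packet is noncompliant (discard it or mark it)."""
        self._refill()
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False
```

This version does input control; a shaper would instead hold the packet in a queue until enough tokens have accumulated.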
One possibility is that the leaf ISPs do use RSVP, but the core runs DS;
packets are DS-marked as they enter the core, based on their RSVP status.

DS field: 6 bits; 3+3 class + drop_precedence.
Two basic strategies: EF and AF.

101 110: "EF", or "Expedited Forwarding": the best service.

Assured Forwarding: 3 bits of Class, 3 bits of Drop Precedence.
Class:
    100: class 4 (best)
    011: class 3
    010: class 2
    001: class 1
Drop Precedence:
    010: don't drop
    100: medium
    110: high

Main thing: the classes each get PRIORITY service, over best-effort.
DS uses the IPv4 TOS field, widely ignored in the past. Routers SHOULD
implement priority queues for the service categories.

Basic idea: get your traffic marked for the appropriate class. Then what?
    000 000: current best-effort status
    xxx 000: traditional IPv4 precedence

PHBs (Per-Hop Behaviors): implemented by all routers.
Only "boundary" routers do traffic policing/shaping/classifying/re-marking
to manage the categories (re-marking is really part of shaping/policing).
=================
EF: Expedited Forwarding

Basically just higher priority: packets should experience low queuing
delay. Maybe not exactly; we may give bulk traffic *some* guaranteed
share.

Functionality depends on ensuring that there is not too much EF traffic.
Basically, we control at the boundary the total volume of EF traffic (eg
to a level that cannot saturate the slowest link), so that we have
plenty of capacity for EF traffic. Then we just handle it at a higher
priority. This is the best service.

EF provides a minimum-rate guarantee. This can be tricky: if we accept
input traffic from many sources, and have four traffic outlets R1, R2,
R3, R4, then we *should* only accept enough EF traffic that any *one*
Ri can handle it. But we might go for a more statistical model, if in
practice 1/4 of the traffic goes to each Ri.
========================
AF: Assured Forwarding

Simpler than EF, but no guarantee. Traffic totals can be higher. There
is an easy way to send more traffic: it is just marked as "out".
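The class and drop-precedence bit patterns listed above combine into the 6-bit DS field. A small sketch of the assembly (the dictionary and function names are mine, not a standard API):

```python
# AF class and drop-precedence bit patterns, as tabulated in the notes.
AF_CLASS  = {4: 0b100, 3: 0b011, 2: 0b010, 1: 0b001}
DROP_PREC = {"low": 0b010, "medium": 0b100, "high": 0b110}

def af_dscp(cls, prec):
    """Assemble the 6-bit DS field: 3 class bits, then 3 drop-precedence bits."""
    return (AF_CLASS[cls] << 3) | DROP_PREC[prec]

EF_DSCP     = 0b101110    # "101 110": Expedited Forwarding, best service
BEST_EFFORT = 0b000000    # 000 000: current best-effort status
```

For example, class 1 with low drop precedence yields 001 010, and EF is 101110 (decimal 46).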
In-out marking: each packet is marked "in" or "out" by the policer.
Actually, we have three precedence levels to use for marking.
The policer *can* be in the end-user network (though "re-policing"
within the ISP, to be sure the original markings were within spec, is
appropriate). But the point is that the end user gets to choose *which*
packets get precedence, subject to some total ceiling.

From RFC 2597:
    The drop precedence level of a packet could be assigned, for
    example, by using a leaky bucket traffic policer, which has as its
    parameters a rate and a size, which is the sum of two burst values:
    a committed burst size and an excess burst size. A packet is
    assigned low drop precedence if the number of tokens in the bucket
    is greater than the excess burst size [ie the bucket is *full*],
    medium drop precedence if the number of tokens in the bucket is
    greater than zero but at most the excess burst size, and high drop
    precedence if the bucket is empty.

Packet mangling to mark the DS bits, plus a goodly number of priority
bands for the drop precedences. (Not sure how to handle the different
classes; they might get classful TBF service.)

Fits nicely with RIO routers: RED with In and Out (or In, Middle, and
Out): each traffic "level" is subject to a different drop threshold.
============================================================================
LinkState: omit in 2008

Alternative to distance-vector:
    dv: keep a MINIMUM of network topology
    linkstate: the maximum!

4.2.3: Link-state routing and SPF
Flooding, SPF
Flooding protocol; LSPs; lollipop sequence-numbering

SPF algorithm (forward search)

Example:      B
            / | \
           A  |  D
            \ | /
              C

    A-B: 5, B-C: 3, C-D: 2, A-C: 10, B-D: 11

Build routes from A to D (P&D do the example from D to A).
At each step:
    (a) take ALL nodes reachable in one hop from the newest member of
        Confirmed, and see if they improve existing routes; if so, add
        them to Tentative.
    (b) Then take the SHORTEST path in Tentative, and move it to
        Confirmed.

    Step  Confirmed                  Tentative
    0     (A,0,-)
    1a    (A,0,-)                    (B,5,B)**, (C,10,C)
    1b    (A,0,-),(B,5,B)            (C,10,C)
    2a    (A,0,-),(B,5,B)            (C,8,B)** (better), (D,16,B) (new)
    2b    ...(B,5,B),(C,8,B)         (D,16,B)
    3a    ...(B,5,B),(C,8,B)         (D,10,B) (better: path A-B-C-D, next hop still B)

Another example:

    A---3---B
    |       |
    12      2
    |       |
    D---4---C

Allows precise or TOS-based metrics (TOS = Type of Service).
Allows multiple paths.
Time to compute routes: O(N log N) for SPF, O(N^2) for DV.
Link-state still requires precise universal link-cost measurements!
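The forward-search table above can be checked mechanically. This is a sketch under my own naming (the function, the `GRAPH` dict, and the use of a heap to hold the Tentative list are illustrative):

```python
import heapq

# Link costs from the first SPF example above.
GRAPH = {
    'A': {'B': 5, 'C': 10},
    'B': {'A': 5, 'C': 3, 'D': 11},
    'C': {'A': 10, 'B': 3, 'D': 2},
    'D': {'B': 11, 'C': 2},
}

def spf(graph, source):
    """Forward-search SPF: returns {node: (cost, next_hop)} from source.

    Step (a): relax the neighbors of the newest Confirmed node into
    Tentative.  Step (b): move the shortest Tentative entry to
    Confirmed.  Stale heap entries for already-Confirmed nodes are
    simply skipped.
    """
    confirmed = {source: (0, None)}
    tentative = []                    # heap of (cost, node, next_hop)
    node = source
    while True:
        # (a) neighbors of the newest member of Confirmed
        cost = confirmed[node][0]
        for nbr, w in graph[node].items():
            if nbr not in confirmed:
                hop = nbr if node == source else confirmed[node][1]
                heapq.heappush(tentative, (cost + w, nbr, hop))
        # (b) shortest entry in Tentative moves to Confirmed
        while tentative:
            c, n, hop = heapq.heappop(tentative)
            if n not in confirmed:
                confirmed[n] = (c, hop)
                node = n
                break
        else:
            return confirmed
```

Running `spf(GRAPH, 'A')` reproduces the final Confirmed set from the table: B at cost 5, C at cost 8 via B, and D at cost 10 via B.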