Week 11, Nov 7
Link-state, BGP, TCP congestion response, RPC??

Alternative to distance-vector:
    DV: each router keeps the MINIMUM of network topology (only its distance to each destination)
    link-state: each router keeps the MAXIMUM (the full topology)!

4.2.3: Link-state routing and SPF
    Two parts: flooding, and SPF
    Flooding protocol; LSPs (link-state packets); lollipop sequence-numbering
    SPF algorithm (forward search)

Example:
        B
      / | \
     A  |  D
      \ | /
        C

    A-B: 5, B-C: 3, C-D: 2, A-C: 10, B-D: 11

Build routes from A to D (P&D do the example from D to A).
At each step:
    (a) take ALL nodes reachable in one hop from the newest member of Confirmed, see if
        they improve existing routes, and if so add them to Tentative;
    (b) then take the SHORTEST path in Tentative and move it to Confirmed.

Entries are (destination, cost, next hop); ** marks the shortest Tentative entry, which moves to Confirmed in the next step.

Step  Confirmed                  Tentative
0     (A,0,-)
1a    (A,0,-)                    (B,5,B)**, (C,10,C)
1b    (A,0,-),(B,5,B)            (C,10,C)
2a    (A,0,-),(B,5,B)            (C,8,B)** (better), (D,16,B) (new)
2b    ...(B,5,B),(C,8,B)         (D,16,B)
3a    ...(B,5,B),(C,8,B)         (D,10,B) (better; next hop B, assuming B in turn routes to D via C)
3b    ...(C,8,B),(D,10,B)        (empty: done)

Another example:

    A---3---B
    |       |
    12      2
    |       |
    D---4---C

Link-state routing:
    allows precise or TOS-based metrics (TOS = Type of Service)
    allows multiple paths
    time to compute routes: O(N log N) for SPF, O(N^2) for DV
    but link-state still requires precise, universal link-cost measurements!

===================================================================

Why we need external routing: we can't compare our internal metrics with someone else's.
Metrics may be based on:
    hopcount
    RTT
    bandwidth
    cost
    congestion
One provider's metric may even use larger numbers for *better* routes.

An Autonomous System (AS) is a domain in which one consistent metric is used; it is typically administered by a single organization. Between ASs we can't use cost info, and lots of problems come up as a result.

BGP basics: how ASs actually talk to each other. With Autonomous Systems, routing is reduced to finding an AS-path!
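Back to the link-state SPF (forward search) walkthrough at the start of these notes: the Confirmed/Tentative procedure can be sketched in a few lines of Python. This is a minimal sketch, assuming undirected links with the costs from the first example; `LINKS`, `build_graph` and `forward_search` are hypothetical names for illustration, not anything from P&D. Each entry is (cost, next hop), where the next hop is the neighbor itself for A's direct neighbors and is otherwise inherited from the route being extended. The cheapest Tentative entry is found by a linear scan, which makes the whole search O(N^2); using a heap (priority queue) for Tentative is the usual way to bring that down.

```python
# Link costs from the first example (links are undirected).
LINKS = {("A", "B"): 5, ("B", "C"): 3, ("C", "D"): 2, ("A", "C"): 10, ("B", "D"): 11}


def build_graph(links):
    """Turn {(u, v): cost} into an adjacency map {u: {v: cost}}."""
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def forward_search(graph, source):
    """Return {dest: (cost, next_hop)} built with Confirmed/Tentative lists."""
    confirmed = {source: (0, None)}
    tentative = {}  # dest -> (cost, next_hop)
    newest = source
    while True:
        # (a) every node one hop from the newest Confirmed member
        base_cost, base_hop = confirmed[newest]
        for nbr, link_cost in graph[newest].items():
            if nbr in confirmed:
                continue
            cost = base_cost + link_cost
            # direct neighbors of the source are their own next hop
            hop = nbr if newest == source else base_hop
            if nbr not in tentative or cost < tentative[nbr][0]:
                tentative[nbr] = (cost, hop)  # new or better route
        if not tentative:
            return confirmed
        # (b) move the SHORTEST Tentative entry to Confirmed
        newest = min(tentative, key=lambda n: tentative[n][0])
        confirmed[newest] = tentative.pop(newest)


if __name__ == "__main__":
    print(forward_search(build_graph(LINKS), "A"))
    # {'A': (0, None), 'B': (5, 'B'), 'C': (8, 'B'), 'D': (10, 'B')}
```

Running it on the second example (A-B 3, B-C 2, C-D 4, A-D 12) gives D at cost 9 with next hop B, i.e. around the square rather than over the direct cost-12 link.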
EGP: the predecessor, and its tree structure
BGP is configurable for preferences.
For each destination:
    receive lots of routes from neighbors; filter INPUT
    choose the route we will use:
        eliminate AS_PATH loops
        apply local preference
        break ties by choosing routes through fewer ASs (shorter AS_PATH)
        then apply MED, etc.
    decide whether we will advertise that route: filter OUTPUT
Rule: we can only advertise routes we actually use!

Local traffic v. transit traffic
    configurable for supporting transit routing or not
AS_PATH info, and loop avoidance
Instability
MED values ("multi-exit discriminator")

BGP: an important part of network management at the ISP level

BGP relationships:
    customer-provider: the provider agrees to handle transit for the customer;
        the customer advertises its own routes only!
    siblings: often provide mutual backup; not "normal" transit
    peers: large providers exchanging all customer traffic with each other

Every AS exports its OWN routes and its OWN customers' routes.
    Customers DO NOT export provider/peer routes to providers.
    Providers DO export provider/peer routes to customers (often aggregated).
    Peers DO NOT export provider/peer routes to each other.
        (Peers (usually) DO NOT provide transit services to third parties.)

What if small ISP A connects to providers P1 and P2?
    A negotiates rules as to what traffic it will send to P1 and what to P2,
    then uses BGP to implement route advertisements and route learning.
    A *might* advertise its customers to both.
    If A "learns" of a route from P1 only, then A will use P1 for routing, even if P2 advertises a route too.
    This illustrates INPUT FILTERING.

Siblings DO export provider/peer routes to one another:

    ----ISP1---nwu
                |
                |link1
                |
    ----ISP2---luc

1. nwu and luc don't export link1: no transit at all.
2. Export it, but have ISP1 and ISP2 rank it at low preference: used for backup only;
   ISP1 prefers its route to luc through ISP2.
3. Have luc have a path to ISP1 via link1; that won't be used unless luc starts to route
   to ISP1 via link1, e.g., if ISP2 reports ISP1 is unreachable...
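The per-destination route choice and the export rules in this BGP section can be sketched in Python. This is a minimal sketch under stated assumptions, not a BGP implementation: the `Route` record, its field names, the relationship labels ("self", "customer", "sibling", "peer", "provider") and the AS number 65001 are all hypothetical. The ordering (local preference, then AS-path length, then MED) follows the BGP decision process; as a simplification, MED is compared across all candidates here, whereas real BGP compares it only between routes from the same neighboring AS. Routes learned from a sibling are treated like provider/peer routes, an assumption the notes don't cover.

```python
from dataclasses import dataclass
from typing import List, Optional

MY_AS = 65001  # hypothetical: our own AS number (private-use range)


@dataclass
class Route:
    prefix: str
    as_path: List[int]  # as_path[0] is the neighbor AS that advertised it
    local_pref: int     # assigned by our own input policy; higher is preferred
    med: int = 0        # multi-exit discriminator; lower is preferred
    learned_from: str = "customer"  # "self", "customer", "sibling", "peer" or "provider"


def choose_route(candidates: List[Route]) -> Optional[Route]:
    """Pick the route we will use for one destination (simplified)."""
    # Eliminate AS_PATH loops: drop any path that already contains our AS.
    usable = [r for r in candidates if MY_AS not in r.as_path]
    if not usable:
        return None
    # Highest local preference, then fewest ASs, then lowest MED
    # (simplified: real BGP compares MED only within one neighbor AS).
    return min(usable, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


def should_export(route: Route, neighbor: str) -> bool:
    """Export rule for a route we actually use; `neighbor` is its relationship to us."""
    if route.learned_from in ("self", "customer"):
        return True  # OWN routes and OWN customers' routes go to everyone
    # provider/peer (and, here, sibling) routes go to customers and siblings only
    return neighbor in ("customer", "sibling")


if __name__ == "__main__":
    candidates = [
        Route("10.0.0.0/8", [65002, 65010], local_pref=100, learned_from="peer"),
        Route("10.0.0.0/8", [65003, 65020, 65010], local_pref=200, learned_from="customer"),
        Route("10.0.0.0/8", [65004, MY_AS, 65010], local_pref=300, learned_from="provider"),
    ]
    best = choose_route(candidates)
    print(best.as_path)                     # [65003, 65020, 65010]
    print(should_export(best, "provider"))  # True: a customer route
```

In the usage example the highest-preference route is discarded because its AS_PATH already contains 65001, and the customer route then wins on local preference even though its path is one AS longer. A route learned only from a peer would be passed on to customers (and siblings) but not to other peers or providers.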
No-valley theorem: an AS path contains at most one peer-peer link; the links on its left-hand side are cust->prov or sib->sib links, and the links on its right-hand side are prov->cust or sib->sib links.

General ideas about routing
  * we need aggregated routing for table-size efficiency (desperately!)
  * there is often a "natural" routing hierarchy, e.g., provider-based
  * CIDR allows us to allocate addresses consistent with the routing hierarchy
  * the routing "hierarchy" is often just an approximation; there are lots of
    exception cases that are dealt with via extra table entries
  * longest-match is there to allow moving within the hierarchy without
    renumbering, and multi-homing (multiple attachments to the hierarchy)

===============================================================================
Chapter 6: congestion

Basics of flows
Taxonomy (6.1.2):
    router v. host
    reservation v. feedback
    window v. rate
Digression on window size
Power curves: power = throughput/delay (they tend to rise in proportion)
Fairness; fairness index

6.2 Queuing: [FIFO v. Fair Queuing; tail-drop v. random-drop]

================
6.3: TCP congestion avoidance

Self-clocking: the sliding window itself keeps the # of packets in transit constant
    sws = sender window size (in packets)
    RTTnoload = travel time with no queuing
    (RTT - RTTnoload) = time spent in queues
    (sws/RTT) * (RTT - RTTnoload) = packets in queues (throughput times queuing time),
        usually at one router (the "bottleneck" router, right before the slowest link)

CongestionWindow: limits the amount of data in transit
    window = # packets in transit + # packets in queues
Additive increase, multiplicative decrease
Timeouts as a sign of congestion

Reaching equilibrium: slow start
    Last time: double cwnd each RTT. This is kind of an oversimplification.
    Slow start: for each ACK received, increment cwnd by 1.
        Assuming packets travel together, this means cwnd doubles each RTT.
    Eventually the network gets "full" and drops a packet. Say this is after N RTTs, so cwnd = 2^N.
    Then during the previous RTT, cwnd = 2^(N-1) worked fine, so go back to that previous value:
        set cwnd = cwnd/2.
Need for further polling for changes in capacity
Slow increase *after* equilibrium

Slow start threshold, ssthresh
Congestion avoidance *only*: on congestion, cwnd = curr_window/2
    drop back by 1/2; rationale
Congestion avoidance phase + slow start phase
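The slow-start argument and the halving rule above can be traced numerically. This is a minimal sketch in the same idealized per-RTT terms as the notes (packets travel together, so slow start doubles cwnd each RTT and congestion avoidance adds one packet per RTT). The `capacity` parameter is a hypothetical stand-in for the network getting "full": any RTT in which cwnd exceeds it counts as one loss, after which ssthresh = cwnd/2 and cwnd drops back to ssthresh. A TCP that instead times out sets cwnd back to 1 and slow-starts up to ssthresh; that variant is not modeled here.

```python
def cwnd_trace(capacity, rtts, initial_ssthresh=float("inf")):
    """Idealized cwnd (in packets) at the start of each RTT.

    capacity: hypothetical largest window the path carries without a drop;
              any RTT in which cwnd exceeds it is treated as one loss event.
    """
    cwnd, ssthresh = 1, initial_ssthresh
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:
            # Congestion: remember half the window and drop back to it
            # (halving rule; no return to cwnd = 1 in this sketch).
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: +1 per ACK, so with packets traveling together
            # the window doubles each RTT (capped at ssthresh).
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: additive increase, +1 packet per RTT.
            cwnd += 1
    return trace


if __name__ == "__main__":
    print(cwnd_trace(capacity=20, rtts=12))
    # [1, 2, 4, 8, 16, 32, 16, 17, 18, 19, 20, 21]
```

With capacity 20, the first drop comes after N = 5 RTTs at cwnd = 32 = 2^5, and the window falls back to 16 = 2^4, the last value that worked. From there it grows by one packet per RTT until it exceeds 20 again and is halved to 10, which is the additive-increase/multiplicative-decrease sawtooth.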