Comp 343/443

Week 10: Oct 29

RPC

Remote Procedure Calls (RPC)
    goals for RPC: lookup, grid computing, Sun network file sharing (NFS)
    can we just use TCP?
    Nature of request-reply semantics
    At-least-once semantics, idempotency, and statelessness (SunRPC)
    client reboot v. server reboot
    Timeouts
    XDR (eXternal Data Representation) (omitted)

6.3: BLAST, CHAN, SELECT
    Think in terms of grid computing
    Why BLAST has selective ACKs
    How CHAN implements ACKs; serialization of CHAN
    How CHAN deals with reboots, lost data:
        Limitations of having REQ[N+1] implicitly acknowledge REPLY[N]
        CID: channel ID: at most one req outstanding per channel
            consequences if processing is slow
        MID: message ID: messages are numbered serially
            used as ack field, more or less
        BID: Boot ID: incremented each time system is booted
            client reboot
            server reboot
    Retransmit timer value

T/TCP: (a TCP alternative to RPC)
    Implications of final ACK
    TIMEWAIT issues: old segments, lost final ACK.
    On end, connection goes into TIMEWAIT for 8*RTO
        Why this time?
    T/TCP: add new CCOUNT fields
        Allow SYN+DATA when CCOUNT is new; etc.
        Connection may be reopened by client *within* this time,
        if a new CCOUNT is used

Serialization issues in RPC (CHAN is *synchronous*) (omit)

SunRPC
    NFS; implications of statelessness
    NFS stateful operations: probably omit
        rm, mkdir and server duplicate request cache
        file locking - server maintains locks, queries clients if it
            crashes/recovers, keeps list of clients in file
    NFS v. Unix semantics for deleting open files
        client-side fix of open-file-deletion problem

===============================================================================

4.3.2 CIDR
    table explosion
        running out of Class B addresses
        too many Class C's for tables
    running out of IP address space
    basic strategy
    new problem: address comes in, but masks only exist in the table;
    packets don't carry masks with them. How does lookup work?
    Answer:
        Theoretical algorithm: given a dest A, and table entries <D[i],M[i]>,
        search for i such that A & M[i] = D[i]
        Or, in terms of # of bits, where D[i] has N[i] network bits,
        A == D[i] to first N[i] bits

        Problem: possible multiple matches, and responsibility for avoiding
        this is *much* too distributed to be feasible.

        longest-match rule

    policy v mechanism: cidr is an address-block-allocation *mechanism*
        how provider-based routing might work
        review longest-match *mechanism*
        What *policies* do we want to implement with it?

    NSFnet-model: NSFnet was the backbone; providers formed a tree below it.
    But IP addresses are still handed out by IANA directly to organizations.
    (IANA = Internet Assigned Numbers Authority)

    Application 1: CIDR allows IANA to allocate blocks of Class C
    Application 2: CIDR allows huge provider blocks, suballocation by provider

    Providers P0(A,B,C), P1(D,E), P2(F,G)
    each with customers shown in parentheses:
        how provider-based address allocation helps
        Routing model: route to provider, then to customer
        This CHANGES things, subtly; we're no longer looking for the optimum
        path (at least once the NSFnet routing model broke down).

=======================

CIDR and staying out of jail (allowing change of provider)

    Providers P0(A,B,C), P1(D,E), P2(F,G)
    each with customers shown
        routing tables assuming each customer gets an address from
        its provider's block
        how longest-match allows customers to move without renumbering
        hidden cost of such moves

    Providers P0(B,C), P1(A,D,E), P2(F,G)
    each with customers shown (A has moved from P0 to P1)
        but now we have addrs unrelated to provider, and so A needs
        to be entered in every table!

                                          /------\
    Consider P0---P1---P2    versus     P0___P1___P2
        router pseudo-hierarchy v. addr-alloc true hierarchy
        Don't have to agree, but there is a cost for disagreement

    What if B adds a link to P1, in addition to link to A?
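
The longest-match rule and the "A moves from P0 to P1" scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the address blocks (200.0.0.0/8 for P0, and so on) and the function name `longest_match` are hypothetical, not taken from these notes, and a real router would use a trie rather than a linear scan.

```python
import ipaddress

def longest_match(table, dest):
    """Return the next hop of the most specific entry that contains dest.

    table is a list of (prefix, next_hop) pairs; a linear scan is used
    only for clarity. Returns None if no entry matches.
    """
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        # A matches entry i when A & M[i] == D[i]; keep the longest mask.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Hypothetical provider blocks: one aggregated entry per provider.
table = [
    ("200.0.0.0/8", "P0"),
    ("201.0.0.0/8", "P1"),
    ("202.0.0.0/8", "P2"),
]

# Customer A holds 200.0.16.0/20 out of P0's block (hypothetical numbers).
before_move = longest_match(table, "200.0.16.5")

# A moves to P1 but keeps its addresses: one extra, more specific entry.
table.append(("200.0.16.0/20", "P1"))
after_move = longest_match(table, "200.0.16.5")
other_p0_customer = longest_match(table, "200.0.32.9")

print(before_move, after_move, other_p0_customer)
```

The extra /20 entry is the hidden cost noted above: it has to appear in every table that would otherwise carry only the aggregated P0 block.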

How CIDR allows provider-based and geography-based routing

    provider-based addresses
        Problems:
            route asymmetries
            inefficient routes (send to closest link to dest provider?)

                            A
                            |
        P1: r1--------r2----+---r3
            |         |         |
            |         |         |
            |         |         |
        P2: s1--+-----s2--------s3
                |
                B

        BGP "MED" value (not yet defined) allows server providers to carry
        the server's *outbound* traffic!

        renumbering: threat or menace? [DHCP, NAT]
        Locators v. EID
        changing IP addrs midstream

    geographical addresses
        Problems:
            inefficient paths between close sites.
            large sites
        Real issue with geographical routing: who carries the traffic?
        Provider-based: *business* model jibes with routing model!!

New routing picture: destinations are networks, still, but some are
organizations and some are major providers, with intermediate nets in between.
Sometimes we might CHOOSE whether to view a large net as one unit, or to view
it as separate medium-sized subunits (for the sake of visualization, assume
the subunits have some geographical nature, or other attribute so that we can
treat them as separate destinations).

Tradeoff:
    consolidation => more compact routing table
    individual subentries => more optimal route selection

2-step routing: when does it NOT find optimal routes?

===============================================================================

LinkState
BGP
TCP congestion response
RPC??

Alternative to distance-vector:
    dv: keep MINIMUM of network topology
    linkstate: maximum!

4.2.3: Link-state routing and SPF

Flooding, SPF
    Flooding protocol; LSP's
    lollipop sequence-numbering
    SPF algorithm (forward search)

                 B
               / | \
    Example:  A  |  D
               \ | /
                 C

    A-B: 5, B-C: 3, C-D: 2, A-C: 10, B-D: 11

    Build routes from A to D: (P&D do example from D to A)

    At each step,
    (a) take ALL nodes reachable in one hop from the newest member of
        Confirmed, and see if they improve existing routes and if so
        add to Tentative.
    (b) Then take the SHORTEST path in Tentative, & move to Confirmed

    Step  Confirmed              Tentative
    0     (A,0,-)
    1a    (A,0,-)                (B,5,B)**, (C,10,C)
    1b    (A,0,-),(B,5,B)        (C,10,C)
    2a    (A,0,-),(B,5,B)        (C,8,B)** (better), (D,16,B) (new)
    2b    ...(B,5,B),(C,8,B)     (D,16,B)
    3a    ...(B,5,B),(C,8,B)     (D,10,B) (better; next hop is still B,
                                           since B routes to C)

    Another example:

        A---3---B
        |       |
        12      2
        |       |
        D---4---C

    Allows precise or TOS-based metrics (TOS=Type of Service)
    Allows multiple paths
    time to compute routes: O(N log N) for SPF, O(N^2) for DV
    link-state still requires precise universal link-cost measurements!

===================================================================

Why we need external routing: can't compare internal metrics with
someone else's.

    Metrics may be based on:
        hopcount
        RTT
        bandwidth
        cost
        congestion
    One provider's metric may even use larger numbers for *better* routes.

    An Autonomous System is a domain in which one consistent metric is used;
    typically administered by a single organization.

    Between AS's we can't use cost info. Lots of problems come up as a result.

BGP basics: how AS's actually talk to each other.
    Autonomous Systems
    Routing reduced to finding an AS-path!
    EGP Predecessor and tree structure
    configurable for preferences

    For each destination:
        receive lots of routes from neighbors; filter INPUT
        choose route we will use:
            eliminate AS_PATH loops
            apply local preference
            apply MED
            break ties by choosing routes through fewer ASs, etc
        decide whether we will advertise that route: filter OUTPUT
        Rule: we can only advertise routes we actually use!

    local traffic v transit traffic
    configurable for supporting transit routing or not
    ASpath info, and loop avoidance
    instability
    MED values ("multi exit discriminator")

    BGP: important part of network management at ISP level

    BGP relationships:
        customer-provider: provider agrees to handle transit for customer
            customer advertises its own routes only!
        siblings: often provide mutual backup; not "normal" transit
        peers: large providers exchanging all customer traffic with each other

    Every AS exports its OWN routes and OWN customers' routes
    customers DO NOT export provider/peer routes to providers
    Providers DO export provider/peer routes to customers (often aggregated)
    Peers DO NOT export provider/peer routes to each other
    (Peers (usually) DO NOT provide transit services to third parties.)

    What if small ISP A connects to providers P1 and P2?
        A negotiates rules as to what traffic it will send to P1 & what to P2
        Then uses BGP to implement route advertisements, route learning
        *Might* advertise customers to both.
        If A "learns" of a route from P1 only, then A will use P1 for routing,
        even if P2 advertises a route too. This illustrates INPUT FILTERING.

    siblings DO export provider/peer routes to one another

        ----ISP1---nwu
                    |
                    |link1
                    |
        ----ISP2---luc

        1. nwu,luc don't export link1: no transit at all
        2. Export but have ISP1, ISP2 rank at low preference: used for
           backup only; ISP1 prefers route to luc through ISP2
        3. Have luc have a path to ISP1 via link1; that won't be used unless
           luc starts to route to ISP1 via link1, eg if ISP2 reports ISP1
           is unreachable...

    No-valley theorem: at most one peer-peer link;
        LHS are cust->prov or sib->sib links
        RHS are prov->cust or sib->sib links

General ideas about routing
    * we need aggregated routing for table-size efficiency (desperately!)
    * there is often a "natural" routing hierarchy, eg provider-based
    * cidr allows us to allocate addresses consistent with the routing
      hierarchy
    * routing "hierarchy" is often just an approximation; there are lots of
      exception cases that are dealt with via extra table entries.
    * longest-match is to allow moving in the hierarchy without renumbering,
      and multi-homing (multiple attachments) to the hierarchy.
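
The BGP export rules listed in the relationships discussion (customer, provider, peer, sibling) can be written down as a small OUTPUT filter. The Python sketch below is a toy model only: the constant names and the function `should_export` are hypothetical, real BGP policy is configured per router rather than coded like this, and the notes do not say how routes *learned from* a sibling are re-exported, so that case is deliberately left unhandled.

```python
# Kinds of neighbor, as seen from our own AS (hypothetical labels).
CUSTOMER, PROVIDER, PEER, SIBLING = "customer", "provider", "peer", "sibling"
OWN = "own"  # a route that our own AS originates

def should_export(learned_from, export_to):
    """Return True if a route learned from `learned_from` may be
    advertised to a neighbor of kind `export_to`."""
    if learned_from in (OWN, CUSTOMER):
        # Every AS exports its OWN routes and its OWN customers' routes.
        return True
    if learned_from in (PROVIDER, PEER):
        # Provider/peer routes go to customers and siblings only;
        # never to providers or to other peers (no free transit).
        return export_to in (CUSTOMER, SIBLING)
    # Assumption: sibling-learned routes are not covered by the notes.
    raise ValueError("export rule not specified for " + repr(learned_from))

# Print the resulting export matrix.
for src in (OWN, CUSTOMER, PROVIDER, PEER):
    allowed = [dst for dst in (CUSTOMER, PROVIDER, PEER, SIBLING)
               if should_export(src, dst)]
    print(src, "->", allowed)
```

Reading the matrix row by row gives the intuition behind the no-valley property: once a route has been passed "down" to a customer or across a peer link, these rules never let it travel back "up" to a provider.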