Network Management

Summer 2017, Corboy 205, TTh 5:30-8:45 pm

Class 1: July 6

Exams, ground rules

Managing:

Device (hardware)
Server software
Links
Traffic

Management: the choices we make.

The following are the five basic "official OSI" areas of network management (see also IntroNetworks Network Management and SNMP)

fault detection
configuration
accounting (eg user accounts)
performance
security -- a topic unto itself

Some people add:

maintaining reliability: five 9's (99.999% availability allows only about 5 minutes of downtime per year). Reliability is related to fault detection; for example, redundant hardware helps with reliability, but only if faults can be detected quickly so that "failover" can be initiated promptly.
helpdesk support
compliance monitoring
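
The "five 9's" figure in the list above can be checked with a little arithmetic. The sketch below assumes a 365-day year; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Return the yearly downtime permitted by a given availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Compare three, four and five 9's.
for nines in (99.9, 99.99, 99.999):
    print(nines, round(downtime_minutes_per_year(nines), 2), "minutes/year")
```

Each additional 9 cuts the allowed downtime by a factor of ten, which is why quick fault detection matters so much at the five-9's level.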

Sometimes we look at network management as managing the network hardware and software. Lots of traditional network management focuses almost entirely on this. However, we can also talk about managing bandwidth, which ultimately boils down to doing something other than giving everyone (or every connection!) a roughly equal share of what is available.

Fault detection might not seem to be tied directly to our choices, but we do make choices that affect how readily faults are detected. And anyone with the title "Network Manager" is expected to detect and repair problems promptly!

A classic configuration decision is whether a medium-sized network should use Ethernet switching exclusively, or should be divided into subnets so as to make use of IP routing. The rise of Software-Defined Networking has further complicated this choice.

SNMP (Simple Network Management Protocol) is a protocol associated with retrieving network statistics from various "agents". Management is the art of making initial configuration decisions, and then later decisions based on SNMP data and other data to keep everything running smoothly.

(For completeness, the OSI alternative to SNMP is also an option: it is called Common Management Information Protocol, or CMIP. It is decades behind schedule, and so may never be widely supported, but it is possibly a better solution technically.)

Another form of network management is change management. Is your site changing its IP address prefix, due to a provider change? Are you migrating to use of private 10.0.0.0/8 IP addresses, along with Network Address Translation (NAT) to reach the outside world? Are you upgrading from Windows 10 to Xenial Xerus? There is a fair bit of material in chapter 1 of Mauro & Schmidt devoted to the nuts and bolts of change management: administration, testing, support, software distribution, etc. There is also emergency change management, usually initiated by the discovery of malware (and usually, though not always, focused on distribution of service patches or updates).

Other examples of management:

BGP policy-based routing: what can we do with creative routing?
Linux Advanced Routing Toolkit: what tools do we have for bandwidth allocation?

There is some conflict in the Network Management world as to whether the main focus is hardware (the physical network at your site), or software services (web, servers, etc). Managing bandwidth through allocation is something that many "network managers" do not do at all.

How do you tell when a server is down? When it's not responding? For how long? What if it responds to simple queries, but not complex ones?

Here are four rough sizes of networks:

Building-sized (or campus-sized) single-business networks
Multi-campus networks (eg Loyola's)
    long links, sophisticated internal routing
Internet Service Providers
    very long links, internal & external routing
Data Centers, which may have ~100,000 servers and ~4,000 switches

Layers

   7-layer, 5-layer models
   Physical
   LAN
   Internetwork (IP)
       IPv4 addresses have a Net part and a Host part. The division point is constant per LAN.
   Transport (TCP, UDP)
       ports
   Session (7-layer model only)
   Presentation (7-layer model only)
   Application

OSI 7-layer model:
    wishful thinking from self-important bureaucrats trying to justify their existence?
Not exactly, but not far off


Comments on Session & Presentation layers
Session: an ssh ControlMaster connection is an example! But we don't need a special layer for this.
Presentation: ASN.1, BER: these are very important for SNMP!

Some synonyms: packet/frame/PDU/segment/??

Review of network building blocks

Workstations & Servers: endpoints

Software services live on these devices! Also, these devices speak IP (Internet Protocol), and so you might want to collect stats on IP addresses assigned, subnet masks, routers, DNS, etc.

Workstations have a 6-byte physical Ethernet address burned into the card (occasionally there are problems with duplicate addresses; these are rare, but pretty frustrating). On bootup, workstations acquire a 4-byte IP address, usually via DHCP but occasionally by static configuration. They also acquire, at a minimum,

a subnet mask, which defines how the IP address splits into the net portion and host portion
a preferred router
a DNS server, to translate, say, "ulam.cs.luc.edu" to 147.126.65.147

DHCP works as follows: clients broadcast a DHCP query that contains their physical address, and the DHCP server on the same subnet answers it. (Actually, the local-subnet router usually acts as a "forwarder" to the real DHCP server, which is typically not on the same subnet.) The DHCP response includes the assigned IP address as well as the information above, and sometimes a lot more information as well.

A subnet is defined as all hosts with a common IP net address, as determined by the subnet mask. Two nodes with the same IP net address reach each other directly, by sending to each other's physical Ethernet address (as discovered by the ARP protocol). Two nodes on different subnets send to each other via routers.
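
The net/host split can be tried out with Python's standard ipaddress module. This is only a sketch: the mask 255.255.255.0 and the second and third addresses are assumptions chosen for illustration, and the function names are made up.

```python
import ipaddress


def net_part(ip, mask):
    """Return the network obtained by applying the subnet mask to ip."""
    return ipaddress.ip_interface(f"{ip}/{mask}").network


def same_subnet(ip_a, ip_b, mask):
    """True if both hosts share a net address, so they can deliver directly."""
    return net_part(ip_a, mask) == net_part(ip_b, mask)


MASK = "255.255.255.0"  # assumed mask; the real one comes from DHCP or static config

print(net_part("147.126.65.147", MASK))
print(same_subnet("147.126.65.147", "147.126.65.10", MASK))  # direct delivery
print(same_subnet("147.126.65.147", "147.126.66.10", MASK))  # must go via a router
```

A host makes essentially this comparison for every outbound packet: if the destination's net address matches its own, it uses ARP and sends directly; otherwise it sends to its preferred router.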

Note that in order for the network to work, we need

routers
DHCP servers
DNS servers

Repeaters/Hubs

Brief view of Ethernet packet format:

    6 bytes destination address
    6 bytes source address
    2 bytes type (eg IP, IPX, ARP)
    Data
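
As a rough illustration of the layout above, the following sketch splits a raw frame into those four fields. The sample addresses are invented; the type values 0x0800 (IP) and 0x0806 (ARP) are the standard ones.

```python
import struct

ETHERTYPES = {0x0800: "IP", 0x0806: "ARP"}  # two common type values


def format_address(raw):
    """Render a 6-byte physical address in the usual colon-separated hex form."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame):
    """Split a raw frame into destination, source, type and data fields."""
    if len(frame) < 14:
        raise ValueError("frame shorter than the 14-byte Ethernet header")
    # Network byte order: 6 bytes destination, 6 bytes source, 2 bytes type.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": format_address(dst),
        "src": format_address(src),
        "type": ethertype,
        "data": frame[14:],
    }


# A made-up broadcast ARP frame, just to exercise the parser.
sample = bytes.fromhex("ffffffffffff" "020000000001" "0806") + b"payload"
header = parse_ethernet_header(sample)
print(header["dst"], header["src"], ETHERTYPES.get(header["type"], "other"))
```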

Linear coax had nothing to fail, except the cable itself. You noticed a fault when you couldn't reach the other end. Repeaters in some sense are simply an active replacement for coax; they retransmit the arriving bits on all other interfaces, as they arrive; collisions are passed on. Some repeaters do speak SNMP; they can report on the following:

collision rates
per-host traffic
per-host/per-destination traffic
total available bandwidth consumed
Ethernet errors: packets too small, too large, insufficient gap, corrupted packets
Hardware errors within the repeater itself: interface errors, dropped packets, temperature, OS faults, etc

Hubs are simply multi-way repeaters.

Bridges/Switches

These devices shield segments from collisions. The underlying topology must be free of any loops (perhaps after application of the spanning-tree algorithm). Classic switches learn forwarding tables:

If a packet arrives for destination D, and there's an entry for ⟨D,i⟩, then the packet is forwarded only on interface i; otherwise, it is forwarded on all interfaces except for the arrival interface (that is, broadcast).

If a packet arrives on interface i from origin D, then ⟨D,i⟩ is inserted into the table.

Thus, initially all packets are broadcast, but quickly the bridge builds its table to route packets more efficiently, and soon each packet takes only the direct path to its destination.
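
The two learning rules above can be sketched in a few lines. This is a toy model, not how any particular switch is implemented; the class name is made up, and dropping a packet whose known destination lies on the arrival interface is an added assumption.

```python
class LearningSwitch:
    """Toy model of a learning switch's forwarding table."""

    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self.table = {}  # address -> interface on which it was last seen as a source

    def receive(self, src, dst, arrival):
        """Process one packet; return the interfaces it is forwarded on."""
        # Learning rule: the sender is reachable via the arrival interface.
        self.table[src] = arrival
        if dst in self.table:
            out = self.table[dst]
            # Assumption: nothing to do if the destination is on the arrival side.
            return [] if out == arrival else [out]
        # Unknown destination: forward on every interface except the arrival one.
        return [i for i in self.interfaces if i != arrival]


switch = LearningSwitch([1, 2, 3])
print(switch.receive("A", "B", arrival=1))  # B unknown: flooded
print(switch.receive("B", "A", arrival=2))  # A was learned from the first packet
print(switch.receive("A", "B", arrival=1))  # B is now known as well
```

After one packet in each direction, both endpoints are in the table and traffic between them no longer reaches the third interface.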

Switches read in full packets; that is, each interface is a full Ethernet interface. Thus, there is a full set of Ethernet data for each interface. Additionally, most switches are capable of sophisticated configuration, in which certain sets of ports (interfaces) are linked together into virtual networks. Switch ports may not all run at the same speed (eg there may be a mix of 100 Mbps Ethernet and gigabit Ethernet); the switch's statistics can be used to help decide whether you're using the different port capabilities optimally. Finally, switches may be able to report information about the size of the forwarding tables and how many non-broadcast packets arrive for which the destination is not found in the table (forwarding errors).

Spanning Tree Algorithm

Let's give the switches ID numbers. They all send out special packets. The lowest-numbered switch becomes the root node. The rest of the switches examine the messages looking for

The shortest path to the root
If there are two equal-length paths, the one that starts with the neighbor with lower ID number.
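
The outcome of these rules can be computed centrally, as in the sketch below. This is not the distributed message exchange the switches actually perform; it only shows which links end up in the tree. It assumes that path length is counted in hops and that the network is connected, and the function name and topology are made up.

```python
from collections import deque


def spanning_tree(links):
    """Return {switch: parent} for the tree rooted at the lowest-ID switch.

    links is an iterable of (switch_id, switch_id) pairs.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    root = min(neighbors)  # the lowest-numbered switch becomes the root

    # Breadth-first search gives each switch's hop count to the root.
    dist = {root: 0}
    queue = deque([root])
    while queue:
        s = queue.popleft()
        for n in neighbors[s]:
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)

    # Each switch keeps the link to a neighbor one hop closer to the root;
    # ties go to the neighbor with the lower ID.
    parent = {root: None}
    for s in dist:
        if s != root:
            parent[s] = min(n for n in neighbors[s] if dist[n] == dist[s] - 1)
    return parent


# Switch 4 has two equal-length paths to the root, via 2 and via 3.
print(spanning_tree([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]))
```

In this example the 3-4 link is left out of the tree, which is what removes the loop.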

Read intronetworks.cs.luc.edu/current/html/ethernet.html#spanning-tree-algorithm-and-redundancy.

Routers

IP routers work like switches, except that traffic is forwarded from one IP network to another only by arrangement. There is no analogue to "learning switches". Router topology can be arbitrary; this is important.

Routers, unlike switches, must have IP addresses to work. They have information on rate of packets routed, rate of routing-table modifications, etc.

Here's an important router question. What if I bring my home laptop into work, and plug it into my office computer jack? Will this be detected? If so, how? The DHCP server on the network might notice that it has handed out an IP address to a physical address never before seen, but I could bypass this by configuring my home laptop to use my office machine's IP address. At that point, the router might notice that my Ethernet address is different. Will it actually catch this? How can it report some statistics that would let management notice what is going on? Can routers be configured so as to attempt to prevent this? (Many high-end wireless routers do attempt to block any traffic from Wi-Fi physical addresses that haven't been authorized.)
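
One statistic that would help here is a record of which physical address has been seen using each IP address. The following is a hypothetical sketch of such a record; the class name, event names and sample addresses are all made up, and it is not a description of what any real router does.

```python
class BindingMonitor:
    """Remember which physical address was last seen using each IP address."""

    def __init__(self):
        self.seen = {}  # IP address -> physical address

    def observe(self, ip, mac):
        """Record one sighting; return "new", "changed", or None if unchanged."""
        previous = self.seen.get(ip)
        self.seen[ip] = mac
        if previous is None:
            return "new"
        if previous != mac:
            return "changed"
        return None


monitor = BindingMonitor()
print(monitor.observe("10.0.0.5", "02:00:00:00:00:01"))  # office machine
print(monitor.observe("10.0.0.5", "02:00:00:00:00:01"))  # same machine again
print(monitor.observe("10.0.0.5", "02:00:00:00:00:99"))  # laptop borrowing the IP
```

A "changed" event is only a hint: it also occurs legitimately when a network card is replaced, so a manager would still have to investigate.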

Switches are considered "Layer 2" in the 7-layer and 5-layer models; routers are "Layer 3". Sometimes one speaks of "layer-2 switching" versus "layer-3 switching".

A typical configuration decision is whether to have your site be one giant subnet, where switched Ethernet is used to route packets from one workstation to another, or whether to subdivide internally (eg by floor, or department, or building) into IP subnets. Routers would then be needed to move traffic from one subnet to the other. Routers serve to limit the scope of broadcast traffic (such as ARP and DHCP requests). Routers are smarter and more flexible, able to implement internal firewalls and other traffic restrictions. However, routers are also slower, formerly an order of magnitude slower.

Routers are often pressed into service as firewalls; that is, the router does some kind of "packet inspection" and blocks packets that don't meet the rules. The inspection might be as simple as blocking selected TCP (or UDP) ports.
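
The simplest case, blocking selected ports, amounts to a lookup on the protocol and destination port. The rule set below is invented for illustration and is not a recommendation.

```python
# Hypothetical rule set: (protocol, destination port) pairs to block.
BLOCKED = {("tcp", 23), ("tcp", 25), ("udp", 69)}


def allow(protocol, dst_port, blocked=BLOCKED):
    """Return True if a packet with this protocol and destination port passes."""
    return (protocol.lower(), dst_port) not in blocked


print(allow("tcp", 80))  # not in the rule set, so it passes
print(allow("tcp", 23))  # blocked by the rule set
```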

Here are some references to IntroNetworks:

Overview of distance-vector route-discovery.

Could we implement DV on an Ethernet?

Proxy Transport

At many sites, connection to the web is made not by direct connection to remote webservers on port 80, but by connecting to a proxy server at your site, which in turn makes the actual connections. The proxy server is thus able to filter out some malicious material, and also can cache sites for better bandwidth utilization. Proxy servers can be transparent, where you appear to be connecting directly to the remote server's port 80 but in fact your connection has been intercepted, or else explicit, in which case the address and port of the proxy server have to be configured in your browser.

Concept of NMS: Network Management System
We will look (some) at OpenNMS; see opennms.org.

Agents: every device on the network that reports to the NMS is called an agent. Agents can report via SNMP (below) or via some other mechanism.

The management station, or manager, is the node to which agents report, either directly or indirectly. Indirect reporting means that there is a "submanager" out there, collecting data from a pool of agents and forwarding it up to the master manager.

Agent reporting may be initiated by the agent or, more commonly, by the manager, through polling.

Some sort of PROTOCOL is used. The most common is SNMP, although application software is often polled by "direct contact"; eg, we can verify that a server is successfully running SMTP (email) by connecting to port 25 and verifying that we see the expected responses. At some point we will look at some of the Java applets used by OpenNMS to attempt to contact various servers to verify that services are running appropriately.
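
A minimal "direct contact" check for SMTP might look like the sketch below. It only looks at the server's opening line, which for a working SMTP server begins with the reply code 220; a more thorough check would go on to exchange a few commands. The function names are made up, and this is not how OpenNMS itself does it.

```python
import socket


def smtp_greeting_ok(banner):
    """True if the server's first line starts with the SMTP 220 "ready" code."""
    return banner.startswith(b"220")


def check_smtp(host, port=25, timeout=5.0):
    """Connect to host:port and report whether an SMTP greeting arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(512)  # first chunk of the greeting is enough here
    except OSError:
        return False  # refused, unreachable, or timed out
    return smtp_greeting_ok(banner)
```

A call such as check_smtp("mail.example.com"), with a placeholder hostname, returns True only if the connection succeeds and the greeting looks right. Note that the timeout is itself a management choice: it is one answer to the question of how long a server may be silent before it counts as down.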

The following SNMP data is stored by the manager (possibly in a distributed fashion):
       MIB (Mgmt Information Base): the table of attribute names and "lookup keys"
       MDB (mgmt database): actual data values

An NMS constantly monitors devices for function, operation, and configuration, and reports problems in real time. The NMS can answer questions about:

Fault Management / Reliability
Help-Desk management
Configuration Management
Security
Performance
Compliance
    Is everyone running WinXP? Is everyone running the company version?
    Does everyone have Service Update 09-31804 installed?
    Is anyone plugging in devices that IT doesn't know about?
Accounting

Mininet

Simple example of TCP traffic:
h1---s1---h2
h1---r1---h2

Monitoring pings
Monitoring a TCP connection with netcat