Network Management

Summer 2016, Corboy 710, TTh 5:30-8:45 pm

Class 1: July 5

Exams, ground rules

Management: the choices we make.

The following are the five basic areas of network management in the "official OSI" model (see also IntroNetworks Network Management and SNMP)

fault detection
configuration
accounting (eg user accounts)
performance
security -- a topic unto itself

Some people add:

maintaining reliability: "five 9's" (99.999% availability) allows only about 5 minutes of downtime per year. (Reliability is related to fault detection; for example, redundant hardware helps with reliability, but only if faults can be detected quickly so that "failover" can be initiated promptly.)
helpdesk support
compliance monitoring
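
The "five 9's" arithmetic above is easy to check. Here is a minimal sketch; the function name and the 365.25-day average year are my own choices for illustration, not part of any standard definition.

```python
# Downtime budget implied by an availability target.
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) permitted by an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% availability -> {minutes:.1f} minutes/year of downtime")
```

Each additional 9 cuts the downtime budget by a factor of ten, which is why quick fault detection matters so much at the high end.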

Sometimes we look at network management as managing the network hardware and software. Lots of traditional network management focuses almost entirely on this. However, we can also talk about managing bandwidth, which ultimately boils down to doing something other than giving everyone (or every connection!) a roughly equal share of what is available.

Fault detection might not seem to be tied directly to our choices, but we do make choices that affect how readily faults are detected. And anyone with the title "Network Manager" is expected to detect and repair problems promptly!

A classic configuration decision is whether a medium-sized network should use Ethernet switching exclusively, or should be divided into subnets so as to make use of IP routing. The rise of Software-Defined Networking has further complicated this choice.

SNMP (Simple Network Management Protocol) is a protocol associated with retrieving network statistics from various "agents". Management is the art of making initial configuration decisions, and then later decisions based on SNMP data and other data to keep everything running smoothly.

(For completeness, the OSI alternative to SNMP is also an option: it is called Common Management Information Protocol, or CMIP. It is decades behind schedule, and so may never be widely supported, but it is possibly a better solution technically.)

Another form of network management is change management. Is your site changing its IP address prefix, due to a provider change? Are you migrating to use of private 10.0.0.0/8 IP addresses, along with Network Address Translation (NAT) to reach the outside world? Are you upgrading from Windows 10 to Xenial Xerus? There is a fair bit of material in chapter 1 of Mauro & Schmidt devoted to the nuts and bolts of change management: administration, testing, support, software distribution, etc. There is also emergency change management, usually initiated by the discovery of malware (and usually, though not always, focused on distribution of service patches or updates).

Other examples of management:

BGP policy-based routing: what can we do with creative routing?
Linux Advanced Routing Toolkit: what tools do we have for bandwidth allocation?

There is some conflict in the Network Management world as to whether the main focus is hardware (the physical network at your site) or software services (web, servers, etc). Managing bandwidth through allocation is something that many "network managers" do not do at all.

How do you tell when a server is down??? When it's not responding? How long? What if it responds to simple queries, but not complex ones?

Here are four rough sizes of networks:

Building-sized (or campus-sized) single-business networks
Multi-campus networks (eg Loyola's)
long links, sophisticated internal routing
Internet Service providers
very long links, internal & external routing
Data Centers, which may have ~100,000 servers and ~4,000 switches

Layers

   7-layer, 5-layer models
   Physical
   LAN
   Internetwork (IP)
       IP addresses have a Net part and a Host part. The division point is constant per LAN.
   Transport (TCP, UDP)
        Session
        Presentation
   Application

OSI 7-layer model:
    wishful thinking from self-important bureaucrats trying to justify their existence?
Not exactly, but not far off


Comments on Session & Presentation layers
Session: ssh controlmaster connection! But we don't need this as a special layer;
Presentation: ASN.1, BER: these are very important for SNMP!

Some synonyms: packet/frame/PDU/segment/??

Review of network building blocks

Workstations & Servers: endpoints

Software services live on these devices! Also, these devices speak IP (Internet Protocol), and so you might want to collect stats on IP addresses assigned, subnet masks, routers, DNS, etc.

Workstations have a 6-byte physical Ethernet address burned into the card (occasionally there are problems with duplicate addresses; these are rare, but pretty frustrating). On bootup, workstations acquire a 4-byte IP address, usually via DHCP but occasionally by static configuration. They also acquire, at a minimum,

a subnet mask, which defines how the IP address splits into the net portion and host portion
a preferred router
a DNS server, to translate, say, "ulam2.cs.luc.edu" to 147.126.65.47

The way DHCP works is that clients broadcast a DHCP query that contains their physical address; the DHCP server on the same subnet answers it. (Actually, the local-subnet router usually acts as a "forwarder" to the real DHCP server, which is typically not on the same subnet.) The DHCP response includes the assigned IP address as well as the information above, and sometimes a lot more information as well.

A subnet is defined as all hosts with a common IP net address, as determined by the subnet mask. Two nodes with the same IP net address reach each other directly, by sending to each other's physical Ethernet address (as discovered by the ARP protocol). Two nodes on different subnets send to each other via routers.
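
The same-subnet test can be sketched with Python's standard ipaddress module. The 255.255.255.0 mask and the second and third addresses in the usage example are assumptions made up for illustration; only 147.126.65.47 comes from the notes above.

```python
import ipaddress


def network_of(ip: str, mask: str) -> ipaddress.IPv4Network:
    """Return the subnet (net portion) that `ip` belongs to under `mask`."""
    # strict=False accepts a host address and masks off the host bits.
    return ipaddress.ip_network(f"{ip}/{mask}", strict=False)


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True if the two hosts share an IP net address, so they deliver directly."""
    return network_of(ip_a, mask) == network_of(ip_b, mask)


if __name__ == "__main__":
    mask = "255.255.255.0"  # assumed mask, for illustration only
    print(network_of("147.126.65.47", mask))
    print(same_subnet("147.126.65.47", "147.126.65.9", mask))  # direct delivery via ARP
    print(same_subnet("147.126.65.47", "147.126.66.9", mask))  # must go through a router
```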

Note that in order for the network to work, we need

routers
DHCP servers
DNS servers

Repeaters/Hubs (or linear coax-based Ethernet)

Brief view of Ethernet packet format:

    6 bytes destination address
    6 bytes source address
    2 bytes type (eg IP, IPX, ARP)
    Data
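
As a rough illustration of the header layout above, here is a sketch that splits a raw frame into its fields. The function names and the sample frame are invented for the example; the type codes shown are the standard EtherType values for IPv4, ARP and IPX.

```python
import struct

# Standard EtherType codes for the protocols mentioned above.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x8137: "IPX"}


def format_mac(raw: bytes) -> str:
    """Render a 6-byte physical address in the usual colon-separated hex form."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame: bytes) -> dict:
    """Split a frame into destination, source, type and data."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than the 14-byte Ethernet header")
    # "!" = network byte order: 6-byte dest, 6-byte source, 2-byte type.
    dest, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dest": format_mac(dest),
        "src": format_mac(src),
        "type": ETHERTYPES.get(ethertype, hex(ethertype)),
        "data": frame[14:],
    }


if __name__ == "__main__":
    # Made-up frame: broadcast destination, invented source, ARP type.
    sample = bytes.fromhex("ffffffffffff" "020000000001" "0806") + b"payload"
    print(parse_ethernet_header(sample))
```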

Linear coax had nothing to fail, except the cable itself. You noticed a fault when you couldn't reach the other end. Repeaters in some sense are simply an active replacement for coax; they retransmit the arriving bits on all other interfaces, as they arrive; collisions are passed on. Some repeaters do speak SNMP; they can report on the following:

collision rates
per-host traffic
per-host/per-destination traffic
total available bandwidth consumed
Ethernet errors: packets too small, too large, insufficient gap, corrupted packets
Hardware errors within the repeater itself: interface errors, dropped packets, temperature, OS faults, etc

Bridges/Switches

These devices shield segments from collisions. They construct tables of the form ⟨dest,interface⟩.

If a packet arrives for destination D, and there's an entry for ⟨D,i⟩, then the packet is forwarded only on interface i; otherwise, it is forwarded on all interfaces except for the arrival interface (that is, broadcast).

If a packet arrives on interface i from origin D, then ⟨D,i⟩ is inserted into the table.

Thus, initially all packets are broadcast, but quickly the bridge builds its table to route packets more efficiently, and soon each packet takes only the direct path to its destination.
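
The learning rules above fit in a few lines. This is a toy sketch, not how any particular switch is implemented; the class and method names are invented. One detail not stated above is labeled in the code: a real bridge does not send a packet back out the interface it arrived on.

```python
class LearningSwitch:
    """Toy model of the <dest,interface> table described above."""

    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self.table = {}  # maps address -> interface on which it was last seen

    def handle(self, src, dest, arrival_interface):
        """Learn from the source address, then return the interfaces to forward on."""
        # Learning step: a packet from src arrived on this interface.
        self.table[src] = arrival_interface

        out = self.table.get(dest)
        if out is None:
            # Unknown destination: forward on all interfaces except the arrival one.
            return [i for i in self.interfaces if i != arrival_interface]
        if out == arrival_interface:
            # Not stated in the notes: the destination is on the arrival segment,
            # so a real bridge forwards nowhere.
            return []
        return [out]


if __name__ == "__main__":
    switch = LearningSwitch([0, 1, 2])
    print(switch.handle("A", "B", 0))  # B unknown: broadcast
    print(switch.handle("B", "A", 1))  # A was learned: direct
    print(switch.handle("A", "B", 0))  # B was learned: direct
```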

Switches read in full packets; that is, each interface is a full Ethernet interface. Thus, there is a full set of Ethernet data for each interface. Additionally, most switches are capable of sophisticated configuration, in which certain sets of ports (interfaces) are linked together into virtual networks. Switch ports may not all run at the same speed (eg there may be a mix of 100 Mbps Ethernet and gigabit Ethernet); the switch's statistics can be used to help decide whether you're using the different port capabilities optimally. Finally, switches may be able to report information about the size of the forwarding tables and how many non-broadcast packets arrive for which the destination is not found in the table (forwarding errors).

Routers

IP routers work like switches, except that traffic is forwarded from one IP network to another only by arrangement. There is no analogue to "learning switches".

Routers do have IP addresses. They have information on rate of packets routed, rate of routing-table modifications, etc.

Here's an important router question. What if I bring my home laptop into work, and plug it into my office computer jack? Will this be detected? If so, how? The DHCP server on the network might notice that it has handed out an IP address to a physical address never before seen, but I could bypass this by configuring my home laptop to use my office machine's IP address. At that point, the router might notice that my Ethernet address is different. Will it actually catch this? How can it report some statistics that would let management notice what is going on? Can routers be configured so as to attempt to prevent this? (Many high-end wireless routers do attempt to block any traffic from wifi physical addresses that haven't been authorized.)
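
One possible answer to the reporting question: something (the router, or a host watching ARP traffic) could remember which physical address was last seen with each IP address and flag any change. Tools such as arpwatch are built on this idea. The sketch below is only an illustration; the class name, the return values and the addresses are all invented.

```python
class BindingWatcher:
    """Remember <IP, physical address> pairs and flag changes."""

    def __init__(self):
        self.known = {}  # maps IP address -> last physical address seen with it

    def observe(self, ip: str, mac: str) -> str:
        """Record one sighting; return 'new', 'ok' or 'changed'."""
        mac = mac.lower()
        previous = self.known.get(ip)
        self.known[ip] = mac
        if previous is None:
            return "new"
        if previous != mac:
            return "changed"  # same IP, different Ethernet address: worth reporting
        return "ok"


if __name__ == "__main__":
    watcher = BindingWatcher()
    # Invented addresses: the office machine first, then a laptop reusing its IP.
    print(watcher.observe("10.0.0.5", "02:00:00:00:00:01"))
    print(watcher.observe("10.0.0.5", "02:00:00:00:00:01"))
    print(watcher.observe("10.0.0.5", "02:00:00:00:00:99"))
```

This only catches the laptop if it keeps its own Ethernet address; if the physical address is spoofed as well, the pair looks unchanged.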

Switches are considered "Layer 2" in the 7-layer and 5-layer models; routers are "Layer 3". Sometimes one speaks of "layer-2 switching" versus "layer-3 switching".

A typical configuration decision is whether to have your site be one giant subnet, where switched Ethernet is used to route packets from one workstation to another, or whether to subdivide internally (eg by floor, or department, or building) into IP subnets. Routers would then be needed to move traffic from one subnet to the other. Routers serve to limit the scope of broadcast traffic (such as ARP and DHCP requests). Routers are smarter and more flexible, able to implement internal firewalls and other traffic restrictions. However, routers are also slower, formerly an order of magnitude slower.

In the lecture I'm working on how basic IP routing works, and how it works with subnets. Here are some references to IntroNetworks:

Proxy Transport

At many sites, connection to the web is made not by direct connection to remote webservers on port 80, but by connecting to a proxy server at your site, which in turn makes the actual connections. The proxy server is thus able to filter out some malicious material, and also can cache sites for better bandwidth utilization. Proxy servers can be transparent, like at Loyola, where you appear to be connecting directly to the remote server's port 80 but in fact your connection has been intercepted, or else explicit, in which case the address and port of the proxy server have to be configured in your browser.

Concept of NMS: Network Management System
We will look (some) at OpenNMS; see opennms.org.

Agents: every device on the network that reports to the NMS is called an agent. Agents can report via SNMP (below) or via some other mechanism.

The management station, or manager, is the node to which agents report, either directly or indirectly. Indirect reporting means that there is a "submanager" out there, collecting data from a pool of agents and forwarding it up to the master manager.

Agent reporting may be initiated by the agent or, more commonly, by the manager, through polling.

Some sort of PROTOCOL is used. Most common is SNMP, although application software is often polled by "direct contact"; eg, we can verify that a server is successfully running SMTP (email) by connecting to port 25 and verifying that we see the expected responses. At some point we will look at some of the java applets used by OpenNMS to attempt to contact various servers to verify that services are running appropriately.
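
Here is a sketch of the "direct contact" check for SMTP described above. It is not how OpenNMS does it; the function names and the timeout are my own choices. The one protocol fact relied on is that an SMTP server that is ready for mail greets the client with reply code 220.

```python
import socket


def banner_looks_healthy(banner: str) -> bool:
    """An SMTP server that is ready for mail opens with reply code 220."""
    return banner.startswith("220")


def check_smtp(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Connect to the SMTP port and verify that the greeting is the expected one."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(1024).decode("ascii", errors="replace")
    except OSError:
        # Refused, unreachable, or timed out: treat the service as down.
        return False
    return banner_looks_healthy(banner)
```

A call such as check_smtp("mail.example.org") (hypothetical hostname) returns True only if the connection succeeds and the greeting arrives within the timeout. Note that this is a "simple query": a server can pass it and still fail to deliver mail.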

The following SNMP data is stored by the manager (possibly in a distributed fashion):
       MIB (Mgmt Information Base): the table of attribute names and "lookup keys"
       MDB (mgmt database): actual data values

An NMS constantly monitors devices for function, operation, and configuration, and reports problems in real time. The NMS can answer questions about:

Fault Management / Reliability
Help-Desk management
Configuration Management
Security
Performance
   Compliance
       Is everyone running WinXP? Is everyone running the company version?
         Does everyone have Service Update 09-31804 installed?
         Is anyone plugging in devices that IT doesn't know about?
Accounting

Brief intro to SNMP

SNMP, for Simple Network Management Protocol, is a way to get information from each node on your network. Each device must run an SNMP "agent" module; for example, workstations must run an SNMP software package in order to respond. SNMP can be used readonly to poll the agents and retrieve data, or in readwrite mode to update and configure the devices via their agents.

SNMP started as SGMP: Simple Gateway Monitoring Protocol, in 1987 ("gateway" is an old term for "router"). It conflicted with the OSI approach known as CMIP (Common Management Information Protocol). At the time CMIP was too large and complex for practical implementation.

In 1988 the Internet Activities Board decided to pursue both SGMP and CMOT: CMIP over TCP/IP. This dual approach failed within a year: CMOT was dropped, and SGMP evolved into SNMPv1.

Perhaps the first issue for SNMP is how we are going to NAME all the possible attributes. Remember that many devices will have manufacturer-specific attributes.

One important manufacturer-specific attribute is the Device Temperature.

SNMP defines an enormous tree-structured naming hierarchy, using strings of digits known as Object IDentifiers, or OIDs. A diagram appears in Mauro & Schmidt, page 24. Here are some upper levels:

1   iso
3   org
6   dod
1   internet
2   mgmt       4: private
1   mib-2

Thus, the prefix 1.3.6.1.2.1 is the OID prefix for the mib-2 data; mib-2 was an early standardization of the SNMP data that would "usually" be available. The prefix 1.3.6.1.4.1 is for "private", or manufacturer-specific, data.

Here are some of the next mib-2 levels; we will use "mib2" to represent "1.3.6.1.2.1"; thus mib2.5 denotes "1.3.6.1.2.1.5"
   mib2.1   system
   mib2.2   interfaces
   mib2.3   at (address translation, ie arp)
   mib2.4   ip
   mib2.5   icmp
   mib2.6   tcp
   mib2.7   udp
   mib2.8   egp (obsolete)
   mib2.9   unimplemented [?]
   mib2.10   unimplemented [?]
   mib2.11   snmp server
   mib2.25   host resources
There are more.
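
The "mib2" shorthand and the prefix tests can be sketched as follows. The helper names are invented for illustration. The point worth noticing is that OIDs should be compared as sequences of integers, not as strings: 1.3.6.1.2.10 starts with the characters "1.3.6.1.2.1" but is not in the mib-2 subtree.

```python
MIB2_PREFIX = "1.3.6.1.2.1"
PRIVATE_PREFIX = "1.3.6.1.4.1"


def expand(shorthand: str) -> str:
    """Expand the notes' shorthand, eg 'mib2.5' -> '1.3.6.1.2.1.5'."""
    if shorthand == "mib2" or shorthand.startswith("mib2."):
        return MIB2_PREFIX + shorthand[len("mib2"):]
    return shorthand


def oid_tuple(oid: str) -> tuple:
    """Convert '1.3.6.1' (with or without a leading dot) to (1, 3, 6, 1)."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def is_under(oid: str, prefix: str) -> bool:
    """True if `oid` lies in the subtree rooted at `prefix`."""
    o, p = oid_tuple(oid), oid_tuple(prefix)
    return o[:len(p)] == p


if __name__ == "__main__":
    print(expand("mib2.5"))
    print(is_under(".1.3.6.1.4.1.42", PRIVATE_PREFIX))
    print(is_under("1.3.6.1.2.10", MIB2_PREFIX))
```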

SET GET GET-NEXT, response, TRAP
atomic values only! Note use of GET-NEXT
The "base" MIB is MIB-2
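
To see why GET-NEXT matters, here is a toy model of how a walk is built out of repeated GET-NEXT requests. No real SNMP traffic is involved: the agent is just a sorted table in memory, the class and function names are invented, and the values are made up. The three OIDs are the standard ones for sysDescr, sysName and ifNumber.

```python
import bisect


class ToyAgent:
    """In-memory stand-in for an agent: OIDs (as integer tuples) mapped to atomic values."""

    def __init__(self, values):
        self.values = dict(values)
        # Python compares tuples element by element, which matches OID ordering.
        self.oids = sorted(self.values)

    def get(self, oid):
        """GET: exact match only; returns None if the OID is not present."""
        return self.values.get(oid)

    def get_next(self, oid):
        """GET-NEXT: return (oid, value) for the first OID strictly after `oid`."""
        i = bisect.bisect_right(self.oids, oid)
        if i == len(self.oids):
            return None  # ran off the end of the agent's data
        nxt = self.oids[i]
        return nxt, self.values[nxt]


def walk(agent, prefix):
    """Repeat GET-NEXT starting at `prefix` until we leave that subtree."""
    results = []
    current = prefix
    while True:
        step = agent.get_next(current)
        if step is None or step[0][:len(prefix)] != prefix:
            return results
        results.append(step)
        current = step[0]


if __name__ == "__main__":
    # The trailing 0 marks the single instance of a scalar attribute.
    agent = ToyAgent({
        (1, 3, 6, 1, 2, 1, 1, 1, 0): "made-up system description",
        (1, 3, 6, 1, 2, 1, 1, 5, 0): "made-up hostname",
        (1, 3, 6, 1, 2, 1, 2, 1, 0): 3,
    })
    for oid, value in walk(agent, (1, 3, 6, 1, 2, 1, 1)):
        print(".".join(map(str, oid)), "=", value)
```

The manager does not need to know in advance which OIDs exist under a prefix; GET-NEXT discovers them one at a time.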

Issues:
   data presentation (eg byte order, but much more)
   NAMING for all those possible attributes!

ASN.1/BER data representation: defer

Data can in principle be subdivided into fields, though SNMP does not do this.

A MIB assigns a specific attribute name and type to each OID in some set. (MIBs also define tabular data forms.) The OIDs name the general attributes, not a specific instance. In that sense, OIDs are like Java class definitions, not class instances.

Questions:

given an OID, how do we find a MIB file that defines it?
given a piece of hardware, how do we find a MIB that defines its SNMP responses?

The first case corresponds to our seeing 1.3.6.1.2.1.1.9 in the output of the system snmp walk; we did not, however, know how to interpret the responses.

The second case is probably more common: you have a new switch, and need to find out what kinds of SNMP data it submits in the private (1.3.6.1.4.1) subtree.

If we run a MIB browser such as iReasoning, we can see the OIDs. Sometimes googling for the OID will turn something up. Sometimes searching the mib files for, say, the string "system 9" to figure out the OIDs of form system.9, will find what we want.
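
The "search the mib files" approach can be automated roughly as follows. This is a sketch under assumptions: the regular expression only recognizes the common "name OBJECT-TYPE ... ::= { parent n }" and "name OBJECT IDENTIFIER ::= { parent n }" shapes, and the sample text is a hand-abbreviated excerpt written in the style of a MIB file, not a verbatim copy of one.

```python
import re

# Matches definitions of the form:  name <macro> ... ::= { parent number }
ASSIGNMENT = re.compile(
    r"^\s*([A-Za-z][\w-]*)\s+"
    r"(?:OBJECT-TYPE|OBJECT\s+IDENTIFIER|OBJECT-IDENTITY)\b"
    r".*?::=\s*\{\s*([\w-]+)\s+(\d+)\s*\}",
    re.MULTILINE | re.DOTALL,
)


def find_children(mib_text: str, parent: str, number: int) -> list:
    """Return the names that the MIB text defines as { parent number }."""
    return [
        name
        for name, found_parent, found_number in ASSIGNMENT.findall(mib_text)
        if found_parent == parent and int(found_number) == number
    ]


# Abbreviated, hand-written sample in MIB style (for illustration only).
SAMPLE = """
system OBJECT IDENTIFIER ::= { mib-2 1 }

sysDescr OBJECT-TYPE
    SYNTAX      DisplayString (SIZE (0..255))
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION "A textual description of the entity."
    ::= { system 1 }

sysORTable OBJECT-TYPE
    SYNTAX      SEQUENCE OF SysOREntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION "Capabilities table."
    ::= { system 9 }
"""

if __name__ == "__main__":
    print(find_children(SAMPLE, "system", 9))
```

To search real files, read each MIB file's text and pass it to find_children; the directory where MIB files live varies by system, so no path is assumed here.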

Demos using iReasoning tool and snmpwalk

We will use host ulam3 (10.38.2.42) for these demos

(/etc/default/snmpd by default binds snmpd only to localhost!)

   snmpwalk -v 2c -c public ulam3 .1.3.6.1.2.1.1

    snmpwalk -v 2c -c public ulam3 1.3.6.1.4.1
    End of MIB

    snmpwalk -v 2c -c tengwar ulam3 1.3.6.1.4.1
    gads of data

    snmpwalk -v 1 -c tengwar ulam3 1.3.6.1.4.1.42
    gads of data

As of 1 July 2016, the ulam3 SNMP community strings are "public", "futhark" and "tengwar".

You can put .1.3.6.1.4.1.42 into the upper-right box of the iReasoning tool [at least for ulam3]

Other ways of polling devices:

   ssh: limitations: lack of "universal" account
              lack of "limited" account
              doesn't work for most hubs/switches/non-hosts