Linux Traffic Control: tc

Hierarchical queuing is available in Linux via the traffic control (tc) command. A range of queuing disciplines (qdiscs) are built into the Linux kernel -- some are not work-conserving, but they still fit into the framework -- and they can be spliced together into a hierarchy using the tc command.

Pretty much all the useful qdiscs are for outbound traffic. Some of them, such as htb, are classful; others, such as netem, are not.

For the classful qdiscs, a node can have a number of child nodes; for the classless ones, it (generally) cannot.

Each node has a numeric handle, typically followed by a colon (10:). The top of the hierarchy is the root. When adding a new qdisc, you have to specify its parent (or, if it is a new root, you say that).

We will start with a look at the tc command structure. Every tc command is relative to a specific interface, which is eth here. First, these are useful for viewing your qdisc stack:

tc -s qdisc show dev eth
tc -s class show dev eth

You delete the root qdisc (and everything below it) with

tc qdisc del dev eth root

Now let's add three qdiscs, stacked. From the root down, they will be qdisc1, qdisc2, and qdisc3. We will use handles 1:, 2: and 3: here. For now, only the tc stacking parts of the commands are shown, not the qdisc-specific arguments.

tc qdisc add dev eth root handle 1: qdisc1 qdisc1-args
tc qdisc add dev eth parent 1: handle 2: qdisc2 qdisc2-args
tc qdisc add dev eth parent 2: handle 3: qdisc3 qdisc3-args

(You can also use change or delete instead of add.)

Suppose we want qdisc2 to be htb, limiting all bandwidth to 20mbit. To do this, we need a class below the htb qdisc, with the given bandwidth limit applied to the class. The class gets a secondary handle; we will use 10 here. So the full class handle is 2:10.

tc qdisc add dev eth parent 1: handle 2: htb default 10
tc class add dev eth parent 2: classid 2:10 htb rate 20mbit

Note only the class gets the rate. Also note that the class gets a classid rather than a handle. The "default 10" in the first line says that by default all traffic should go to class 10, which is defined in the second line.

If we do this, we would need to modify our third qdisc, as it now has parent 2:10 rather than just 2:

tc qdisc add dev eth parent 2:10 handle 3: qdisc3 qdisc3-args
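Assembled in one place, the full sequence for this stack -- with htb and its rate-limited class in the middle, and qdisc1, qdisc3 and their arguments still placeholders -- is:

tc qdisc add dev eth root handle 1: qdisc1 qdisc1-args
tc qdisc add dev eth parent 1: handle 2: htb default 10
tc class add dev eth parent 2: classid 2:10 htb rate 20mbit
tc qdisc add dev eth parent 2:10 handle 3: qdisc3 qdisc3-args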

Here is a typical Mininet configuration to use for testing these:

   h1----+
         |
         r ---- 25 Mbps, 50 ms ---- h3
         |
   h2----+

There are no constraints on any links except the r--h3 link, which is set to a bandwidth of 25 Mbps and a delay of 40 ms one way, 10 ms the other.
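A quick sanity check: from h1, pinging h3 (10.0.3.10, the address used in the sender example below) should show an RTT of roughly 50 ms, the sum of the two one-way delays:

    ping -c 3 10.0.3.10      # run on h1; expect an RTT near 40 + 10 = 50 ms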

Example 1

We will use Linux HTB at r to set bandwidth limits, so that two flows arriving at h3 have two different bandwidth limits. Then we will change one of the bandwidth limits while testing is going on.

Data will be received using dualreceive.py on h3. To send data, we will use the C sender program. The full command line for sender, sending 2000 blocks to 10.0.3.10 port 5430, is:

    ./sender 2000  10.0.3.10  5430

The easiest way to start two flows is to start them both on h1. The alternative, starting one sender on h1 and one on h2, is only needed if the h1--r and h2--r delays are unequal.

We will typically run htb on r. If we're trying to control traffic through r, we must apply htb to the downstream interface, which for h1->h3 traffic is r-eth2 in the commands below (use tc qdisc show to confirm the interface names in your own topology). Here are the commands necessary to limit h1->h3 UDP traffic to 1 mbit/sec, and h1->h3 TCP traffic to 10 mbit/sec. When specifying rates, mbit is megabit/sec and mbps is megabyte/sec. Also, these units are written adjacent to the numeric value, with no intervening space!

First, set up the htb "root", declaring a default class (which we never actually create; unclassified traffic then bypasses the htb limits entirely, so traffic is never blocked while everything else is being set up).

tc qdisc del dev r-eth2 root        # delete any previous qdisc setup on r-eth2

tc qdisc add dev r-eth2 root handle 1: htb default 10    # class 10 is the default class

Now we add two classes, one for UDP (class 1) and one for TCP (class 2). At this point you can't tell yet what each class is for; that's in the following paragraph. (Sometimes we would set up a root class, right below the root qdisc but above both the classes below, to enable sharing between them. We're not doing that here, though.)

tc class add dev r-eth2 parent 1: classid 1:1 htb rate 1mbit    # '1mbit' has no space! The classes are 1:1 and (next line) 1:2

tc class add dev r-eth2 parent 1: classid 1:2 htb rate 10mbit

Now we have to create filters that assign traffic to one class or another. The flowid of the filter matches the classid above. We'll assign UDP traffic to classid 1:1, and TCP traffic to classid 1:2, although the tc filter command calls these flowids. The "parent 1:" identifies the root above with handle 1:. The "u32" refers to the so-called u32 classifier; the name comes from unsigned 32-bit. The 0xff is an 8-bit mask.

tc filter add dev r-eth2 protocol ip  parent 1: u32 match ip protocol 0x11 0xff flowid 1:1    # 0x11 = 17, the UDP protocol number in the IP header

tc filter add dev r-eth2 protocol ip  parent 1: u32 match ip protocol 0x6 0xff flowid 1:2        # 0x6 is the TCP protocol number
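At this point it can be useful to confirm that the filters are in place, and then to watch the per-class byte counts as traffic flows; for example:

tc filter show dev r-eth2 parent 1:
tc -s class show dev r-eth2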

Now if we start the senders (you'd have to come up with your own UDP sender), we should see UDP traffic getting 1mbit and TCP traffic getting 10mbit. We can also just drop the 1:2 class (and its filter), and have that traffic go to the default. In this case, UDP is limited to 1mbit and TCP is not limited at all.
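One way to do that removal is sketched below; deleting a single u32 filter by its handle is fiddly, so this simply flushes all the filters on 1: and re-adds the UDP one (older versions of tc may insist on the filter's pref and handle for the delete):

tc filter del dev r-eth2 parent 1:        # flush all filters attached to 1:
tc filter add dev r-eth2 protocol ip  parent 1: u32 match ip protocol 0x11 0xff flowid 1:1   # re-add the UDP filter
tc class del dev r-eth2 classid 1:2       # the now-unreferenced TCP class can be removed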

We can also apply filters to traffic from selected ports or hosts, so different traffic from the same host is treated differently. Here are a couple examples; the first filters by IP source address and the second by TCP destination port number (16 bits, so we need a 16-bit mask 0xffff):

tc filter add dev r-eth2 protocol ip  parent 1: u32 match ip src 10.0.1.10 flowid 1:1

tc filter add dev r-eth2 protocol ip  parent 1: u32 match ip dport 5431 0xffff flowid 1:2
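Multiple match clauses in a single u32 filter are ANDed together, so, as a hypothetical combination of the two examples above, traffic from 10.0.1.10 going to TCP port 5431 could be selected with:

tc filter add dev r-eth2 protocol ip  parent 1: u32 match ip src 10.0.1.10 match ip dport 5431 0xffff flowid 1:2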

We can also replace "add" by "change" to update the rules. This makes the most sense for the "tc class" statements that assign rates. (We can also use "del" to delete classes.)
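For example, to cut the TCP class's limit from 10mbit to 5mbit while a transfer is running (5mbit here is just an illustrative value):

tc class change dev r-eth2 parent 1: classid 1:2 htb rate 5mbit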

Finally, any lower-level qdiscs have to go below 1:1 or 1:2; we cannot, for example, add a single netem qdisc directly below 1: (that is, one lying below both 1:1 and 1:2 together) to limit the queue size.
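So, as a sketch, a netem qdisc capping the queue would be attached beneath one class at a time, e.g. below 1:1 (the handle 11: and the 25-packet limit here are arbitrary choices):

tc qdisc add dev r-eth2 parent 1:1 handle 11: netem limit 25    # handle and limit values are arbitrary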

When first experimenting, it may be easiest to type the tc commands into xterm windows on the respective hosts (or at the mininet> prompt preceded by the hostname). However, after you get things working, it may be easier to move the commands into the Mininet python file:

r.cmd('tc qdisc add dev r-eth2 root handle 1: htb default 10')

Commands go in main(), after the r = net['r'].

The argument to r.cmd() is a Python string, so you can put numeric values into it with str.format() (eg 'tc qdisc add dev r-eth2 root handle 1: htb default {}'.format(default_class))

Example 2

This is very similar, but now we also set the delay and the queue capacity. Typically we set a smaller bandwidth on the r--h3 link, and a higher bandwidth on the h1--r and h2--r links, so the r--h3 link is the bottleneck.

The tricky part here is that Mininet often has already added a queuing discipline attached to r-eth3, so we will have to change it rather than add it. And it is essential to get the handle and classid m:n numbers correct. Use "tc qdisc show dev r-eth3" and "tc class show dev r-eth3" to figure these out. But, basically, here's the Mininet hierarchy:

5:      the root HTB qdisc
5:1    the HTB class below the root, specifying rate and ceil and burst
10:    the netem (network emulator) qdisc below the HTB class that specifies the delay and queue (or "limit")

The queue capacity won't matter here.

Verify your delay with the ping command. Note that netem is "classless", so you can't directly set different delays for different traffic classes.
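As a sketch, and assuming the handle numbers shown above (check yours with tc qdisc show and tc class show dev r-eth3), updating the rate and the delay might look like the following; the values are just the ones this topology uses, and handling the queue capacity is taken up next:

tc class change dev r-eth3 parent 5: classid 5:1 htb rate 25mbit     # handles 5: / 5:1 assumed; verify first
tc qdisc change dev r-eth3 parent 5:1 handle 10: netem delay 40ms    # 40 ms matches one direction of the r--h3 delay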

There is a problem here: delay is implemented by holding packets in the queue, so the queue has to be large enough for all the packets delayed. That is inconsistent with using the queue to emulate limited queue capacity. So what we can do instead is this hierarchy:

1:      the root netem qdisc, implementing delay
5:      the HTB qdisc below 1:
5:1    the HTB class below 5:, specifying rate and ceil and burst
10:    the netem (network emulator) qdisc below the HTB class that specifies the queue (or "limit")

The idea is that all outbound packets enter this structure and are released when the root qdisc is ready to send; when the root requests a packet, the request propagates down through the lower qdiscs. But packets are physically stored in the leaf queues, so this sometimes leads to ambiguities: if there is a large delay, just where are the delayed packets kept?

The qdisc stack above does appear to work as expected, with some reasonable assumptions about how tc dequeuing works.
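Here is a sketch of how that stack might be built by hand on r-eth3, using the handle numbers from the list above and illustrative values (the 25-packet limit in particular is arbitrary). Note that although netem is nominally classless, it does accept a single child qdisc, conventionally attached as parent 1:1:

tc qdisc del dev r-eth3 root                                   # clear whatever is already attached
tc qdisc add dev r-eth3 root handle 1: netem delay 40ms        # delay only, at the root
tc qdisc add dev r-eth3 parent 1:1 handle 5: htb default 1     # htb below the root netem
tc class add dev r-eth3 parent 5: classid 5:1 htb rate 25mbit  # the rate limit
tc qdisc add dev r-eth3 parent 5:1 handle 10: netem limit 25   # queue capacity only, at the leaf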

Having said all this, one can do all three of queue, rate and delay management within a single layer of netem; see intronetworks.cs.luc.edu/current2/html/mininet.html#link-emulation-in-mininet. As an example, this sets the queue to 50 packets, the (outbound) delay to 100 ms, and the rate to 8 mbit:

tc qdisc add dev r-eth1 root netem rate 8mbit delay 100ms limit 50