Network Management Week 13



managing traffic rather than devices
iproute2
tc
BGP
"ip route" v "ip rule"



iproute2

From http://linux-ip.net/html/routing-tables.html, emphasis added:

The multiple routing table system provides a flexible infrastructure on top of which to implement policy routing. By allowing multiple traditional routing tables (keyed primarily to destination address) to be combined with the routing policy database (RPDB) (keyed primarily to source address), the kernel supports a well-known and well-understood interface while simultaneously expanding and extending its routing capabilities. Each routing table still operates in the traditional and expected fashion. Linux simply allows you to choose from a number of routing tables, and to traverse routing tables in a user-definable sequence until a matching route is found.


Here is an example combining iptables and iproute2 that will allow special routing for all packets arriving on interface eth2 (from http://ornellas.apanela.com/dokuwiki/pub:firewall_and_adv_routing):

iptables: this command "marks" all packets arriving on eth2
iptables -t mangle -A PREROUTING -i eth2 -j MARK --set-mark 1
Now we issue this command to create the rule:

    ip rule add from all fwmark 0x1 lookup 33

Here is the modified iproute2 ruleset, where a special table 33 has been created (together with rule 32765). See below, Example 1, for more details on how this table should be created.

# ip rule list
0: from all lookup local
32765: from all fwmark 0x1 lookup 33
32766: from all lookup main
32767: from all lookup default
Note that the arriving-interface check here is done by iptables (via -i eth2); the fwmark then carries that classification into the RPDB rule. Hence the use of iptables, the fwmark, and packet mangling.

Note also that we haven't created table 33 yet (see below).
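
As a preview (Example 1 below does this properly), creating table 33 could be as simple as giving it a default route; a sketch, with 10.99.99.1 as a hypothetical next hop for the eth2 traffic:

# ip route add default via 10.99.99.1 table 33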


Visit /etc/iproute2: rt_tables, etc. A walkthrough:

1. Add an entry to rt_tables:
    100   foo
2. ip rule list:
    doesn't show it
3. ip rule add fwmark 1 table foo
Now: ip rule list shows
0:    from all lookup local
32765:    from all fwmark 0x1 lookup foo
32766:    from all lookup main
32767:    from all lookup default
The rule number is assigned automatically.
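At this point table foo is still empty. A sketch of populating and inspecting it, with 10.0.0.1 as a hypothetical gateway:
    ip route add default via 10.0.0.1 table foo
    ip route show table foo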
Cleanup: ip rule del fwmark 1
You can also delete by priority number: ip rule del pref 32765.



Example 1: simple source routing (http://lartc.org/howto/lartc.rpdb.html), Hubert

Suppose one of my housemates only visits hotmail and wants to pay less. This is fine with me, but they'll end up using the low-end cable modem.

fast link: local router is 212.64.94.251
slow link: local end is 212.64.78.148;  local_end <--link--> 195.96.98.253
(The 212.64.0.0/16 addresses are the local ends of the two links)
user JOHN has IP address 10.0.0.10 on the local subnet 10.0.0.0/8


Step 1: Create in /etc/iproute2/rt_tables a line
    200 JOHN_TABLE
This makes JOHN_TABLE a synonym for 200. /etc/iproute2/rt_tables contains a number of <num, tablename> pairs.

Step 2: have John's host use JOHN_TABLE:

# ip rule add from 10.0.0.10 lookup JOHN_TABLE


output of "ip rule list"
0:    from all lookup local
32765:    from 10.0.0.10 lookup JOHN_TABLE
32766:    from all lookup main
32767:    from all lookup default

Step 3: create JOHN_TABLE
main: default outbound route is the fast link
JOHN_TABLE: default = slow link
    ip route add default via 195.96.98.253 dev ppp2 table JOHN_TABLE

This is a standard "unicast" route.
Other options:
    unreachable
    blackhole
    prohibit
    local (route back to this host; cf the "local" policy-routing table)
    broadcast
    throw: terminate the search of this table; go on to the next policy-routing table
    nat
    via    (strictly a next-hop modifier of a unicast route; used above)

To inspect the standard tables:

for table_id in main local default
do
    ip route show table $table_id
done

Note that what we REALLY want is to limit John's bandwidth, even if we have a single outbound link that John shares with everyone. We'll see how to do this with tc/tbf below.
    


Example 2
from: http://lartc.org/howto/lartc.netfilter.html (Hubert, ch 11)

iptables: allows MARKING packets (this is the fwmark; it is kernel bookkeeping attached to the packet, not a header field).
Marking packets destined for port 25:

# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25  -j MARK --set-mark 1

Let's say that we have multiple connections, one that is fast (and expensive) and one that is slower. We would most certainly like outgoing mail to go via the cheap route.

We've already marked the packets with a '1', we now instruct the routing policy database to act on this:

# echo 201 mail.out >> /etc/iproute2/rt_tables
# ip rule add fwmark 1 table mail.out
# ip rule ls
0:    from all lookup local
32764:    from all fwmark 1 lookup mail.out
32766:    from all lookup main
32767:    from all lookup default

Now we generate a route to the slow but cheap link in the mail.out table:

# ip route add default via 195.96.98.253 dev ppp0 table mail.out
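
We can verify the result with:

# ip route show table mail.out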




Example 3: special subnet:

Subnet S of site A has its own link to the outside world.
This is easy when the link's attachment point (call it RS) is on S: other machines on S just need to be told that RS is their router.

However, what if the topology is like this:

    S----R1---rest of A ----GR----link1---INTERNET
                             \----link2---INTERNET
                             
How does GR (for Gateway Router) route via link1 for most traffic, but link2 for traffic originating in S?
Use matching on source subnet to route via second link!

Because ip rule can match on the source address, the fwmark may not be needed here; we'd fall back on iptables marking (as above) only if S's traffic couldn't be identified by source address alone.
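
A sketch of GR's configuration, assuming S's addresses are 10.1.0.0/16 and link2's next hop is 192.0.2.1 (both made-up values):

# echo 210 s_out >> /etc/iproute2/rt_tables
# ip rule add from 10.1.0.0/16 lookup s_out
# ip route add default via 192.0.2.1 table s_out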




LARTC section 4.2 on two outbound links.
PRwL (Policy Routing with Linux, below) chapter 5.2 is probably better here.



Examples from Policy Routing with Linux

Example 5.2.2: Basic Router Filters

                  coreRouter
                       |
      Actg_Router------+-----EngrRouter----192.168.2.0/24
      172.17.0.0/16    |
                       |
                 corp backbone 10.0.0.0/8
     
Now we configure things so that most 10/8 traffic can't enter the side networks.
Accounting: administrative denial, except for 10.2.3.32/27 and 10.3.2.0/27.
Engineering: test network accessible from 10.10.0.0/14.

From the accounting network, 172.17.0.0/16:
Rules for inbound traffic
   10.2.3.32/27   -   full route
   10.3.2.0/27    -   full route
   10/8           -   prohibit    (block everyone else; note longest-match)
   172.16/16      -   prohibit    (explained in more detail in 5.2.1)

From the Engineering test network, 192.168.2.0/24:
   10.10/14       -   full route    (special subnet granted access)
   10/8           -   blackhole     (zero access to corporate backbone)
   172.17/16      -   blackhole     (zero access to accounting)
   172.16/16      -   blackhole
(prohibit sends back an ICMP "administratively prohibited" error; blackhole drops packets silently.)

Possible configuration for EngrRouter:

ip addr add 10.254.254.253/32 dev eth0 brd 10.255.255.255
ip route add 10.10/14 scope link proto kernel dev eth0 src 10.254.254.253

ip route add blackhole 10/8
ip route add blackhole 172.17/16
ip route add blackhole 172.16/16

 

Possible configuration for Actg_Router

ip route add 10.2.3.32/27 scope link proto kernel dev eth0 src 10.254.254.252
ip route add 10.3.2.0/27 scope link proto kernel dev eth0 src 10.254.254.252
ip route add prohibit 10/8
ip route add prohibit 172.16/16
ip route add prohibit 192.168.2/24


See also http://linux-ip.net/html/routing-tables.html.


Examples from Jason Boxman, A Traffic Control Journey: Real World Scenarios

8.1.1: setting TOS flags for OpenSSH connections. Normal ssh use is interactive, and the TOS settings are thus Minimize-Delay. However, ssh tunnels are not meant for interactive use, and we probably want to reset the TOS flags to Maximize-Throughput. If we have both tunnels and interactive connections, then without this distinction the bulk tunnel traffic will likely bring interactive ssh traffic nearly to a halt at our router.

This is a good example of the --limit and --limit-burst options. Note the -m limit module-loading option that precedes them. See also hashlimit in man iptables.
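
A sketch of --limit/--limit-burst (the parameters here are illustrative, not Boxman's): accept at most 3 new ssh connections per minute, after an initial burst of 5, and drop the rest:

# iptables -A INPUT -p tcp --dport 22 --syn -m limit --limit 3/minute --limit-burst 5 -j ACCEPT
# iptables -A INPUT -p tcp --dport 22 --syn -j DROP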

Other examples in this section involve tc, and we'll look at them soon.



Demo of linux1 and linux2

Step 1: start tcp_writer on the host system, tcp_reader on linux2

Step 2: block the traffic with
    iptables --table filter --append FORWARD --destination linux2 --protocol tcp --source-port 5432 --jump DROP

And then unblock with
iptables --table filter --delete FORWARD --destination linux2 --protocol tcp --source-port 5432 --jump DROP
or
iptables --table filter --delete FORWARD 1
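
To see the rule numbers used by the delete-by-number form:
    iptables --table filter --list FORWARD --line-numbers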

Note that putting this rule in the INPUT or OUTPUT chains doesn't help; the traffic is being forwarded, so in the filter table it traverses only the FORWARD chain. We might consider the POSTROUTING chain, but the filter table has no POSTROUTING chain.



"ip route" v "ip rule"
To tweak a table: "ip route ..."
To administer the routing-policy database (RPDB): "ip rule ..."


The linux iproute and tc packages allow us to manage the traffic rather than the devices. Why do we want to do this??


RPDB rules look at <srcaddr, dstaddr, in_interface, tos, fw_mark>
These rules can't look at anything else! BUT: the fw_mark field (a "virtual" field, not actually in the packet) can be set by mechanisms outside the RPDB (like iptables).

    Marking packets destined for port 25:
    table: mangle; chain: PREROUTING

    # iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 -j MARK --set-mark 1

    # echo 201 mail.out >> /etc/iproute2/rt_tables
    # ip rule add fwmark 1 table mail.out     ;; routes on mark set above!

From ip man page:

   1. Priority: 0, Selector: match anything, Action: lookup routing table local (ID 255). The local table is a special routing table containing high-priority control routes for local and broadcast addresses.

      Rule 0 is special. It cannot be deleted or overridden.

   2. Priority: 32766, Selector: match anything, Action: lookup routing table main (ID 254). The main table is the normal routing table containing all non-policy routes. This rule may be deleted and/or overridden with other ones by the administrator.

   3. Priority: 32767, Selector: match anything, Action: lookup routing table default (ID 253). The default table is empty. It is reserved for some post-processing if no previous default rules selected the packet. This rule may also be deleted.

Warning: table main is updated by routing protocols (RIP, EIGRP, etc, via the local routing daemon).
Other tables are not: if a routing change occurs, the other tables (and traffic that uses them) may be out of luck.



Traffic shaping and traffic control

Generally there's not much point in doing shaping where traffic leaves the bottleneck link for a faster link: the bottleneck link has done all the shaping already!

fair queuing
        Restricts bandwidth when there is competition, but allows full use when the network is idle. Caps a sender's share only when the link is busy!

token bucket
        Restricts bandwidth to a fixed rate, period (though it also allows bursts as per the bucket, which can be made small)
        
tc command:

shaping: output rate limiting (delay or drop)
scheduling: prioritizing. Classic application: sending voip ahead of bulk traffic
policing: input rate regulation
dropping: what gets done to nonconforming traffic

Two scenarios to restrict user/host Joe:
    1. Reduce the absolute bandwidth (in/out?) available to Joe. Even if the link is otherwise idle, Joe is still capped.
    2. Guarantee non-Joe users a minimum share; ie cap Joe's bandwidth only when the link is busy.

qdisc: queuing discipline
You can think of this as the TYPE of queue. Examples: fifo, fifo+taildrop, fifo+randomdrop, fair_queuing, RED, tbf

queuing disciplines are applied to INTERFACES, using the tc command.

Queuing disciplines can be "classless" or "classful" (hierarchical)

A Queuing Discipline (qdisc) determines how packets are enqueued and dequeued. It does scheduling; some qdiscs also support shaping/policing.

Basic "classless" qdiscs

pfifo_fast

(see man pfifo_fast): three-band FIFO queue

Consider the following iptables command, to set the TOS bits on outbound ssh traffic to "Minimize-Delay":

# iptables -A OUTPUT -t mangle -p tcp --dport 22 -j TOS --set-tos Minimize-Delay
 
This works with pfifo_fast, which provides three bands. Band selection by default is done using TOS bits of packet header (which you probably have to mangle to set). See Hubert, §9.2.1.1, for a table of the TOS-to-band map.

Dequeuing algorithm (typically invoked whenever the hardware is ready for a packet, or whenever the qdisc reports to the hardware that it has a packet):

    if there are any packets in band 1, dequeue the first one and send it
    else if there are any packets in band 2, dequeue the first one and send it
    else if there are any packets in band 3, dequeue the first one and send it
    else report no packets available

Note that in a very direct sense pfifo_fast does support three "classes" of traffic. However, it is not considered to be classful, since the three bands are built in: we cannot attach our own qdiscs or classes to them.

Example: queue flooding on upload

In practice, it is very important to give interactive traffic higher priority than bulk traffic (eg web browsing, file downloads). However, you don't have much control over the downlink traffic, and if the uplink queue is in the ISP's hardware (eg their cablemodem), then you won't have much control over the upload side either.

    me------<fast>------[broadband gizmo]-----------<slow>-----...internet
   
In the scenario above, suppose the broadband gizmo, BG, has a queue capacity of 30KB (30 packets?). A bulk UPLOAD (though not a download) will fill BG's queue, no matter what you do at your end with pfifo_fast. This means that every interactive packet will now wait behind a ~30KB queue to get up into the internet. As the upload packets are sent, they are ACKed, and your machine then replenishes BG's queue.

One approach is to reduce BG's queue. But this may not be possible.

Here's another approach:

me---<fast>---[tq_router]---<slow2>---[broadband gizmo]---<slow>--internet

Make sure slow2 ~ slow (ideally slow2 is very slightly the slower of the two). Then the upload will fill tq_router's queue rather than BG's, and interactive traffic can still bypass it there.

Logically, we can have "me" == "tq_router".
_______________________

Token bucket filter (tbf)

See man tbf or man tc-tbf

Restricts flow to a set average rate, while allowing bursts. The tbf qdisc slows down excessive bursts to meet filter limits. This is shaping only, no scheduling.

tbf (or its classful relative htb) is probably the preferred way of implementing bandwidth caps.
   
tokens are put into a bucket at a set rate. If a packet arrives:
    tokens available: send immediately and decrement the bucket
    no token: drop (or wait, below)
   
Over the long term, your transmission rate will equal the token rate.
Over the short term, you can transmit bursts of up to the bucket size.

Parameters:
    bucketsize (burst): how large the bucket can be
    rate: rate at which tokens are put in
   
    limit: number of bytes that can wait for tokens. All packets wait, in essence; if you set this to zero then the throughput is zero. While theoretically it makes sense to set this to zero, in practice that appears to trigger serious clock-granularity problems.

    latency: express limit in time units
   
    mpu: minimum packet unit; size of an "empty" packet

Granularity issue: buckets are typically updated every 10 ms.
During 10 ms, on a 100mbit link, 1mbit (~125KB, or ~80 large packets) can accumulate! Generally, tbf introduces no burstiness itself up to 1mbit (a 10-kbit packet takes 10 ms on a 1mbit link). Beyond that, a steady stream of packets may "bunch up" due to the every-10-ms bucket refill.

Use of tbf for bulk traffic into a modem (rate 220kbit is set a bit below the actual 250kbit bandwidth; burst 1540 is one packet):
# tc qdisc add      \
    dev ppp0        \
    root            \
    tbf             \
    rate 220kbit    \
    latency 50ms    \
    burst 1540
You want packets queued at your end, NOT within your modem!
Otherwise you will have no way to use pfifo_fast to have interactive traffic leapfrog the queue.

What you REALLY want is for TBF to apply only to the bulk bands of PFIFO_FAST.
Can't do this with TBF; that would be classFUL traffic control (though we can do that with PRIO, the classful analogue of pfifo_fast)

Can't be used to limit a subset of the traffic!

TBF is not WORK-CONSERVING:
the link can be idle and someone can have a packet ready to send,
and yet it still waits.



Demo: linux1, linux2, and tc

start tcp_writer in ~/networks/java/tcp_reader. This accepts connections to port 5432, and then sends them data as fast as it can.

tbf1:
tc qdisc add dev eth1 root tbf rate $BW burst $BURST limit $LIMIT

also try:

    tc qdisc change dev eth1 root tbf rate newrate burst newburst limit newlimit
This might cause a spike in kbit/sec numbers:
479
479
159
4294
564
564
...

clear with tc qdisc del dev eth1 root

tc_stats
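
(tc_stats is presumably a wrapper around tc's statistics option; something like
    tc -s qdisc show dev eth1
which reports bytes/packets sent, drops, and overlimits.)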

demo: enable rate 1000 kbit/sec (1 kbit/ms), burst 20kb, limit 100kb. Then try limit = 5kb v 6kb.
At the given rate, 1kb takes 8ms. The bucket is replenished every 10ms (hardware clock limitation), so a burst should not be consumed much during a 10ms interval.

At 10mbit, 10ms is 12kb, and we won't get that rate unless burst is set to about that.

___________________________

Fair Queuing
   
Note that the FQ clock can be reset to zero whenever all queues are empty, and can in fact just stay at zero until something arrives.
   
    Linux "sfq":
    Flows are individual tcp connections
        NOT hosts, or subnets, etc! Can't get this?!
    Each flow is hashed by srcaddr,destaddr,port.
    Each bucket is considered a separate input "pseudoqueue"
    Collisions result in unfairness, so the hash function is altered at regular intervals to minimize this.
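
A minimal sfq setup, with the hash perturbed every 10 seconds as just described:

    tc qdisc add dev eth0 root sfq perturb 10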
   
What we really want is a way to define flow_ids, so they can be created dynamically, and connections can be assigned to a flow_id by host, subnet, user, etc; sfq doesn't offer this.

    sfq is schedule-only, no shaping
   
What you probably want is to apply this to DOWNLOADs.
tc can't do that if your linux box is attached directly to your broadband gizmo (the queue to control is on the far side). Applied to downloads at a router,

    joe,mary,alice---<---1--------[router R]------<---2----internet

it would mean that each of joe,mary,alice's connections would get 1/3 the bw, BUT that would be 1/3 of the internal link, link 1.

Regulating shares of link 2 would have to be done upstream.

If we know that link 2 has a bandwidth of 3 Mbps, we can use CBQ (below) to restrict each of joe, mary, alice to 1 Mbps, by controlling the outbound queue at R into link 1.

Further sharing considerations:

If we divide by 1 flow = 1 tcp connection, joe can double throughput by adding a second tcp connection.

If we divide by 1 flow = 1 host, we do a little better at achieving per-user fairness, assuming 1 user = 1 machine. Linux sfq does not support this.

Linux sfq creates multiple virtual queues. It's important to realize that there is only one physical queue; if one sender dominates that queue by keeping it full, to the point that the other connections get less bandwidth than their FQ share, then the later division into LOGICAL queues won't help the underdogs much.

sfq IS work-conserving

----------------------------

RED

Generally intended for internet routers
We drop packets at random (at a very low rate) when the queue level reaches a preset threshold (eg 50% of max), to signal tcp senders to slow down.
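
A sketch of a RED setup (numbers are illustrative only; see man tc-red for how limit/min/max/avpkt/probability interact):

    tc qdisc add dev eth0 root red limit 400000 min 30000 max 90000 \
        avpkt 1000 burst 55 bandwidth 10mbit probability 0.02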

---------------------------

These gizmos are added to interfaces, basically. If you want to slow a particular sender down, create a virtual interface for them or use classful qdiscs.



classful qdiscs

CBQ, HTB, PRIO

Disneyland example: what is the purpose of having one queue feed into another queue?

However, under tc we can have these classful qdiscs form a tree, possibly of considerable depth.

LARTC 9.3: (http://lartc.org/howto/lartc.qdisc.advice.html)




Basic terminology for classful qdiscs
classes form a tree
each leaf class has a qdisc (a fifo by default)
At each interior node, there is a CLASSIFIER algorithm (a filter) and a set of child class nodes.

Linux: at router input(ingress), we can apply POLICING to drop packets; at egress, we apply SHAPING to put packets in the right queue. Terminology derives from the fact that there is no ingress queue in linux (or most systems).

Classful Queuing Disciplines

CBQ, an acronym for 'class-based queuing', is the best known.
It is not, however, the only classful queuing discipline. And it is rather baroque.

PRIO: divides into classes :1, :2, :3 (user-configurable this time)
dequeuing: take packet from :1 if avail; if not then go on to :2, etc

by default, packets go into the band they would go into under PFIFO_FAST, based on the TOS bits. But you can attach tc filters to adjust this.

Hierarchy of PRIO queues is equivalent to the "flattened" PRIO queue.
However, a hierarchy of PRIO queues with SFQ/TBF offspring is not "flattenable".



TBF: classic rate-based shaper; packets wait for their token
(policing version: you get dropped if you arrive before your token)


For a classless qdisc, we're done once we create it
(its parent might be nonroot, though).

For a classful qdisc, we add CLASSES to it.
Each class will then in turn have a qdisc added to it.

parent of a class is either a qdisc or a class of same type

class major numbers must match parent.

qdisc major numbers are new

each class needs to have something below it, although every class gets a fifo qdisc by default.

We then attach a sub-qdisc to each subclass.



LARTC example:

Hubert example in 9.5.3.2 has a prio qdisc at the root.
The subclasses 1:1, 1:2, and 1:3 are automatic, as is the filtering.

           1:   root qdisc
         / | \
        /  |  \
       /   |   \
     1:1  1:2  1:3    classes, added automatically
      |    |    |
     10:  20:  30:    qdiscs
     sfq  tbf  sfq
band  0    1    2

Bulk traffic will go to 30:, interactive traffic to 20: or 10:.

Command lines for adding prio queue

    # tc qdisc add dev eth0 root handle 1: prio 

This automatically creates classes 1:1, 1:2, 1:3. We could say
    tc qdisc add dev eth0 root handle 2: prio bands 5
to get bands 2:1, 2:2, 2:3, 2:4, and 2:5. Then zap with
    tc qdisc del dev eth0 root
 
But suppose we stick with the three bands, and add:
# tc qdisc add dev eth0 parent 1:1 handle 10: sfq    // prob should be tbf too
# tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
# tc qdisc add dev eth0 parent 1:3 handle 30: sfq                               
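
Band selection still defaults to the TOS mapping, as with pfifo_fast; we could also steer traffic explicitly with a filter. A sketch that forces ssh (port 22) traffic into the tbf band:

# tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dport 22 0xffff flowid 1:2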

We now get a somewhat more complex example.



Hierarchical Fair Queuing

1. Why it's not flattenable

         / \
        /   \
       /     \
     50       50
     /\       /\
    /  \     /  \
   25  25   25  25
   A    B   C    D
  
ABC on, D idle:
Hierarchical divides 25/25/50
Flat divides 33/33/33

2. How to define using fluid flows (as we did with flat fair queuing)

3. Because the slow-clock algorithm (or any finishing-time algorithm) implies that the finishing order of two packets cannot depend on future arrivals, we can no longer use that strategy!

Example: from Bennett & Zhang, Hierarchical packet fair queueing algorithms, IEEE/ACM Transactions on Networking, Oct 1997
2.2

         / \
        /   \
       /     \
     80       20
     /\        |
    /  \       |
   75   5      |
   A1   A2     B
  
All packets have size 1; link speed is 1 (ie 1 packet/unit_time)
T=0: A1's queue is idle; A2's and B's are very full. A2 gets 80%, B gets 20%. Finishing time calculations are such that A2 sends 4, then B sends 1, then A2 sends 4, then B sends 1....

But now let a packet arrive on A1. All of a sudden, A2 should get 5%, or 1/4 the rate of B.
But the finishing-time model can't change, because those calculations don't allow it!


Example 3: from Bennett & Zhang, 3.1

11 users. User 1 has a guarantee of 50%, the others all have 5%. WFQ sends 10 packets for user 1, then 10 in all, one for each of the other users. So-called Generalized Processor Sharing model: 5 of user 1 / 10 of others / 10 of user 1 / ...

difference between WFQ (our algorithm, non-hierarchical) and fluid model


There is active research on algorithms that work for packets, have bounded delay w.r.t. the fluid model, and are fast.



CBQ

The acronym stands for "Class Based Queuing", though there are several other forms of classful queuing disciplines. CBQ is an attempt to combine some elements of Fair Queuing and Token Bucket in a single mechanism.

Goal: classful shaping. But the shaping (rate-limiting) doesn't work the way you'd like, because the rate is ESTIMATED somewhat artificially by measuring average idle time at the device interface.

Example from LARTC [Hubert] 9.5.4.4

Goal:
    webserver traffic      limited to 5mbit (class 1:3, qdisc 30:)
    smtp traffic           limited to 3mbit (class 1:4, qdisc 40:)
    combination            limited to 6mbit

               1:           root qdisc
               |
              1:1           child class
             /   \
            /     \
          1:3     1:4       leaf classes
           |       |
          30:     40:       qdiscs
         (sfq)   (sfq)


Create root:

# tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit avpkt 1000 cell 8

Create a CLASS at the root node to limit the total to 6mbit.
"bounded" (at the end of the command) means this class can't borrow from other idle classes;
this caps the rate at 6mbit.

# tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit  \
  rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1000 bounded 
 

Now create the two leaf classes, with classids 1:3 and 1:4
These are not bounded (which means they can borrow), and also not isolated, which means they can lend. Classid n:m is our choice, but must have n=1 to match parent definition above.

# tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit  \
  rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 avpkt 1000
 
# tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit  \
  rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 avpkt 1000

 
Both leaf classes have a fifo qdisc by default. We could leave it that way, but here's how to replace it with sfq:

# tc qdisc add dev eth0 parent 1:3 handle 30: sfq
# tc qdisc add dev eth0 parent 1:4 handle 40: sfq

Now we attach filtering rules, to the root node.
sport 80 (srcport 80) means web traffic.

# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip sport 80 0xffff flowid 1:3
# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip sport 25 0xffff flowid 1:4


Note that we use 'tc class add' to CREATE classes within a qdisc, but that we use 'tc qdisc add' to actually add qdiscs to these classes.

Traffic that is not classified by either of the two rules will then be processed within 1:0, and be unlimited.

If SMTP+web together try to exceed the set limit of 6mbit/s, bandwidth will be divided according to the weight parameter, giving 5/8 of traffic to the webserver and 3/8 to the mail server.




Hubert 9.5.4.4: CBQ

               1:           root qdisc, cbq
               |
              1:1           child class, cbq, added by us,
             /   \          with params to limit traffic
            /     \
          1:3     1:4       leaf classes, each cbq (of necessity)
           |       |
          30:     40:       qdiscs
         (sfq)   (sfq)

Note we must add the immediate child manually!

We also add filters to direct the traffic to 1:3 and 1:4 as appropriate:

# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip sport 80 0xffff flowid 1:3
# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip sport 25 0xffff flowid 1:4

If traffic is not picked up by a filter, it goes to 1:0.
filter flow-id matches a class-id


How do we have one class represent a subnet, and another class represent "everything else"? Use a default class.
Note that this feature is specific to htb; cbq doesn't have it.
Boxman, 8.2.1:
tc qdisc add dev eth0 root handle 1: htb default 90



Hierarchical token bucket (htb)

This is a classful version of tbf. Note that because we may now have sibling classes, we have an important sharing feature: each sibling class is allocated the minimum of what it requests and its assigned share; that is, each class is guaranteed a minimum share (like fair queuing). However, the fairness of the division in the absence of traffic from some nodes may be suspect.

Commands to create an htb qdisc with three child classes:
   1. Hosts 10.0.0.1  and 10.0.0.2 go to class :10
   2. subnet 10.0.0.0/29 (10.0.0.0 - 10.0.0.7 except the above two!)  goes to class :29
   3. All other traffic goes to class :100

The qdisc is placed on the interior interface of the router, to regulate inbound traffic.

We suppose that this is to control flow over a link that has a sustained bandwidth limit of BW, a bucket size of BURST, and a peak bandwidth of PBW. (PBW is not used here.)
BW=56          # kbps; note that to tc, "kbps" means kilobytes/sec ("kbit" is kilobits/sec)
BURST=350      # mb = megabytes


tc qdisc add dev eth0 root handle 1: htb default 100
tc class add dev eth0 parent 1: classid 1:1 htb rate ${BW}kbps burst ${BURST}mb

# class 10 is limited by parent only
tc class add dev eth0 parent 1:1 classid 1:10 htb rate ${BW}kbps burst ${BURST}mb

# class 29 has same rate, but half the burst
HBURST=$(expr $BURST / 2)
tc class add dev eth0 parent 1:1 classid 1:29 htb rate ${BW}kbps burst ${HBURST}mb

# class 100 has 3/4 the refill rate, too
BW100=$(expr 3 \* $BW / 4)
tc class add dev eth0 parent 1:1 classid 1:100 htb rate ${BW100}kbps burst ${HBURST}mb

tc filter add dev eth0 parent 1:0 protocol ip u32 \
    match ip dst 10.0.0.1/32 flowid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 \
    match ip dst 10.0.0.2/32 flowid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 \
    match ip dst 10.0.0.0/29 classid 1:29
# no rule for flowid 1:100; that traffic is handled by "htb default 100"
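
To verify the configuration and watch the counters:

tc -s qdisc show dev eth0
tc -s class show dev eth0
tc filter show dev eth0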