Network Management Week 13
managing traffic rather than devices
iproute2
tc
BGP
"ip route" v "ip rule"
iproute2
From http://linux-ip.net/html/routing-tables.html, emphasis added:
The multiple routing table system provides a flexible infrastructure on top of which to implement policy routing. By allowing multiple traditional routing tables (keyed primarily to destination address) to be combined with the routing policy database (RPDB) (keyed primarily to source address), the kernel supports a well-known and well-understood interface while simultaneously expanding and extending its routing capabilities. Each routing table still operates in the traditional and expected fashion. Linux simply allows you to choose from a number of routing tables, and to traverse routing tables in a user-definable sequence until a matching route is found.
Here is an example combining iptables and iproute2 that will
allow special routing for all packets arriving on interface eth2 (from http://ornellas.apanela.com/dokuwiki/pub:firewall_and_adv_routing):
iptables: this command "marks" all packets arriving on eth2
iptables -t mangle -A PREROUTING -i eth2 -j MARK --set-mark 1
Now we issue this command to create the rule:
ip rule add from all fwmark 0x1 lookup 33
Here is the modified iproute2 ruleset, where a special table 33 has been created (together with rule 32765). See below, Example 1, for more details on how this table should be created.
# ip rule list
0: from all lookup local
32765: from all fwmark 0x1 lookup 33
32766: from all lookup main
32767: from all lookup default
Note that the ip rule selectors don't (in this approach) let us match on the arriving interface directly, hence the use of iptables, the fwmark, and packet mangling.
Note also that we haven't created table 33 yet (see below).
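As a preview of what table 33 might eventually contain (a minimal sketch; the gateway 10.1.2.254 and device eth1 are made-up values), it typically holds little more than an alternate default route:
ip route add default via 10.1.2.254 dev eth1 table 33    # hypothetical second uplink
ip route show table 33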
visit /etc/iproute2:
rt_tables, etc
1. Add an entry to rt_tables:
100 foo
2. ip rule list:
doesn't show it
3. ip rule add fwmark 1 table foo
Now: ip rule list shows
0: from all lookup local
32765: from all fwmark 0x1 lookup foo
32766: from all lookup main
32767: from all lookup default
The rule number is assigned automatically.
Cleanup: ip rule del fwmark 1
To delete by number instead, use the priority: ip rule del priority 32765.
Example 1: simple source routing (http://lartc.org/howto/lartc.rpdb.html), Hubert
Suppose one of my house mates only visits hotmail and wants to pay less. This is fine with me, but he'll end up using the low-end cable modem.
fast link: local router is 212.64.94.251
slow link: local end is 212.64.78.148; local_end <--link--> 195.96.98.253
(The 212.64.0.0/16 are the local addresses)
user JOHN has ip addr 10.0.0.10 on the local subnet 10.0.0.0/8
Step 1: Create in /etc/iproute2/rt_tables a line
200 JOHN_TABLE
This makes JOHN_TABLE a synonym for 200. /etc/iproute2/rt_tables contains a number of <num, tablename> pairs.
Step 2: have John's host use JOHN_TABLE:
# ip rule add from 10.0.0.10 lookup JOHN_TABLE
output of "ip rule list"
0: from all lookup local
32765: from 10.0.0.10 lookup JOHN_TABLE
32766: from all lookup main
32767: from all lookup default
Step 3: create JOHN_TABLE
main: default outbound route is the fast link
JOHN_TABLE: default = slow link
ip route add default via 195.96.98.253 dev ppp2 table JOHN_TABLE
This is a standard "unicast" route.
Other options:
unreachable
blackhole
prohibit
local (route back to this host; cf the "local" policy-routing table)
broadcast
throw: terminate table search; go on to next policy-routing table
nat
via (used above)
for table_id in main local default
do
ip route show table $table_id
done
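Putting the three steps together (a sketch; the addresses and interface names are the ones quoted above, and on older kernels you may also need to flush the route cache after changing rules):
echo "200 JOHN_TABLE" >> /etc/iproute2/rt_tables                     # step 1: name the table
ip rule add from 10.0.0.10 lookup JOHN_TABLE                         # step 2: policy rule for John
ip route add default via 195.96.98.253 dev ppp2 table JOHN_TABLE     # step 3: slow-link default
ip route flush cache                                                 # older kernels cache lookups
ip route show table JOHN_TABLE                                       # verify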
Note that what we REALLY want is to limit John's bandwidth, even if
we have a single outbound link that John shares with everyone. We'll
see how to do this with tc/tbf below.
Example 2
from: http://lartc.org/howto/lartc.netfilter.html (Hubert, ch 11)
iptables: allows MARKING packet headers (this is the fwmark).
Marking packets destined for port 25:
# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 -j MARK --set-mark 1
Let's say that we have multiple connections, one that is fast (and
expensive) and one that is slower. We would most certainly like
outgoing mail to go via the cheap route.
We've already marked the packets with a '1', we now instruct the routing policy database to act on this:
# echo 201 mail.out >> /etc/iproute2/rt_tables
# ip rule add fwmark 1 table mail.out
# ip rule ls
0: from all lookup local
32764: from all fwmark 1 lookup mail.out
32766: from all lookup main
32767: from all lookup default
Now we generate a route to the slow but cheap link in the mail.out table:
# ip route add default via 195.96.98.253 dev ppp0 table mail.out
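To confirm the setup took effect (ordinary iproute2 commands, nothing new assumed):
ip rule ls                        # the fwmark rule should appear above main
ip route show table mail.out      # should show the ppp0 default route
ip route flush cache              # on older kernels, flush after rule/table edits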
Example 3: special subnet:
Subnet S of site A has its own link to the outside world.
This is easy when this link attachment point (call it RS) is on S: other machines on S just need to be told that RS is their router.
However, what if the topology is like this:
S----R1---rest of A ----GR----link1---INTERNET
                          \__link2---INTERNET
How does GR (for Gateway Router) route via link1 for most traffic, but link2 for traffic originating in S?
Use matching on source subnet to route via second link!
Since RPDB rules can match on the source address, GR can do this with a plain "ip rule add from <subnet of S>"; no fwmark is needed here.
LARTC section 4.2 on two outbound links.
PRwL chapter 5.2 is probably better here.
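A sketch of what GR's configuration might look like; the subnet for S (10.1.1.0/24), the table name, and the link2 next hop (192.0.2.1 on eth2) are all made-up values for illustration:
echo "210 link2_table" >> /etc/iproute2/rt_tables
ip rule add from 10.1.1.0/24 lookup link2_table              # traffic originating in S
ip route add default via 192.0.2.1 dev eth2 table link2_table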
Examples from Policy Routing with Linux
Example 5.2.2: Basic Router Filters
                        coreRouter
                            |
Actg_Router-----------------+-----------------EngrRouter----192.168.2.0/24
172.17.0.0/16               |
                            |
                 corp backbone 10.0.0.0/8
Now we configure things so that most 10/8 traffic can't enter the side networks.
Accounting: administratively denied, except for 10.2.3.32/27 and 10.3.2.0/27.
Engineering test network: accessible from 10.10.0.0/14.
From the accounting network - 172.17.0.0/16
Rules for inbound traffic:
    10.2.3.32/27 - full route
    10.3.2.0/27  - full route
    10/8         - prohibit (blocks everyone else; note longest-match)
    172.16/16    - prohibit (explained in more detail in 5.2.1)
From the Engineering test network - 192.168.2.0/24
    10.10/14     - full route (special subnet granted access)
    10/8         - blackhole (zero access to the corporate backbone)
    172.17/16    - blackhole (zero access to accounting)
    172.16/16    - blackhole
Possible configuration for EngrRouter:
ip addr add 10.254.254.253/32 dev eth0 brd 10.255.255.255
ip route add 10.10/14 scope link proto kernel dev eth0 src 10.254.254.253
ip route add blackhole 10/8
ip route add blackhole 172.17/16
ip route add blackhole 172.16/16
Possible configuration for Actg_Router
ip route add 10.2.3.32/27 scope link proto kernel dev eth0 src 10.254.254.252
ip route add 10.3.2.0/27 scope link proto kernel dev eth0 src 10.254.254.252
ip route add prohibit 10/8
ip route add prohibit 172.16/16
ip route add prohibit 192.168.2/24
See also http://linux-ip.net/html/routing-tables.html.
Examples from Jason Boxman, A Traffic Control Journey: Real World Scenarios
8.1.1: setting TOS flags for OpenSSH connections. Normal ssh use is
interactive, and the TOS settings are thus for Minimize-Delay. However,
ssh tunnels are not meant for interactive use, and we probably want to
reset the TOS flags to Maximize-Throughput. If we have both tunnels and
interactive connections, then without this option our router will
probably pretty much bring ssh interactive traffic to a halt while the
bulk traffic proceeds.
This is a good example of the --limit and --limit-burst options; note the -m limit module-loading option that must precede them. See also hashlimit in man iptables.
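A sketch in the spirit of that example (not Boxman's exact rules; the 10/second rate is an arbitrary choice): first mark all outbound ssh as bulk, then re-mark packets that stay under the rate limit as interactive. Since the TOS target does not stop rule traversal, the last matching rule wins, so low-rate (interactive) ssh keeps Minimize-Delay while bulk ssh ends up Maximize-Throughput.
iptables -t mangle -A POSTROUTING -p tcp --sport 22 -j TOS --set-tos Maximize-Throughput
iptables -t mangle -A POSTROUTING -p tcp --sport 22 -m limit --limit 10/second --limit-burst 20 -j TOS --set-tos Minimize-Delay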
Other examples in this section involve tc, and we'll look at them soon.
Demo of linux1 and linux2
Step 1: start tcp_writer on the host system, tcp_reader on linux2
Step 2: block the traffic with
iptables --table filter --append FORWARD --destination linux2 --protocol tcp --source-port 5432 --jump DROP
And then unblock with
iptables --table filter --delete FORWARD --destination linux2 --protocol tcp --source-port 5432 --jump DROP
or
iptables --table filter --delete FORWARD 1
Note that putting this rule in the INPUT or OUTPUT chains doesn't help. The POSTROUTING chain might seem natural, but the filter table has no POSTROUTING chain.
"ip route" v "ip rule"
To tweak a table: "ip route ..."
To administer the routing-policy database (RPDB): "ip rule ..."
The Linux iproute2 and tc packages allow us to manage the traffic rather than the devices. Why do we want to do this?
RPDB rules look at <srcaddr, dstaddr, in_interface, tos, fw_mark>. These rules can't look at anything else! BUT: the fw_mark field (a "virtual" field, not really in the packet) can be set by things outside the RPDB (like iptables).
Marking packets destined for port 25:
table: mangle; chain: PREROUTING
# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 -j MARK --set-mark 1
# echo 201 mail.out >> /etc/iproute2/rt_tables
# ip rule add fwmark 1 table mail.out      # routes on the mark set above!
From the ip man page:
1. Priority: 0, Selector: match anything, Action: lookup routing table local (ID 255).
   The local table is a special routing table containing high-priority control routes for local and broadcast addresses.
   Rule 0 is special. It cannot be deleted or overridden.
2. Priority: 32766, Selector: match anything, Action: lookup routing table main (ID 254).
   The main table is the normal routing table containing all non-policy routes.
   This rule may be deleted and/or overridden with other ones by the administrator.
3. Priority: 32767, Selector: match anything, Action: lookup routing table default (ID 253).
   The default table is empty. It is reserved for some post-processing if no previous default rules selected the packet. This rule may also be deleted.
Warning: table main is updated by routing protocols (RIP, EIGRP, etc., via the routing daemon).
Other tables are not: if a routing change occurs, other tables (and the traffic that uses them) may be out of luck.
Traffic shaping and traffic control
Generally there's not much point in doing shaping from the bottleneck
link into a faster link. The bottleneck link has done all the shaping
already!
fair queuing
    Restricts bandwidth when there is competition, but allows full use when the network is idle. Caps a sender's share only when the link is busy!
token bucket
    Restricts bandwidth to a fixed rate, period (but also allows bursts up to the bucket size, which can be made small)
tc command:
shaping: output rate limiting (delay or drop)
scheduling: prioritizing. Classic application: sending voip ahead of bulk traffic
policing: input rate regulation
dropping: what gets done to nonconforming traffic
Two scenarios to restrict user/host Joe:
1. Reduce the absolute bandwidth (in/out?) available to Joe. If the link is otherwise idle, Joe is still capped.
2. Guarantee non-Joe users a min share; ie cap Joe's bandwidth only when the link is busy.
qdisc: queuing discipline
You can think of this as the TYPE of queue. Examples: fifo, fifo+taildrop, fifo+randomdrop, fair_queuing, RED, tbf
queuing disciplines are applied to INTERFACES, using the tc command.
Queuing disciplines can be "classless" or "classful" (hierarchical)
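For instance (a minimal sketch), attaching a different qdisc to an interface and removing it again:
tc qdisc add dev eth0 root sfq      # replace the default root qdisc with sfq
tc qdisc show dev eth0              # see what is currently attached
tc qdisc del dev eth0 root          # revert to the default (pfifo_fast)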
Queuing Disciplines (qdisc): does scheduling. Some also support shaping/policing
how packets are enqueued, dequeued
- fifo + taildrop
- fifo + random drop
- RED: introduces random drops not for policing, but to encourage good behavior by senders. Used in core networks, not leaf networks
- stochastic fair queuing (each TCP connection is a flow): gives each flow a guaranteed fraction of bandwidth, when needed. Other flavors: flows are subnets, etc. However, if we're doing scheduling of inbound traffic, it doesn't do much good to do sfq based on destination (unless we can do it at the upstream router at our ISP). Some reasons for an application to open multiple TCP connections:
    - cheating SFQ limits
    - much improved high-bandwidth performance
    - the data naturally divides into multiple connections
- pfifo_fast (or, generically, pfifo): priority fifo. tc's pfifo_fast has three priority bands built-in.
enqueuing: figure out which band the packet goes into
dequeuing: take from band 0 if nonempty, else 1 if nonempty, else 2
Basic "classless" qdiscs
pfifo_fast
(see man pfifo_fast): three-band FIFO queue
Consider the following iptables command, to set the TOS bits on outbound ssh traffic to "Minimize-Delay":
# iptables -A OUTPUT -t mangle -p tcp --dport 22 -j TOS --set-tos Minimize-Delay
This works with pfifo_fast, which provides three bands. Band selection by
default is done using TOS bits of packet header (which you probably
have to mangle to set). See Hubert, §9.2.1.1, for a table of the TOS-to-band map.
Dequeuing algorithm (typically invoked whenever the hardware is ready for a packet, or whenever the qdisc reports to the hardware that it has a packet):
    if there are any packets in band 0, dequeue the first one and send it
    else if there are any packets in band 1, dequeue the first one and send it
    else if there are any packets in band 2, dequeue the first one and send it
    else report no packets available
Note that in a very direct sense pfifo_fast does support three
"classes" of traffic. However, it is not considered to be classful,
since
- we cannot control how traffic is classified
- we cannot attach subqueues to the individual classes
Example: queue flooding on upload
In practice, it is very important to set interactive traffic to have a
higher priority than normal traffic (eg web browsing, file downloads).
However, you don't have much control of the downlink traffic, and if
the uplink queue is on the ISP's hardware (eg their cablemodem), then
you won't have much control of the upload side.
me------<fast>------[broadband gizmo]-----------<slow>-----...internet
In the scenario above, suppose the broadband gizmo, BG, has a queue
capacity of 30K (30 packets?). A bulk UPLOAD (though not download) will
fill BG's queue, no matter what you do at your end with pfifo_fast.
This means that every interactive packet will now wait behind a 30KB
queue to get up into the internet. As the upload packets are sent, they
will be ACKed and then your machine will replenish BG's queue.
One approach is to reduce BG's queue. But this may not be possible.
Here's another approach:
me---<fast>---[tq_router]---<slow2>---[broadband gizmo]---<slow>--internet
Make sure slow2 ~ slow. Then upload will fill tq_router's queue, but fast traffic can still bypass.
Logically, can have "me" == "tq_router"
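A sketch of what tq_router might run on its interface toward the broadband gizmo (the 900kbit figure is a made-up value just below an assumed ~1000kbit uplink): shape uploads to slightly under the modem speed so the queue builds here rather than in BG.
tc qdisc add dev eth1 root tbf rate 900kbit burst 10kb latency 50ms    # eth1 = link toward BG (assumed name)
A plain tbf still queues interactive packets behind bulk ones, just in a much shorter queue; the prio-plus-tbf combination discussed below does better.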
_______________________
Token bucket filter (tbf)
See man tbf or man tc-tbf
restrict flow to a set average rate, while allowing bursts. The tbf qdisc slows down excessive bursts to meet filter limits. This is shape-only, no scheduling.
tbf (or htb) is probably the preferred way of implementing bandwidth caps.
tokens are put into a bucket at a set rate. If a packet arrives:
tokens available: send immediately and decrement the bucket
no token: drop (or wait, below)
Over the long term, your transmission rate will equal the token rate.
Over the short term, you can transmit bursts of up to token size.
Parameters:
bucketsize (burst): how large the bucket can be
rate: rate at which tokens are put in
limit: number of bytes that can wait for tokens. All packets wait, in essence; if you set this to zero then the throughput is zero. While theoretically it makes sense to set this to zero, in practice that appears to trigger serious clock-granularity problems.
latency: express limit in time units
mpu: minimum packet unit; size of an "empty" packet
Granularity issue: buckets are typically updated every 10 ms. During 10 ms, on a 100Mbps link, 1 Mbit (about 125 KB, or ~80-100 large packets) can accumulate! Generally, tbf introduces no burstiness itself up to 1mbit (a 10 kbit packet takes 10 ms on a 1mbit link). Beyond that, a steady stream of packets may "bunch up" due to the every-10-ms bucket refill.
Use of tbf for bulk traffic into a modem:
# tc qdisc add dev ppp0 root tbf \
      rate 220kbit \
      latency 50ms \
      burst 1540
(rate 220kbit: a bit below the actual 250kbit bandwidth; burst 1540: one packet)
You want packets queued at your end, NOT within your modem!
Otherwise you will have no way to use pfifo_fast to have interactive traffic leapfrog the queue.
What you REALLY want is for TBF to apply only to the bulk bands of PFIFO_FAST.
Can't do this with TBF; that would be classFUL traffic control (though we can do that with PRIO, the classful analogue of pfifo_fast)
Can't be used to limit a subset of the traffic!
TBF is not WORK-CONSERVING:
the link can be idle and someone can have a packet ready to send,
and yet it still waits.
Demo: linux1, linux2, and tc
start tcp_writer in ~/networks/java/tcp_reader. This accepts
connections to port 5432, and then sends them data as fast as it can.
tbf1:
tc qdisc add dev eth1 root tbf rate $BW burst $BURST limit $LIMIT
also try:
tc qdisc change dev eth1 root tbf rate newrate burst newburst limit newlimit
This might cause a spike in kbit/sec numbers:
479
479
159
4294
564
564
...
clear with tc qdisc del dev eth1 root
tc_stats
demo: enable rate 1000 kbit/sec (1 kbit/ms), burst 20kb, limit 100kb. Then try limit = 5kb v 6kb.
At the given rate, 1kb takes 8ms. The bucket is replenished every 10ms
(hardware clock limitation), so a burst should not be consumed much
during a 10ms interval.
At 10mbit, 10ms is 12kb, and we won't get that rate unless burst is set to about that.
___________________________
Fair Queuing
Note that the FQ clock can be reset to zero whenever all queues are empty, and can in fact just stay at zero until something arrives.
Linux "sfq":
Flows are individual tcp connections, NOT hosts, subnets, etc.; sfq can't be configured to use those.
Each flow is hashed by srcaddr,destaddr,port.
Each bucket is considered a separate input "pseudoqueue"
Collisions result in unfairness, so the hash function is altered at regular intervals to minimize this.
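For example (a one-line sketch), attaching sfq with the hash reseeded every 10 seconds:
tc qdisc add dev eth0 root sfq perturb 10     # perturb 10: re-randomize the hash every 10 s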
What we really want is a way to define flow_ids, so they can be created dynamically, and connections can be assigned to a flow_id by:
- connection
- host
- subnet
- etc
sfq is schedule-only, no shaping
What you probably want is to apply this to DOWNLOADs.
tc doesn't do that if your linux box is tied directly to your broadband gizmo. Applied to downloads at a router,
joe,mary,alice---<---1--------[router R]------<---2----internet
it would mean that each of joe,mary,alice's connections would get
1/3 the bw, BUT that would be 1/3 of the internal link, link 1.
Regulating shares of link 2 would have to be done upstream.
If we know that link 2 has a bandwidth of 3 Mbps, we can use CBQ (below) to restrict each of joe, mary, alice to 1 Mbps, by controlling the outbound queue at R into link 1.
Further sharing considerations:
If we divide by 1 flow = 1 tcp connection, joe can double throughput by adding a second tcp connection.
If we divide by 1 flow = 1 host, we do a little better at achieving
per-user fairness, assuming 1 user = 1 machine. Linux sfq does not
support this.
Linux sfq creates multiple virtual queues. It's important to realize
that there is only one physical queue; if one sender dominates that
queue by keeping it full, to the point that the other connections get
less bandwidth than their FQ share, then the later division into
LOGICAL queues won't help the underdogs much.
sfq IS work-conserving
----------------------------
RED
Generally intended for internet routers
We drop packets at random (at a very low rate) when queue capacity
reaches a preset limit (eg 50% of max), to signal tcp senders to slow
down.
---------------------------
These gizmos are added to interfaces, basically. If you want to slow a
particular sender down, create a virtual interface for them or use
classful qdiscs.
classful qdiscs
CBQ, HTB, PRIO
Disneyland example: what is the purpose of having one queue feed into another queue?
However, under tc we can have these classful qdiscs form a tree, possibly of considerable depth.
LARTC 9.3: (http://lartc.org/howto/lartc.qdisc.advice.html)
- To purely slow down outgoing traffic, use the Token Bucket Filter. Works up to huge bandwidths, if you scale the bucket.
- If your link is truly full and you want to make sure that no single session can dominate your outgoing bandwidth, use Stochastical Fairness Queueing.
- If you have a big backbone and know what you are doing, consider Random Early Drop (see Advanced chapter).
- To 'shape' incoming traffic which you are not forwarding, use the Ingress Policer. Incoming shaping is called 'policing', by the way, not 'shaping'.
- If you are forwarding it, use a TBF on the interface you are forwarding the data to. Unless you want to shape traffic that may go out over several interfaces, in which case the only common factor is the incoming interface. In that case use the Ingress Policer.
- If you don't want to shape, but only want to see if your interface is so loaded that it has to queue, use the pfifo queue (not pfifo_fast). It lacks internal bands but does account the size of its backlog.
- Finally - you can also do "social shaping". You may not always be able to use technology to achieve what you want. Users experience technical constraints as hostile. A kind word may also help with getting your bandwidth to be divided right!
Basic terminology for classful qdiscs
classes form a tree
each leaf class has a qdisc attached (by default a fifo)
At each interior node, there is a CLASSIFIER algorithm (a filter) and a set of child class nodes.
Linux: at router input (ingress), we can apply POLICING to drop packets; at egress, we apply SHAPING to put packets in the right queue. The terminology derives from the fact that there is no ingress queue in Linux (or most systems).
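As an illustration of ingress policing (a sketch along the lines of the LARTC ingress-policer example; the 2mbit rate is arbitrary), everything arriving on eth0 above the policed rate is simply dropped:
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 \
   match ip src 0.0.0.0/0 police rate 2mbit burst 10k drop flowid :1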
Classful Queuing Disciplines
CBQ, an acronym for 'class-based queuing', is the best known.
It is not, however, the only classful queuing discipline. And it is rather baroque.
PRIO: divides into classes :1, :2, :3 (user-configurable this time)
dequeuing: take packet from :1 if avail; if not then go on to :2, etc
by default, packets go into the band they would go into in PFIFO_FAST, using the TOS bits. But you can attach tc filters to classify traffic differently.
Hierarchy of PRIO queues is equivalent to the "flattened" PRIO queue.
However, a hierarchy of PRIO queues with SFQ/TBF offspring is not "flattenable".
TBF: classic rate-based shaper; packets wait for their token
(policing version: you get dropped if you arrive before your token)
For a classless qdisc, we're done once we create it
(its parent might be nonroot, though).
For a classful qdisc, we add CLASSES to it.
Each class will then in turn have a qdisc added to it.
parent of a class is either a qdisc or a class of same type
class major numbers must match parent.
qdisc major numbers are new
each class needs to have something below it, although every class gets a fifo qdisc by default.
We then attach a sub-qdisc to each subclass.
LARTC example:
Hubert example in 9.5.3.2 has a prio qdisc at the root.
The subclasses 1:1, 1:2, and 1:3 are automatic, as is the filtering.
          1:                 root qdisc (prio)
        /  |  \
      /    |    \
    /      |      \
  1:1     1:2     1:3        classes, added automatically
   |       |       |
  10:     20:     30:        qdiscs
  sfq     tbf     sfq
band 0     1       2
Bulk traffic will go to 30:, interactive traffic to 20: or 10:.
Command lines for adding prio queue
# tc qdisc add dev eth0 root handle 1: prio
This automatically creates classes 1:1, 1:2, 1:3. We could say
tc qdisc add dev eth0 root handle 2: prio bands 5
to get bands 2:1, 2:2, 2:3, 2:4, and 2:5. Then zap with
tc qdisc del dev eth0 root
But suppose we stick with the three bands, and add:
# tc qdisc add dev eth0 parent 1:1 handle 10: sfq      # prob should be tbf too
# tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
# tc qdisc add dev eth0 parent 1:3 handle 30: sfq
We now get a somewhat more complex example.
Hierarchical Fair Queuing
1. Why it's not flattenable
        root
       /    \
     50      50
    /  \    /  \
  25    25 25    25
  A     B  C     D
ABC on, D idle:
Hierarchical divides 25/25/50
Flat divides 33/33/33
2. How to define using fluid flows (as we did with flat fair queuing)
3. Because the slow-clock algorithm (or any finishing-time algorithm) implies that the finishing order of two packets cannot depend on future arrivals, we can no longer use that strategy!
Example: from Bennett & Zhang, Hierarchical packet fair queueing algorithms, IEEE/ACM Transactions on Networking, Oct 1997
2.2
        root
       /    \
     80      20
    /  \      |
  75    5     |
  A1    A2    B
All packets have size 1; link speed is 1 (ie 1 packet/unit_time)
T=0: A1's queue is idle; A2's and B's are very full. A2 gets 80%, B gets 20%. Finishing-time calculations are such that A2 sends 4, then B sends 1, then A2 sends 4, then B sends 1, ....
But now let a packet arrive on A1. All of a sudden, A2 should get 5%, or 1/4 the rate of B.
But the finishing-time model can't change, because those calculations don't allow it!
Example 3: from Bennett & Zhang, 3.1
11 users. User 1 has a guarantee of 50%, the others all have 5%. WFQ
sends 10 packets for user 1, then 10 in all, one for each of the other
users. So-called Generalized Processor Sharing model: 5 of user 1 / 10
of others / 10 of user 1 / ...
difference between WFQ (our algorithm, non-hierarchical) and fluid model
There is active research on algorithms that work for packets, have bounded delay w.r.t. the fluid model, and are fast.
CBQ
The acronym stands for "Class Based Queuing", though there are several
other forms of classful queuing disciplines. CBQ is an attempt to
combine some elements of Fair Queuing and Token Bucket in a single
mechanism.
Goal: classful shaping. But the shaping (rate-limiting) doesn't work
the way you'd like, because the rate is ESTIMATED somewhat artificially
by measuring average idle time at the device interface.
Example from LARTC [Hubert] 9.5.4.4
Goal:
    webserver traffic limited to 5 mbit (class 1:3, qdisc 30:)
    smtp traffic limited to 3 mbit (class 1:4, qdisc 40:)
    combination limited to 6 mbit

          1:             root qdisc
          |
         1:1             child class
        /   \
      1:3   1:4          leaf classes
       |     |
      30:   40:          qdiscs
     (sfq) (sfq)
Create root:
# tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit avpkt 1000 cell 8
create CLASS at root node to limit to 6 mbit
"bounded" (at end) means this class can't borrow from other idle classes.
This caps the rate at 6Mbit
# tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \
rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1000 bounded
Now create the two leaf classes, with classids 1:3 and 1:4
These are not bounded (which means they can borrow), and also not
isolated, which means they can lend. Classid n:m is our choice, but
must have n=1 to match parent definition above.
# tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \
rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 avpkt 1000
# tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \
rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 avpkt 1000
Both leaf classes have fifo qdisc by default. We could leave it that way, but here's how to replace it with sfq
# tc qdisc add dev eth0 parent 1:3 handle 30: sfq
# tc qdisc add dev eth0 parent 1:4 handle 40: sfq
Now we attach filtering rules, to the root node.
sport 80 (srcport 80) means web traffic.
# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip sport 80 0xffff flowid 1:3
# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip sport 25 0xffff flowid 1:4
Note that we use 'tc class add' to CREATE classes within a qdisc, but
that we use 'tc qdisc add' to actually add qdiscs to these classes.
Traffic that is not classified by either of the two rules will then be processed within 1:0, and be unlimited.
If SMTP+web together try to exceed the set limit of 6mbit/s, bandwidth
will be divided according to the weight parameter, giving 5/8 of
traffic to the webserver and 3/8 to the mail server.
Hubert 9.5.4.4: CBQ

          1:             root qdisc, cbq
          |
         1:1             child class, cbq, added by us, with params to limit traffic
        /   \
      1:3   1:4          leaf classes, each cbq (of necessity)
       |     |
      30:   40:          qdiscs
     (sfq) (sfq)
Note we must add the immediate child manually!
We also add filters to direct the traffic to 1:3 and 1:4 as appropriate:
# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip sport 80 0xffff flowid 1:3
# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip sport 25 0xffff flowid 1:4
If traffic is not picked up by a filter, it goes to 1:0.
filter flow-id matches a class-id
How do we have one class represent a subnet, and another class represent "everything else"? Use a default class.
Note that the default class is specific to htb; cbq doesn't have this.
Boxman, 8.2.1
tc qdisc add dev eth0 root handle 1: htb default 90
Hierarchical token bucket (htb)
This is a classful version of tbf. Note that because we may now have sibling classes, we have an important sharing feature: each sibling class is allocated the minimum of the bandwidth it requests and its assigned share; that is, each class is guaranteed a minimum share (like fair queuing). However, the fairness of the division when some classes have no traffic may be suspect.
Commands to create an htb qdisc with three child classes:
1. Hosts 10.0.0.1 and 10.0.0.2 go to class :10
2. subnet 10.0.0.0/29 (10.0.0.0 - 10.0.0.7 except the above two!) goes to class :29
3. All other traffic goes to class :100
The qdisc is placed on the interior interface of the router, to regulate inbound traffic.
We suppose that this is to control flow over a link that has a
sustained bandwidth limit of BW, a bucket size of BURST, and a peak
bandwidth of PBW. (PBW is not used here.)
BW=56 #kbps
BURST=350 #mb
tc qdisc add dev eth0 root handle 1: htb default 100
tc class add dev eth0 parent 1: classid 1:1 htb rate ${BW}kbps burst ${BURST}mb
# class 10 is limited by parent only
tc class add dev eth0 parent 1:1 classid 1:10 htb rate ${BW}kbps burst ${BURST}mb
# class 29 has same rate, but half the burst
HBURST=$(expr $BURST / 2)
tc class add dev eth0 parent 1:1 classid 1:29 htb rate ${BW}kbps burst ${HBURST}mb
# class 100 has 3/4 the refill rate, too
BW100=$(expr 3 \* $BW / 4)
tc class add dev eth0 parent 1:1 classid 1:100 htb rate ${BW100}kbps burst ${HBURST}mb
tc filter add dev eth0 parent 1:0 protocol ip u32 \
match ip dst 10.0.0.1/32 flowid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 \
match ip dst 10.0.0.2/32 flowid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 \
match ip dst 10.0.0.0/29 classid 1:29
# no rule for flowid 1:100 is needed; that traffic is caught by the default class
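To see where traffic actually lands (standard tc status commands):
tc -s qdisc show dev eth0          # per-qdisc byte/packet counters
tc -s class show dev eth0          # per-class counters (1:10, 1:29, 1:100)
tc filter show dev eth0 parent 1:0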