Linux iptables
The Linux networking stack, via the iptables, iproute2 and tc tools, includes support for
queuing disciplines, policing, traffic control, reservations, and prioritizing.
Mostly these are documented in LARTC: the Linux Advanced Routing
& Traffic Control HOWTO
leaf-node zones: here we can regulate who has what share of bandwidth
Core problem: we likely can't regulate inbound traffic directly, as it's
already been sent!
Notes on using tc to implement traffic control
Goal: introduce some notion of "state" to stateless routing
LARTC HOWTO -- Bert Hubert, et al
http://lartc.org/howto
A Practical Guide to Linux Traffic Control -- Jason Boxman
borg.uu3.net/traffic_shaping.
good diagrams
Traffic Control HOWTO, v 1.0.2 -- Martin Brown: local copy in pdf format
Policy Routing with Linux, Matthew Marsh (PRwL)
Good sites:
http://linux-ip.net/articles/Traffic-Control-HOWTO/classful-qdiscs.html
http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.adv-filter.u32.html:
good article on u32 classifier
Good stuff on "real-world scenarios":
http://www.trekweb.com/~jasonb/articles/traffic_shaping/scenarios.html
The linux packages we'll be looking at include:
- iptables: for basic firewall management, including marking packets. Dates from 1998.
- iproute2: for actually routing packets based on more than their destination address.
  Also provides the ip command for maintaining all the system network state. Dates from ~2001?
- tc: traffic control, for creating various qdiscs and bandwidth limits
Queuing Disciplines (qdisc): does scheduling. Some also support
shaping/policing.
A qdisc determines how packets are enqueued and dequeued. Some options:
- fifo + taildrop
- fifo + random drop
- RED: introduces random drops not for policing, but to encourage good
behavior by senders. Used in core networks, not
leaf networks
- stochastic fair queuing (each TCP connection is a flow). SFQ gives
each flow a guaranteed fraction of bandwidth, when needed. Other SFQ
flavors: flows are subnets, etc. However, if we're doing scheduling of
inbound traffic, it doesn't do much good to do SFQ based on destination
(unless we can do it at the upstream router at our ISP)
- pfifo_fast (or, generically, pfifo): priority fifo
Some reasons for an application to open multiple TCP connections:
- cheating SFQ limits
- much improved high-bandwidth
performance
- the nature of the data naturally divides into multiple connections
tc's pfifo_fast qdisc has three priority bands built-in: 0, 1, and 2.
enqueuing: figure out which band the packet goes into (based on any packet info; eg is it VOIP?)
dequeuing: take from band 0 if nonempty, else 1 if nonempty, else 2
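A quick sketch of attaching one of these classless qdiscs with tc (the interface name eth0 is assumed; tc is covered in detail below):
tc qdisc show dev eth0                      # the pre-installed root qdisc is typically pfifo_fast
tc qdisc add dev eth0 root sfq perturb 10   # replace it with stochastic fair queuing
tc qdisc del dev eth0 root                  # remove it, restoring the default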
ipTables
iptables review: can filter traffic, mark/edit headers, and implement NAT.
Fundamentally, iptables is a firewall
tool.
How to direct traffic (at least without destaddr rewriting) was somewhat
limited.
Iptables has 5 builtin chains,
representing specific points in packet processing. Chains are lists of rules. The basic predefined chains
are:
- PREROUTING: for packets arriving from the outside world, before any routing decision
is made. (Locally generated packets do not traverse PREROUTING; they enter at OUTPUT.)
- INPUT: for packets to be delivered locally to this host.
- FORWARD: for packets that will be routed, and which are not
for local delivery.
- OUTPUT: locally generated packets; either remote or local destination.
- POSTROUTING: outbound traffic
You can define your own chains, but they are pretty esoteric unless you're
using them as "chain subroutines", called by one of the builtin chains.
Rules contain packet-matching patterns and actions. Typically, a packet
traverses a chain until a rule matches; sometimes the corresponding action
causes a jump to another chain or a continuation along the same chain, but
the most common case is that we're then done with the packet.
Tables are probably better thought as parts of chains, rather than the other
way around; they are in a sense rule targets.
Specifically, in iptables the tables are
- FILTER, involving the chains INPUT, OUTPUT and FORWARD (below)
- NAT, involving the chains PREROUTING, OUTPUT and POSTROUTING.
- MANGLE, involving the chains PREROUTING and OUTPUT (in later kernels, all five chains, as in the diagram below)
- RAW, for various low-level packet updating
Targets: ACCEPT, DROP, REJECT, MASQUERADE, REDIRECT, RETURN
The FILTER table is where we would do packet filtering. The MANGLE table
is where we would do packet-header rewriting. The MANGLE table has targets
TOS, TTL, and MARK.
Obvious application: blocking certain categories of traffic
Not-so-obvious: differential routing, and actually tweaking traffic (with
MANGLE; can be done before and after routing)
Here is a diagram from http://ornellas.apanela.com/dokuwiki/pub:firewall_and_adv_routing
indicating the relationship of the chains to one another and to routing.
Note that the Local Machine is a sink for all packets entering, and a source
for other packets. Packets do not flow through it. The second Routing
Decision is for packets created on the local machine which are sent outwards
(or possibly back to the local machine).

      Incoming
      Traffic
         |
         |
         V
    +----------+
    |PREROUTING|
    +----------+
    |   raw    | <---------------------+
    |  mangle  |                       |
    |   nat    |                       |
    +----------+                       |
         |                             |
         |                             |
      Routing                          |
   +- Decision -+                      |
   |            |                      |
   |            |                      |
   V            V                      |
 Local        Remote                   |
 Destination  Destination              |
   |            |                      |
   |            |                      |
   V            V                      |
+--------+  +---------+                |
| INPUT  |  | FORWARD |                |
+--------+  +---------+                |
| mangle |  | mangle  |                |
| filter |  | filter  |                |
+--------+  +---------+                |
   |            |                      |
   V            |                      |
 Local          |                      |
 Machine        |                      |
                |                      |
 Local          |                      |
 Machine        |                      |
   |            |                      |
   V            |                      |
 Routing        |                      |
 Decision       |                      |
   |            |                      |
   V            |                      |
+--------+      |                      |
| OUTPUT |      |                      |
+--------+      |                      |
|  raw   |      |                      |
| mangle |      |                      |
|  nat   |      |                      |
| filter |      |                      |
+--------+      |                      |
   |            |                      |
   |      +-------------+              |
   |      | POSTROUTING |            Local
   +----> +-------------+  ------>   Traffic
          |   mangle    |
          |     nat     |
          +-------------+
                 |
                 |
                 V
             Outgoing
             Traffic
iptables: netfilter.org
iproute2: policyrouting.org
tables in iptables:
From the iptables man page:
filter This is the default
table. It contains the built-in chains
INPUT (for packets coming into the box itself),
FORWARD (for packets being routed through the box), and OUTPUT (for
locally-generated packets).
pld: generally, users add things to the forward chain. If the box is acting
as a router, that's the only one that makes sense.
nat This table is
consulted when a packet that creates a new connection is encountered.
It consists of three built-ins: PREROUTING (for altering
packets as soon as they come in), OUTPUT (for
altering
locally-generated packets before
routing), and POSTROUTING (for altering packets as they
are about to go out).
pld: The NAT table is very
specific:
it's there for implementing network address translation. Note that the
kernel must keep track of the TCP state of every connection it has
seen, and also at least something about UDP state. For UDP, the kernel
pretty much has to guess when the connection is ended. Even for TCP, if
the connection was between hosts A and B, and host A was turned off,
and host B eventually just timed out and deleted the connection (as
most servers do, though it isn't really in the TCP spec), then the NAT
router won't know this.
Part of NAT is to reuse the same port, if
it is available; port
translation is only done when another host inside NAT-world is already
using that port.
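One way to look at this connection-tracking state (a sketch; the exact file name and tools vary by kernel and distribution):
cat /proc/net/nf_conntrack      # /proc/net/ip_conntrack on older kernels
conntrack -L                    # if the conntrack-tools package is installed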
mangle This table is used for
specialized packet alteration. It has two
built-in chains: PREROUTING (for altering incoming packets before
routing) and OUTPUT (for altering locally-generated packets before routing).
pld: classic uses include tweaking the Type-Of-Service (TOS) bits. Note
that it's actually kind of hard to tell if an ssh connection is
interactive or bulk; see the example from Boxman below.
A second application is to set the fw_mark value based on fields the
iproute2 RPDB (Routing Policy DataBase) cannot otherwise see. (RPDB can
see
the fw_mark). This is often used as an alternative to "tc filter".
An extension of this is the CLASSIFY option:
iptables -t mangle -A
POSTROUTING -o eth2 -p tcp --sport 80 -j CLASSIFY --set-class 1:10
The CLASSIFY option is used with the tc
(advanced queuing) package; it allows us to place packets in a given tc queue by using iptables.
Examples
These examples are from http://netfilter.org/documentation/HOWTO//packet-filtering-HOWTO.html.
Here is an example of how to block responses to pings to this
host:
iptables -A INPUT -p icmp -j DROP
A more specific version is
iptables -A INPUT
-p icmp --icmp-type echo-request
-j DROP
To remove: iptables --delete
INPUT 1 (where 1 is the rule number), or just change -A to -D above
and leave everything else the same.
The icmp-type options can be obtained with the command iptables
-p icmp --help.
Demo on linux1. The idea is that we are zapping all icmp packets as they
arrive in the INPUT chain.
We are appending (-A) to the INPUT chain; the protocol must be icmp
(optionally narrowed to echo-request), and if the packet matches we jump
(-j) to the DROP target. Note that there is no source-address restriction
here; all inbound pings are dropped.
The above rule is in the INPUT chain because we are blocking pings to this host. If we want to block pings through this host, we add the rule to
the FORWARD chain:
iptables -A
FORWARD -p icmp --icmp-type echo-request
-j DROP
iptables -D
FORWARD -p icmp --icmp-type echo-request
-j DROP
Here is an example of how to block all traffic to port 80 on this host:
iptables -A INPUT -p tcp --dport 80 -j
DROP
The option --sport is also available, as is --tcp-flags. Also, -p (protocol)
works for icmp, udp, all
Here is how to allow inbound TCP traffic only to selected ports (80, 22 and 31337 here):
iptables -A INPUT -p tcp --dport 80 -j
ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 31337 -j ACCEPT
iptables -A INPUT -p tcp -j DROP
You can also set the following, so the last line above doesn't have to be there (and, in particular, doesn't have to be last). This makes it easier to add new or temporary inbound exceptions.
iptables --policy INPUT DROP
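For instance, a minimal sketch (the port numbers are just for illustration) in which the exceptions can be added in any order, and later ones appended at any time:
iptables --policy INPUT DROP
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT     # a temporary exception added later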
On my home router, I have the command blockhost,
that does the following:
iptables --table filter --append
FORWARD --source $HOST --protocol tcp --destination-port 80 --jump DROP
iptables --table filter --append
FORWARD --destination $HOST --protocol tcp --source-port 80 --jump DROP
Note that the router lies between the host and the outside world; that is, I
must put the rules in the FORWARD chain (of the filter table). Also, I block
outbound traffic to port 80 and inbound traffic from port 80. (I also have a
command to block all traffic, when necessary.)
Here is a set of commands to block all inbound tcp connections
on interface ppp0.
First, we create a new chain named block, so that we can put rules on one place to apply to both INPUT and FORWARD chains.
## Create chain which blocks new connections, except if coming from inside.
# iptables -N block
# create the block chain
# iptables -A block -m state --state ESTABLISHED,RELATED -j ACCEPT
# iptables -A block -m state --state NEW -i ! ppp0 -j ACCEPT
# iptables -A block -j DROP
## Jump to that chain from INPUT and FORWARD chains.
# iptables -A INPUT -j block
# iptables -A FORWARD -j block
The -j option means to jump to the block
chain, but then return if nothing matches. However, as the last rule always
matches, this doesn't actually happen.
The interface is specified with -i; the second block
entry states that the interface is anything not
ppp0.
The -m means to load the specific "matching" module; -m
state --state NEW means that we are loading the tcp state
matcher, and that we want to match packets starting a NEW
connection.
What if we want to block traffic of a particular user? We have the iptables
owner module. It applies only to
locally generated packets. If we're throttling on the same machine that the
user is using, we can use this module directly.
If the blocking (or throttling) needs to be done on a router different from
the user machine, then we need a two-step approach. First, we can use this
module to mangle the packets in some way (eg set some otherwise-unused
header bits, or forward the packet down a tunnel). Then, at the router, we
restore the packets and send them into the appropriate queue.
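A minimal sketch of the first step, using the owner module on the user's own machine (the uid 1004 and the mark value 2 are made up):
iptables -t mangle -A OUTPUT -m owner --uid-owner 1004 -j MARK --set-mark 2
A tc filter or ip rule can then act on fwmark 2; see the fwmark examples below.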
Iptables can also base decisions on the TCP connection state using the state module and the --state
state option, where state is a comma-separated list of the connection states to match. Possible states are:
- INVALID: the packet could not be identified for some reason, which includes running
  out of memory and ICMP errors which don't correspond to any known connection
- ESTABLISHED: the packet is associated with a connection which has seen packets in both directions
- NEW: the packet has started a new connection, or is otherwise associated with a
  connection which has not seen packets in both directions
- RELATED: the packet is starting a new connection, but is associated with
  an existing connection, such as an FTP data transfer, or an ICMP error
[from man iptables]
Building firewalls with iptables/iproute2:
The ip command alone can build a simple packet-filter firewall; suppose we allow
packets only to server ports 25 and 80. Potential inconsistencies: traffic to
port 21 gets one ICMP error message, traffic to port 53 gets a different ICMP
error message, and traffic to port 70 gets blackholed.
But in practice this is not such an issue.
Here are ulam3's iptables entries for enabling NAT. Ethernet interface eth0 is the "internal" interface; eth1 connects to the outside world. In
the NAT setting, the internal tables are in charge of keeping track of
connection mapping; each outbound connection from the inside (any host/port)
is assigned a unique port on the iptables host.
iptables --table nat --append
POSTROUTING --out-interface eth1 -j
MASQUERADE
# the next entry is for the "PRIVATE NETWORK" interface
iptables --append FORWARD --in-interface eth0
-j ACCEPT
echo 1 > /proc/sys/net/ipv4/ip_forward
mangling
IpTables is partly a firewall tool, but NAT is really something quite
different from a simple firewall. IpTables has one other important
non-firewall feature: packet "mangling". Packets can be marked or
re-written in several ways. Here is one simple example:
iptables -t mangle -A PREROUTING -i eth2 -j MARK --set-mark 1
Internally, packets have an associated "firewall mark" or fwmark; the
command above sets this mark for packets arriving via interface eth2. The
fwmark is not physically part of the packet, and is never transmitted, but
it is associated with the packet for its lifetime within the system.
Demo on linux1 and linux2
Step 1: start tcp_writer on the host system, tcp_reader on linux2
Step 2: block the traffic with
iptables --table filter --append FORWARD --destination
linux2 --protocol tcp --source-port 5431 --jump DROP
And then unblock with
iptables --table filter --delete FORWARD
--destination linux2 --protocol tcp --source-port 5431 --jump DROP
or
iptables --table filter --delete FORWARD 1
Note that putting this rule in the INPUT or OUTPUT chains has no effect here,
as the traffic is being forwarded. The POSTROUTING chain is not an option
either: the filter table does not include it.
iproute2
ipRoute2 has some functional overlap with iptables, but it is fundamentally
for general routing, not firewalls (though note that firewalls are a
special-purpose form of routing).
iproute2 features support for:
- NAT
- tunnels
- load balancing; rate-limiting
- queuing discipline
- Route traffic from some hosts using a different path; eg if subnetA
has private highspeed link
- route traffic according to Acceptable Use Policy
- Throttle bandwidth for certain computers
- Throttle bandwidth TO certain computers
- Help you to fairly share your bandwidth
- Protect your network from DoS attacks
- Protect the Internet from your customers
- Multiplex several servers as one, for load balancing or enhanced
availability
- Restrict access to your computers
- Limit access of your users to other hosts
- Do routing based on user id (yes!), MAC address, source IP address,
port, type of service, time of day or content
- Firewall DROP choice between BLACKHOLE and ADMINISTRATIVELY_DENY
- give priority to VOIP traffic (perhaps also limiting bandwidth)
A typical iproute2 triad involves:
    Address:   a CIDRed IP address (eg 147.126.2.0/23)
    Next Hop:  defines how to get to the address
    Rule:      defines when the above applies
Classically, routing involves only the first two elements: given a
destination, we route to a given NextHop. (Sometimes the Type_of_Service
bits are included in the routing destination too). iproute2 allows us to
introduce rules to take into account:
- ip destination address
- ip tos/qos bits
- ip source
- arrival interface
- other ip bits (eg ECN)
- tcp/udp port
- tcp-flags
- tcp connection state (man iptables => state)
- connlimit (to limit the number of connections from/to a host)
- payload data
- packet "fwmark" marks (perhaps indicating a userid?)
The central idea of iproute2 is to have multiple
classic ⟨dest,next_hop⟩ tables, with the desired table selected by the Rule
portion.
The ip command
old approach was to have interfaces have one "primary" ip addr and then
several "coloned" subinterfaces:
eth0 192.1.2.3
eth0:1 10.3.4.5
eth0:2 200.9.10.11
All address assignments are on equal footing now.
iproute2 has multiple tables, but is slightly different from "iptables". The
tables of iproute2 are actual routing tables.
Basics of the ip command:
ip link show|list list interfaces
ip address show|list show ip addresses
ip route show|list
ip rule show|list not on ulam2, but does work on ulam3
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
The above is a list of rules
determining which table is used. The tables themselves are numbered from 0
to 255; local =255, main=254, and default=253.
Policy routing introduces additional routing tables, one per rule. Local,
main, and default
are the three standard tables.
The default rules here are to use each table for all lookups.
Classical routing uses local and main. The local
table is for high-priority routes for localhost/loopback, and also broadcast
and multicast addresses. (examine ulam3 table with ip
route list table local) Has b'cast and loopback routes
main: classical table
ulam3:
10.213.119.0/24 dev eth0 proto kernel scope link src 10.213.119.254
10.38.2.0/24 dev eth1 proto kernel scope link src 10.38.2.42
default via 10.38.2.1 dev eth1
default: normally empty. Despite the
name, this has nothing to do with the traditional concept of a default
route.
rules: linearly ordered. Rules can access
- packet destination addr (conventional)
- quality-of-service/type-of-service (also conventional, but rare
outside iproute2)
- packet source addr: "from PREFIX"
- packet incoming interface "iif NAME"
- packet tos "tos TOS"
- fwmark values (packet marking) "fwmark MARK". We can apply a
fwmark using ipTABLES based on any other packet header fields.
The match rule for each of the tables above is to "match everything". Thus,
the local table is consulted first.
For nonlocal traffic that table will not yield a route, and so the main
table is consulted. Usually the main
table has a "default" route that will always match.
Typically a match identifies a (conventional) routing table, which then uses
the destaddr to make the final selection. This is the "unicast" action; the
use of the table is because destaddr is the packet field used most
intimately. Also, within a table, CIDR longest-match is used.
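To see which route (and hence which table) a given destination would actually use, ip route get is handy (a sketch; the address is made up, and the mark option requires a reasonably recent iproute2):
ip route get 10.38.2.77
ip route get 10.38.2.77 mark 1      # take a fwmark into account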
Other actions can include
- blackhole: just disappear it
- unreachable: drop & return ICMP Network Unreachable
- prohibit: drop & return ICMP Administratively Prohibited
- nat: iproute2 version of nat
- local: deliver locally
- throw: as if table lookup failed, even if it didn't
Note that the local rule (priority 0) always matches, but for most packets the
local table contains no route, so in effect the result is "continue along the rules chain".
From http://linux-ip.net/html/routing-tables.html, emphasis added:
The multiple routing table system provides a flexible infrastructure on top of
which to implement policy routing. By allowing multiple traditional routing
tables (keyed primarily to destination address) to be combined with the
routing policy database (RPDB) (keyed primarily to source address), the kernel
supports a well-known and well-understood interface while simultaneously
expanding and extending its routing capabilities.
Each routing table still operates in the traditional and expected fashion.
Linux simply allows you to choose from a number of routing tables, and to
traverse routing tables in a user-definable sequence until a matching route is found.
Here is an example combining iptables and iproute2 that will allow
special routing for all packets arriving on interface eth2 (from http://ornellas.apanela.com/dokuwiki/pub:firewall_and_adv_routing):
iptables: this command "marks" all packets arriving on eth2
iptables -t mangle -A PREROUTING -i eth2 -j MARK --set-mark 1
Now we issue this command to create the rule:
ip rule add from all fwmark 0x1 lookup 33
Here is the modified iproute2 ruleset, where a special table 33 has been
created (together with rule 32765). See below, Example
1, for more details on how this table should be created.
# ip rule list
0: from all lookup local
32765: from all fwmark 0x1 lookup 33
32766: from all lookup main
32767: from all lookup default
Note that the rules for iproute2 don't (or at least didn't; this may have
changed) allow us to check the arriving interface directly, hence the use of
iptables, the fwmark, and packet mangling.
Note also that we haven't created table 33 yet (see below).
visit
/etc/iproute2:
rt_tables, etc
1. Add an entry to rt_tables:
100 foo
2. ip rule list:
doesn't show it
3. ip rule add fwmark 1 table foo
Now: ip rule list shows
0: from all lookup local
32765: from all fwmark 0x1 lookup foo
32766: from all lookup main
32767: from all lookup default
The rule number is assigned automatically.
Cleanup: ip rule del fwmark 1
If there's a way to delete by number, I don't know it.
Example 1: simple source routing (http://lartc.org/howto/lartc.rpdb.html),
Hubert
Suppose one of my house mates only visits hotmail and wants to pay less. This
is fine with me, but he'll end up using the low-end cable modem.
fast link: local router is 212.64.94.251
slow link: local end is 212.64.78.148; local_end <--link--> 195.96.98.253
(The 212.64.0.0/16 are the local addresses)
user JOHN has ip addr 10.0.0.10 on the local subnet 10.0.0.0/8
Step 1: Create in /etc/iproute2/rt_tables a line
200 JOHN_TABLE
This makes JOHN_TABLE a synonym for 200. /etc/iproute2/rt_tables contains a
number of <num, tablename> pairs.
Step 2: have John's host use JOHN_TABLE:
# ip rule add from 10.0.0.10 lookup JOHN_TABLE
output of "ip rule list"
0: from all lookup local
32765: from 10.0.0.10 lookup JOHN_TABLE
32766: from all lookup main
32767: from all lookup default
Step 3: create JOHN_TABLE
main: default outbound route is the fast link
JOHN_TABLE: default = slow link
ip route add default via 195.96.98.253 dev ppp2 table
JOHN_TABLE
This is a standard "unicast" route.
Other options:
    unreachable
    blackhole
    prohibit
    local      (route back to this host; cf the "local" policy-routing table)
    broadcast
    throw      (terminate table search; go on to next policy-routing table)
    nat
    via        (used above)
for table_id in main local default
do
ip route show table $table_id
done
Note that what we REALLY want is to limit John's bandwidth, even if we have
a single outbound link that John shares with everyone. We'll see how to do
this with tc/tbf below.
Example 2
from: http://lartc.org/howto/lartc.netfilter.html
(Hubert, ch 11)
iptables: allows MARKING packet headers (this is the fwmark).
Marking packets destined for port 25:
# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 -j MARK
--set-mark 1
Let's say that we have multiple connections, one that is fast (and
expensive) and one that is slower. We would most certainly like outgoing
mail to go via the cheap route.
We've already marked the packets with a '1', we now instruct the routing
policy database to act on this:
# echo 201 mail.out >> /etc/iproute2/rt_tables
# ip rule add fwmark 1 table mail.out
# ip rule ls
0: from all lookup local
32764: from all fwmark 1 lookup mail.out
32766: from all lookup main
32767: from all lookup default
Now we generate a route to the slow but cheap link in the mail.out table:
# ip route add default via 195.96.98.253 dev ppp0 table mail.out
Example
3: special subnet:
Subnet S of site A has its own link to the outside world.
This is easy when this link attachment point (call it RS) is
on S: other machines on S just need to be told that RS is their router.
However, what if the topology is like this:
S----R1---rest of A----GR----link1---INTERNET
                          \__link2___INTERNET
How does GR (for Gateway Router) route via link1 for most traffic, but
link2 for traffic originating in S?
Use matching on source subnet to
route via second link!
Again, we're probably going to have to set the fwmark.
LARTC section 4.2 on two outbound links.
PRwL chapter 5.2 is probably better here.
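A minimal sketch of the source-matching approach at GR, using a dedicated table (all specifics here are made up: S = 10.1.2.0/24, link2's next hop 192.0.2.1 reached via ppp1); in this simple form no fwmark is actually needed:
echo 210 subnetS >> /etc/iproute2/rt_tables
ip rule add from 10.1.2.0/24 lookup subnetS
ip route add default via 192.0.2.1 dev ppp1 table subnetS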
Examples from Policy
Routing with Linux
Example 5.2.2: Basic Router Filters
                        coreRouter
                            |
Actg_Router-----------------+-----------------EngrRouter----192.168.2.0/24
172.17.0.0/16               |
                            |
                   corp backbone 10.0.0.0/8
Now we configure so most 10/8 traffic can't enter the side networks.
Accounting: admin denial except for 10.2.3.32/27 and 10.3.2.0/27.
Engineering test network accessible from 10.10.0.0/14
Rules for inbound traffic:
From accounting network - 172.17.0.0/16:
    10.2.3.32/27  - full route
    10.3.2.0/27   - full route
    10/8          - prohibit (block everyone else; note longest-match)
    172.16/16     - prohibit (explained in more detail in 5.2.1)
From Engineering test network - 192.168.2.0/24:
    10.10/14      - full route (special subnet granted access)
    10/8          - blackhole (zero access to corporate backbone)
    172.17/16     - blackhole (zero access to accounting)
    172.16/16     - blackhole
Possible configuration for EngrRouter:
ip addr add 10.254.254.253/32 dev eth0 brd 10.255.255.255
ip route add 10.10/14 scope link proto kernel dev eth0 src 10.254.254.253
ip route add blackhole 10/8
ip route add blackhole 172.17/16
ip route add blackhole 172.16/16
Possible configuration for Actg_Router:
ip route add 10.2.3.32/27 scope link proto kernel dev eth0 src 10.254.254.252
ip route add 10.3.2.0/27 scope link proto kernel dev eth0 src 10.254.254.252
ip route add prohibit 10/8
ip route add prohibit 172.16/16
ip route add prohibit 192.168.2/24
See also http://linux-ip.net/html/routing-tables.html.
Examples from Jason Boxman, A
Traffic Control Journey: Real World Scenarios
8.1.1: setting TOS flags for OpenSSH connections. Normal ssh use is
interactive, and the TOS settings are thus for Minimize-Delay. However, ssh
tunnels are not meant for interactive use, and we probably want to reset the
TOS flags to Maximize-Throughput. If we have both tunnels and interactive
connections, then without this option our router will probably pretty much
bring ssh interactive traffic to a halt while the bulk traffic proceeds.
This is a good example of the --limit and --limit-burst options. Note the -m limit module-loading option before.
See also hashlimit in man iptables.
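A sketch of the idea (not Boxman's exact rules; the rate and burst values are made up): leave low-rate, presumably interactive, ssh traffic with its existing TOS, and re-mark anything above the rate limit as bulk:
iptables -t mangle -A PREROUTING -p tcp --dport 22 -m limit --limit 10/second --limit-burst 20 -j RETURN
iptables -t mangle -A PREROUTING -p tcp --dport 22 -j TOS --set-tos Maximize-Throughput
Packets within the limit RETURN out of the chain before reaching the second rule; only the excess gets re-marked.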
Other examples in this section involve tc,
and we'll look at them soon.
"ip route" v "ip rule"
To tweak a table: "ip route ..."
To administer the routing-policy database (RPDB): "ip rule ..."
The linux iproute and tc packages allow us to manage the traffic
rather than the devices. Why do we want to do this??
To tweak a table: "ip route ..."
To administer the routing-policy database (RPDB): "ip rule ..."
RPDB rules look at <srcaddr, dstaddr, in_interface, tos, fw_mark>
These rules can't look at anything else! BUT: the fw_mark field (a "virtual"
field not really in the packet) can be set with other things outside of RPDB
(like iptables).
Marking packets destined for port 25:
table: mangle; chain: PREROUTING
# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 -j MARK --set-mark 1
# echo 201 mail.out >> /etc/iproute2/rt_tables
# ip rule add fwmark 1 table mail.out
;; routes on mark set above!
From ip man page:
1. Priority: 0, Selector: match anything, Action:
lookup routing table local (ID 255). The local table is a
special routing table containing high priority control routes for local and
broadcast addresses.
Rule 0 is special. It cannot be deleted or
overridden.
2. Priority: 32766, Selector: match anything, Action:
lookup routing table main (ID
254). The main table is the normal routing table
containing all non-policy routes. This rule may be deleted and/or overridden
with other ones by the administrator.
3. Priority: 32767, Selector: match anything, Action:
lookup routing table default (ID 253). The default table
is empty. It is reserved for some post-processing if no previous
default rules selected the packet. This rule may also be
deleted.
Warning: table main is updated by routing protocols (RIP, EIGRP, etc).
Other tables are not: if a routing change occurs, other tables (and traffic
that uses them) may be out of luck.
Traffic shaping and traffic control
Generally there's not much point in doing shaping from the bottleneck link
into a faster link. The bottleneck link has done all the shaping already!
fair queuing
Restricts bandwidth when there is competition, but allows full use
when network is idle. A user's bandwidth share is capped only when the link
is busy!
token bucket
Restricts bandwidth to a fixed rate, period (but also allows for burstiness
as per the bucket, which can always be small)
tc command:
shaping: output rate limiting (delay or drop)
scheduling: prioritizing. Classic application: sending voip ahead of bulk
traffic
policing: input rate regulation
dropping: what gets done to nonconforming traffic
Two scenarios to restrict user/host Joe:
1. Reduce the absolute bandwidth (in/out?) available to Joe. Even if the link
is otherwise idle, Joe is still capped.
2. Guarantee non-Joe users a minimum share; ie cap Joe's bandwidth only when
the link is busy.
qdisc: queuing discipline
You can think of this as the TYPE of queue. Examples: fifo, fifo+taildrop,
fifo+randomdrop, fair_queuing, RED, tbf
queuing disciplines are applied to INTERFACES, using the tc command.
Queuing disciplines can be "classless" or "classful" (hierarchical)
Queuing Disciplines (qdisc): does scheduling. Some also support
shaping/policing
how packets are enqueued, dequeued
- fifo + taildrop
- fifo + random drop
- RED: introduces random drops
not for policing, but to encourage good behavior by senders. Used in
core networks, not leaf
networks
- stochastic fair queuing (each
TCP connection is a flow): gives each flow a guaranteed fraction
of bandwidth, when needed. Other flavors: flows are subnets, etc.
However, if we're doing scheduling of inbound traffic, it doesn't do
much good to do sfq based on destination (unless we can do it at the
upstream router at our ISP). Some reasons for an application to open
multiple TCP connections:
- cheating SFQ limits
- much improved
high-bandwidth performance
- the data naturally divides into multiple connections
- pfifo_fast (or, generically,
pfifo): priority fifo. tc's pfifo_fast has three priority bands
built-in.
enqueuing: figure out which band
the packet goes into
dequeuing: take from band 0 if
nonempty, else 1 if nonempty, else 2
Basic "classless" qdiscs
pfifo_fast
(see man
pfifo_fast): three-band FIFO queue
Consider the following iptables command, to set the TOS bits on outbound ssh
traffic to "Minimize-Delay":
# iptables -A OUTPUT -t mangle -p tcp
--dport 22 -j TOS --set-tos Minimize-Delay
This works with pfifo_fast, which provides three bands. Band selection by
default is done using TOS bits of packet header (which you probably have to
mangle to set). See Hubert, §9.2.1.1, for a table of the TOS-to-band map.
Dequeuing algorithm (typically invoked whenever the hardware is ready for a
packet, or whenever the qdisc reports to the hardware that it has a packet):
    if there are any packets in band 0, dequeue the first one and send it
    else if there are any packets in band 1, dequeue the first one and send it
    else if there are any packets in band 2, dequeue the first one and send it
    else report no packets available
Note that in a very direct sense pfifo_fast does support three "classes" of
traffic. However, it is not considered to be classful, since
- we cannot control how traffic is classified
- we cannot attach subqueues to the individual classes
Example: queue flooding on upload
In practice, it is very important to set interactive traffic to have a
higher priority than normal traffic (eg web browsing, file downloads).
However, you don't have much control of the downlink traffic, and if the
uplink queue is on the ISP's hardware (eg their cablemodem), then you won't
have much control of the upload side.
me------<fast>------[broadband gizmo]-----------<slow>-----...internet
In the scenario above, suppose the broadband gizmo, BG, has a queue capacity
of 30K (30 packets?). A bulk UPLOAD (though not download) will fill BG's
queue, no matter what you do at your end
with pfifo_fast. This means that every interactive packet will now
wait behind a 30KB queue to get up into the internet. As the upload packets
are sent, they will be ACKed and then your machine will replenish BG's
queue.
One approach is to reduce BG's queue. But this may not be possible.
Here's another approach:
me---<fast>---[tq_router]---<slow2>---[broadband gizmo]---<slow>--internet
Make sure slow2 ~ slow. Then upload will fill tq_router's queue, but fast
traffic can still bypass.
Logically, can have "me" == "tq_router"
_______________________
Token bucket filter (tbf)
See man tbf or man
tc-tbf
restrict flow to a set average rate, while allowing bursts. The tbf
qdisc slows down excessive bursts to meet filter limits. This is shape-only,
no scheduling.
tbf (or hbf) is probably the preferred way of implementing bandwidth caps.
tokens are put into a bucket at a set rate. If a packet arrives:
tokens available: send immediately and decrement the
bucket
no token: drop (or wait, below)
Over the long term, your transmission rate will equal the token rate.
Over the short term, you can transmit bursts of up to token size.
Parameters:
bucketsize (burst):
how large the bucket can be
rate: rate at
which tokens are put in
limit: number of
bytes that can wait for tokens. All packets wait, in essence; if you set
this to zero then the throughput is zero. While theoretically it makes sense
to set this to zero, in practice that appears to trigger serious
clock-granularity problems.
latency: express
limit in time units
mpu: minimum
packet unit; size of an "empty" packet
Granularity issue: buckets are typically updated every 10 ms.
During 10 ms, on a 100Mbps link, 1 Mbit (about 125 KB, or ~80 large packets) can accumulate!
Generally, tbf introduces no burstiness itself up to 1mbit (a 10 kbit packet
takes 10 ms on a 1mbit link). Beyond that, a steady stream of packets may
"bunch up" due to the every-10-ms bucket refill.
Use of tbf for bulk traffic into a modem:
# tc qdisc add dev ppp0 root tbf rate 220kbit latency 50ms burst 1540
(rate 220kbit: a little below the actual 250kbit bandwidth; burst 1540: one packet)
You want packets queued at your end, NOT within your modem!
Otherwise you will have no way to use pfifo_fast to have interactive traffic
leapfrog the queue.
What you REALLY want is for TBF to apply only to the bulk bands of
PFIFO_FAST.
Can't do this with TBF; that would be classFUL traffic control (though we can do that with PRIO, the classful
analogue of pfifo_fast)
This can't be used to limit a subset of the traffic!
TBF is not WORK-CONSERVING: the link can be idle and someone can have a
packet ready to send, and yet it still waits.
tbf demo: linux1, linux2, and tc
Start tcp_writer in ~/networks/java/tcp_reader (or start it on linux2). This
accepts connections to port 5431, and then sends them data as fast as it
can. Data flows from the writer to the reader. Linux1 has eth0 facing the
host system and eth1 facing linux2; tbf must be applied at the interface the
data stream exits via.
The write size is 1024 bytes, but TCP re-packages the data once it starts
queuing up, so the actual write size is typically 1448 bytes plus 14+20+32 =
66 bytes header, for 1514 bytes in all. The header adds an additional 4.6%.
tbf1:
tc qdisc add dev eth1 root tbf rate $BW burst $BURST limit $LIMIT
tc qdisc add dev eth0 root tbf rate 1mbps burst 100kb limit 200kb
For mininet, try
tc qdisc add dev r1-eth1 root tbf rate 4000kbit burst 100kb limit 200kb
Note: "mbps" is megaByte/sec. For a bit rate of 1 megabit/sec, use "1mbit".
also try:
tc qdisc change dev eth1 root tbf rate newrate burst newburst limit newlimit
This might cause a spike in kbit/sec numbers:
479
479
159
4294
564
564
...
clear with tc qdisc del dev eth1 root
tc_stats
tc -s qdisc show dev eth0
tc -s class show dev eth0
tc -d qdisc show dev eth0
tc -d class show dev eth0
demo: enable rate 1000 kbit/sec (1 kbit/ms), burst 20kb, limit 100kb. Then
try limit = 5kb v 6kb.
At the given rate, 1kb takes 8ms. The bucket is replenished every 10ms
(hardware clock limitation), so a burst should not be consumed much during a
10ms interval.
At 10mbit, 10 ms corresponds to about 12.5 kbytes (10 mbit × 0.010 sec = 100 kbit),
and we won't get that rate unless burst is set to at least about that.
Fair Queuing
Note that the FQ clock can be reset to zero whenever all queues are empty,
and can in fact just stay at zero until something arrives.
Linux "sfq":
Flows are individual tcp connections
NOT hosts, or subnets, etc! Can't get this?!
Each flow is hashed by srcaddr,destaddr,port.
Each bucket is considered a separate input "pseudoqueue"
Collisions result in unfairness, so the hash function is
altered at regular intervals to minimize this.
What we really want is a way to define flow_ids, so they can be created
dynamically, and connections can be assigned to them by whatever criteria we choose.
sfq is schedule-only, no shaping
What you probably want is to apply this to DOWNLOADs.
tc doesn't do that if your linux box is tied directly to your broadband
gizmo. Applied to downloads at a router,
joe,mary,alice---<---1--------[router R]------<---2----internet
it would mean that each of joe,mary,alice's connections would get 1/3 the
bw, BUT that would be 1/3 of the internal link, link 1.
Regulating shares of link 2 would have to be done upstream.
If we know that link 2 has a bandwidth of 3 Mbps, we can use CBQ (below) to
restrict each of joe, mary and alice to 1 Mbps, by controlling the outbound
queue at R into link 1.
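As a sketch of the idea, here is roughly what that could look like using htb (covered below) rather than CBQ; the addresses and interface name are made up:
tc qdisc add dev eth0 root handle 1: htb
tc class add dev eth0 parent 1: classid 1:1 htb rate 3mbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 1mbit ceil 1mbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 1mbit ceil 1mbit
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1mbit ceil 1mbit
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dst 10.0.0.1/32 flowid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dst 10.0.0.2/32 flowid 1:20
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dst 10.0.0.3/32 flowid 1:30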
Further sharing considerations:
If we divide by 1 flow = 1 tcp connection, joe can double throughput by
adding a second tcp connection.
If we divide by 1 flow = 1 host, we do a little better at achieving per-user
fairness, assuming 1 user = 1 machine. Linux sfq does not support this.
Linux sfq creates multiple virtual queues. It's important to realize that
there is only one physical queue; if one sender dominates that queue by
keeping it full, to the point that the other connections get less bandwidth
than their FQ share, then the later division into LOGICAL queues won't help
the underdogs much.
sfq IS work-conserving
RED
Generally intended for internet routers
We drop packets at random (at a very low rate) when queue capacity reaches a
preset limit (eg 50% of max), to signal tcp senders to slow down.
classful qdiscs
CBQ, HTB, PRIO
Disneyland example: what is the purpose of having one queue feed into
another queue?
However, under tc we can have these classful qdiscs form a tree, possibly of
considerable depth.
LARTC 9.3: (http://lartc.org/howto/lartc.qdisc.advice.html)
- To purely slow down outgoing traffic, use the Token Bucket Filter.
Works up to huge bandwidths, if you scale the bucket.
- If your link is truly full and you want to make sure that no single
session can dominate your outgoing bandwidth, use Stochastical Fairness
Queueing.
- If you have a big backbone and know what you are doing, consider
Random Early Drop (see Advanced chapter).
- To 'shape' incoming traffic which you are not
forwarding, use the Ingress Policer. Incoming shaping is called
'policing', by the way, not 'shaping'.
- If you are forwarding it,
use a TBF on the interface you are forwarding the data to. Unless you
want to shape traffic that may go out over several interfaces, in which
case the only common factor is the incoming interface. In that case use
the Ingress Policer.
- If you don't want to shape, but only want to see if your interface is
so loaded that it has to queue, use the pfifo queue (not pfifo_fast). It
lacks internal bands but does account the size of its backlog.
- Finally - you can also do "social shaping". You may not always be able
to use technology to achieve what you want. Users experience technical
constraints as hostile. A kind word may also help with getting your
bandwidth to be divided right!
Basic terminology for classful qdiscs
classes form a tree
each leaf class has a qdisc attached to it (a plain fifo by default)
At each interior node, there is a CLASSIFIER algorithm (a filter) and a set
of child class nodes.
Linux: at router input(ingress), we can apply POLICING to drop packets; at
egress, we apply SHAPING to put packets in the right queue. Terminology
derives from the fact that there is no ingress queue in linux (or most
systems).
Classful Queuing Disciplines
CBQ, an acronym for 'class-based queuing', is the best known.
It is not, however, the only classful queuing discipline. And it is rather
baroque.
PRIO: divides into classes :1, :2, :3 (user-configurable this time)
dequeuing: take packet from :1 if avail; if not then go on to :2, etc
by default, packets go into the band they would go into in PFIFO_FAST, using
TOS bits. But you can use "shapers" to adjust this.
Hierarchy of PRIO queues is equivalent to the "flattened" PRIO queue.
However, a hierarchy of PRIO queues with SFQ/TBF offspring is not
"flattenable".
TBF: classic rate-based shaper; packets wait for their token
(policing version: you get dropped if you arrive before your token)
For a classless qdisc, we're done once we create it
(its parent might be nonroot, though).
For a classful qdisc, we add CLASSES to it.
Each class will then in turn have a qdisc added to it.
parent of a class is either a qdisc or a class of same type
class major numbers must match parent.
qdisc major numbers are new
each class needs to have something below it, although every class gets a
fifo qdisc by default.
We then attach a sub-qdisc to each subclass.
LARTC example:
Hubert example in 9.5.3.2 has a prio qdisc at the root.
The subclasses 1:1, 1:2, and 1:3 are automatic
for prio queuing, as is the filtering.
          1:              root qdisc
        /  |  \
       /   |   \
      /    |    \
    1:1   1:2   1:3        classes, added automatically
     |     |     |
    10:   20:   30:        qdiscs
    sfq   tbf   sfq
  band 0    1     2
Bulk traffic will go to 30:, interactive traffic to 20: or 10:.
Command lines for adding prio queue
# tc qdisc add dev eth0 root handle 1: prio
This automatically creates classes 1:1, 1:2, 1:3. We could say
tc qdisc add dev eth0 root handle 2: prio bands 5
to get bands 2:1, 2:2, 2:3, 2:4, and 2:5. Then zap with
tc qdisc del dev eth0 root
But suppose we stick with the three bands, and add:
# tc qdisc add dev eth0 parent 1:1 handle 10: sfq      // prob should be tbf too
# tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
# tc qdisc add dev eth0 parent 1:3 handle 30: sfq
We now get a somewhat more complex example.
Hierarchical token bucket (htb)
This is a classful version of tbf. Note that because we may now have sibling
classes, we have an important sharing
feature: each sibling class is allocated bandwidth the minimum of what
it requests and what its assigned share is; that is, each class is
guaranteed a minimum share (like fair queuing). However, the fairness
of the division in the absence of traffic from some nodes is limited by the
granularity of quantum round-robin.
pld home example
Commands to create an htb qdisc with three child classes:
1. Hosts 10.0.0.1 and 10.0.0.2 go to class :10
2. subnet 10.0.0.0/29 (10.0.0.0 - 10.0.0.7 except the above
two!) goes to class :29
3. All other traffic goes to class :100
The qdisc is placed on the interior interface of the router, to regulate inbound traffic.
We suppose that this is to control flow over a link that has a
sustained bandwidth limit of BW, a bucket size of BURST, and a peak
bandwidth of PBW. (PBW is not used here.)
BW=56 #kbps
BURST=350 #mb
tc qdisc add dev eth0 root handle 1: htb default 100
tc class add dev eth0 parent 1: classid 1:1 htb rate ${BW}kbps burst ${BURST}mb
# class 10 is limited by parent only
tc class add dev eth0 parent 1:1 classid 1:10 htb rate ${BW}kbps burst ${BURST}mb
# class 29 has same rate, but half the burst
HBURST=$(expr $BURST / 2)
tc class add dev eth0 parent 1:1 classid 1:29 htb rate ${BW}kbps burst ${HBURST}mb
# class 100 has 3/4 the refill rate, too
BW100=$(expr 3 \* $BW / 4)
tc class add dev eth0 parent 1:1 classid 1:100 htb rate ${BW100}kbps burst ${HBURST}mb
tc filter add dev eth0 parent 1:0 protocol ip u32 \
    match ip dst 10.0.0.1/32 flowid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 \
    match ip dst 10.0.0.2/32 flowid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 \
    match ip dst 10.0.0.0/29 classid 1:29
;; no rule for flowid 1:100; taken care of by default rule
Actually, I can't use this, because tc does not handle bucket sizes so
large. So what I do instead is have a packet-sniffer-based usage
counter on my (linux) router, which issues tc
class change commands
to reduce any class that is over-quota. I then use more moderate values
for tc itself: I create the classes (I'm up to five now), and each one
gets a rate of ~1-10Mbit and a bucket of ~10mbyte. However, the rate is
reduced drastically when a class reaches its quota.
$RATE = 1-10 mbit
$BURST = 10 mbyte
# band 1
tc filter add dev $DEV parent 1:0 protocol ip u32 match ip dst 10.0.0.0/30 flowid 1:1
tc filter add dev $DEV parent 1:0 protocol ip u32 match ip dst 10.0.0.2/32 flowid 1:1
# band 2
tc filter add dev $DEV parent 1:0 protocol ip u32 match ip dst 10.0.0.4/30 flowid 1:2
# band 3 : gram
tc filter add dev $DEV parent 1:0 protocol ip u32 match ip dst 10.0.0.10/32 flowid 1:3
tc qdisc add dev $DEV parent 1:3 tbf rate $RATE burst $BURST limit $LIMIT
# band 4 : upstairs
tc filter add dev $DEV parent 1:0 protocol ip u32 match ip dst 10.0.0.11/32 flowid 1:4
tc qdisc add dev $DEV parent 1:4 tbf rate $RATE burst $BURST limit $LIMIT
# band 5:
tc filter add dev $DEV parent 1:0 protocol ip u32 match ip dst 10.0.0.0/24 flowid 1:5
tc qdisc add dev $DEV parent 1:5 tbf rate 32kbit burst 10kb limit 10kb
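The over-quota adjustment mentioned above might look something like this (a sketch only; the actual class setup isn't shown here, and the reduced rate and burst are made up):
tc class change dev $DEV parent 1: classid 1:3 htb rate 100kbit burst 20kb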
Note
that while iproute2 has some problems fitting into the Internet model
of one router table, tc applies just to the egress interfaces and so is
completely separate from routing.
These first two examples here, using htb, use the u32 classifier
Another HTB example
Token bucket is a simple rate-limiting strategy, but the basic version
limits everyone in one pool.
Why does hierarchical matter? Strict rate-limiting would not!
(why?)
However, HTB is not "flattenable", because you share the parent bucket with
your siblings. Consider
              root
             /    \
        A:b=20    b=30
                 /    \
            B:b=20    C:b=20
If B and C are both active, their effective buckets drop to about 15 each, as
each gets half the parent's bucket of 30.
htb lets you apply flow-specific rate limits (eg to specific users/machines)
Rate-limiting can be used to limit inbound
bandwidth to set values
above example using htb: LARTC 9.5.5.1
Functionally almost identical to the CBQ sample configuration below:
# tc qdisc add dev eth0 root handle 1: htb default 30
(30 is a reference to the 1:30 class, to be added)
# tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k
Now here are the classes: note "ceil" in the second two. The ceil
parameter allows "borrowing": use of idle bandwidth. Parent of classes
here must be the class above. Only one class can have root as its
parent!
Sort of weird.
          1:
          |
         1:1
       /  |  \
   1:10  1:20  1:30
   web   smtp  other
# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k
# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k
# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k
The author then recommends SFQ for beneath these classes (to replace
whatever default leaf qdisc is there)
# tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
# tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
# tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10
Add the filters which direct traffic to the right classes: here, we divide
by web/email/other
# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
match ip dport 80 0xffff flowid 1:10
# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
match ip sport 25 0xffff flowid 1:20
htb demo
We can set up the traffic flowing from linux2:tcp_writer to
valhal:tcp_reader, again
tcp_reader linux2 5431 2001
tcp_reader linux2 5431 2002
tcp_reader linux2 5431 2003
Then we can regulate the three flows using htb:
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:1 htb rate 5000kbps ceil 5000kbps burst 100kb
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 1000kbps ceil 5000kbps burst 100kb
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 1000kbps ceil 5000kbps burst 100kb
tc class add dev eth0 parent 1:1 classid 1:13 htb rate 1000kbps ceil 5000kbps burst 100kb
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dport $PORT1 0xffff flowid 1:11
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dport $PORT2 0xffff flowid 1:12
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip dport $PORT3 0xffff flowid 1:13
Things to try:
1. Change the root rate
2. Make one flow's rate=ceil, and see how the other flows' traffic divides
When do we need htb, and when is plain tbf enough?
2.5: classifying/filtering:
fwmark:
# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 -j MARK
--set-mark 1
mangle table & CLASSIFY
iptables -t mangle -A POSTROUTING -o eth2 -p tcp --sport 80 -j
CLASSIFY --set-class 1:10
u32:
allows matching on bits of packet headers. u32 is
completely stateless (that is, it doesn't remember past connection
state; it is strictly a per-packet matcher). The underlying matches are
all numeric, but there are preset symbolic names for some fields, to
help. See u32 examples above
(repeated here:)
# tc filter add dev eth0 parent 1:0 protocol ip prio 1
u32 match ip sport 80 0xffff flowid 1:3
# tc filter add dev eth0 parent 1:0 protocol ip prio 1
u32 match ip sport 25 0xffff flowid 1:4
3. Example for restricting bandwidth used by a single host (or subnet): From
www.roback.cc/howtos/bandwidth.php
uses cbq, which does some approximate calculations to limit the flow of
subclasses.
Also uses mangling/fwmark to do classification
DNLD = download bandwidth
UPLD = upload bandwidth
DWEIGHT/UWEIGHT: weighting factors, more or less 1/10 of DNLD/UPLD
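For concreteness, values along these lines might be used before the commands below (these particular numbers and the iptables path are made up):
DNLD=2mbit
DWEIGHT=200kbit
UPLD=512kbit
UWEIGHT=51kbit
IPTABLES=/sbin/iptables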
tc qdisc add dev eth0 root handle 11: cbq bandwidth 100Mbit avpkt 1000 mpu
64
tc class add dev eth0 parent 11:0 classid 11:1 cbq rate $DNLD weight
$DWEIGHT \
allot 1514 prio 1 avpkt 1000 bounded
tc filter add dev eth0 parent 11:0 protocol ip handle 4 fw flowid 11:1
tc qdisc add dev eth1 root handle 10: cbq bandwidth 10Mbit avpkt 1000 mpu 64
tc class add dev eth1 parent 10:0 classid 10:1 cbq rate $UPLD weight
$UWEIGHT \
allot 1514 prio 1 avpkt 1000 bounded
tc filter add dev eth1 parent 10:0 protocol ip handle 3 fw flowid 10:1
Now MARK the incoming packets, from the designated subnet:
# Mark packets to route
# Upload marking
$IPTABLES -t mangle -A FORWARD -s 192.168.0.128/29 -j MARK --set-mark 3
$IPTABLES -t mangle -A FORWARD -s 192.168.0.6 -j MARK --set-mark 3
# Download marking
$IPTABLES -t mangle -A FORWARD -s ! 192.168.0.0/24 -d 192.168.0.128/29 -j
MARK --set-mark 4
$IPTABLES -t mangle -A FORWARD -s ! 192.168.0.0/24 -d 192.168.0.6 -j MARK
--set-mark 4