%include "ahu.mgp"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%pcache 0
%charset "iso8859-1"
#%filter "./resettime"
#%endfilter
#%filter "./counttime.pl 1"
#%endfilter

Traffic Shaping in Linux explained

%center
it's not that you are thick - it's really hard!
%size 5, fore "red", center
No manual available. Ask me, if you have problems
(only try to guess answer yourself at first 8))
-- Alexey N. Kuznetsov in README.iproute2+tc
%size 4, right, fore "yellow"
bert hubert
PowerDNS BV
ahu@ds9a.nl
http://ds9a.nl/
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
#%filter "./counttime.pl 2"
#%endfilter

Goals

%size 9
This presentation seeks to explain:
%prefix 20, size 12
The concepts involved
How they relate
What you can achieve
%prefix 0
%size 8
In other words, `the big picture', not documentation!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
#%filter "./counttime.pl 3"
#%endfilter

Stuff I will talk about

%prefix 20
Concepts
Queueing Disciplines
	Token Bucket Filter
	CBQ
	Stochastic Fairness Queueing
Classes
	CBQ (hard, complex)
	HTB (easy, accurate)
Configuration basics
Filters
Real life examples
%prefix 0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
#%filter "./counttime.pl 4"
#%endfilter

Concepts

	Queueing Disciplines
		Determine how packets get sent
	Classes
		Some qdiscs contain other qdiscs
	Filters
		Select a qdisc/class for a packet
%%%%%%%%%%%%
%page
#%filter "./counttime.pl 5"
#%endfilter

Queues are our friend and enemy

	Queues sit between userspace and the interface and determine how data \
gets
%cont, fore "red"
SENT
%cont, fore "white"
.
	Queues:
		* Buffer output in excess of your bandwidth
			This prevents packet loss in case of bursty traffic
		* Create latency, which hurts interactivity
			Your keystrokes must traverse a long queue. TCP/IP tries to fill any queue \
you offer it!
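%page

How long is that queue?

	A back-of-the-envelope sketch in Python, not part of the original talk. \
It shows how much latency a filled queue adds in front of a keystroke. \
The 128kbit/s uplink and the 1500-byte packets are assumed numbers, and \
queue_delay_ms is a made-up helper.
%font "courier"
```python
def queue_delay_ms(queued_bytes, link_bits_per_s):
    """Milliseconds a new packet waits behind queued_bytes of older data."""
    return queued_bytes * 8 * 1000 / link_bits_per_s


LINK = 128_000  # assumed uplink speed in bit/s
for packets in (1, 10, 50):
    # Assume every queued packet is a full-size 1500-byte frame.
    delay = queue_delay_ms(packets * 1500, LINK)
    print(f"{packets:3d} packets queued -> {delay:7.1f} ms of extra latency")
```
%font "standard"
	The delay grows linearly with the queue, which is why a bulk upload \
that fills the queue makes ssh feel dead.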
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
#%filter "./counttime.pl 6"
#%endfilter

Queues: between user & hardware

	Applications send data to the kernel. Kernel enqueues data to the queueing \
discipline and immediately tries to run the queue ('kick') to the hardware.
%pause
	queue_run() dequeue()s as many packets as the network adapter will accept, \
or until the queue is empty or the qdisc no longer wants to send - that last \
case is what we call 'shaping'
%%%%%%%%%%%%%%%
%page
#%filter "./counttime.pl 7"
#%endfilter

The Queue you are already using

	Default queue is "pfifo_fast", which has three bands. dequeue() first returns \
data from the 'front band', which holds packets whose TOS asks for 'minimum delay'.

	A typical pfifo_fast queue might look like this:
%font "courier"
[Port22 Port6667 ICMP] ->
[Port80 Port80]        -> eth0
[Port25 Port25]
%font "standard"
%page
#%filter "./counttime.pl 8"
#%endfilter

A sample CBQ configuration

%size 5,center
40 parameters for giving one host 5mbit, the other 1mbit!
%size 3,left
	# tc qdisc add dev eth1 root handle 1:0 cbq bandwidth 100Mbit avpkt 1000 cell 8
%size 3
	# tc class add dev eth1 parent 1:0 classid 1:1 cbq bandwidth 100Mbit rate \
100Mbit weight 10Mbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1000
%size 3
	# tc class add dev eth1 parent 1:1 classid 1:1280 cbq bandwidth 100Mbit \
rate 5Mbit weight 1Mbit prio 5 allot 1514 cell 8 maxburst 20 avpkt 1000 bounded
	# tc qdisc add dev eth1 parent 1:1280 tbf rate 5Mbit buffer 10Kb/8 limit \
15Kb mtu 1500
	# tc filter add dev eth1 parent 1:0 protocol ip prio 100 u32 match ip \
src 1.2.3.4/32 flowid 1:1280
	# tc class add dev eth1 parent 1:1 classid 1:1281 cbq bandwidth 100Mbit \
rate 1Mbit weight 200Kbit prio 5 allot 1514 cell 8 maxburst 20 avpkt 1000 bounded
	# tc qdisc add dev eth1 parent 1:1281 tbf rate 1Mbit buffer 10Kb/8 limit \
15Kb mtu 1500
	# tc filter add dev eth1 parent 1:0 protocol ip prio 100 u32 match ip \
src 1.2.3.5/32 flowid 1:1281
%%%%%%%%%%%%%
%page
#%filter "./counttime.pl 9"
#%endfilter

Why all this complexity?

	Conceptually, this can be expressed as:

	# tc2 -s 1.2.3.4 -o eth1 5mbit/s
	# tc2 -s 1.2.3.5 -o eth1 1mbit/s

	Sadly it turns out that more parameters are needed, but let's see \
if we can do a better job.
%%%%%%%%%%%%%%
%page
#%filter "./counttime.pl 10"
#%endfilter

Very Simpleminded Shaper, 1Mbit/s

	Kernel timer, every 10ms -> 10kbit per 'tick'
	Each packet 1400*8 = ~10000 bits = 10kbit

	httpd sends 5 packets which are enqueued and dequeued immediately to the \
shaper interface.
%font "courier"
-> 4 3 2 1 0 -> shp0 -> eth0
%pause
%font "standard"
	Each tick, shaper is 'kicked' and determines how many packets it can send.
%font "courier"
-> 4 3 2 1 -> 0 -> eth0
%pause
-> 4 3 2 -> 1 -> eth0
%pause
-> 4 -> 3 2 -> eth0
%pause
%font "standard"
	Works pretty well!
%%%%%%%%%%%
%page
#%filter "./counttime.pl 11"
#%endfilter

At 10mbit/s

	At higher rates, this starts to suck.
	100 times per second we suddenly \
queue 10 or more packets to eth0 - which leads to choppy performance.

	With smaller packets, even 1mbit/s leads to major problems: hundreds of \
packets are then sent instantaneously to the interface.

	Alan Cox wrote a shaper like this, and advises it only for <100kbit/s purposes.
%%%%%%%%%%%
%page
#%filter "./counttime.pl 12"
#%endfilter

The Token Bucket Filter Qdisc

	Rapid operation, high bandwidth shaping, network-friendly operations
%font "courier"
%mark, pause
|X|
|X|
|X|
|_|
3 2 1 0 |
%again, mark, pause
|X|           | |
|X|           | |
|X|           | |
|_|           |_|
3 2 1 0 |     4 3 | 2 1 0
%again, mark, pause
|X|           | |
|X|           | |  ZzzZz
|X|           | |
|_|           |_|
3 2 1 0 |     4 3 | 2 1 0
%font "standard"
	Until the bucket is full, TBF gets new tokens at a constant rate. If a \
dequeue() comes in while it is out of tokens, it throttles for just the right \
number of *ticks*.
%font "courier"
%again, pause
|X|           | |             | |
|X|           | |             | |
|X|           | |             | |
|_|           |_|             |_|
3 2 1 0 |     4 3 | 2 1 0     4 | 3 2
.
%%%%%%%%
%page
#%filter "./counttime.pl 13"
#%endfilter

TBF details

	Timer ticks again?
		This is not a problem because as long as we are not exceeding our \
bandwidth, we don't need to wait! Waits keep us slow enough.
%pause
	Overflowing the interface again?
		A large bucket might still empty too fast - real TBF has a \
tiny additional bucket ('peakrate') to limit the speed of emptying.
%page
#%filter "./counttime.pl 14"
#%endfilter

TBF configuration

%font "courier"
	# tc qdisc add dev ppp0 root tbf \
rate 220kbit limit 3000 \
burst 1500
%pause,font "standard"
	Parameters:
	dev ppp0 root tbf
		add directly to ppp0, a token bucket filter
%pause
	rate 220kbit
		220kbit worth of tokens/second added
%pause
	limit 3000 burst 1500
		3000 bytes buffer, 1500 bytes in bucket max
%%%%%%%%%%%%
%page
#%filter "./counttime.pl 15"
#%endfilter

Stochastic Fairness Queueing

	SFQ is a pure queue - it never delays, only reorders.
%pause
	* Each 'session' may send a packet in turn
%pause
	* Hashes sessions into a limited number of buckets to conserve memory
%pause
	* Perturbs the hash now and then to restore fairness
%pause
	* Often needs an additional shaper to be useful
%page
#%filter "./counttime.pl 16"
#%endfilter

The CBQ Queueing Discipline

	Unshaped 100mbit/s link, 40mbit/s traffic
	Average packet=10000bits, 100 usec/packet:
%font "courier"
[activity]
|
|_______^^^^_______^^^^_______^^^^
|
+-------------------------->[time]
        100us
%font "standard"
	Link is idle 60% of time -> 150usec between packets
%page
#%filter "./counttime.pl 17"
#%endfilter

CBQ for shaping

	* For each packet, calculates whether it came earlier or later \
than the predicted idle time
%pause
	* With <40mbit/s load and average-sized packets, the average \
time difference will be >0. Too much traffic: <0.
%pause
	* The moving average of idle time has an upper cap. If it gets too negative, \
the queue is shut down for a number of ticks!
%page
#%filter "./counttime.pl 18"
#%endfilter

CBQ is complex & doesn't work as a shaper!

	Due to lack of documentation and complexity, people assumed \
that CBQ is the tool to use.
%pause
	* Needs to know 'link speed' - (pppoe, pptp?)
%pause
	* Gets confused by different packet sizes
%pause
	* Has to approximate idle time (no way to measure in Linux)
%pause
	* Has way too many knobs - HTB is better & simpler!
%page
#%filter "./counttime.pl 19"
#%endfilter

Classes for multiple queueing disciplines

%pause
	* Queueing disciplines delay, reorder or drop data
%pause
	* A classful queue can also divide bandwidth
%pause
	* Classful queues *contain* other queues - it is not a tree!
%pause
	* Three classful queues: CBQ, HTB, PRIO
%page
#%filter "./counttime.pl 20"
#%endfilter

Flow of packets in a classful queue

%font "courier"
%mark
           classful qdisc
           +--------
           |- qdisc1
           |
write()->  |- qdisc2
           |
           |- qdisc3
           +--------
%pause,again,mark
           classful qdisc
           +--------
           |- qdisc1
           |
write()->  |- qdisc2
           |
           |- qdisc3
           +--------
filter()
determines
where to enqueue()
%pause,again,mark
           classful qdisc
           +--------
           |- qdisc1
           |
write()->  |- qdisc2
           |
           |* qdisc3
           +--------
filter()
determines
where to enqueue()
%pause,again,mark
           classful qdisc
           +----------+
           |- qdisc1  |
           |          |
write()->  |- qdisc2  | ->dequeue()
           |          |
           |* qdisc3  |
           +----------+
filter()
determines
where to enqueue()
%pause,again,mark
           classful qdisc
           +----------+
           |- qdisc1  |
           |          |
write()->  |- qdisc2  | ->dequeue()
           |          |
           |* qdisc3  |
           +----------+
filter()                 tries dequeue()..
determines
where to enqueue()
%pause,again,mark
           classful qdisc
           +----------+
           |- qdisc1 ?|
           |          |
write()->  |- qdisc2  | ->dequeue()
           |          |
           |* qdisc3  |
           +----------+
filter()                 tries dequeue()..
determines
where to enqueue()
%pause,again,mark
           classful qdisc
           +----------+
           |- qdisc1 ?|
           |          |
write()->  |- qdisc2 ?| ->dequeue()
           |          |
           |* qdisc3  |
           +----------+
filter()                 tries dequeue()..
determines
where to enqueue()
%pause,again,mark
           classful qdisc
           +----------+
           |- qdisc1 ?|
           |          |
write()->  |- qdisc2 ?| ->dequeue()
           |          |
           |* qdisc3 !|
           +----------+
filter()                 tries dequeue()..
determines               until success!
where to enqueue()
%pause, font "standard"
	This is the 'PRIO' qdisc.
%page
#%filter "./counttime.pl 21"
#%endfilter

CBQ and HTB qdiscs

%leftfill
	As shown before, CBQ is not a good shaper. And it is complex too.
%pause
	Hierarchical Token Bucket
		Shares same shaping qualities as TBF
%pause
	Easy link sharing
		Limit certain kinds of traffic, prioritize others
%pause
	Easy hierarchical sharing
		Multiple agencies, multiple kinds of traffic
%pause
	One drawback
		Not in the main kernel yet. You must lobby!
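%page

PRIO dequeue, as a toy model

	The dequeue walk from the 'Flow of packets' slide, replayed in Python. \
This is only a sketch and not from the original talk: PrioQdisc and by_port \
are made-up names, and the real PRIO qdisc lives in the kernel.
%font "courier"
```python
from collections import deque


class PrioQdisc:
    """Toy classful qdisc: inner queues (bands) are tried in fixed order."""

    def __init__(self, bands=3):
        self.bands = [deque() for _ in range(bands)]

    def enqueue(self, packet, classify):
        """classify() plays the tc filter: it picks the inner queue."""
        self.bands[classify(packet)].append(packet)

    def dequeue(self):
        """Try each inner queue in turn until one has a packet."""
        for band in self.bands:
            if band:
                return band.popleft()
        return None  # every inner queue was empty


def by_port(packet):
    """Hypothetical filter: ssh first, web second, the rest last."""
    return {22: 0, 80: 1}.get(packet["dport"], 2)


q = PrioQdisc()
for port in (25, 80, 22, 80):
    q.enqueue({"dport": port}, by_port)

order = []
while (pkt := q.dequeue()) is not None:
    order.append(pkt["dport"])
print(order)
```
%font "standard"
	Mail arrived first but leaves last: as long as a higher band has data, \
the lower ones are never asked.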
%page
#%filter "./counttime.pl 22"
#%endfilter

Configuration basics

	We start at the root of the device, ppp0:
%font "courier"
	# tc qdisc add dev ppp0 root \
handle 1: htb
%font "standard"
	Installs HTB as the root qdisc, names it 1:0.
%pause
%font "courier"
	# tc class add dev ppp0 parent 1: \
classid 1:1 htb rate 100kbps \
burst 2k
%font "standard"
	This attaches a shaping HTB class to the HTB root, 100kbps with a 2k bucket.
%pause
%font "courier"
%page
#%filter "./counttime.pl 23"
#%endfilter

Now add the classes

%font "courier"
	# tc class add dev ppp0 \
parent 1:1 classid 1:10 htb \
rate 10kbps ceil 50kbps burst 2k
	# tc class add dev ppp0 \
parent 1:1 classid 1:11 htb \
rate 90kbps burst 2k
%font "standard"
	The first class is guaranteed 10kbps of the 100kbps, but can grow to 50kbps, \
if available. The second class, however, gets 90kbps and, with no ceil given, \
cannot grow beyond that.
%page
#%filter "./counttime.pl 24"
#%endfilter

Filtering to classify traffic

	When a packet enters the qdisc, it needs to be classified. \
This is done with 'tc filters', which have their own non-iptables syntax:
%font "courier"
	# U32="tc filter add dev ppp0 \
protocol ip parent 1:0 prio 1 \
u32"
	# $U32 match ip dport 25 0xffff \
flowid 1:10
	# $U32 match ip sport 80 0xffff \
flowid 1:11
%font "standard"
	The u32 match is *very* generic and can match everything. \
Baroque syntax, however.
%page
#%filter "./counttime.pl 25"
#%endfilter

Advanced Queues within Queues

	ADSL connection, hosting a webserver, mailserver, ssh traffic
	* Always need some room for ssh
%pause
	* Email attachments should not hurt httpd
%pause
	* scp must not hurt interactivity
%pause
	* Prevent big queues in consumer modem
%page
#%filter "./counttime.pl 26"
#%endfilter

HTB setup with bells & whistles

	128kbit upstream.
%font "courier"
  +---------------------+
  | +-----------------+ |
  | | [ SSH, prio 1,] | |
  | | [ SFQ, 128kbit] | |
  | |                 | |
  | | [ WEB, 128kbit] | |
 -|-+ [ burst 20k,  ] +-|-
  | | [ prio 2,PFIFO] | |
  | |                 | |
  | | [ SMTP, prio3 ] | |
  | | [ SFQ, 128kbit] | |
  | +-----------------+ |
  +---------------------+
%font "standard"
	Combines PRIO, TBF and CBQ!
%page
#%filter "./counttime.pl 27"
#%endfilter

UN Mission: Satellite links & VoIP

	5 locations, geostationary satellite links
		600ms latency!
	Full mesh network, 4 transmitters
	Linux IP^3 machines for TCP/IP trickery, 'IPMAX'
	VoIP, HTTP & SMTP all over a single 128kbit connection
	Requirements:
		As many phone calls as can fit
		Mail should come in speedily
		Web browsing should work
		Exchange Webmail in Brussels
%page
#%filter "./counttime.pl 28"
#%endfilter

Even Linux is not magic, but..

	Network was broken by design and the requirements were conflicting.
	However, the following tricks helped:
	* Removal of Cisco cruft
	* 5 additional Linux machines
		Iptables MSS clamp
		Aggressive queue control
		Shaping (CBQ+TBF)
		Iptables driven MRTG
	* Adjustment of expectations
%page
#%filter "./counttime.pl 29"
#%endfilter

More information & Questions

%center, size 9
On my homepage:
http://ds9a.nl
Or, if You Are Feeling Lucky:
google://linux+routing
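%page

Backup: a toy token bucket

	The bucket drawings from the TBF slides, replayed in Python. This is a \
sketch and not the kernel's code or part of the original talk: TokenBucket \
and send_time are made-up names, one token is taken to be one byte, and the \
numbers (3000 byte bucket, 1000 bytes/s, 1000 byte packets) are assumptions \
chosen to match the three tokens in the drawing.
%font "courier"
```python
class TokenBucket:
    """Toy token bucket; one token = one byte (a simplification)."""

    def __init__(self, rate, burst):
        self.rate = rate      # refill speed in bytes per second
        self.burst = burst    # bucket size in bytes
        self.tokens = burst   # the bucket starts full
        self.now = 0.0        # simulated clock in seconds

    def _advance(self, t):
        """Move the clock to t and add the tokens earned meanwhile."""
        earned = (t - self.now) * self.rate
        self.tokens = min(self.burst, self.tokens + earned)
        self.now = t

    def send_time(self, arrival, size):
        """Return the time at which a packet of `size` bytes may leave."""
        if size > self.burst:
            raise ValueError("packet can never fit in the bucket")
        self._advance(max(arrival, self.now))
        if self.tokens < size:
            # Out of tokens: sleep just long enough for the rest to drip in.
            self._advance(self.now + (size - self.tokens) / self.rate)
        self.tokens -= size
        return self.now


tbf = TokenBucket(rate=1000, burst=3000)
times = [tbf.send_time(0.0, 1000) for _ in range(5)]
print(times)
```
%font "standard"
	Packets 0, 1 and 2 leave at once on the full bucket. Packets 3 and 4 are \
each held back until a packet's worth of tokens has arrived - the 'ZzzZz' in \
the drawing.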