====== SQM Details ======
  
If you want to set up SQM to minimize bufferbloat, you should start at the [[docs:guide-user:network:traffic-shaping:sqm|SQM Howto]] page.
Smart Queue Management (SQM) is our name for an intelligent combination of better packet scheduling (flow queueing) techniques along with active queue length management (AQM).
  
OpenWrt/LEDE gives you full control over the network traffic control parameters. If you want to do the work yourself, you can read the full description at the [[:docs:guide-user:network:traffic-shaping:packet.scheduler|QoS HOWTO]]. You may still find it useful to get into all the details of classifying and prioritizing certain kinds of traffic, but the SQM algorithms and scripts (fq_codel, cake, and sqm-scripts) require only a few minutes to set up, and work as well as or better than most hand-tuned classification schemes.
  
Current versions of OpenWrt/LEDE have SQM, fq_codel, and cake built in. These algorithms were developed as part of the [[http://www.bufferbloat.net/projects/cerowrt/wiki|CeroWrt]] project. They have been tested and refined over the last four years, and have been accepted back into OpenWrt, the Linux kernel, and dozens of commercial offerings.
==== SQM: Link Layer Adaptation Tab ====
  
The purpose of Link Layer Adaptation is to give the shaper more knowledge about the actual size of the packets, so it can calculate how long packets will take to send. When the upstream ISP technology adds overhead to each packet, we should try to account for it. This mostly makes a big difference for traffic using small packets, like VOIP or gaming traffic: if a packet is only 150 bytes and, say, 44 bytes are added to it, then the packet is 29% larger than expected, and the shaper will under-estimate the bandwidth used if it doesn't know about this overhead.
  
Getting this value exactly right is less important than getting it close, and over-estimating by a few bytes is generally better at keeping bufferbloat down than under-estimating. With this in mind, to get started, set the Link Layer Adaptation options based on your connection to the Internet. The general rule for selecting the Link Layer Adaptation is:
  
  * Choose **ATM: select for e.g. ADSL1, ADSL2, ADSL2+** and set the Per-packet Overhead to 44 bytes if you use any kind of DSL/ADSL connection to the Internet other than a modern high-speed VDSL connection (20+ Mbps). In other words, choose this if you get your internet service through a copper telephone line at around 1 or 2 Mbps.
  * Choose **Ethernet with overhead: select for e.g. VDSL2** and set the Per-packet Overhead to 34 if you know you have a VDSL2 connection (sometimes called Fiber to the Cabinet, for example in the UK). VDSL2 connections operate at 20-100 Mbps over higher-quality copper lines. If you are sure that PPPoE is not in use, you can reduce this to 26.
  * If you have a cable modem (with a coaxial cable connector), you can try 22 bytes, or see the **Ethernet with Overhead** details below. If your contracted rate is greater than 760 Mbps, set the overhead to 42 (mpu 84), as the ethernet link to the modem then determines the worst-case per-packet overhead.
  * Choose **Ethernet with overhead** if you have an actual Fiber to the Premises or metro-Ethernet connection, and set the Per-packet Overhead to 44 bytes. This can be reduced somewhat, for example if you know you are not using VLAN tags, but 44 will usually work well.
  * Choose **none (default)** if you have some reason not to include overhead. All the other parameters will be ignored.

If you are not sure what kind of link you have, first try **Ethernet with overhead** with 44 bytes, then run the Quick Test for Bufferbloat. If the results are good, you're done. If you get your internet through an old-style copper phone line and your speeds are less than a couple of megabits, you have ATM, so use the ATM entry above. If you have a slow connection (less than 2 Mbps in either direction), and/or you regularly run several VOIP calls at once while gaming etc. (so that more than 10-20% of your bandwidth is small packets), it can be worth tuning the overhead more carefully; see the details below.

An important exception to the above rules is when the bandwidth limit is set by the ISP's traffic shaper, not by the equipment that talks to the physical line. Consider an example: the ISP sells a 15 Mbit/s package and enforces this limit, but lets the ADSL modem connect at whatever speed is appropriate for the line, and the modem "thinks" (as confirmed in its web interface) that 18 Mbps is appropriate. In this case, the ATM Link Layer Adaptation is likely inappropriate, because the ISP's shaper is the only relevant speed limiter, and it does not work at the ATM level. In fact, it is more likely to work at the IP level, which means that **none** is the appropriate setting.
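As a concrete illustration, here is a sketch of how these choices might look in sqm-scripts' UCI configuration in ''/etc/config/sqm'' (the interface name and rates are hypothetical examples; a VDSL2 line with PPPoE is assumed):

<code>
config queue 'eth1'
        option enabled '1'
        option interface 'eth1'      # hypothetical WAN interface
        option download '78000'      # ingress shaper rate in kbit/s
        option upload '19000'        # egress shaper rate in kbit/s
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option linklayer 'ethernet'  # "Ethernet with overhead" in LuCI
        option overhead '34'         # VDSL2 + VLAN + PPPoE, per the rule above
</code>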
  
**Link Layer Adaptation - the details…**
  
Various link-layer transmission methods affect the rate at which data is transmitted/received. Setting the Link Layer properly helps SQM make accurate predictions, and improves performance. There are several components of overhead; the first comes from the basic transport technology itself:
  
  * **ATM:** It is especially important to set the Link Layer Adaptation on links that use ATM framing (almost all DSL/ADSL links do), because ATM adds five bytes of overhead to every 48-byte cell. Unless the SQM algorithm knows to account for the ATM framing bytes, short packets will appear to take longer to send than expected, and will be penalized. For true ATM links, one can often measure the real per-packet overhead empirically; see https://github.com/moeller0/ATM_overhead_detector for further information on how to do that. Getting the mpu right is tricky, since ATM/AAL5 can either include the FCS or not, but setting mpu to 96 should be safe (that corresponds to 2 ATM cells).
  * **Ethernet with Overhead:** SQM can also account for the overhead imposed by //VDSL2// links: add 22 bytes of overhead (mpu 68). Cable modems (//DOCSIS//) set both up- and downstream overhead to 18 bytes (6 bytes source MAC, 6 bytes destination MAC, 2 bytes ether-type, 4 bytes FCS); to allow for a possible 4-byte VLAN tag, it is recommended to set the overhead to 18 + 4 = 22 (mpu 64). If you want to set shaper rates greater than 760 Mbps, set the overhead to 42 (mpu 84), as the worst-case per-packet overhead is then on the ethernet link to the modem. For //FTTH// the answer is less clear-cut, since different underlying technologies have different relevant per-packet overheads; however, underestimating the per-packet overhead is considerably worse for responsiveness than (gently) overestimating it, so for //FTTH// set the overhead to 44 (mpu 84) unless more detailed information about the true overhead of the link is available.
  * **None:** All shaping below the physical gross rate of a link requires correct per-packet overhead accounting to be precise, so **None** is only useful if approximate shaping is sufficient, say, if you want to clamp a guest network to at most ~50% of the available capacity or similar tasks; but even then, configuring an approximately correct per-packet overhead is recommended (overhead 44 (mpu 84) is a decent default to pick).
  
In addition to these overheads it is common to have VLAN tags (4 extra bytes) or PPPoE encapsulation (8 bytes), or even more exotic arrangements such as IPv4 provided over IPv6 in the DS-Lite scheme (where IPv4 packets incur a 40-byte IPv6 header overhead). Because of these variables, and because over-estimation is generally better, we offer the default suggestions above.

The "Advanced Link Layer" choices are relevant if you are sending packets larger than 1500 bytes. This would be unusual for most home setups, since ISPs generally limit traffic to 1500-byte packets. UPDATE 2017: most recent link technologies will transfer complete L2 ethernet frames including the FCS; that in turn means that they effectively all inherit the ethernet minimal packet size of 64 bytes. It is hence recommended to set tcMPU to 64 (the actual value depends on the link technology and ranges from 0-96 bytes). Note that most (but not all) ATM-based links will exclude the FCS and hence probably do not require that setting. As of March 2017, sqm-scripts does not evaluate tcMPU if cake is selected as "link layer adaptation mechanism"; in that case, add "mpu 64" to the advanced option strings for ingress and egress. As of mid-2018, sqm-scripts will try to evaluate tcMPU for cake as well.

Getting the mpu right may seem unimportant at first, since it only affects the accounting of the smallest packets and only matters when a link is saturated. But DOCSIS/cable links in particular are often close to a 1/40 asymmetry between up- and downstream, and that matches the roughly 1/40 ratio between data and reverse ACK traffic rates/volumes for TCP (think TCP Reno). That in turn means that when the downstream direction is saturated with a big TCP download, the upstream direction will also be close to saturated with ACK packets, and pure ACK packets can actually fall under the mpu limit, resulting in both a saturated link and mostly small packets that need to be accounted with a size >= mpu. In short, getting the mpu right is not a purely theoretical exercise.
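A rough worked example of why this matters (the numbers and the overhead/mpu values are illustrative, based on the ethernet case above):

<code>
# a pure TCP ACK is 40 bytes (IPv4 + TCP headers, no payload)
# on ethernet its real cost is 84 bytes: the 64-byte minimum frame
# plus 20 bytes of preamble/SFD/inter-frame gap
overhead 42, mpu 84:  max(40 + 42, 84) = 84 bytes accounted  (correct)
overhead 42, no mpu:       40 + 42     = 82 bytes accounted  (close)
overhead 18, no mpu:       40 + 18     = 58 bytes accounted  (~31% low)
# at ~16,000 ACKs/s the last case admits ~3.3 Mbit/s more than the link carries
</code>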
  
Please note that as of mid-2018 cake, and cake only, will interpret any given overhead as to be applied on top of IP packets; all other qdiscs (and cake if configured with the "raw" keyword) will add the specified overhead on top of the overhead the kernel already accounted for. This seems confusing, because it is ;) so if in doubt stick to cake.
  
Unless you are experimenting, you should use the default choice for the link layer adaptation mechanism. This will select cake if cake is used as the qdisc, otherwise tc_stab.
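To check which qdisc and shaper settings are actually in effect, you can list the active qdiscs; sqm-scripts places the ingress shaper on an IFB device named after the interface (''eth1'' is an example):

<code>
tc -s qdisc show dev eth1        # egress shaper
tc -s qdisc show dev ifb4eth1    # ingress shaper created by sqm-scripts
</code>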

Now, the real challenge with the shaper gross rate and the per-packet-overhead is that they are not independent. Say a link has a true gross rate of 100 rate-units, a true per-packet-overhead of 100 bytes (the numbers are unrealistic, but allow for easier math) and a payload size of 1000 bytes; the expected throughput at the ethernet payload level is:
<code>
gross-rate * ((payload-size) / (payload-size + per-packet-overhead))
100 * ((1000) / (1000+100)) = 90.91
</code>

Now, any combination of gross shaper rate and per-packet-overhead that results in a throughput <= 90.91 will effectively remove bufferbloat (that is not fully correct for downstream shaping, but the logic also holds if we aim for, say, 90% of 90.91 instead).
So, in the extreme, we can set the per-packet-overhead to 0, as long as we also set the shaper gross rate to 90.91:
<code>
90.91 * (1000+0) / (1000) = 90.91
90.91 * ((1000) / (1000+0)) = 90.91
</code>

Or the other way around: if we set the per-packet-overhead to an absurd 1000 bytes, we will still see the expected throughput if we also configure the shaper gross rate at ~182:
<code>
90.91 * (1000+1000) / (1000) = 181.82
181.82 * ((1000) / (1000+1000)) = 90.91
</code>

To sanity-check whether a given combination of gross rate and per-packet-overhead is plausible (say, when there is too little information about the true link properties to make an educated guess), one needs to repeat speedtests at different packet sizes. The following stanza, added to /etc/firewall.user, will use OpenWrt's MSS clamping to bidirectionally force the MSS down to 216 (e.g. macOS will not accept smaller MSS values by default):

<code>
# special rules to allow MSS clamping for in and outbound traffic
# use ip6tables -t mangle -S ; iptables -t mangle -S to check
forced_MSS=216

# affects both down- and upstream, egress seems to require at least 216
iptables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment "custom: Zone wan MTU fixing" -j TCPMSS --set-mss ${forced_MSS}
ip6tables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment "custom6: Zone wan MTU fixing" -j TCPMSS --set-mss ${forced_MSS}
</code>
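On newer OpenWrt releases that use firewall4/nftables instead of /etc/firewall.user, a roughly equivalent runtime clamp would be the following (this assumes fw4's default ''inet fw4'' table and ''mangle_forward'' chain; the ''inet'' family covers both IPv4 and IPv6):

<code>
nft insert rule inet fw4 mangle_forward tcp flags syn tcp option maxseg size set 216
</code>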

Now, if we plug this into the numbers from above, we get (note: the MSS is the TCP payload size, which in the IPv4 case is 40 bytes smaller than the ethernet payload):
<code>
100 * ((216+40) / (216+40+100)) = 68.35  # as expected the throughput is smaller, since the fraction of overhead is simply larger
</code>

Now, if we had underestimated the per-packet-overhead (0 instead of 100 bytes), we would get:
<code>
90.91 * ((216) / (216+0)) = 90.91
</code>
Since 90 >> 68, we would admit too much data into the link and encounter bufferbloat.

And the reverse error:
<code>
181.82 * ((216) / (216+1000)) = 32.30
</code>
Here we do not get bufferbloat (since 32 << 68), but we sacrifice way too much throughput.

So the proposal is to "optimize" the shaper gross rate and per-packet-overhead at the normal MSS value, and then measure at a considerably smaller MSS to confirm that both bufferbloat and throughput are still acceptable.

Please note one additional challenge here: a saturating load with small(er) packets results in a considerably higher packet rate the router needs to process (e.g. if you switch from MSS 1460 to MSS 146 you can expect ~10 times as many packets), and not all routers are capable of saturating a link with small packets. For this test it is therefore essential to confirm that the router does not run out of CPU cycles while processing the data, and consequently that the measured throughput is close to the theoretically expected one.

Please note that to compare throughput measured with online speedtests against the theoretical prediction, the following approximate formula can be used:
<code>
gross-rate * ((IP-packet-size - IP-header-size - TCP-header-size) / (IP-packet-size + per-packet-overhead))
e.g. for an ethernet link (effectively 38B overhead) with a VLAN tag (4B) and PPPoE (6+2=8B), IPv4 (without options: 20B), TCP (with RFC 1323 timestamps: 20+12=32B)
one can expect ~93% throughput
100 * ((1500 - 8 - 20 - 20 - 12) / (1500 + 38 + 4)) = 93.39
</code>
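A small shell helper implementing this formula may be convenient for such tests (a hypothetical script, ignoring TCP timestamp options; gross rate in kbit/s, overhead and MSS in bytes, IPv4 + TCP headers assumed to total 40 bytes):

<code>
#!/bin/sh
# usage: goodput.sh <gross_kbit> <overhead_bytes> <mss_bytes>
gross=$1; oh=$2; mss=$3
awk -v g="$gross" -v o="$oh" -v m="$mss" \
    'BEGIN { printf "expected goodput: %.2f kbit/s\n", g * m / (m + 40 + o) }'
</code>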
===== Selecting the optimal queue setup script =====
  
  
The key message of this note is that the right setup script for you will depend on your connection, your router and your LAN clients. It pays off to test the various setup scripts.
===== Making cake sing and dance, on a tight rope without a safety net (aka advanced features) =====
  
**By now, we hope the SQM message has been clear: stick to the defaults and use cake.**
  - Cake can use the information about true source and destination addresses to control traffic from/to internal and external hosts by true IP address, not per-stream.
  
Cake's original isolation mode was based on //flows//: each stream was isolated from all the others, and the link capacity was divided evenly between all active streams independent of IP addresses. **More recently Cake switched** to ''triple-isolate'', which will first make sure that no internal __or__ external host will hog too much bandwidth, and then will still guarantee fairness for each host.
In that mode, Cake mostly does the right thing.
It would ensure that no single stream and no single host could hog all the capacity of the WAN link.
**To enable Per-Host Isolation**, add the following to the “Advanced option strings” (in the //Interfaces -> SQM-QoS// page, //Queue Discipline// tab; look for the //Dangerous Configuration// options):
  
For queueing disciplines handling incoming packets from the internet (internet-**ingress**): ''nat dual-dsthost''
  
For queueing disciplines handling outgoing packets to the internet (internet-**egress**): ''nat dual-srchost''

Please note the addition of the ''ingress'' keyword to the ingress “Advanced option strings”, explained next.

Regarding cake's ''ingress'' keyword:
Conceptually, a traffic shaper drops and/or delays packets so that the rate of packets leaving the shaper is smaller than or equal to the configured shaper rate. This works well on egress, but for post-bottleneck shaping, as is typical for internet ingress (the download direction), it is not ideal. For this kind of shaping we actually want to minimize packet backspill into the upstream device's buffers (if those buffers were sized and managed properly, we would not need to shape on ingress in the first place). To avoid backspill, we need to make sure that the combined rate of packets coming into the upstream device only rarely exceeds the bottleneck link's true capacity. The ''ingress'' keyword instructs cake to basically try to keep the incoming packet rate <= the configured shaper rate. This leads to slightly more aggressive dropping, but it also ameliorates one issue with post-bottleneck shaping, namely that the required bandwidth "sacrifice" inherently depends on the expected number of concurrent bulk flows. As far as I can tell, the more aggressive dropping in ingress mode automatically scales with the load, and hence it should make it possible to get away with configuring an ingress-mode rate closer to the true bottleneck rate, and actually also get higher throughput if only a few bulk flows are active. For further reference I recommend having a look at cake's source at https://github.com/dtaht/sch_cake?files=1

With the ''ingress'' keyword, the above example for incoming packets from the internet becomes ''nat dual-dsthost ingress''.
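In sqm-scripts' UCI configuration these option strings live under the “dangerous” advanced settings; a sketch (the section name is an example, and both ''..._advanced'' flags must be enabled before sqm-scripts honours the option strings):

<code>
config queue 'eth1'
        # ... interface, download, upload, qdisc etc. as before ...
        option qdisc_advanced '1'
        option qdisc_really_really_advanced '1'
        option iqdisc_opts 'nat dual-dsthost ingress'   # internet-ingress (download)
        option eqdisc_opts 'nat dual-srchost'           # internet-egress (upload)
</code>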
  
**Notes:**
  * At some point in time, these advanced cake options may become better integrated into luci-app-sqm, but for the time being this is the way to make cake sing and dance…
    
  * This discussion assumes SQM is instantiated on an interface that directly faces the internet/WAN. If it is not (e.g., on a **LAN port**), the meaning of ingress/egress **flips**: your **Download** rate now goes into **Upload speed (kbit/s) (egress)** and your **Upload** rate goes into **Download speed (kbit/s) (ingress)**. Also, do **not** add the ''nat'' option on **LAN interfaces** (it should only be used on the **WAN interface**), or **Per-Host Isolation** stops working. In that case, set the **egress** queueing discipline options to ''dual-dsthost ingress'' and the **ingress** ones to ''dual-srchost'' (remember that ingress/egress **flips** on **LAN interfaces**).
===== FAQ =====
  
if you want the typical "shape my internet access" configuration. </wrap> |
  

**Measured goodput in speed tests with SQM is considerably lower than without**

Traffic shaping is relatively CPU intensive, though not necessarily as a sustained load. To keep buffering in the device driver low, SQM only releases small batches of packets into the next layer (often the device driver). To keep throughput up, the qdisc then only has the small time window between handing the last batch to the driver and the moment those packets will have been transmitted at the desired shaper rate in which to hand more packets to the driver. If SQM does not get access to the CPU inside that time window, it will effectively miss some nominal transmit opportunities, and hence throughput will stay below the configured rate.

One sign of such CPU-overload-induced under-throughput is that the CPU rarely falls idle. A quick and dirty test is to run ''top -d 1'' and watch the %idle column in one of the upper rows: generate a load (like a speedtest), observe the %idle column while the test runs, and try to get a feel for the minimum %idle that shows up. If it gets too close to 0%, SQM is likely CPU bound. Since this test only shows aggregate usage over full-second intervals, while SQM operates on smaller time windows, an observed minimum %idle of 10% often already indicates CPU limitations for SQM. Please note that for multicore routers reading %idle gets more complicated, as 50% idle on a dual core might mean one core is fully loaded and one is idle (bad if SQM runs on the overloaded core), or that both cores are loaded only 50%. Please note that htop (''opkg update; opkg install htop'') has a per-CPU stats display that can be toggled by pressing t, and includes the traditional mode as well:

  top - 11:29:29 up 12 days, 14:42,  0 users,  load average: 0.06, 0.02, 0.00
  Tasks: 158 total,   1 running, 157 sleeping,   0 stopped,   0 zombie
  %Cpu0  :  2.0 us,  0.0 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
  %Cpu1  :  3.0 us,  1.0 sy,  0.0 ni, 93.1 id,  0.0 wa,  0.0 hi,  3.0 si,  0.0 st

This should allow you to eyeball whether a single core might be pegged. In this example without load, both CPUs idle more than 90% of the time: no sign of any overload ;)
Pressing F1 in htop shows the color legend for the CPU bars, and F2 Setup -> Display options -> Detailed CPU time (System/IO-Wait/Hard-IRQ/Soft-IRQ/Steal/Guest) enables the display of the Soft-IRQ category, which is the important one for network loads.

To make things even more complicated, CPU power/frequency scaling (to save power) can interfere negatively with SQM. Probably due to SQM's bursty nature, the load might not be recognised by the power governor, so a CPU that at full frequency is capable of shaping at the desired SQM rate stays too slow to service SQM in time and throughput suffers; due to the burstiness, the governor might never realise it should scale the frequency back up. This can be remedied by tuning the transition rules for up-scaling frequency/power, or by switching to a non-scaling governor. The former requires a bit of trial and error but maintains power saving, while the latter is probably easier and hence a good way to figure out whether power saving is an issue in the first place. @experts, please feel free to elaborate on which power-save settings are worth exploring.
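As a quick experiment (standard Linux cpufreq sysfs paths; availability depends on the target), you can inspect the governor and temporarily pin the CPU to full speed:

<code>
# show current and available governors for CPU 0
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

# temporarily switch to the non-scaling "performance" governor (reverts on reboot)
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
</code>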

**How do I get cake to consider IPv6 traffic in a 6in4 tunnel as separate flows?**

See [[:docs:guide-user:network:ipv6:ipv6_henet#in4_with_cake_sqm|6in4 with cake config]]
  
===== Troubleshooting SQM =====
  
Finally, copy/paste the entire session into your report.

===== More hints & tips & info =====

How I use CAKE to control bufferbloat and fair-share my Internet connection on OpenWrt.

CAKE has been my go-to solution to bufferbloat for years, not just for solving bufferbloat but also for fairer sharing of my link. Whilst CAKE has some sensible defaults, there are a few extra options & tweaks that can improve things further.

This note assumes that you have an Internet-facing interface, usually eth0, and will call traffic leaving that interface TO the ISP egress traffic. Traffic received on that interface FROM the ISP is called ingress traffic. This interface is usually connected to an ISP's modem.

Controlling egress bufferbloat to the ISP's modem is relatively straightforward. If you ensure that traffic doesn't arrive at the modem faster than the modem can pass it on to the ISP, then bufferbloat within the ISP's modem is eliminated. This involves using something called a shaper to control how quickly data leaves eth0. CAKE has a built-in packet shaper that times the release of packets out of the interface so as not to overload the upstream modem. Packets also tend to accumulate extra data or overhead the closer they get to the ISP's link; for example, VDSL modem links will have some framing overhead and may also acquire a 4-byte VLAN tag. CAKE's shaper is able to take a wide variety of overheads into account, adjusting its timed release mechanism to cope.

For my ISP (Sky UK) things are straightforward. They basically run ethernet over PTM over VDSL2, and I have an 80mbit ingress / 20mbit egress link. Looking at my modem's status page I can see that the egress of 20mbit is achieved, but the ingress is slightly lower than 80mbit. BT are the incumbent VDSL2 infrastructure provider and require a VLAN ID 101 tag on packets across the link, so we have to account for that. Similarly, there's a minimum packet size limit, which empirically I'll set to 72 bytes, based on the smallest packets we ever see on the ingress side. So we're already at a stage where we can start specifying things to cake:

<code>
egress:  19950 bridged-ptm ether-vlan mpu 72
ingress: 78000 bridged-ptm ether-vlan mpu 72 ingress
</code>

We'll come back to the 'ingress' option a little later.
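For reference, if you were applying such option strings to cake directly with ''tc'' rather than via sqm-scripts/luci-app-sqm, the egress side would look roughly like this (the interface name is an example; the ingress side additionally requires redirecting packets through an IFB device, which sqm-scripts normally handles for you):

<code>
tc qdisc replace dev eth0 root cake bandwidth 19950kbit bridged-ptm ether-vlan mpu 72
</code>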

CAKE also has a means of fair-sharing bandwidth amongst hosts. A simple example/question: if I have two hosts, one of which starts 2 data streams and the other 8, then with per-flow fairness the second host would get 80% of the bandwidth and the first 20%. From a fairness point of view it would be better if the available bandwidth were split evenly across the active hosts irrespective of the number of flows each host starts, such that one host cannot obtain most of the bandwidth just because it starts most of the transfers.

By default CAKE does 'triple-isolate' fair sharing, which fair-shares across both source machine addresses (internal LAN) and destination addresses (external WAN). In other words (and a bit simplistically), Google's hosts cannot monopolise all the bandwidth from Apple's hosts (or Microsoft, Facebook, GitHub etc.) in the same way that one internal host cannot monopolise all the bandwidth.

There is a small fly in this ointment in the form of IPv4 Network Address Translation (NAT), where typically the ISP subscriber is given one Internet-facing IPv4 address behind which all the internal LAN traffic (usually in 192.168.x.x) is masqueraded. Since all the internal addresses are hidden behind this one external Internet address, how can CAKE fair-share across the internal hosts? If CAKE is running on the device performing IPv4 NAT, it can look into the device's NAT tables to determine the internal addresses and base the fairness on those. Unfortunately this is not the default, so we have to specify it:

<code>
egress:  19950 bridged-ptm ether-vlan nat mpu 72
ingress: 78000 bridged-ptm ether-vlan nat ingress
</code>

In fact, what I do is force cake to only worry about internal fairness. In other words, I care that my internal machines get a fair share of the traffic irrespective of the number of flows each machine has, but I don't care if external machines are unbalanced (e.g. Google vs Netflix):

<code>
egress:  19950 dual-srchost bridged-ptm ether-vlan nat mpu 72
ingress: 78000 dual-dsthost bridged-ptm ether-vlan nat ingress
</code>

Having dealt with host fairness, we now need to deal with flow fairness & control. A full link isn't a bad thing; in fact, it's the best thing, since we're using all the bandwidth we're paying for. What is bad is when access to that link cannot be had in a timely manner because an excessive queue has built up in front of our packet. CAKE prevents the queue from building up, and ensures fair access, by using a variation of codel to control delay (latency) on individual flows.

Simplistically, codel works by looking at how long a packet has been in a queue at the time it is scheduled to be sent; if it is too old, the packet gets dropped. This may seem like madness; after all, 'isn't packet loss bad?' Well, packet loss is the mechanism TCP uses to determine whether it is sending too much data and overflowing a link's capacity, so shooting the right packets at the right time is actually a fundamental signalling mechanism to avoid excessive queueing. Some TCP stacks/connections support another signalling mechanism called ECN, whereby packets are marked or flagged instead of being dropped. CAKE supports ECN marking too.

CAKE looks at each data flow in turn and either releases or drops a packet from each flow to match the shaper's schedule; by dropping packets from a flow before it has built up a significant queue, it is able to keep each flow under control.

Ingress mode modifies how CAKE's shaper accounts for dropped packets: in essence, they still count towards the bandwidth-used calculation even though they are dropped. This makes sense, since they did arrive at our end, but we decided that the particular flow was occupying too much bandwidth, so we dropped a packet to signal the other end to slow down. The shaper on egress doesn't count dropped packets; instead it looks in the queues to find a more worthy packet to occupy the space. The bottom line: if you're trying to control ingress packet flow, use ingress mode; otherwise don't.

**Traffic classification**

At this point we have a reasonably fair system. Flow fairness is handled by a variation of codel, and we deliberately make it unfair where required by ensuring per-host fairness. But what if I have some types of traffic that are less or more important than others? CAKE deals with traffic classification by dividing it up into priority tins based on packet DSCP (diffserv) values. By default CAKE uses a 3-tin classification mode called 'diffserv3'. Other modes of possible interest are 'besteffort', 'diffserv4' and 'diffserv8'. Besteffort effectively ignores DSCP values: every packet is as important as any other, so just flow & host fairness come into play. The diffserv modes split DSCP values into an increasing number of tins. I prefer diffserv4, as I then have 4 traffic categories in increasing order of importance: 'Bulk', 'Best Effort', 'Video', 'Voice'. CAKE enforces minimum bandwidth guarantees for each category: Voice gets a minimum of 1/4 of the bandwidth, Video 1/2, Bulk 1/16th, and Best Effort has a minimum of all of it(!?). Note these are minimums, so if 'Video' needed all the bandwidth and there was no competing traffic in any other category, it can by all means take it. Similarly, the lowest-priority tin 'Bulk' can have all the capacity if there's no other traffic, though it is guaranteed 1/16th of the bandwidth in order to prevent it from being completely starved. Best Effort having the full bandwidth as a minimum appears mad, but what it is in essence saying is that it can have whatever is left from full minus (1/2 + 1/4 + 1/16).
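Spelling out that closing parenthetical (using the tin minimums listed above):

<code>
Best Effort minimum = 1 - (1/2 + 1/4 + 1/16) = 1 - 13/16 = 3/16 ≈ 19% of the link
</code>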

I use diffserv4 over diffserv3 because of the 'Bulk' category: in other words, I have somewhere to de-prioritise traffic to. E.g. a long-running download or upload (think network backup) probably isn't important in 'wall clock time', but I don't want it disturbing general web browsing, or worse video streaming, or worse voip/facetime calls. I've had backups running for days at Bulk, vacuuming up all the spare upload capacity, and it's completely unnoticeable (my network monitoring tells me there's an average 2 ms increase in latency with peaks up to 4 ms).

<code>
egress:  19950 diffserv4 dual-srchost bridged-ptm ether-vlan nat mpu 72
ingress: 78000 diffserv4 dual-dsthost bridged-ptm ether-vlan nat ingress
</code>

**The cherry on top**

This is only really relevant for egress traffic and for asymmetric links. In essence, TCP acknowledgements can sit in a queue waiting to be sent. We only really need the newest ack to be sent, since it acknowledges everything the 'old' acks acknowledge, so let's not send too many of the old acks; it saves a little egress bandwidth.

<code>
egress:  19950 diffserv4 dual-srchost bridged-ptm ether-vlan nat mpu 72 ack-filter
ingress: 78000 diffserv4 dual-dsthost bridged-ptm ether-vlan nat ingress
</code>
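A rough, assumption-laden estimate of what the ack-filter can reclaim on a link like this one (one delayed ACK per two 1500-byte segments; every pure ACK accounted at the mpu of 72 bytes):

<code>
78 Mbit/s down / (1500 byte * 8)   ≈ 6,500 packets/s
one ACK per two segments           ≈ 3,250 ACKs/s
3,250/s * 72 byte * 8              ≈ 1.9 Mbit/s  (~9% of the ~20 Mbit/s uplink)
</code>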