docs:guide-user:network:traffic-shaping:sqm-details: revisions of 2020/12/25 11:35 (htop configuration additions, moeller0) and 2022/11/17 20:00 ([SQM: Link Layer Adaptation Tab], moeller0)
Smart Queue Management (SQM) is our name for an intelligent combination of better packet scheduling (flow queueing) techniques along with active queue length management (AQM).
  
OpenWrt/LEDE is fully capable of tuning the network traffic control parameters. If you want to do the work, you can read the full description in the [[:docs:guide-user:network:traffic-shaping:packet.scheduler|QoS HOWTO]]. You may still find it useful to get into all the details of classifying and prioritizing certain kinds of traffic, but the SQM algorithms and scripts (fq_codel, cake, and sqm-scripts) take only a few minutes to set up, and work as well as or better than most hand-tuned classification schemes.
  
Current versions of OpenWrt/LEDE have SQM, fq_codel, and cake built in. These algorithms were developed as part of the [[http://www.bufferbloat.net/projects/cerowrt/wiki|CeroWrt]] project. They have been tested and refined over the last four years, and have been accepted into OpenWrt, the Linux kernel, and dozens of commercial offerings.
  * Choose **ATM: select for e.g. ADSL1, ADSL2, ADSL2+** and set the Per-packet Overhead to 44 bytes if you use any kind of DSL/ADSL connection to the Internet other than a modern VDSL high speed connection (20+ Mbps); in other words, if your Internet service comes through a copper telephone line at around 1 or 2 Mbps.
  * Choose **Ethernet with overhead: select for e.g. VDSL2** and set the Per-packet Overhead to 34 if you know you have a VDSL2 connection (this is sometimes called Fiber to the Cabinet, for example in the UK). VDSL connections operate at 20-100 Mbps over higher quality copper lines. If you are sure that PPPoE is not in use, you can reduce this to 26.
  * If you have a cable modem with a coaxial cable connector, you can try 22 bytes, or see the **Ethernet with Overhead** details below. If your contracted rate is greater than 760 Mbps, set the overhead to 42 (mpu 84), because the Ethernet link to the modem then determines the worst-case per-packet overhead.
  * Choose **Ethernet with overhead** if you have an actual Fiber to the Premises or metro-Ethernet connection and set the Per-Packet Overhead to 44 bytes. This can be reduced somewhat, for example if you know you are not using VLAN tags, but 44 will usually work well.
  * Choose **none (default)** if you have some reason not to include overhead. All the other parameters will then be ignored.
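
The choices above can also be applied from the command line. The sketch below is hypothetical: the short connection-type names are invented for illustration, and it assumes the sqm-scripts UCI options ''linklayer'' and ''overhead'' in the first ''queue'' section of /etc/config/sqm. It only prints the uci commands, so they can be reviewed before being pasted on the router.

<code bash>
# Hypothetical helper: map a connection type (names invented for this
# example) to the link layer and per-packet overhead suggested above.
sqm_link_settings() {
	case "$1" in
		adsl)          echo "atm 44" ;;
		vdsl2)         echo "ethernet 34" ;;  # VDSL2 with PPPoE
		vdsl2-nopppoe) echo "ethernet 26" ;;  # VDSL2, sure that PPPoE is not used
		cable)         echo "ethernet 22" ;;
		cable-760plus) echo "ethernet 42" ;;  # contracted rate above 760 Mbps (mpu 84)
		ftth)          echo "ethernet 44" ;;  # fiber to the premises / metro Ethernet
		none)          echo "none 0" ;;
		*)             return 1 ;;            # unknown type: suggest nothing
	esac
}

# Print (do not execute) uci commands applying the suggestion to the first
# sqm queue section; the option names are assumed from sqm-scripts' config.
sqm_link_commands() {
	settings=$(sqm_link_settings "$1") || return 1
	set -- $settings    # $1 = link layer, $2 = overhead in bytes
	echo "uci set sqm.@queue[0].linklayer='$1'"
	echo "uci set sqm.@queue[0].overhead='$2'"
	echo "uci commit sqm"
	echo "/etc/init.d/sqm restart"
}

# Example: review the commands for an ADSL line
sqm_link_commands adsl
</code>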
Various link-layer transmission methods affect the rate at which data is transmitted/received. Setting the Link Layer properly helps SQM make accurate predictions and improves performance. There are several components of overhead; the first comes from the basic transport technology itself:

  * **ATM:** It is especially important to set the Link Layer Adaptation on links that use ATM framing (almost all DSL/ADSL links do), because ATM adds five bytes of overhead to every 48-byte payload (a 53-byte cell). Unless the SQM algorithm knows to account for the ATM framing bytes, short packets will appear to take longer to send than expected, and will be penalized. For true ATM links one can often measure the real per-packet overhead empirically; see https://github.com/moeller0/ATM_overhead_detector for further information on how to do that. Getting the mpu right is tricky, since ATM/AAL5 can either include the FCS or not, but setting the mpu to 96 should be safe (that results in 2 ATM cells).
  * **Ethernet with Overhead:** SQM can also account for the overhead imposed by //VDSL2// links: add 22 bytes of overhead (mpu 68). For cable modems (//DOCSIS//), set both up- and downstream overhead to 18 bytes (6 bytes source MAC, 6 bytes destination MAC, 2 bytes ether-type, 4 bytes FCS); to allow for a possible 4-byte VLAN tag, it is recommended to set the overhead to 18 + 4 = 22 (mpu 64). If you want to set shaper rates greater than 760 Mbps, set the overhead to 42 (mpu 84), because the worst-case per-packet overhead is then on the Ethernet link to the modem. For //FTTH// the answer is less clear-cut, since different underlying technologies have different per-packet overheads; however, underestimating the per-packet overhead is considerably worse for responsiveness than (gently) overestimating it, so for //FTTH// set the overhead to 44 (mpu 84) unless more detailed information about the true overhead of the link is available.
  * **None:** All shaping below the physical gross rate of a link requires correct per-packet overhead accounting to be precise, so **None** is only useful if approximate shaping is sufficient, say if you want to clamp a guest network to at most ~50% of the available capacity or similar tasks. Even then, configuring an approximately correct per-packet overhead is recommended (overhead 44 (mpu 84) is a decent default to pick).
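
To see why short packets suffer most on ATM links, the sketch below computes the bytes a packet actually occupies on the wire: the packet plus the per-packet overhead is split into 48-byte cell payloads (the last cell padded), and every cell costs 53 bytes. It is a simplified model that assumes the configured overhead already includes all AAL5/encapsulation bytes; ''atm_wire_bytes'' is a hypothetical name.

<code bash>
# On-wire bytes of one packet on an ATM link (simplified model):
# cells = ceil((packet + overhead) / 48), each cell is 48 + 5 = 53 bytes.
atm_wire_bytes() {
	# $1 = packet size in bytes, $2 = per-packet overhead in bytes
	awk -v len="$1" -v oh="$2" 'BEGIN {
		cells = int((len + oh + 47) / 48)   # integer ceiling division
		print cells * 53
	}'
}

atm_wire_bytes 1500 44   # full-size packet: 33 cells -> 1749 bytes
atm_wire_bytes 40 44     # bare TCP ACK:      2 cells ->  106 bytes
</code>

Compared with plain "size + overhead" accounting, the full-size packet costs about 13% more on the wire (1749 vs. 1544 bytes), while the ACK costs about 26% more (106 vs. 84 bytes), which is why small packets are hit hardest when the ATM cell quantization is ignored.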
  
  
  
  
The "Advanced Link Layer" choices are relevant if you are sending packets larger than 1500 bytes. This would be unusual for most home setups, since ISPs generally limit traffic to 1500-byte packets. Update 2017: most recent link technologies transfer complete L2 Ethernet frames including the FCS, which in turn means that they effectively all inherit the Ethernet minimum frame size of 64 bytes. It is hence recommended to set tcMPU to 64 (the actual value depends on the link technology and ranges from 0 to 96 bytes). Note that most (but not all) ATM-based links exclude the FCS and hence probably do not require that setting. As of March 2017, sqm-scripts does not evaluate tcMPU if cake is selected as "link layer adaptation mechanism"; in that case add "mpu 64" to the advanced option strings for ingress and egress. As of mid-2018, sqm-scripts tries to evaluate tcMPU for cake as well.

Getting the mpu right may not seem overly important at first, since it only affects the accounting of the smallest packets and only matters when a link is saturated. But DOCSIS/cable links in particular are often close to a 1/40 asymmetry between up- and downstream, which is about the same 1/40 ratio as between the data and the reverse ACK traffic of a TCP transfer (think TCP Reno). That in turn means that saturating the downstream direction with a big TCP download will also nearly saturate the upstream direction with ACK packets, and pure ACK packets can fall below the mpu limit. The result is a saturated link carrying mostly small packets that need to be accounted with a size of at least mpu. In short, getting the mpu right is not a purely theoretical exercise.
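
The effect of the mpu on small packets can be sketched as follows, assuming (as a simplification) that the shaper charges each packet its size plus the configured overhead, but never less than the mpu. The 40-byte packet stands for a pure IPv4 TCP ACK without options; overhead 22 and mpu 64 are the DOCSIS values recommended above. ''accounted_bytes'' is a hypothetical name.

<code bash>
# Bytes the shaper charges for one packet: size + overhead, at least mpu.
accounted_bytes() {
	# $1 = packet size, $2 = per-packet overhead, $3 = mpu (all in bytes)
	awk -v len="$1" -v oh="$2" -v mpu="$3" 'BEGIN {
		size = len + oh
		if (size < mpu) size = mpu   # small packets are rounded up to the mpu
		print size
	}'
}

accounted_bytes 40 22 64     # pure ACK: 62 is raised to 64
accounted_bytes 1500 22 64   # full-size packet: 1522, mpu irrelevant
</code>

On an upstream that is saturated mostly by such ACKs, accounting 62 instead of 64 bytes per packet underestimates the load by about 3%, which is enough to let a queue build up in the modem.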
  
Please note that as of mid-2018, cake, and cake only, will interpret any given overhead as applied on top of IP packets; all other qdiscs (and cake if configured with the "raw" keyword) add the specified overhead on top of the overhead the kernel already accounted for. This seems confusing, because it is ;) so if in doubt, stick to cake.
  
Unless you are experimenting, you should use the default choice for the link layer adaptation mechanism. This selects cake's own mechanism if cake is used as the qdisc, and tc_stab otherwise.

Now, the real challenge with the shaper gross rate and the per-packet overhead is that they are not independent. Say a link has a true gross rate of 100 rate-units and a true per-packet overhead of 100 bytes (the numbers are unrealistic, but make the math easier), and the payload size is 1000 bytes; the expected throughput at the Ethernet payload level is then:
<code>
gross-rate * ((payload-size) / (payload-size + per-packet-overhead))
100 * ((1000) / (1000+100)) = 90.91
</code>
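
The formula above as a small shell function; awk does the floating-point math, since POSIX sh only has integer arithmetic. ''expected_throughput'' is a hypothetical name used only for illustration.

<code bash>
# Expected throughput at the Ethernet payload level:
# gross-rate * payload / (payload + per-packet-overhead)
expected_throughput() {
	# $1 = gross rate (any unit), $2 = payload bytes, $3 = overhead bytes
	awk -v g="$1" -v p="$2" -v o="$3" 'BEGIN { printf "%.2f\n", g * p / (p + o) }'
}

expected_throughput 100 1000 100   # prints 90.91
</code>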

Any combination of shaper gross rate and per-packet overhead that results in a throughput <= 90.91 will effectively remove bufferbloat (that is not fully correct for downstream shaping, but the logic also holds if we aim for, say, 90% of 90.91 instead).
So in the extreme we can set the per-packet overhead to 0, as long as we also set the shaper gross rate to 90.91:
<code>
90.91 * (1000+0) / (1000) = 90.91
90.91 * ((1000) / (1000+0)) = 90.91
</code>

Or the other way around: if we set the per-packet overhead to an absurd 1000 bytes, we will still see the expected throughput if we also configure the shaper gross rate at 182:
<code>
90.91 * (1000+1000) / (1000) = 181.82
181.82 * ((1000) / (1000+1000)) = 90.91
</code>
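
The first line of each calculation solves the throughput formula for the gross rate. A sketch of that inversion (''gross_rate_for'' is a hypothetical name), which makes it easy to try other overhead guesses:

<code bash>
# Shaper gross rate that yields a given throughput for an assumed overhead:
# target * (payload + overhead) / payload
gross_rate_for() {
	# $1 = target throughput, $2 = payload bytes, $3 = assumed overhead bytes
	awk -v t="$1" -v p="$2" -v o="$3" 'BEGIN { printf "%.2f\n", t * (p + o) / p }'
}

gross_rate_for 90.91 1000 0      # prints 90.91
gross_rate_for 90.91 1000 1000   # prints 181.82
gross_rate_for 90.91 1000 44     # another guess: 94.91
</code>

All of these settings give the same throughput at a 1000-byte payload, which is exactly why a speedtest with full-size packets alone cannot tell them apart.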

To check whether a given combination of gross rate and per-packet overhead is sane (say, when there is too little information about the true link properties to make an educated guess), one needs to repeat speedtests at different packet sizes. The following stanza, added to /etc/firewall.user, uses OpenWrt's MSS clamping to force the MSS to 216 bytes in both directions (216 because macOS, for example, does not accept smaller MSS values by default):

<code>
# special rules to allow MSS clamping for in- and outbound traffic
# use ip6tables -t mangle -S ; iptables -t mangle -S to check
# remove these rules again once the small-packet tests are done
forced_MSS=216

# affects both down- and upstream, egress seems to require at least 216
iptables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment "custom: Zone wan MTU fixing" -j TCPMSS --set-mss ${forced_MSS}
ip6tables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment "custom6: Zone wan MTU fixing" -j TCPMSS --set-mss ${forced_MSS}
</code>

Now, if we plug this into the numbers from above, we get the following (note: the MSS is the TCP payload size, which in the IPv4 case is 40 bytes smaller than the Ethernet payload, so the Ethernet payload is 216 + 40 = 256 bytes):
<code>
100 * ((216+40) / (216+40+100)) = 71.91 # as expected the throughput is smaller, since the fraction of overhead is simply larger
</code>

Now, if we had underestimated the per-packet overhead (0 bytes at a gross rate of 90.91), we get:
<code>
90.91 * ((216+40) / (216+40+0)) = 90.91
</code>
Since 90.91 >> 71.91, we will admit too much data into the link and will encounter bufferbloat.

And the reverse error (1000 bytes at a gross rate of 181.82):
<code>
181.82 * ((216+40) / (216+40+1000)) = 37.06
</code>
Here we do not get bufferbloat (since 37.06 << 71.91), but we sacrifice way too much throughput.

So the proposal is to "optimize" the shaper gross rate and per-packet overhead at the normal MSS value and then measure at a considerably smaller MSS to confirm that both bufferbloat and throughput are still acceptable.
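
The procedure can be sketched as a comparison: for a candidate shaper setting and a given Ethernet payload size, compute what the shaper admits and what the link can carry, here assuming the example's true link of 100 rate-units with 100 bytes overhead. For the small-packet case it uses the IPv4 Ethernet payload of MSS + 40 = 256 bytes. ''check_setting'' is a hypothetical name, and the 1% and 90% thresholds are arbitrary choices for this illustration.

<code bash>
# Compare a shaper setting with the (assumed) true link at one payload size.
# True link from the example: gross rate 100, per-packet overhead 100 bytes.
check_setting() {
	# $1 = shaper gross rate, $2 = assumed overhead, $3 = Ethernet payload
	awk -v g="$1" -v o="$2" -v p="$3" 'BEGIN {
		capacity = 100 * p / (p + 100)   # what the link can actually carry
		admitted = g * p / (p + o)       # what the shaper will let through
		verdict = "ok"
		if (admitted > capacity * 1.01) verdict = "bufferbloat"
		else if (admitted < capacity * 0.9) verdict = "wasted-throughput"
		printf "%.2f %.2f %s\n", admitted, capacity, verdict
	}'
}

for payload in 1000 256; do               # normal vs. small packets
	check_setting 100    100  "$payload"  # correct overhead
	check_setting 90.91  0    "$payload"  # overhead underestimated
	check_setting 181.82 1000 "$payload"  # overhead overestimated
done
</code>

At a 1000-byte payload all three settings report "ok"; only at 256 bytes does the underestimated overhead show up as bufferbloat and the overestimated one as wasted throughput.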

Please note one additional challenge here: testing a saturating load with small(er) packets results in a considerably higher rate of packets the router needs to process (e.g. if you switch from MSS 1460 to MSS 146 you can expect ~10 times as many packets), and not all routers are capable of saturating a link with small packets. So for this test it is essential to confirm that the router does not run out of CPU cycles to process the data and, as a consequence, that the measured throughput is close to the theoretically expected one.
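
To get a feel for the extra packet load, the sketch below estimates the packet rate needed to fill a link of a given gross rate, assuming IPv4 and TCP headers without options (40 bytes) plus the per-packet overhead on top of the MSS. Under these assumptions the ratio between MSS 1460 and MSS 146 at a fixed gross rate comes out nearer 7 than 10, because headers and overhead do not shrink with the MSS; it is still a several-fold increase. ''packets_per_second'' is a hypothetical name.

<code bash>
# Packets per second needed to fill a link at a given MSS (IPv4, no TCP
# options assumed: 20 bytes IP + 20 bytes TCP on top of the MSS).
packets_per_second() {
	# $1 = gross rate in Mbit/s, $2 = MSS in bytes, $3 = overhead in bytes
	awk -v r="$1" -v mss="$2" -v oh="$3" 'BEGIN {
		wire = mss + 40 + oh              # bytes per packet on the link
		printf "%.0f\n", r * 1000000 / (8 * wire)
	}'
}

packets_per_second 100 1460 44   # ~8096 packets/s
packets_per_second 100 146 44    # ~54348 packets/s
</code>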

To compare throughput measured with online speedtests against the theoretical prediction, the following approximate formula can be used:
<code>
gross-rate * ((IP-packet-size - IP-header-size - TCP-header-size) / (IP-packet-size + per-packet-overhead))
e.g. for an ethernet link (effectively 38B overhead) with a VLAN tag (4B) and PPPoE (6+2=8B), IPv4 (without options: 20B), TCP (with RFC 1323 timestamps: 20+12=32B)
one can expect ~93% throughput
100 * ((1500 - 8 - 20 - 20 - 12) / (1500 + 38 + 4)) = 93.39
</code>
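
The worked example can be reproduced with a small helper so that other header and overhead combinations can be tried. ''goodput_pct'' is a hypothetical name; its first argument is the 1500-byte MTU that also carries the PPPoE header, matching how the example is written.

<code bash>
# Approximate TCP goodput as a percentage of the shaper gross rate.
goodput_pct() {
	# $1 = MTU (bytes in the Ethernet payload), $2 = encapsulation inside the
	# MTU (e.g. PPPoE 8), $3 = IP header, $4 = TCP header incl. options,
	# $5 = per-packet overhead outside the MTU (e.g. Ethernet 38 + VLAN 4)
	awk -v mtu="$1" -v enc="$2" -v ip="$3" -v tcp="$4" -v oh="$5" 'BEGIN {
		printf "%.2f\n", 100 * (mtu - enc - ip - tcp) / (mtu + oh)
	}'
}

goodput_pct 1500 8 20 32 42   # the example above: 93.39
goodput_pct 1500 0 20 32 38   # same without PPPoE and VLAN: 94.15
</code>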
===== Selecting the optimal queue setup script =====
  
  - Cake can use the information about true source and destination addresses to control traffic from/to internal and external hosts by true IP address, not per-stream.
  
Cake's original isolation mode was based on //flows//: each stream was isolated from all the others, and the link capacity was divided evenly between all active streams independent of IP addresses. **More recently Cake switched** to ''triple-isolate'', which first makes sure that no internal __or__ external host hogs too much bandwidth and then still guarantees fairness for each host.
In that mode, Cake mostly does the right thing.
It ensures that no single stream and no single host can hog all the capacity of the WAN link.
  
  
One sign of such CPU-overload-induced under-throughput is that the CPU rarely falls idle. A quick and dirty test is to run ''top -d 1'' and watch the %idle column in one of the upper rows: generate a load (for example a speedtest), observe the %idle column while the test runs, and get a feel for the minimum %idle that shows up; if it gets too close to 0%, sqm is likely CPU bound. Since top only shows aggregate usage over full-second intervals while SQM operates on much shorter time windows, an observed minimum %idle of 10% often already indicates that sqm is CPU limited. On multicore routers reading %idle gets more complicated: 50% idle on a dual core might mean that one core is fully loaded and the other idle (bad if sqm runs on the overloaded core), or that both cores are loaded only 50%. Please note that htop (''opkg update ; opkg install htop'') has a per-CPU stats display that can be toggled by pressing t and includes the traditional mode as well:
  
  top - 11:29:29 up 12 days, 14:42,  0 users,  load average: 0.06, 0.02, 0.00
Also, to make things even more complicated, CPU power/frequency scaling (to save power) can interfere negatively with SQM. Probably due to SQM's bursty nature, the load might not be recognised by the power governor, so the CPU (which at 100% would be capable of shaping to the desired sqm rate) becomes too slow to service SQM in time and throughput suffers; due to the burstiness, the governor might never realise that it should scale the frequency back up. This can be remedied either by optimising the transition rules for up-scaling frequency/power or by switching to a non-scaling governor. The former requires a bit of trial and error but maintains power saving, while the latter is probably easier to achieve and hence a good way to figure out whether power saving is an issue in the first place. @experts, please feel free to elaborate on which power save settings are worth exploring.
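
If no per-CPU view is at hand, the per-core %idle from the check described above can also be read from /proc/stat directly. The sketch below (hypothetical function names) samples it twice, one second apart, and prints the idle percentage of every core; run it while a speedtest is loading the link. It counts only the idle field (not iowait) as idle and, like top, only shows one-second averages, so the caveat about SQM's short time windows still applies.

<code bash>
# %idle of one CPU between two /proc/stat lines ("cpuN user nice system
# idle iowait irq softirq steal ..."); total = sum of the first 8 counters.
idle_pct() {
	printf '%s\n%s\n' "$1" "$2" | awk '
		{ total = 0; for (i = 2; i <= 9; i++) total += $i
		  t[NR] = total; idle[NR] = $5 }
		END {
			if (t[2] == t[1]) { print "n/a"; exit }
			printf "%.1f\n", 100 * (idle[2] - idle[1]) / (t[2] - t[1])
		}'
}

# Sample all cores one second apart and print their %idle.
# Usage: sample_idle   (while a speedtest is running)
sample_idle() {
	before=$(grep '^cpu[0-9]' /proc/stat)
	sleep 1
	after=$(grep '^cpu[0-9]' /proc/stat)
	echo "$before" | while read -r line; do
		cpu=${line%% *}                        # e.g. "cpu0"
		later=$(echo "$after" | grep "^$cpu ")
		echo "$cpu $(idle_pct "$line" "$later")% idle"
	done
}
</code>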
  
**How do I get cake to consider IPv6 traffic in a 6in4 tunnel as separate flows?**

See [[:docs:guide-user:network:ipv6:ipv6_henet#in4_with_cake_sqm|6in4 with cake config]].
  
===== Troubleshooting SQM =====
Last modified: 2024/06/14 10:22 by moeller0