docs:guide-user:network:traffic-shaping:sqm-details (revision 2022/11/17 20:00 by moeller0)
====== SQM Details ======
If you want to set up SQM to minimize bufferbloat,

Smart Queue Management (SQM) is our name for an intelligent combination of better packet scheduling (flow queueing) techniques along with active queue length management (AQM).

OpenWrt/
Current versions of OpenWrt/
Getting this value exactly right is less important than getting it close, and over-estimating by a few bytes is generally better at keeping bufferbloat down than underestimating. With this in mind, to get started, set the Link Layer Adaptation options based on your connection to the Internet. The general rule for selecting the Link Layer Adaptation is:
  * Choose **ATM: select for e.g. ADSL1, ADSL2, ADSL2+** and set the Per-packet Overhead to 44 bytes if you use any kind of DSL/ADSL connection to the Internet (that is, if you get your internet service through a telephone line).
  * Choose **Ethernet with overhead: select for e.g. VDSL2** and set the Per-packet Overhead to 34 if you know you have a VDSL2 connection (this is sometimes called Fiber to the Cabinet, for example in the UK). VDSL connections operate at 20-100 Mbps over higher-quality copper lines. If you are sure that PPPoE is not in use, you can reduce this to 26.
  * If you have a cable modem (with a coaxial cable connector), you can try 22 bytes, or see the **Ethernet with Overhead** details below. If your contracted rate is greater than 760 Mbps, set overhead 42 (mpu 84), as the Ethernet link to the modem then affects the worst-case per-packet overhead.
  * Choose **Ethernet with overhead** if you have an actual Fiber to the Premises or metro-Ethernet connection and set the Per-Packet Overhead to 44 bytes. This can be reduced somewhat, for example if you know you are not using VLAN tags, but 44 will usually work well.
  * Choose **none (default)** if you have some reason to not include overhead. All the other parameters will be ignored.
If you are not sure what kind of link you have, first try using Ethernet with Overhead and set 44 bytes. Then run the Quick Test for Bufferbloat. If the results are good, you're done. If you get your internet through an old-style copper wired phone line and your speeds are less than a couple of megabits, you have ATM, so see the ATM entry above. If you have a slow connection (such as less than 2 Mbps in either direction) and/or you regularly use several VoIP calls at once while gaming etc. (so that more than 10 to 20% of your bandwidth is small packets), then it can be worth tuning the overhead more carefully; see below for extra details.

An important exception to the above rules is when the bandwidth limit is set by the ISP's traffic shaper, not by the equipment that talks to the physical line. Let's consider an example: the ISP sells a 15 Mbit/s package and enforces this limit, but lets the ADSL modem connect at whatever speed is appropriate for the line. And the modem "
**Link Layer Adaptation - the details…**
Various link-layer transmission methods affect the rate that data is transmitted/

  * **ATM:** It is especially important to set the Link Layer Adaptation on links that use ATM framing (almost all DSL/ADSL links do), because ATM adds five additional bytes of overhead to a 48-byte frame. Unless the SQM algorithm knows to account for the ATM framing bytes, short packets will appear to take longer to send than expected, and will be penalized. For true ATM links, one often can measure the real per-packet overhead empirically,
  * **Ethernet with Overhead:** SQM can also account for the overhead imposed by //VDSL2// links - add 22 bytes of overhead
  * **None:**

In addition to those overheads it is common to have VLAN tags (4 extra bytes) or PPPoE encapsulation (8 bytes), or even more exotic issues such as IPv4 provided over IPv6 in the DS-Lite scheme (where IPv4 packets experience a 40-byte IPv6 header overhead). Because of these variables, and the fact that overestimation is generally better, we offer the default suggested sizes in the first table.

The "
Getting the mpu right seems not overly important at first, since it only affects the accounting of the smallest packets and is only relevant when a link is saturated. But often, especially DOCSIS/
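To make the mpu accounting concrete: on the wire each packet effectively occupies max(mpu, packet-size + overhead) bytes. A minimal sketch of that rule (the ''wire_size'' name is mine, not part of sqm-scripts):

```shell
# wire_size: effective on-the-wire size of a packet under mpu-aware accounting
# $1 = packet size in bytes, $2 = per-packet overhead, $3 = mpu (minimum packet unit)
wire_size() {
  awk -v s="$1" -v o="$2" -v m="$3" 'BEGIN {
    w = s + o            # size plus per-packet overhead
    if (w < m) w = m     # but never less than the mpu
    print w
  }'
}

wire_size 40 22 84    # a tiny 40-byte packet still occupies 84 bytes -> prints 84
wire_size 1500 22 84  # a full-size packet is unaffected by the mpu -> prints 1522
```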
Please note that as of mid-2018, cake, and cake only, will try to interpret any given overhead to be applied on top of IP packets; all other qdiscs (and cake if configured with the "
Unless you are experimenting,
Now, the real challenge with the shaper gross rate and the per-packet-overhead is that they are not independent;
<code>
gross-rate * ((payload-size) / (payload-size + per-packet-overhead))
100 * ((1000) / (1000+100)) = 90.91
</code>
| + | |||
| + | now, any combination of gross-shaper rate and per-packet-overhead, | ||
| + | so in the extreme we can set the per-packet-overhead to 0 as long as we also set the shaper gross speed to 90.91: | ||
| + | < | ||
| + | 90.91 * (1000+0) / (1000) = 90.91 | ||
| + | 90.91 * ((1000) / (1000+0)) = 90.91 | ||
| + | </ | ||
| + | |||
Or the other way around: if we set the per-packet-overhead to an absurd 1000 bytes, we still will see the expected throughput if we also configure the shaper gross rate at 181.82:
| + | < | ||
| + | 90.91 * (1000+1000) / (1000) = 181.82 | ||
| + | 181.82 * ((1000) / (1000+ 1000)) = 90.91 | ||
| + | </ | ||
| + | |||
To sanity check whether a given combination of gross rate and per-packet-overhead is sane (say, when there is too little information about the true link properties available to make an educated guess), one needs to repeat speedtests at different packet sizes. The following stanza added to /
| + | < | ||
| + | # special rules to allow MSS clamping for in and outbound traffic | ||
| + | # use ip6tables -t mangle -S ; iptables -t mangle -S to check | ||
| + | forced_MSS=216 | ||
| + | | ||
| + | # affects both down- and upstream, egress seems to require at least 216 | ||
| + | iptables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment " | ||
| + | ip6tables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment " | ||
| + | </ | ||
| + | |||
| + | Now, if we plug this into the numbers from above we get (note, MSS is the TCP/IP payload size, which in the IPv4 case is 40 bytes smaller than the ethernet payload): | ||
| + | < | ||
| + | 100 * ((216+40) / (216+40+100)) = 68.3544303797 # as expected the throughput is smaller, since the fraction of overhead is simply larger | ||
| + | </ | ||
| + | |||
Now, if we underestimate the per-packet-overhead we get:
<code>
90.91 * ((216) / (216 + 0)) = 90.91
</code>
Since 90 >> 68, we will admit too much data into the link and will encounter bufferbloat.
And the reverse error:
<code>
181.82 * ((216) / (216 + 1000)) = 32.2969736842
</code>
Here we do not get bufferbloat (since 32 << 68), but we sacrifice way too much throughput.
So the proposal is to "
Please note one additional challenge here: testing a saturating load with small(er) packets will result in a considerably higher rate of packets the router needs to process (e.g. if you switch from MSS 1460 to MSS 146 you can expect ~10 times as many packets), and not all routers are capable of saturating a link with small packets. For this test it is therefore essential to confirm that the router does not run out of CPU cycles to process the data, and consequently that the measured throughput is close to the theoretically expected one.
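The ~10x figure follows from simple arithmetic: the packet rate at a given throughput is roughly rate / packet-size. A sketch (the ''pps'' helper name is mine):

```shell
# pps: approximate packets per second for a given rate and packet size
# $1 = rate in Mbit/s, $2 = packet size in bytes
pps() {
  awk -v r="$1" -v s="$2" 'BEGIN { printf "%d", (r * 1000000 / 8) / s }'
}

pps 100 1460  # prints 8561  (full-size packets at 100 Mbit/s)
pps 100 146   # prints 85616 (ten times the packet rate at one tenth the size)
```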
| + | |||
| + | |||
| + | Please note to compare throughput measured with on-line speedtests with the theoretical prediction the following approximate formula can be used: | ||
| + | < | ||
| + | gross-rate * ((IP-packet-size - IP-header-size - TCP-header-size) / (IP-packet-size + per-packet-overhead)) | ||
| + | e.g. for an ethernet link (effectively 38B overhead) with a VLAN tag (4B) and PPPoE (6+2=8B), IPv4 (without options: 20B), TCP (with rfc 1323 timestamps: 20+12=32B) | ||
| + | one can expect ~93% throughput | ||
| + | 100 * ((1500 - 8 - 20 - 20 - 12) / (1500 + 38 + 4)) = 93.39 | ||
| + | </ | ||
===== Selecting the optimal queue setup script =====
- Cake can use the information about true source and destination addresses to control traffic from/to internal and external hosts by true IP address, not per-stream.
Cake's original isolation mode was based on //flows//: each stream was isolated from all the others, and the link capacity was divided evenly between all active streams independent of IP addresses.
In that mode, Cake mostly does the right thing.
It would ensure that no single stream and no single host could hog all the capacity of the WAN link.
**To enable Per-Host Isolation** Add the following to the “Advanced option strings” (in the //
For queueing disciplines handling incoming packets from the internet (internet-**ingress**): ''
For queueing disciplines handling outgoing packets to the internet (internet-**egress**): ''
Please note the addition of the ingress keyword to the “Advanced option strings”
Conceptually, a traffic shaper will drop and/or delay packets such that the rate of packets leaving the shaper is smaller than or equal to the configured shaper-rate. This works well on egress, but for post-bottleneck shaping, as is typical for the internet ingress (the download direction), this is not ideal. For this kind of shaping we actually want to make sure that there is as little packet-backspill as possible into the upstream device's buffers (if those buffers were sized and managed properly we would not need to shape on ingress in the first place). And to avoid backspill we need to make sure that the combined rate of packets coming into the upstream device (rarely) exceeds the bottleneck-link'
With the '
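Under the hood this corresponds to a cake instance carrying the ''ingress'' keyword on the download shaper. A hand-rolled equivalent (interface names and rates are illustrative, not taken from this page) would be something like:

<code>
# download shaping happens on an IFB device fed from the WAN interface
tc qdisc replace dev ifb4eth1 root cake bandwidth 78mbit ingress nat dual-dsthost
# upload shaping directly on the WAN interface, no ingress keyword
tc qdisc replace dev eth1 root cake bandwidth 19950kbit nat dual-srchost
</code>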
**Notes:**

  * At some point in time, these advanced cake options may become better integrated into luci-app-sqm,
  * This discussion assumes SQM is instantiated on an interface that directly faces the internet/
===== FAQ =====

if you want the typical "shape my internet access"
**Measured goodput in speed tests with SQM is considerably lower than without**

Traffic shaping is relatively CPU intensive, though not necessarily as sustained load. To be able to keep buffering in the device driver low, SQM only releases small amounts of packets into the next layer (often the device driver). To keep throughput up, the qdisc then only has the small time window between handing the last packets to the driver and the point at which those packets will have been transmitted at the desired shaper rate, in which to hand more packets to the driver. If SQM does not get access to the CPU inside that time window, it will effectively not use some nominal transmit opportunities,

One sign of such under-throughput caused by CPU overload is that the CPU rarely falls idle. A quick and dirty test is to run ''top -d 1'' and watch the %idle column in one of the upper rows; if that gets too close to 0% (you need to generate a load, like a speedtest, and while the test runs observe the %idle column and try to get a feel for the minimum %idle that shows up), SQM is likely CPU bound. Since that test only shows aggregate usage over full-second intervals, while SQM operates on smaller time windows, an observed minimum %idle of 10% often already indicates CPU limitations for SQM. Please note that for multicore routers, reading %idle gets more complicated,
<code>
top - 11:29:29 up 12 days, 14:
Tasks: 158 total,
%Cpu0 : 2.0 us, 0.0 sy, 0.0 ni, 97.0 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu1 : 3.0 us, 1.0 sy, 0.0 ni, 93.1 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
</code>
This should allow one to eyeball whether a single core might be pegged. In this example, without load, both CPUs are idle more than 90% of the time: no sign of any overload ;)
Pressing F1 in htop shows the color legend for the CPU bars, and F2 Setup -> Display options -> Detailed CPU time (System/
Also, to make things even more complicated,
**How do I get cake to consider IPv6 traffic in a 6in4 tunnel as separate flows?**

See [[:
===== Troubleshooting SQM =====

Finally, copy/paste the entire session into your report.
===== MORE HINTS & TIPS & INFO =====
How I use CAKE to control bufferbloat and fair-share my Internet connection on OpenWrt.
CAKE has been my go-to solution for bufferbloat for years, not just for solving bufferbloat but also for fairer sharing of my link. Whilst CAKE has some sensible defaults, there are a few extra options & tweaks that can improve things further.
This note assumes that you have an Internet-facing interface, usually eth0, and will call traffic leaving that interface TO the ISP egress traffic.
Controlling egress bufferbloat to the ISP's modem is relatively straightforward.
For my ISP (Sky UK) things are straightforward.
<code>
egress: 19950 bridged-ptm ether-vlan mpu 72
ingress: 78000 bridged-ptm ether-vlan mpu 72 ingress
</code>
We'll come back to the '
CAKE also has a means of fair sharing bandwidth amongst hosts.
By default CAKE does '
There is a small fly in this ointment in the form of IPv4 Network Address Translation (NAT), where typically the ISP subscriber is given one Internet-facing IPv4 address behind which all the internal LAN traffic (usually in 192.168.x.x) is masqueraded.
<code>
egress: 19950 bridged-ptm ether-vlan nat mpu 72
ingress: 78000 bridged-ptm ether-vlan nat ingress
</code>
In fact what I do is force cake to only worry about internal fairness.
<code>
egress: 19950 dual-srchost bridged-ptm ether-vlan nat mpu 72
ingress: 78000 dual-dsthost bridged-ptm ether-vlan nat ingress
</code>
Having dealt with host fairness, we now need to deal with flow fairness & control.
Simplistically,
CAKE looks at each data flow in turn and either releases or drops a packet from each flow to match the shapers'
ingress mode modifies how CAKE's shaper accounts for dropped packets; in essence they still count toward the bandwidth-used calculation even though they'
Traffic classification.
At this point we have a reasonably fair system.
I use diffserv4 over diffserv3 because of the '
<code>
egress: 19950 diffserv4 dual-srchost bridged-ptm ether-vlan nat mpu 72
ingress: 78000 diffserv4 dual-dsthost bridged-ptm ether-vlan nat ingress
</code>
The cherry on top.
This is only really relevant for egress traffic and for asymmetric links.
<code>
egress: 19950 diffserv4 dual-srchost bridged-ptm ether-vlan nat mpu 72 ack-filter
ingress: 78000 diffserv4 dual-dsthost bridged-ptm ether-vlan nat ingress
</code>