docs:guide-user:network:traffic-shaping:sqm-details (revision 2022/11/17 20:00, moeller0)
Smart Queue Management (SQM) is our name for an intelligent combination of better packet scheduling (flow queueing) techniques along with active queue length management (AQM).
OpenWrt/
Current versions of OpenWrt/
  * Choose **ATM: select for e.g. ADSL1, ADSL2, ADSL2+** and set the Per-packet Overhead to 44 bytes if you use any kind of DSL/ADSL connection to the Internet other than a modern VDSL high speed connection (20+ Mbps). In other words, if you have your internet service through a copper telephone line at around 1 or 2 Mbps.
  * Choose **Ethernet with overhead: select for e.g. VDSL2** and set the Per-packet Overhead to 34 if you know you have a VDSL2 connection (this is sometimes called Fiber to the Cabinet, for example in the UK). VDSL connections operate at 20-100 Mbps over higher quality copper lines. If you are sure that PPPoE is not in use, you can reduce this to 26.
  * If you have a cable modem, with a coaxial cable connector, you can try 22 bytes, or see the **Ethernet with Overhead** details below. If your contracted rate is greater than 760 Mbps, set the overhead to 42 (mpu 84), as the ethernet link to the modem now affects the worst-case per-packet-overhead.
  * Choose **Ethernet with overhead** if you have an actual Fiber to the Premises or metro-Ethernet connection and set the Per-Packet Overhead to 44 bytes. This can be reduced somewhat, for example if you know you are not using VLAN tags, but will usually work well.
  * Choose **none (default)** if you have some reason to not include overhead. All the other parameters will be ignored.
Various link-layer transmission methods affect the rate that data is transmitted/
  * **ATM:** It is especially important to set the Link Layer Adaptation on links that use ATM framing (almost all DSL/ADSL links do), because ATM adds five additional bytes of overhead to a 48-byte frame. Unless the SQM algorithm knows to account for the ATM framing bytes, short packets will appear to take longer to send than expected, and will be penalized. For true ATM links, one often can measure the real per-packet overhead empirically,
  * **Ethernet with Overhead:** SQM can also account for the overhead imposed by //VDSL2// links - add 22 bytes of overhead. Cable Modems (DOCSIS) set both up- and downstream overhead to 18 bytes (6 bytes source MAC, 6 bytes destination MAC, 2 bytes ether-type, 4 bytes FCS).
  * **None:**
The "
Getting the mpu right seems not overly important at first, since it only affects the accounting of the smallest packets and will only be relevant if a link is saturated. But often, especially DOCSIS/
Please note that as of middle 2018 cake, and cake only, will try to interpret any given overhead to be applied on top of IP packets; all other qdiscs (and cake if configured with the "
Unless you are experimenting,

Now, the real challenge with the shaper gross rate and the per-packet-overhead is that they are not independent;
<code>
gross-rate * ((payload-size) / (payload-size + per-packet-overhead))
100 * ((1000) / (1000+100)) = 90.91
</code>
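Expressed as code, the relationship reads as follows (a small illustrative sketch; the helper name and units are ours, not SQM's, with rates in arbitrary units and sizes in bytes):

```python
def effective_rate(gross_rate, payload, overhead):
    """Payload throughput left over from the shaper's gross rate,
    given a fixed per-packet overhead (payload and overhead in bytes)."""
    return gross_rate * payload / (payload + overhead)

# the example above: gross rate 100, 1000 byte payload, 100 bytes overhead
print(round(effective_rate(100, 1000, 100), 2))  # 90.91
```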

Now, any combination of gross-shaper rate and per-packet-overhead,
so in the extreme we can set the per-packet-overhead to 0 as long as we also set the shaper gross speed to 90.91:
<code>
90.91 * ((1000+0) / (1000)) = 90.91
90.91 * ((1000) / (1000+0)) = 90.91
</code>

Or the other way around: if we set the per-packet-overhead to an absurd 1000 bytes, we will still see the expected throughput if we also configure the shaper gross rate at 181.82:
<code>
90.91 * ((1000+1000) / (1000)) = 181.82
181.82 * ((1000) / (1000+1000)) = 90.91
</code>
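That all three settings form one equivalence class at the probed packet size can be checked numerically (illustrative sketch; the helper name is ours):

```python
def effective_rate(gross_rate, payload, overhead):
    # net throughput after accounting a fixed per-packet overhead
    return gross_rate * payload / (payload + overhead)

# three (gross_rate, overhead) combinations tuned against a 1000 byte payload
for rate, overhead in [(100, 100), (90.91, 0), (181.82, 1000)]:
    print(round(effective_rate(rate, 1000, overhead), 2))  # 90.91 each time
```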

To sanity check whether a given combination of gross rate and per-packet-overhead seems sane (say, there is too little information about the true link properties available to make an educated guess), one needs to repeat speedtests at different packet sizes. The following stanza added to /

<code>
# special rules to allow MSS clamping for in and outbound traffic
# use ip6tables -t mangle -S ; iptables -t mangle -S to check
forced_MSS=216

# affects both down- and upstream, egress seems to require at least 216
iptables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment "custom MSS clamping" -j TCPMSS --set-mss ${forced_MSS}
ip6tables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment "custom MSS clamping" -j TCPMSS --set-mss ${forced_MSS}
</code>

Now, if we plug this into the numbers from above we get (note: MSS is the TCP payload size, which in the IPv4 case is 40 bytes smaller than the ethernet payload):
<code>
100 * ((216) / (216+100)) = 68.3544303797 # as expected the throughput is smaller, since the fraction of overhead is simply larger
</code>

Now, if we underestimated the per-packet-overhead we get:
<code>
90.91 * ((216) / (216+0)) = 90.91
</code>
Since 90 >> 68 we will admit too much data into the link and will encounter bufferbloat.

And the reverse error:
<code>
181.82 * ((216) / (216+1000)) = 32.2969736842
</code>
Here we do not get bufferbloat (since 32 << 68), but we sacrifice way too much throughput.
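Repeating the same calculation at the small packet size shows how the three settings, indistinguishable at 1000 byte payloads, fall apart (illustrative sketch; the helper name is ours):

```python
def effective_rate(gross_rate, payload, overhead):
    # net throughput after accounting a fixed per-packet overhead
    return gross_rate * payload / (payload + overhead)

# (gross_rate, overhead): correct, underestimated, overestimated
for rate, overhead in [(100, 100), (90.91, 0), (181.82, 1000)]:
    print(round(effective_rate(rate, 216, overhead), 2))
# 68.35 -> the true capacity at this packet size
# 90.91 -> too fast: bufferbloat
# 32.3  -> too slow: wasted throughput
```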

So the proposal is to "

Please note one additional challenge here: testing a saturating load with small(er) packets will result in a considerably higher rate of packets the router needs to process (e.g. if you switch from MSS 1460 to MSS 146 you can expect ~10 times as many packets), and not all routers are capable of saturating a link with small packets. For this test it is therefore essential to confirm that the router does not run out of CPU cycles to process the data, and as a consequence that the measured throughput is close to the theoretically expected one.
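The packet-rate penalty of a smaller MSS can be estimated with simple arithmetic (illustrative sketch; ignores ACK traffic and header bytes, and the helper name is ours):

```python
def packets_per_second(goodput_mbit, mss_bytes):
    # saturating payload rate in bytes/s divided by payload bytes per packet
    return goodput_mbit * 1_000_000 / 8 / mss_bytes

print(round(packets_per_second(90, 1460)))  # ~7705 packets/s
print(round(packets_per_second(90, 146)))   # ~77055 packets/s, ten times as many
```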

Please note that to compare throughput measured with online speedtests against the theoretical prediction, the following approximate formula can be used:
<code>
gross-rate * ((IP-packet-size - IP-header-size - TCP-header-size) / (IP-packet-size + per-packet-overhead))

e.g. for an ethernet link (effectively 38B overhead) with a VLAN tag (4B) and PPPoE (6+2=8B), IPv4 (without options: 20B), TCP (with rfc 1323 timestamps: 20+12=32B)
one can expect ~93% throughput:
100 * ((1500 - 8 - 20 - 20 - 12) / (1500 + 38 + 4)) = 93.39
</code>
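The final formula can be turned into a tiny calculator (illustrative sketch; parameter names are ours, all sizes in bytes):

```python
def expected_goodput(gross_rate, mtu, pppoe, ip_hdr, tcp_hdr, link_overhead):
    """Approximate TCP goodput from the gross shaper rate.
    The pppoe bytes live inside the MTU; link_overhead is added on the wire."""
    payload = mtu - pppoe - ip_hdr - tcp_hdr
    return gross_rate * payload / (mtu + link_overhead)

# ethernet 38B + VLAN 4B on the wire; PPPoE 8B, IPv4 20B, TCP + timestamps 32B
print(round(expected_goodput(100, 1500, 8, 20, 32, 38 + 4), 2))  # 93.39
```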
===== Selecting the optimal queue setup script =====
  - Cake can use the information about true source and destination addresses to control traffic from/to internal and external hosts by true IP address, not per-stream.
Cake's original isolation mode was based on //flows//: each stream was isolated from all the others, and the link capacity was divided evenly between all active streams independent of IP addresses. **More recently Cake switched** to ''
In that mode, Cake mostly does the right thing.
It would ensure that no single stream and no single host could hog all the capacity of the WAN link.
**To enable Per-Host Isolation**, add the following to the “Advanced option strings” (in the //
For queueing disciplines handling incoming packets from the internet (internet-**ingress**):
For queueing disciplines handling outgoing packets to the internet (internet-**egress**):
Conceptually, a traffic shaper will drop and/or delay packets so that the rate of packets leaving the shaper is smaller than or equal to the configured shaper-rate. This works well on egress, but for post-bottleneck shaping, as is typical for the internet ingress (the download direction), this is not ideal. For this kind of shaping we actually want to make sure that there is as little packet-backspill as possible into the upstream device's buffers (if those buffers were sized and managed properly we would not need to shape on ingress in the first place). And to avoid backspill we need to make sure that the combined rate of packets coming into the upstream device (rarely) exceeds the bottleneck-link'
With the '
**Notes:**
One sign of such under-throughput by CPU overload is that the CPU rarely falls idle. A quick and dirty test is to run `top -d 1` and watch the %idle column in one of the upper rows; if that gets too close to 0% (you need to generate a load, like a speedtest, and while this test runs observe the %idle column and try to get a feel for the minimum %idle that shows up), sqm is likely CPU bound. Since that test will only show aggregate usage over full-second intervals, but SQM operates on smaller time windows, often an observed minimum %idle of 10% already indicates CPU limitations for sqm. Please note that for multicore routers reading %idle gets more complicated,
top - 11:29:29 up 12 days, 14:
This should allow one to eyeball whether a single core might be pegged. In this example without load, both CPUs idle > 90% of the time, no sign of any overload ;)
Pressing F1 in htop shows the color legend for the CPU bars, and F2 Setup -> Display options -> Detailed CPU time (System/
Also to make things even more complicated,
**How do I get cake to consider IPv6 traffic in a 6in4 tunnel as separate flows?**
See [[:
===== Troubleshooting SQM =====