This guide explains how to manually load balance OpenWrt by pinning the IRQs of specific ethernet ports to chosen CPU cores and assigning one or more cores to the networking queues.
From : https://www.kernel.org/doc/html/latest/core-api/irq/irq-affinity.html
/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify which target CPUs are permitted for a given IRQ source. It’s a bitmask (smp_affinity) or CPU list (smp_affinity_list) of allowed CPUs. It’s not allowed to turn off all CPUs, and if an IRQ controller does not support IRQ affinity then the value will not change from the default of all CPUs.
/proc/irq/default_smp_affinity specifies default affinity mask that applies to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask will be set to the default mask. It can then be changed as described above. Default mask is 0xffffffff.
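To see the default mask currently in use on your router:
cat /proc/irq/default_smp_affinity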
Setting an IRQ to a specific CPU or group of CPUs requires a bit mask: in binary we enable a bit for each CPU we want, then convert to hex to get the bitmask setting. This way we can restrict IRQs to specific CPUs to aid load balancing, or, on heterogeneous SoCs, choose between the high- and low-power cores. (A small shell helper for the conversion is sketched after the table below.)
Bitmasks for CPUs
| Binary | Hex | CPU |
|---|---|---|
| 00000001 | 1 | 0 |
| 00000010 | 2 | 1 |
| 00000011 | 3 | 0,1 |
| 00000100 | 4 | 2 |
| 00000101 | 5 | 0,2 |
| 00000110 | 6 | 1,2 |
| 00000111 | 7 | 0,1,2 |
| 00001000 | 8 | 3 |
| 00001001 | 9 | 0,3 |
| 00001010 | A | 1,3 |
| 00001011 | B | 0,1,3 |
| 00001100 | C | 2,3 |
| 00001101 | D | 0,2,3 |
| 00001110 | E | 1,2,3 |
| 00001111 | F | 0,1,2,3 |
| ... | ... | ... |
| 00110000 | 30 | 4,5 |
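If you prefer not to work the mask out by hand, a small shell helper can do the conversion. This is a minimal sketch for a BusyBox/POSIX shell; the function name cpus_to_mask is purely illustrative:

# Build a hex affinity mask from a list of CPU numbers
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}
# Example: cpus_to_mask 4 5 prints 30, matching the last row of the table above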
OpenWrt routers ship with defaults already set for multi-CPU usage.
The following scripts are responsible for setting these up.
cat /etc/hotplug.d/net/20-smp-packet-steering
cat /etc/hotplug.d/net/40-net-smp-affinity
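To see which CPUs each IRQ is currently allowed on (for example after these scripts have run), you can loop over /proc/irq:

for irq in /proc/irq/[0-9]*; do
    # Print the IRQ number and its current affinity bitmask
    printf '%s: %s\n' "${irq##*/}" "$(cat "$irq/smp_affinity")"
done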
A more automated solution is to use irqbalance to help spread the load.
Irqbalance is a Linux daemon that distributes interrupts across multiple logical CPUs, with the aim of improving overall performance through a more even load and better power consumption.
However, it does not always produce a predictable load distribution, so manual tuning can give better control over where the load ends up.
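If you do want to try irqbalance first, it is available as an OpenWrt package. As a rough sketch (the service ships disabled, so it has to be enabled in /etc/config/irqbalance):

opkg update && opkg install irqbalance
# Set option enabled '1' in /etc/config/irqbalance, then:
/etc/init.d/irqbalance enable
/etc/init.d/irqbalance start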
First we need to find and identify the interrupts.
To view the current interrupt counters and their assignments (and to monitor any changes):
cat /proc/interrupts
The output below is from a NanoPi R4S, which has four A53 cores (CPUs 0-3) and two A72 cores (CPUs 4 and 5).
root@OpenWrt:~# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
23: 27142318 12185540 5391618 2352924 137831569 145154023 GICv3 30 Level arch_timer
25: 67873664 61308794 11619382 2662637 16876546 43550490 GICv3 113 Level rk_timer
26: 0 0 0 0 0 0 GICv3-23 0 Level arm-pmu
27: 0 0 0 0 0 0 GICv3-23 1 Level arm-pmu
28: 0 0 0 0 0 0 GICv3 37 Level ff6d0000.dma-controller
29: 0 0 0 0 0 0 GICv3 38 Level ff6d0000.dma-controller
30: 0 0 0 0 0 0 GICv3 39 Level ff6e0000.dma-controller
31: 0 0 0 0 0 0 GICv3 40 Level ff6e0000.dma-controller
32: 1 0 0 0 0 0 GICv3 81 Level pcie-sys
34: 0 0 0 0 0 0 GICv3 83 Level pcie-client
35: 0 0 0 0 165575364 0 GICv3 44 Level eth0
36: 20438175 0 0 0 0 0 GICv3 97 Level dw-mci
37: 0 0 0 0 0 0 GICv3 58 Level ehci_hcd:usb1
38: 0 0 0 0 0 0 GICv3 60 Level ohci_hcd:usb3
39: 0 0 0 0 0 0 GICv3 62 Level ehci_hcd:usb2
40: 0 0 0 0 0 0 GICv3 64 Level ohci_hcd:usb4
42: 0 0 0 0 0 0 GICv3 91 Level ff110000.i2c
43: 6 0 0 0 0 0 GICv3 67 Level ff120000.i2c
44: 0 0 0 0 0 0 GICv3 68 Level ff160000.i2c
45: 6 0 0 0 0 0 GICv3 132 Level ttyS2
46: 0 0 0 0 0 0 GICv3 129 Level rockchip_thermal
47: 6393498 0 0 0 0 0 GICv3 89 Level ff3c0000.i2c
50: 0 0 0 0 0 0 GICv3 147 Level ff650800.iommu
52: 0 0 0 0 0 0 GICv3 149 Level ff660480.iommu
56: 0 0 0 0 0 0 GICv3 151 Level ff8f3f00.iommu
57: 0 0 0 0 0 0 GICv3 150 Level ff903f00.iommu
58: 0 0 0 0 0 0 GICv3 75 Level ff914000.iommu
59: 0 0 0 0 0 0 GICv3 76 Level ff924000.iommu
69: 0 0 0 0 0 0 GICv3 59 Level rockchip_usb2phy
70: 0 0 0 0 0 0 GICv3 137 Level xhci-hcd:usb5
71: 0 0 0 0 0 0 GICv3 142 Level xhci-hcd:usb7
72: 0 0 0 0 0 0 rockchip_gpio_irq 21 Level rk808
78: 0 0 0 0 0 0 rk808 5 Edge RTC alarm
82: 0 0 0 0 0 0 rockchip_gpio_irq 7 Edge fe320000.mmc cd
84: 0 0 0 0 0 0 ITS-MSI 0 Edge PCIe PME, aerdrv
85: 10 0 0 0 0 0 rockchip_gpio_irq 10 Level stmmac-0:01
86: 0 0 0 0 0 0 rockchip_gpio_irq 22 Edge gpio-keys
87: 0 0 0 0 0 1156859750 ITS-MSI 524288 Edge eth1
IPI0: 7085496 10371429 7027071 6124604 310818 114897 Rescheduling interrupts
IPI1: 2817025 2457651 882759 515246 2752519 543745 Function call interrupts
IPI2: 0 0 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 5558568 4633615 2762056 1122565 763629 3435183 Timer broadcast interrupts
IPI5: 413711 300799 161541 117511 109020 76881 IRQ work interrupts
IPI6: 0 0 0 0 0 0 CPU wake-up interrupts
Err: 0
To find the IRQs for your ethernet ports :
grep eth /proc/interrupts
root@OpenWrt:~# grep eth /proc/interrupts
35: 0 0 0 0 165661665 0 GICv3 44 Level eth0
87: 0 0 0 0 0 1157284700 ITS-MSI 524288 Edge eth1
So here eth0 is IRQ 35 and eth1 is 87.
A given hardware interrupt is only serviced by one core at a time, so assign each ethernet IRQ to its own core.
Set eth0 interrupt to core 0
echo 1 > /proc/irq/35/smp_affinity
Set eth1 interrupt to core 1
echo 2 > /proc/irq/87/smp_affinity
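To confirm a change took effect, read the mask back (smp_affinity_list shows the allowed CPUs as a list rather than a bitmask):

cat /proc/irq/35/smp_affinity
cat /proc/irq/35/smp_affinity_list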
You could get the IRQ with a command instead of hard coding it:
#eth0 IRQ
echo f > /proc/irq/$(grep eth0 /proc/interrupts | awk -F ':' '{print $1}' | xargs)/smp_affinity
#eth1 IRQ
echo f > /proc/irq/$(grep eth1 /proc/interrupts | awk -F ':' '{print $1}' | xargs)/smp_affinity
The grep command prints the line from /proc/interrupts, awk prints the first column which is the IRQ number, and xargs trims the whitespace.
For kernel 5.15 you need to use:
echo -n #HEX# > /proc/irq/#IRQ-NUMBER#/smp_affinity
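For example, to pin eth0 (IRQ 35 on this board) to core 0 on a 5.15 kernel:

echo -n 1 > /proc/irq/35/smp_affinity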
If you restart Smart Queue Management or change SQM settings, the CPU affinity will be reset and you will need to re-apply your settings.
Reference: Receive Packet Steering (RPS)
Receive Packet Steering (RPS) is similar to RSS in that it is used to direct packets to specific CPUs for processing. However, RPS is implemented at the software level, and helps to prevent the hardware queue of a single network interface card from becoming a bottleneck in network traffic.
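To check the current RPS mask of a receive queue:

cat /sys/class/net/eth0/queues/rx-0/rps_cpus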
Network Queues can be spread across all CPUs if required or fixed to just one.
Set eth0 queue to core 2 (mask 4)
echo 4 > /sys/class/net/eth0/queues/rx-0/rps_cpus
Set eth1 queue to core 3 (mask 8)
echo 8 > /sys/class/net/eth1/queues/rx-0/rps_cpus
Set eth0 and eth1 to use all 6 cores
echo 3f > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 3f > /sys/class/net/eth1/queues/rx-0/rps_cpus
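As an alternative on this board, the two faster A72 cores are CPUs 4 and 5, so a mask of 30 (see the table above) steers queue processing onto just the big cores and leaves the A53 cores for the IRQs:

echo 30 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 30 > /sys/class/net/eth1/queues/rx-0/rps_cpus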
To make the changes persistent, either edit
/etc/hotplug.d/net/40-net-smp-affinity
or create your own script with your new values and place it in
/etc/hotplug.d/net/
so that it runs after the default script.
eg:
/etc/hotplug.d/net/50-mysettings-for-net-smp-affinity
#eth0 on core 0
echo 1 > /proc/irq/35/smp_affinity
#eth1 on core 1
echo 2 > /proc/irq/87/smp_affinity
#queues on all cores
echo 3f > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 3f > /sys/class/net/eth1/queues/rx-0/rps_cpus
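A slightly more defensive variant of the same file is sketched below. It reuses the IRQ lookup from earlier, assumes the hotplug ACTION variable is set as it is for the stock scripts in this directory, and the helper name set_irq_affinity is purely illustrative:

# Only act when an interface is added
[ "$ACTION" = add ] || exit 0

# Illustrative helper: look up a device's IRQ and write its affinity mask
set_irq_affinity() {
    irq=$(grep -w "$1" /proc/interrupts | awk -F ':' '{print $1}' | xargs)
    [ -n "$irq" ] && echo "$2" > "/proc/irq/$irq/smp_affinity"
}

set_irq_affinity eth0 1   #eth0 on core 0
set_irq_affinity eth1 2   #eth1 on core 1

#queues on all cores
echo 3f > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 3f > /sys/class/net/eth1/queues/rx-0/rps_cpus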
Now reboot and check to ensure the settings have taken.
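After the reboot, the settings can be checked with the same files used above (IRQ numbers 35 and 87 on this board):

cat /proc/irq/35/smp_affinity
cat /proc/irq/87/smp_affinity
cat /sys/class/net/eth0/queues/rx-0/rps_cpus
cat /sys/class/net/eth1/queues/rx-0/rps_cpus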