This page is not fully translated, yet. Please help completing the translation.
(remove this paragraph once the translation is finished)
A Cryptographic Hardware Accelerator can be
The purpose is to load off the very computing intensive tasks of encryption/decryption and compression/decompression.
As can be seen in this AES instruction set article, the acceleration is usually achieved by doing certain arithmetic calculation in hardware.
When the acceleration is not in the instruction set of the CPU, it is supported via a kernel driver (/dev/crypto or AF_ALG socket). There are two drivers offering /dev/crypto in OpenWRT:
Both ways result to a /dev/crypto device which can be used by userspace crypto applications (e.g., the ones that utilize openssl or gnutls).
Depending on which arithmetic calculations exactly are being done in the specific hardware, the results differ widely. You should not concern yourself with theoretical bla,bla but find out how a certain implementation performs in the task you want to do with it! You could want to
sshfs. You would set up a sshfs.server on your device and a sshfs.client on the other end. Now how fast can you read/write to this with and without Cryptographic Hardware Accelerators. If the other end, the client, is a “fully grown PC” with a 2GHz CPU, it will probably perform fast enough to use the entire bandwidth of your Internet connection. If the server side is some embedded device, with let's say some 400MHz MIPS CPU, it could benefit highly from some integrated (and supported!) acceleration. You probably want enough performance, that you can use your entire bandwidth. Well, now go and find some benchmark showing you precisely the difference with enabled/disabled acceleration. Because you will not be able to extrapolate this information from specifications you find on this page or on the web.
make menuconfig and select
This must not be combined with cryptodev-linux.
Kernel modules → Cryptographic API modules
Libraries → SSL
Note that there are some known issues with openssl's /dev/crypto support.
Both AF_ALG and /dev/crypto interfaces allow userspace access to any crypto driver offering symmetric-key ciphers, and digest algorithms. That means hardware acceleration, but also software-only drivers. The use of software drivers is almost always slower than implementing it in userspace, as the context switches slow things down considerably. To see all of the available crypto drivers, take a look at /proc/crypto.
$ cat /proc/crypto name : cbc(aes) driver : cbc(aes-aesni) module : kernel priority : 300 refcnt : 1 selftest : passed internal : no type : skcipher async : no blocksize : 16 min keysize : 16 max keysize : 32 ivsize : 16 chunksize : 16 walksize : 16 name : cbc(aes) driver : cbc-aes-aesni module : kernel priority : 400 refcnt : 1 selftest : passed internal : no type : skcipher async : yes blocksize : 16 min keysize : 16 max keysize : 32 ivsize : 16 chunksize : 16 walksize : 16 name : sha256 driver : sha256-generic module : kernel priority : 0 refcnt : 1 selftest : passed internal : no type : shash blocksize : 64 digestsize : 32 name : sha256 driver : sha256-avx module : kernel priority : 160 refcnt : 7 selftest : passed internal : no type : shash blocksize : 64 digestsize : 32 name : sha256 driver : sha256-ssse3 module : kernel priority : 150 refcnt : 1 selftest : passed internal : no type : shash blocksize : 64 digestsize : 32
Notice in this case, that are two drivers offering
cbc-aes-aesni, and three offering
sha256-ssse3. The kernel will export the one with the highest priority for each algorithm. In this case, it would be: cbc-aes-aesni, and sha256-avx. The first step in finding out if you're using hardware acceleration is ensuring that the driver is listed there, and that it's the highest priority among them. For openssl support, the
types supported are
For IPsec, which is done by the Kernel, this will tell you if you are able to use the crypto accelerator. Just make sure you're using the same algorithm made available by your crypto driver. For other uses, openssl needs to be checked.
Openssl supports hardware crypto acceleration through an engine. You may find out what engines are available, along with the enabled algorithms, and configuration commands by running
openssl engine -t -c:
(devcrypto) /dev/crypto engine [DES-CBC, DES-EDE3-CBC, BF-CBC, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CTR, AES-192-CTR, AES-256-CTR, AES-128-ECB, AES-192-ECB, AES-256-ECB, CAMELLIA-128-CBC, CAMELLIA-192-CBC, CAMELLIA-256-CBC,MD5, SHA1, RIPEMD160, SHA224, SHA256, SHA384, SHA512] [ available ] (rdrand) Intel RDRAND engine [RAND] [ available ] (dynamic) Dynamic engine loading support [ unavailable ]
For openssl-1.0.2 and earlier, the engine was called
cryptodev. It was renamed to
devcrypto in openssl 1.1.0. In this example, engine 'devcrypto' is available, showing the list of algorithms available.
Starting with 19.x branch, the /dev/crypto support can be packaged separately from the main openssl library, as the
libopenssl-devcrypto package. The engine is not enabled by default, even when the package is installed. It requires editing the
/etc/ssl/openssl.cnf, as follows.
When using the standalone package, this becomes mandatory. If the engine is built into libcrypto, it is only optional.
To configure the engine, you must first add a line to the default section (i.e. the first, unnamed section), adding the following line, which tells which section is used to configure the library. The sections are lines within brackets :
# this points to the main library configuration section openssl_conf = openssl_def
This will point the main openssl configuration to be done in a section called
Then, that section needs to be created, anywhere past the last line in the unnamed section. I'd just add it to the very end of the file. It will add an engine configuration section, where you can add a section for every engine you're configuring, and finally, a section to configure the engine itself. In this example, we are configuring just the /dev/crypto engine:
[openssl_def] # this is the main library configuration section engines=engine_section [engine_section] # this is the engine configuration section, where the engines are listed devcrypto=devcrypto_section [devcrypto_section] # this is the section where the devcrypto engine commands are used CIPHERS=ALL DIGESTS=NONE
You can use the
-vv option of the
openssl engine command to view the available configuration commands, along with a description of each command. Notice the line disabling all digests. Digests are computed fairly fast in software, and to use hardware crypto requires a context switch, which is an “expensive” operation. So in order to be worth using an algorithm, it needs to be much faster than software to offset the cost of the context switch. Efficiency depends on the length of each operation. For ciphers, it is faster to use hardware than software in chunks of 1000 bytes or more, depending on your hardware. For digests, the size has to be 10x greater. Considering an MTU of 1500 bytes, which is the de facto limit on TLS encryption block limit, it will not be worth to enable digests. You can use a different configuration file for every application, by setting the environment variable
OPENSSL_CONF to the full path of the configuration file to be used.
There's a command for the devcrypto engine, not to be used in
openssl.cnf, that will show some useful information about the algorithms available. It shows a list of engine-supported algorithms, if it can be used (a session can be opened) with /dev/crypto or not, along with the corresponding kernel driver, and if it hw-accelerated or no. To use it, run:
# openssl engine -pre DUMP_INFO devcrypto (devcrypto) /dev/crypto engine Information about ciphers supported by the /dev/crypto engine: Cipher DES-CBC, NID=31, /dev/crypto info: id=1, driver=mv-cbc-des (hw accelerated) Cipher DES-EDE3-CBC, NID=44, /dev/crypto info: id=2, driver=mv-cbc-des3-ede (hw accelerated) Cipher BF-CBC, NID=91, /dev/crypto info: id=3, CIOCGSESSION (session open call) failed Cipher CAST5-CBC, NID=108, /dev/crypto info: id=4, CIOCGSESSION (session open call) failed Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, driver=mv-cbc-aes (hw accelerated) Cipher AES-192-CBC, NID=423, /dev/crypto info: id=11, driver=mv-cbc-aes (hw accelerated) Cipher AES-256-CBC, NID=427, /dev/crypto info: id=11, driver=mv-cbc-aes (hw accelerated) Cipher RC4, NID=5, /dev/crypto info: id=12, CIOCGSESSION (session open call) failed Cipher AES-128-CTR, NID=904, /dev/crypto info: id=21, driver=ctr-aes-neonbs (software) Cipher AES-192-CTR, NID=905, /dev/crypto info: id=21, driver=ctr-aes-neonbs (software) Cipher AES-256-CTR, NID=906, /dev/crypto info: id=21, driver=ctr-aes-neonbs (software) Cipher AES-128-ECB, NID=418, /dev/crypto info: id=23, driver=mv-ecb-aes (hw accelerated) Cipher AES-192-ECB, NID=422, /dev/crypto info: id=23, driver=mv-ecb-aes (hw accelerated) Cipher AES-256-ECB, NID=426, /dev/crypto info: id=23, driver=mv-ecb-aes (hw accelerated) Information about digests supported by the /dev/crypto engine: Digest MD5, NID=4, /dev/crypto info: id=13, driver=mv-md5 (hw accelerated), CIOCCPHASH capable Digest SHA1, NID=64, /dev/crypto info: id=14, driver=mv-sha1 (hw accelerated), CIOCCPHASH capable Digest RIPEMD160, NID=117, /dev/crypto info: id=102, driver=unknown. CIOCGSESSION (session open) failed Digest SHA224, NID=675, /dev/crypto info: id=103, driver=sha224-neon (software), CIOCCPHASH capable Digest SHA256, NID=672, /dev/crypto info: id=104, driver=mv-sha256 (hw accelerated), CIOCCPHASH capable Digest SHA384, NID=673, /dev/crypto info: id=105, driver=sha384-neon (software), CIOCCPHASH capable Digest SHA512, NID=674, /dev/crypto info: id=106, driver=sha512-neon (software), CIOCCPHASH capable
As stated above, the best way to determine the speed is benchmarking the actual application you're using.
If that's not feasible,
openssl speed can be used to compare the algorithm speed with and without the engine. To measure the speed without the engine, set
/etc/ssl/openssl.cnf. You must use the
-elapsed option to get a reasonable calculation. That's because the speed command will use the CPU user time by default. When using the engine, most all of the processing will be done in kernel time, and the user time will be close to zero, yielding an exaggerated result. This is the measurement of the AES-256-CTR algorithm, implemented 100% in software (you must configure
openssl.cnf to be able to use software drivers with devcrypto).
# time openssl speed -evp aes-256-ctr Doing aes-256-ctr for 3s on 16 size blocks: 1506501 aes-256-ctr's in 0.32s Doing aes-256-ctr for 3s on 64 size blocks: 830921 aes-256-ctr's in 0.18s Doing aes-256-ctr for 3s on 256 size blocks: 526267 aes-256-ctr's in 0.15s Doing aes-256-ctr for 3s on 1024 size blocks: 167828 aes-256-ctr's in 0.07s Doing aes-256-ctr for 3s on 8192 size blocks: 22723 aes-256-ctr's in 0.00s Doing aes-256-ctr for 3s on 16384 size blocks: 11400 aes-256-ctr's in 0.00s OpenSSL 1.1.1b 26 Feb 2019 built on: Wed Dec 13 18:43:03 2017 UTC options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) compiler: arm-openwrt-linux-muslgnueabi-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -pipe -mcpu=cortex-a9 -mfpu=vfpv3-d16 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_PREFER_CHACHA_OVER_GCM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes aes-256-ctr 75325.05k 295438.58k 898162.35k 2455083.89k infk infk real 0m 18.04s user 0m 0.72s sys 0m 17.27s
Notice the infinite speeds. If you spend 0 seconds in CPU user time, and use that as a divisor, you'll get infinite. The speed command, with the addtion of the
-elapsed parameter will return a more realistic result:
# time openssl speed -evp aes-256-ctr -elapsed type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes aes-256-ctr 7975.70k 17403.54k 44777.30k 57178.79k 62076.25k 62395.73k real 0m 18.04s user 0m 0.88s sys 0m 17.11s
This is the result of the AES-256-CTR without the engine:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes aes-256-ctr 39684.36k 47027.86k 53044.99k 60888.06k 63548.07k 63706.45k real 0m 18.04s user 0m 17.98s sys 0m 0.00s
In this case
-elapsed does not matter much, as almost 100% of the execution time is spent in user-mode, and CPU user time would actually be a better measurement by not counting time spent in other processes.
With that out of the way, let's see an actual hardware-implemented cipher:
# time openssl speed -evp aes-256-cbc -elapsed The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes aes-256-cbc 1551.04k 6126.44k 21527.81k 55995.05k 95027.20k 99936.94k real 0m 18.04s user 0m 0.21s sys 0m 5.13s
For comparison, this is the same cipher, implemented by the libcrypto software:
# time openssl speed -evp aes-256-cbc -elapsed type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes aes-256-cbc 39603.10k 47420.37k 50270.38k 51002.71k 51249.15k 51232.77k real 0m 18.03s user 0m 18.00s sys 0m 0.01s
This is typical for a /dev/crypto cipher. There's a cost in CPU usage, the context switches needed to run the code in the kernel, represented by the 5.13s of system time used. That cost will not vary much with the size of the crypto operation. Because of that, for small batches, the use of hardware drivers will slow you down considerably. As the block size increases, /dev/crypto becomes the best choice. You must be aware of how the application uses the cipher. For example, AES-128-ECB is used by openssl to seed the rng, using 16-bytes calls. I haven't seen any other use of the ECB ciphers, so it is best to disable them entirely.
Please, don't enable digests unless you know what you're doing. They are usually slower than software, except for large (> 10k) blocks. Some applications–openssh, for example–will not work with /dev/crypto digests. This is a limitation of how the engine works. Openssh will save a partial digest, and then fork, duplicating that context, and working with successive copies of it, which is useful for HMAC, where the hash of the key remains constant. In the kernel, however, those contexts are still linked to the same session, so when one process calls another update, or closes that digest context, the kernel session is changed/closed for all of the instances, and you'll get a libcrypto failure. For well-behaved applications using large update blocks, you may enable digests. Use a separate copy of the
openssl.cnf configuration file, and set
OPENSSL_CONF=_path_to_file in the environment before running it (add it to the respective file in /etc/init.d/). Again, benchmarking the actual application you're using is the best way to gauge the impact of hardware crypto.
make menuconfig and select
Kernel modules → Cryptographic API modules
Cryptographic Engine and Security Acceleration
Note: If you want to learn about the current situation, you should search the Internet or maybe ask in the forum. This is outdated. Especially if you want to know, how fast a copy from a mounted filesystem (say ext3 over USB) over the scp is, you should specifically search for such benchmarks. Some models of the BCM47xx/53xx family support hardware accelerated encryption for IPSec (AES, DES, 3DES), simple hash calculations (MD5, SHA1) and TLS/SSL+HMAC processing. Not all devices have a hw crypto supporting chip. At least Asus WL500GD/X, Netgear WGT634U and Asus WL700gE do have hw crypto. However, testing of a WGT634U indicates that a pin under the BCM5365 was not pulled low to enable strong bulk cryptography, limiting the functionality to single DES.
The specification states the hardware is able to support 75Mbps (9,4MB/s) of encrypted throughput. Without hardware acceleration using the blowfish encryption throughput is only ~0,4MB/s. Benchmark results that show the difference between software and hardware accelerated encryption/decryption can be found here. Due to the overhead of hardware/DMA transfers and buffer copies between kernel/user space it gives only a good return for packet sizes greater than 256 bytes. This size can be reduced for IPSec, because network hardware uses DMA and there is no need to copy the (encrypted) data between kernel and user space. The hardware specification needed for programming the crypto API of the bcm5365P (Broadcom 5365P) can be found here.
| SoC / CPU | Accelerated Methods | Datasheet |
| BCM94704AGR | WEP 128, AES OCB AES CCM | 94704AGR-PB00-R.pdf |
| BCM?4704P | WEP 128, AES OCB, AES CCM, VPN | 94704AGR-PB00-R.pdf |
| BCM5365 | AES (up to 256-bit CTR and CBC modes), DES, 3DES (CBC), HMAC-SHA1, HMAC-MD5, SHA1 and MD5. IPSec encryption and single pass authentication. | 5365_5365P-PB01-R.pdf |
| BCM5365P | AES (up to 256-bit CTR and CBC modes), DES, 3DES (CBC), HMAC-SHA1, HMAC-MD5, SHA1 and MD5. IPSec encryption and single pass authentication. | 5365_5365P-PB01-R.pdf |