| docs:techref:hardware:cryptographic.hardware.accelerators [2020/08/19 01:16] – vitaliy86 | docs:techref:hardware:cryptographic.hardware.accelerators [2024/11/10 20:17] (current) – [Measuring the algorithm speed] systemcrash |
|---|
| ====== Cryptographic Hardware Accelerators ====== | ====== Cryptographic Hardware Accelerators ====== |
| A Cryptographic Hardware Accelerator can be | A Cryptographic Hardware Accelerator can be |
| * integrated into the [[docs:techref:hardware:soc]] as a separate processor, as special purpose CPU (aka Core). | * integrated into the [[docs:techref:hardware:soc]] as a separate processor, as a special purpose CPU (aka Core). |
| * integrated in a [[wp>Coprocessor]] on the circuit board | * integrated in a [[wp>Coprocessor]] on the circuit board |
| * contained on a Chip on an extension circuit board, this can be connected to the mainboard via some BUS, e.g. PCI | * contained on a Chip on an extension circuit board, this can be connected to the mainboard via some BUS, e.g. PCI |
| * an [[wp>Template:Multimedia_extensions|ISA extension]] like e.g. [[wp>AES instruction set]] and thus integral part of the CPU (in that case a kernel driver is not needed) | * an [[wp>Template:Multimedia_extensions|ISA extension]] like e.g. [[wp>AES instruction set]] and thus integral part of the CPU (in that case, a kernel driver is not needed) |
| |
| The purpose is to load off the very computing intensive tasks of encryption/decryption and compression/decompression.\\ | The purpose is to offload the very computing intensive tasks of encryption/decryption and compression/decompression.\\ |
| As can be seen in this [[wp>AES instruction set]] article, the acceleration is usually achieved by doing certain arithmetic calculation in hardware. | As can be seen in the [[wp>AES instruction set]] article, the acceleration is usually achieved by performing certain arithmetic calculations in hardware. |
| |
| Its use in application usually involve a number of layers: | Its use in applications usually involves a number of layers: |
| |
| * The kernel needs a hardware-specific driver to use its capabilities. It is usually built into the kernel for boards that support them, and allow access by services that run in kernel-mode. | * The kernel needs a hardware-specific driver to use the accelerator's capabilities. The driver is usually built into the kernel for boards that have such hardware, and allows access by services that run in kernel mode. |
| |
| * To use them in userspace, when the acceleration is not in the instruction set of the CPU, it is supported via a kernel driver (/dev/crypto or AF_ALG socket). | * To use them in userspace, when the acceleration is not in the instruction set of the CPU, it is supported via a kernel driver (/dev/crypto or AF_ALG socket). |
| * The above steps provide the bare minimum to allow userspace use, but it is more usual to use them inside a crypto library, such as gnutls or openssl, allowing access by most apps linked to them. | * The above steps provide the bare minimum to allow userspace use, but it is more usual to use them inside a crypto library, such as gnutls or openssl, allowing access by most apps linked to them. |
| ===== Performance ===== | ===== Performance ===== |
| Depending on which arithmetic calculations exactly are being done in the specific hardware, the results differ widely. You should not concern yourself with theoretical bla,bla but find out how a certain implementation performs in the task you want to do with it! You could want to | Depending on which arithmetic calculations are being done in the specific hardware, the results differ widely. Do not rely on theoretical figures; find out how a certain implementation performs in the task you want to do with it! |
| |
| * you could attach a USB drive to your device and mount a [[docs:guide-user:storage:usb-drives|local filesystem]] like ext3 from it. Then you want to read from and write to this filesystem from the Internet over a secured protocol. Let's use ''sshfs''. You would set up a [[docs:guide-user:services:ssh:sshfs.server]] on your device and a [[docs:guide-user:services:ssh:sshfs.client]] on the other end. Now how fast can you read/write to this with and without Cryptographic Hardware Accelerators. If the other end, the client, is a "fully grown PC" with a 2GHz CPU, it will probably perform fast enough to use the entire bandwidth of your Internet connection. If the server side is some embedded device, with let's say some 400MHz MIPS CPU, it could benefit highly from some integrated (and supported!) acceleration. You probably want enough performance, that you can use your entire bandwidth. Well, now go and find some benchmark showing you precisely the difference with enabled/disabled acceleration. Because you will not be able to extrapolate this information from specifications you find on this page or on the web. | * You could attach a USB drive to your device and mount a [[docs:guide-user:storage:usb-drives|local filesystem]] like ext3 from it. Then you want to read from and write to this filesystem from the Internet over a secured protocol. Let's use ''sshfs''. You would set up a [[docs:guide-user:services:ssh:sshfs.server]] on your device and a [[docs:guide-user:services:ssh:sshfs.client]] on the other end. Now how fast can you read/write to this with and without Cryptographic Hardware Accelerators. If the other end, the client, is a "fully grown PC" with a 2GHz CPU, it will probably perform fast enough to use the entire bandwidth of your Internet connection. If the server side is some embedded device, with let's say some 400MHz MIPS CPU, it will benefit highly from some integrated (and supported!) acceleration. 
You probably want enough performance to use your entire bandwidth. Now find a benchmark that shows the difference with and without acceleration. You will not be able to extrapolate this information from specifications you find on this page or on the web. |
| |
| * you could want to run an OpenVPN or an OpenConnect server on your router/embedded device, instead of using WEP/WPA/WPA2. There will be no reading from/writing to a USB device. Find benchmarks that show you exactly the performance for this purpose. You won't be able to extrapolate this information from other benchmarks. | * you may wish to run an OpenVPN or an OpenConnect server on your router/embedded device, instead of using WEP/WPA/WPA2. There will be no reading from/writing to a USB device. Find benchmarks that show you exactly the performance for this purpose. You won't be able to extrapolate this information from other benchmarks. |
| |
| * think of other practical uses, and find specific benchmarks. | * think of other practical uses, and find specific benchmarks. |
| ===== Finding out what's available in the Kernel ===== | ===== Finding out what's available in the Kernel ===== |
| |
| If your boards supports hardware crypto acceleration, the respective drivers should already be built into the kernel. Some crypto engines have their own packages, and these may need to be installed first. | If your board has cryptographic acceleration hardware, the respective drivers should already be built into the kernel. Some crypto engines have their own packages, and these may need to be installed first. |
| |
| To see all of the available crypto drivers running on your system (this means **after** installing the packages, if needed), take a look at /proc/crypto. | To see all of the available crypto drivers running on your system (this means **after** installing the packages, if needed), take a look at ''/proc/crypto''. |
| |
| <code> | <code> |
| digestsize : 32 | digestsize : 32 |
| </code> | </code> |
| This was edited to only show AES-CBC and SHA256. Both AF_ALG and /dev/crypto interfaces allow userspace access to any crypto driver offering symmetric-key ciphers, and digest algorithms. That means hardware acceleration, but also software-only drivers. The use of software drivers is almost always slower than implementing it in userspace, as the context switches slow things down considerably. | This was cropped to show only AES-CBC and SHA256. Both AF_ALG and ''/dev/crypto'' interfaces allow userspace access to any crypto driver offering symmetric-key ciphers, and digest algorithms. This means hardware acceleration, but also software-only drivers. The use of software drivers is almost always slower than an implementation in userspace, because the context switches slow operations down considerably. |
| |
| To identify hardware-drivers, look for drivers with types ''skcipher'' and ''shash'', having priority >= 300, but beware that AES-NI and similar CPU instructions will have a high priority as well, and do not need /dev/crypto or AF_ALG to be used! | To identify hardware-drivers, look for drivers with types ''skcipher'' and ''shash'', having priority >= 300, but beware that AES-NI and similar CPU instructions will have a high priority as well, and do not need ''/dev/crypto'' or AF_ALG to be used! |
| |
| Notice in this case, that are two drivers offering ''cbc(aes)'': ''cbc-aes-neonbs'' (software driver, using neon asm instruction, and ''mv-cbc-aes'' (Marvell CESA, hw accelerated), and four offering ''sha256'': ''sha256-generic'' (soft, generic C code), ''sha256-asm'' (soft, basic arm asm), ''sha256-neon'' (soft, using neon asm instruction), and ''mv-sha256'' (Marvell CESA). The kernel will export the one with the highest priority for each algorithm. In this case, it would be the hw accelerated Marvell CESA drivers: mv-cbc-aes, and mv-sha256. | Notice that in this case there are two drivers offering ''cbc(aes)'': ''cbc-aes-neonbs'' (software driver, using neon asm instructions) and ''mv-cbc-aes'' (Marvell CESA, hw accelerated), and four offering ''sha256'': ''sha256-generic'' (soft, generic C code), ''sha256-asm'' (soft, basic arm asm), ''sha256-neon'' (soft, using neon asm instructions), and ''mv-sha256'' (Marvell CESA). The kernel will export the one with the highest priority for each algorithm. In this case, it would be the hw accelerated Marvell CESA drivers: ''mv-cbc-aes'' and ''mv-sha256''. |
| |
| For IPsec ESP, which is done by the Kernel, this will be enough to tell you if you are able to use the crypto accelerator, and you don't need to do anything further. Just make sure you're using the same algorithm made available by your crypto driver. For other uses, openssl should be checked. | For IPsec ESP, which is done by the Kernel, this will be enough to tell you if you are able to use the crypto accelerator, and you don't need to do anything further. Just make sure you're using the same algorithm made available by your crypto driver. For other uses, openssl should be checked. |
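The priority rule described above can be checked with a small script. This is only a minimal sketch, not something shipped with OpenWrt: the function name ''list_candidate_drivers'' is made up for illustration, and it assumes ''/proc/crypto'' uses its usual ''key : value'' layout with a blank line between entries. Keep in mind that a high priority alone does not prove hardware acceleration, since AES-NI and similar CPU-instruction drivers rank high as well.

```shell
#!/bin/sh
# Hypothetical helper: list kernel crypto drivers of type skcipher or shash
# whose priority is at least a threshold (default 300).
# $1 = file to read (default /proc/crypto), $2 = minimum priority.
list_candidate_drivers() {
    awk -v min="${2:-300}" '
        /^name/     { name = $3 }
        /^driver/   { driver = $3 }
        /^priority/ { prio = $3 }
        /^type/     { type = $3 }
        # A blank line ends one /proc/crypto entry.
        /^$/ {
            if ((type == "skcipher" || type == "shash") && prio + 0 >= min)
                print name, driver, prio, type
            name = driver = prio = type = ""
        }
        # Handle a final entry that is not followed by a blank line.
        END {
            if ((type == "skcipher" || type == "shash") && prio + 0 >= min)
                print name, driver, prio, type
        }
    ' "${1:-/proc/crypto}"
}
```

Calling ''list_candidate_drivers'' with no arguments reads ''/proc/crypto'' and prints one line per matching entry: algorithm name, driver name, priority and type.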
| |
| ===== Enabling the userspace interface ===== | ===== Enabling the userspace interface ===== |
| The crypto drivers enable the algorithms for kernel use. To be able to access them from userspace, another driver needs to be used. In OpenWrt, there are two of them: ''cryptodev'', and ''AF_ALG''. Opinions on the subject may vary, but /dev/crypto has the speed advantage here. | The crypto drivers enable the algorithms for kernel use. To be able to access them from userspace, another driver needs to be used. In OpenWrt, there are two of them: ''cryptodev'', and ''AF_ALG''. Opinions on the subject may vary, but ''cryptodev'' has the speed advantage here. |
| |
| ==== cryptodev ===== | ==== cryptodev ==== |
| Cryptodev uses a ''/dev/crypto'' device to export the kernel algorithms. In OpenWrt 19.07 and later, it is provided by the ''kmod-cryptodev'', and is installed automatically when you install ''libopenssl-devcrypto''. | Cryptodev uses a ''/dev/crypto'' device to export the kernel algorithms. In OpenWrt 19.07 and later, it is provided by the ''kmod-cryptodev'' and is installed automatically when you install ''libopenssl-devcrypto''. |
| |
| In OpenWrt 18.06.x and earlier, /dev/crypto required compiling the driver yourself. Run ''make menuconfig'' and select | In OpenWrt 18.06.x and earlier, ''/dev/crypto'' required compiling the driver yourself. Run ''make menuconfig'' and select |
| |
| * kmod-crypto-core: m | * kmod-crypto-core: m |
| * kmod-cryptodev: m | * kmod-cryptodev: m |
| |
| Installing the `kmod-cryptodev` package will create a `/dev/crypto` device, even if you don't have any hw-crypto. **''/dev/crypto'' will export kernel crypto drivers regardless of being implemented in software or hardware. Use of kernel software drivers may severely slow crypto performance! Don't install this package unless you know you have hw-crypto drivers installed!** | Installing the ''kmod-cryptodev'' package will create a ''/dev/crypto'' device, even if you don't have any hw-crypto. **''/dev/crypto'' will export kernel crypto drivers regardless of whether they are implemented in software or hardware. Use of kernel software drivers may severely slow crypto performance, so don't install this package unless you know you have hw-crypto drivers installed!** |
| |
| ==== AF_ALG ===== | ==== AF_ALG ==== |
| |
| ===== Checking openssl support ===== | ===== Checking openssl support ===== |
| Openssl supports hardware crypto acceleration through an engine. You may find out what engines are available, along with the enabled algorithms, and configuration commands by running ''openssl engine -t -c'': | Openssl supports hardware crypto acceleration through an engine. You can see what engines are available, along with the enabled algorithms, and configuration commands by running ''openssl engine -t -c'': |
| <code> | <code> |
| (devcrypto) /dev/crypto engine | (devcrypto) /dev/crypto engine |
| </code> | </code> |
| For openssl-1.0.2 and earlier, the engine was called ''cryptodev''. It was renamed to ''devcrypto'' in openssl 1.1.0. In this example, engine 'devcrypto' is available, showing the list of algorithms available. | For openssl-1.0.2 and earlier, the engine was called ''cryptodev''. It was renamed to ''devcrypto'' in openssl 1.1.0. In this example, engine 'devcrypto' is available, showing the list of algorithms available. |
| |
| Starting in OpenSSL 1.1.0, an AF_ALG engine can be used. In OpenWrt 19.07, it is packaged as ''libopenssl-afalg'', but it requires a custom built: the package will not show up under 'Libraries', 'SSL', 'libopenssl' unless you go to 'Global build settings', 'Kernel build options', and select 'Compile the kernel with asynchronous IO support'. This engine supports only AES-CBC, and needs to be enabled in ''/etc/ssl/openssl.cnf'', but it does not accept the ''CIPHERS'', ''DIGESTSS'', or ''USE_SOFTDRIVERS'' options. | Starting in OpenSSL 1.1.0, an AF_ALG engine can be used. In OpenWrt 19.07, it is packaged as ''libopenssl-afalg'', but it requires a custom build: the package will not show up under 'Libraries', 'SSL', 'libopenssl' unless you go to 'Global build settings', 'Kernel build options', and select 'Compile the kernel with asynchronous IO support'. This engine supports only AES-CBC and needs to be enabled in ''/etc/ssl/openssl.cnf'', but it does not accept the ''CIPHERS'', ''DIGESTS'', or ''USE_SOFTDRIVERS'' options. |
| |
| In OpenWrt 19.07, the shipped ''/etc/ssl/openssl.cnf'' already has the basic engine configuration sections for both the devcrypto and the orignal afalg engines. To enable them, uncomment the respective lines under the ''[engines]'' section. | In OpenWrt 19.07, the shipped ''/etc/ssl/openssl.cnf'' already has the basic engine configuration sections for both the devcrypto and the original afalg engines. To enable them, uncomment the respective lines under the ''[engines]'' section. |
| |
| Shortly after 19.07.0 was released, an alternate AF_ALG engine was added, ''libopenssl-afalg_sync'' that is basically a mirror of the devcrypto engine, but using the AF_ALG interface. It accepts all of the options, and is configured the same way as the ''devcrypto'' engine. You may follow the steps below, just configure it under ''afalg'', instead of ''devcrypto''. As of 19.07.0, the ''openssl.cnf'' file does not have the ''CIPHERS'', ''DIGESTS'' and ''USE_SOFTDRIVERS'' options listed, but you can just copy them from the [devcrypto] section. Note that the OpenWrt package is called ''afalg_sync'', but for openssl the engine it is simply ''afalg''. It can't coexist with the original engine. Even though opinions my vary, its creator, cotequeiroz, states that the afalg_sync (as of v1.0.1) performance is better than the original afalg engine, but poorer than devcrypto. | Shortly after 19.07.0 was released, an alternate AF_ALG engine, ''libopenssl-afalg_sync'', was added. It is basically a mirror of the devcrypto engine, but uses the AF_ALG interface. It accepts all of the options and is configured the same way as the ''devcrypto'' engine. You may follow the steps below, but configure it under ''afalg'' instead of ''devcrypto''. As of 19.07.0, the ''openssl.cnf'' file does not have the ''CIPHERS'', ''DIGESTS'' and ''USE_SOFTDRIVERS'' options listed, but you can copy them from the [devcrypto] section. Note that the OpenWrt package is called ''afalg_sync'', but for openssl the engine is simply ''afalg''. It can't coexist with the original engine. Even though opinions may vary, its creator, cotequeiroz, states that the afalg_sync (as of v1.0.1) performance is better than the original afalg engine, but poorer than devcrypto. |
| | |
| | ===== Checking openssl support for AES-NI hw crypto on x86_64 (normal PC hardware) ===== |
| | |
| | OpenSSL in OpenWrt on x86 supports AES-NI CPU instructions natively and should use them automatically where available. |
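Whether the CPU offers AES-NI at all can be checked before benchmarking. The following is a minimal sketch, assuming an x86 system whose ''/proc/cpuinfo'' has a ''flags'' line that contains the word ''aes'' when AES-NI is present; the function name ''has_aesni'' is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first "flags" line of the given
# cpuinfo file (default /proc/cpuinfo) lists "aes" as a whole word.
has_aesni() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" | grep -qw aes
}

if has_aesni; then
    echo "AES-NI flag present"
else
    echo "AES-NI flag not found"
fi
```

The ''-w'' option keeps other flags that merely contain the letters ''aes'', such as ''vaes'', from being counted as a match.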
| | |
| | You can try two different commands and see whether performance differs. |
| | |
| | This should use AES-NI and should have better performance: |
| | |
| | <code>openssl speed -elapsed -evp aes-128-cbc</code> |
| | |
| | This sets the ''OPENSSL_ia32cap'' environment variable to mask the AES-NI capability at runtime, so OpenSSL does not use AES-NI and performance is lower: |
| | |
| | <code>OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc</code> |
| | |
| | This is an example of the results showing OpenSSL with AES-NI support (faster) |
| | <code>root@routegateway:~# openssl speed -elapsed -evp aes-128-cbc |
| | You have chosen to measure elapsed time instead of user CPU time. |
| | Doing aes-128-cbc for 3s on 16 size blocks: 117879925 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 64 size blocks: 39584711 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 256 size blocks: 10062149 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 1024 size blocks: 2530718 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 8192 size blocks: 318704 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 16384 size blocks: 158373 aes-128-cbc's in 3.00s |
| | OpenSSL 1.1.1g 21 Apr 2020 |
| | built on: Sun Aug 2 16:16:00 2020 UTC |
| | options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr) |
| | compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG |
| | The 'numbers' are in 1000s of bytes per second processed. |
| | type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes |
| | aes-128-cbc 628692.93k 844473.83k 858636.71k 863818.41k 870274.39k 864927.74k</code> |
| | |
| | This is the result without AES-NI support (slower). |
| | <code>root@routegateway:~# OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc |
| | You have chosen to measure elapsed time instead of user CPU time. |
| | Doing aes-128-cbc for 3s on 16 size blocks: 37905593 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 64 size blocks: 10779104 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 256 size blocks: 2769347 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 1024 size blocks: 702288 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 8192 size blocks: 88129 aes-128-cbc's in 3.00s |
| | Doing aes-128-cbc for 3s on 16384 size blocks: 44055 aes-128-cbc's in 3.00s |
| | OpenSSL 1.1.1g 21 Apr 2020 |
| | built on: Sun Aug 2 16:16:00 2020 UTC |
| | options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr) |
| | compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG |
| | The 'numbers' are in 1000s of bytes per second processed. |
| | type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes |
| | aes-128-cbc 202163.16k 229954.22k 236317.61k 239714.30k 240650.92k 240599.04k</code> |
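The summary rows follow directly from the "Doing ..." lines: number of operations times block size, divided by the elapsed seconds, expressed in units of 1000 bytes per second. The sketch below recomputes them from captured ''openssl speed'' output. It is only an illustration: the function name ''speed_throughput'' is made up, and it assumes the line layout shown in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: read "openssl speed" output on stdin and print,
# for each "Doing ..." line, the block size and the throughput in
# 1000s of bytes per second.
speed_throughput() {
    awk '
        $1 == "Doing" && $7 == "size" {
            secs = $12
            sub(/s$/, "", secs)    # strip the trailing "s" from e.g. "3.00s"
            printf "%s %.2fk\n", $6, $9 * $6 / secs / 1000
        }
    '
}
```

For example, ''openssl speed -elapsed -evp aes-128-cbc 2>&1 | speed_throughput'' should reproduce the figures of the final table, one block size per line. In the two runs above, the AES-NI result is about 3.1 times faster at 16-byte blocks and about 3.6 times faster at 16384-byte blocks.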
| |
| ==== Using the libopenssl-devcrypto package ==== | ==== Using the libopenssl-devcrypto package ==== |
| |
| ==== Showing /dev/crypto algorithm information ==== | ==== Showing /dev/crypto algorithm information ==== |
| There's a command for the devcrypto engine, not to be used in ''openssl.cnf'', that will show some useful information about the algorithms available. It shows a list of engine-supported algorithms, if it can be used (a session can be opened) with /dev/crypto or not, along with the corresponding kernel driver, and if it hw-accelerated or no. To use it, run: | The devcrypto engine has a command, not meant to be used in ''openssl.cnf'', that shows useful information about the available algorithms. It lists each engine-supported algorithm, whether it can be used (a session can be opened) with ''/dev/crypto'', the corresponding kernel driver, and whether it is hw-accelerated. To use it, run: |
| <code> | <code> |
| # openssl engine -pre DUMP_INFO devcrypto | # openssl engine -pre DUMP_INFO devcrypto |
| Digest SHA512, NID=674, /dev/crypto info: id=106, driver=sha512-neon (software), CIOCCPHASH capable | Digest SHA512, NID=674, /dev/crypto info: id=106, driver=sha512-neon (software), CIOCCPHASH capable |
| </code> | </code> |
| | |
| ==== Measuring the algorithm speed ==== | ==== Measuring the algorithm speed ==== |
| **As stated above, the best way to determine the speed is benchmarking the actual application you're using.** | **As stated above, the best way to determine the speed is to benchmark the actual application you're using.** |
| If that's not feasible, ''openssl speed'' can be used to compare the algorithm speed with and without the engine. To measure the speed without the engine, set ''CIPHERS=NONE'' and ''DIGESTS=NONE'' in ''/etc/ssl/openssl.cnf''. You must use the ''-elapsed'' option to get a reasonable calculation. That's because the speed command will use the CPU user time by default. When using the engine, most all of the processing will be done in kernel time, and the user time will be close to zero, yielding an exaggerated result. This is the measurement of the AES-256-CTR algorithm, implemented 100% in software (you must configure ''USE_SOFTDRIVERS=1'' in ''openssl.cnf'' to be able to use software drivers with devcrypto). | If that's not feasible, ''openssl speed'' can be used to compare the algorithm speed with and without the engine. To measure the speed without the engine, set ''CIPHERS=NONE'' and ''DIGESTS=NONE'' in ''/etc/ssl/openssl.cnf''. You must use the ''-elapsed'' option to get a reasonable calculation, because the speed command uses the CPU user time by default. When using the engine, almost all of the processing is done in kernel time, and the user time will be close to zero, yielding an exaggerated result. This is a measurement of the AES-256-CTR algorithm, implemented 100% in software (you must configure ''USE_SOFTDRIVERS=1'' in ''openssl.cnf'' to be able to use software drivers with devcrypto). |
| | |
| <code> | <code> |
| # time openssl speed -evp aes-256-ctr | # time openssl speed -evp aes-256-ctr |
| sys 0m 17.27s | sys 0m 17.27s |
| </code> | </code> |
| Notice the infinite speeds. If you spend 0 seconds in CPU user time, and use that as a divisor, you'll get infinite. The speed command, with the addtion of the ''-elapsed'' parameter will return a more realistic result: | |
| | Notice the infinite speeds. If you spend 0 seconds in CPU user time and use that as a divisor, you get infinity. The speed command, with the addition of the ''-elapsed'' parameter, returns a more realistic result: |
| <code> | <code> |
| # time openssl speed -evp aes-256-ctr -elapsed | # time openssl speed -evp aes-256-ctr -elapsed |
| sys 0m 17.11s | sys 0m 17.11s |
| </code> | </code> |
| | |
| This is the result of the AES-256-CTR without the engine: | This is the result of the AES-256-CTR without the engine: |
| | |
| <code> | <code> |
| type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes | type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes |
| sys 0m 0.00s | sys 0m 0.00s |
| </code> | </code> |
| | |
| In this case ''-elapsed'' does not matter much, as almost 100% of the execution time is spent in user-mode, and CPU user time would actually be a better measurement by not counting time spent in other processes. | In this case ''-elapsed'' does not matter much, as almost 100% of the execution time is spent in user-mode, and CPU user time would actually be a better measurement by not counting time spent in other processes. |
| With that out of the way, let's see an actual hardware-implemented cipher: | With that out of the way, let's see an actual hardware-implemented cipher: |
| | |
| <code> | <code> |
| # time openssl speed -evp aes-256-cbc -elapsed | # time openssl speed -evp aes-256-cbc -elapsed |
| sys 0m 5.13s | sys 0m 5.13s |
| </code> | </code> |
| | |
| For comparison, this is the same cipher, implemented by the libcrypto software: | For comparison, this is the same cipher, implemented by the libcrypto software: |
| | |
| <code> | <code> |
| # time openssl speed -evp aes-256-cbc -elapsed | # time openssl speed -evp aes-256-cbc -elapsed |
| sys 0m 0.01s | sys 0m 0.01s |
| </code> | </code> |
| This is typical for a /dev/crypto cipher. There's a cost in CPU usage, the context switches needed to run the code in the kernel, represented by the 5.13s of system time used. That cost will not vary much with the size of the crypto operation. Because of that, for small batches, the use of hardware drivers will slow you down considerably. As the block size increases, /dev/crypto becomes the best choice. You must be aware of how the application uses the cipher. For example, AES-128-ECB is used by openssl to seed the rng, using 16-bytes calls. I haven't seen any other use of the ECB ciphers, so it is best to disable them entirely. | |
| | This is typical for a ''/dev/crypto'' cipher. There's a cost in CPU usage: the context switches needed to run the code in the kernel, represented by the 5.13s of system time used. That cost will not vary much with the size of the crypto operation. Because of that, for small batches, the acceleration of hardware drivers will be penalised by context switches and slow you down considerably. As the block size increases, ''/dev/crypto'' becomes the best choice. Be aware of how the application uses the cipher. For example, AES-128-ECB is used by openssl to seed the rng, using 16-byte calls. I haven't seen any other use of the ECB ciphers, so it is best to disable them entirely. |
| |
| ==== Disabling digests ==== | ==== Disabling digests ==== |
| Please, don't enable digests unless you know what you're doing. They are usually slower than software, except for large (> 10k) blocks. Some applications--openssh, for example--will not work with /dev/crypto digests. This is a limitation of how the engine works. Openssh will save a partial digest, and then fork, duplicating that context, and working with successive copies of it, which is useful for HMAC, where the hash of the key remains constant. In the kernel, however, those contexts are still linked to the same session, so when one process calls another update, or closes that digest context, the kernel session is changed/closed for all of the instances, and you'll get a libcrypto failure. For well-behaved applications using large update blocks, you may enable digests. Use a separate copy of the ''openssl.cnf'' configuration file, and set ''OPENSSL_CONF=_path_to_file'' in the environment before running it (add it to the respective file in /etc/init.d/). Again, **benchmarking the actual application you're using is the best way to gauge the impact of hardware crypto.** | Don't enable digests unless you know what you're doing. They are usually slower than software, except for large (> 10k) blocks. Some applications--openssh, for example--will not work with ''/dev/crypto'' digests. This is a limitation of how the engine works. Openssh will save a partial digest, and then fork, duplicating that context, and working with successive copies of it, which is useful for HMAC, where the hash of the key remains constant. In the kernel, however, those contexts are still linked to the same session, so when one process calls another update, or closes that digest context, the kernel session is changed/closed for all of the instances, and you'll get a libcrypto failure. For well-behaved applications using large update blocks, you may enable digests. 
Use a separate copy of the ''openssl.cnf'' configuration file, and set ''OPENSSL_CONF=_path_to_file'' in the environment before running it (add it to the respective file in /etc/init.d/). Again, **benchmarking the actual application you're using is the best way to gauge the impact of hardware crypto.** |
| |
| ===== Enabling specific hardware driver ===== | ===== Enabling specific hardware driver ===== |
| ==== Historical ==== | ==== Historical ==== |
| |
| <sup>**Note:** If you want to learn about the current situation, you should search the Internet or maybe ask in the forum. This is outdated. Especially if you want to know, how fast a copy from a mounted filesystem (say ext3 over USB) over the scp is, you should specifically search for such benchmarks.</sup> | <sup>**Note:** This is outdated. If you want to learn about the current situation, search the Internet or ask in the forum. In particular, if you want to know how fast a copy from a mounted filesystem (say ext3 over USB) via scp is, search specifically for such benchmarks.</sup> |
| <sup>Some models of the BCM47xx/53xx family support hardware accelerated encryption for IPSec (AES, DES, 3DES), simple hash calculations (MD5, SHA1) and TLS/SSL+HMAC processing. Not all devices have a hw crypto supporting chip. At least Asus WL500GD/X, Netgear WGT634U and Asus WL700gE do have hw crypto. However, testing of a WGT634U indicates that a pin under the BCM5365 was not pulled low to enable strong bulk cryptography, limiting the functionality to single DES.</sup> | |
| | <sup>Some models of the BCM47xx/53xx family support hardware accelerated encryption for IPSec (AES, DES, 3DES), simple hash calculations (MD5, SHA1) and TLS/SSL+HMAC processing. Not all devices have a hw crypto supporting chip. At least Asus WL500GD/X, Netgear WGT634U and Asus WL700gE do have hw crypto. Testing of a WGT634U indicates, however, that a pin under the BCM5365 was not pulled low to enable strong bulk cryptography, limiting the functionality to single DES.</sup> |
| * <sup>How did you find that out?</sup> | * <sup>How did you find that out?</sup> |
| * <sup>Do you get an interrupt when sending a crypto job to the chip and limiting the request to DES only?)</sup> | * <sup>Do you get an interrupt when sending a crypto job to the chip and limiting the request to DES only?</sup> |