Collection of software AES implementations optimized for real-world microcontrollers.
The repository root directory is expected to be the only include path.
If the repo is added as an Eclipse linked folder, the root folder has to be added to the ASM, C and C++ include paths (-I) (Project Properties -> C/C++ Build -> Settings).
Includes also have to start from the root (e.g. #include <aes/cipher.hpp>).
No CMake support yet.
- Asm functions (and CM*.h headers) can be extracted and used as C-only code, but that may require extra boilerplate code (structures etc.).
- The C++ API uses neither exceptions nor dynamic memory allocation.
- Do not use the base implementations (ECB mode) for any serious encryption. They are provided only as building blocks for proper modes.
- Do not blindly trust the timing constancy of LUT-based ciphers. It depends on many factors that are unknown or implementation-defined, such as section placement or pipeline surprises. You need to verify it yourself, especially where the .data section is placed.
- LUT tables have to be placed in a deterministic memory section, usually TCMs or zero-wait-state SRAM (by default they land in the .data section).
- FLASH memory is unsafe even on the simplest Cortex-M0(+), as there might be a prefetcher with a few-entry cache (e.g. STM32F0/L0). In some cases it is still usable when running at a reduced clock, with flash configured for 0 wait states and prefetch explicitly disabled.
- None of the currently available implementations protects against power/EMI analysis or glitch attacks.
- Using an implementation on the wrong microarchitecture might introduce timing leaks (e.g. CM3_1T running on CM7).
- Unrolled implementations might perform slower than looped versions due to (usually LRU) cache pressure and flash wait states (e.g. STM32F4 with a 1 KB ART cache and up to 8 WS).
- For optimization gimmicks, refer to the pipeline cycle test repo.
- The included unit tests don't cover timing leaks. A performance difference between runs is not necessarily a data-dependent one, and dedicated tools such as dudect exist for that.
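The following C sketch shows one way to keep a lookup table out of flash and out of the default .data section. It has several assumptions:

- GCC or a compatible compiler.
- A linker script that defines a section named `.dtcm_bss`, mapped to DTCM or zero-wait-state SRAM.
- The names `.dtcm_bss`, `aes_sbox_ram` and `aes_sbox_init` are hypothetical and not part of this repository.

The table is generated at boot, so the section can be NOLOAD and no flash-to-RAM copy is involved. The plain AES S-box is used here only to demonstrate placement. The repository's own tables and layout may differ.

```c
#include <stdint.h>

/* Hypothetical section name: ".dtcm_bss" must exist in your linker script
   and map to DTCM or zero-wait-state SRAM. Applied only on GCC/ELF builds. */
#if defined(__GNUC__) && defined(__ELF__)
#define AES_LUT_SECTION __attribute__((section(".dtcm_bss")))
#else
#define AES_LUT_SECTION
#endif

/* Hypothetical table placed in the chosen RAM section and filled at boot,
   so no .data initializer has to be copied from flash. */
AES_LUT_SECTION uint8_t aes_sbox_ram[256];

#define ROTL8(x, s) ((uint8_t)(((x) << (s)) | ((x) >> (8 - (s)))))

/* Generate the AES S-box: multiplicative inverse in GF(2^8) followed by the
   affine transform. p walks all 255 nonzero elements as powers of the
   generator 3, while q tracks its inverse (p * q == 1 at every step). */
void aes_sbox_init(uint8_t sbox[256]) {
    uint8_t p = 1, q = 1;
    do {
        /* p *= 3 (reduction by the AES polynomial via 0x1B) */
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1B : 0));
        /* q /= 3, i.e. q *= 0xF6 */
        q ^= (uint8_t)(q << 1);
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        if (q & 0x80) q ^= 0x09;
        /* affine transform of the inverse */
        sbox[p] = (uint8_t)(q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^ ROTL8(q, 3) ^
                            ROTL8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63; /* 0 has no inverse */
}
```

The list above recommends checking placement explicitly. The linker map file (or `arm-none-eabi-nm`) shows whether the table actually landed in the intended region.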
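Tools like dudect compare the timing distributions of two input classes (typically fixed vs random inputs) using Welch's t-test. Below is a minimal C sketch of that statistic. It assumes:

- The samples come from something like the Cortex-M DWT cycle counter.
- The name `welch_t2` is hypothetical.

The function returns t squared so that `sqrt()` and libm are not needed. Under the common TVLA-style convention, |t| > 4.5 (t squared > 20.25) counts as evidence of a leak. dudect itself additionally preprocesses measurements (e.g. cropping outliers) and uses its own thresholds. Treat this sketch as an illustration, not a replacement.

```c
#include <stddef.h>

/* Hypothetical helper: Welch's t statistic, squared, between two sets of
   timing samples a[] and b[] (e.g. cycle counts for fixed vs random
   inputs). Both sets need at least 2 samples and nonzero total variance.
   Returning t^2 avoids sqrt(): |t| > 4.5  <=>  t^2 > 20.25. */
double welch_t2(const double *a, size_t na, const double *b, size_t nb) {
    double ma = 0.0, mb = 0.0, va = 0.0, vb = 0.0;
    for (size_t i = 0; i < na; i++) ma += a[i];
    for (size_t i = 0; i < nb; i++) mb += b[i];
    ma /= (double)na;
    mb /= (double)nb;
    /* unbiased sample variances */
    for (size_t i = 0; i < na; i++) va += (a[i] - ma) * (a[i] - ma);
    for (size_t i = 0; i < nb; i++) vb += (b[i] - mb) * (b[i] - mb);
    va /= (double)(na - 1);
    vb /= (double)(nb - 1);
    double d = ma - mb;
    return d * d / (va / (double)na + vb / (double)nb);
}
```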
Some of the cryptanalysis works/papers that tested one or more of the provided implementations:
https://webthesis.biblio.polito.it/secure/26870/1/tesi.pdf - (CM3_1T on cortex-m4 @ 1871e94)
Available implementations, grouped by C++ wrapper, cover the following modes:
CBC mode: handles PKCS7 padding. Unpadded encryption can be achieved by not calling the xxxAppendFinalize() function.
CBC_GENERIC
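The CBC chaining and PKCS7 padding described above can be sketched in plain C on top of a single-block (ECB) primitive. That is how the base implementations are meant to be used. Some caveats:

- All names here are hypothetical, including the block-function signature.
- `toy_block_encrypt` is an XOR stand-in, NOT a cipher. A real build would call an AES block function from the repository instead.
- This is not the C++ wrapper's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-block (ECB) encrypt primitive signature. */
typedef void (*block_encrypt_fn)(const void *key, const uint8_t in[16],
                                 uint8_t out[16]);

/* Toy stand-in so the sketch runs: XOR with a 16-byte key. NOT a cipher. */
static void toy_block_encrypt(const void *key, const uint8_t in[16],
                              uint8_t out[16]) {
    const uint8_t *k = (const uint8_t *)key;
    for (int i = 0; i < 16; i++) out[i] = in[i] ^ k[i];
}

/* PKCS7 for 16-byte blocks: append k bytes of value k, k = 16 - len % 16,
   so 1..16 bytes are always added. buf needs room for len + 16 bytes. */
static size_t pkcs7_pad(uint8_t *buf, size_t len) {
    uint8_t k = (uint8_t)(16 - len % 16);
    memset(buf + len, k, k);
    return len + k;
}

/* Returns the unpadded length, or (size_t)-1 on malformed padding.
   Variable time: its result must not be exposed as a padding oracle. */
static size_t pkcs7_unpad(const uint8_t *buf, size_t len) {
    if (len == 0 || len % 16 != 0) return (size_t)-1;
    uint8_t k = buf[len - 1];
    if (k == 0 || k > 16) return (size_t)-1;
    for (size_t i = len - k; i < len; i++)
        if (buf[i] != k) return (size_t)-1;
    return len - k;
}

/* CBC encryption of nblocks full blocks. Each plaintext block is XORed
   with the previous ciphertext block (the IV for the first block) before
   encryption, so equal plaintext blocks do not repeat as they do in ECB. */
static void cbc_encrypt(block_encrypt_fn enc, const void *key,
                        const uint8_t iv[16], const uint8_t *in,
                        uint8_t *out, size_t nblocks) {
    uint8_t chain[16];
    memcpy(chain, iv, 16);
    for (size_t b = 0; b < nblocks; b++) {
        uint8_t x[16];
        for (int i = 0; i < 16; i++) x[i] = in[16 * b + i] ^ chain[i];
        enc(key, x, out + 16 * b);
        memcpy(chain, out + 16 * b, 16);
    }
}
```

Without padding, CBC input must be a whole number of 16-byte blocks.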
CTR mode: SP 800-38A compliant, with a 32-bit (big-endian) counter. Can be used to build more common AEAD modes.
All implementations internally require the nonce to be placed right before the expanded round key:
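typedef struct {
    uint8_t nonce[16];    // 16-byte nonce/counter block
    uint8_t rk[(n+1)*16]; // expanded round key; n = number of rounds (10/12/14 for AES-128/192/256)
} ctx;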
CTR32_GENERIC
- target specific implementations ?????
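The C sketch below covers the context layout and the 32-bit big-endian counter increment described above, for AES-128 (n = 10 rounds). Some caveats:

- All names are hypothetical and not this repository's API: the struct name, `ctr32_inc`, `ctr32_crypt` and the block-function signature.
- `toy_block_encrypt` is an XOR stand-in, NOT a cipher.
- Only the last four bytes of the counter block are incremented, modulo 2^32. This matches the SP 800-38A incrementing function applied to a 32-bit counter field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AES128_ROUNDS 10 /* n for AES-128 */

/* Layout from the README: nonce block first, then the expanded round key
   ((n + 1) * 16 = 176 bytes for AES-128). Struct name is hypothetical. */
typedef struct {
    uint8_t nonce[16];
    uint8_t rk[(AES128_ROUNDS + 1) * 16];
} ctr_ctx_aes128;

_Static_assert(offsetof(ctr_ctx_aes128, rk) == 16,
               "round key must directly follow the nonce");

/* Hypothetical single-block encrypt primitive signature. */
typedef void (*block_encrypt_fn)(const void *key, const uint8_t in[16],
                                 uint8_t out[16]);

/* Toy stand-in so the sketch runs: XOR with a 16-byte key. NOT a cipher. */
static void toy_block_encrypt(const void *key, const uint8_t in[16],
                              uint8_t out[16]) {
    const uint8_t *k = (const uint8_t *)key;
    for (int i = 0; i < 16; i++) out[i] = in[i] ^ k[i];
}

/* Increment bytes 12..15 as a big-endian 32-bit counter, mod 2^32.
   Bytes 0..11 never change (no carry out of the counter field). */
static void ctr32_inc(uint8_t block[16]) {
    for (int i = 15; i >= 12; i--)
        if (++block[i] != 0) break;
}

/* CTR encryption and decryption are the same operation: XOR the data with
   E(counter block) and increment the counter after every block. The last
   block may be partial. ctr is updated in place. */
static void ctr32_crypt(block_encrypt_fn enc, const void *key,
                        uint8_t ctr[16], const uint8_t *in, uint8_t *out,
                        size_t len) {
    uint8_t ks[16];
    while (len > 0) {
        size_t n = len < 16 ? len : 16;
        enc(key, ctr, ks);
        for (size_t i = 0; i < n; i++) out[i] = in[i] ^ ks[i];
        ctr32_inc(ctr);
        in += n;
        out += n;
        len -= n;
    }
}
```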
GCM mode: SP 800-38D compliant. Typically used in TLS.
All implementations are currently based on the CTR_32 class (a fused CTR+GHASH encryption is planned, which will require some redesign).
Table-based implementations are written according to: https://luca-giuzzi.unibs.it/corsi/Support/papers-cryptography/gcm-spec.pdf
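A slow bit-serial GF(2^128) multiplication can serve as a test oracle when checking a table-based implementation. The sketch below follows Algorithm 1 (multiplication of blocks) of SP 800-38D:

- Bit 0 is the MSB of byte 0.
- The reduction constant is R = 0xE1 followed by 120 zero bits.
- `gf128_mul` and its helpers are hypothetical names.
- It branches on data and is therefore not constant time. It is intended for tests only.

```c
#include <stdint.h>

/* Load/store 8 bytes as a big-endian 64-bit value. */
static uint64_t load_be64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v = (v << 8) | p[i];
    return v;
}

static void store_be64(uint8_t *p, uint64_t v) {
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

/* Hypothetical reference multiply z = x * y in GCM's GF(2^128).
   NOT constant time (branches on data); test use only. z may alias x/y. */
void gf128_mul(const uint8_t x[16], const uint8_t y[16], uint8_t z[16]) {
    uint64_t xh = load_be64(x), xl = load_be64(x + 8);
    uint64_t vh = load_be64(y), vl = load_be64(y + 8);
    uint64_t zh = 0, zl = 0;
    for (int i = 0; i < 128; i++) {
        /* bit i of x, counting from the MSB of byte 0 */
        uint64_t bit = i < 64 ? (xh >> (63 - i)) & 1 : (xl >> (127 - i)) & 1;
        if (bit) {
            zh ^= vh;
            zl ^= vl;
        }
        /* V = rightshift(V), folding in R when bit 127 falls off */
        uint64_t lsb = vl & 1;
        vl = (vl >> 1) | (vh << 63);
        vh >>= 1;
        if (lsb) vh ^= 0xE100000000000000ULL;
    }
    store_be64(z, zh);
    store_be64(z + 8, zl);
}
```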
The BEAR_CT and BEAR_CT32 implementations come from the BearSSL package and are constant-time on cores with single-cycle multipliers (use CT32 on Cortex-M0 and Cortex-M3).
See https://www.bearssl.org/constanttime.html for details.
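BearSSL's constant-time GHASH relies on an integer-multiplication technique with "holes", sketched below in C. This is a reconstruction for illustration, not code copied from BearSSL, and the name `bmul32` is used here only as a label.

It computes the low 32 bits of a carry-less product using ordinary MUL instructions. Each operand is split into four masks whose set bits are 4 positions apart. The integer carries therefore stay inside the 3-bit gaps and never corrupt the next bit of interest.

```c
#include <stdint.h>

/* Low 32 bits of the carry-less (GF(2)[x]) product of x and y using only
   AND, XOR and integer MUL. At most 8 partial-product bits meet at any
   position, and 8 < 16 fits in the 4-bit spacing, so carries never reach
   the next used bit. Timing is data independent only if MUL itself is. */
uint32_t bmul32(uint32_t x, uint32_t y) {
    uint32_t x0 = x & 0x11111111u, x1 = x & 0x22222222u;
    uint32_t x2 = x & 0x44444444u, x3 = x & 0x88888888u;
    uint32_t y0 = y & 0x11111111u, y1 = y & 0x22222222u;
    uint32_t y2 = y & 0x44444444u, y3 = y & 0x88888888u;
    /* zk collects the products whose bit positions are == k (mod 4) */
    uint32_t z0 = (x0 * y0) ^ (x1 * y3) ^ (x2 * y2) ^ (x3 * y1);
    uint32_t z1 = (x0 * y1) ^ (x1 * y0) ^ (x2 * y3) ^ (x3 * y2);
    uint32_t z2 = (x0 * y2) ^ (x1 * y1) ^ (x2 * y0) ^ (x3 * y3);
    uint32_t z3 = (x0 * y3) ^ (x1 * y2) ^ (x2 * y1) ^ (x3 * y0);
    /* keep only the valid bit of each residue class, dropping carries */
    return (z0 & 0x11111111u) | (z1 & 0x22222222u) |
           (z2 & 0x44444444u) | (z3 & 0x88888888u);
}
```

The technique is only as constant-time as the MUL instruction, which is why the single-cycle multiplier condition above matters.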
GHASH_GENERIC_BEAR_CT
GHASH_GENERIC_BEAR_CT32
GHASH_GENERIC_SHOUP_M4
(not yet) GHASH_GENERIC_SHOUP_M8
(not yet) GHASH_GENERIC_FULL_M4
(not yet) target specific implementations ?????
| bytes of memory used by | per session | constants |
|---|---|---|
| BEAR_* | 64 + AES roundkey | 0 + AES |
| SHOUP_M4 | 304 + AES roundkey | 64 + AES |
| SHOUP_M8 | 4144 + AES roundkey | 1024 + AES |
| FULL_M4 | 8240 + AES roundkey | 0 + AES |
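A per-session figure follows from the table by adding the AES round key. The round key takes (n+1)*16 bytes, as in the CTR context layout: 176, 208 or 240 bytes for AES-128/192/256, since n = key bits / 32 + 6. The enum and function names in this C sketch are hypothetical.

```c
#include <stddef.h>

/* Hypothetical labels for the rows of the memory table. */
enum ghash_impl { GHASH_BEAR, GHASH_SHOUP_M4, GHASH_SHOUP_M8, GHASH_FULL_M4 };

/* Per-session bytes: fixed GHASH state from the table plus the AES round
   key of (n + 1) * 16 bytes, where n = key_bits / 32 + 6 (10, 12, 14). */
size_t ghash_session_bytes(enum ghash_impl impl, unsigned aes_key_bits) {
    static const size_t fixed[] = {64, 304, 4144, 8240};
    unsigned rounds = aes_key_bits / 32 + 6;
    return fixed[impl] + (size_t)(rounds + 1) * 16;
}
```

For example, SHOUP_M4 with AES-128 needs 304 + 176 = 480 bytes per session.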