PoC that AEGIS-X(p) can be as fast as AEGIS-X(p-1)
Right now, without 512-bit registers, AEGIS-X4 is generally slower than AEGIS-X2. AEGIS-X2 may also be slower than AEGIS-X1 on architectures with limited registers and AES pipelines.

The reason for that is register spills. We simulate large vector registers, so actual registers constantly need to be spilled to and restored from the stack.

A different strategy is to evaluate the AEGIS instances sequentially, instead of in parallel. By doing so, and ignoring initialization/finalization, an intuition is that AEGIS-X4 has the same cost as the sum of 4 AEGIS-X1 runs, each over a quarter of the message. That is, AEGIS-X4 is not slower than AEGIS-X1 on large messages.

If that required multiple passes over the entire message, memory accesses would defeat the benefit. This can be avoided by splitting the message into small chunks and running the AEGIS instances sequentially on each individual chunk. Stack spills then happen far less frequently than when emulating large registers. Also, once loaded during the first pass, the chunk is likely to still be in the L1 or L2 cache, ready to be processed immediately by the next AEGIS instances.

Using that trick, negotiating X2 or X4 would be acceptable most of the time: if an endpoint has registers/pipelines large enough to take advantage of them, it will. But if it doesn't, it wouldn't be significantly slower than using a variant with a lower parallelism degree.

The downside is a bit of implementation complexity, but also the fact that the optimal chunk size depends on the architecture and on the use cases. We may pick that chunk size to look great on benchmarks, but AEGIS is about real-world usage, not synthetic benchmarks. So, the benefits of this approach need to be properly measured.
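The scheduling idea described above can be illustrated with a minimal sketch. It is not the code from this commit: `toy_absorb` is a hypothetical, non-cryptographic stand-in for one AEGIS state update, and the names `LANES`, `BLOCK`, `CHUNK_STEPS`, `absorb_lockstep`, `absorb_chunked` and `schedules_agree` are assumptions made for illustration. The message layout (step `k`, lane `l` reads the block at offset `(k * LANES + l) * BLOCK`) is also a simplification of how a real AEGIS-X implementation distributes input across lanes. The sketch only shows that, because the lanes are independent, running them one after another over a small chunk gives the same lane states as updating all of them in lockstep.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 4        /* parallelism degree, as in AEGIS-X4 */
#define BLOCK 32       /* bytes absorbed per lane per step (illustrative) */
#define CHUNK_STEPS 64 /* hypothetical chunk: 64 * 4 * 32 bytes = 8 KiB */

/* Toy stand-in for one AEGIS state update. NOT the real round function. */
static uint64_t toy_absorb(uint64_t s, const uint8_t *block)
{
    for (size_t i = 0; i < BLOCK; i++) {
        s = (s ^ block[i]) * 0x100000001b3ULL;
    }
    return s;
}

/* Lockstep schedule: each step updates every lane, which is what
 * emulating one wide register amounts to. */
static void absorb_lockstep(uint64_t st[LANES], const uint8_t *m, size_t steps)
{
    for (size_t k = 0; k < steps; k++) {
        for (size_t l = 0; l < LANES; l++) {
            st[l] = toy_absorb(st[l], m + (k * LANES + l) * BLOCK);
        }
    }
}

/* Chunked sequential schedule: within each chunk, run lane 0 to the end
 * of the chunk, then lane 1, and so on. Only one lane state is live in
 * the inner loop, and the chunk is re-read while it should still be cached. */
static void absorb_chunked(uint64_t st[LANES], const uint8_t *m, size_t steps)
{
    for (size_t base = 0; base < steps; base += CHUNK_STEPS) {
        size_t end = (steps - base < CHUNK_STEPS) ? steps : base + CHUNK_STEPS;
        for (size_t l = 0; l < LANES; l++) {
            uint64_t s = st[l];
            for (size_t k = base; k < end; k++) {
                s = toy_absorb(s, m + (k * LANES + l) * BLOCK);
            }
            st[l] = s;
        }
    }
}

/* Returns 1 when both schedules end with identical lane states.
 * m must hold at least steps * LANES * BLOCK bytes. */
static int schedules_agree(const uint8_t *m, size_t steps)
{
    uint64_t a[LANES], b[LANES];
    for (size_t l = 0; l < LANES; l++) {
        a[l] = b[l] = 0xcbf29ce484222325ULL + l; /* arbitrary distinct seeds */
    }
    absorb_lockstep(a, m, steps);
    absorb_chunked(b, m, steps);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `schedules_agree` on a buffer of `steps * LANES * BLOCK` bytes checks the equivalence for that input. The sketch says nothing about speed: whether the chunked schedule is actually faster, and which `CHUNK_STEPS` value suits a given CPU, still has to be measured on real workloads.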
Showing 2 changed files with 187 additions and 2 deletions.