
Commit
Merge pull request #32 from 0xPolygonMiden/next
Tracking PR for next release
bobbinth authored Dec 14, 2022
2 parents ed07f89 + ad2b63c commit 3c60484
Showing 11 changed files with 794 additions and 24 deletions.
6 changes: 5 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -14,6 +14,10 @@ edition = "2021"
name = "hash"
harness = false

[[bench]]
name = "smt"
harness = false

[features]
default = ["blake3/default", "std", "winter_crypto/default", "winter_math/default", "winter_utils/default"]
std = ["blake3/std", "winter_crypto/std", "winter_math/std", "winter_utils/std"]
@@ -25,6 +29,6 @@ winter_math = { version = "0.4.1", package = "winter-math", default-features = f
winter_utils = { version = "0.4.1", package = "winter-utils", default-features = false }

[dev-dependencies]
criterion = "0.4"
criterion = { version = "0.4", features = ["html_reports"] }
proptest = "1.0.0"
rand_utils = { version = "0.4", package = "winter-rand-utils" }
10 changes: 6 additions & 4 deletions README.md
@@ -2,24 +2,26 @@
This crate contains cryptographic primitives used in Polygon Miden.

## Hash
[Hash module](./src/hash) provides a set of cryptographic hash functions which are used by Miden VM and Miden Rollup. Currently, these functions are:
[Hash module](./src/hash) provides a set of cryptographic hash functions which are used by the Miden VM and the Miden rollup. Currently, these functions are:

* [BLAKE3](https://github.com/BLAKE3-team/BLAKE3) hash function with 256-bit, 192-bit, or 160-bit output. The 192-bit and 160-bit outputs are obtained by truncating the 256-bit output of the standard BLAKE3.
* [RPO](https://eprint.iacr.org/2022/1577) hash function with 256-bit output. This hash function is an algebraic hash function suitable for recursive STARKs.

For performance benchmarks of these hash functions and their comparison to other popular hash functions please see [here](./benches/).

## Merkle
[Merkle module](./src/merkle/) provides a set of data structures related to Merkle tree. All these data structures are implemented using RPO hash function described above. The data structure are:
[Merkle module](./src/merkle/) provides a set of data structures related to Merkle trees. All these data structures are implemented using the RPO hash function described above. The data structures are:

* `MerkleTree`: a regular fully-balanced binary Merkle tree. The depth of this tree can be at most 64.
* `MerklePathSet`: a collection of Merkle authentication paths all resolving to the same root. The length of the paths can be at most 64.
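
The `MerklePathSet` idea above — authentication paths that all resolve to the same root — can be illustrated with a minimal verification fold. This is a sketch only: `hash_pair` is a hypothetical stand-in for the 2-to-1 RPO merge the crate actually uses.

```rust
// Stand-in 2-to-1 hash; the real crate merges two RPO digests.
fn hash_pair(a: u64, b: u64) -> u64 {
    a.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(31) ^ b
}

/// Fold a leaf up a Merkle path. At each level, the low bit of `index`
/// says whether the current node is the left (even) or right (odd) child.
fn compute_root(mut node: u64, mut index: u64, path: &[u64]) -> u64 {
    for &sibling in path {
        node = if index & 1 == 0 {
            hash_pair(node, sibling)
        } else {
            hash_pair(sibling, node)
        };
        index >>= 1;
    }
    node
}

fn main() {
    // Two leaves of a depth-1 tree resolve to the same root.
    let (left, right) = (11, 22);
    let root = hash_pair(left, right);
    assert_eq!(compute_root(left, 0, &[right]), root);
    assert_eq!(compute_root(right, 1, &[left]), root);
}
```

A path of length up to 64 (the documented maximum) simply means up to 64 iterations of this fold.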

## Crate features
This carate can be compiled with the following features:
This crate can be compiled with the following features:

* `std` - enabled by default and relies on the Rust standard library.
* `no_std` does not rely on the Rust standard library and enables compilation to WebAssembly.

Both of these features imply use of [alloc](https://doc.rust-lang.org/alloc/) to support heap-allocated collections.
Both of these features imply the use of [alloc](https://doc.rust-lang.org/alloc/) to support heap-allocated collections.

To compile with `no_std`, disable default features via `--no-default-features` flag.

49 changes: 49 additions & 0 deletions benches/README.md
@@ -0,0 +1,49 @@
# Miden VM Hash Functions
In the Miden VM, we make use of different hash functions. Some of these are "traditional" hash functions, like `BLAKE3`, which are optimized for out-of-STARK performance, while others are algebraic hash functions, like `Rescue Prime`, which are optimized for performance inside the STARK. In what follows, we benchmark several such hash functions and compare them against constructions used by other proving systems. More precisely, we benchmark:

* **BLAKE3** as specified [here](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf) and implemented [here](https://github.com/BLAKE3-team/BLAKE3) (with a wrapper exposed via this crate).
* **SHA3** as specified [here](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf) and implemented [here](https://github.com/novifinancial/winterfell/blob/46dce1adf0/crypto/src/hash/sha/mod.rs).
* **Poseidon** as specified [here](https://eprint.iacr.org/2019/458.pdf) and implemented [here](https://github.com/mir-protocol/plonky2/blob/806b88d7d6e69a30dc0b4775f7ba275c45e8b63b/plonky2/src/hash/poseidon_goldilocks.rs) (but in pure Rust, without vectorized instructions).
* **Rescue Prime (RP)** as specified [here](https://eprint.iacr.org/2020/1143) and implemented [here](https://github.com/novifinancial/winterfell/blob/46dce1adf0/crypto/src/hash/rescue/rp64_256/mod.rs).
* **Rescue Prime Optimized (RPO)** as specified [here](https://eprint.iacr.org/2022/1577) and implemented in this crate.

## Comparison and Instructions

### Comparison
We benchmark the above hash functions using two scenarios. The first is 2-to-1 hashing $(a,b)\mapsto h(a,b)$, where $a$, $b$, and $h(a,b)$ are digests of the respective hash function.
The second scenario is sequential hashing, where we take a sequence of $100$ field elements and hash it to produce a single digest. The digests are $4$ field elements in a prime field with modulus $2^{64} - 2^{32} + 1$ (i.e., 32 bytes) for Poseidon, Rescue Prime, and RPO, and an array `[u8; 32]` for SHA3 and BLAKE3.
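
The digest sizes above can be sanity-checked: the modulus is the 64-bit "Goldilocks" prime, each element fits in a `u64`, and four such elements occupy the same 32 bytes as the `[u8; 32]` digests of SHA3 and BLAKE3. A quick check (not part of the crate):

```rust
fn main() {
    // The prime modulus mentioned above: 2^64 - 2^32 + 1 (computed in u128 to avoid overflow).
    let p: u128 = (1u128 << 64) - (1u128 << 32) + 1;
    assert_eq!(p, 18446744069414584321);

    // A digest is 4 field elements; each fits in a u64, i.e. 8 bytes.
    let digest_bytes = 4 * core::mem::size_of::<u64>();
    assert_eq!(digest_bytes, 32); // same size as the [u8; 32] used by SHA3/BLAKE3
}
```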

#### Scenario 1: 2-to-1 hashing `h(a,b)`

| Function | BLAKE3 | SHA3 | Poseidon | Rp64_256 | RPO_256 |
| ------------------- | ------ | --------| --------- | --------- | ------- |
| Apple M1 Pro | 80 ns | 245 ns | 1.5 us | 9.1 us | 5.4 us |
| Apple M2 | 76 ns | 233 ns | 1.3 us | 7.9 us | 5.0 us |
| Amazon Graviton 3 | 116 ns | | | | 8.8 us |
| AMD Ryzen 9 5950X | 64 ns | 273 ns | 1.2 us | 9.1 us | 5.5 us |
| Intel Core i5-8279U | 80 ns | | | | 8.7 us |
| Intel Xeon 8375C | 67 ns | | | | 8.2 us |

#### Scenario 2: Sequential hashing of 100 elements `h([a_0,...,a_99])`

| Function | BLAKE3 | SHA3 | Poseidon | Rp64_256 | RPO_256 |
| ------------------- | -------| ------- | --------- | --------- | ------- |
| Apple M1 Pro | 1.1 us | 1.5 us | 19.4 us | 118 us | 70 us |
| Apple M2 | 1.0 us | 1.5 us | 17.4 us | 103 us | 65 us |
| Amazon Graviton 3 | 1.4 us | | | | 114 us |
| AMD Ryzen 9 5950X | 0.8 us | 1.7 us | 15.7 us | 120 us | 72 us |
| Intel Core i5-8279U | 1.0 us | | | | 116 us |
| Intel Xeon 8375C | 0.8 us | | | | 110 us |

### Instructions
Before you can run the benchmarks, you'll need to make sure you have Rust [installed](https://www.rust-lang.org/tools/install). After that, to run the benchmarks for RPO and BLAKE3, clone the current repository, and from the root directory of the repo run the following:

```
cargo bench hash
```

To run the benchmarks for Rescue Prime, Poseidon, and SHA3, clone the following [repository](https://github.com/Dominik1999/winterfell.git), check out the `hash-functions-benches` branch, and from the root directory of the repo run the following:

```
cargo bench hash
```
63 changes: 61 additions & 2 deletions benches/hash.rs
@@ -1,9 +1,13 @@
use criterion::{black_box, criterion_group, criterion_main, BatchSize, Criterion};
use miden_crypto::{
hash::rpo::{Rpo256, RpoDigest},
hash::{
blake::Blake3_256,
rpo::{Rpo256, RpoDigest},
},
Felt,
};
use rand_utils::rand_value;
use winter_crypto::Hasher;

fn rpo256_2to1(c: &mut Criterion) {
let v: [RpoDigest; 2] = [Rpo256::hash(&[1_u8]), Rpo256::hash(&[2_u8])];
@@ -53,5 +57,60 @@ fn rpo256_sequential(c: &mut Criterion) {
});
}

criterion_group!(hash_group, rpo256_sequential, rpo256_2to1);
fn blake3_2to1(c: &mut Criterion) {
let v: [<Blake3_256 as Hasher>::Digest; 2] =
[Blake3_256::hash(&[1_u8]), Blake3_256::hash(&[2_u8])];
c.bench_function("Blake3 2-to-1 hashing (cached)", |bench| {
bench.iter(|| Blake3_256::merge(black_box(&v)))
});

c.bench_function("Blake3 2-to-1 hashing (random)", |bench| {
bench.iter_batched(
|| {
[
Blake3_256::hash(&rand_value::<u64>().to_le_bytes()),
Blake3_256::hash(&rand_value::<u64>().to_le_bytes()),
]
},
|state| Blake3_256::merge(&state),
BatchSize::SmallInput,
)
});
}

fn blake3_sequential(c: &mut Criterion) {
let v: [Felt; 100] = (0..100)
.into_iter()
.map(Felt::new)
.collect::<Vec<Felt>>()
.try_into()
.expect("should not fail");
c.bench_function("Blake3 sequential hashing (cached)", |bench| {
bench.iter(|| Blake3_256::hash_elements(black_box(&v)))
});

c.bench_function("Blake3 sequential hashing (random)", |bench| {
bench.iter_batched(
|| {
let v: [Felt; 100] = (0..100)
.into_iter()
.map(|_| Felt::new(rand_value()))
.collect::<Vec<Felt>>()
.try_into()
.expect("should not fail");
v
},
|state| Blake3_256::hash_elements(&state),
BatchSize::SmallInput,
)
});
}

criterion_group!(
hash_group,
rpo256_2to1,
rpo256_sequential,
blake3_2to1,
blake3_sequential
);
criterion_main!(hash_group);
84 changes: 84 additions & 0 deletions benches/smt.rs
@@ -0,0 +1,84 @@
use core::mem::swap;
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use miden_crypto::{merkle::SimpleSmt, Felt, Word};
use rand_utils::prng_array;

fn smt_rpo(c: &mut Criterion) {
// setup trees

let mut seed = [0u8; 32];
let mut trees = vec![];

for depth in 14..=20 {
let leaves = ((1 << depth) - 1) as u64;
for count in [1, leaves / 2, leaves] {
let entries: Vec<_> = (0..count)
.map(|i| {
let word = generate_word(&mut seed);
(i, word)
})
.collect();
let tree = SimpleSmt::new(entries, depth).unwrap();
trees.push(tree);
}
}

let leaf = generate_word(&mut seed);

// benchmarks

let mut insert = c.benchmark_group(format!("smt update_leaf"));

for tree in trees.iter_mut() {
let depth = tree.depth();
let count = tree.leaves_count() as u64;
let key = count >> 2;
insert.bench_with_input(
format!("simple smt(depth:{depth},count:{count})"),
&(key, leaf),
|b, (key, leaf)| {
b.iter(|| {
tree.update_leaf(black_box(*key), black_box(*leaf)).unwrap();
});
},
);
}

insert.finish();

let mut path = c.benchmark_group(format!("smt get_leaf_path"));

for tree in trees.iter_mut() {
let depth = tree.depth();
let count = tree.leaves_count() as u64;
let key = count >> 2;
path.bench_with_input(
format!("simple smt(depth:{depth},count:{count})"),
&key,
|b, key| {
b.iter(|| {
tree.get_leaf_path(black_box(*key)).unwrap();
});
},
);
}

path.finish();
}

criterion_group!(smt_group, smt_rpo);
criterion_main!(smt_group);

// HELPER FUNCTIONS
// --------------------------------------------------------------------------------------------

fn generate_word(seed: &mut [u8; 32]) -> Word {
swap(seed, &mut prng_array(*seed));
let nums: [u64; 4] = prng_array(*seed);
[
Felt::new(nums[0]),
Felt::new(nums[1]),
Felt::new(nums[2]),
Felt::new(nums[3]),
]
}
20 changes: 12 additions & 8 deletions src/hash/blake/mod.rs
@@ -1,5 +1,7 @@
use super::{Digest, ElementHasher, Felt, FieldElement, Hasher, StarkField};
use crate::utils::{ByteReader, ByteWriter, Deserializable, DeserializationError, Serializable};
use crate::utils::{
uninit_vector, ByteReader, ByteWriter, Deserializable, DeserializationError, Serializable,
};
use core::{
mem::{size_of, transmute, transmute_copy},
ops::Deref,
@@ -276,13 +278,15 @@ where
let digest = if Felt::IS_CANONICAL {
blake3::hash(E::elements_as_bytes(elements))
} else {
E::as_base_elements(elements)
.iter()
.fold(blake3::Hasher::new(), |mut hasher, felt| {
hasher.update(&felt.as_int().to_le_bytes());
hasher
})
.finalize()
let base_elements = E::as_base_elements(elements);
let blen = base_elements.len() << 3;

let mut bytes = unsafe { uninit_vector(blen) };
for (idx, element) in base_elements.iter().enumerate() {
bytes[idx * 8..(idx + 1) * 8].copy_from_slice(&element.as_int().to_le_bytes());
}

blake3::hash(&bytes)
};
*shrink_bytes(&digest.into())
}
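
The new code path above packs each base element's canonical `u64` value into a little-endian byte buffer and then makes a single `blake3::hash` call, instead of folding elements into a streaming hasher. A safe, self-contained sketch of the same packing (without `uninit_vector` or the crate's `Felt` type):

```rust
/// Pack 64-bit values into a contiguous little-endian byte buffer,
/// mirroring the 8-bytes-per-element layout used above.
fn pack_le(elements: &[u64]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(elements.len() * 8);
    for e in elements {
        bytes.extend_from_slice(&e.to_le_bytes());
    }
    bytes
}

fn main() {
    let bytes = pack_le(&[1, 2]);
    assert_eq!(bytes.len(), 16);
    assert_eq!(&bytes[..8], &[1, 0, 0, 0, 0, 0, 0, 0]);
}
```

The committed code uses `uninit_vector` to skip zero-initializing the buffer; the `Vec::with_capacity` plus `extend_from_slice` version here trades that micro-optimization for safe code.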
5 changes: 4 additions & 1 deletion src/lib.rs
@@ -23,11 +23,14 @@ pub mod utils {
// ================================================================================================

/// A group of four field elements in the Miden base field.
pub type Word = [Felt; 4];
pub type Word = [Felt; WORD_SIZE];

// CONSTANTS
// ================================================================================================

/// Number of field elements in a word.
pub const WORD_SIZE: usize = 4;

/// Field element representing ZERO in the Miden base field.
pub const ZERO: Felt = Felt::ZERO;

13 changes: 7 additions & 6 deletions src/merkle/merkle_tree.rs
@@ -1,4 +1,4 @@
use super::{Digest, Felt, MerkleError, Rpo256, Vec, Word};
use super::{Felt, MerkleError, Rpo256, RpoDigest, Vec, Word};
use crate::{utils::uninit_vector, FieldElement};
use core::slice;
use winter_math::log2;
@@ -22,7 +22,7 @@ impl MerkleTree {
pub fn new(leaves: Vec<Word>) -> Result<Self, MerkleError> {
let n = leaves.len();
if n <= 1 {
return Err(MerkleError::DepthTooSmall);
return Err(MerkleError::DepthTooSmall(n as u32));
} else if !n.is_power_of_two() {
return Err(MerkleError::NumLeavesNotPowerOfTwo(n));
}
@@ -35,7 +35,8 @@ impl MerkleTree {
nodes[n..].copy_from_slice(&leaves);

// re-interpret nodes as an array of two nodes fused together
let two_nodes = unsafe { slice::from_raw_parts(nodes.as_ptr() as *const [Digest; 2], n) };
let two_nodes =
unsafe { slice::from_raw_parts(nodes.as_ptr() as *const [RpoDigest; 2], n) };

// calculate all internal tree nodes
for i in (1..n).rev() {
@@ -68,7 +69,7 @@ impl MerkleTree {
/// * The specified index not valid for the specified depth.
pub fn get_node(&self, depth: u32, index: u64) -> Result<Word, MerkleError> {
if depth == 0 {
return Err(MerkleError::DepthTooSmall);
return Err(MerkleError::DepthTooSmall(depth));
} else if depth > self.depth() {
return Err(MerkleError::DepthTooBig(depth));
}
@@ -89,7 +90,7 @@ impl MerkleTree {
/// * The specified index not valid for the specified depth.
pub fn get_path(&self, depth: u32, index: u64) -> Result<Vec<Word>, MerkleError> {
if depth == 0 {
return Err(MerkleError::DepthTooSmall);
return Err(MerkleError::DepthTooSmall(depth));
} else if depth > self.depth() {
return Err(MerkleError::DepthTooBig(depth));
}
@@ -123,7 +124,7 @@ impl MerkleTree {

let n = self.nodes.len() / 2;
let two_nodes =
unsafe { slice::from_raw_parts(self.nodes.as_ptr() as *const [Digest; 2], n) };
unsafe { slice::from_raw_parts(self.nodes.as_ptr() as *const [RpoDigest; 2], n) };

for _ in 0..depth {
index /= 2;
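
The `MerkleTree` diff above relies on a heap-style layout: all nodes live in one array of length `2n`, leaves occupy indices `n..2n`, and the parent of the pair `(2i, 2i+1)` sits at index `i`, so internal nodes are filled by walking indices in reverse. A self-contained sketch of that layout, where `combine` is a hypothetical stand-in for the 2-to-1 RPO merge (`Rpo256::merge`) the crate uses:

```rust
// Stand-in 2-to-1 combiner; the crate hashes two RPO digests instead.
fn combine(a: u64, b: u64) -> u64 {
    (a ^ 0xA5A5_A5A5).wrapping_mul(31).wrapping_add(b)
}

/// Build the heap-style node array: index 0 unused, root at 1, leaves at n..2n.
fn build(leaves: &[u64]) -> Vec<u64> {
    let n = leaves.len();
    assert!(n.is_power_of_two() && n > 1); // mirrors the DepthTooSmall / power-of-two checks
    let mut nodes = vec![0u64; 2 * n];
    nodes[n..].copy_from_slice(leaves);
    // Each parent i combines its children at 2i and 2i+1, walking bottom-up.
    for i in (1..n).rev() {
        nodes[i] = combine(nodes[2 * i], nodes[2 * i + 1]);
    }
    nodes
}

fn main() {
    let nodes = build(&[10, 20, 30, 40]);
    // The root (index 1) combines the two internal nodes built from the leaves.
    let expected = combine(combine(10, 20), combine(30, 40));
    assert_eq!(nodes[1], expected);
}
```

The committed code avoids the index arithmetic on children by reinterpreting the node array as `[RpoDigest; 2]` pairs via `slice::from_raw_parts`, so parent `i` reads pair `i` directly; the safe indexing shown here computes the same values.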