Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SHA256 precompile #213

Open
eigmax opened this issue Jan 11, 2025 · 4 comments
Open

Support SHA256 precompile #213

eigmax opened this issue Jan 11, 2025 · 4 comments

Comments

@eigmax
Copy link
Member

eigmax commented Jan 11, 2025

SHA256 starky reference: https://github.com/succinctlabs/starkyx/blob/1644af58bfc7d305c0066597c303a129fec7f427/starkyx/src/machine/hash/sha/sha256/register.rs

@VanhGer
Copy link
Contributor

VanhGer commented Jan 16, 2025

Algorithm:

Sha256

We implemented the Block decomposition and Hash computation phases, which we refer to as ShaExtend and ShaCompress, respectively.

There are 4 tables.

1. ShaExtend

This table has 48 rows per block.

Columns:

  • timestamp:
  • is_normal_round: A flag indicating whether the row is a normal round or padded.
  • w_i_minus_15, w_i_minus_2, w_i_minus_16, w_i_minus_7: Inputs, represented in big-endian bit order.
  • Intermediate values: w_i_minus_15_rr_7, w_i_minus_15_rr_8, w_i_minus_15_rs_3, s_0, w_i_minus_2_rr_17, w_i_minus_2_rr_19, w_i_minus_2_rs_10, s_1, all in big-endian bit order.
  • Wrapping add operation values: w_i_inter_0, carry_0, w_i_inter_1, carry_1, w_i, carry_2, all in big-endian bit order.

Number of columns: $2 + 32 * 18 = 578$

Constraints:

  • Input, intermediate, and output values are binary (0 or 1).
  • Correct application of rotation and shift operations.
  • Correct computation of s_0, s_1
  • Correct wrapping_add computation of w_i_inter_0, w_i_inter_1
  • Correct computation of w_i

To perform and verify wrapping_add of 2 numbers in binary form, we need to create the result and carry tables. These store the result and the reminder values for each bit-wise addition.

For example, given x = [1,0,1,0,1] and y = [1,0,0,1,1] (both in big-endian form):

  • result = [0,1,1,1,0]
  • carry = [1,0,0,0,1]

The constraints for this addition are:

let mut pre_carry = P::ZEROS;  
for i in 0..N {  
    let sum = x[i] + y[i] + pre_carry;  

	// If sum is 1 or 3, the result must be 1. 
	// If sum is 0 or 2, the result must be 0.
    let out_constraint = (sum - P::ONES) * (sum - P::ONES - P::ONES - P::ONES) * out[i] + sum * (sum - P::ONES - P::ONES) * (out[i] - P::ONES);  

	// Ensuring 2 * carry[i] + result[i] == sum
    let carry_constraint = carry[i] + carry[i] + result[i] - sum;  
    pre_carry = carry[i];  
}

See SP1's implementation of wrapping_add for 4 numbers: Add4Operation

Degree: 3

2. ShaExtendSponge

Handles the input/output of ShaExtend. This table has 48 rows per block.

Columns:

  • timestamp
  • round: [T; 48] : round[i] = 1 if this is round number i. Otherwise, round[i] = 0.
  • w_i_minus_15, w_i_minus_2, w_i_minus_16, w_i_minus_7
  • w_i in big-endian binary form
  • context
  • segment
  • input_virt: [T; 4] The addresses of w_i_minus_15, w_i_minus_2, w_i_minus_16, w_i_minus_7
  • output_virt: The address of output w_i

Number of columns: 216

Constraints:

  • Input and output values must be in binary form.
  • If this is neither the final step nor a padding row:
    • The timestamp should be increased correctly
    • Round index should be increased by one
    • Input and output addresses should be increased by 4 each
  • The sum of round is 0 or 1.
    Degree: 3

3. ShaCompress

This table has 64 rows per block.

Columns:

  • timestamp
  • is_normal_round
  • input_state: [; 256] The binary representation of a, b, ..., h.
  • w_i: [; 32]
  • k_i: [; 32]: The SHA_COMPRESS_KEY i-th
  • Intermediate values: e_rr_6, e_rr_11, e_rr_25, s1, e_and_f, not_e_and_g, ch, inter_1, carry_1, inter_2, carry_2, inter_3, carry_3, temp1, carry_4, a_rr_2, a_rr_13, a_rr_22, s0, a_and_b, a_and_c, b_and_c, maj, temp2, carry_5, carry_a, carry_e. (all in big-endian binary representation).
  • output_state: [; 256]: The updated value of a, b, ..., h

Number of columns: 1442

Constraints:
Ensures the validity of:

  • Input, intermediate, and output values (must be binary).
  • The rotation
  • The XOR operation
  • The wrapping_add computation
  • The AND and NAND operations
  • The output state of previous row is the input state of the next row.
    Degree: 3

4. ShaCompressSponge

This table has 64 rows per block.

  • timestamp
  • round: [T; 64]
  • hx: [; 256]: the big-endian binary representation of h[0],..., h[7].
  • input_state: [; 256] The binary representation of a, b, ..., h.
  • w_i
  • k_i
  • output_state: [; 256]: The updated value of a, b, ..., h
  • output_hx: [; 256]: the updated values of hx in the finalize phase (usable after each 64 rounds)
  • Intermediate values: carry: [;256]
  • context
  • segment
  • hx_virt: [;8]: the address of hx
  • w_virt: the address of w_i

Number of columns: 1420

Constraints:

  • Input, intermediate, and output values must be binary.
  • The sum of round is 0 or 1.
  • If this is not the final step or a padding row:
    • The local and next timestamps must match.
    • The local and next context hx_virt must match
    • The address of w_i must be increased by 4
    • The output state of local row must be the input state of next row
  • The key_i value must correspond to the correct round.
  • Correct wrapping add operation.

Degree: 3

Performance

Compare the performance of sha2-rust and sha2-syscall(using SHA precompile).

Command:
ARGS="fa3cfb3f1bb823aa9501f88f1f95f732ee6fef2c3a48be7f1d38037b216a549f 1223" RUST_LOG=info SEG_OUTPUT=/tmp/output cargo run --release

Number of Committed Cells (Before Padding):

  • sha2-rust: 7214067
  • sha2-syscall: 4774835
  • Reduction ~ 35%

Proving Time

  • sha2-rust: 95s
  • sha2-syscall: 92s

Sha2-rust:

Image

Sha2-syscall:

Image

@eigmax
Copy link
Member Author

eigmax commented Jan 19, 2025

Note that the max constraint degree is 3.

@eigmax
Copy link
Member Author

eigmax commented Feb 3, 2025

For the benchmark, plz stat how large the cells reduced by precompile sha2.

Meanwhile, you also test hash computation with different input size, like 32, 256, 1024, and 65536.

@VanhGer
Copy link
Contributor

VanhGer commented Feb 4, 2025

Hash computation with different input sizes

Size of input Sha-rust committed cells Sha precompile committed cells Reduction %
4 7214781 4775534 2439247 -34%
32 7261692 4822445 2439247 -33%
256 13525631 9570935 3954696 -29%
1024 38509352 21852615 16656737 -43%
8192 263817769 159838069 103979700 -39%
Size of input Sha-rust proving time Sha precompile proving time
4 94.9346s 94.0968s
32 93.1967 91.4709
256 93.7883 91.9840
1024 136.7243 101.7800
8192 397.1998 212.3387

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants