SHA-256 (vs. SP1)

As of May 2024

This note gives a benchmark comparison of proving the execution of SHA-256 on a 32-byte input on Valida and on SP1, with SP1 using its SHA-256 precompile. In this benchmark, single-core Valida proving was estimated to be about 304 times more efficient than multi-core SP1 proving in terms of user (CPU) time, which serves here as a proxy for the time and energy expended on computing the proof. Multi-core Valida proving was estimated to be about 53.5 times faster than multi-core SP1 proving in wall clock time. These are not the best results that could be obtained on this problem with Valida, since much work remains to be done on optimizing the prover and the code generated by the compiler.

Results

The raw data is available. The following table records the mean, median, and standard deviation of each sample group. All measurements are given in seconds.

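| Prover          | Measure         | Mean    | Median  | Standard deviation |
|-----------------|-----------------|---------|---------|--------------------|
| Valida serial   | User time       | 1.1237  | 1.1235  | 0.01787            |
| Valida parallel | User time       | 2.7835  | 2.7865  | 0.06926            |
| SP1             | User time       | 341.757 | 322.061 | 63.214             |
| Valida serial   | Wall clock time | 1.151   | 1.1505  | 0.0219             |
| Valida parallel | Wall clock time | 0.249   | 0.2485  | 0.0051             |
| SP1             | Wall clock time | 13.326  | 13.328  | 0.0597             |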
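As a rough illustration of how the summary statistics in this table can be derived from a group of timing samples, here is a minimal Python sketch. The function name `summarize` and the example values are hypothetical and are not taken from the raw data. The sketch also assumes the sample (n - 1) standard deviation; the note does not say which definition was used.

```python
import statistics

def summarize(samples):
    """Return (mean, median, standard deviation) for timings in seconds."""
    if len(samples) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    return (
        statistics.mean(samples),
        statistics.median(samples),
        statistics.stdev(samples),  # assumption: sample (n - 1) standard deviation
    )

# Hypothetical timings for illustration only, not benchmark measurements.
example = [1.0, 2.0, 3.0, 4.0]
mean, median, stdev = summarize(example)
print(f"mean={mean:.4f} median={median:.4f} stdev={stdev:.4f}")
```

Reporting the median alongside the mean is useful for groups such as the SP1 user time, where the two differ noticeably and the standard deviation is large.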

The following table displays, for the user time and the wall clock time, the ratio of the SP1 mean to the Valida mean. A number greater than 1 indicates that Valida is faster; a number less than 1 indicates that SP1 is faster.

| Measure         | Valida condition | Valida advantage |
|-----------------|------------------|------------------|
| User time       | Serial           | 304.135          |
| User time       | Parallel         | 122.78           |
| Wall clock time | Serial           | 11.578           |
| Wall clock time | Parallel         | 53.47            |
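The ratios can be re-derived from the published means as the SP1 mean divided by the corresponding Valida mean. The Python sketch below does this; the dictionary layout and the function name `valida_advantage` are illustrative choices, not part of the benchmark tooling. Recomputing from the rounded means gives about 53.5 for the parallel wall clock case, slightly different from the 53.47 in the table, presumably because the table was computed from unrounded means (an assumption).

```python
# Mean timings in seconds, copied from the results table in this note.
MEANS = {
    "user": {"valida_serial": 1.1237, "valida_parallel": 2.7835, "sp1": 341.757},
    "wall": {"valida_serial": 1.151, "valida_parallel": 0.249, "sp1": 13.326},
}

def valida_advantage(measure, condition):
    """Ratio of the SP1 mean to the Valida mean; > 1 means Valida is faster."""
    row = MEANS[measure]
    return row["sp1"] / row[f"valida_{condition}"]

for measure in ("user", "wall"):
    for condition in ("serial", "parallel"):
        ratio = valida_advantage(measure, condition)
        print(f"{measure:4s} {condition:8s} {ratio:.2f}")
```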

Methodology

The following zk-VM versions were used:

Valida: git commit https://github.com/valida-xyz/valida/commit/54f3f5a0c8a499569d0f8c22ec37b8d73bb0631d
SP1: git commit https://github.com/succinctlabs/sp1/commit/f50fb1c3b11993fa926c128b1d38db7af969ef51

The following version of the Valida LLVM compiler was used: 45bce621680189d5d006f88cbadbe9cbef403b89

The C implementation of SHA-256 which was compiled to run on Valida is a modified version of the reference implementation of SHA-256 by Brad Conte. That modified version is available here: https://github.com/lita-xyz/valida-c-examples/blob/main/sha256_32byte_in.c

The Rust implementation of SHA-256 which was compiled to run on SP1 is available in this repository. The SHA-256 library points to SP1's patched crate, which calls their precompiles under the hood.

The inputs for both Valida and SP1 are the same and are a 32-byte array of 5s.

The following commands were run to execute the benchmarks:

For Valida:

# excluded in the benchmarking time
./llvm-valida/build/bin/clang -c -target delendum ../../valida-c-examples/sha256_32byte_in.c -o ./buildValidaTests/sha256_32byte_in.o
./llvm-valida/build/bin/ld.lld --script=./llvm-valida/valida.ld -o ./buildValidaTests/sha256_32byte_in.out ./buildValidaTests/sha256_32byte_in.o

# included in the benchmarking time, run as a bash script and timed
RAYON_NUM_THREADS=32 ./valida/target/release/valida run ./buildValidaTests/sha256_32byte_in.out ./buildValidaTests/sha256_32byte_in.log
RAYON_NUM_THREADS=32 ./valida/target/release/valida prove ./buildValidaTests/sha256_32byte_in.out ./buildValidaTests/sha256_32byte_in.proof
RAYON_NUM_THREADS=32 ./valida/target/release/valida verify ./buildValidaTests/sha256_32byte_in.out ./buildValidaTests/sha256_32byte_in.proof

The above commands show multi-threaded execution; for single-threaded execution, 32 is replaced with 1.

Note that Valida currently fails to verify this program, but the output was checked to be correct by examining the log file. We are working on fixing this problem, but the verification time should not have a meaningful impact on the results.

For SP1, first we built the program in the program directory with the following command:

git clone git@github.com:lita-xyz/benchmarks.git
cd SP1SHA256/program/
cargo prove build

Then we ran the following command in the script directory:

time cargo run --release

To measure the run time of a program, this study used GNU time. The test system has an AMD Ryzen 9 7950X 16-Core Processor, with 32 threads, and 124.9 GB of DDR5 RAM. During the tests, no other programs were running on the system. Tests were performed sequentially, one after another.
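For readers who want to collect comparable user time and wall clock measurements without GNU time, the following is a minimal Python sketch. It is not the harness used in this study: the function name `time_command` is hypothetical, the `resource` module is Unix-only, and the command being timed is a stand-in to be replaced with the actual prover invocation.

```python
import os
import resource  # Unix-only
import subprocess
import sys
import time

def time_command(argv, threads=None):
    """Run argv once; return (wall_seconds, user_seconds) for the child."""
    env = dict(os.environ)
    if threads is not None:
        # Same environment variable as in the Valida commands in this note.
        env["RAYON_NUM_THREADS"] = str(threads)
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    start = time.perf_counter()
    subprocess.run(argv, env=env, check=True)
    wall = time.perf_counter() - start
    # User CPU time accumulated by waited-for child processes during the run.
    user = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime - before
    return wall, user

# Stand-in command for illustration; substitute the real prover command.
wall, user = time_command([sys.executable, "-c", "pass"], threads=1)
print(f"wall={wall:.3f}s user={user:.3f}s")
```

User time sums the CPU time across all threads, so it can exceed wall clock time for a multi-threaded run, as it does for the parallel Valida measurements.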

For some unknown reason, running cargo run --release on the test system caused the host program to be recompiled every time, which is not supposed to happen. This made it a little harder to measure the execution time of the SP1 prover. The build time listed in the output was therefore subtracted from the recorded time so that only the execution, proving, and verifying times are counted.
