Fibonacci (vs. SP1)

As of August 2024

Results

Iterative Fibonacci

The following tables records the means, medians, and standard deviations of the various sample groups. All measurements are denoted in seconds.

The following table displays the ratios between the mean measurements for Valida and SP1 of the user time and the wall clock time. A number greater than 1 indicates that Valida is faster; a number less than 1 indicates that SP1 is faster.

Analysis

Serial Valida was more efficient (user time) than SP1 and parallel Valida, under all conditions. Parallel Valida was faster (wall clock time) than SP1 and serial Valida, under all conditions.

The wall clock and user time advantages of parallel Valida over SP1 declined as n increased, but remained significant in all of the problem sizes studied. The same can be said of the user time advantage of serial Valida over SP1. The reason for this is because the increase in n increases proving time only slightly for both VMs. However, because Valida is so fast, the slight increase is of a higher percentage than the rest of the proving time. For SP1, it takes longer to prove the program and thus a slight increase increases the proving time in a lesser proportion.

The most dramatic advantages observed were in the serial user time. This indicates that serial Valida is a particularly efficient prover, in terms of energy consumption / computing resource usage.

Limitations

This study only looked at executions of one example. Looking at a wider variety of algorithms would provide much better insight into the relative performance of these two systems. This is true because the ISAs and compilers used in the three study groups (namely Valida and SP1) are substantially different from one another. Hypothetically, the cost of proving is roughly a function of the size of the trace, and which specific instructions are executed does not make a big difference. To the extent that hypothesis is true and accurate, then testing a single program may provide adequate insight into the cost of proving as long as traces of a wide enough variety of lengths are considered.

This study only looked at proof generation on a single machine. This does not tell how these proof systems would perform on different machines with different processors, amounts of RAM, etc. It also does not provide insight into how these proof systems might perform with different implementations on specialized vector processing hardware such as GPUs or Fabric Cryptography’s VPUs.

The Valida implementation is incomplete in that the STARK is under-constrained. It is possible that the changes required to make the STARK fully constrained will substantially affect the performance characteristics of the prover. We as Lita Foundation do not expect a dramatic change in the performance of the Valida prover to result from the changes required to make the STARK fully constrained, but this risk exists.

This study did not compare to SP1 in a single threaded proving mode, but only to SP1 in a multi threaded proving mode. It would be interesting to see if serial SP1 proving is more efficient (user time) than parallel SP1 proving, and if so by how much.

So far, minimal attention has been given to the efficiency of the code generated by the Valida compiler backend. A lot of effort will need to be made to optimize the efficiency of the code generated by the Valida compiler. However, even given this the compiled program still performs significantly better than SP1.

This study used a small sample size and ran only the most rudimentary of statistics over the raw data. This type of study cannot be expected to give a full statistical picture of the underlying population being sampled from, especially with regard to outliers.

See Section 1.5 (“Accuracy”) of the GNU time manual for limitations on the accuracy of GNU time. This manual may be accessible on your system by running info time in the shell (if you have GNU time installed on your system).

Because Valida currently only supports 32 bits integers, n cannot go higher than 47 in this study, although SP1 supports up to 128 bits integers. The programs compared, besides being in different languages, are also notably different in that the SP1 example uses 128-bit integers, whereas the Valida example uses 32 bit integers. This is a possible confounder which could help to explain the different results for Valida vs SP1 in this study. However, using 32-bit integers in place of 128-bit integers in the SP1 example did not produce significantly different results for SP1, in the small amount of testing that Lita did on this possible confounder.


Methodology

Scope of benchmarks: proving systems & test program

These benchmarks compare the proving performance of Valida to that of SP1 and itself. They do a three-way comparison between (a) the Valida prover running in single-threaded mode, (b) the Valida prover running in multi-threaded mode, and (c) SP1 running in multi-threaded mode.

These benchmarks compare iterative implementations of the Fibonacci sequence.

This is the C source code for the iterative benchmark:

int main () {
  int n = 25;
  int a = 0;
  int b = 1;
  int sum;
  unsigned i;
  for (i = 1; i < n; i++) {
    sum = a + b;
    a = b;
    b = sum;
    };
  return 0;
}

The values in the first line of the main function body are replaced with different values in different instances in order to test the proving performance on problems of different sizes. This is the Rust source code for the same example:

//! A simple program to be proven inside the zkVM.

#![no_main]
sp1_zkvm::entrypoint!(main);

pub fn main() {
    // NOTE: values of n larger than 186 will overflow the u128 type,
    // resulting in output that doesn't match fibonacci sequence.
    // However, the resulting proof will still be valid!
    let n = 46;//sp1_zkvm::io::read::<u32>();
    let mut a: u128 = 0;
    let mut b: u128 = 1;
    let mut sum: u128;
    for _ in 1..n {
        sum = a + b;
        a = b;
        b = sum;
    }

    sp1_zkvm::io::commit(&a);
    sp1_zkvm::io::commit(&b);
}

This Rust code is specialized for compilation to run as a SP1 program. Ideally, this study would have used the same source program to run on both Valida and SP1, in order to form a more direct comparison. However, SP1 currently only supports Rust but the Valida compiler has only limited support for Rust at the moment, the most direct readily available comparison was to run two programs implementing the same algorithm, one in Rust and one in C.

The two programs have exactly the same executions, except that the C program does not write out the results of a and b but the Rust program does. This is because at the time of writing, Valida does not support writing the outputs to stdout in C. However, SP1's IO is not separable from how it commits data. To ensure fairness, the time used to print two lines was estimated from the mean of 10 recorded time differences with the printing included and not, using Jolt's equivalent Rust program. The time used to print was found to be negligible. See the raw data for details.

Metrics

There are two key timing metrics this study pays attention to: user time and wall clock time. User time is equal to the amount of computer time expended by the program code. Wall clock time refers to the amount of time elapsed from the start to the finish of the computation. If, for example, a program fully utilizes 5 cores or threads for 5 seconds, that would be 25 seconds of user time and 5 seconds of wall clock time.

It’s clear that wall clock time is interesting as a metric because it’s related to how long a user needs to wait for their proof. It’s assumed that some users demand fast delivery of proofs.

User time is interesting as a proxy for the energy consumed by running a computation. In a situation where all cores are fully utilized almost all the time, the energy consumption of a CPU should remain relatively constant. In this situation, the user time spent by a CPU-bound, predominantly user-mode code execution should be roughly linearly proportional to the number of watt-hours (i.e., energy) consumed by that code execution.

Economizing on energy consumption, or the usage of computing resources, should be interesting from the point of view of any proving operations which have enough scale that their energy bill (or their cloud compute bill) for proving is a considerable operational cost. For these reasons, user time is an interesting metric. Lower costs of proving allow service providers to pass on lower proving prices to cost-sensitive customers.

Procedure

The steps for generating a proof of execution are specific to the zk-VM and were taken directly from the official documentation for each respective zk-VM. The following zk-VM versions were used:

See the official documentation for each zk-VM for exact details, and below for a brief summary of the steps to generate a proof of execution and measure its user time and wall clock time.

Build the Valida compiler toolchain from source. The process of doing this is described in the section titled Building and running the Valida backend. This includes building clang (the C compiler), lld (the linker).

To prove the execution of the Rust Fibonacci program in SP1, install sp1up by following the install instructions. Follow the quick start instructions. Replace the contents of program/src/main.rs with the Rust Fibonacci program displayed above. Compile the Fibonacci program using cargo build --release then run it

In order to measure the run time of a program, this study used GNU time. The test system was a z1d.metal AWS instance with 48 vCPUs and 384GB of RAM. During the tests there were no other programs running on the system other than the tests themselves, and the AWS instance was bare-metal to avoid interference from other VMs running on the same hypervisor. Tests were performed sequentially, one after another.

Tests performed & statistics ran

The tests include n = 25 and 46. Ten samples were taken for each of the following groups: Valida (single-threaded, set with RAYON_NUM_THREADS=1), Valida (multi-threaded, set with RAYON_NUM_THREADS=32), and Rust on SP1. Each sample included measurements of the user time and wall clock time. For each group of 30 (Valida) and 5 (SP1) samples, the means, medians, and standard deviations of the user time and wall clock time were computed. The ratio of the mean user time for SP1 over the mean user time for Valida was computed for each of the two conditions in which Valida was run (single-threaded and multi-threaded). Same for the wall clock time.

Last updated