# Evaluating zk-VMs

What are the criteria by which we should evaluate zk-VMs? In other words, when should we say that one zk-VM is better than another? The answer to this depends on the use case, but it’s possible to give some generally useful answers, because the main criteria that matter don’t vary a lot between different use cases. This note proposes that zk-VMs should be evaluated based on the following criteria: correctness, security, zero knowledge, trust assumptions, other assumptions, efficiency, speed, and succinctness.

A zero knowledge virtual machine (zk-VM) is a software system for proving the results of computations using short, easy to verify proofs which do not reveal the inputs to those computations. A zk-VM usually consists mainly of:

an execution engine, which runs computations in the zk-VM’s model of computation;

a prover, which proves the results of computations; and

a verifier, which checks proofs of the results of computations.

## Correctness

Correctness means that the execution engine implements the intended semantics for the model of computation, and that the prover and verifier actually satisfy the claimed security properties. The main security properties that a zk-VM should satisfy are soundness, completeness, and zero knowledge:

Soundness means roughly that the verifier does not accept purported proofs of statements that are false. In other words, the verifier does not accept that a computation has a claimed result unless some inputs can be provided to the program being executed which cause the program to produce the claimed result.

Completeness means roughly that for any computation for which the prover can purportedly prove the result, the prover can in fact produce a proof of that computation’s result, which the verifier will accept.

Zero knowledge means roughly that having a proof of the result of a computation is not more helpful in deducing the inputs which produced that result, as compared to simply knowing the result of the computation.

## Security

Security means roughly the tolerances that are claimed by the correctness properties. Soundness, zero knowledge, and completeness all can have non-zero tolerances. A “tolerance” in this context is the maximum tolerable probability that the property will fail. Zero tolerances are of course the ideal, but zk-VMs do not achieve zero tolerances on all of these properties in practice. Perfect soundness and completeness appear to be incompatible with succinctness, and there are no known methods for achieving perfect zero knowledge. A common way of measuring security is in bits of security, where a tolerance of 1/(2^n) is referred to as n bits of security. More bits of security is better.

If a zk-VM is perfectly correct, that does not necessarily imply that it is secure. Correctness only implies that the zk-VM satisfies its security properties up to the claimed tolerances. It does not imply that the claimed tolerances are low enough for comfort. Also, if a zk-VM is sufficiently secure, that does not imply that it is correct; security refers to the claimed tolerances, not the tolerances that are actually achieved. Only when a zk-VM is both perfectly correct and sufficiently secure can it be said that the zk-VM is reliable up to the claimed tolerances.

## Zero knowledge

Not all zk-VMs are zero knowledge. As of this writing, it appears that most zk-VMs do not have the zero knowledge property. This is a bit confusing; it would be more accurate to use "succinct VMs" to refer to succinct execution provers without the zero knowledge property, but in keeping with common practice, these docs apply the term "zk-VM" whether the system has the zero knowledge property or not. The presence of the zero knowledge property is another criterion by which we can evaluate zk-VMs.

## Trust assumptions

Trust assumptions are the assumptions about other people that one must make in order to reach the conclusion that a zk-VM is reliable. When a zk-VM has trust assumptions, then it is necessary to trust in the integrity of some individuals in order to reach the conclusion that it functions reliably. For example, an honest majority trust assumption states that out of some set of N individuals, more than N/2 of those individuals exhibited integrity in some particular interaction(s) with the system. A 1 out of N trust assumption states that out of some set of N individuals, at least one of those individuals exhibited integrity in some particular interaction(s) with the system.

Some zk-VMs have no trust assumptions. When zk-VMs have trust assumptions, this usually takes the form of a trusted setup process. A setup process for a ZK proof system runs once, before the first proof is generated using the proof system, in order to generate some information called “the setup data.” In a trusted setup process, one or more individuals generate some randomness which gets incorporated into the setup data, and it needs to be assumed that at least one of those individuals deleted the randomness which they incorporated into the setup data.

## Other assumptions

Arguments for the correctness of a zk-VM always make assumptions. These may include trust assumptions, but they also include other assumptions, such as assumptions about the adversary, or assumptions about the difficulty of certain problems.

Classical, i.e. non-probabilistic, proof theory is subject to impossibility results which show that classical proof theory cannot be succinct. One such result is Gödel's speed-up theorem. This is the motivation for probabilistic proof theory: unlike classical proof systems, probabilistic proof systems can be succinct.

A proof system being probabilistic implies that there is some possibility, however remote, that a valid proof's conclusion is not true. In order to reach the conclusion that such a possibility is sufficiently remote, some assumptions need to be made about the operational environment. For example, given an adversary who can run an unlimited amount of computations and collect their results in a short amount of time, a probabilistic proof theory would not provide assurances, because by brute force (guess and check), the adversary could quickly discover a proof of a false conclusion whenever such a proof can exist. It seems reasonable to assume that such an adversary cannot exist, but this can be noted as an assumption underlying the security arguments for a probabilistic proof theory.

Security arguments for probabilistic proof systems tend to make assumptions about the adversaries, such as the assumption of computationally bounded adversaries. They also tend to make assumptions about the hardness of certain problems; for example:

that a certain hash function is collision resistant;

that computing discrete logarithms in a certain subgroup of an elliptic curve group is hard.

Probably the most notable distinction between different succinct proof systems, in terms of assumptions about the adversary or the hardness of certain problems, is the distinction between plausibly post-quantum (PPQ) proof systems and non-PPQ proof systems. A PPQ proof system is plausibly secure (has no known exploits) if we assume an adversary who has access to a powerful quantum computer. A non-PPQ proof system has known exploits if we assume an adversary who has access to a powerful quantum computer.

It is widely believed that no powerful quantum computers exist; certainly none are commercially available. The quantum computers which the general public knows about are not powerful enough to break succinct proof systems. However, it is known that in theory, sufficiently powerful quantum computers would be able to break succinct proof systems which assume that computing discrete logarithms is hard. Quantum computing theory posits that Shor's algorithm would be an efficiently quantumly computable algorithm for computing discrete logarithms on a sufficiently powerful quantum computer.

Broadly speaking, elliptic curve cryptography based proof systems tend to assume that computing discrete logarithms is hard. Therefore, elliptic curve based proof systems tend to be non-PPQ. STARKs, and more generally FRI-based proof systems, tend to be PPQ, without much in the way of assumptions, beyond the assumption that the hash function used in FRI is collision resistant.

## Speed

Speed means the amount of time it takes for the prover to run. Lower is better. Speed is a metric of interest for latency-sensitive applications, where it matters how soon the proof is available after the proof is requested.

To operationalize the definition of speed, so that speed can be measured, the definition has to be made relative to one or more test programs, one or more test inputs to each of those programs, and one or more test systems.

## Efficiency

Efficiency means the amount of resources the prover consumes. Lower is better. The prover consumes two kinds of resources: core-time and space. Efficiency can therefore be subdivided into core-time efficiency and space efficiency.

“Space” refers to information storage capacity, such as in RAM. “Core-time” refers to the average amount of time spent in the prover over all cores running the prover, multiplied by the number of cores running the prover. For a single-core prover, the core-time consumption and the speed are the same thing. For a multi-core capable prover running in multi-core mode on a multi-core system, core-time consumption and speed are not the same thing.

Both core-time and space efficiency are related to the financial cost of proving. Both kinds of efficiency are related to the energy consumption of the proving process and the amount of capital utilized by the proving process.

To operationalize the definition of efficiency, so that efficiency can be measured, the definition has to be made relative to one or more test programs, one or more test inputs to each of those programs, and one or more test systems.

## Succinctness

Succinctness means the size of the proofs generated by the prover, and the complexity of verifying them. The complexity of verifying proofs can be further subdivided into the time and space complexity of verifying proofs. Verifying is usually a single-core operation, so there is usually no need to distinguish between speed and core-time efficiency with respect to verifying. It is therefore useful to think of succinctness as a group of three distinct metrics: proof size, proof verification time complexity, and proof verification space complexity.

As with speed and efficiency, operationalizing the definition of succinctness requires specifying sets of test programs, test inputs, and test systems.

## Conclusions

When evaluating zk-VMs for mission critical applications, correctness and security should be considered as the baseline. There needs to be sufficient reason to be confident about the correctness, and sufficiently strong claimed security. Also, the trust assumptions need to be sufficiently weak for the application. Without all of these properties, the zk-VM is potentially worse than useless for the application, as it may fail to perform as specified and expose users to hacking.

Speed, efficiency, and succinctness are all sliding scale properties. How they should be weighted in an evaluation depends on the application. It tends to be that the fastest solution is not the most efficient, or the most succinct; the most succinct solution is not the fastest or the most efficient; and so forth. A good way of weighting these properties in an evaluation may be to define what are acceptable, good enough levels for each property, and then to define which properties are most important. The most important properties should be optimized, subject to maintaining good enough levels on all of the other properties.

Lita’s market research suggests that for most commercial use cases, the most important properties, out of speed, efficiency, and succinctness, are either speed or core-time efficiency, depending on the application. Some applications are price sensitive and want to optimize for low energy consumption and low use of capital in proving; for these, core-time efficiency is probably the most important metric to optimize. Other applications are latency sensitive and want to optimize for speed.

We as Lita believe that the Valida zk-VM offers the best set of trade-offs between speed, efficiency, and succinctness, for the needs of many commercial applications. Valida is designed to optimize for speed and core-time efficiency, subject to keeping succinctness and space efficiency reasonably good. Valida compares favorably with RISC Zero, Succinct SP1, and Jolt in Lita’s benchmarking studies.

In general, single-core computation tends to be more core-time efficient than multi-core computation. This is because multi-core computation usually entails some communication overhead to coordinate the activities of the different cores. Valida takes advantage of this and is capable of running in single-core mode or multi-core mode. In single-core mode, Valida achieves its maximum core-time efficiency. In multi-core mode, Valida achieves its maximum speed for a given test environment.

Using known methods, Valida’s prover could be faster and more core-time efficient than it is, but not without sacrificing succinctness. Valida currently uses a FRI polynomial commitment scheme. If instead Valida used a Brakedown polynomial commitment scheme, then it would achieve faster, more efficient proving, but it would achieve less succinctness, and probably not enough succinctness for most applications. We as Lita continue to keep abreast of the latest research in zero knowledge cryptography, with the intention to stay at the leading edge of speed and efficiency in zk-VM design and implementation.

Valida does not require any trust assumptions. Valida has some known issues with correctness, and its security level is not known. The Lita team is working to address the known issues with Valida’s correctness and to flush out any unknown correctness issues. Quantifying Valida’s security is a problem which Lita plans to address as part of future work to rigorously and formally check the correctness of Valida.

For some use cases, Valida is not succinct enough. For purposes which require a zk-VM proof to be verified on a layer 1 blockchain, the computational and space constraints of the layer 1 blockchain may require or incentivize greater succinctness than what Valida provides. For these use cases, Lita plans to create a Groth16-based wrapper for Valida proofs. The wrapper would take as input a Valida proof, and produce a Groth16 proof of the same statement. This would provide greater succinctness than Valida itself offers, at the cost of speed and efficiency, as well as the additional cost of introducing a trusted setup process.

Last updated