LLVM-Valida Compiler
This section is mostly written by Michael Liao.
The goal of Lita’s compiler work is to let programs written in mainstream languages be compiled to run in the Valida zk-VM. This work leverages the LLVM compiler infrastructure.
The LLVM compiler infrastructure is a collection of libraries and programs for compiling, linking, and performing related tasks, from a variety of source languages to a variety of target ISAs and executable file formats. It is built around LLVM IR, a language designed as an intermediate target for compilation. The name “LLVM” originally stood for “Low Level Virtual Machine,” although the project no longer treats it as an acronym. The compilation strategy of the LLVM compiler infrastructure is to compile the source languages to LLVM IR, and then compile LLVM IR to the target ISAs.
The advantages of this approach are clear for a compiler that supports many source languages and many target ISAs. With a common intermediate representation, one needs to write just one compiler frontend per source language (or family of source languages) and just one compiler backend per target ISA (or family of target ISAs), rather than a separate compiler for every pairing of language and ISA. One challenge of this approach is that the intermediate representation must be chosen carefully: it has to be both a good target language for efficient compilation from all of the supported source languages, and a good source language for efficient compilation to all of the supported target ISAs.
The LLVM compiler infrastructure is an old, mature, and well regarded open source project. The project was started before June 2001, and its source repository contains over 5 million lines of C++ code. LLVM won the ACM Software System Award in 2012. Alongside GCC, LLVM is one of the most mature open source compiler toolchains for C and C++. LLVM is also the basis of the standard Rust compiler toolchain.
Because Valida is optimized for efficient zk-VM implementation, it differs significantly from most ISAs targeted by compilers. One example is the treatment of registers. In silicon, accessing RAM takes significantly more time than accessing registers. In Valida’s zk-VM prover design, by contrast, accessing registers and accessing RAM cost the same. For this reason, Valida has no general-purpose registers; instead, instructions access RAM directly, using a stack machine architecture.
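To make the register-less design more concrete, here is a minimal sketch of a toy interpreter in which every operand is a memory cell addressed relative to a frame pointer. The opcode names, the instruction layout, and the function `toy_run` are hypothetical and chosen only for illustration; they are not the actual Valida instruction set or encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; not the actual Valida ISA. */
enum toy_op { TOY_ADD, TOY_MUL, TOY_STOP };

/* Every operand is an offset from the frame pointer into RAM.
 * There is no register file anywhere in this machine. */
struct toy_instr {
    enum toy_op op;
    uint32_t dst; /* offset of the cell that receives the result */
    uint32_t lhs; /* offset of the first input cell */
    uint32_t rhs; /* offset of the second input cell */
};

/* Execute instructions until TOY_STOP, reading and writing RAM directly. */
static void toy_run(const struct toy_instr *prog, uint32_t *ram, uint32_t fp)
{
    for (size_t pc = 0; prog[pc].op != TOY_STOP; pc++) {
        const struct toy_instr *i = &prog[pc];
        uint32_t a = ram[fp + i->lhs];
        uint32_t b = ram[fp + i->rhs];
        ram[fp + i->dst] = (i->op == TOY_ADD) ? a + b : a * b;
    }
}
```

In this sketch, what a conventional backend would keep in a register is simply another slot in the current stack frame, which is the kind of mapping a compiler for a register-less target has to perform.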
The compiler uses the LLVM project to generate optimized intermediate representation (IR) from various frontends.
Current plans are to support C, C++, and Rust.
The backend then compiles the LLVM IR to Valida assembly.
Implementing a Valida LLVM backend poses some challenges, due to the peculiarities of Valida. LLVM is designed around certain assumptions about its targets, and not all of those assumptions hold for Valida, so there is some impedance mismatch between Valida and LLVM that needs to be worked around. Some of these challenges have been surmounted, and some remain to be solved. For instance, LLVM assumes that there are general-purpose registers, but Valida has none. As another example, LLVM assumes that it can emit static data into the object code, which the program can then read. That is not the case in Valida: it has separate address spaces for RAM and program ROM, and the program ROM address space can be read only by the instruction decoder and not in any other way.
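The static data mismatch can be illustrated with ordinary C. The sketch below first shows a common idiom, a constant lookup table, which compilers for typical targets place in a read-only data section of the object file. It then shows one hypothetical alternative in which the same values are produced by ordinary stores into RAM at startup. All names here are invented for illustration, and this is not a description of how the Valida backend actually handles static data.

```c
#include <stdint.h>

/* Common idiom: a constant table that a typical backend emits as
 * static data, to be read by the program at run time. */
static const uint8_t squares_static[4] = {0, 1, 4, 9};

uint8_t square_from_static(unsigned i)
{
    return squares_static[i & 3u];
}

/* Hypothetical alternative: keep the table in RAM and fill it with
 * ordinary store instructions before first use. */
static uint8_t squares_ram[4];

void init_squares(void)
{
    for (unsigned i = 0; i < 4; i++)
        squares_ram[i] = (uint8_t)(i * i);
}

uint8_t square_from_ram(unsigned i)
{
    return squares_ram[i & 3u];
}
```

Both lookups return the same values once `init_squares` has run. The difference is where the data lives: the first relies on the program being able to read data emitted alongside its code, while the second only ever reads cells that the program itself wrote to RAM.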
As of this writing, the LLVM compiler backend for Valida is capable of compiling some basic C programs. Support is planned for most of standard C, C++, Rust, and Go. Support is not planned for programs which use floating point. Support is planned for standard I/O functions, with the input tape represented by stdin, and the output tape represented by stdout. No support is planned for any I/O functions beyond reading stdin and writing stdout and stderr. Programs running in Valida will not be able to ask for the current time, because it makes no sense for a program executing in a non-interactive proof to ask this. The Valida VM in its default configuration has no I/O peripherals except for the input tape and output tape. It would be nice to add an additional debug output tape, which could be represented by stderr, and implemented only when compiling in debug mode.
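As a sketch of the kind of program this I/O model is meant to accommodate, the function below reads bytes from an input stream and writes a transformed copy to an output stream, using only integer operations. The function name `upcase_tape` is hypothetical, and the example assumes that the planned standard I/O support (here, `getc` and `putc`) is available; it has not been checked against the Valida toolchain.

```c
#include <stdio.h>

/* Copy every byte from `in` to `out`, upper-casing ASCII letters.
 * Returns the number of bytes copied. Uses no floating point, no
 * files, no clock, and no I/O other than the two streams. */
unsigned long upcase_tape(FILE *in, FILE *out)
{
    unsigned long count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c >= 'a' && c <= 'z')
            c -= 'a' - 'A';
        putc(c, out);
        count++;
    }
    return count;
}
```

A `main` function that calls `upcase_tape(stdin, stdout)` would, under the mapping described above, consume the input tape and produce the output tape.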
The Valida VM has no ability to read or write files, communicate via network sockets, and so forth. Programs running in the Valida VM have no ability to make system calls, because there is no operating system to call out to. There is no parallel computing or memory protection in Valida.
In short, Valida is a bare metal computing environment, such as is seen in the smallest of embedded systems. Valida programs run “on bare metal,” which means that they run with no operating system and no memory protection, so there are no guard rails in place to prevent or catch segmentation faults.
To run in such an environment, programs frequently need to be modified. Not all of libc will compile to run in a bare metal computing environment; in particular, those functions in libc which make system calls will not. LLVM’s included libc can be compiled for bare metal, and in this compilation mode it excludes the standard library functions which make system calls. Currently, most of LLVM’s bare metal libc compiles to Valida, with the floating point functions and the functions that make system calls excluded. Future work is planned to make standard I/O in LLVM’s libc work when compiling for Valida.
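One common kind of modification, given that floating point is excluded, is to replace floating point arithmetic with fixed-point integer arithmetic. The following is a minimal sketch of a Q16.16 fixed-point type. The names `q16_16`, `Q16_ONE`, `q16_from_int`, and `q16_mul` are hypothetical, and the sketch assumes that 64-bit integer arithmetic is available for the intermediate product, which has not been verified for Valida.

```c
#include <stdint.h>

/* Q16.16 fixed point: the upper 16 bits hold the integer part and
 * the lower 16 bits hold the fraction. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)

/* Convert a small integer to fixed point (overflows beyond 16 bits). */
static q16_16 q16_from_int(int32_t x)
{
    return (q16_16)(x * Q16_ONE);
}

/* Multiply two fixed-point values using a 64-bit intermediate, then
 * rescale by dividing by 2^16 (truncating toward zero). */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) / Q16_ONE);
}
```

For example, `q16_mul(q16_from_int(3), Q16_ONE / 2)` yields the fixed-point representation of 1.5, with no floating point instructions or library calls involved.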
Once completed, this work will let people compile programs written in supported languages (initially, C/C++ and Rust) to Valida machine code which can be run on the Valida zk-VM. This will make it practical to use the Valida zk-VM for applications which require succinct and/or zero knowledge proving. We expect this to provide significant performance advantages, leading to cost savings for large scale applications of (zk-)SNARKs, and opening up use cases which are not currently viable due to the computational complexity of creating (zk-)SNARKs.
Lita’s Valida compiler toolchain is not yet publicly available, but we are open to providing early access to select partners. At present the toolchain works for some very basic programs, but quite a few open issues need to be addressed before it will be viable for applications.