Technical Design

The Design of the Valida LLVM-based Backend

Challenges

Unlike conventional register-based CPU designs, the zkVM does not operate on general-purpose registers. Instead, instructions refer to variables local to the call frame, i.e. addressed relative to the current frame pointer fp. This difference creates a significant mismatch with the LLVM compilation framework, especially with LLVM IR and the machine IR used in the retargetable LLVM code generator.

LLVM IR is a low-level, RISC-like virtual instruction set. Although it resembles real RISC ISAs, LLVM IR does not use a fixed set of named registers; it uses an infinite set of temporaries named with a % character, i.e. LLVM values. Within the retargetable code generator, or backend, LLVM values are mapped onto virtual registers and then assigned to physical registers by the register allocator.
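
For instance, in the short fragment below (the value names are made up for illustration), every intermediate result receives a fresh %-named value, and the backend later maps these values onto registers:

%t0 = mul i32 %x, %x
%t1 = mul i32 %y, %y
%t2 = add i32 %t0, %t1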

This mismatch between the zkVM and LLVM poses challenges for a backend design that aims to generate high-quality zkVM code.

  • Without optimization, the zkVM matches the original C programming model, where all auto variables reside on the local stack, i.e. the call frame. Once optimizations are turned on, however, those auto variables are promoted to LLVM values in SSA form. Most compiler optimizations benefit from the SSA value representation and can perform static data-flow analysis efficiently. For operations on memory, most optimizations must take a conservative approach and rely heavily on alias analysis to ensure that memory operations do not overlap each other and break the transformation. (See the sketch after this list.)

  • The machine IR (MIR) within the backend has a similar representation, replacing LLVM values with virtual registers. Backend optimizations face the same situation as the middle-end LLVM optimizations.
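
As a small illustration (the function names here are hypothetical), the unoptimized IR keeps a local auto variable in memory via an alloca, matching the zkVM frame-variable model, while mem2reg/SROA promotes it to an SSA value:

define i32 @sum_local(i32 %v0, i32 %v1) {
  ; -O0 shape: the local resides in an alloca (a frame slot)
  %a = alloca i32, align 4
  %s = add i32 %v0, %v1
  store i32 %s, ptr %a, align 4
  %r = load i32, ptr %a, align 4
  ret i32 %r
}

define i32 @sum_local.opt(i32 %v0, i32 %v1) {
  ; after mem2reg/SROA: the alloca is gone and %s is a plain SSA value
  %s = add i32 %v0, %v1
  ret i32 %s
}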

Virtual Register vs Stack Object

Even though zkVM instructions only access variables on the call frame, virtual registers can still be introduced as an intermediate representation and lowered onto call-frame variables after all necessary optimizations. Each native zkVM instruction has one or more internal representations in which some or all operands access virtual registers. For example, the following instruction, which copies the first parameter value stored at 12(fp) to a local frame variable at -4(fp),

sw -4(fp), 12(fp)

could be represented in an alternative form as

sw %r0, 12(fp)

where %r0 is a virtual 32-bit register that will be allocated or assigned to -4(fp) once all value-based or register-based optimizations have been performed on the program code.

With the introduction of virtual registers, the zkVM backend can match the LLVM compiler infrastructure cleanly.

New Backend Design

Pseudo Instruction

For each native instruction, one or more pseudo instructions are added by replacing one or more frame-variable operands with virtual register operands. For example, besides the native add instruction (denoted ADDfff, which specifies all operands as local frame variables), there are alternative pseudo forms such as

  • ADDrrr specifies all operands as virtual register operands.

  • ADDrrf specifies the destination and the first source operand as virtual registers but the second source operand as a frame variable.

  • ADDfrr specifies the destination operand as a frame variable but both source operands as virtual registers.

  • etc.

In total, there are 8 add instructions (including the native one), each specifying a different combination of operand forms. The reason all combinations are provided is discussed later.
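
Following this naming convention (the first letter gives the destination operand and the remaining two letters the source operands, with f for a frame variable and r for a virtual register), the full set of forms is:

ADDfff, ADDffr, ADDfrf, ADDfrr,
ADDrff, ADDrfr, ADDrrf, ADDrrr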

Instruction Selection

Compared to SelectionDAG-based instruction selection (SDAG), global instruction selection (GISel) is still not mature enough. Several targets have basic GISel support, but none of them uses it by default. Because pseudo instructions referencing virtual registers have been added, either instruction selector could be used directly; SDAG is chosen for simplicity. During instruction selection, only instruction forms referencing virtual registers are selected, as they match the current code generation design. There are several exceptions:

  • Accesses to formal arguments or parameters refer to frame objects.

  • Accesses to return values refer to frame objects.

  • The call frame setup accesses frame objects.

The following example LLVM IR

define i32 @t0(i32 %v0, i32 %v1) {
  %r0 = add i32 %v0, %v1
  ret i32 %r0
}

is translated into

Frame Objects:
  fi#-3: size=4, align=4, fixed, at location [SP+4]
  fi#-2: size=4, align=4, fixed, at location [SP+16]
  fi#-1: size=4, align=4, fixed, at location [SP+12]

bb.0 (%ir-block.0):
  %0:r32 = LWrf %fixed-stack.1, 0 :: (load (s32) from %fixed-stack.1)
  %1:r32 = LWrf %fixed-stack.2, 0 :: (load (s32) from %fixed-stack.2)
  %2:r32 = ADDrrr killed %1:r32, killed %0:r32
  SWfr %fixed-stack.0, 0, killed %2:r32 :: (store (s32) into %fixed-stack.0)
  RET

Virtual Register Lowering

After instruction selection, the program is represented almost entirely in final instructions, but in their virtual-register forms. At a certain point in the compilation pipeline, all virtual registers must be replaced with stack objects, and each instruction referencing virtual registers must be rewritten into the corresponding form with one or all of its virtual register operands replaced. The following example shows the MIR after virtual register lowering:

Frame Objects:
  fi#-3: size=4, align=4, fixed, at location [SP+4]
  fi#-2: size=4, align=4, fixed, at location [SP+16]
  fi#-1: size=4, align=4, fixed, at location [SP+12]
  fi#0: size=4, align=4, at location [SP]
  fi#1: size=4, align=4, at location [SP]
  fi#2: size=4, align=4, at location [SP]

0B	bb.0 (%ir-block.0):
16B	  LWff %stack.0, 0, %fixed-stack.1, 0 :: (store (s32) into %stack.0), (load (s32) from %fixed-stack.1)
32B	  LWff %stack.1, 0, %fixed-stack.2, 0 :: (store (s32) into %stack.1), (load (s32) from %fixed-stack.2)
48B	  ADDfff %stack.2, 0, %stack.1, 0, %stack.0, 0 :: (store (s32) into %stack.2), (load (s32) from %stack.1), (load (s32) from %stack.0)
64B	  SWff %fixed-stack.0, 0, %stack.2, 0 :: (store (s32) into %fixed-stack.0), (load (s32) from %stack.2)
80B	  RET

Note that 3 stack objects are created, one for each virtual register defined or used in the MIR after instruction selection. The original instructions defining/using virtual registers are replaced with their native forms using frame objects.

In the current implementation, this virtual register lowering pass is run just before the register allocation to eliminate all virtual registers.

Target-specific Optimizations

This part is currently in the planning phase.

Operand Folder

Due to the use of virtual registers, the program code contains redundant loads of values from frame variables into virtual registers, such as

16B	  LWff %stack.0, 0, %fixed-stack.1, 0 :: (store (s32) into %stack.0), (load (s32) from %fixed-stack.1)

since the add instruction could directly access the frame variable holding the input parameter. However, such a load (or copy) can be folded into its users when it is possible and safe to do so. For example, in

  %0:r32 = LWrf %fixed-stack.1, 0 :: (load (s32) from %fixed-stack.1)
  %1:r32 = LWrf %fixed-stack.2, 0 :: (load (s32) from %fixed-stack.2)
  %2:r32 = ADDrrr killed %1:r32, killed %0:r32
  SWfr %fixed-stack.0, 0, killed %2:r32 :: (store (s32) into %fixed-stack.0)

both %0 and %1 can be folded into the add instruction, since each has a single use and there are no overlapping memory operations between their lw instructions and the add. After operand folding, the MIR looks like

  %2:r32 = ADDrff %fixed-stack.2, 0, %fixed-stack.1, 0 :: (load (s32) from %fixed-stack.2), (load (s32) from %fixed-stack.1)
  SWfr %fixed-stack.0, 0, killed %2:r32 :: (store (s32) into %fixed-stack.0)

Such folding is called folding up. Another kind of folding folds an sw into the instruction that defines its stored value, so-called folding down. After folding down, the MIR looks like

  ADDfff %fixed-stack.0, 0, %fixed-stack.2, 0, %fixed-stack.1, 0 :: (store (s32) into %fixed-stack.0), (load (s32) from %fixed-stack.2), (load (s32) from %fixed-stack.1)

The operand folding pass should be applied just before the MIR transitions out of SSA form. On one hand, most optimizations can take advantage of the virtual-register SSA form and remove as many redundant instructions as possible; on the other hand, the folder pass itself is easier to run on SSA form. As planned, this pass is scheduled as a pre-RA optimization on the SSA-form MIR.

Virtual Register Allocation

From a different perspective, virtual register lowering is just a special register allocator over an infinite set of registers. The current virtual register lowering is a straightforward implementation. It could be enhanced to check virtual register live intervals and reuse an already allocated stack object when two virtual registers' live intervals do not interfere with each other. This reduces the number of stack objects in the generated program code.

As the current lowering pass is scheduled just before the real register allocator, live intervals are still available. A naive linear-scan allocation could be added to reduce the number of stack objects used in the final program code.

Take the same example illustrated in the operand folding pass:

16B	  %0:r32 = LWrf %fixed-stack.1, 0 :: (load (s32) from %fixed-stack.1)
32B	  %1:r32 = LWrf %fixed-stack.2, 0 :: (load (s32) from %fixed-stack.2)
48B	  %2:r32 = ADDrrr %1:r32, %0:r32
64B	  SWfr %fixed-stack.0, 0, %2:r32 :: (store (s32) into %fixed-stack.0)

The live intervals of %0 and %1, [16B,48B) and [32B,48B) respectively, do not interfere with the live interval of %2, [48B,64B), so %2 can reuse the stack object assigned to either %0 or %1. As a result, the compiler needs to allocate only 2 stack objects instead of the 3 allocated by the current implementation.
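
With such reuse, only two local stack objects remain for this example. The MIR below is an illustrative sketch, assuming %2 is assigned the slot of %0 (i.e. %stack.2 is replaced by %stack.0); it is not actual pass output:

  LWff %stack.0, 0, %fixed-stack.1, 0 :: (store (s32) into %stack.0), (load (s32) from %fixed-stack.1)
  LWff %stack.1, 0, %fixed-stack.2, 0 :: (store (s32) into %stack.1), (load (s32) from %fixed-stack.2)
  ADDfff %stack.0, 0, %stack.1, 0, %stack.0, 0 :: (store (s32) into %stack.0), (load (s32) from %stack.1), (load (s32) from %stack.0)
  SWff %fixed-stack.0, 0, %stack.0, 0 :: (store (s32) into %fixed-stack.0), (load (s32) from %stack.0)
  RET

The reuse is safe here because the add reads both of its source slots before writing its destination slot.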
