Lita Docs
  • Introduction
    • Getting Started
  • Quick Start
    • Tutorial: Rust via Docker
    • Installation & System Requirements
    • Valida Compiler Toolchain
      • Rust Usage
      • C Usage
      • WASM Usage
      • Rust API
      • Client-side API
    • Valida zk-VM
  • ADVANCED USAGE
    • zk-VM: Advanced Usage
    • Using the Rust Toolchain
    • Using LLVM libc
  • Architecture
    • Overview
    • Proving System: Plonky3
      • Future Directions
    • Valida zk-VM
      • Technical Design: VM
      • Technical Design: Prover
      • GitHub Link
    • LLVM-Valida Compiler
      • Technical Design
      • GitHub Link
    • Benchmarks
      • Fibonacci (vs. RISC Zero)
      • Fibonacci (vs. SP1)
      • Fibonacci (vs. Jolt)
      • Fibonacci (Rust vs. C)
      • SHA-256 (vs. RISC Zero)
      • SHA-256 (vs. SP1)
      • SHA-256 (vs. Jolt)
  • Core Concepts
    • zk-VM
    • Proofs: Classical, Probabilistic, Succinct, and ZK
    • Evaluating zk-VMs
    • ZK-VM Design Tradeoffs
    • Valida Design Considerations
  • Contributing
    • Overview
    • Early Access Program
Powered by GitBook
On this page
  • Architecture
  • Instruction list
  • Heap allocation
  • Assembly
  • Trace
  • Design notes
  1. Architecture
  2. Valida zk-VM

Technical Design: VM

PreviousValida zk-VMNextTechnical Design: Prover

Last updated 4 months ago

The content of this section is largely copied from the Valida . This is a work in progress.

Architecture

A Valida zkVM consists of a CPU and several coprocessors, which are connected with communication buses. A basic example of a machine layout, omitting some standard chips for simplicity, would be

Communication buses are implemented in permutation arguments (either grand product or multi-set checks), and may be multiplexed for efficiency when only one of a subset of buses will be used in a given cycle.

There are multiple VM configurations. The "Core" configuration is always present, and provides instructions for basic control flow and memory access. Additional configurations, such as "Field Arithmetic" or "Additional Jump" build upon the core configuration and offer additional instructions.

Instruction format

Instructions are encoded in groups of 6 field elements. The first element in the group contains the opcode, followed by three elements representing the operands and two immediate value flags: opcode,opa\text{opcode}, \text{op}_aopcode,opa​, opb\text{op}_bopb​, opc\text{op}_copc​, immb\text{imm}_bimmb​, immc\text{imm}_cimmc​.

Program ROM

Our VM operates under the Harvard architecture, where program code is stored separately from main memory. Code is addressed by any field element, starting from 000. The program counter pc stores the location (a field element) of the instruction that is being executed.

Memory

Memory is comprised of word-addressable cells. A given cell contains 4 field elements, each of which are typically used to store a single byte (arbitrary field elements can also be stored). All core and ALU-related instructions operate on cells (i.e. any operand address is word aligned -- a multiple of 4). In the VM compiler, the address of newly added local variables in the stack is word aligned.

For example, a U32 is represented in memory by its byte decomposition (4 elements). To initialize a U32 from an immediate value, we use the SETL16 instruction (see the complete instruction list below), which sets the first two bytes in memory. To initialize a U32 value greater than 16 bits, we can also call the SETH16 instruction to set the upper two bytes.

Immediate Values

Our VM cannot represent operand values that are greater than the prime ppp, and cannot distinguish between 000 and ppp. Therefore, any immediate values greater than or equal to ppp need to be expanded into smaller values.

Registers

Our zkVM does not operate on general purpose registers. Instead, instructions refer to variables local to the call frame, i.e. relative to the current frame pointer fp.

Notation

The following notation is used throughout this document:

Operand values: opa, opb, opc denote the value encoded in the operand a, b, or c of the current instruction.

CPU registers: fp, pc denote the value of the current frame pointer and program counter, respectively.

Relative addressing: [a] denote the cell value at address a offset from fp, i.e. fp + a. Variables local to the call frame are denoted in this form. Note that we are omitting fp in the expression here, but that the first dereference of an operand is always relative to the frame pointer.

Absolute addressing: [[a]] denotes the cell value at absolute address [a]. Heap-allocated values are denoted in this form.

To refer to relative or absolute element values, we use the notation [a]elem[a]_\text{elem}[a]elem​ or [[a]]elem[[a]]_\text{elem}[[a]]elem​ respectively.

Instruction list

Each instruction contains 5 field element operands, a,b,c,d,ea, b, c, d, ea,b,c,d,e. Often, ddd and eee are binary flags indicating whther operands aaa and bbb are immediate values or relative offets.

Listed below are the instructions offered in each configuration.

Core

Mnemonic
Operands(asm)
Description

LW / LOAD32

a(fp), c(fp)

SW / STORE32

b(fp), c(fp)

LOADFP

a(fp), b

Load the value of fp + b into local stack variable at offset a.

JAL

a(fp), b, c

JALV

a(fp), b(fp), c(fp)

BEQ

a, b(fp), c(fp)

BEQI

a, b(fp), c

BNE

a, b(fp), c(fp)

BNEI

a, b(fp), c

IMM32

a(fp), b, c, d, e

STOP

No operands

Halt the program.

FAIL

No operands

Halt the program with exit code 1.

READ_ADVICE

a(fp)

Read the next byte on the input tape into the cell located at offset a.

Field arithmetic

Mnemonic
Operands(asm)
Description

FEADD

a(fp), b(fp), c(fp)

FEMUL

a(fp), b(fp), c(fp)

TO_FE

a(fp), b(fp)

FROM_FE

a(fp), b(fp)

Note that field arithmetic instructions only operate on the first element in a cell, which represents a field element instead of a single byte.

U32 Arithmetic

Mnemonic
Operands(asm)
Description

ADD

a(fp), b(fp), c(fp)

ADDI

a(fp), b(fp), c

SUB

a(fp), b(fp), c(fp)

Unchecked subtraction

SUBI

a(fp), b(fp), c

Unchecked subtraction

MUL

a(fp), b(fp), c(fp)

Unchecked multiplication

MULI

a(fp), b(fp), c

Unchecked multiplication

MULHU

a(fp), b(fp), c(fp)

High bits of unsigned multiplication

MULHUI

a(fp), b(fp), c

High bits of unsigned multiplication

MULHS

a(fp), b(fp), c(fp)

High bits of signed multiplication

MULHSI

a(fp), b(fp), c

High bits of signed multiplication

DIV

a(fp), b(fp), c(fp)

Unsigned division

DIVI

a(fp), b(fp), c

Unsigned division

SDIV

a(fp), b(fp), c(fp)

Signed division

SDIVI

a(fp), b(fp), c

Signed division

SH{L,R}

a(fp), b(fp), c(fp)

SH{L,R}I

a(fp), b(fp), c

SRA

a(fp), b(fp), c(fp)

SRAI

a(fp), b(fp), c

LT

a(fp), b(fp), c(fp)

LTI

a(fp), b(fp), c

NE

a(fp), b(fp), c(fp)

Set local variable a to 0 if [b] = [c] and 1 otherwise.

NEI

a(fp), b(fp), c

Set local variable a to 0 if [b] = c and 1 otherwise.

EQ

a(fp), b(fp), c(fp)

Set local variable a to 1 if [b] = [c] and 0 otherwise.

EQI

a(fp), b(fp), c

Set local variable a to 1 if [b] = c and 0 otherwise.

Bitwise

Mnemonic
Operands(asm)
Description

AND

a(fp), b(fp), c(fp)

ANDI

a(fp), b(fp), c

OR

a(fp), b(fp), c(fp)

ORI

a(fp), b(fp), c

XOR

a(fp), b(fp), c(fp)

XORI

a(fp), b(fp), c

Output

Mnemonic
Operands(asm)
Description

WRITE

b(fp)

Byte Manipulation

Note: These will not be supported in the initial version.

Instruction
Operands(asm)

LOAD8

a(fp), b(fp)

STORE8

b(fp), c(fp)

STORE8I

b(fp), c

Heap allocation

Notes:

  • Fixed configurable stack size (e.g. 8MB), growing in opposite direction of the heap.

  • Allocate-only malloc (no de-allocation using free)

Assembly

Instructions

We will closely follow RISC-V assembly, making modifications as necessary. The most important difference between our zkVM assembly and RV32IM is that instead of registers x0-31, we only have two special-purpose registers fp and pc. However, we have (up to 231−12^{31}-1231−1) local variables, addressed relative to the current frame point fp.

Calling convention / stack frame

Stack (grows downwards, i.e. address decreases from top row to bottom row)

Arg 2

Arg 1

Return FP offset

Return value

Return address

Local 1 (<- Current fp)

Local 2

...

Local N

We follow the RISC-V convention and grow the stack downwards. For a function call, the arguments are pushed onto the stack in reverse order. We only allow statically sized allocation on the stack, unlike traditional architectures where alloca can be used to allocate dynamically. All dynamic allocation will be compiled to heap allocations. We have a special-purpose register dedicated to holding the frame pointer, but not a special-purpose register dedicated to holding the stack pointer.

Note that:

  • Functions arguments are stored at fp+16, fp+20, ...

  • Return FP offset (the value to add to FP to get to the value of FP before the call) is stored at fp+12

  • Return value is stored at fp+8

  • Return address is stored at fp+4

  • Local variables are stored at fp, fp-4, fp-8, ...

Pseudo instructions

Pseudo Instruction
Instruction

call label

jal 0(fp), -b, label; addifp b, where b is size of the current stack frame plus the call frame size for instantiate a call to lable

ret

jalv 4(fp), 0(fp), 12(fp), set pc to [fp] where the return address is stored

Implementing MEMCPY/SET/MOVE

Heap allocations

Example programs

Multiply stored immediates and return

define i32 @main() {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  store i32 24, i32* %1, align 4
  store i32 7, i32* %2, align 4
  %4 = load i32, i32* %1, align 4
  %5 = load i32, i32* %2, align 4
  %6 = mul nsw i32 %4, %5
  store i32 %6, i32* %3, align 4
  %7 = load i32, i32* %1, align 4
  ret i32 %7
}
main:
  sub      -4(fp), 0(fp), 0(fp)      # Setup the 0 local variable at fp - 4
  add     -8(fp), -4(fp), 24,    # Set [fp - 8] to 24
  add    -12(fp), -4(fp), 7,     # Set [fp - 12] to 7

  mul     -16(fp), 8(fp), 12(fp)    # Set [fp - 16] to 24 * 7 

  add    4(fp), -8(fp), 0     # Set return value at [fp + 4] to [fp - 8]
  ret

Multiply arguments and return

define i32 @main() {
  %1 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %2 = call i32 @mul(i32 938253, i32 7)
  ret i32 %2
}

define i32 @mul2(i32 %0, i32 %1) {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  store i32 %0, i32* %3, align 4
  store i32 %1, i32* %4, align 4
  %5 = load i32, i32* %3, align 4
  %6 = load i32, i32* %4, align 4
  %7 = mul nsw i32 %5, %6
  ret i32 %7
}
main: 
    imm32     -4(fp), 938253
    imm32     -8(fp), 7
    call         mul2
    # call translates to
    # jal 0(fp), -16, mul2  " store pc + 1 to [fp], add -16 to fp, set pc to mul2
    # addifp         16

    ret

mul2:
    mul    4(fp), 8(fp), 12(fp)
    ret

The stack at the time of executing mul inside mul (line 11) looks like:

Stack

.. (<- fp of main)

7

938253

0 (before mul2) -> 6567771 (after mul2)

main:7 (<- fp of mul2)

Trace

Main CPU

Columns
Configuration
Description

Core

Clock cycle

Core

Program counter

Core

Frame pointer

Core

Instruction opcode

Core

Core

Core

Core

Core

Core

Core

Core

Columns opcode,opa,opb,opc,opd,ope\text{opcode}, \text{op}_a, \text{op}_b, \text{op}_c, \text{op}_d, \text{op}_eopcode,opa​,opb​,opc​,opd​,ope​ are specified by the program code (see the "Instruction Trace" section below).

Trace cells are also allocated to hold buffered read memory values for addra\text{addr}_aaddra​ and addrb\text{addr}_baddrb​, and buffered write values for addrc\text{addr}_caddrc​. We read and write 4 elements from memory at a time to the main trace. These elements are only constrained when the immediate value flags are not set (see the "Instruction Decoding" section below):

Cell
Configuration
Description

Core

Core

Core

Core

Core

Core

Core

Core

Core

Core

Core

Core

Memory

Cell
Description

Address

Clock cycle

Value 0

Value 1

Value 2

Value 3

The memory table is sorted by (addr,clk\text{addr}, \text{clk}addr,clk)

U32 Arithmetic

TODO: Replace this trace table and associated constraints with more efficient nondeterministic methods

Cell
Description

Clock cycle

Requested instruction (constrained by communication bus)

Selector flag for addition

Selector flag for subtraction

Selector flag for multiplication

Selector flag for division

Address of input 1

Address of input 2

Address of output

Input 1 (12-bit limb)

Input 1 (12-bit limb)

Input 1 (12-bit limb)

Input 2 (12-bit limb)

Input 2 (12-bit limb)

Input 2 (12-bit limb)

Output (12-bit limb)

Output (12-bit limb)

Output (12-bit limb)

Output (12-bit limb)

Output (12-bit limb)

There are also 5 helper value cells: h0h_0h0​ through h4h_{4}h4​.

Instruction Trace

Each instruction is encoded as 6 field elements

Core

Mnemonic
Operands(asm)
Encoded

LW / LOAD32

a(fp), c(fp)

SW / STORE32

b(fp), c(fp)

JAL

a(fp), b, c

JALV

a(fp), b(fp), c

BEQ

a, b(fp), c(fp)

BEQI

a, b(fp), c

BNE

a, b(fp), c(fp)

BNEI

a, b(fp), c

IMM32

a, b, c, d, e

READ_ADVICE

a(fp)

OP_{READ_ADVICE}, a, 0, 0, 0, 0

STOP

No operands

OP_STOP, 0, 0, 0, 0, 0

FAIL

No operands

OP_FAIL, 0, 0, 0, 0, 0

LOADFP

a(fp), b

OP_LOADFP, a, b, 0, 0, 0

Field arithmetic

Mnemonic
Operands(asm)
Encoded

FEADD

a(fp), b(fp), c(fp)

FEADDI

a(fp), b(fp), c

FEMUL

a(fp), b(fp), c(fp)

FEMULI

a(fp), b(fp), c

OP_FEMULI, a, b, c, 0, 1

TO_FE

a(fp), b(fp)

OP_{TO_FE}, a, b, 0, 0, 0

FROM_FE

a(fp), b(fp)

OP_{FROM_FE}, a, b, 0, 0, 0

U32 Arithmetic

Mnemonic
Operands(asm)
Encoded

ADD

a(fp), b(fp), c(fp)

ADDI

a(fp), b(fp), c

SUB

a(fp), b(fp), c(fp)

SUBI

a(fp), b(fp), c

MUL

a(fp), b(fp), c(fp)

MULI

a(fp), b(fp), c

DIV

a(fp), b(fp), c(fp)

SH{L,R}

a(fp), b(fp), c(fp)

SH{L,R}I

a(fp), b(fp), c

ISH{L,R}

a(fp), b, c(fp)

SLT

a(fp), b(fp), c(fp)

SLT

a(fp), b(fp), c

Bitwise

Mnemonic
Operands(asm)
Encoded

AND

a(fp), b(fp), c(fp)

OR

a(fp), b(fp), c(fp)

XOR

a(fp), b(fp), c(fp)

Instruction decoding

Trace cells are also allocated for each selector. In each cycle, the opcode is decoded into selector flags, including but not limited to the following, which are grouped by type (not configuration) below for convenience. All flags are binary values, except for the instruction code.

Instruction code

Selector
Configuration
Description

Core

Instruction code

Operand modifiers

Selector
Configuration
Description

Core

Core

Field arithmetic

Selector
Configuration
Description

Field Arithmetic

Addition

Field Arithmetic

Multiplication

Memory

Selector
Configuration
Description

Core

Store 4 bytes at at a memory address

Core

Load 4 bytes from a memory address

Core

Store a byte at a memory address

Core

Load a byte from a memory address

Call

Selector
Configuration
Description

Core

Update the frame pointer

Jumps

Selector
Configuration
Description

Core

Jump equal flag

Additional Jumps

Jump not equal flag

Additional Jumps

Jump greater than or equal to flag

Additional Jumps

Jump less than flag

U32 arithmetic

Selector
Configuration
Description

U32 Arithmetic

U32 Arithmetic

U32 Arithmetic

U32 Arithmetic

U32 Arithmetic

U32 Arithmetic

Binary operations

Selector
Configuration
Description

Bitwise

Bitwise

Bitwise

Design notes

Frontend target

ZK stack

Field choice

We plan to use the 32-bit field defined by p = 2^31 - 1, which should give very good performance on GPUs or with most vector instruction sets.

Registers

Our VM has no general purpose registers, since memory is cheap.

Memory

We will use a conventional R/W memory.

Tables

The CPU can do up to three memory operations per cycle, to support binary operations involving two reads and one write.

If we used a single-trace model, we could support this by adding columns for 6 memory operations in each row of our trace: 3 for the chronological memory log and 3 for the (address, timestamp) ordered memory log.

Instead, we make the memory a separate table (i.e. a separate STARK which gets connected with a permutation argument). We also use multi-table support to implement other coprocessors that are wasteful to include in the main CPU, as their operations may not be used during most cycles (e.g. Keccak).

Continuations

TODO: Explain the permutation-based continuation implementation.

Lookups

Initially, we will support lookups only against prover-supplied tables. The main use case is range checks. To perform a 16-bit range check, for example, we would have the prover send a table containing [0 .. 2^16 - 1] in order. (If the trace was not already 2^16, we would pad it. If it was longer than 2^16, the prover would include some duplicates.) We would then use constraints to enforce that this table starts at 0, ends at 2^16 - 1, and increments by 0 or 1.

Preprocessed tables can also be useful, particularly for bitwise operations like xor. However, we will not support them initially because they require non-succinct preprocessing.

Floating point arithmetic

Fast floating point arithmetic doesn't seem important for our anticipated use cases, so we do not plan to support compiling source programs which use floating point primitives.

Follow the pointer stored at offset from the current frame pointer and write the next 4 byte values to those beginning at offset a. Operand is unused, but is constrained to in the trace. LOAD32 is used to load 4 bytes from the heap, and is aligned (i.e. the address at offset is assumed to be a multiple of ).

Write the 4 byte values beginning at the address stored at offset to those beginning at offset . Operand is unused, but is constrained to in the trace. STORE32 is used to write 4 bytes to the heap, and is aligned.

Jump to address and link: Store the to local stack variable at offset , then set to field element . Set to .

Jump to variable and link: Store the to local stack variable at offset , then set to the field element . Set to fp+.

If , then set the program counter to

If , then set the program counter to

If , then set the program counter to

If , then set the program counter to

Write the immediate values to the cell located at offset .

and are a flags denoting whether and are interpreted as an immediate or offset. Let if and otherwise. Let if and otherwise. The instruction compute , and write the result to offset

and are a flags denoting whether and are interpreted as an immediate or offset. Let if and otherwise. Let if and otherwise. The instruction compute , and write the result to offset

Convert an U32, represented by 4 field elements starting at offset b, to a field element stored to the first field element at offset . is assumed to be a multiple of .

Convert a field element to an U32 stored at offset a, which is assumed to be a multiple of .

Compute the unchecked addition of the U32 values at cell offsets and and write the sum to cell offset . Note that because a full 32-bit value does not fit within one field element, we assume that values have been decomposed into 4 8-byte elements. The summed output is stored at cell offset . The same limb decomposition is used for the other U32 operations listed below.

Compute the unchecked addition of the U32 variable at cell offsets and an immediate , and write the sum to cell offset .

Bitwise shift [b] left/right by [c] and write to offset .

Bitwise shift [b] left/right by c and write to offset .

Arithmetic shift b right by [c] and write to offset .

Arithmetic shift b right by c and write to offset

Set local variable to if and otherwise. Uses unsigned comparison.

Set local variable to if and otherwise. Uses unsigned comparison.

Set to bitwise-and

Set to bitwise-and c

Set to bitwise-or

Set to bitwise-or c

Set to bitwise-xor

Set to bitwise-xor c

Write byte to the output

Load a byte at the address specified by local variable at offset to local variable at offset .

Store a byte encoded at offset to the address encoded in offset

Store a byte encoded in the field element to the address encoded in offset

Memcpy will require roughly 2 cycles per word. We can follow .

Operand

Operand

Operand

Operand , flag for if is immediate or offset.

Operand , flag for if is immediate or offset.

(if is not set)

(if is not set)

Value written to

Value written to

Value written to

Value written to

Lower 16 bits of

Upper 16 bits of

Nondeterministic inverse of

Selector flag for checked addition (requires to be set as well)

Selector flag for checked multiplication (requires to be set as well)

, , _, , ,

, _, , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , 0, 0

, , , , 0, 1

, , , , 0, 0

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , ,

, , , , , _

, , , , , _

, , , , , _

Immediate value flag for operand ( denotes offset, denotes immediate value)

Immediate value flag for operand ( denotes offset, denotes immediate value)

We are writing a from LLVM IR to our ISA.

This is a STARK-based zkVM. We are using to implement the polynomial IOP and PCS.

ccc
bbb
[c][c][c]
ccc
444
ccc
bbb
aaa
[c][c][c]
pc+1pc+1pc+1
aaa
pcpcpc
bbb
fpfpfp
fp+cfp + cfp+c
pc+1pc+1pc+1
aaa
pcpcpc
[b]elem[b]_{elem}[b]elem​
fpfpfp
[c][c][c]
[b]=[c][b] = [c][b]=[c]
pc\text{pc}pc
aaa
[b]=c[b] = c[b]=c
pc\text{pc}pc
aaa
[b]≠[c][b] \neq [c][b]=[c]
pc\text{pc}pc
aaa
[b]≠c[b] \neq c[b]=c
pc\text{pc}pc
aaa
b,c,d,eb, c, d, eb,c,d,e
aaa
ddd
eee
aaa
bbb
A=aA = aA=a
d=1d = 1d=1
[a]elem[a]_{elem}[a]elem​
B=bB = bB=b
e=1e = 1e=1
[b]elem[b]_{elem}[b]elem​
A+BA + BA+B
ccc
ddd
eee
aaa
bbb
A=aA = aA=a
d=1d = 1d=1
[a]elem[a]_{elem}[a]elem​
B=bB = bB=b
e=1e = 1e=1
[b]elem[b]_{elem}[b]elem​
A⋅BA \cdot BA⋅B
ccc
aaa
aaa
444
[b]elem[b]_{elem}[b]elem​
444
bbb
ccc
aaa
aaa
bbb
ccc
aaa
aaa
aaa
aaa
aaa
aaa
111
[b]<[c][b] < [c][b]<[c]
000
aaa
111
[b]<c[b] < c[b]<c
000
[a][a][a]
[b][b][b]
[c][c][c]
[a][a][a]
[b][b][b]
[a][a][a]
[b][b][b]
[c][c][c]
[a][a][a]
[b][b][b]
[a][a][a]
[b][b][b]
[c][c][c]
[a][a][a]
[b][b][b]
[b][b][b]
bbb
aaa
ccc
bbb
ccc
bbb
clk\text{clk}clk
pc\text{pc}pc
fp\text{fp}fp
opcode\text{opcode}opcode
opa\text{op}_aopa​
aaa
opb\text{op}_bopb​
bbb
opc\text{op}_copc​
ccc
opd\text{op}_dopd​
ddd
opa\text{op}_aopa​
ope\text{op}_eope​
eee
opb\text{op}_bopb​
addra\text{addr}_aaddra​
fp+opa\text{fp} + \text{op}_afp+opa​
opd\text{op}_dopd​
addrb\text{addr}_baddrb​
fp+opb\text{fp} + \text{op}_bfp+opb​
ope\text{op}_eope​
addrc\text{addr}_caddrc​
fp+opc\text{fp} + \text{op}_cfp+opc​
va,0v_{a,0}va,0​
[addra]elem[\text{addr}_a]_\text{elem}[addra​]elem​
va,1v_{a,1}va,1​
[addra+1]elem[\text{addr}_a+1]_\text{elem}[addra​+1]elem​
va,2v_{a,2}va,2​
[addra+2]elem[\text{addr}_a+2]_\text{elem}[addra​+2]elem​
va,3v_{a,3}va,3​
[addra+3]elem[\text{addr}_a+3]_\text{elem}[addra​+3]elem​
vb,0v_{b,0}vb,0​
[addrb]elem[\text{addr}_b]_\text{elem}[addrb​]elem​
vb,1v_{b,1}vb,1​
[addrb+1]elem[\text{addr}_b+1]_\text{elem}[addrb​+1]elem​
vb,2v_{b,2}vb,2​
[addrb+2]elem[\text{addr}_b+2]_\text{elem}[addrb​+2]elem​
vb,3v_{b,3}vb,3​
[addrb+3]elem[\text{addr}_b+3]_\text{elem}[addrb​+3]elem​
vc,0v_{c,0}vc,0​
addrc\text{addr}_caddrc​
vc,1v_{c,1}vc,1​
addrc+1\text{addr}_c+1addrc​+1
vc,2v_{c,2}vc,2​
addrc+2\text{addr}_c+2addrc​+2
vc,3v_{c,3}vc,3​
addrc+3\text{addr}_c+3addrc​+3
addr\text{addr}addr
clk\text{clk}clk
val0\text{val}_0val0​
val1\text{val}_1val1​
val2\text{val}_2val2​
val3\text{val}_3val3​
d0d_0d0​
clk′−clk−1\text{clk}'-\text{clk}-1clk′−clk−1
d1d_1d1​
clk′−clk−1\text{clk}'-\text{clk}-1clk′−clk−1
ttt
addr′−addr\text{addr}'-\text{addr}addr′−addr
clk\text{clk}clk
sinstrs_\text{instr}sinstr​
sadds_\text{add}sadd​
ssubs_\text{sub}ssub​
smuls_\text{mul}smul​
sdivs_\text{div}sdiv​
sadds_\text{add}sadd​
smuls_\text{mul}smul​
aaa
bbb
ccc
a0a_0a0​
a1a_1a1​
a2a_2a2​
b0b_0b0​
b1b_1b1​
b2b_2b2​
c0c_0c0​
c1c_1c1​
c2c_2c2​
c3c_3c3​
c4c_4c4​
OPLW\text{OP}_{LW}OPLW​
aaa
ccc
000
000
OPSW\text{OP}_{SW}OPSW​
bbb
ccc
000
000
OPjump\text{OP}_{jump}OPjump​
aaa
bbb
ccc
111
111
OPjump\text{OP}_{jump}OPjump​
aaa
bbb
ccc
000
111
OPbranch\text{OP}_{branch}OPbranch​
aaa
bbb
ccc
000
000
OPbranch\text{OP}_{branch}OPbranch​
aaa
bbb
ccc
000
111
OPbranch\text{OP}_{branch}OPbranch​
aaa
bbb
ccc
111
000
OPbranch\text{OP}_{branch}OPbranch​
aaa
bbb
ccc
111
111
OPIMM\text{OP}_{IMM}OPIMM​
aaa
bbb
ccc
ddd
eee
OPFEADD\text{OP}_{FEADD}OPFEADD​
aaa
bbb
ccc
OPFEADD\text{OP}_{FEADD}OPFEADD​
aaa
bbb
ccc
OPFEMUL\text{OP}_{FEMUL}OPFEMUL​
aaa
bbb
ccc
OPADD\text{OP}_{ADD}OPADD​
aaa
bbb
ccc
000
000
OPADD\text{OP}_{ADD}OPADD​
aaa
bbb
c0c_0c0​
111
c1c_1c1​
OPSUB\text{OP}_{SUB}OPSUB​
aaa
bbb
ccc
000
000
OPSUB\text{OP}_{SUB}OPSUB​
aaa
bbb
c1c_1c1​
111
c2c_2c2​
OPMUL\text{OP}_{MUL}OPMUL​
aaa
bbb
ccc
000
000
OPMUL\text{OP}_{MUL}OPMUL​
aaa
bbb
c1c_1c1​
111
c2c_2c2​
OPMUL\text{OP}_{MUL}OPMUL​
aaa
bbb
ccc
000
000
OPSHIFT\text{OP}_{SHIFT}OPSHIFT​
aaa
bbb
ccc
000
000
OPSHIFT\text{OP}_{SHIFT}OPSHIFT​
aaa
bbb
ccc
000
111
OPSHIFT\text{OP}_{SHIFT}OPSHIFT​
aaa
bbb
ccc
000
111
OPSLT\text{OP}_{SLT}OPSLT​
aaa
bbb
ccc
000
000
OPSLT\text{OP}_{SLT}OPSLT​
aaa
bbb
ccc
000
111
OPAND\text{OP}_{AND}OPAND​
aaa
bbb
ccc
000
OPOR\text{OP}_{OR}OPOR​
aaa
bbb
ccc
000
OPXOR\text{OP}_{XOR}OPXOR​
aaa
bbb
ccc
000
sinstrs_\text{instr}sinstr​
simmas_{\text{imm}_a}simma​​
aaa
000
111
simmbs_{\text{imm}_b}simmb​​
bbb
000
111
sadds_\text{add}sadd​
smuls_\text{mul}smul​
sstore32s_\text{store32}sstore32​
sload32s_\text{load32}sload32​
sstore8s_\text{store8}sstore8​
sload8s_\text{load8}sload8​
scalls_\text{call}scall​
this memcpy implementation on RISC-V
compiler
Plonky3
Working ISA Spec