Performance

ProveKit is fast enough to ship in production for many workloads, but “fast” depends almost entirely on circuit shape. This page tells you how to measure your circuit honestly, what dimensions matter, and where the benchmark suite lives.

| Dimension | Effect |
| --- | --- |
| Constraint count | Proving time is roughly linear in R1CS row count for the dominant proving phases (witness solve, commitment, sumcheck). |
| Hash choice | skyscraper is the fastest option for BN254 Merkle commitments. SHA-256, Keccak, Blake3, and Poseidon2 are slower in different ways: Keccak and SHA-256 have larger constraint footprints inside Noir circuits, while Skyscraper, Blake3, and Poseidon2 are cheaper in-circuit. |
| Witness layer count | Witness builders execute in layers (see Proving flow). Deep layer graphs add coordination overhead. |
| CPU architecture | aarch64 benefits from SIMD-accelerated BN254 arithmetic in skyscraper/core. x86_64 falls back to portable arithmetic. |
| Parallelism | Proving uses Rayon. More cores help up to the parallelism inherent in the circuit. WASM threading depends on SharedArrayBuffer. |
| Host memory | Mobile FFI hosts can swap to disk via pk_configure_memory(...). File-backed mmap allocation is slower than RAM but unlocks larger circuits. |
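Because proving time is roughly linear in R1CS row count, a single measured run can give a first-order estimate for a larger circuit. The sketch below assumes that linear scaling holds and ignores fixed overheads; `estimate_prove_secs` is a hypothetical helper, not part of ProveKit, and the numbers are illustrative rather than measured.

```shell
# Hypothetical helper (not part of ProveKit): first-order proving-time estimate.
# Assumes time scales linearly with R1CS row count, which is only roughly true.
# Usage: estimate_prove_secs <measured_secs> <measured_rows> <target_rows>
estimate_prove_secs() {
  # LC_ALL=C keeps the decimal separator a dot regardless of the host locale.
  LC_ALL=C awk -v t="$1" -v n="$2" -v m="$3" \
    'BEGIN { if (n <= 0) exit 1; printf "%.2f\n", t * m / n }'
}

# Illustrative numbers only: 2.5 s measured at 100000 rows, scaled to 400000 rows.
estimate_prove_secs 2.5 100000 400000   # prints 10.00
```

Treat the result as a planning number only. Hash choice, layer depth, and available cores all shift the constant, so re-measure on the target host before committing to it.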

The CLI prints span timings and memory statistics through its tracing layer. The simplest measurement:

cargo run --release --bin provekit-cli -- prove --prover circuit.pkp

Inspect the structured timing output it prints. For finer-grained profiling, build with the Tracy feature:

cargo run --release --features tracy --bin provekit-cli -- --tracy prove

For programmatic benchmarking across hash configurations, host variants, and circuit sizes, use the provekit-bench crate in tooling/provekit-bench/. It hooks into criterion and reports proving time, verification time, peak memory, and proof size.

Two CLI commands tell you what you’re about to prove:

# R1CS structure and ACIR statistics.
cargo run --release --bin provekit-cli -- circuit-stats target/<circuit>.json
# Size breakdown of the prover key, matrix sparsity, witness builders, hash config.
cargo run --release --bin provekit-cli -- analyze-pkp <circuit>.pkp

Use circuit-stats to confirm constraint counts match your expectations before committing to a host. A circuit that fits comfortably on a server may exceed practical proving time on mobile.

noir-examples/csp-benchmarks/ contains the Ethproofs CSP benchmarks, a standardized suite of client-side proving targets used to compare proof systems on common workloads.

| Target | Circuit sizes | Implementation note |
| --- | --- | --- |
| SHA-256 | 128, 256, 512, 1024, 2048 bytes | Uses noir-lang/sha256::sha256_var, lowering compression through Noir’s SHA-256 blackbox. |
| Keccak-256 | 128, 256, 512, 1024, 2048 bytes | Native Noir Keccak circuit with a witness-focused u32 lane representation. |
| Poseidon | 2, 4, 8, 12, 16 field elements | noir-lang/poseidon BN254 native Noir helpers. |
| Poseidon2 | 2, 4, 8, 12, 16 field elements | TaceoLabs/noir-poseidon for states 2, 8, 12, 16; state 4 intentionally exercises Noir’s Poseidon2 blackbox. |
| ECDSA | secp256r1 over a 32-byte digest | zkpassport/noir-ecdsa native P-256 verification (P-256 blackbox is not yet lowered by ProveKit). |

To run any benchmark target:

cd noir-examples/csp-benchmarks/sha256_512
cargo run --release --bin provekit-cli -- prepare
cargo run --release --bin provekit-cli -- prove
cargo run --release --bin provekit-cli -- verify

Combine that with the CLI’s timing output (or the provekit-bench harness) to capture proving time, verification time, and memory for each target on your machine.
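To cover several targets in one pass, the prepare/prove/verify sequence can be wrapped in a loop. This is a minimal sketch: `run_targets` and the `PROVEKIT` override variable are hypothetical names introduced here, and the only directory shown is the one used above.

```shell
# PROVEKIT is a hypothetical override introduced for this sketch, so the loop can
# be pointed at a prebuilt binary (or a stub) instead of going through cargo.
PROVEKIT="${PROVEKIT:-cargo run --release --bin provekit-cli --}"

# Run prepare, prove, and verify inside each directory passed as an argument.
run_targets() {
  for dir in "$@"; do
    echo "== $dir"
    # The subshell keeps the cd local to this iteration. $PROVEKIT is left
    # unquoted on purpose so it splits into the command and its arguments.
    ( cd "$dir" && $PROVEKIT prepare && $PROVEKIT prove && $PROVEKIT verify ) \
      || { echo "FAILED: $dir" >&2; return 1; }
  done
}

# Usage, from the repository root:
#   run_targets noir-examples/csp-benchmarks/sha256_512
```

The loop stops at the first failing step, so a broken target cannot contribute a misleading timing to the comparison.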

The fundamentals don’t change between hosts, but resource constraints do:

  • Native Rust on a workstation. The reference platform. Smallest measured proving time, largest available memory.
  • WASM in a browser. Slower than native: the proof system runs single-threaded unless SharedArrayBuffer is available, and JavaScript marshalling adds overhead at the boundaries.
  • WASM in Node.js. Closer to native than browser WASM, but still single-process unless you orchestrate workers externally.
  • iOS / Android via FFI. Bounded by device RAM unless you configure pk_configure_memory for file-backed mmap. Modern phones can prove non-trivial credential circuits on-device; budget memory carefully.
  • Verifier server. Verification dominates. Concurrency is configurable through VERIFIER_SEMAPHORE_LIMIT; the default of one keeps memory usage predictable.
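For the verifier server, one way to choose a value above the default is to divide a memory budget by the measured peak memory of a single verification. The sketch below is an assumption-heavy illustration: `suggest_semaphore_limit` is a hypothetical helper, both inputs are figures you supply yourself, and it assumes VERIFIER_SEMAPHORE_LIMIT is read from the environment.

```shell
# Hypothetical helper (not part of ProveKit): pick a concurrency limit that keeps
# worst-case verifier memory under a budget. Both inputs are whole numbers in MiB
# that you measure or choose yourself.
# Usage: suggest_semaphore_limit <memory_budget_mib> <peak_mib_per_verification>
suggest_semaphore_limit() {
  # Refuse a zero or negative per-verification figure instead of dividing by it.
  [ "$2" -gt 0 ] || return 1
  limit=$(( $1 / $2 ))
  # Never go below the documented default of one concurrent verification.
  if [ "$limit" -lt 1 ]; then
    limit=1
  fi
  echo "$limit"
}

# Usage (illustrative numbers; assumes the limit is set via an environment variable):
#   export VERIFIER_SEMAPHORE_LIMIT="$(suggest_semaphore_limit 8192 1500)"
```

Integer division rounds down, which errs toward predictable memory usage rather than peak throughput.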

The Groth16 wrapper through the Go/gnark recursive verifier adds two costs on top of the base proof: typically a one-time setup cost (proving key generation) and a recurring cost for proving the outer proof. The base proof’s verification is the inner workload; everything else is the wrapper. Plan for the Groth16 step to dominate end-to-end latency when on-chain settlement is the goal.
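A rough way to budget this is to add the base proving time, the outer proving time, and the setup cost amortized over the number of proofs you expect to produce. The helper below is a hypothetical sketch of that arithmetic, not a ProveKit tool, and the numbers are illustrative rather than measured.

```shell
# Hypothetical helper (not part of ProveKit): per-proof end-to-end proving latency
# when a base proof is wrapped in Groth16, with setup amortized over a batch.
# Usage: wrapped_latency_secs <base_prove_secs> <outer_prove_secs> <setup_secs> <num_proofs>
wrapped_latency_secs() {
  # LC_ALL=C keeps the decimal separator a dot regardless of the host locale.
  LC_ALL=C awk -v b="$1" -v o="$2" -v s="$3" -v k="$4" \
    'BEGIN { if (k <= 0) exit 1; printf "%.2f\n", b + o + s / k }'
}

# Illustrative numbers only: 2 s base proof, 20 s outer proof,
# 600 s one-time setup spread over 100 proofs.
wrapped_latency_secs 2 20 600 100   # prints 28.00
```

With a single proof the setup term dominates; as the batch grows, the recurring outer proving cost becomes the floor.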

If proving is slower than you need:

  1. Run circuit-stats. Confirm constraint count matches expectations. Unexpected blowups usually indicate accidentally-quadratic constraint generation.
  2. Confirm you are using skyscraper. It’s the default, but worth checking. The other hash options are slower for BN254 Merkle commitments.
  3. Audit black-box vs native lowerings. Some Noir black boxes (SHA-256, Keccak) are heavier in ProveKit than their native R1CS implementations. The CSP benchmarks call this out explicitly.
  4. Profile with Tracy. Run cargo run --release --features tracy --bin provekit-cli -- --tracy prove and inspect span timings. Look for layers that dominate the witness solve.
  5. Split the circuit if possible. Two smaller proofs may verify faster than one giant proof, depending on transport overhead and what the verifier needs to check.