Performance

Proving time depends almost entirely on circuit shape. This page covers which dimensions drive it, how to measure your circuit honestly, and where the benchmark suite lives.

What drives performance

Dimension	Effect
Witness count	The dominant scaling factor for proving time. The witness vector is what gets committed through a Merkle tree, and the prover’s main work (witness solve, commitment, sumcheck) runs over the witness space. R1CS constraint count grows alongside witness count for most circuits but doesn’t drive proving time directly.
Merkle commitment hash (`--hash`)	Sets the hash used for WHIR’s Merkle commitments. `sha256` and `blake3` prove fastest thanks to hardware acceleration. `skyscraper` is the default because it’s BN254-friendly: slower to prove natively, but dramatically cheaper inside the Groth16 recursive wrap. `keccak` and `poseidon2` are also available for specific interop scenarios.
Witness layer count	Witness builders execute in layers (see Proving flow). Deep layer graphs add coordination overhead.
CPU architecture	`aarch64` benefits from SIMD-accelerated BN254 arithmetic in `skyscraper/core`. x86_64 falls back to portable arithmetic.
Parallelism	Proving uses Rayon. More cores help up to the parallelism inherent in the circuit. WASM threading depends on `SharedArrayBuffer`.
Host memory	Mobile FFI hosts can swap to disk via `pk_configure_memory(...)`. File-backed mmap allocation is slower than RAM but unlocks larger circuits.
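
The Parallelism row can be made concrete with Amdahl's law: if only part of the proving work scales with cores, the serial remainder caps the speedup. The sketch below is a back-of-the-envelope calculation, not a ProveKit API. The 90% parallel fraction is an assumed figure for illustration, so measure your own circuit before relying on it.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical circuit: 90% of proving time parallelizes (assumed, not measured).
for cores in (1, 2, 4, 8, 16):
    print(f"{cores:>2} cores: at most {amdahl_speedup(0.90, cores):.2f}x")
```

With a 10% serial share, no core count can push the speedup past 10x. That is why deep witness layer graphs, which add serial coordination, limit the benefit of extra cores.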

Measuring your circuit

The CLI prints span timings and memory statistics through its tracing layer. The simplest measurement:

cargo run --release --bin provekit-cli -- prove --prover circuit.pkp

Inspect the structured timing output it prints. For finer-grained profiling, build with the Tracy feature:

cargo run --release --features tracy --bin provekit-cli -- --tracy prove

For repeatable timing comparisons, the provekit-bench crate in tooling/provekit-bench/ ships a divan bench harness over the poseidon-rounds example, with separate benches for prover-key read, prove, prove-with-IO, and verify. Use it as a template for benchmarking your own circuits by pointing the benches at additional .pkp / .pkv / proof artifacts.
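
If you only need a quick wall-clock comparison before setting up the bench harness, a small wrapper that runs a command several times and reports the median is often enough. This is a generic sketch and not part of provekit-bench. The binary path `target/release/provekit-cli` in the comment is an assumption about your build layout. Timing the built binary directly, rather than `cargo run`, keeps Cargo's own startup out of the measurement.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5, warmup=1):
    """Run `cmd` repeatedly and return the wall-clock duration of each timed run in seconds."""
    for _ in range(warmup):  # warm-up runs populate file caches and are not recorded
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. Swap in your prover invocation, e.g.
    # ["target/release/provekit-cli", "prove", "--prover", "circuit.pkp"]  (binary path assumed).
    stats = summarize(time_command([sys.executable, "-c", "pass"], runs=5))
    print({name: f"{seconds:.3f}s" for name, seconds in stats.items()})
```

The median is less sensitive than the mean to a single slow run caused by background load, which makes it a reasonable default for comparing two builds or two `--hash` settings.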

Inspecting the circuit before proving

Two CLI commands tell you what you’re about to prove:

# R1CS structure and ACIR statistics.
cargo run --release --bin provekit-cli -- circuit-stats target/<circuit>.json

# Postcard-encoded byte sizes per prover-key component, plus an R1CS sub-breakdown
# (Interner, Matrix A/B/C) and the bytes saved by column-delta encoding.
cargo run --release --bin provekit-cli -- analyze-pkp <circuit>.pkp

Use circuit-stats to confirm witness and constraint counts match your expectations before committing to a host. A circuit that fits comfortably on a server may exceed practical proving time on mobile.

The ProveKit benchmark suite

noir-examples/csp-benchmarks/ contains the Ethproofs CSP benchmarks, a standardized suite of client-side proving targets used to compare proof systems on common workloads.

Target	Circuit sizes	Implementation note
SHA-256	128, 256, 512, 1024, 2048 bytes	Uses `noir-lang/sha256::sha256_var`, lowering compression through Noir’s SHA-256 blackbox.
Keccak-256	128, 256, 512, 1024, 2048 bytes	Native Noir Keccak circuit with a witness-focused u32 lane representation.
Poseidon	2, 4, 8, 12, 16 field elements	`noir-lang/poseidon` BN254 native Noir helpers.
Poseidon2	2, 4, 8, 12, 16 field elements	`TaceoLabs/noir-poseidon` for states 2, 8, 12, 16; state 4 intentionally exercises Noir’s Poseidon2 blackbox.
ECDSA	secp256r1 over a 32-byte digest	`zkpassport/noir-ecdsa` native P-256 verification (the P-256 blackbox is not yet lowered by ProveKit).

To run any benchmark target:

cd noir-examples/csp-benchmarks/sha256_512
cargo run --release --bin provekit-cli -- prepare
cargo run --release --bin provekit-cli -- prove
cargo run --release --bin provekit-cli -- verify

Combine that with the CLI’s timing output (or the provekit-bench harness) to capture proving time, verification time, and memory for each target on your machine.
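
One way to capture wall time and peak memory per step from outside the CLI is to wait on the child process with `os.wait4`, which returns that child's resource usage. The sketch below is Unix-only and needs Python 3.9 or later. The helper names and the binary path passed to `measure_target` are hypothetical. The `prepare`, `prove`, and `verify` steps are the ones shown above.

```python
import os
import subprocess
import sys
import time


def run_and_measure(cmd, cwd=None):
    """Run `cmd` once and return (wall_seconds, peak_rss_bytes, exit_code). Unix only."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    _, status, usage = os.wait4(proc.pid, 0)  # blocks until the child exits
    wall = time.perf_counter() - start
    exit_code = os.waitstatus_to_exitcode(status)
    proc.returncode = exit_code  # the child is already reaped, so record that on the Popen object
    # ru_maxrss is reported in kibibytes on Linux and in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return wall, usage.ru_maxrss * scale, exit_code


def measure_target(binary, target_dir, steps=("prepare", "prove", "verify")):
    """Measure each CLI step for one benchmark target. `binary` is the path to your built CLI."""
    results = {}
    for step in steps:
        wall, rss, code = run_and_measure([binary, step], cwd=target_dir)
        if code != 0:
            raise RuntimeError(f"{step} exited with status {code}")
        results[step] = {"seconds": wall, "peak_rss_mib": rss / (1024 * 1024)}
    return results


if __name__ == "__main__":
    # Stand-in command so the sketch runs without ProveKit installed.
    wall, rss, code = run_and_measure([sys.executable, "-c", "pass"])
    print(f"exit={code} wall={wall:.3f}s peak_rss={rss / (1024 * 1024):.1f} MiB")
```

For a real run you would call something like `measure_target("/path/to/provekit-cli", "noir-examples/csp-benchmarks/sha256_512")` with the absolute path to your own build. Treat the numbers as a cross-check on the CLI's own memory statistics rather than a replacement for them.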

What to expect across hosts

The fundamentals don’t change between hosts, but resource constraints do:

Native Rust on a workstation. The reference platform. Smallest measured proving time, largest available memory.
WASM in a browser. Slower than native: the proof system runs single-threaded unless SharedArrayBuffer is available, and JavaScript marshalling adds overhead at the boundaries.
WASM in Node.js. Closer to native than browser WASM, but still single-process unless you orchestrate workers externally.
iOS / Android via FFI. Bounded by device RAM unless you configure pk_configure_memory for file-backed mmap. Modern phones can prove non-trivial credential circuits on-device; budget memory carefully.
Verifier server. Verification dominates. Concurrency is configurable through VERIFIER_SEMAPHORE_LIMIT; the default of one keeps memory usage predictable.
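
To see why a semaphore limit keeps verifier memory predictable, consider this simulation of the concept. It is not the verifier server's code, and `run_with_limit` is a hypothetical name. The point is that peak in-flight work, and therefore peak memory, is bounded by the limit regardless of how many requests arrive.

```python
import threading
import time


def run_with_limit(num_requests, limit, work_seconds=0.01):
    """Simulate `num_requests` verifications with at most `limit` in flight; return the observed peak."""
    gate = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def handle_request():
        nonlocal in_flight, peak
        with gate:  # blocks while `limit` verifications are already running
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(work_seconds)  # stand-in for the actual verification work
            with lock:
                in_flight -= 1

    threads = [threading.Thread(target=handle_request) for _ in range(num_requests)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak


print("peak in flight with limit 1:", run_with_limit(8, 1))
print("peak in flight with limit 3:", run_with_limit(8, 3))
```

Raising the limit trades memory headroom for throughput: worst-case memory grows roughly with the limit times the footprint of one verification.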

Recursive verification cost

The Go/gnark recursive verifier wraps a WHIR proof inside a Groth16 proof for on-chain settlement. The wrapper has two costs:

One-time setup: trusted-setup ceremony for the outer Groth16 circuit, producing the recursive proving and verifying keys. Run once per recursive-verifier R1CS shape.
Per-proof wrap: a Groth16 proving run over the WHIR verifier R1CS. Typically the largest single step in an on-chain workflow, dwarfing the base proving time for small circuits.

Measure both costs separately when benchmarking on-chain end-to-end latency.

Optimization checklist

If proving is slower than you need:

Run circuit-stats. Confirm witness and constraint counts match expectations. Unexpected blowups in witness count are the strongest signal of accidentally-quadratic constraint generation.
Pick --hash for your settlement path. skyscraper (default) is optimal when you’re wrapping the proof with Groth16 for on-chain verification. If you only verify off-chain, sha256 or blake3 will prove faster thanks to hardware acceleration.
Audit black-box vs native lowerings. Some Noir black boxes (SHA-256, Keccak) lower to heavier circuits in ProveKit than equivalent implementations written in native Noir. The CSP benchmarks call this out explicitly.
Profile with Tracy. Run cargo run --release --features tracy --bin provekit-cli -- --tracy prove and inspect span timings. Look for layers that dominate the witness solve.
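
For the first checklist item, one way to spot accidentally-quadratic growth is to record the witness count that circuit-stats reports at several input sizes and fit the slope on a log-log scale. The sketch assumes you copy the counts in by hand, since it does not parse any CLI output. The numbers in the demo are invented for illustration.

```python
import math


def growth_exponent(input_sizes, witness_counts):
    """Least-squares slope of log(witness count) against log(input size)."""
    if len(input_sizes) != len(witness_counts) or len(input_sizes) < 2:
        raise ValueError("need at least two (size, count) pairs of equal length")
    xs = [math.log(size) for size in input_sizes]
    ys = [math.log(count) for count in witness_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented counts for illustration only: replace with values read from circuit-stats.
sizes = [128, 256, 512, 1024]
counts = [41_000, 83_000, 165_000, 331_000]
print(f"estimated growth exponent: {growth_exponent(sizes, counts):.2f}")
```

An exponent near 1 indicates linear scaling with input size. A value approaching 2 suggests that some part of constraint generation is doing quadratic work and is worth auditing.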

CLI reference: flags for circuit-stats, analyze-pkp, and Tracy profiling.
Examples catalog: circuits you can benchmark against.
Proving flow: the conceptual pipeline being measured.