The Proof is in the Pairing

March 24th, 2026

Blockchains are especially well-suited for two use cases: payments and trading. We’ve made real progress in scaling these systems with many chains supporting 10K–100K TPS. But what happens when we introduce privacy?

Recall that there are really only two tools at our disposal:

zero-knowledge proofs for private payments, smart contracts, and zk-identity
advanced encryption schemes for privacy of pending bids/trades

The chain (validators) needs to verify every proof and decrypt every ciphertext to process the block. Very loosely speaking, this translates to a “few” public key operations (exponentiations/pairings) per transaction when using pairing-based techniques. Note that this is (asymptotically) optimal, so doing much better is difficult. So if we budget \approx 1 ms privacy overhead per transaction (about the time taken for a pairing on BLS12-381), processing 100K transactions requires 100s in single-core CPU time. Assuming perfect parallelization, we can of course reach 100K TPS with 100 cores—or a GPU—but this is deeply unsatisfying both cryptographically and because it places specialized hardware requirements on validators.

Verifying zkSNARKs at Scale

Note that if we just want to quickly verify proofs on chain, there are known techniques that utilize an untrusted third party to reduce the cost of verification on chain (MBK+19, BMM+19, GMN21). In fact, GMN21 shows that verification time can be sub-linear in the number of proofs.

But this strategy comes with two limitations:

introduces an extra “hop” in each slot from proposer \rightarrow aggregator \rightarrow validator, which in turn increases latency
relies on availability of the aggregator to maintain the system’s throughput

Our goal: fast verification of SNARKs on chain with zero latency overhead and no additional assumptions.

The zero-latency requirement immediately disallows any strategy that delegates verification to a third party. This also means that verification cannot be sub-linear time, as validators need to, at the very least, read all proofs in the block. Giving up sub-linear verification might seem like we’re going in the opposite direction of scaling, but in practice validators typically still perform a non-trivial amount of work for each transaction. In private payments, for instance, they update the state with newly minted UTXOs and check nullifiers to prevent double spending. Thus, if we can drive amortized proof verification costs down to roughly the cost of transaction processing/dissemination, we are in very good shape.

Our Solution

While it’s true that we can’t verify individual proofs any faster, verifying a batch of proofs can be much faster. Similar ideas can be traced back to the early days of pairing-based signatures and Groth-Sahai proofs (see CHP07, FGHP09, and BFI+10). In fact, the ZCash team also discussed batch verification of Groth16 proofs in the context of private payments back in 2017.

Our starting point is the Pari proof system (DMS24), which was later improved in Glock (Eagen25). We recall the verification algorithm:

Parse the verification key \textsf{ivk} = ((A,B), [\alpha]_1, [\beta]_1, [\delta_1]_2, [\delta_2]_2, [\tau]_2, [\delta_1 \tau]_2), where A and B form the square R1CS relation.
Parse the proof \pi = (T, U, v_a, v_b) \in \mathbb{G}_1^2 \times \mathbb{F}^2.
Compute the challenge r := H(\textsf{transcript}).
Compute x_A := A \cdot (x \| 0) and x_B := B \cdot (x \| 0), and interpolate over domain K to obtain polynomials \hat{x}_A and \hat{x}_B.
Compute the quotient evaluation, where z_K is the vanishing polynomial on the domain K:

v_q := \frac{(v_a + \hat{x}_A(r))^2 - (v_b + \hat{x}_B(r))}{z_K(r)}
Check the pairing equation:

e(T,\; [\delta_2]_2) \stackrel{?}{=} e(U,\; [\delta_1 \tau]_2 - r \cdot [\delta_1]_2) \cdot e(v_a \cdot [\alpha]_1 + v_b \cdot [\beta]_1 + v_q \cdot [1]_1,\; [1]_2)

Rearranging the last equation, we have:

e(\colorbox{lightgrey}{$T$},\; [\delta_2]_2) \stackrel{?}{=} e(\colorbox{lightgrey}{$U$},\; [\delta_1 \tau]_2) \cdot e(\colorbox{lightgrey}{$-r \cdot U$},\; [\delta_1]_2) \cdot e(\colorbox{lightgrey}{$v_a$} \cdot [\alpha]_1 + \colorbox{lightgrey}{$v_b$} \cdot [\beta]_1 + \colorbox{lightgrey}{$v_q$} \cdot [1]_1,\; [1]_2)

where only the \colorbox{lightgrey}{\text{highlighted}} terms change across different proofs (under the same verification key). Thus, we can batch verify multiple proofs (see FGHP09) by taking a random linear combination using three \mathbb{G}_1 MSMs for the T, U, r\cdot U terms, field multiplications for the v_a, v_b, v_q terms, and finally checking a single multi-pairing. Of course, we still need to carry out steps 1-5 for each proof, but these are very fast hashing and field operations. This strategy applies more broadly to KZG opening proofs and KZG-based SNARKs such as Plonk. For example, gnark uses it for more efficient recursion of Plonk proofs.

A quick implementation (with room for further optimization) of this idea can be found here, and it shows a 60\times speedup when verifying 2^{16} proofs relative to naive individual verification. Both experiments were run in single-threaded mode. Concretely, this amounts to \approx 10\mu\text{s} per proof on an M5 MacBook Pro, down from 0.6 ms per proof. And the more proofs you verify, the faster it gets!

N	Individual	Batch	Speedup
2	1.361 ms	0.882 ms	1.54x
32	19.123 ms	2.007 ms	9.53x
512	303.118 ms	9.934 ms	30.51x
8192	4834.206 ms	100.030 ms	48.33x
65536	38307.696 ms	644.543 ms	59.43x

Even at just 32 proofs, we see an order-of-magnitude speedup in verification time. If we can bring down gas costs proportionally, verifying a SNARK can be cheaper than an ERC-20 transfer. A parallel implementation using 8 threads is able to verify over 500K proofs in <750 ms.

N	Individual	Parallel Batch	Speedup
16	8.898 ms	1.635 ms	5.44x
256	154.882 ms	3.124 ms	49.58x
4096	2456.701 ms	14.391 ms	170.71x
65536	38665.625 ms	120.024 ms	322.15x
524288	311908.648 ms	735.246 ms	424.22x

Any project currently using Groth16 proofs can use Pari/Glock as a drop-in replacement. In fact, you don’t even have to rewrite circuits as the Pari proof system can be extended to support standard R1CS circuits (see Remark 2.3 in DMS24) by increasing the proof size to 2 \mathbb{G}_1 elements and 3 \mathbb{F} elements.

Stay tuned for our upcoming posts on scaling timelock encryption and batched threshold encryption for encrypted mempools!