+ ~ - + - + - + - ~ ~ *
| commonware
* ~ + + - ~ - + - * - +

Merkle Mountain Ranges for Performant Data Authentication

February 13, 2025

Decentralized systems require the ability to prove authenticity of data received from potentially untrustworthy sources. The most well known data structure for this task is the Merkle tree, which allows one to efficiently prove inclusion of an element within a list without having to obtain the entire list from a trusted source.

A Merkle tree is a binary tree whose leaves represent the list elements by storing the hash of the element, and whose internal nodes store the hash of their children. The only trusted piece of data required to validate a Merkle proof is a single hash value (~32 bytes) corresponding to the root of the Merkle tree. And the Merkle proof is itself quite small, consisting of a single value from each node along the path from root to the leaf element (logarithmic in the size of the list N) as illustrated in Figure 1.

Example Merkle tree over a list of 7 elements
Figure 1: Example Merkle tree over a list of 7 elements

Because of these powerful properties, it's no surprise that the Merkle tree and its variants (such as Ethereum's Merkle Patricia Trie) are fundamental components of most blockchains. And making these data structures performant is among today’s primary challenges to scaling blockchains further.

One particularly interesting variant of the Merkle tree, now available in the Commonware Library, is the Merkle Mountain Range (MMR). A MMR differs from a Merkle tree in that it is append-only. While Merkle trees support element updates and insertions at arbitrary list positions, MMRs only allow new elements to be added to the end of the list.

The append-only restriction may appear burdensome, but it turns out updates can still be simulated through re-appending data (as in QMDB). What this restriction yields, however, is extreme performance. When adding an element to the MMR, (1) very little data needs to be read in from storage (typically cached), and (2) new data generated by the addition can be persisted to storage with one contiguous write. Contrast this to a standard Merkle tree, where adding or updating an element can require reading and updating a logarithmically sized amount of data scattered randomly across storage.

A perfect binary tree is a binary tree that is both balanced and has a full set of leaves. A MMR is a list of perfect binary trees, called mountains, each of strictly decreasing height. The leaves of the MMR store the (positioned) hash of each element in the order of their insertion. We won't cover all the details here, but note that this structure is easy to maintain: if you add a new element to the end of the list, you need only generate at most a log2(N) number of new internal nodes to re-impose the required properties without modifying any existing nodes, as depicted in Figure 2 below.

Appending new elements to an MMR
Figure 2: Appending new elements to an MMR

The root nodes of each mountain are called the peaks of the MMR. The root hash of the MMR is computed by hashing the total number of nodes in the MMR together with the set of all peak hashes, a process called bagging the peaks. The amount of trusted data for MMR data authentication thus remains a single hash. A MMR inclusion proof consists of hashes derived from the path connecting its leaf element to the peak of its mountain, along with the hashes of the peaks of all other mountains. Both sets of hashes are bounded by log2(N) in number.

Because elements are all written sequentially, there is no need for intensive free space recovery techniques such as garbage collection or compaction. As soon as we no longer require proving the inclusion of any element older than a certain point in time, we can drop nearly all of those elements (and most of their ancestor nodes) from the MMR without losing the ability to generate new roots.

Wrapping the MMR root with a consensus certificate
Figure 3: Wrapping the MMR root with a consensus certificate

MMRs and threshold consensus signatures make for a powerful combination. Consider, for example, using an MMR to store successfully executed blockchain transactions in the order of their execution. If the MMR root is included in the block digest, then a user (or external blockchain) running a lite client can prove successful execution of any historical transaction from a consensus certificate and the latest MMR root. Have a proof from an old MMR root? The latest MMR root (effectively) includes references to all previous roots and can be combined with some recent intermediate hashes to verify it.

Generating a lot of data but need a proof of inclusion (and order)? Reach for the MMR.