Merkle Trees
A Merkle tree is a binary hash tree where every leaf contains the hash of a data block, every intermediate node contains the hash of its two children, and the single root hash at the top serves as a fingerprint for the entire dataset. Invented in 1979 by Ralph Merkle for efficient digital signature authentication, this data structure is now foundational to virtually every blockchain — Bitcoin uses it for transaction verification, Ethereum extends it for global state management, and Smart Contracts use it for gas-efficient airdrop distribution and allowlist verification.
TL;DR
- A Merkle tree hashes data recursively into a single root that represents the entire dataset — change one byte anywhere and the root changes.
- Merkle proofs verify data membership in O(log n) steps instead of O(n) — verifying 1 transaction in a block of 4,096 requires only 12 hashes, not 4,096.
- Bitcoin block headers contain a Merkle root of all transactions, enabling SPV light clients to verify transactions using only ~464 bytes instead of a full ~1.7 MB block.
- Ethereum uses a Modified Merkle Patricia Trie for state, transactions, and receipts — three roots embedded in every block header.
- The Merkle Distributor pattern (popularized by the Uniswap UNI airdrop) stores a single 32-byte root on-chain instead of 250,000+ individual addresses, reducing deployment gas by orders of magnitude.
- Merkle proofs are not bulletproof — second preimage attacks, duplicate leaf bugs, and odd-leaf-count edge cases have caused real vulnerabilities (CVE-2012-2459 in Bitcoin).
- Verkle trees (vector commitment + Merkle) are Ethereum's planned replacement, shrinking proof sizes by 20-30x vs the current hexary Patricia trie.
What Is a Merkle Tree?
At its core, a Merkle tree converts a list of data into a single hash that commits to every element. Construction is bottom-up:
- Hash each data element individually to produce leaf nodes.
- Pair adjacent leaves and hash each pair to produce the next level.
- Repeat until one root hash remains.
flowchart BT
Root["Merkle Root<br/>H( H(AB) + H(CD) )"]
subgraph Intermediate
I1["H( H(A) + H(B) )"]
I2["H( H(C) + H(D) )"]
end
subgraph Leaves
L1["H(A)"]
L2["H(B)"]
L3["H(C)"]
L4["H(D)"]
end
L1 --> I1
L2 --> I1
L3 --> I2
L4 --> I2
I1 --> Root
I2 --> Root
Figure 1: A four-leaf Merkle tree. Each leaf is the hash of raw data. Each parent is the hash of its two children. The root commits to the entire dataset.
Change a single byte in data block C, and H(C) changes, which changes H(CD), which changes the root. The corruption propagates upward deterministically.
Hash Functions by Chain
| CHAIN | HASH FUNCTION | CONSTRUCTION | NOTES |
|---|---|---|---|
| Bitcoin | Double SHA-256 | Merkle-Damgard | Double-hashing defends against length-extension attacks |
| Ethereum | Keccak-256 | Sponge | Not vulnerable to length-extension; single hash sufficient |
| Hedera | SHA-384 | Merkle-Damgard | Truncated SHA-512; faster than SHA-256 on 64-bit hardware, 192-bit collision resistance |
| Solana | SHA-256 | Merkle-Damgard | Used in transaction and account state verification |
Merkle Proofs
A Merkle proof answers one question: "Is this data in the tree?" Without one, you would need the entire dataset to verify membership. With one, you need only the sibling hashes along the path from the target leaf to the root.
How Verification Works
To prove leaf B exists in the tree from Figure 1:
- Start with
H(B)(the leaf to verify). - Provide sibling
H(A)— hash them together to getH(AB). - Provide sibling
H(CD)— hash withH(AB)to get the root. - Compare computed root against the known root. Match = valid.
The proof for B is just two hashes: [H(A), H(CD)]. The verifier never sees data blocks A, C, or D.
Proof Size Comparison
| TREE SIZE (LEAVES) | PROOF DEPTH | PROOF SIZE | FULL DATASET SIZE | COMPRESSION |
|---|---|---|---|---|
| 128 | 7 | 224 bytes | 4 KB | ~18x |
| 1,024 | 10 | 320 bytes | 32 KB | ~100x |
| 10,000 | ~14 | 448 bytes | 312 KB | ~700x |
| 1,000,000 | ~20 | 640 bytes | 32 MB | ~50,000x |
Each proof step is one 32-byte hash. Total proof size = log2(n) × 32 bytes. A tree with a million leaves requires only 640 bytes to prove membership — compared to scanning 32 MB of hashes without the tree.
Blockchain Applications
Bitcoin: SPV and Light Clients
Every Bitcoin block header (80 bytes) contains a Merkle root of all transactions in that block. Satoshi described Simplified Payment Verification (SPV) in Section 8 of the Bitcoin whitepaper: a light client downloads only block headers and requests Merkle proofs from full nodes to verify specific transactions.
| METRIC | FULL BLOCK | SPV PROOF |
|---|---|---|
| Data per block | ~1.7 MB average | 80 bytes (header) + ~384 bytes (proof) |
| All headers since genesis | ~1.4 TB (full chain) | ~70 MB (headers only) |
| Verify 1 tx in 4,096 tx block | Process all 4,096 hashes | 12 sibling hashes |
This is what makes mobile Bitcoin wallets possible. They do not download the full blockchain — they trust block headers (validated by Proof of Work) and use Merkle proofs to confirm their transactions exist.
Ethereum: Three Tries Per Block
Ethereum extends the concept with a Modified Merkle Patricia Trie (MPT) — a hexary (16-way branching) trie that combines Merkle hashing with path-based key lookup. Every block header contains three MPT roots:
- stateRoot: Maps every address to its account state (nonce, balance, storage root, code hash). Covers 130+ million addresses.
- transactionsRoot: Maps transaction index to transaction data for that block. Immutable after mining.
- receiptsRoot: Maps transaction index to its receipt (gas used, logs, status).
Note
Ethereum's Patricia Trie is not a binary tree. Branch nodes have 16 children (one per hex nibble) plus a value slot, and extension nodes shortcut through sparse regions. This makes the structure wider but shallower than a standard Merkle tree.
Airdrop Claims: The Merkle Distributor
The Merkle Distributor pattern stores a single bytes32 merkleRoot on-chain — one storage slot — instead of writing every eligible address to contract storage. Recipients submit proofs off-chain to claim tokens.
The canonical implementation is Uniswap's MerkleDistributor.sol, deployed for the September 2020 UNI airdrop (0x090D4613...0821D256e (etherscan)).
How it works:
- Off-chain: Build a Merkle tree where each leaf =
keccak256(abi.encodePacked(index, account, amount)). - Deploy contract with just the Merkle root. Publish the full tree to IPFS.
- User calls
claim(index, account, amount, proof). - Contract verifies the proof against the stored root and transfers tokens.
- A bitmap (
mapping(uint256 => uint256)) tracks claimed status — eachuint256stores 256 claim flags as individual bits.
Gas comparison — 250,000 eligible addresses:
| APPROACH | SETUP COST | PER-CLAIM COST |
|---|---|---|
| Store all addresses on-chain | ~5.5 billion gas (250K × 22,100 gas SSTORE) | ~5,000 gas (mapping lookup) |
| Merkle root only | ~20,000 gas (one SSTORE) | ~45,000-55,000 gas (proof verification) |
The trade-off is clear: massively cheaper deployment in exchange for slightly higher per-claim cost. For large distributions, it is not close.
Rollup State Verification
Both Optimistic and ZK Bridge rollups post state roots (Merkle roots) to L1 as commitments:
- ZK Rollups: Sequencer computes new state root, generates a validity proof (ZK-SNARK/STARK), and posts both to L1. The L1 verifier checks the proof and accepts the new root. Withdrawals include a Merkle proof of the burn transaction's inclusion.
- Optimistic Rollups: Sequencer posts a new state root. It is assumed valid unless challenged during a ~7-day dispute window, where fraud proofs re-execute disputed transactions and compare state roots.
Rollup state anchoring is covered in more detail in Anchoring.
How It Works in Solidity
On-chain Merkle verification is straightforward. OpenZeppelin's MerkleProof.sol is the standard library:
The function walks the proof array, hashing the current value with each sibling hash using commutative hashing (sorts the pair before hashing so proof generators do not need to track left/right positioning). After processing all proof elements, it compares the result to the root.
Gas Costs
Keccak-256 in the EVM: 30 gas base + 6 gas per 32-byte word of input. Each proof step hashes 64 bytes (two 32-byte values) = 42 gas for the hash itself, plus memory and calldata overhead.
| TREE SIZE (LEAVES) | PROOF DEPTH | ESTIMATED VERIFICATION GAS |
|---|---|---|
| 128 | 7 | ~30,000 gas |
| 1,024 | 10 | ~35,000 gas |
| 10,000 | ~14 | ~40,000 gas |
| 250,000 | ~18 | ~50,000 gas |
For comparison: ECDSA signature verification costs ~29,000 gas, and a single SSTORE to a new slot costs 22,100 gas.
Risks & What Can Go Wrong
Second Preimage Attacks
Internal nodes are computed as H(left || right) — two concatenated 32-byte hashes (64 bytes total). An attacker can submit those 64 bytes as a fake "leaf" with a shortened proof, and verification passes because the contract cannot distinguish a leaf hash from an internal node hash.
OpenZeppelin's JavaScript library mitigates this by double-hashing leaves: keccak256(keccak256(leaf)). But the Solidity MerkleProof.verify function does not enforce this — developers must apply double-hashing themselves. OpenZeppelin GitHub issue #3091 documents the vulnerability.
Odd Number of Leaves (CVE-2012-2459)
When the leaf count is not a power of 2, the standard approach is to duplicate the last leaf. Bitcoin does this, and it introduced a real vulnerability: transaction lists [1,2,3,4,5,6] and [1,2,3,4,5,6,5,6] produce the same Merkle root. An attacker could create a mutated block with duplicate transactions that shares the Merkle root of a valid block, causing nodes to cache the invalid version and reject the real one.
A variation was reintroduced in Bitcoin Core 0.13.0 (2016), fixed in 0.14.0 (2017), and another variant discovered and fixed in 2023.
Airdrop Front-Running
Merkle proofs are public data — they appear in transaction calldata once broadcast. However, the Merkle Distributor pattern binds each leaf to a specific address: keccak256(index, account, amount). Even if a MEV Searcher copies the proof, the contract verifies the account matches the recipient. The bot cannot redirect tokens to themselves.
The real risk is poorly implemented contracts that do not bind claims to msg.sender.
Verkle Trees: The Planned Replacement
Ethereum's current hexary Patricia Trie requires large proofs — a single state proof can exceed 1 KB. Verkle trees (coined by John Kuszmaul, 2018) replace sibling hashes with polynomial commitments (IPA or KZG), eliminating the need to provide neighbor data entirely.
| PROPERTY | MERKLE PATRICIA TRIE | VERKLE TREE |
|---|---|---|
| Proof size (1B data points) | ~1 KB+ | ~150 bytes |
| Branching factor | 16 (hexary) | 256+ |
| Requires sibling hashes | ☑ | ☒ |
| Quantum resistant | ☑ (hash-based) | ☒ (elliptic curve) |
| Status | Production (since 2015) | Testnet (devnets running) |
The quantum vulnerability concern has prompted parallel research into binary Merkle trees with more efficient hash functions (EIP-7864) as a potential alternative path.
References
- Secrecy, Authentication, and Public Key Systems - Ralph Merkle, PhD Thesis, Stanford University (1979)
- Merkle Patricia Trie - Ethereum.org
- Merkle Proofs for Offline Data Integrity - Ethereum.org Tutorial
- Learn Me a Bitcoin: Merkle Root - Visual explanation of Bitcoin Merkle trees
- Merkle Tree Vulnerabilities - Bitcoin Optech
- OpenZeppelin MerkleProof.sol - Standard Solidity implementation
- Uniswap MerkleDistributor.sol - Canonical airdrop distributor
- The Second Preimage Attack for Merkle Trees in Solidity - RareSkills
- Verkle Trees - Vitalik Buterin (2021)
- Verkle Trees Roadmap - Ethereum.org
Changelog
| DATE | AUTHOR | NOTES |
|---|---|---|
| 2026-03-18 | Artificial. | Generated by robots. Gas: 75 tok |
| 2026-03-19 | Denizen. | Reviewed, edited, and curated by humans. |