CRDTs
A Conflict-free Replicated Data Type (CRDT) is a data structure that can be replicated across multiple nodes in a distributed system, updated independently and concurrently without coordination, and merged deterministically into a consistent state — with a mathematical guarantee of no conflicts. No consensus protocol, no leader election, no locking. Every replica converges to the same result, eventually, regardless of the order updates arrive.
TL;DR
- CRDTs allow multiple nodes to update shared state independently and merge without conflicts — guaranteed by math, not by consensus.
- Two families: state-based (CvRDTs) ship full state and merge via join-semilattice; operation-based (CmRDTs) ship individual operations that must be commutative.
- Common types include counters (G-Counter, PN-Counter), sets (G-Set, OR-Set), and registers (LWW-Register) — each with different trade-offs on what operations they support and what metadata they carry.
- CRDTs trade coordination for metadata overhead. A simple counter tracking 1,000 nodes requires 1,000 entries. An OR-Set carries a unique tag for every insertion ever made until garbage collected.
- Blockchain applications include decentralized databases (OrbitDB on IPFS), collaborative editing on peer-to-peer networks, and off-chain state management where nodes need local-first writes without waiting for on-chain finality.
- CRDTs guarantee convergence, not correctness. All replicas will agree — but what they agree on may not reflect user intent if the merge semantics do not match the application's domain logic.
What Is a CRDT?
The core problem: multiple nodes hold copies of the same data. They all accept writes locally. Network partitions, latency, and reordering mean updates arrive in different orders at different nodes. Traditional databases solve this with coordination — locks, consensus, leader election. CRDTs solve it with structure.
A CRDT is designed so that any possible combination of updates, applied in any order, produces the same result. The merge function is:
- Commutative: merge(A, B) = merge(B, A)
- Associative: merge(A, merge(B, C)) = merge(merge(A, B), C)
- Idempotent: merge(A, A) = A
These three properties define a join-semilattice — a mathematical structure where values can only move "upward" in a partial order, and every pair of states has a unique least upper bound. Once you have that, convergence is automatic.
The term was formalized in 2011 by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski at INRIA, though the underlying ideas (commutative replication, vector clocks, monotonic merge functions) had been circulating in distributed systems research since the 1970s.
flowchart LR
subgraph Node A
A1["State: {x: 1}"]
end
subgraph Node B
B1["State: {y: 2}"]
end
subgraph Node C
C1["State: {x: 1, y: 2}"]
end
A1 -- "merge" --> C1
B1 -- "merge" --> C1
Figure 1: Two nodes with independent updates merge deterministically into the same result. No coordination required — the merge function guarantees convergence.
How It Works
Two Families
CRDTs come in two flavors. Both guarantee convergence, but they differ in what gets sent over the network.
| PROPERTY | STATE-BASED (CvRDT) | OPERATION-BASED (CmRDT) |
|---|---|---|
| What is transmitted | Full local state | Individual operations |
| Network requirement | At-least-once delivery | Exactly-once, causal delivery |
| Merge mechanism | Join-semilattice merge function | Commutative apply function |
| Bandwidth | Higher (full state per sync) | Lower (ops only) |
| Complexity | Simpler protocol | Simpler data, harder delivery |
| Idempotent delivery | ☑ Built-in (merge is idempotent) | ☒ Must be enforced externally |
State-based (CvRDT — Convergent): Each node periodically sends its full local state to other nodes. The receiver merges the incoming state with its own using a deterministic merge function. Because the merge is idempotent, duplicate deliveries are harmless. The downside is bandwidth — sending full state scales poorly for large data structures.
Operation-based (CmRDT — Commutative): Each node broadcasts individual operations (e.g., "increment counter by 1" or "add element X"). Receiving nodes apply operations directly. This is more bandwidth-efficient, but the network must guarantee causal ordering and exactly-once delivery — if an operation is delivered twice or out of causal order, the result may diverge.
In practice, many implementations use a hybrid approach — delta-state CRDTs — that transmit only the state changes (deltas) since the last sync, combining the idempotent delivery tolerance of CvRDTs with the bandwidth efficiency of CmRDTs.
The Semilattice
The mathematical structure underlying state-based CRDTs is a join-semilattice. A set S with a binary join operation (⊔) forms a join-semilattice if the operation is commutative, associative, and idempotent.
The join of two states always produces a state that is "at least as large" as both inputs — states only move forward, never backward. This monotonic growth is what makes convergence possible without coordination.
A counter that only increments is a trivial semilattice — the join of two values is max(a, b). A set that only grows (elements added, never removed) has a join of union(A, B). Once you need deletion, things get harder.
Common CRDT Types
Counters
| TYPE | OPERATIONS | HOW IT WORKS | TRADE-OFF |
|---|---|---|---|
| G-Counter | Increment only | Each node maintains its own count; total = sum of all node counts | Cannot decrement |
| PN-Counter | Increment and decrement | Two G-Counters: one for increments (P), one for decrements (N); value = P - N | Double the metadata |
A G-Counter (grow-only counter) is the simplest useful CRDT. Each of N nodes maintains a vector of N entries — one per node. A node increments only its own entry. The counter's value is the sum of all entries. Merge takes the element-wise maximum.
Node A: [3, 0, 0] → value = 3
Node B: [0, 5, 0] → value = 5
Node C: [0, 0, 2] → value = 2
──────────────────
Merged: [3, 5, 2] → value = 10
A PN-Counter supports decrement by maintaining two G-Counters: P (positive) for increments and N (negative) for decrements. The current value = sum(P) - sum(N). This works, but it means a counter that has been incremented and decremented a million times each still reports 0 while carrying two million-entry vectors.
Sets
| TYPE | ADD | REMOVE | HOW IT WORKS | TRADE-OFF |
|---|---|---|---|---|
| G-Set | ☑ | ☒ | Union-only set | No removal ever |
| 2P-Set | ☑ | ☑ (once) | Two G-Sets: added and removed | Removed elements cannot be re-added |
| OR-Set | ☑ | ☑ | Each add tagged with unique ID; remove targets specific tags | Metadata grows with every add |
| LWW-Element-Set | ☑ | ☑ | Timestamp-based: latest operation wins | Requires synchronized clocks |
The G-Set (grow-only set) is trivial — merge is set union. Elements can be added, never removed.
The 2P-Set (two-phase set) adds a "tombstone" set tracking removed elements. An element is in the set if it is in the add-set and not in the remove-set. The catch: once removed, an element can never be re-added.
The OR-Set (observed-remove set) solves this by tagging each addition with a globally unique identifier. Removal targets only the tags that the removing node has observed — concurrent additions with new tags survive the remove. This is the most practical set CRDT, but metadata grows with every insertion.
Registers
| TYPE | SEMANTICS | HOW IT WORKS | TRADE-OFF |
|---|---|---|---|
| LWW-Register | Last writer wins | Timestamp-based: highest timestamp overwrites | Requires synchronized clocks; silent data loss |
| MV-Register | Multi-value | Concurrent writes preserved as a set; application resolves | Application must handle multiple values |
A Last-Writer-Wins Register (LWW-Register) is the simplest approach to mutable values — every write carries a timestamp, and the highest timestamp wins. It is simple and space-efficient, but concurrent writes result in silent data loss. If two nodes write different values at nearly the same time, one value disappears without notification.
A Multi-Value Register (MV-Register) preserves all concurrent writes and presents them as a set. The application must resolve conflicts — think Git merge conflicts, but at the data structure level.
Blockchain and Decentralized Applications
Where CRDTs Fit
Blockchains and CRDTs solve related but different problems. Blockchains provide total ordering and global consensus — expensive, slow, and strong. CRDTs provide eventual consistency without coordination — cheap, fast, and weak. They complement each other.
| PROPERTY | BLOCKCHAIN CONSENSUS | CRDT CONVERGENCE |
|---|---|---|
| Ordering | Total (every node agrees on order) | Partial (causal at best) |
| Finality | Explicit (block confirmation) | Eventual (guaranteed but no timestamp) |
| Coordination | Required (consensus protocol) | Not required |
| Throughput | Low (bounded by consensus) | High (local writes, async merge) |
| Conflict resolution | Determined by protocol rules | Determined by data structure math |
OrbitDB
OrbitDB is a peer-to-peer database built on IPFS and libp2p that uses CRDTs as its merge strategy. Each database type maps to a CRDT:
| ORBITDB STORE | UNDERLYING CRDT | USE CASE |
|---|---|---|
| eventlog | Append-only log (G-Set of entries) | Immutable event streams |
| feed | Like eventlog but entries can be removed | Social feeds, messaging |
| keyvalue | LWW-Register per key | Configuration, user profiles |
| docstore | LWW per document ID | Document collections |
| counter | PN-Counter | Distributed counters |
OrbitDB does not use blockchain consensus. Peers replicate data over IPFS, merge locally using CRDT rules, and eventually converge. The IPFS content-addressing (hashing) provides data integrity — similar in spirit to how Anchoring timestamps prove data existence, except OrbitDB uses it for replication rather than proof.
Local-First Software
The "local-first" movement — applications that work offline and sync when connectivity returns — is where CRDTs have gained the most traction outside of databases. Libraries like Automerge and Yjs implement CRDTs for collaborative text editing, enabling Google Docs-style real-time collaboration without a central server.
In the decentralized context, this matters because:
- Decentralized applications (dApps) cannot always rely on immediate on-chain writes due to latency and cost.
- Users may want to collaborate off-chain and settle the result on-chain periodically.
- CRDTs enable off-chain state that is guaranteed to converge when nodes reconnect, without requiring a coordination server.
State Channels and Layer 2
State channels — where parties transact off-chain and settle on-chain — face a synchronization problem: both parties maintain local state that must converge. While most state channel implementations use signed state updates with sequence numbers (effectively an LWW-Register with a monotonic counter), the underlying principle is CRDT-adjacent: local-first writes with deterministic merge.
Risks & What Can Go Wrong
Metadata Bloat
CRDTs trade coordination for metadata. The cost is not abstract:
| CRDT TYPE | METADATA PER OPERATION | GROWTH PATTERN |
|---|---|---|
| G-Counter | One entry per node | O(n) where n = number of nodes |
| PN-Counter | Two entries per node | O(2n) |
| OR-Set | One unique tag per add | Unbounded without garbage collection |
| LWW-Register | Timestamp + value | Constant (only latest retained) |
An OR-Set tracking a shopping cart across 10,000 users who have collectively added and removed items 500,000 times carries 500,000 unique tags until garbage collected. Garbage collection in CRDTs requires coordination — the very thing CRDTs exist to avoid.
Convergence Is Not Correctness
All replicas will eventually agree. But what they agree on is determined by the CRDT's merge semantics, not by application intent. An LWW-Register will silently discard a concurrent write. An OR-Set will keep elements that a user thought they deleted (if a concurrent add happened). A PN-Counter can go negative.
The merge function is mathematically correct but semantically blind. Application developers must understand what each CRDT type does on conflict and ensure it matches their domain requirements.
The Delete Problem
Deletion in CRDTs is fundamentally hard. In a distributed system with no coordination, how do you ensure all replicas agree that something was removed?
- Tombstones: Mark items as deleted but keep them in the data structure forever. Works, but storage grows without bound.
- Causal stability: An operation is causally stable when all replicas have seen it. Tombstones for stable operations can be garbage collected. But determining causal stability requires metadata exchange — partial coordination.
- Epoch-based truncation: Periodically checkpoint and discard history before the checkpoint. Requires all nodes to agree on checkpoints — again, coordination.
There is no free lunch. Every deletion strategy reintroduces some form of coordination.
Clock Synchronization
LWW-Register and LWW-Element-Set depend on timestamps. In a truly decentralized network without synchronized clocks, "last writer wins" becomes "writer with the most aggressive clock wins." Hybrid Logical Clocks (HLC) mitigate this by combining physical timestamps with logical counters, but clock-dependent CRDTs remain brittle in adversarial environments.
Not a Replacement for Consensus
CRDTs guarantee convergence — all replicas reach the same state. They do not guarantee:
- Total ordering: Two operations may be concurrent with no defined order.
- Atomicity across types: Updating a counter and a set "together" is not atomic — they are independent CRDTs that merge independently.
- Access control: Any replica can write anything. There is no built-in concept of permissions or validation.
For applications requiring strong consistency, total ordering, or access control (like financial transactions), blockchain consensus remains necessary. CRDTs are a complement, not a replacement.
Historical Context
| YEAR | EVENT |
|---|---|
| 1978 | Lamport publishes "Time, Clocks, and the Ordering of Events in a Distributed System" — foundational work on logical clocks and causal ordering |
| 1988 | Vector clocks formalized by Fidge and Mattern independently — enabling causal tracking that CRDTs later build on |
| 2006 | Amazon's Dynamo paper describes "shopping cart" merge using set union — an early practical CRDT (before the term existed) |
| 2009 | Shapiro and Preguiça begin formalizing convergent replicated data types at INRIA |
| 2011 | "Conflict-free Replicated Data Types" paper published — formally defines CvRDTs and CmRDTs with semilattice proofs |
| 2011 | Riak (Basho Technologies) ships with built-in CRDT support — first major database to adopt the formalism |
| 2014 | Automerge project begins — CRDTs for JSON-like documents and collaborative editing |
| 2015 | Redis adds CRDT support via Redis Enterprise |
| 2016 | OrbitDB launches — peer-to-peer database on IPFS using CRDTs for merge |
| 2017 | Yjs released — high-performance CRDT framework for real-time collaborative editing |
| 2018 | Apple adopts CRDTs for Notes app synchronization across devices |
| 2019 | Delta-state CRDTs formalized — reducing bandwidth by transmitting only state changes |
| 2023 | Automerge 2.0 released with 100x performance improvement via Rust core |
References
- Conflict-free Replicated Data Types - Shapiro, Preguiça, Baquero, Zawirski, INRIA (2011)
- A Comprehensive Study of Convergent and Commutative Replicated Data Types - Shapiro et al., INRIA Technical Report (2011)
- CRDTs: The Hard Parts - Martin Kleppmann, Strange Loop (2020)
- Automerge - CRDT implementation for collaborative applications
- Yjs - High-performance CRDT framework for shared editing
- OrbitDB - Peer-to-peer database on IPFS using CRDTs
- Dynamo: Amazon's Highly Available Key-value Store - DeCandia et al., SOSP (2007)
- Time, Clocks, and the Ordering of Events in a Distributed System - Leslie Lamport (1978)
- Delta State Replicated Data Types - Almeida, Shoker, Baquero (2016)
- Local-first Software - Ink & Switch (2019)
Changelog
| DATE | AUTHOR | NOTES |
|---|---|---|
| 2026-04-04 | Artificial. | Generated by robots. Gas: 85 tok |
| 2026-04-04 | Denizen. | Reviewed, edited, and curated by humans. |