Collaborative State Sync — PE Trade-Off Analysis

Five approaches to convergence under concurrent edits: CRDTs, OT, Differential Sync, Event Sourcing, and Three-Way Merge.

Distributed State Convergence

As of 2026-06-01

PE Verdict

There is no winner across the board. CRDTs win local-first and offline-first because they don't need a server to converge; OT wins server-mediated rich text because it produces tighter intent preservation than CRDTs at the cost of central authority; LWW (a degenerate CRDT) wins property-level schemas like Figma's; Event Sourcing wins audit/replay/time-travel because state is derived; Three-Way Merge wins async branching because it surfaces conflicts to humans instead of resolving them silently. The PE skill is naming which property of the workload selects the algorithm before you write a line of code, because retrofitting is a 6-month rewrite, not a refactor.

Best default choices

CRDTs	Offline-first, local-first, peer-to-peer, multi-master collaborative state
OT	Server-mediated rich text with tight intent preservation and central ordering
Differential Sync	Fuzzy diff/patch sync when preventing divergence is less important than repairing it
Event Sourcing	Audit, replay, time travel, CQRS, and workflows where state is a projection
3-Way Merge	Async branching, code/docs workflows, and human-visible conflict resolution

CRDT: Conflict-free Replicated Data Type. State-based (CvRDT) merges via idempotent join; op-based (CmRDT) propagates commutative operations. Strong eventual consistency without coordination.
OT: Operational Transformation. Concurrent operations transformed against each other so they produce the same final state regardless of arrival order. Requires central server in practice (TP2 is hard).
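Differential Sync: Neil Fraser's algorithm: each peer keeps a shadow copy, diffs local edits against the shadow, and sends the diff; the receiver applies it with fuzzy patching. Implemented in Fraser's MobWrite; the original Google Docs used a similar diff-and-merge approach before moving to OT.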
Event Sourcing: Append-only log of immutable events; current state is a fold over the log. Conflict resolution is a projection/application concern, not a storage concern.
Three-Way Merge: Git-style merge using common ancestor: changes from base→A and base→B are combined; structural overlap surfaces as conflict for human resolution.
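
The CvRDT definition above ("merges via idempotent join") can be made concrete with the simplest state-based CRDT, a grow-only counter. This is a minimal sketch, not any library's implementation: the class name `GCounter` and the replica ids `"a"` and `"b"` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, replica_id: str, amount: int = 1) -> None:
        # A replica only ever bumps its own slot.
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        # Join = per-replica max, which is commutative, associative, and idempotent.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys})

    @property
    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter(), GCounter()
a.increment("a", 2)   # edits made on replica a
b.increment("b", 3)   # concurrent edits made on replica b
ab, ba = a.merge(b), b.merge(a)   # merge order does not matter
```

Because the join is a per-slot max, merging in either order, or merging the same state twice, yields the same counter. That is the property that lets replicas converge without a coordinator.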

1. Trade-Offs

One table per technology. Each row is a real "give up X to get Y" decision.

CRDTs

Use when offline edits and multi-master convergence matter more than strict global invariants or minimal metadata.

Math-first: convergence is a property of the data type, not the network.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Commutative ops for no central authority	Peers converge without a coordinator. Works offline, P2P, multi-master across regions.	Every op carries causal metadata (vector clock, Lamport ID, or position identifier). Payload bloats.	A 50KB doc with 100K edits balloons to MB of CRDT state. Yjs is the lightest, still pays this.	The metadata-to-content ratio is the dirty secret. Yjs compresses it brilliantly via varint encoding; Automerge 1.x didn't and got destroyed in benchmarks. Always benchmark with realistic edit history, not initial state.
Tombstones for delete safety	A concurrent insert "before" a deleted element resolves cleanly. Convergence guaranteed.	Deleted content stays in the data structure forever (or until GC with quiescence guarantees).	User pastes and deletes a 10MB block; the doc never shrinks back. Worst with paste-heavy workloads.	GC requires global quiescence ("all peers have seen the delete"), which is impossible to prove in true P2P. Server-mediated CRDTs cheat by treating server as oracle. Yjs uses "snapshots" + structural sharing to mitigate, not eliminate.
Position identifiers over indices for text	No transformation needed when inserting concurrently. Insert at position(A,B) survives any reorder.	Identifiers grow with edit count: roughly logarithmically (LSEQ) or linearly in the worst case (Logoot). RGA-style IDs stay fixed-size but leave tombstones behind.	Long-lived documents accumulate identifier depth, slowing local operations from O(log n) toward O(n).	YATA (Yjs) and Fugue (Loro) are the modern answer to this. RGA-style algorithms had a "stuck at the same position" interleaving bug that Fugue formally fixes. If you cite Yjs in an interview, know YATA.
Strong eventual consistency (SEC)	If two replicas have seen the same set of ops, their state is identical. No coordination required.	No linearizability, no transactions across keys, no "read your write" without local-only reads.	Counter-style invariants ("balance >= 0") cannot be enforced. You can converge to negative balance.	SEC is weaker than people think. Real systems pair CRDTs with a coordinator for invariant-critical ops, defeating the original purpose. Honest answer: CRDTs solve UI state, not money.
Per-data-type algorithm	Set, counter, map, register, sequence, tree each have specialized provably-correct algorithms.	Mixing types is hard. A "map of lists" is two CRDTs composed, and composition often loses guarantees.	You add a "move list item" feature and discover none of the existing tree CRDTs handle it without cycles.	The "moveable tree" problem was unsolved until Kleppmann's 2021 paper. Loro implements it; Automerge does too. If your app has hierarchical drag-and-drop, this is decisive.
Op-based delivery requires reliable causal broadcast	Smaller payloads than state-based. Diff is just the new op.	Must guarantee at-least-once + causal order. Lost op = divergence forever.	WebRTC datachannel drops a packet; without app-level resend, replicas silently diverge.	In practice, "op-based" CRDTs in production (Yjs, Automerge) are actually delta-state hybrids — they recover from missing ops by exchanging state vectors. Pure op-based is a textbook fiction.
No central server	P2P sync via WebRTC, BLE, sneakernet. Offline-first becomes natural.	Access control, persistence, presence, undo all become application-layer problems.	"Undo my last edit" is local-only by default; collaborative undo is a research problem.	Yjs has y-protocols for awareness/presence as separate concerns. Automerge punts entirely. Local-first manifesto is honest about this: convergence is solved; everything else around it is still hard.
Memory-resident operation log	Snappy local edits, time-travel, undo all become free or near-free.	Memory grows with edit count. Lazy loading is hard without breaking causal references.	A 5-year-old Notion doc, opened on mobile, OOMs the WebView.	Yjs sub-documents are the production answer. Treat the doc as a forest of small CRDTs and lazy-load. Anyone building a multi-MB CRDT and not using sub-docs has not hit prod yet.
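
The SEC row above says an invariant like "balance >= 0" cannot be enforced without coordination. The sketch below shows why, using a PN-Counter (a standard counter CRDT) and a hypothetical `withdraw` helper that can only check the balance a replica sees locally. All names and amounts are illustrative assumptions.

```python
class PNCounter:
    """Counter CRDT: per-replica increment and decrement totals, merged by max."""

    def __init__(self) -> None:
        self.inc: dict[str, int] = {}
        self.dec: dict[str, int] = {}

    def add(self, replica: str, amount: int) -> None:
        self.inc[replica] = self.inc.get(replica, 0) + amount

    def sub(self, replica: str, amount: int) -> None:
        self.dec[replica] = self.dec.get(replica, 0) + amount

    def value(self) -> int:
        return sum(self.inc.values()) - sum(self.dec.values())

    def merge(self, other: "PNCounter") -> None:
        # Per-slot max on both maps, applied in place.
        for mine, theirs in ((self.inc, other.inc), (self.dec, other.dec)):
            for replica, total in theirs.items():
                mine[replica] = max(mine.get(replica, 0), total)


def withdraw(account: PNCounter, replica: str, amount: int) -> bool:
    """Local-only invariant check: all a replica can do without coordination."""
    if account.value() < amount:
        return False
    account.sub(replica, amount)
    return True


a, b = PNCounter(), PNCounter()
a.add("a", 100)
b.merge(a)                      # both replicas now see a balance of 100

ok_a = withdraw(a, "a", 80)     # passes a's local check
ok_b = withdraw(b, "b", 70)     # passes b's local check while partitioned

a.merge(b)
b.merge(a)                      # replicas converge, but to a negative balance
```

Both withdrawals are valid against the state each replica could see, and the merged state is identical on both sides. Convergence held; the invariant did not.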

Operational Transformation (OT)

Best when a central server can serialize document operations and preserving editor intent beats offline-first autonomy.

Server-mediated transformation: ops change shape as they cross other ops.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Transformation over commutativity	Tight intent preservation. Concurrent "insert at 5" and "insert at 5" produce 2 inserts at distinct positions, not one collapsed.	Each new operation type requires N transform functions against every other op type. Quadratic complexity.	Adding a "table" op to a text editor doubles the transform matrix and breaks proofs of existing pairs.	The TP2 property (convergence under arbitrary concurrent ops) is famously broken in most "OT" implementations. Google's Wave OT had a real TP2 bug for years. CRDTs avoid this entirely.
Central server for serialization	Linear op log, easy persistence, easy access control, easy presence, easy invariants.	Single point of failure for the doc. Offline is hard. Multi-region writes require careful design.	Server goes down during a meeting; nobody can collaborate even if they're on the same LAN.	"OT requires a central server" is an oversimplification: peer-to-peer OT algorithms such as adOPTed exist, but they need transforms that satisfy TP2 and the complexity is brutal. Jupiter-style star topologies sidestep TP2 by letting the server order ops, which is why production OT is always star-topology. Google Docs proved this.
Linear history for free	Every op has a server-assigned sequence number. Audit, replay, and undo/redo are all trivial.	Latency of every op = round trip to server. Optimistic local apply + reconciliation papers over this.	High RTT clients (300ms+) see jitter as the server rewrites their local state on every reconcile.	Google Docs hides this with an aggressive optimistic UI and a reconciliation queue. The hidden cost: client-side state machine for "in-flight ops" that has to be perfect, or characters duplicate / vanish.
Smaller wire payloads than CRDT state	An "insert 'a' at position 5" is ~10 bytes. No vector clock, no position ID.	Server-side history grows linearly with ops. Compaction requires care.	A 10-year-old Google Doc with millions of ops needs server-side snapshots to load fast.	OT wire format is the lightest of all approaches by far. This is why Google Docs scales to massive docs without client-side OOM. CRDTs can't match this without giving up offline.
Operation-typed schema	Each op has explicit semantics (insert, delete, retain, format). Schema migrations are op-version changes.	Evolving the op set is invasive. Old clients can't apply new ops; new clients can't transform against old ops without history.	Releasing a new "comment" op type and finding clients pinned to old versions in mobile app stores.	Forward-compatibility is the silent killer of OT systems. CRDT systems treat unknown ops as opaque blobs and forward; OT must transform, so unknown ops are fatal.
Inversion is straightforward	Each op has an inverse (insert ↔ delete with same position). Undo is just apply the inverse.	Collaborative undo requires transforming the inverse through subsequent ops, which can change meaning.	User edits, peer edits in the same region, user undoes — undo now deletes the peer's content.	Selective undo (undo my own op but not others) is the actual hard problem. Google Docs solved this; most OT implementations don't, and users notice immediately.
Proven for text and rich text	35+ years of research (since 1989), multiple production systems, well-understood for sequence types.	Extending to JSON, trees, sets, counters is mostly DIY and rarely as elegant as CRDT versions.	You start with text OT and add a "comments thread" — now you're inventing OT primitives for tree-of-text.	ShareDB's json0 is the canonical OT-for-JSON, and even its maintainers acknowledge it's complex. If your doc is anything but rich text, OT pays a heavy tax.
Tightly server-coupled persistence	Persist the op log; replay rebuilds any state. Snapshots are an optimization, not a requirement.	Storage scales with edit volume, not document size. Hot docs in viral threads become storage hot spots.	A doc with 1M edits costs more to load than a 100MB blob even though the rendered text is 50KB.	Google Docs runs periodic snapshot + truncate. This is operational, not algorithmic. Production OT is always "OT + snapshot store", a fact most OT papers omit.
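
The first row of the table above describes two concurrent "insert at 5" operations ending up at distinct positions. A minimal sketch of the transform that achieves this for insert-vs-insert is shown below. It covers a single op pair only; the `site` tie-breaker and all names are assumptions for illustration, not any production system's op format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str  # assumed tie-breaker for inserts at the same position


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    shifts = against.pos < op.pos or (against.pos == op.pos and against.site < op.site)
    if shifts:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


doc = "hello world"
a = Insert(5, ",", "alice")
b = Insert(5, "!", "bob")

# Each site applies its own op first, then the transformed remote op.
site_a = apply(apply(doc, a), transform(b, a))
site_b = apply(apply(doc, b), transform(a, b))
```

Both sites end with the same text and both inserts survive. Every additional op type (delete, format, and so on) needs its own transform against every existing type, which is the quadratic cost the table refers to.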

Differential Sync

Choose when sync intervals and fuzzy patches are acceptable, and repairing drift is simpler than modeling every operation.

Diff/patch over fuzzy matches: collapses divergence, doesn't prevent it.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Stateless diff over op streams	No causal metadata. Each sync round is a plain text diff + patch.	No intent preservation. Concurrent inserts at the same offset merge into garbage.	Two users typing different sentences at the cursor position simultaneously; result is character-level interleaved nonsense.	This is why Google Docs moved from diff-based merging to OT, the model Google Wave used from the start. Diff-sync is "best effort convergence", which is fine for prose drafts and a disaster for code.
Shadow copy per peer	Diffs are computed locally against a known baseline. No global op log needed.	2x memory per peer (live + shadow). Hub model means O(N) shadows on server for N peers.	A 100-user collab session needs 100 shadows on the hub; large docs OOM the server.	Fraser's original paper acknowledges this; production deployments use shadow eviction with re-sync on reconnect. Nobody runs raw diff-sync at scale anymore.
Fuzzy patch application	Patches apply even when context has shifted. Robust to mild divergence.	"Best effort" matching can place edits in semantically wrong locations. Silent corruption.	You meant to fix typo on line 5, your patch matches similar text on line 47.	diff-match-patch's Bitap algorithm is genuinely brilliant for the use case (CodeMirror still uses it), but it's not a sync primitive. Reframe it as a fallback merge tool, not a sync engine.
Symmetric protocol	Same algorithm on client and server. Easy to implement, easy to reason about.	No notion of authoritative state. Conflicts resolve by "whoever syncs last wins per region".	Mobile client offline for 2 days, syncs back, sees 50 conflict edits and silently picks one branch.	The simplicity is seductive but deceptive. Symmetric protocols hide which side is authoritative, which is fine until you need to debug a divergence in prod and have no causal record.
Patch as transport-friendly format	Diffs compress well. Easy to send over any transport, including non-realtime (email, polling).	Larger than op-based for fine-grained edits. A single-char insert ships ~50 bytes of patch context.	High-frequency keystroke sync becomes bandwidth-heavy compared to OT or Yjs ops.	Diff-sync was designed for periodic sync (every 1-2s), not per-keystroke. Forcing it into realtime is a category error people make trying to extend it.
Best-effort convergence, not guaranteed	In the common case (low concurrency, similar baselines), converges fast with minimal state.	No formal guarantee. Pathological histories diverge silently or produce nonsense.	Your QA scripts can't reproduce a divergence bug because it depends on packet timing.	CRDTs and OT have formal proofs of convergence. Diff-sync has neither. For docs where corruption is annoying (notes), fine. For docs where corruption costs money (contracts), no.
Domain-agnostic via text representation	Works on anything serializable to text. JSON, code, prose all share the same engine.	Structural changes (move JSON key, reorder list) become character-level diffs and merge poorly.	Two users reorder list items differently; merged result has duplicated and reordered items.	This is why structural CRDTs (Automerge, Yjs Map) exist. Treating structured data as text is the path to corrupted JSON in your DB. Use diff-sync only on opaque text.
Convergence requires bidirectional flow	Both sides exchange diffs each cycle. Symmetric and simple.	Long offline periods break the shadow invariant; reconnection requires expensive full-doc reconciliation.	Field worker offline for a week, server has evicted shadow, reconnect downloads full doc + 3-way merge.	This is the unsolvable failure mode. CRDTs handle long offline natively because they don't depend on a shared baseline. Diff-sync is fundamentally an online-leaning algorithm.
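
The shadow-copy cycle described in the table above can be sketched in a few dozen lines. This is a simplified illustration, not Fraser's reference implementation: it uses Python's `difflib` for the diff, and a crude context-search `apply_patch` stands in for real fuzzy patching. The `CONTEXT` size, the dict layout, and all function names are assumptions.

```python
import difflib

CONTEXT = 8  # characters of leading context stored with each edit (arbitrary choice)


def make_patch(old: str, new: str) -> list[dict]:
    """Diff `old` -> `new` into edits that carry a little leading context."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    patch = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            patch.append({"pos": i1, "before": old[max(0, i1 - CONTEXT):i1],
                          "delete": old[i1:i2], "insert": new[j1:j2]})
    return patch


def apply_patch(text: str, patch: list[dict]) -> str:
    """Best-effort apply: relocate each edit by its context, drop it if no match."""
    shift = 0
    for edit in patch:
        needle = edit["before"] + edit["delete"]
        expected = edit["pos"] - len(edit["before"]) + shift
        hits = [i for i in range(len(text) - len(needle) + 1) if text.startswith(needle, i)]
        if not hits:
            continue  # silently dropped: this is the "best effort" part
        at = min(hits, key=lambda i: abs(i - expected)) + len(edit["before"])
        text = text[:at] + edit["insert"] + text[at + len(edit["delete"]):]
        shift += len(edit["insert"]) - len(edit["delete"])
    return text


def sync(sender: dict, receiver: dict) -> None:
    """One half-cycle: sender diffs against its shadow, receiver patches shadow and text."""
    patch = make_patch(sender["shadow"], sender["text"])
    sender["shadow"] = sender["text"]
    receiver["shadow"] = apply_patch(receiver["shadow"], patch)  # shadows match, so exact
    receiver["text"] = apply_patch(receiver["text"], patch)      # live text may have drifted


base = "The cat sat."
client = {"text": "The black cat sat.", "shadow": base}
server = {"text": "The cat sat down.", "shadow": base}
sync(client, server)   # client -> server
sync(server, client)   # server -> client
```

With non-overlapping edits the two sides converge after one full cycle. Note the `continue` branch and the nearest-match relocation: those are exactly where an edit can vanish or land in the wrong place without any error being raised.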

Event Sourcing

A strong fit for audit/replay/time-travel domains where writes are immutable facts and projections own conflict behavior.

State is a fold. Conflict resolution lives in the projection.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Append-only log over mutable state	Full audit trail, time travel, replayable projections, natural fit for Kafka/Kinesis/EventStore.	Read path requires fold or materialized projection. Naive reads are O(events).	A 5-year-old aggregate with 200K events takes 30s to rehydrate on first read.	Snapshots + rolling projections are mandatory in prod, but they're an operational concern (when to snapshot, how to invalidate). Every event-sourced system reinvents this and most do it wrong the first time.
Schema evolution via event versioning	Old events stay immutable; upcasters bring them forward. Strong forward compatibility.	Upcasters accumulate. A 5-year system has chains of v1→v2→v3→v4 transforms for every event.	You change a field name in v7, and v1 events have to upcast through six versions to render.	Greg Young's rule: "events are immutable, the schema is forever". Most teams underestimate. Define a deprecation policy and a "weaver" pattern before you have 50 event types, not after.
Per-aggregate consistency boundary	Strong consistency within an aggregate via single-writer pattern. Easy to reason about.	Cross-aggregate transactions require sagas, process managers, or eventually consistent projections.	You discover "user moves item from cart to wishlist" spans two aggregates and needs saga, mid-sprint.	Aggregate boundaries are the most important design decision in event sourcing and the one that gets revisited 3 years in when invariants don't match. DDD-driven design is non-negotiable.
Conflict resolution at projection time	Different projections can resolve the same events differently. "Last-write-wins view" vs "history view".	No single source of truth for "current state". Projections drift, debugging requires log replay.	Two services have different projections of the same stream and start producing contradictory outputs to downstream.	This is genuinely powerful and genuinely dangerous. Production discipline: one canonical projection per consumer, with idempotency and exactly-once consumption (Kafka EOS or de-dup keys).
Idempotent event handlers	Replays are safe. Catch-up consumers reach the same state as live ones.	Handlers must be carefully designed. Side effects (emails, payments) require outbox or de-dup keys.	"Replay this projection from event 0" sends 500K welcome emails because the email handler isn't idempotent.	The outbox pattern is the standard answer for "side effects in event handlers". If you're event-sourcing without outbox + de-dup, you're one bad day away from a customer-facing incident.
Concurrency via optimistic versioning	Expected version on append rejects stale writes. Mathematically clean.	High-contention aggregates retry constantly. UX must handle "your version is stale, retry" without confusing users.	Black Friday: 1000 concurrent writes to a single inventory aggregate, 95% retry rate, latency spikes.	The mitigation is splitting the aggregate (per-SKU, per-warehouse) or moving to event-driven CQRS where commands are queued. Both are real architectural changes, not config.
Event log as integration contract	External consumers tail the log. Decoupled, scalable, ordering preserved.	Event schemas leak internal model to consumers. Refactoring is breaking the API.	You rename an internal aggregate and discover three external teams have consumers reading the old event names.	Public events ≠ internal events. Use the "event translator" pattern (Eric Evans, anticorruption layer) and never let internal events out to consumers. CDC tools (Debezium) make this mistake easy.
Storage cost over compute cost	Cheap to write (sequential append), parallel-readable. Object storage friendly.	Storage grows monotonically. Compaction is non-trivial because events are the source of truth.	3 years in, you have 10TB of events and reads through the projection are slow on cold storage.	Kafka's tiered storage and EventStoreDB's scavenging address this, but each has gotchas. The PE move is to architect retention from day one: hot/warm/cold, what's compactable, what's regulatory-locked.
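
Two rows of the table above, "state is a fold over the log" and "expected version on append rejects stale writes", fit in one small sketch. The in-memory `EventStore`, the `StaleVersion` exception, and the stock events are hypothetical names chosen for illustration, not the API of any real event store.

```python
from functools import reduce


class StaleVersion(Exception):
    """Raised when the caller's expected version is behind the stream."""


class EventStore:
    def __init__(self) -> None:
        self._streams: dict[str, list[dict]] = {}

    def append(self, stream: str, expected_version: int, event: dict) -> int:
        events = self._streams.setdefault(stream, [])
        if len(events) != expected_version:
            # Someone else appended first: reject instead of overwriting.
            raise StaleVersion(f"expected {expected_version}, stream is at {len(events)}")
        events.append(event)
        return len(events)  # new version

    def read(self, stream: str) -> list[dict]:
        return list(self._streams.get(stream, []))


def apply(stock: int, event: dict) -> int:
    """Projection step: current state is a left fold of this function over the log."""
    if event["type"] == "StockAdded":
        return stock + event["qty"]
    if event["type"] == "StockReserved":
        return stock - event["qty"]
    return stock  # unknown event types are ignored by this projection


store = EventStore()
v = store.append("sku-1", 0, {"type": "StockAdded", "qty": 10})
v = store.append("sku-1", v, {"type": "StockReserved", "qty": 3})
stock = reduce(apply, store.read("sku-1"), 0)
```

A writer that still holds version 1 gets `StaleVersion` and has to re-read and retry. Under heavy contention on one stream, that retry loop is the latency spike the table describes.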

Three-Way Merge

Default for async branching when surfacing conflicts to humans is safer than silently inventing an automatic merge.

Common ancestor anchor: surface conflicts to humans, don't auto-resolve.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Common-ancestor anchoring	Distinguishes "A changed, B didn't" from "both changed". No false conflicts.	Requires storing or computing the merge base. Histories must be navigable (DAG).	Diverging branches with no common ancestor (rebased out, cherry-picked) lose merge fidelity.	This is the conceptual leap over diff-sync: knowing the base reveals intent. Git's recursive merge handles multiple bases. Without a base, you're back to diff-sync's guessing.
Human conflict resolution over auto-merge	Semantic conflicts surface for human judgment. No silent corruption.	Real-time UX is impossible. Workflow requires explicit merge step.	Anything that needs sub-second collab (Figma, Docs) is the wrong fit.	3-way merge is async by design. Forcing it sync (continuous merge bots) defeats the whole point. Match the algorithm to the workflow: code review = async = 3-way merge.
Line-based or hunk-based diffing	Fast, robust, language-agnostic. Decades of tooling.	Whitespace, formatting, and structural changes generate spurious conflicts.	One side reformats with prettier, the other adds a function; 100% conflict on every line touched.	Structural (AST-aware) merge tools (mergiraf, semantic-merge) solve this but break on minor syntax variation. Most teams default to text-merge + lint-on-merge, which is a sane compromise.
DAG-based history model	Branches and merges are first-class. Bisect, blame, revert all work.	Storage and reasoning complexity. Beginners struggle with rebase vs merge.	Onboarding cost. New engineers cause incidents with bad force-pushes.	Git's UI is famously bad for the DAG underneath. Tools like Graphite, Jujutsu, Sapling are attempts to fix this without changing the model. The model itself is correct.
Two-sided commutativity, not three-sided	A merged with B = B merged with A in most cases. Predictable.	When more than two branches converge (octopus merge), behavior is undefined for conflicts.	Octopus merge with conflicts: Git refuses, you serialize the merges, history becomes unintuitive.	3-way merge is fundamentally a pairwise op. Real "N-way merge" is a research topic (Darcs patch theory, Pijul). For most teams, sequential pairwise merges work fine and are easier to debug.
Reversibility via revert commits	A change can be undone safely as a new event in history. Full audit.	Revert ≠ rollback. History grows; revert of revert is fragile.	You revert a problematic deploy, then realize part of it was correct; cherry-picking back is messy.	git revert is fine for forward-rolling fixes. For "undo everything since X", you want force-push (lossy) or a new branch (creates parallel history). Choose explicitly, not by accident.
Content-addressable storage	Deduplication via SHA. Tree object reuse across commits. Efficient at scale.	Storage growth is bounded by content, but metadata (commits, refs) still grows with edit count.	Monorepos with 10M files and 1M commits stress even Git's CAS; LFS / partial clone are mandatory.	Git scales surprisingly well, but only with Scalar/partial-clone/sparse-checkout. Anyone with a monorepo above ~100GB is operating Git as a database, not a VCS. Meta moved to Mercurial (and later Sapling) and Google runs a custom system (Piper) for this reason.
Explicit branching workflow	Reviewable units of change. PR culture. Forces social negotiation of conflicts.	Friction. Branch lifetime > 1 week = pain. Continuous integration is a discipline, not a tool.	Long-lived feature branches → 3000-line merge with 200 conflicts → review skipped, bugs land.	Trunk-based development is the answer. 3-way merge works best when used least. The PE move is to design the workflow so 3-way merge rarely needs to resolve real conflict.
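
The common-ancestor rule from the first row of the table above ("A changed, B didn't" versus "both changed") is easiest to see on keyed data. The sketch below merges flat dicts rather than line hunks, which is a deliberate simplification of what Git does; `three_way_merge` and the sample config keys are hypothetical.

```python
MISSING = object()  # sentinel: the key does not exist on that side


def three_way_merge(base: dict, ours: dict, theirs: dict) -> tuple[dict, dict]:
    """Return (merged, conflicts) for flat dicts; a missing key means "deleted"."""
    merged, conflicts = {}, {}
    for key in base.keys() | ours.keys() | theirs.keys():
        b = base.get(key, MISSING)
        o = ours.get(key, MISSING)
        t = theirs.get(key, MISSING)
        if o == t:        # both sides agree (including both deleting)
            result = o
        elif o == b:      # only theirs changed
            result = t
        elif t == b:      # only ours changed
            result = o
        else:             # both changed, differently: surface it, do not guess
            conflicts[key] = (o, t)
            continue
        if result is not MISSING:
            merged[key] = result
    return merged, conflicts


base = {"replicas": 2, "image": "app:1.0", "debug": False}
ours = {"replicas": 3, "image": "app:1.1", "debug": False}
theirs = {"replicas": 2, "image": "app:1.2"}   # bumped image, deleted "debug"

merged, conflicts = three_way_merge(base, ours, theirs)
```

Without `base`, the `replicas` and `debug` keys would look like conflicts too. With it, only `image` (changed differently on both sides) is handed to a human.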

2. Use Cases

Real workloads where each technology was chosen, why it won, and what ruled out the obvious alternative.

CRDTs

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Local-first note-taking with full offline sync	Anytype, Obsidian Sync, Linear (CRDT-inspired sync engine)	Zero-coordination offline edits that converge on reconnect, with multi-day offline tolerance	100K+ users, multi-device per user, days of offline edits	OT requires a server in the merge path; you can't make progress offline without one
Real-time collaborative rich text in editors	Yjs in Notion AI Blocks, Discourse, JupyterLab, ProseMirror integrations	Sub-100ms keystroke convergence across >50 concurrent editors per doc with no central authority required	Yjs handles docs with millions of ops, 100s of concurrent peers via y-websocket	OT's central server becomes the bottleneck at 100+ peers per doc; CRDT scales via gossip
Design tool object-level state sync	Figma (LWW-Register style, property-level)	Atomic property updates where conflicts on different fields of the same object don't fight	Millions of objects per file, 100+ concurrent collaborators on flagship files	Character-level CRDTs are overkill for "set fill color"; text CRDTs would balloon doc size 10x
P2P sync without a backend	Local-first apps using Automerge over WebRTC, Holepunch, BLE mesh apps	True P2P convergence with no infrastructure: no server, no relay, no cost-per-user backend	Small teams <20 peers, mesh latency <500ms, doc sizes <10MB	OT and diff-sync both require trusted central state; CRDT is the only mathematically-clean option
Multi-region active-active datastore conflict resolution	Riak (LWW + Riak DT), Redis CRDTs (Active-Active in Redis Enterprise), Cosmos DB multi-master	Cross-region writes that converge without coordination round-trips at the storage layer	Geo-distributed writes at 1M+ ops/sec across 3+ regions	Paxos/Raft cross-region adds 100ms+ latency; CRDTs converge async with no consensus call
Offline-tolerant field operations tools	Survey apps, field service apps, healthcare EMR with intermittent connectivity	Per-device durability with merge on reconnect, weeks of disconnection tolerated	10K-100K field devices, 100+ MB local state, weekly sync windows	Diff-sync's shadow eviction breaks at multi-day offline; CRDT's metadata is the durable price paid for this
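
The design-tool row above relies on property-level last-writer-wins, where edits to different fields of one object never conflict. A minimal sketch of that idea follows. It is not Figma's implementation: the `LWWObject` class, the use of logical timestamps, and the replica-id tie-break are assumptions.

```python
class LWWObject:
    """One last-writer-wins register per property, ordered by (timestamp, replica_id)."""

    def __init__(self) -> None:
        self.props: dict[str, tuple] = {}  # name -> (timestamp, replica_id, value)

    def set(self, name: str, value, timestamp: int, replica_id: str) -> None:
        self._keep_newest(name, (timestamp, replica_id, value))

    def merge(self, other: "LWWObject") -> None:
        for name, entry in other.props.items():
            self._keep_newest(name, entry)

    def _keep_newest(self, name: str, entry: tuple) -> None:
        current = self.props.get(name)
        # Compare (timestamp, replica_id) only; the replica id breaks timestamp ties.
        if current is None or entry[:2] > current[:2]:
            self.props[name] = entry

    def snapshot(self) -> dict:
        return {name: value for name, (_, _, value) in self.props.items()}


a, b = LWWObject(), LWWObject()
a.set("fill", "red", timestamp=1, replica_id="a")
b.set("width", 200, timestamp=1, replica_id="b")   # different property: no conflict
b.set("fill", "blue", timestamp=2, replica_id="b")  # same property: later write wins

a.merge(b)
b.merge(a)
```

The losing write to `fill` is discarded silently, which is acceptable for "set fill color" and unacceptable for text. That is why this approach is paired with property-level schemas rather than documents.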

Operational Transformation (OT)

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Cloud-first collaborative text editing	Google Docs, Etherpad, ShareDB-based apps	Tight intent preservation for character-level concurrent edits with minimal wire payload	Docs with 10M+ ops; Google documents a limit of 100 simultaneous editors per file	CRDT metadata per character would bloat docs that already strain storage; OT op log is leaner
Centralized collaborative code editing	Replit Multiplayer, CodeMirror collab via ShareDB, Codeshare	Server-authoritative state for integration with run/build pipelines that need a canonical version	1K+ rooms, 10s of editors per room, sub-200ms keystroke latency	CRDTs work but adding compile/run integration is easier with a server-side authoritative state already
Spreadsheet collaboration with formula recalc	Google Sheets (OT-based for cell content)	Server-authoritative recalc on dependent cells; deterministic computation order	10K cells per sheet, dozens of concurrent editors, formula chains of 1K+ deps	CRDT for cell values is fine, but formula evaluation order needs serialization; OT gives you that for free
Real-time presentation / whiteboard editing (text-heavy)	Google Slides, Etherpad whiteboards with text-first model	Intent preservation on text within shapes, server-mediated sync for <100 collaborators	100s of slides per deck, 50 concurrent editors, mixed text/object operations	For pure text fields, OT's tighter intent preservation beats LWW + character CRDT hybrid
Enterprise wiki / structured doc collab	Atlassian Confluence (Synchrony, OT-based)	Server-authoritative version with audit trail and compliance hooks; central locking when needed	Enterprise scale: 100K+ docs, regulatory audit requirements	CRDTs don't naturally support enterprise lock-down (force checkout, supervised edits); OT's centralization fits
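
Every use case in the table above leans on a server-authoritative op order. The sketch below shows the server side of that arrangement for insert-only ops: the client sends the revision its op was based on, and the server transforms the op against everything committed since. The `Server` class and revision numbering are illustrative assumptions; a real client also needs its own state machine for in-flight ops, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client: str  # assumed tie-breaker for inserts at the same position


def transform(op: Insert, applied: Insert) -> Insert:
    """Shift `op` right if `applied` landed before it (or wins the tie)."""
    if applied.pos < op.pos or (applied.pos == op.pos and applied.client < op.client):
        return Insert(op.pos + len(applied.text), op.text, op.client)
    return op


class Server:
    def __init__(self, doc: str = "") -> None:
        self.doc = doc
        self.log: list[Insert] = []  # committed ops; revision == len(log)

    def submit(self, op: Insert, base_revision: int) -> tuple[Insert, int]:
        # Transform against every committed op the client had not seen yet.
        for applied in self.log[base_revision:]:
            op = transform(op, applied)
        self.doc = self.doc[:op.pos] + op.text + self.doc[op.pos:]
        self.log.append(op)
        return op, len(self.log)


server = Server("abc")
_, rev_alice = server.submit(Insert(1, "X", "alice"), base_revision=0)
committed_bob, rev_bob = server.submit(Insert(2, "Y", "bob"), base_revision=0)

# The committed log is a linear history: replaying it rebuilds the document.
replayed = "abc"
for op in server.log:
    replayed = replayed[:op.pos] + op.text + replayed[op.pos:]
```

Bob meant "between b and c"; after transformation his op lands at position 3 and still does. The linear log is what makes audit, replay, and snapshotting straightforward in this model.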

Differential Sync

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Early Google Docs	Early Docs editor, before its move to OT	Simple stateless sync over high-latency connections; minimal client implementation	Used briefly at Google scale before migration to OT for correctness reasons	OT existed but felt heavyweight in 2006; diff-sync was a deliberate "ship now, fix later" choice
Periodic sync for collaborative draft documents	Lightweight CMSs, draft markdown sync in static site generators	1-5 second periodic reconciliation tolerable; per-keystroke not needed	10-100 users per doc, sync intervals 1-10s	OT is overkill for periodic sync; CRDT metadata is wasteful when no offline needed
Fuzzy patch application in code review tools	diff-match-patch in CodeMirror, Phabricator, suggested-edit features	Patches must apply against shifted context (lines moved since suggestion was made)	10s-100s of suggestions per review, source files of 1K-10K lines	This is diff-sync repurposed correctly: as a one-shot merge tool, not a sync engine
Browser-extension form sync across devices	Form-filler extensions, password manager autofill state sync	Low-frequency, last-edit-wins semantics with stale-baseline tolerance	10s of devices per user, sync intervals minutes, doc sizes <1MB	CRDTs are overkill for "sync this form occasionally"; diff-sync's simplicity wins
Markdown blog post drafts across web + mobile	Indie blogging tools, draft sync in note apps without true real-time collab	Single-user multi-device, "last device wins" with conflict surfacing on collision	1 user, 2-5 devices, sub-100KB documents	Single-user case doesn't need OT or CRDT; diff-sync ships in 200 lines of code

Event Sourcing

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Financial ledgers with regulatory audit	Banks, fintech (Plaid-style transaction logs), trading systems	Immutable event log is the regulatory artifact; state is derived	Billions of events, multi-year retention, ms-grade replay for audits	CRUD destroys audit trail; CRDT's tombstones violate "no edits to past" regulatory requirements
Inventory and order systems with eventual consistency	Amazon retail, Shopify, large e-commerce platforms	Decoupled aggregates (cart, inventory, payment) with async projections to fulfillment, search, recommendations	1M+ orders/day, 100s of downstream projections, multi-region replication	Synchronous transactional CRUD across these domains creates a distributed monolith; event-driven decouples them
IoT / telemetry pipelines with replayable analytics	Connected vehicle data, industrial sensor fleets, observability backends	High-write throughput append-only, with downstream consumers tailing different projections	10M+ events/sec, multi-day retention, 100s of consumer groups	State-only stores lose time-series fidelity; event sourcing IS the time series
Domain-driven aggregates with complex invariants	Insurance policy systems, loan origination, multi-step approval workflows	Each event carries business intent (PolicyAmended, ClaimDenied) for downstream rule engines	Aggregates with 100s of event types, 10-year policy lifetimes	State-only models lose the "why" of each transition; event sourcing preserves it as first-class data
CQRS read-side materialization for varied query shapes	Microservices teams with one write model + many read models	Each consumer builds its own projection optimized for its query patterns	50+ downstream projections per event stream, varied DBs (Elasticsearch, Postgres, ClickHouse)	Single shared DB schema becomes the choke point; event sourcing + CQRS lets each team own its read shape
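
The CQRS row above has several read models consuming one event stream. A minimal sketch of that shape follows, with de-duplication by event id so that replays and redeliveries are safe. The `Projection` class, the handler functions, and the event fields are hypothetical; a production consumer would persist `seen` (or an offset) alongside the read model.

```python
class Projection:
    """A read model that applies each event at most once, keyed by event id."""

    def __init__(self, handler) -> None:
        self.handler = handler
        self.state: dict = {}
        self.seen: set = set()

    def consume(self, events) -> None:
        for event in events:
            if event["id"] in self.seen:
                continue  # replayed or redelivered event: skip it
            self.handler(self.state, event)
            self.seen.add(event["id"])


def totals_by_customer(state: dict, event: dict) -> None:
    if event["type"] == "OrderPlaced":
        state[event["customer"]] = state.get(event["customer"], 0) + event["amount"]


def order_count(state: dict, event: dict) -> None:
    if event["type"] == "OrderPlaced":
        state["orders"] = state.get("orders", 0) + 1


log = [
    {"id": 1, "type": "OrderPlaced", "customer": "ann", "amount": 30},
    {"id": 2, "type": "OrderPlaced", "customer": "bob", "amount": 20},
    {"id": 3, "type": "OrderPlaced", "customer": "ann", "amount": 5},
]

revenue, volume = Projection(totals_by_customer), Projection(order_count)
for projection in (revenue, volume):
    projection.consume(log)
    projection.consume(log[1:])  # simulated redelivery of the tail
```

Each projection answers a different question from the same log, and the redelivered tail changes nothing. Handlers with external side effects (emails, payments) need the same de-duplication before they act.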

Three-Way Merge

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Source code version control	Git, Mercurial, Jujutsu, used industry-wide	Branched async workflow with explicit conflict surfacing to humans for semantic review	Linux kernel: 1M+ commits, 70K+ contributors, 30M lines of code	Auto-merge would silently corrupt code; conflict surfacing IS the safety property
Configuration as code with PR-based reviews	Terraform, Kubernetes manifests, GitOps tooling (ArgoCD, Flux)	Reviewable, revertable infra changes with branch-based environments	10K+ resources per repo, 100s of engineers, multiple deploy targets	Real-time config editing risks production outage; async merge with review is the safety net
Schema migrations across long-lived branches	Database migration tools (Flyway, Liquibase), Prisma migrations in feature branches	Migrations are ordered changes; merge surfaces ordering conflicts to humans for resolution	100s of migrations per repo over years, multiple parallel feature branches	Auto-resolve would apply migrations in unsafe orders; human review is mandatory
Wiki / docs with editorial workflow	MediaWiki edit conflicts, ReadTheDocs PR-based docs, Astro Starlight	Editorial review with conflict surface; "last save wins" is unacceptable	Wikipedia: millions of articles, peak edit collision rates manageable via 3-way merge	Diff-sync silently merges; OT/CRDT are overkill for slow editorial cadence
Forked / re-distributable content pipelines	Open source library forks, dataset versioning (DVC, lakeFS), spec forks	Long-lived divergent branches must rejoin upstream; ancestor-based merge is the workflow	1000s of forks, multi-year divergence, periodic upstream merges	No real-time component; 3-way merge is the right semantic match for the workflow
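
The ancestor-based merge these rows depend on can be sketched at key level rather than line level. This is an illustration under simplifying assumptions (flat key/value config, hypothetical keys), not Git's actual merge algorithm: a key changed on one side only is taken automatically, and a key changed differently on both sides is surfaced as a conflict instead of being resolved.

```python
ABSENT = object()  # sentinel meaning "key not present in this version"


def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two edited copies of `base`; return (merged, conflicts)."""
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, ABSENT)
        o = ours.get(key, ABSENT)
        t = theirs.get(key, ABSENT)
        if o == t:        # both sides agree (same edit, or nobody edited)
            result = o
        elif o == b:      # only theirs changed the key
            result = t
        elif t == b:      # only ours changed the key
            result = o
        else:             # both changed it differently: a human decides
            conflicts[key] = (b, o, t)
            continue
        if result is not ABSENT:  # ABSENT as a result means "deleted"
            merged[key] = result
    return merged, conflicts


base = {"replicas": 2, "image": "app:1.0", "debug": False}
ours = {"replicas": 3, "image": "app:1.0", "debug": False}
theirs = {"replicas": 5, "image": "app:1.1"}  # theirs also deleted "debug"

merged, conflicts = three_way_merge(base, ours, theirs)
```

Here the image bump and the deletion of debug merge cleanly, while replicas comes back as a (base, ours, theirs) triple for review. Without the common ancestor, the merge could not tell "only theirs changed" from "both changed".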

3. Limitations

Matrix view: rows are limitation categories, columns are technologies.

Limitation Category	CRDTs	OT	Diff Sync	Event Sourcing	3-Way Merge
Metadata overhead	HIGH Position IDs, vector clocks, tombstones. Yjs ~1.5x, Automerge historically 5-10x.	MEDIUM Op log grows with edits; snapshots required for prod.	MEDIUM 2x memory per peer for shadows; ephemeral diffs.	HIGH Storage grows linearly with events; snapshots mandatory.	MEDIUM History DAG metadata grows; CAS dedupes content.
Cross-key transactions	CRITICAL No global atomic ops; strong eventual consistency (SEC) only.	MEDIUM Server can serialize cross-key ops if modeled in op set.	CRITICAL No transactional concept at all.	HIGH Within aggregate yes; across aggregates needs sagas.	MEDIUM Atomic per commit; cross-commit needs application logic.
Schema evolution	MEDIUM Unknown ops can be forwarded as opaque; type changes are hard.	HIGH Op set is part of the contract; new ops break old clients.	MEDIUM Format-agnostic for text; structured changes break merge.	HIGH Upcasters required forever; chains accumulate.	MEDIUM Content-only; schema lives outside, must coordinate separately.
Invariant enforcement	CRITICAL Counters can go negative. No global constraint check.	MEDIUM Server is single writer per doc; can enforce invariants.	CRITICAL Last-sync wins; no global invariant.	MEDIUM Per-aggregate optimistic concurrency enforces invariants.	MEDIUM Pre-receive hooks / CI gates enforce invariants async.
Real-time presence / awareness	MEDIUM Separate awareness protocol needed (y-protocols). Not part of CRDT.	MEDIUM Server can broadcast presence cheaply; common in production.	MEDIUM Often layered separately via hub messages.	HIGH Not the right tool; presence is ephemeral state, ES is durable.	CRITICAL Async by design; presence is meaningless.
Access control granularity	HIGH P2P CRDTs can't enforce ACLs cryptographically without extra layer.	MEDIUM Server enforces ACLs per op; standard pattern.	MEDIUM Hub-mediated, can apply ACLs at the hub.	MEDIUM Per-command auth in command handler; well-established.	MEDIUM Repo/branch-level ACLs; standard in Git hosting.
Undo/redo complexity	HIGH Selective collaborative undo is a research problem.	MEDIUM Inverse-op transforms work; Google Docs proves it scales.	HIGH No op model; undo = re-sync to prior shadow.	MEDIUM Compensating events (CartItemRemoved); natural fit.	MEDIUM git revert / reset; well-understood.
Server cost at scale	LOW Stateless relay possible; cheap at scale.	HIGH Server in critical path for every op; scales as document count × concurrency.	HIGH O(N) shadows per doc on hub; memory pressure at scale.	MEDIUM Append throughput is cheap; projection compute scales with read patterns.	MEDIUM Storage scales with content + history; CAS amortizes.
Garbage collection	CRITICAL Tombstones accumulate; safe GC needs global quiescence proof.	MEDIUM Snapshot + truncate; well-understood.	LOW Stateless on the wire; minimal GC concern.	HIGH Event compaction breaks audit; regulatory considerations.	MEDIUM Reflog expiry; periodic gc; standard tooling.
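
The "counters can go negative" entry in the invariant-enforcement row is easy to reproduce. A minimal sketch of a state-based PN-counter, assuming two hypothetical warehouse replicas that each check stock locally before selling the last unit:

```python
class PNCounter:
    """State-based PN-counter: per-replica increment and decrement totals."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.incs = {}  # replica id -> total increments by that replica
        self.decs = {}  # replica id -> total decrements by that replica

    def increment(self, amount=1):
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + amount

    def decrement(self, amount=1):
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        # Join = element-wise max; idempotent, commutative, associative.
        for rid, n in other.incs.items():
            self.incs[rid] = max(self.incs.get(rid, 0), n)
        for rid, n in other.decs.items():
            self.decs[rid] = max(self.decs.get(rid, 0), n)


def reserve(stock):
    """Local invariant check: only sell if this replica sees stock >= 1."""
    if stock.value() >= 1:
        stock.decrement()
        return True
    return False


warehouse_a = PNCounter("a")
warehouse_a.increment(1)          # one unit in stock
warehouse_b = PNCounter("b")
warehouse_b.merge(warehouse_a)    # both replicas now see stock == 1

sold_a = reserve(warehouse_a)     # passes the local check
sold_b = reserve(warehouse_b)     # also passes, concurrently

warehouse_a.merge(warehouse_b)
warehouse_b.merge(warehouse_a)    # replicas converge, on a value of -1
```

Both replicas converge, which is all SEC promises; the "stock >= 0" invariant is a global constraint and needs coordination (a reservation service, escrow, or a single writer) outside the CRDT.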

4. Fault Tolerance

How each technology behaves when networks partition, nodes crash, or replicas drift. Rows are dimensions, columns are technologies.

Dimension	CRDTs	OT	Diff Sync	Event Sourcing	3-Way Merge
Replication model	Multi-master, leaderless. Every replica equal.	Star topology with single authoritative server; clients are caches.	Hub-and-spoke. Clients sync against hub or each other pairwise.	Log-replicated (Kafka/EventStore). Single-writer per partition; many readers.	Distributed VCS; full replicas (Git) or central with checkouts (SVN).
Failure detection	App-layer; CRDT itself has no notion. Awareness protocols use heartbeats.	Server health checks + client reconnect. Server is the SPOF that gets monitored.	Hub heartbeat; client detects via missed sync intervals.	Log infrastructure handles it (Kafka ISR, EventStore cluster heartbeats).	N/A on the algorithm; remote unavailability detected by transport.
Failover mechanism	No failover needed; any peer can serve. Reconcile state on rejoin.	Hot-standby server takes over op log; clients reconnect to new endpoint.	Hub failover; clients drop shadow and re-sync against new hub.	A new partition leader is promoted from the in-sync replicas (coordinated by the controller quorum under KRaft); follower reads continue where the log supports them.	Repo mirroring; switch remote URL. No automatic failover concept.
RTO (typical)	Effectively 0; offline-capable. Reconciliation is async.	10s-60s for server cutover; longer if op log recovery needed.	10s-60s for hub recovery; longer for re-shadow on all clients.	Sub-30s for log infra; minutes-to-hours to rebuild projections from log.	Manual; depends on mirror setup. Often minutes-to-hours.
RPO (typical)	Zero for any persisted replica; ops in transit may be lost without app-level guarantees.	Zero if op log is synchronously persisted; seconds if async.	Seconds to minutes; depends on sync interval.	Zero for committed events (sync replication); typically configurable.	Zero for pushed commits; unpushed work is local-only loss.
Split-brain behavior	Both sides accept writes; converge on merge. This is by design.	Single-server model prevents split-brain; can lose availability instead.	Both sides drift; on rejoin, "last patch wins per region" risks corruption.	Quorum prevents split-brain in log; minority partition rejects writes.	Both branches accept commits; merge surfaces conflicts to humans.
Blast radius of single-node failure	Minimal; one peer offline doesn't affect others. New peer rebuilds from any peer.	Entire document unavailable if no failover; single doc per server typical.	All clients of that hub lose sync until failover.	Single broker fails: partitions move to ISR; brief read/write degradation.	A remote going down blocks push/pull but local work continues.
Cross-region failover story	Native; multi-region multi-master is the use case. No coordination cost.	Hard; requires global op ordering or per-region docs. Google Docs is largely single-region per doc.	Per-region hubs with periodic cross-region sync; convergence is best-effort.	MirrorMaker / Kafka federation; lag-tolerant DR; canonical pattern.	Mirror clones; manual promotion. Operationally simple.
Data loss scenarios	Ops lost in transit without app-level resend = silent divergence. State vectors detect on rejoin.	Server crash before op log fsync = lost ops; client retry mitigates.	Patches lost = silent divergence. Resync on next cycle, but conflicts may slip through.	Events committed = durable. Producer crash between commit phases = duplicate (idempotency handles).	Unpushed commits on a dead laptop = total loss. Commits overwritten by a force-push = usually recoverable from a clone's reflog until unreachable entries expire (30 days by default).

5. Sharding (Scaling the Collaborative State)

These algorithms aren't databases, so "sharding" here means how the collaborative state is partitioned across servers, peers, or storage to scale beyond a single node.

Dimension	CRDTs	OT	Diff Sync	Event Sourcing	3-Way Merge
Sharding model	Document-level partitioning. Yjs sub-documents split one doc across CRDT instances.	Document-per-server-instance; consistent hash on doc ID across server fleet.	Document-per-hub; route by doc ID.	Partition by aggregate ID (consistent hash); strict per-partition ordering.	Repository as the shard; monorepo vs polyrepo is the partitioning decision.
Shard key constraints	Sub-doc boundaries must align with merge boundaries; cross-subdoc ops are app-level.	All ops to one doc must hit the same server; sticky sessions required.	All peers of a doc must hit same hub; sticky routing.	Aggregate ID drives partition; cross-aggregate ordering not preserved.	Boundary = repo. Cross-repo dependency tracking is a separate problem (Bazel, build systems).
Rebalancing mechanism	Sub-doc migration is data-plane; clients re-attach to new endpoint.	Move doc to new server: drain ops, transfer state, re-route clients. ~seconds of unavailability.	Drain hub, transfer shadows or force re-shadow on clients.	Kafka partition reassignment; transparent to producers, brief lag for consumers.	Repo splits via filter-branch / repo surgery. Major undertaking.
Rebalancing cost / impact	Low if subdocs are independent; high if cross-subdoc refs exist.	Brief read-only window; client reconnect storm on cutover.	Shadows must be re-established; bandwidth spike on transition.	Catch-up read I/O on new owner; producer lag during reassignment.	High; repo splits change history, can break downstream automation.
Hot-shard behavior	A viral doc still bottlenecks on the server hosting its websocket relay; subdivide via sub-docs.	Hot doc maxes out a single server; "shard the doc" via OT sub-docs (rare, complex).	Hot doc maxes out hub; shadow count × doc size = OOM risk.	Hot aggregate becomes hot partition; split aggregate (per-SKU, per-warehouse).	Hot repo (monorepo) hits Git's scaling limits; partial clone / sparse checkout required.
Maximum shards (practical)	Tens of thousands of sub-docs per session, limited by client memory.	Server fleet size; sticky routing limits concurrency.	Hub count; small-medium scale (10s of hubs).	Kafka: roughly 200K partitions per cluster was the ZooKeeper-era practical ceiling; KRaft is designed to raise it.	Effectively unlimited repos; monorepo size capped by tooling (LFS, Scalar).
Resharding without downtime?	Partial; sub-doc split is online but cross-references break briefly.	No; doc migration requires brief op suspension.	No; shadow re-establishment requires sync pause.	Yes; Kafka reassignment is online. KRaft makes it smoother.	Repo split is offline-style; large monorepos take hours.
Cross-shard query support	N/A — no query layer. App-layer joins across sub-docs.	N/A — OT is per-doc only.	N/A — same.	Cross-aggregate via projections; no native cross-partition transactions.	Cross-repo deps via submodules (painful) or build systems (Bazel, Buck).
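
Several cells above route by consistent hash on a doc ID or aggregate ID. A minimal hash-ring sketch, assuming hypothetical server names and a virtual-node count of 64; it shows the property that makes rebalancing cheap: when a server is added, the only keys that move are the ones the new server takes over.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Process-independent hash (Python's built-in hash() is salted)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node owns `vnodes` points on the ring to smooth the load.
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[idx][1]


docs = [f"doc-{n}" for n in range(1000)]
before = HashRing(["server-a", "server-b", "server-c"])
after = HashRing(["server-a", "server-b", "server-c", "server-d"])

# Docs whose owner changed when server-d joined the fleet.
moved = [d for d in docs if before.owner(d) != after.owner(d)]
```

All ops for one doc land on one server, which is what the sticky-routing constraint in the shard-key row requires; the ring only decides which server that is.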

6. Replication

How replicas synchronize state. This is the core algorithmic question for these technologies.

Dimension	CRDTs	OT	Diff Sync	Event Sourcing	3-Way Merge
Replication topology	Leaderless multi-master. Any peer can write.	Single-leader (server). Followers (clients) propose, leader serializes.	Hub-and-spoke or symmetric pair. No global leader.	Single-writer per partition; many readers (CQRS).	Distributed; every clone is a replica. Push/pull driven.
Sync vs async	Async by design; ops propagate via gossip / pub-sub.	Optimistic local apply (sync UX); server reconciliation is async.	Async, periodic. Sync intervals 0.5-5s typical.	Configurable: sync replication for durability, async for latency.	Manual; user-driven push/pull.
Replication factor (default / max)	N peers, no fixed count; any subset of peers carry state.	Usually 1 (server) + clients; HA pairs the server.	1 hub + N peers; HA via hub pairs.	3-5 replicas typical for Kafka; configurable per topic.	Unbounded; every clone is a replica.
Consistency level options	Strong eventual consistency. No linearizability.	Single total order per doc via server serialization; clients apply optimistically and converge to it.	Best-effort eventual; not guaranteed convergence under adversarial timing.	Strong per-partition; eventual across partitions.	Per-commit linearizable on a branch; cross-branch is human-resolved.
Replication lag (typical)	Sub-100ms LAN, 100-500ms WAN. Bounded by network only.	RTT + server processing; 50-200ms typical.	One sync interval; 0.5-5s.	Single-digit ms within cluster; seconds cross-region.	Human-driven; minutes to days between pushes.
Conflict resolution	Built into the data type (LWW-register, OR-set, RGA, Fugue, YATA). Deterministic.	Transform functions per op-pair; produces single linear history.	Fuzzy patch application; "best effort". No formal guarantee.	Optimistic concurrency on expected version; retry on conflict.	Surface to human via conflict markers; explicit resolution required.
Cross-region replication	Native; multi-region multi-master is the design point.	Hard; either per-region docs or global op log (latency expensive).	Per-region hubs with cross-region peer sync; converges over rounds.	MirrorMaker 2 / Cluster Linking; canonical pattern for Kafka.	Mirror remotes; lazy sync. Operationally trivial.
Replication during partition	Both sides accept; converge on heal. AP in CAP terms.	Minority partition rejects writes (no server); CP-leaning.	Both sides accept locally; conflicts on rejoin.	Minority side rejects (quorum); majority continues.	Both sides commit locally; merge on rejoin.
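
The Event Sourcing entry in the conflict-resolution row, optimistic concurrency on expected version, can be sketched as follows. The EventStore class, stream name, and event names are hypothetical; real stores expose the same idea through an expected-version argument on append.

```python
class ConcurrencyError(Exception):
    """Raised when the stream moved on since the writer last read it."""


class EventStore:
    def __init__(self):
        self._streams = {}  # stream id -> list of events

    def read(self, stream_id):
        return list(self._streams.get(stream_id, []))

    def append(self, stream_id, expected_version, events):
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        stream.extend(events)
        return len(stream)


store = EventStore()
version_a = len(store.read("order-1"))  # writer A reads at version 0
version_b = len(store.read("order-1"))  # writer B reads at version 0

store.append("order-1", version_a, ["OrderPlaced"])  # A wins the race

try:
    store.append("order-1", version_b, ["OrderCancelled"])
    conflict_detected = False
except ConcurrencyError:
    conflict_detected = True
    # Retry: reload the stream, re-run the decision, append at the new version.
    current = store.read("order-1")
    store.append("order-1", len(current), ["OrderCancelled"])
```

The store never merges anything. It only refuses stale writes, and the command handler decides whether the command still makes sense against the reloaded state.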

7. Better Usage Patterns

Patterns most teams miss. Anti-patterns surface in code review. Optimizations compound at scale.

CRDTs

Use when offline edits and multi-master convergence matter more than strict global invariants or minimal metadata.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Sub-documents for large state	Build one giant CRDT doc that holds the entire app state; OOM on mobile after 6 months.	Decompose into Yjs sub-docs (per page, per pane, per panel) and lazy-load on demand.	Doc lifetime is years. Without sub-docs, every user pays the full history cost on every device.
Awareness as a separate protocol	Stuff presence, cursors, selections into the CRDT itself; explode the op log with ephemeral churn.	Use the y-protocols awareness module or an equivalent ephemeral channel; the CRDT is for durable state only.	Cursor movements can fire at up to 60Hz; storing them as CRDT ops bloats the doc by orders of magnitude and degrades sync performance.
State vector exchange before op replay	Replay every op since the beginning of time on every reconnect.	Exchange state vectors first; transmit only the delta.	This is the difference between "reconnect in 100ms" and "reconnect in 30s" on long-lived docs.
Treat undo/redo as user-scoped	Build collaborative undo with shared undo stack; users undo each other's edits.	Per-user undo manager (Yjs UndoManager scope) that only undoes ops attributed to that user.	The single most-complained-about UX defect in homegrown CRDT editors; Yjs handles it correctly out of the box.
Schema for the CRDT, not for the app	Use CRDT primitives directly in UI code; refactoring the schema breaks every component.	Build a thin domain layer over the CRDT; UI reads from observable views, not raw Y.Map.	Decoupling lets you migrate from Yjs to Loro (or vice versa) without rewriting the UI.
Persist op log, not just snapshots	Snapshot the CRDT every N seconds and discard the op log; lose offline-edit reconciliation.	Persist both: op log for reconciliation, snapshots for fast load.	A client reconnecting after a week offline needs the ops it missed, not just the latest snapshot; a snapshot is a load optimization, not a sync mechanism.
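
The state-vector row can be illustrated without any CRDT library. A minimal sketch, assuming a hypothetical Replica class and ops identified by (client, clock) with no gaps per client; this is not the Yjs API, only the shape of the exchange: each side sends the next clock it expects per client, and the peer replies with only the ops at or beyond it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    client: str
    clock: int    # per-client sequence number, starting at 0
    payload: str


class Replica:
    def __init__(self, name):
        self.name = name
        self.ops = {}  # (client, clock) -> Op

    def state_vector(self):
        """Next expected clock per client (assumes no gaps per client)."""
        sv = {}
        for client, clock in self.ops:
            sv[client] = max(sv.get(client, 0), clock + 1)
        return sv

    def local_edit(self, payload):
        clock = self.state_vector().get(self.name, 0)
        self.ops[(self.name, clock)] = Op(self.name, clock, payload)

    def delta_for(self, remote_sv):
        """Only the ops the remote side has not seen yet."""
        return [
            op for op in self.ops.values()
            if op.clock >= remote_sv.get(op.client, 0)
        ]

    def apply(self, ops):
        for op in ops:
            self.ops[(op.client, op.clock)] = op  # idempotent on re-delivery


a, b = Replica("a"), Replica("b")
for ch in "hel":
    a.local_edit(ch)
b.apply(a.delta_for(b.state_vector()))   # initial sync: all 3 ops

a.local_edit("l")
a.local_edit("o")                        # a is now 2 ops ahead
b.local_edit("!")                        # b made 1 concurrent edit

delta_a_to_b = a.delta_for(b.state_vector())
delta_b_to_a = b.delta_for(a.state_vector())
b.apply(delta_a_to_b)
a.apply(delta_b_to_a)
```

The reconnect transfers 2 + 1 ops instead of replaying all 6, and the cost of the exchange scales with the number of clients, not the length of the history.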

Operational Transformation (OT)

Best when a central server can serialize document operations and preserving editor intent beats offline-first autonomy.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Server as op orderer, not state holder	Server holds full document state and broadcasts diffs; becomes the choke point for memory.	Server holds the op log + lightweight checkpoint; clients reconstruct view.	Op-log servers scale to 10x more docs per box than state-holding servers.
Explicit op versioning from day one	Add new op types without version bumps; old clients crash on new ops.	Embed op version in the op envelope; reject incompatible versions with explicit "upgrade required".	Mobile clients pinned to old versions for months are a fact of life; design for it or lose them silently.
Snapshot + truncate cadence	Let the op log grow forever; doc load time hits seconds, then tens of seconds.	Periodic snapshot to S3/blob + truncate. Snapshot interval tied to op count, not wall time.	10M-op docs are common at scale (Google Docs). Without snapshot policy, you lose load-time SLOs.
Per-doc backpressure	Let one viral doc with 1000 collaborators DDoS the server with op storms.	Rate-limit ops per client per doc; batch sub-100ms ops; throttle to server capacity.	Without this, a single doc takes down the entire server fleet via global queue saturation.
TP2 testing via fuzz harness	Trust the textbook proofs; ship; hit TP2 violations in prod with rare op sequences.	Property-based test (Hypothesis, fast-check): generate random op sequences, verify convergence on all permutations.	Several published OT algorithms were later shown to violate TP2, years after their proofs appeared. Property tests catch such violations in CI.
Optimistic UI with reconciliation queue	Apply local op, wait for ack, then render; UX is laggy on high-RTT connections.	Apply local op immediately; queue pending ops; on ack, reconcile any server-side transforms.	Google Docs feels native because of this; absence of this pattern makes OT systems feel laggy by default.
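
The convergence check from the fuzz-harness row can be sketched with the standard library alone. This is a deliberately reduced example: insert-only ops encoded as hypothetical (position, char, site) tuples, and an exhaustive pairwise check that covers TP1 only. A real harness would generate longer random sequences (for example with Hypothesis) and include deletes and three-way concurrency to exercise TP2.

```python
from itertools import product


def apply_insert(doc, op):
    pos, char, _site = op
    return doc[:pos] + char + doc[pos:]


def transform(op, against):
    """Rewrite `op` so it can be applied after the concurrent op `against`."""
    pos, char, site = op
    other_pos, _other_char, other_site = against
    # Shift right if the other insert landed before us; break position ties
    # by site id so both sides make the same decision.
    if other_pos < pos or (other_pos == pos and other_site < site):
        return (pos + 1, char, site)
    return op


def naive_transform(op, against):
    """Buggy variant without the tie-break, kept to show the check bites."""
    pos, char, site = op
    return (pos + 1, char, site) if against[0] <= pos else op


def converges(doc, op_a, op_b, xform):
    via_a = apply_insert(apply_insert(doc, op_a), xform(op_b, op_a))
    via_b = apply_insert(apply_insert(doc, op_b), xform(op_a, op_b))
    return via_a == via_b


def failing_cases(xform, doc="abc"):
    """Every pair of concurrent insert positions that fails to converge."""
    positions = range(len(doc) + 1)
    return [
        (i, j)
        for i, j in product(positions, repeat=2)
        if not converges(doc, (i, "X", 1), (j, "Y", 2), xform)
    ]


good = failing_cases(transform)
bad = failing_cases(naive_transform)
```

The naive variant diverges exactly when both sites insert at the same position, the kind of rare sequence that passes manual testing and surfaces in production.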

Differential Sync

Choose when sync intervals and fuzzy patches are acceptable, and repairing drift is simpler than modeling every operation.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Use diff-sync only for opaque text	Apply diff-sync to structured JSON; structural changes corrupt the data on merge.	Use diff-sync only on truly opaque text streams; use CRDT/OT for structured data.	This is the most common diff-sync mistake; it's a category error that corrupts production data.
Bound sync intervals to UX cadence	Try to make diff-sync feel real-time with 100ms intervals; saturate bandwidth and hub.	Use 1-5s intervals; pair with optimistic local UI so it feels instant.	Diff-sync at 100ms intervals defeats its own simplicity advantage; you're paying for OT-like overhead without OT's guarantees.
Shadow eviction policy	Keep shadows for every peer forever; hub memory grows unboundedly.	LRU evict shadows on inactivity; force full re-sync on reconnect of long-idle peers.	Without eviction, hub OOMs on long-tail user base.
Use diff-match-patch for client-side merge only	Build a sync engine on diff-match-patch and run it for real-time collab; corruption ensues.	Use diff-match-patch as a one-shot merge tool when integrating user-submitted patches with shifted context.	diff-match-patch is brilliant at its real job (CodeMirror suggestion application); use it there, not as a sync engine.
Detect divergence with checksums	Trust that sync converges; users report "my doc looks different on my phone" with no detection.	Exchange Merkle / rolling checksums of shadow + live; force full re-sync on mismatch.	Diff-sync silently diverges; without checksums, you find out from support tickets.
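
The checksum row amounts to a few lines on the hub. A minimal sketch, assuming a hypothetical hub_check function and that the client reports a SHA-256 of its shadow with each sync; the message format and the full re-sync path are left out.

```python
import hashlib


def checksum(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hub_check(reported_sum: str, hub_shadow: str) -> str:
    """Compare the client's reported shadow checksum with the hub's shadow.

    Returns "patch" when the shadows agree, so diffs can be applied
    normally, or "full_resync" when they have silently diverged.
    """
    return "patch" if reported_sum == checksum(hub_shadow) else "full_resync"


hub_shadow = "hello world"

in_sync = hub_check(checksum("hello world"), hub_shadow)
diverged = hub_check(checksum("hello wrld"), hub_shadow)  # a lost patch
```

Only the digest crosses the wire, so the check costs a fixed 64 hex characters per sync regardless of document size.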

Event Sourcing

A strong fit for audit/replay/time-travel domains where writes are immutable facts and projections own conflict behavior.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Outbox pattern for side effects	Send emails / call APIs directly from event handlers; re-runs duplicate them.	Write side-effect intent to an outbox table in the same transaction; separate worker processes idempotently.	This is the single most important pattern in ES. Without it, every re-run risks customer-facing duplicate actions.
Snapshot strategy from day one	Defer snapshots until aggregates have 10K+ events and load is broken; emergency retrofitting.	Snapshot every N events (commonly 100-500); store with strict version compatibility.	Retrofitting snapshots into a running system requires coordinated migration; design it in upfront.
Public vs internal event split	Publish internal events to external consumers via CDC; renames break consumers.	Translate internal events into a stable public event contract; never expose internal aggregates directly.	Refactoring your domain model becomes a public API break otherwise. Anti-corruption layer is mandatory.
Idempotency keys on commands	Re-submission of a command (network retry, replay) processes it twice.	Every command carries a unique idempotency key; command handler de-dupes before applying.	Without this, "submit order" duplicated under network blip becomes a duplicate charge; outbox + idempotency together are the safe pattern.
Versioned event schemas + upcasters	Modify old events in place to fit new schema; break audit and replay.	Old events stay immutable; upcasters transform v1→v2→v3 on read.	Events are forever. Any in-place mutation violates the core promise of event sourcing.
Aggregate sizing for contention	Model "Order" as a single aggregate including all items; black-friday writes contend on one stream.	Split into OrderHeader + OrderItem aggregates; per-item ops don't contend with header ops.	This is the most common ES scaling failure; revisit aggregate boundaries early when you see retry storms.
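
The upcaster row can be sketched as a chain applied on read. The event shape and the v1→v2→v3 changes here are hypothetical (v2 splits a name field, v3 adds a country field with a default); the point is that the stored event is never modified.

```python
LATEST_VERSION = 3


def v1_to_v2(event: dict) -> dict:
    # Hypothetical change: v2 splits "name" into first and last name.
    first, _, last = event["name"].partition(" ")
    return {
        "version": 2,
        "type": event["type"],
        "first_name": first,
        "last_name": last,
    }


def v2_to_v3(event: dict) -> dict:
    # Hypothetical change: v3 adds "country" with a default for old events.
    return {**event, "version": 3, "country": "unknown"}


UPCASTERS = {1: v1_to_v2, 2: v2_to_v3}


def upcast(stored: dict) -> dict:
    """Return the latest-version view of an event; the input is untouched."""
    event = dict(stored)  # work on a copy so the stored event stays immutable
    while event["version"] < LATEST_VERSION:
        event = UPCASTERS[event["version"]](event)
    return event


stored_v1 = {"version": 1, "type": "CustomerRegistered", "name": "Ada Lovelace"}
current = upcast(stored_v1)
```

Each upcaster only knows about one version step, so adding v4 means writing one new function rather than revisiting every older shape. The chain does lengthen with every schema change, which is the cost the Limitations matrix flags.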

Three-Way Merge

Default for async branching when surfacing conflicts to humans is safer than silently inventing an automatic merge.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Trunk-based development	Long-lived feature branches; merge becomes a multi-day archaeology project.	Branches live hours-to-days; merge frequently into trunk; feature flags hide incomplete work.	3-way merge works best when used least; the right answer is to minimize divergence.
Conflict resolution via review, not gut	Engineer resolves merge conflict solo; ships subtle semantic bug.	Conflicts require a second reviewer; treat merge resolution as a reviewable change.	Merge conflicts are the highest-bug-density events in software. Treat them with corresponding rigor.
Pre-commit auto-formatting	Whitespace and import-order conflicts dominate merge conflicts; signal-to-noise destroyed.	Enforce formatter on every commit; conflicts then only on semantic changes.	Without auto-format, a large share of merge conflicts are spurious; the signal is lost in noise.
Rerere for repeated conflicts	Resolve the same conflict on a rebase 10 times during a long-lived branch.	Enable git rerere (git config rerere.enabled true); resolutions are cached and replayed automatically.	A little-known Git feature that saves hours on rebase-heavy workflows.
Squash vs merge policy by branch type	One global policy; lose history granularity or pollute trunk with WIP commits.	Squash feature branches into trunk; merge commits for release branches to preserve history.	Distinguishing "workflow commits" from "history commits" matters at scale; treat squash as a UX choice.

8. Advanced / Next-Gen Alternatives

Emerging successors, adjacent technology that does it better for specific cases, and patterns that obviate the original need.

CRDTs

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Loro (Fugue text CRDT)	Minimizes the interleaving anomalies that earlier text CRDTs can produce under concurrent inserts; better perf than Yjs on some benchmarks.	EMERGING	High; different API surface, encoding format incompatible with Yjs.	When you're starting fresh and want a best-in-class text + tree CRDT and are willing to bet on a younger ecosystem.
Automerge 2 (Rust core)	5-100x perf improvement over Automerge 1.x; closes the gap with Yjs on size and speed.	PRODUCTION	Medium; v1→v2 has migration path but data format changes.	When you need JSON-document semantics (vs text-first) and were burned by Automerge 1.x perf.
ElectricSQL / Replicache (sync layers atop Postgres)	Provides offline-first sync semantics with SQL queryability; conflict handling is CRDT-style or server-reconciled depending on the engine, relational on top.	EMERGING	Low if greenfield; high if migrating from custom CRDT.	When you want CRDT benefits but also need to query the state with SQL (analytics, search).
Local-first sync engines (Liveblocks, Triplit, Jazz)	Managed CRDT-backed services; hosted relays, presence, persistence, auth.	EMERGING	Medium; integration work but no algorithm-level migration.	When you don't want to run Yjs y-websocket servers and need presence/auth out of the box.

Operational Transformation (OT)

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
CRDT (server-mediated)	Drops the TP2 problem; arbitrary concurrency converges by construction.	PRODUCTION	High; op model is fundamentally different.	When your OT system hits TP2 bugs in prod or you want to add offline support.
ShareDB / Json0 evolution	Modern OT-for-JSON with active maintenance; addresses most "OT for non-text" pain.	PRODUCTION	Low if already on OT; medium from scratch.	When you need OT specifically (server-authoritative, tight intent) but for structured docs.
Hybrid OT + CRDT	Use OT for character-level text inside fields, CRDT-style LWW for properties (Figma pattern).	PRODUCTION	High; requires re-architecting around dual conflict model.	When your domain has both rich text (needs OT) and structured objects (suits LWW); design-tool category.

Differential Sync

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Yjs / Automerge (CRDT replacement)	Adds formal convergence guarantees; offline support; structural awareness.	PRODUCTION	Medium; replace sync layer, keep UI; persistence schema changes.	When silent divergence is unacceptable and you need real concurrency guarantees.
ShareDB (OT replacement)	Tight intent preservation, no fuzzy patch matching; server-mediated.	PRODUCTION	High; op-based model is fundamentally different.	When your domain is text-heavy and silent merges produce visible bugs.
Operational sync engines (Replicache, Liveblocks)	Provides correctness guarantees with diff-sync-like simplicity.	EMERGING	Medium; framework adoption.	When the diff-sync simplicity was the appeal but the correctness wasn't sufficient.

Event Sourcing

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
CQRS without event sourcing	Separate read/write models without the immutable-log commitment; lower operational burden.	PRODUCTION	Low conceptually; high if already invested in event store.	When you need read-model flexibility but not audit/replay; common middle ground.
Change Data Capture (Debezium, AWS DMS)	Get event-log benefits from a traditional database without app-level event modeling.	PRODUCTION	Low; bolt onto existing CRUD systems.	When you want downstream stream integration without the design overhead of event sourcing; pragmatic choice for legacy.
Streaming databases (Materialize, RisingWave)	Built-in materialized views over event streams; projections are SQL, not application code.	EMERGING	Medium; integration work, paradigm shift for query teams.	When your projections are many and varied; streaming SQL beats N hand-coded projections.
Temporal / Restate / DBOS (durable execution)	Event sourcing for workflow state without the bespoke aggregate plumbing.	PRODUCTION	Medium; rethink workflow as functions.	When the event-sourced subsystem is mostly workflow / saga orchestration; let the framework own the log.

Three-Way Merge

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Jujutsu (jj)	Cleaner conflict model; conflicts as first-class values that can be operated on; better rebase UX.	EMERGING	Low; jj is Git-compatible; coexists with existing repo.	When team UX with Git rebase is causing actual productivity loss; jj is the modern alternative.
Pijul (patch theory)	Commutative patches; merges are mathematically associative; no merge conflicts from reordering.	RESEARCH	High; entire paradigm shift, tooling immature.	Watch but don't bet. The math is beautiful; ecosystem is too small for production use today.
Sapling (Meta)	Monorepo scale; partial clone and lazy fetch by default; Phabricator-style stacked diffs.	PRODUCTION	Medium; Git-compatible mode exists; native mode requires migration.	When Git scale at monorepo level is the bottleneck (10M+ files, 10TB+ history).
Structural / AST-aware merge (mergiraf, GumTree)	Reduces spurious conflicts from whitespace/formatting; merges by syntax tree, not text lines.	EMERGING	Low; bolts onto existing Git as a merge driver or mergetool.	When team productivity is bleeding to formatter-vs-logic merge conflicts; high ROI bolt-on.
CRDT-based VCS (DVCS with no conflicts)	Eliminates conflict resolution entirely; trades human intent surfacing for auto-merge.	RESEARCH	High; paradigm shift.	When the conflict-surfacing property is genuinely not needed (rare for code); consider for prose / config.
