DynamoDB vs Cassandra vs PostgreSQL

Three databases that solve different problems and get mistakenly compared because they all store rows. Deep research edition.

Managed NoSQL KV | Self-Managed Wide-Column | Relational OLTP

As of 2026-05-16. Framing: "When would you pick each." Grounded in 2024-2025 post-mortems and recent benchmarks.

PE Verdict

Pick Postgres unless you have a specific reason not to. Pick DynamoDB when you want zero ops and your access patterns are known and frozen, accepting AWS regional dependency risk. Pick Cassandra only when you genuinely need active-active multi-region writes and have the ops headcount to operate it (most teams that pick Cassandra in 2026 should have picked ScyllaDB or DynamoDB instead, as Discord's 177-to-72 node migration demonstrated).

Headline Benchmark Numbers (2024-2025 Independent Tests)

  • DynamoDB: 4K-40K ops/sec range (highly variable by access pattern), single-digit ms p99 in steady state, ~$1.25 per million requests at typical sizing
  • Cassandra: 80K-106K ops/sec on mixed workloads (3-node baseline), p99 reads 40-125ms historical (Discord), p99 writes 5-70ms historical
  • PostgreSQL: ~16K ops/sec write-heavy OLTP single-node, ~1.8x MySQL on writes; modern hardware pushes 100K+ TPS on well-tuned r6i.32xlarge

Best default choices

1. Trade-Offs

DynamoDB

Use when access patterns are known, low-latency key-value lookups matter, AWS-native operations are preferred, and zero database ops is worth the modeling constraints.

# | Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
01 | Partition key drives all access patterns | Predictable sub-10ms p99 reads at any scale, no query planner surprises | Cannot do ad-hoc queries; every new access pattern requires a new GSI or table | PM asks for a new query shape post-launch and the answer is "we need a GSI or rebuild the table" | Single-table design with overloaded composite keys is the escape, but most teams discover it after building 5 separate tables and now need a migration.
02 | 3000 RCU / 1000 WCU per-partition ceiling | Forces good key design, prevents one workload from starving the table | Hot key kills you even if table-level capacity is unused | Viral content concentrates all traffic on one partition key, you throttle at 5% of provisioned capacity. AWS docs: adaptive capacity and split-for-heat take 5-10min to kick in; dropped writes during the lag window | The 1000 WCU per-partition limit applies even in on-demand mode and is not raisable. Pre-shard via write-suffix pattern before viral moments, not during.
03 | Eventual consistency by default, strong reads cost 2x | Cheap reads at scale, ~5ms p99 with eventual | Read-after-write surprises if you forget the ConsistentRead flag, plus 2x cost when you need it | User updates profile, refreshes page, sees stale data, files a bug. You add ConsistentRead everywhere and the read bill doubles. | GSIs are always eventually consistent, no flag will fix that. If you need read-after-write on a secondary access pattern, you need to query the base table.
04 | On-demand pricing vs provisioned | Zero capacity planning, scales to any spike | Roughly 7x the cost of saturated provisioned capacity | Workload becomes steady-state and you're paying 7x what you could be. Or stays spiky and provisioned would have throttled. | The right answer is usually provisioned + auto-scaling once you know your traffic shape. On-demand is the right starting default and the wrong long-term choice.
05 | 400KB item size hard limit | Forces externalization of blobs, keeps per-partition storage manageable | Cannot store medium-sized documents, must split or offload to S3 | Product launches a feature that stores a 1MB JSON blob and you're now juggling two-system consistency between DDB and S3 | Most teams paper over this with "store S3 key in DDB" but never solve the consistency problem (S3 write succeeds, DDB write fails, orphan blob). Build the cleanup job before you ship.
06 | No native joins, no cross-table transactions across regions | Predictable latency, no query planner to explain when things go slow | Joins become application-level multi-get, transactions across Global Tables require app coordination | Compliance asks for a report across users + orders + refunds, and now you're scanning three tables and merging in Lambda | TransactWriteItems works within a single region and account (it can span tables there), up to 100 items. Cross-region needs sagas. This is the silent gotcha that derails most "let's go multi-region" projects.
07 | Vendor lock-in plus implicit regional dependency on us-east-1 | Tight integration with Lambda, IAM, CloudWatch, no ops burden | Cannot move without rewriting data access layer; AWS control plane dependencies cascade through DDB | Oct 19-20, 2025: DNS race condition in DDB us-east-1 took down DDB regionally, then cascaded to EC2 (DWFM lease failures), Lambda, NLB. 15 hours of degraded service. Source: AWS post-mortem, Oct 2025. Race between DNS Planner and DNS Enactor wrote empty DNS record for dynamodb.us-east-1.amazonaws.com | "Fully managed" doesn't mean "no failure modes you need to plan for." Most teams treat DDB as a black box and have zero runbook for regional DDB unavailability. Build the multi-region failover plan even if you don't need it today.
08 | Streams provide CDC, but with 24h retention | Native change capture into Lambda, Kinesis, EventBridge | If your consumer falls behind 24 hours, data is gone | Lambda consumer fails silently over the weekend, you discover it Monday morning and the changes are unrecoverable | Use Kinesis Data Streams sink (longer retention) for anything important. The 24h limit is fine for triggers but never for critical pipelines.
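
The on-demand versus provisioned trade-off (row 04) reduces to a utilization break-even. The following is a minimal sketch, assuming the roughly 7x price ratio quoted in the table and ignoring auto-scaling headroom, reserved capacity, and regional price differences. The function names are illustrative and not part of any AWS SDK.

```python
def break_even_utilization(on_demand_ratio: float = 7.0) -> float:
    """Average utilization of provisioned capacity above which provisioned wins.

    on_demand_ratio is how much more on-demand costs than fully saturated
    provisioned capacity for the same throughput (assumed ~7x, per the table).
    """
    if on_demand_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / on_demand_ratio


def cheaper_mode(avg_utilization: float, on_demand_ratio: float = 7.0) -> str:
    """Compare relative cost at a given average utilization (0.0 to 1.0)."""
    # Provisioned cost is fixed by the capacity reserved, used or not.
    provisioned_cost = 1.0
    # On-demand cost scales with the requests actually served.
    on_demand_cost = avg_utilization * on_demand_ratio
    return "provisioned" if provisioned_cost < on_demand_cost else "on-demand"
```

Under these assumptions the break-even sits near 14% average utilization: a table that idles most of the day and spikes briefly may still be cheaper on-demand, while anything resembling steady traffic is not.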

Cassandra

Choose only when you need active-active multi-region writes, massive write scale, tunable consistency, and you have the operational depth to own compaction, repair, and clock-skew risk.

# | Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
01 | Tunable consistency per operation (ONE / QUORUM / ALL / LOCAL_*) | Latency and consistency become per-query knobs, not architectural commitments | Every developer must understand the R+W>RF formula, every team gets it wrong at least once | Reads default to ONE, writes default to ONE, team thinks they have consistency, race conditions appear in production | QUORUM/QUORUM is the right default for most workloads. LOCAL_QUORUM for multi-DC. Anyone who codes against CL=ONE for writes is shipping a bug.
02 | Peer-to-peer with no leader, every node accepts writes | No failover step, no leader election lag, true active-active multi-DC | Last-writer-wins via timestamps, silent data loss on clock skew | NTP drifts behind on one node, writes stamped by that node lose every conflict, days pass before someone notices | This is the single biggest operational risk in Cassandra and it's not in any vendor pitch. Monitor clock skew like your job depends on it, because it does.
03 | LSM tree storage, write-optimized via commit log + memtable | Sub-ms writes, massive write throughput per node | Read amplification (multiple SSTables), compaction is a permanent ops concern | Compaction falls behind during ingest spikes, reads slow down, eventually disks fill up | Compaction strategy choice (STCS vs LCS vs TWCS) is one of the highest-leverage decisions and most teams pick wrong. Time-series should be TWCS, no exceptions.
04 | JVM-based runtime with garbage collection | Mature JVM ecosystem, broad tooling, well-understood operational model | GC pauses create p99 tail latency variance; "super long consecutive GC pauses" require manual node reboot | Discord 2022: p99 read latency 40-125ms, p99 writes 5-70ms, frequent on-call paging for GC-induced cluster instability. Discord migrated 177-node Cassandra cluster to 72-node ScyllaDB cluster (C++, no GC); p99 reads improved to 15ms, p99 writes to 5ms | If you're picking Cassandra in 2026 over ScyllaDB, you should have a specific reason (ecosystem maturity, existing skills, multi-cloud) because the GC-pause tail latency tax is real and operationally expensive.
05 | Tombstones for deletes (gravestone records) | Distributed deletes work eventually-consistently without coordination | Reads must scan tombstones until compaction reaps them (gc_grace_seconds, default 10 days) | Queue-like workload (insert, read, delete) makes reads up to 10x slower as old tombstones pile up | Cassandra is famously the wrong choice for queue patterns. If your workload has a high delete:insert ratio, you're going to have a bad time, regardless of tuning. Discord's migration was held up by tombstone-heavy token ranges in the last 0.0001%.
06 | CQL looks like SQL but isn't | Familiar surface for relational developers, fast onboarding | Familiarity is a trap, developers write Cassandra anti-patterns thinking they're writing SQL | No JOINs, GROUP BY limited to primary-key columns; the first cross-table query someone tries fails and they redesign the schema | L7-level red flag in interviews: a candidate who treats CQL as SQL with limitations. Real signal: candidate who treats it as a key-value API with a SQL-shaped wrapper.
07 | Operational complexity is high (repair, compaction, GC tuning) | Total control over performance, no managed-service ceiling | Needs 2-3 dedicated SREs minimum at production scale | You hit a JVM GC pause issue at 50TB and the team has nobody who knows the JVM well enough. Discord described their Cassandra ops as "high-toil" with "unpredictable latency and frequent on-call incidents" before migrating | DataStax Astra or ScyllaDB Cloud removes most of this. If you're picking Cassandra in 2026 self-hosted, you're picking a 3-engineer operational commitment.
08 | Lightweight transactions (LWT) via Paxos | Compare-and-set semantics on top of an eventually-consistent store | 4 round trips, ~10x slower than regular writes, scoped to single partition | Developer sprinkles LWT for "safety", throughput drops 80%, cluster CPU pegs at 100% | LWT is a tactical escape valve, not a strategy. Cassandra 5.x ships Accord (paxos-derived strict-serializable transactions); watch this space if you've been blocked by transactions.
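
The R+W>RF rule from row 01 can be checked mechanically. This is a small sketch for a single datacenter: it models only ONE, TWO, QUORUM, and ALL (the LOCAL_* levels are left out), and the helper names are hypothetical rather than part of any Cassandra driver.

```python
def quorum(rf: int) -> int:
    """Number of replicas that form a majority for replication factor rf."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge an operation at a consistency level."""
    table = {"ONE": 1, "TWO": 2, "QUORUM": quorum(rf), "ALL": rf}
    if level not in table:
        raise ValueError(f"unsupported consistency level: {level}")
    return table[level]


def read_overlaps_write(read_cl: str, write_cl: str, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set.

    This is the R + W > RF condition: at least one replica contacted by the
    read is guaranteed to have acknowledged the latest write.
    """
    return replicas_required(read_cl, rf) + replicas_required(write_cl, rf) > rf
```

With RF=3, QUORUM/QUORUM gives 2 + 2 > 3 and passes, while ONE/ONE gives 1 + 1 and does not. The overlap only guarantees the read reaches a replica holding the latest write; which value wins is still decided by timestamps.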

PostgreSQL

Default to Postgres for most product and OLTP systems where relational modeling, ACID transactions, SQL flexibility, extensions, and operational familiarity matter.

# | Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
01 | Single-leader replication, vertically scaled primary | Strong consistency, ACID transactions, no clock-skew failure modes | Write throughput capped by single primary, eventual cliff requires sharding | Write QPS approaches 30K and you're researching Citus on a deadline | Modern hardware pushes the single-node ceiling much higher than most teams realize (100K+ TPS on a properly tuned r6i.32xlarge with NVMe). Don't shard until you've actually tuned.
02 | MVCC via row versions, no read locks | Readers never block writers, writers never block readers | VACUUM is a permanent ops concern; XID wraparound is a documented production-killing failure mode | Mailchimp/Mandrill 2016: autovacuum fell behind on a busy shard, XID wraparound protection halted writes, ~40 hours of outage. Sentry 2024: same root cause, emergency manual vacuuming required. Postgres uses a 32-bit XID counter; database refuses writes at ~2.1B transactions to prevent corruption. Alert at 1.5B XID age. | XID wraparound is the most under-appreciated failure mode in Postgres. It develops over weeks with no visible symptoms then takes the database offline completely. Set up the alerts before you need them.
03 | Streaming replication (physical) + logical replication | Read replicas for scale, logical for selective replication and major version upgrades | Physical replicas read-only and version-locked, logical has known caveats (DDL, sequences, large transactions) | Logical replication breaks on a schema change and you're debugging replication slot growth at 3am | For major version upgrades, logical replication is the only zero-downtime path and it's still terrifying. Practice the cutover on a copy first.
04 | Process-per-connection model | Strong isolation, easy debugging per session | Each connection costs ~10MB RAM, max_connections becomes a hard ceiling at ~500-1000 | Microservices proliferate, each maintains its own pool, you hit the connection wall | PgBouncer in transaction mode is mandatory for any non-trivial deployment. Without it you're capping your architecture at a few hundred concurrent backend connections.
05 | Native partitioning, sharding via the Citus extension | Tiered scaling story: partition first, shard later when needed | Sharding is a retrofit; the core query planner has no distributed-query concept | Cross-shard JOIN performance degrades to scatter-gather, application has to be sharding-aware | Citus works when data partitions naturally on a tenant_id-like key and 95% of queries are tenant-scoped. It breaks on global aggregations. Plan the data model for sharding from day one if you suspect you'll need it.
06 | Extension ecosystem (PostGIS, pgvector, TimescaleDB, pg_partman) | One database for OLTP, GIS, vector search, time-series, queues | Each extension is a maintenance commitment, version-coupling across extensions | Postgres major upgrade requires waiting for 4 extensions to support the new version | Extensions are Postgres's superpower and its hidden cost. The "use one tool" gain is real until you have 6 extensions, then upgrade cadence becomes the bottleneck.
07 | Synchronous replication available but optional | Trade durability for latency on a per-transaction basis | Default async means a failover can lose committed transactions | Primary AZ fails, replica promoted, 30 seconds of writes are gone, audit team is unhappy | synchronous_commit=remote_apply is the strongest setting and costs 2-5ms latency per write. For financial workloads it's non-negotiable. Most teams don't even know it exists.
08 | Index proliferation is cheap, except when it isn't | Add indexes liberally to speed up queries | Each index slows down writes, increases bloat, fights for buffer cache | Write performance silently degrades as the indexes-per-table count creeps to 12+ | Use pg_stat_user_indexes to find unused indexes quarterly. Most production systems carry 30%+ dead-weight indexes that nobody dares to drop because "they might be used."
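
The XID wraparound alert from row 02 is simple threshold arithmetic once the age is known. Below is a hedged sketch using the approximate figures from the table (alert at 1.5B, writes refused near 2.1B). The `age` argument is assumed to come from a query such as `SELECT max(age(datfrozenxid)) FROM pg_database`, and the function names and status labels are illustrative.

```python
ALERT_AGE = 1_500_000_000      # alert threshold suggested in the table
HARD_STOP_AGE = 2_100_000_000  # approximate age at which writes are refused


def classify_xid_age(age: int) -> str:
    """Map the oldest datfrozenxid age to a coarse alert state."""
    if age >= HARD_STOP_AGE:
        return "writes-halted"
    if age >= ALERT_AGE:
        return "alert"
    return "ok"


def days_until_hard_stop(age: int, xids_per_day: int) -> float:
    """Rough runway before the hard stop at the current XID burn rate.

    Assumes vacuum makes no progress at all, which is the pessimistic
    case an alert should be planned around.
    """
    if xids_per_day <= 0:
        raise ValueError("xids_per_day must be positive")
    return max(0, HARD_STOP_AGE - age) / xids_per_day
```

For example, a database at the 1.5B alert threshold that burns 20M transaction IDs per day has about 30 days of runway under these assumptions, which is why the alert belongs well below the hard stop.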

2. Use Cases

DynamoDB

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Shopping cart and session store | Amazon retail (origin use case) | Sub-10ms p99 with zero ops overhead across regions | Trillions of items, peak ~89M req/sec during Prime Day | Cassandra would need a dedicated SRE team; Postgres would not scale writes past single primary
Ad-tech bid storage and counters | Mid-size DSP, 200K QPS sustained | Predictable p99 under 15ms, no GC pauses | 50TB hot data, 3M writes/sec spike during traffic peaks | Cassandra GC tail latency at 99.9% would breach the 100ms bid-window SLO; Postgres caps below 1M writes/sec
Mobile app backend (user profiles, prefs, sync state) | Snapchat-style, Tinder-style apps | Per-user lookup latency, global tables for region affinity | 100M+ MAU, ~5K QPS per region | Postgres single-leader latency cross-region would force read replicas plus app-layer routing
IoT device state and telemetry ingest | Fleet management, ~1M devices reporting every 30s | High write throughput with auto-scaling, TTL for retention | ~33K writes/sec sustained, time-series data with 90-day TTL | Postgres TimescaleDB cheaper at low scale but doesn't auto-scale through traffic spikes
Serverless event-sourced workload | Lambda + DDB Streams pattern | Native Lambda trigger, IAM integration, zero connection-pool concerns | ~10K events/sec with downstream fanout | Postgres would require RDS Proxy and connection limits would still cap concurrency
Gaming leaderboards and player state | Mobile games, competitive titles | Single-digit ms read on per-player profile lookup | 50M players, 200K concurrent at peak | Redis loses persistence guarantees; Cassandra adds operational tax for no latency gain

Cassandra

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Time-series metrics and observability storage | Netflix Atlas (1 trillion requests/day on Cassandra) | Massive write throughput with linear horizontal scaling | Billions of writes/day, multi-PB datasets | Postgres TimescaleDB caps below 1M inserts/sec; DDB cost prohibitive at this scale (~$1.25/M requests vs ~$0.0001/M on Cassandra hardware)
Multi-region active-active write workload | Global IoT platforms, sync-heavy mobile apps | Local-DC writes with no failover, async cross-DC convergence | 10+ regions, sub-50ms write latency in each | DDB Global Tables exist but lose to Cassandra on per-key conflict resolution flexibility; Postgres has no native active-active story
User activity / event log storage (historical context) | Discord (177 nodes at peak before ScyllaDB migration) | Append-heavy workload, channel-partitioned by Snowflake ID | Trillions of messages, ~177 Cassandra nodes pre-migration | Postgres write ceiling; DDB cost; though Discord ultimately concluded ScyllaDB was the better trade-off
Recommendation system feature store | Spotify-style music streaming, news ranking | Wide-row reads (all features for a user) with low-latency lookup | 500M users, 1000+ features per user, 10K QPS | Postgres row width and per-query latency wouldn't sustain it; DDB item-size limit (400KB) kills the wide-row pattern
Fraud detection feature aggregation | Payment processors, large e-commerce | Real-time aggregation across time windows with eventual consistency tolerance | Tens of TB, sub-100ms feature retrieval | Redis loses persistence at scale; DDB throughput limits push cost too high

PostgreSQL

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Transactional system of record | Most B2B SaaS (Stripe hybrid, Notion, Linear) | ACID transactions, mature ecosystem, schema flexibility | 10s of TB, sub-10K TPS sustained (well within single-node ceiling) | DDB lacks cross-table transactions across regions; Cassandra has no real ACID story
Multi-tenant SaaS with tenant isolation | Notion, Linear, Heroku, Citus customers | Per-tenant query patterns, schema-per-tenant or RLS isolation | 10K-100K tenants, varied query shapes per tenant | NoSQL forces denormalization that breaks tenant-isolated reporting; Citus extends Postgres for sharding when needed
Geospatial workloads | Uber (early), delivery platforms, ride-sharing | PostGIS extension provides production-grade spatial indexing | Millions of geo-queries per minute | DDB lacks spatial primitives; Cassandra has no geo-index that's production-credible
Vector search for AI applications | RAG pipelines, embedding storage, Pinecone alternatives | pgvector provides HNSW indexes, mature SQL surrounds it | 10M-100M embeddings with metadata filtering | Dedicated vector DBs are faster but force a second store and a sync problem
Analytics-adjacent OLTP | Internal admin tools, reporting backends | Window functions, CTEs, JSONB, full SQL for ad-hoc analysis | 1-10TB, complex queries | Cassandra and DDB cannot do ad-hoc aggregation; warehouse is overkill for this scale
Job queue / task scheduling | Hatchet, Trigger.dev, Sidekiq alternatives | SELECT ... FOR UPDATE SKIP LOCKED makes Postgres a credible queue, single source of truth | 10K-100K jobs/sec with retries and dead-letter | Dedicated brokers (Redis, SQS) lose the durability and queryability of Postgres-backed queues

3. Limitations

Limitation | DynamoDB | Cassandra | PostgreSQL
Max item / row size | 400KB hard limit [High] | 2GB max per cell value; keep partitions under ~100MB in practice [Med] | 1.6TB row (theoretical), 1GB per field [Med]
Per-key throughput ceiling | 3000 RCU / 1000 WCU per partition (not raisable, even on-demand) [Critical] | Per-node ceiling depends on hardware, no logical per-key cap [Med] | Single writer per row, no logical cap (single-node ceiling caps writes) [Med]
Query flexibility | Only by partition key + sort key range [Critical] | Only by partition key + clustering key range, no joins [High] | Full SQL, only limited by query planner [Med]
Cross-key transactions | TransactWriteItems up to 100 items, single region only [High] | No multi-partition transactions; LWT single-partition; Accord coming in 5.x [Critical] | Full ACID across the entire database [Med]
Schema evolution | Schema-on-read, but GSI changes are slow [Med] | ALTER TABLE works but can be expensive at scale [High] | Online DDL works, some operations still lock [Med]
Backup and restore | PITR up to 35 days, full restore is slow [Med] | Snapshot-based, ops-heavy, no native PITR [High] | WAL-based PITR, mature tooling [Med]
Multi-region write story | Global Tables, LWW conflict resolution [Med] | Native active-active multi-DC [Med] | No native active-active, requires BDR or external tools [Critical]
Cost predictability at scale | Spike-driven cost cliffs on on-demand (~$1.25/M requests) [High] | Hardware + ops headcount fixed cost (~$0.0001/M on equivalent hw) [Med] | Compute + storage fixed cost [Med]
Connection scaling | HTTP-based, no connection pool needed [Med] | Native driver pool, ~10K connections per node [Med] | Process-per-connection caps at ~500-1000 without PgBouncer [High]
Documented catastrophic failure mode | Regional DNS race condition (Oct 2025 us-east-1, ~15h outage) [Critical] | Clock skew silent data loss; GC pauses requiring manual node reboot [High] | XID wraparound (Sentry 2024, Mailchimp 2016 ~40h outage) [Critical]

4. Fault Tolerance

Post-Mortem Grounding: Cells reflect documented failure modes from AWS Oct 2025 post-mortem (DDB DNS race), Discord 2022 migration (Cassandra GC + tail latency), and Sentry 2024 / Mailchimp 2016 (Postgres XID wraparound). These are the failure paths PE-level interviews actually probe.

Dimension | DynamoDB | Cassandra | PostgreSQL
Replication model | 3x sync across 3 AZs, leader-based, automatic | Tunable RF (typically 3), peer-to-peer, no leader | Single primary, async streaming replicas (sync optional)
Failure detection | AWS control plane, sub-30s (when control plane is healthy) | Gossip protocol, ~10-30s detection | External (Patroni, repmgr, RDS), 30-60s typical
Failover mechanism | Automatic, transparent to client | No failover needed (no leader), client retries to next replica | External orchestrator promotes a replica, DNS/proxy update
RTO (typical) | Sub-second to client, ~60s control-plane recovery (but see Oct 2025: ~3h DDB recovery, ~15h full ecosystem) | Sub-second (just retry to another coordinator) | 30-120s for managed (RDS), 5-30s for tuned Patroni setups
RPO (typical) | 0 for AZ failure (sync replication) | 0 with CL=ALL writes, otherwise potential loss on coordinator failure pre-replication | 0 with synchronous replication, seconds with async (default)
Split-brain behavior | Prevented by AWS control plane quorum | Possible during partition; LWW resolves on heal, can lose data via clock skew | Possible if failover is misconfigured; STONITH or fencing required
Blast radius of single-node failure | Single partition unavailable ~30-60s, no data loss | Remaining replicas absorb load, no client-visible impact at QUORUM | If primary: full write unavailability until promotion; if replica: degraded read capacity
Regional / control-plane failure | Documented: Oct 2025 DNS race condition took down DDB us-east-1 for ~3h, cascaded to EC2/Lambda/NLB for ~15h total | Multi-DC native; LOCAL_QUORUM keeps regional impact local | No native multi-region story; depends on tooling and orchestrator design
Cross-region failover story | Global Tables, active-active, no failover needed | Multi-DC native, LOCAL_QUORUM keeps reads/writes local | No native active-active; requires logical replication or BDR
Silent data loss vector | Operator error on Global Tables (LWW resolution) | Clock skew + LWW: drifting NTP loses every write conflict, undetected for days | XID wraparound: silent buildup over days to weeks, then total write outage (Mailchimp ~40h)
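
The clock-skew row is easier to reason about with a toy model of last-writer-wins. This sketch makes no claim about Cassandra's internals: the `Cell` type, the helper names, and the tie-break rule are simplifications introduced for illustration. The same logic applies whichever machine (coordinator or client) assigns the timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp_us: int  # write timestamp in microseconds, as stamped by the writer


def stamp(value: str, true_time_us: int, clock_offset_us: int = 0) -> Cell:
    """Stamp a write using a local clock that is off by clock_offset_us."""
    return Cell(value, true_time_us + clock_offset_us)


def lww_merge(a: Cell, b: Cell) -> Cell:
    """Last-writer-wins: keep the cell with the higher timestamp.

    Ties are broken by comparing values so the result is deterministic
    (a simplification, not the real tie-breaking rules).
    """
    return max(a, b, key=lambda c: (c.timestamp_us, c.value))


def demo() -> tuple[str, str]:
    # Real-world order: "old" is written first, "new" arrives 200 ms later.
    old = stamp("old", true_time_us=1_000_000)
    new_healthy = stamp("new", true_time_us=1_200_000)
    # Same later write, but stamped by a clock running 500 ms behind.
    new_skewed = stamp("new", true_time_us=1_200_000, clock_offset_us=-500_000)
    return lww_merge(old, new_healthy).value, lww_merge(old, new_skewed).value
```

With healthy clocks the later write wins. With the writer's clock 500 ms behind, the newer value is discarded on merge and nothing reports an error, which is why skew has to be monitored rather than discovered.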

5. Sharding

Dimension | DynamoDB | Cassandra | PostgreSQL
Sharding model | Hash on partition key, automatic, hidden from user | Consistent hash with vnodes (~256 per node), automatic | Native: none. Citus: hash on distribution column. Native partitioning: range/list/hash on partition key (single-node)
Shard key constraints | Set at write, immutable, max 2KB | Set at write, immutable, designed into table DDL | Citus: chosen at table-create, immutable; partition key: any column
Rebalancing mechanism | Automatic split-for-heat and split-for-size, transparent | Manual or via repair tools, node add/remove triggers rebalance | Citus: shard_rebalancer extension, uses logical replication; native partitioning: manual
Rebalancing cost / impact | Transparent, no client-visible impact (when control plane works) | Network-heavy, can take days for large clusters, throttle-able | Citus rebalance can run online via logical replication; native partition move is manual
Hot-shard behavior | Split-for-heat after 5-10 min sustained load (proven lag in flash spikes); pre-shard via write-suffix | Hot partition saturates one replica set (Discord's exact problem: channel-partitioned messages with skew); manual key redesign required | Hot shard saturates a worker node; queries can be redistributed via Citus rebalancer or app-level
Maximum shards (practical) | Effectively unlimited, partitions scale with table size and throughput | 1000+ vnodes per cluster easily; clusters up to thousands of nodes documented (Netflix runs 1T req/day) | Citus: hundreds of shards comfortable; tens of thousands possible but operationally heavy
Resharding without downtime? | Automatic, always online | Yes via repair + decommission, but operationally complex | Citus: yes via logical replication. Native: no, requires app-level migration
Cross-shard query support | Scan (expensive, full-table) or GSI (with its own sharding) | Coordinator scatter-gather, but anti-pattern; design tables to avoid | Citus: parallel query planner does scatter-gather for SELECT; complex JOINs need co-location
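
The "consistent hash with vnodes" cell can be illustrated with a small token ring. This is a sketch of the general technique only: it uses MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3-based), ignores replication, and the `Ring` class is hypothetical.

```python
import bisect
import hashlib


class Ring:
    """Toy consistent-hash ring where each node owns many virtual nodes."""

    def __init__(self, nodes, vnodes=256):
        pairs = []
        for node in nodes:
            for i in range(vnodes):
                # Each vnode is a pseudo-random token on the ring.
                pairs.append((self._hash(f"{node}#{i}"), node))
        pairs.sort()
        self._tokens = [token for token, _ in pairs]
        self._owners = [owner for _, owner in pairs]

    @staticmethod
    def _hash(text: str) -> int:
        # Stand-in hash: first 8 bytes of MD5 as an unsigned integer.
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def owner(self, key: str) -> str:
        """Return the node owning the first token clockwise from the key."""
        idx = bisect.bisect_right(self._tokens, self._hash(key))
        return self._owners[idx % len(self._owners)]


def moved_keys(before: Ring, after: Ring, keys):
    """Keys whose owner differs between two ring layouts."""
    return [k for k in keys if before.owner(k) != after.owner(k)]
```

Comparing a three-node ring with a four-node ring shows the property that makes node additions tolerable: only keys claimed by the new node move, roughly a quarter of the data in that case, instead of nearly everything being reshuffled as with `hash(key) % node_count`.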

6. Replication

Dimension | DynamoDB | Cassandra | PostgreSQL
Replication topology | Leader-follower per partition, single leader | Leaderless (Dynamo paper lineage), peer-to-peer | Single primary, cascading replicas supported
Sync vs async | Sync within region (3 AZs), async cross-region (Global Tables) | Depends on consistency level: sync at QUORUM/ALL, async at ONE | Async by default; synchronous_commit (on, remote_write, remote_apply) configurable per transaction
Replication factor | 3, not configurable | Configurable, typically 3 per DC; per-keyspace setting | Configurable number of replicas, no theoretical cap
Consistency level options | Eventually consistent (default), strongly consistent (2x cost, in-region only, not on GSIs) | ONE / QUORUM / LOCAL_QUORUM / ALL / SERIAL (LWT), tunable per query | Read-your-writes via primary; replicas have configurable max staleness
Replication lag (typical) | Single-digit ms within region, ~1s cross-region (Global Tables) | Sub-second within DC, 100ms-seconds cross-DC depending on network | 10-100ms in-region async; less than 1ms sync remote_write
Conflict resolution | Last-writer-wins by timestamp (Global Tables) | Last-writer-wins by timestamp; vulnerable to clock skew silent loss | No multi-master, so no conflict resolution; BDR extension adds LWW with column-level resolution
Cross-region replication | Global Tables: active-active, eventually consistent | Native multi-DC with NetworkTopologyStrategy | Logical replication across regions; no native active-active
Replication during partition | Minority AZ writes fail, majority continues | Both sides accept writes at CL=ONE; conflicts resolved on heal (LWW) | Async: writes continue on primary, replica falls behind. Sync: writes block
Replication failure modes | Control-plane DNS dependency (Oct 2025 incident) | Hinted handoff overflow under sustained replica down; tombstone tsunami on repair | Replication slot growth (logical) fills disk; physical replica fall-behind

7. Better Usage Patterns

DynamoDB

Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
Single-table design | Create one table per entity (Users, Orders, Products) like a relational schema | One table with overloaded PK/SK and entity-type prefix (USER#123, ORDER#456) | Co-located access patterns become single queries; new access patterns become key-design decisions, not migrations
Write sharding for hot keys | Use the natural key and hope adaptive capacity catches up (it won't, in time) | Suffix the write key with a random shard number (user_id#shard_0..N), aggregate on read | Pre-sharding spreads load before the hot partition throttles, instead of waiting for adaptive capacity; eliminates flash-spike risk (the 5-10min adaptive capacity lag is a documented kill window)
GSI projection design | Project ALL on every GSI "just in case" | Project only attributes the query uses, plus the keys | GSIs store and bill independently; ALL projections double or triple your storage and write cost
Provisioned + auto-scaling vs on-demand | Stay on on-demand forever because "it just works" | Move to provisioned with auto-scaling once traffic is predictable (after ~3 months) | Provisioned saturated is 6-7x cheaper than on-demand for the same throughput; auto-scaling handles the variance
Conditional writes for idempotency | Use UpdateItem without conditions, rely on retries being idempotent at the app layer | ConditionExpression on every retryable write (attribute_not_exists, version-based) | Prevents the double-write bug that appears once a year in production and ruins your week
Multi-region failure planning | Treat managed = invulnerable, no runbook for regional DDB outage | Build read-failover to a second region via Global Tables; explicit chaos test against regional control-plane failure | Oct 2025 us-east-1 outage was a wake-up call. Most DDB customers had no plan and were down for hours waiting for AWS
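
The write-sharding pattern above can be sketched without any AWS dependency. The following uses an in-memory dict as a stand-in for the table and a counter as the item; the key format follows the `user_id#shard_N` convention from the table, while the function names and the shard count of 10 are assumptions for illustration.

```python
import random
from collections import defaultdict


def sharded_key(base_key: str, shard_count: int, rng=random) -> str:
    """Pick a random shard suffix so writes spread over shard_count keys."""
    return f"{base_key}#shard_{rng.randrange(shard_count)}"


def all_shard_keys(base_key: str, shard_count: int) -> list:
    """Every physical key that may hold part of the logical item."""
    return [f"{base_key}#shard_{i}" for i in range(shard_count)]


def record_view(table, base_key: str, shard_count: int, rng=random) -> None:
    """Write path: increment one randomly chosen shard."""
    table[sharded_key(base_key, shard_count, rng)] += 1


def total_views(table, base_key: str, shard_count: int) -> int:
    """Read path: fan out to all shards and aggregate."""
    return sum(table.get(k, 0) for k in all_shard_keys(base_key, shard_count))


def demo(writes: int = 10_000, shard_count: int = 10, seed: int = 0):
    table = defaultdict(int)  # stand-in for the real table
    rng = random.Random(seed)
    for _ in range(writes):
        record_view(table, "post_42", shard_count, rng)
    return table, total_views(table, "post_42", shard_count)
```

The cost is on the read side: every read becomes `shard_count` lookups plus a merge, so the shard count should be sized from the expected peak write rate against the 1000 WCU per-partition ceiling rather than set arbitrarily high.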

Cassandra

Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
Query-first table design | Design one normalized schema, then try to query it different ways | One table per query pattern, denormalize ruthlessly, write to all tables on update | Cassandra has no query optimizer; the table layout IS the query plan. Wrong table design means linear scans or impossible queries
Compaction strategy selection | Use default STCS for everything | STCS for general, LCS for read-heavy, TWCS for time-series with TTL | Wrong strategy causes 10x read amplification or storage bloat; TWCS for time-series is the difference between working and not
Clock synchronization | Trust the OS NTP defaults | Run chrony or dedicated NTP with sub-ms accuracy, monitor drift as an SLO | LWW conflict resolution silently loses writes when clocks drift; this is the #1 silent data loss vector in Cassandra
Consistency level selection | Use ONE/ONE for performance "because Cassandra is eventually consistent" | QUORUM/QUORUM as default, LOCAL_QUORUM in multi-DC, drop to ONE only with explicit reason | R+W>RF is the correctness threshold; teams that don't enforce it ship race conditions to production
Tombstone management | Use Cassandra as a queue (insert, read, delete pattern) | Use TTL for transient data, design tables append-only where possible, monitor tombstone count | Tombstone scans degrade reads sharply as they accumulate; Discord's migration was held up at 99.9999% by exactly this issue
Request coalescing for hot partitions | Let multiple users requesting the same data hit the database independently | Add a request coalescing layer (Discord built theirs in Rust) so that many users asking for the same key share one DB read | Hot partition reads multiplied by user concurrency is what killed Discord's Cassandra cluster; coalescing reduces effective fanout
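
Request coalescing is compact enough to show in full. This is a minimal asyncio sketch of the general idea, not a description of Discord's Rust service: the `Coalescer` class and the `fake_db_read` function are hypothetical, and there is no caching beyond the lifetime of the in-flight read.

```python
import asyncio


class Coalescer:
    """Share one in-flight fetch among all concurrent callers for a key."""

    def __init__(self, fetch):
        self._fetch = fetch  # async function: key -> value
        self._inflight = {}  # key -> asyncio.Task for the read in progress

    async def get(self, key):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the real read.
            task = asyncio.create_task(self._fetch(key))
            self._inflight[key] = task
            # Forget the task once it finishes so later calls read fresh data.
            task.add_done_callback(lambda _t, k=key: self._inflight.pop(k, None))
        # shield() keeps one cancelled caller from cancelling the shared read.
        return await asyncio.shield(task)


async def demo(callers: int = 100):
    calls = 0

    async def fake_db_read(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated database latency
        return f"row-for-{key}"

    coalescer = Coalescer(fake_db_read)
    results = await asyncio.gather(
        *(coalescer.get("channel:42") for _ in range(callers))
    )
    return calls, results
```

In the demo, 100 concurrent callers for the same key trigger a single backend read. Coalescing only helps while requests overlap in time, so it complements rather than replaces a cache.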

PostgreSQL

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| Connection pooling | Each app/service manages its own pool and hits the max_connections wall | PgBouncer in transaction mode in front of every Postgres deployment | Postgres runs one backend process per connection, ships with max_connections = 100, and tops out in practice at ~500-1000 connections without a pooler; PgBouncer multiplexes thousands of clients onto tens of backend connections |
| VACUUM and autovacuum tuning | Leave autovacuum at defaults, blame Postgres when things slow down | Tune autovacuum_vacuum_scale_factor per-table for high-churn tables; monitor pg_stat_user_tables.n_dead_tup and XID age aggressively | Default autovacuum settings are conservative; high-churn tables accumulate dead tuples faster than vacuum runs. Mailchimp's ~40h outage and Sentry's emergency vacuuming both traced to this pattern |
| XID wraparound monitoring | Trust autovacuum to handle it, alert on slow queries instead | Alert when SELECT max(age(datfrozenxid)) FROM pg_database exceeds 1.5B (well before the 2B emergency threshold) | XID wraparound develops over days/weeks with zero symptoms, then halts all writes. This single alert prevents the worst Postgres failure mode |
| Index strategy | Add indexes for every slow query, never remove any | Quarterly pg_stat_user_indexes audit, drop indexes with zero scans, use partial indexes for selective conditions | Each unused index slows down every write and bloats the buffer cache; typical prod systems carry 30%+ dead-weight indexes |
| Long-running transactions | Wrap big batch jobs in a single transaction "for atomicity" | Chunk into smaller transactions with a checkpoint table; use advisory locks for coordination | Long transactions hold back the xmin horizon, so VACUUM cannot clean dead tuples, bloat spirals, and XID age keeps climbing. Duffel's 2021 outage traced to a closely related pattern (DDL statements without timeouts) |
| Replica usage for reads | Send all reads to primary "for consistency" | Route reads to replicas with explicit staleness tolerance; primary handles writes and strongly-consistent reads | Read-replica routing typically removes 60-80% of primary load; teams that don't do this scale vertically until they can't |
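
The autovacuum and XID rows come down to two pieces of arithmetic, sketched below. The function names and `XID_ALERT_AGE` are hypothetical; the trigger formula and the 50 / 0.2 defaults follow the documented autovacuum rule, the 1.5B alert level is the one suggested in the table, and 2^31 is used as an approximation of the wraparound point (Postgres actually stops issuing XIDs slightly earlier).

```python
# Query a monitoring agent would run; shown here as a constant only.
XID_AGE_SQL = "SELECT max(age(datfrozenxid)) FROM pg_database;"

XID_LIMIT = 2**31               # XIDs wrap at roughly 2.1 billion (approximation)
XID_ALERT_AGE = 1_500_000_000   # alert level suggested in the table above

# Per-table override for a high-churn table (table name is hypothetical):
#   ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.01);


def autovacuum_trigger(reltuples, scale_factor=0.2, threshold=50):
    """Dead tuples needed before autovacuum vacuums a table.

    Rule: autovacuum_vacuum_threshold
          + autovacuum_vacuum_scale_factor * reltuples
    Defaults are the stock PostgreSQL values (50 and 0.2).
    """
    return threshold + scale_factor * reltuples


def xid_status(age):
    """Return (should_alert, remaining_xids) for the oldest database XID age."""
    return age >= XID_ALERT_AGE, XID_LIMIT - age
```

For a 500M-row table the default scale factor lets roughly 100M dead tuples pile up before autovacuum starts, while a per-table setting of 0.01 brings that down to roughly 5M. That gap is why the defaults fall behind on high-churn tables.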

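The "chunk into smaller transactions with a checkpoint table" advice can be sketched as follows. To keep the example runnable without a server it uses Python's built-in sqlite3 as a stand-in for Postgres; the `items` and `job_checkpoint` tables and the `process_in_chunks` name are hypothetical, and a Postgres driver such as psycopg would use `%s` placeholders instead of `?`.

```python
import sqlite3


def process_in_chunks(conn, job, chunk_size=1000):
    """Mark rows processed in id order, one short transaction per chunk.

    Progress is saved in job_checkpoint inside the same transaction as the
    work, so a crashed job resumes from the last committed chunk.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS job_checkpoint "
        "(job TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    conn.commit()
    row = cur.execute(
        "SELECT last_id FROM job_checkpoint WHERE job = ?", (job,)
    ).fetchone()
    last_id = row[0] if row else 0
    chunks = 0
    while True:
        ids = [r[0] for r in cur.execute(
            "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        )]
        if not ids:
            break
        # The batch work for this chunk (a trivial UPDATE in this sketch).
        cur.execute(
            "UPDATE items SET processed = 1 WHERE id > ? AND id <= ?",
            (last_id, ids[-1]),
        )
        last_id = ids[-1]
        cur.execute(
            "INSERT INTO job_checkpoint (job, last_id) VALUES (?, ?) "
            "ON CONFLICT(job) DO UPDATE SET last_id = excluded.last_id",
            (job, last_id),
        )
        conn.commit()  # work and checkpoint commit together, then locks release
        chunks += 1
    return chunks
```

Each commit ends the transaction, so no single transaction stays open for the length of the whole job and VACUUM is not blocked by it. The trade-off is that the job is no longer atomic as a whole, which is why the checkpoint row matters.
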
8. Advanced / Next-Gen Alternatives

DynamoDB

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| ScyllaDB / ScyllaDB Cloud (Alternator) | DynamoDB-compatible API at lower cost, no per-partition WCU ceiling, multi-cloud | Production at scale | Low for basic API, high if using Streams or Transactions | Cost at scale becomes prohibitive ($50K+/mo), regional dependency risk is unacceptable post-Oct-2025, or you have a multi-cloud mandate |
| Aurora DSQL | Strongly consistent multi-region writes with SQL surface and ACID | GA but young | High: full schema rewrite, SQL access patterns | You need multi-region writes AND ACID transactions, and can wait out the maturity curve |
| FoundationDB | Strict serializability with multi-key transactions across the entire dataset | Production (Apple iCloud, Snowflake) | Very high: different data model, requires layer design | Correctness ceiling matters more than ops cost (financial, ledger, regulated workloads) |
| TigerBeetle (for financial) | Strict serializable double-entry accounting at 1M+ TPS | Early production | Very high: purpose-built, not a general database | You're building a ledger / payment system and DynamoDB's eventual consistency on indexes becomes a correctness problem |

Cassandra

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| ScyllaDB | Same data model and CQL, 3-10x throughput per node, no JVM GC pauses (Discord: 177→72 nodes, p99 reads 40-125ms→15ms) | Production at scale (Discord, Numberly) | Low: wire-compatible with Cassandra drivers | Tail latency or operational cost of Cassandra has become unacceptable and you want to keep the data model. This is the default modern answer for self-hosted wide-column. |
| Cassandra 5.x with Accord (transactions) | Native multi-partition strict-serializable transactions via Accord protocol | Emerging (5.x GA, Accord ongoing) | None if you're already on Cassandra | You want Cassandra's scale but have been blocked by lack of multi-partition transactions |
| YugabyteDB / CockroachDB | Strongly consistent distributed SQL with global transactions | Production | High: different query model (SQL vs CQL), strong vs eventual consistency reshapes app logic | You need horizontal scale AND ACID AND SQL, which means Cassandra was the wrong original choice |
| DynamoDB (managed) | Removes the entire ops burden (compaction, repair, GC tuning) | Production | High: different API, vendor lock-in, regional dependency risk | You picked Cassandra for scale but the 3-SRE ops cost exceeds the value; you're already AWS-native and accept the trade-off |

PostgreSQL

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Citus (extension) | Horizontal sharding for Postgres, keeps SQL surface | Production at scale | Medium: requires distribution-key design, some query rewrites | Single-primary write ceiling reached, data partitions naturally by tenant or workspace |
| Neon / Aurora Serverless | Separates compute from storage, instant branching, auto-scale | Production | Low: wire-compatible Postgres | Variable workload, dev/prod parity matters, want to scale compute independent of storage |
| CockroachDB / YugabyteDB | Horizontal scale with strong consistency, Postgres-wire-compatible | Production | Medium: wire compat but different operational model and some SQL features differ | Need active-active multi-region with strong consistency, which Postgres can't provide natively |
| OrioleDB / Postgres 17+ undo-log storage | Eliminates VACUUM via undo-log MVCC, reduces bloat, eliminates XID wraparound risk | Emerging | Low at maturity: Postgres-compatible | Watch this space; if it lands stable, the entire VACUUM tuning burden becomes optional, not mandatory. This is the most exciting Postgres direction in a decade. |