Caching Stack Trade-Offs: In-Memory + HTTP/Edge

Two layers of caching, one analysis. In-memory data caches sit between app and database. HTTP/edge caches sit between user and origin. They compose, they do not compete. This artifact compares them honestly within each layer.

Caching L6/L7 Depth

As of 2026-06-08

PE Verdict

In-memory layer: Redis won the API war but lost the license war. Valkey is now the default open-source choice for new builds; Redis 8 (AGPLv3) is fine for AGPL-compatible workloads or paying customers; Memcached remains the right answer when your cache is genuinely just a hash map and you want multithreaded raw throughput; KeyDB is on borrowed time post-Valkey momentum.

HTTP/edge layer: CloudFront wins when you are AWS-deep and need integration; Cloudflare wins on developer experience and reach (with single-vendor blast-radius risk demonstrated again on 2025-11-18); Fastly wins when you need VCL-grade programmability and instant purge; Varnish wins when you need to own the cache layer; NGINX is the right answer when caching is a side-effect of being a reverse proxy, not the main job.

Best default choices

Cross-Layer Overview

Caching is not one decision; it is three to five decisions stacked in series. In-memory caches (Group A) hold hot data near the app process; HTTP/edge caches (Group B) hold rendered responses near the user. Most production stacks run both layers. Mixing them up in design review is the most common L6 interview tell.

[Diagram: request path] User (browser/app) → Group B, HTTP / edge cache (CDN / reverse proxy: CloudFront, Cloudflare, Fastly, Varnish, NGINX) → App origin → Group A, in-memory cache (key-value store / side-cache: Redis 8 (AGPLv3), Valkey 9 (BSD), Memcached, KeyDB) → Database (system of record). Latency labels shown in the diagram: ~10-50ms, ~50-200ms, <1ms, 5-50ms.

Cache-miss path: User → Edge cache → App → In-memory cache → DB. A hit at any layer short-circuits the rest.
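
The short-circuit behavior of the cache-miss path can be sketched in a few lines. This is a minimal in-process model, not production code: `Layer`, `lookup`, and `load_from_db` are hypothetical names, and both layers store the same value for simplicity, whereas a real edge cache holds rendered responses and a real in-memory cache holds application objects.

```python
from typing import Callable, Dict, List, Tuple


class Layer:
    """One cache layer in the request path (hypothetical in-process stand-in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}


def lookup(key: str, layers: List[Layer],
           load_from_db: Callable[[str], str]) -> Tuple[str, str]:
    """Walk layers in path order; return (value, name of the layer that answered)."""
    missed: List[Layer] = []
    for layer in layers:
        if key in layer.store:
            value, served_by = layer.store[key], layer.name
            break  # a hit short-circuits every layer behind this one
        missed.append(layer)
    else:
        # No layer had the key: fall through to the system of record.
        value, served_by = load_from_db(key), "db"
    # Backfill every layer that missed so the next request stops earlier.
    for layer in missed:
        layer.store[key] = value
    return value, served_by


edge, memory = Layer("edge"), Layer("in-memory")
db_calls: List[str] = []


def load_from_db(key: str) -> str:
    db_calls.append(key)
    return f"row-for-{key}"


first = lookup("user:123", [edge, memory], load_from_db)   # cold: reaches the DB
second = lookup("user:123", [edge, memory], load_from_db)  # warm: stops at the edge
```

The second call never touches the in-memory layer or the database, which is the property the path description relies on.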

Layer Characteristics

Dimension | Group A — In-Memory Cache | Group B — HTTP / Edge Cache
What gets cached | Application objects, sessions, computed results, leaderboards | HTTP responses, static assets, API responses with cache headers
Cache key | Application-defined string key | URL plus selected headers / query parameters (Vary)
Hit latency target | Sub-millisecond p99 in same VPC | 5-30ms p99 from user (closest POP)
Capacity ceiling | Limited by node RAM (cluster scales out, expensive) | Effectively unlimited at provider; per-POP disk-backed
Invalidation model | Explicit delete or TTL, fully under app control | TTL plus purge API, often eventual across POPs
Failure mode if cache dies | Thundering herd onto database, cascading failure within seconds | Origin must absorb full user traffic, often impossible at scale
Who owns the box | You (self-hosted) or managed service (ElastiCache, MemoryDB) | Almost always provider-managed; Varnish/NGINX are the self-host exceptions
Cost driver | Memory GB-hours plus cross-AZ network | Bandwidth GB egress plus per-request charges
Where it lives on the path | App → cache → DB (server-side, behind app) | User → cache → app (client-facing, in front of app)
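
The "Cache key" row is the least intuitive difference between the two groups, so here is a rough sketch of how an edge-style key might be derived from a URL plus selected headers and query parameters. The function name `edge_cache_key`, the `|` separator, and the allow-list approach are assumptions for illustration; real CDNs each define their own key composition.

```python
from typing import Iterable, Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit


def edge_cache_key(url: str, request_headers: Mapping[str, str],
                   vary: Iterable[str], keep_params: Iterable[str]) -> str:
    """Build a cache key from host, path, allowed query params, and Vary headers."""
    parts = urlsplit(url)
    keep = set(keep_params)
    # Keep only allow-listed query parameters, sorted so ordering cannot split the cache.
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k in keep)
    # Header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    varied = [f"{name.lower()}={headers.get(name.lower(), '')}"
              for name in sorted(vary, key=str.lower)]
    return "|".join([parts.netloc.lower() + parts.path, urlencode(params), *varied])


key_a = edge_cache_key("https://Example.com/p?id=7&utm_source=x",
                       {"Accept-Encoding": "gzip"}, ["Accept-Encoding"], ["id"])
key_b = edge_cache_key("https://example.com/p?utm_source=y&id=7",
                       {"accept-encoding": "gzip"}, ["accept-encoding"], ["id"])
key_c = edge_cache_key("https://example.com/p?id=7",
                       {"Accept-Encoding": "br"}, ["Accept-Encoding"], ["id"])
```

Here `key_a` and `key_b` collapse to one cache entry because the tracking parameter is ignored, while `key_c` is a separate entry because the varied header differs.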

Trade-Offs

One table per technology. Each row is a real trade: you gain X by giving up Y.

Group A — In-Memory Caches

Redis In-Memory

v8.0 GA May 2025, tri-licensed (RSALv2 / SSPLv1 / AGPLv3). Single-threaded command execution with threaded I/O.

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
Single-threaded core execution | Atomic ops without locks, predictable per-op latency, lockless data structures | One CPU core caps your peak throughput per shard | Hot key with O(N) commands (KEYS, large SUNIONSTORE) freezes the entire instance | Redis 6+ multi-threads only the I/O layer, not command dispatch. If your bottleneck is CPU on command execution, scale by sharding, not by adding cores.
Rich data structures | Lists, sets, sorted sets, streams, hashes, geo, vector sets (8.0) | Schema decisions baked into key naming, hard to evolve without migration | You build a sorted set leaderboard and later need range queries by a different field; only option is full rewrite | The richer the type, the more it becomes load-bearing schema. Treat key naming and structure choice as a one-way door.
Tri-license (AGPLv3 / RSALv2 / SSPLv1) | OSI-approved option exists again as of v8.0 | AGPL copyleft creates legal review burden; many enterprises ban it outright | Security review or M&A due diligence flags AGPL; you migrate to Valkey under time pressure | Pick license at adoption, not in panic. If your legal team has even started writing a copyleft policy, pick Valkey now.
Persistence (RDB / AOF) | Survive process restart, point-in-time snapshots, replay log | Disk fsync cost; fork() copy-on-write can approach double the RAM during an RDB snapshot under heavy writes | Large dataset (50GB+) on a small node; BGSAVE forks and the process gets OOM-killed | If you actually need durability, you do not want a cache; you want a database. Use AOF everysec as a recovery aid, not as a durability guarantee.
Cluster mode (16384 slots) | Horizontal scale, automatic slot migration, gossip-based topology | Multi-key ops fail unless keys share a hash tag, no MULTI across slots | You add cluster mode to an app that uses Lua scripts touching many keys; half of them break silently | Cluster mode is not a free upgrade. The application has to be cluster-aware. Many teams add it then regret it; benchmark single-instance with replicas first.
In-memory by design | Sub-millisecond reads, no disk in the hot path | $/GB is 10-30x the cost of SSD storage | Dataset grows past planned size, eviction starts pruning live keys, hit rate craters | Set maxmemory and maxmemory-policy at provisioning time, not after the SEV. noeviction turns the cache into a write-failing surprise.
Pub/Sub plus Streams plus Functions | One process replaces a message broker, a cron, and a script runner | Mixing concerns makes capacity planning and blast radius messy | Cache and pub/sub on the same cluster, pub/sub clients hit slow-consumer, cache latency spikes | Operationally you should treat Redis-as-broker and Redis-as-cache as separate clusters. They share code but they do not share failure modes.
Vector sets (8.0 new data type) | RAG and embedding workloads without a separate vector DB | HNSW index sits in RAM, dataset size is gated by node memory | Embeddings explode past 100M vectors at 1536 dims, RAM cost forces a real vector DB anyway | Vector sets are a fine bridge from prototype to production. They are not a substitute for Pinecone or pgvector at 10M+ vector scale.
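
The cluster-mode row turns on how keys map to the 16384 slots and why a shared hash tag keeps multi-key operations on one shard. The sketch below follows the slot rule as described in the Redis Cluster specification (CRC16/XMODEM of the key, or of the first non-empty `{...}` section, modulo 16384); the function names are illustrative and this is not client-library code.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Return the cluster slot (0..16383) for a key, honoring hash tags."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Keys sharing the tag {user:1} land in the same slot, so multi-key ops can work.
same_slot = key_hash_slot("{user:1}:profile") == key_hash_slot("{user:1}:cart")
# Without a tag, related keys are placed independently of each other.
untagged = (key_hash_slot("user:1:profile"), key_hash_slot("user:1:cart"))
```

A Lua script that touches `user:1:profile` and `user:1:cart` is only safe in cluster mode if both names are rewritten to share a tag, which is the application-level change the row warns about.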

Memcached In-Memory

BSD license, multithreaded, deliberately simple. Meta serves around 5 billion requests per second on Memcached.

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
Multithreaded architecture | Linear scale-up across cores on a big node, simpler hardware-utilization story | No atomicity beyond single-key ops, no scripting, no transactions | You need compare-and-swap on a multi-field record; CAS only works per-key, you build a distributed lock yourself | Memcached vertical scale is the cleanest of any cache. r6g.16xlarge with 64 vCPUs and 512GB RAM utilized cleanly is hard to beat for pure GET/SET throughput.
String values only | Server stays simple, fast, lock-free hash table | Application owns all serialization (JSON, MessagePack, Protobuf) | Schema change in your serialized blob breaks every cached entry; no migration path other than full flush | Use a version prefix in the cache key (v3:user:123) so a serialization change cycles cleanly without poison entries.
1 MB default value limit | Forces good cache hygiene, prevents one bad key from eating a slab class | Larger blobs (rendered HTML pages, image bytes) need chunking or a different cache | Cache-stampede mitigation logic that stores a "rendered" response works in dev, fails in prod once the page grows | The limit is tunable to 128MB via -I, but the slab allocator was designed for small objects. Pushing it past 1MB usually means you picked the wrong tool.
No persistence by design | No fork pauses, no fsync stalls, predictable steady-state latency | Restart equals total data loss, cold cache problem on every deploy | You deploy a fleet-wide config change that restarts Memcached; database absorbs 100% of read traffic, falls over | Always have application-level warmup logic or rolling restart by node. Treat the cold-cache problem as a planned event, not a surprise.
LRU with slab classes | Predictable memory footprint, no fragmentation problems | Slab calcification (one slab class fills, others sit empty) | Workload shifts to mostly 4KB objects, slab class 6 evicts hot keys while slab class 4 has free space | Modern Memcached has slab_reassign and slab_automove, but most ops teams never enable them and quietly bleed hit rate.
No replication or clustering | Stateless from the cluster's perspective, client-side sharding is trivial (consistent hashing) | Node failure equals partition loss until cache refills from origin | One node in a 20-node fleet dies; 5% of cache evaporates, your database P99 doubles for 30 minutes | The Memcached failure model assumes the database can serve the miss rate. That assumption is the entire architecture; verify it under load test.
250-byte key length cap | Server stays fast, predictable hash table behavior | Long composite keys (tenant-id plus user-id plus feature-flag plus locale) get truncated or hashed | Multi-tenant SaaS where keys naturally grow long; you hash, then lose cache-key debuggability | If you have to hash the key client-side to fit, you have already lost the ability to inspect what is cached. Worth picking Redis for any key longer than ~150 bytes.
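
The "No replication or clustering" row claims that losing one node of 20 costs roughly 5% of the cache when clients shard with consistent hashing. A minimal sketch of that idea follows; the `HashRing` class, the 100 virtual nodes per server, and the SHA-256 placement are assumptions for illustration, not the algorithm of any specific Memcached client.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative, client-side only)."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


nodes = [f"cache-{n:02d}" for n in range(20)]
keys = [f"v3:user:{i}" for i in range(10_000)]

before = HashRing(nodes)
after = HashRing([n for n in nodes if n != "cache-07"])  # one node dies

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
lost_fraction = len(moved) / len(keys)
only_dead_node_keys_moved = all(before.node_for(k) == "cache-07" for k in moved)
```

Only keys that lived on the dead node are remapped, so `lost_fraction` should sit near 1/20; with naive `hash(key) % N` sharding, nearly every key would move instead.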

Valkey In-Memory

Linux Foundation fork of Redis 7.2.4 (March 2024). BSD-3-Clause. v9.0 GA October 2025 with billion-RPS cluster claims.

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
BSD license under Linux Foundation | Vendor-neutral governance, OSI-approved permissive, no AGPL contagion | Less concentrated commercial development velocity than a single-vendor project | An enterprise feature you needed (advanced security, RBAC) lands in Redis Enterprise first | The governance argument is stronger than it looks. AWS, Google, Oracle, Ericsson, Huawei, Tencent contributing means feature flow is broad, not blocked on one company's roadmap.
Wire-protocol compatible with Redis 7.2 | Drop-in client replacement, all existing redis-cli, libraries, and Lua scripts work | Compatibility is a moving target; Redis 8 features (vector sets) are not in Valkey yet | Your team writes against Redis 8 vector sets, then a security policy forces Valkey migration | The fork date is locked at Redis 7.2.4. Anything in Redis 7.4+ commercial or Redis 8 is either ported (slowly) to Valkey or never lands. Pin your client library to RESP3 features, not Redis-specific commands.
Enhanced I/O threading (8.0) | Better multi-core utilization without breaking single-threaded execution model | Configuration complexity; the io-threads count needs tuning under load | You set io-threads too high on a small instance, threads contend on shared queues, throughput drops below single-threaded baseline | Threading helps the I/O layer only. If your bottleneck is CPU on EXPIRE scans or large MGETs, threading does not help; you still need to shard.
Atomic slot migration (9.0) | Resharding moves entire slot atomically, simpler operations | Younger feature; Redis Cluster's slot-by-slot migration has more battle scars | You hit a corner case during resharding under heavy write load; less community precedent for the fix | 9.0 is October 2025. For mission-critical clusters, stay on 8.1 LTS until 9.x has 6+ months of production miles outside the Linux Foundation labs.
Module ecosystem (JSON, Bloom) | First-party modules contributed by AWS and Google, BSD-licensed | Smaller than Redis Stack; RediSearch, RedisGraph not directly available | You need full-text search; have to either port RediSearch (license complication) or run a separate search engine | Module gap closes monthly. If your need is JSON or basic indexing, Valkey is fine; if it is RediSearch or RedisGraph, you are picking between Redis 8 AGPL or a different tool entirely.
Adopted by managed services fast | ElastiCache, MemoryDB, Memorystore, Aiven all support Valkey by mid-2025 | Vendor implementations differ subtly (TLS modes, IAM auth, snapshot formats) | You migrate from ElastiCache Valkey to self-hosted Valkey; auth integration breaks, snapshot import fails | The cloud-vendor support is real but each one wraps Valkey differently. Treat managed Valkey as a different product from upstream Valkey for ops purposes.
Multiple databases in cluster mode (9.0) | Logical tenant separation without separate clusters, cheaper multi-tenancy | Multi-DB was historically a Redis foot-gun (clients confused, replication scope) | You enable multi-DB cluster mode for tenant isolation, a client library bug routes cross-tenant | Test the client library you use against this feature explicitly. Many clients made assumptions in single-DB cluster mode that break here.

KeyDB In-Memory

Multithreaded Redis fork (2019), acquired by Snap May 2022. Runs critical Snap infra; no commercial product or paid support.

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
Fully multithreaded command execution | Multi-core scale on a single node, often 2-4x throughput vs Redis at same hardware | Spinlock-based concurrency; tuning server-threads wrong is worse than single-threaded | You set threads to match cores on a noisy-neighbor VM; spinlocks burn CPU, throughput collapses | KeyDB explicitly requires exclusive use of assigned cores. On Kubernetes with CPU limits, you will not get the documented numbers.
Active-active replication | Multi-master, both nodes accept writes, lower failover RTO | Last-write-wins conflict resolution; no CRDTs, no causal ordering | Two writers in two regions update the same key concurrently; one of the writes is silently lost | Active-active is convenient until you have a write conflict. Most teams reach for it for cross-AZ failover but actually need a leader-follower with fast promotion.
Snap is the sole steward | Real production validation at Snap scale, code is open | Roadmap is driven by Snap's needs, not the broader community | You need a feature Snap does not; PR sits for months with no review | Snap was listed as a Valkey contributor when the fork launched, signaling strategic ambiguity. Risk: KeyDB receives only Snap-internal fixes going forward.
Flash storage tier (FLASH) | Spill less-hot data to NVMe, dataset sizes beyond RAM | Flash hits are 10-50x slower than RAM hits; tail latency degrades | P99 latency for cold reads jumps from 0.5ms to 20ms after spillover; downstream timeouts trigger | Flash tier is only useful if your access pattern is genuinely tiered hot vs cold. For uniform-access workloads, it just adds variance without saving cost.
Redis API compatibility | Drop-in for most existing Redis applications | Diverges from Redis 7+; vector sets, newer modules not available | You depend on a Redis 7.4 command, only to find KeyDB is locked closer to Redis 6.2 semantics | The compatibility window narrows each year. Use INFO to see actual Redis-compatibility version reported; do not trust the docs.
Subscriber-publisher fanout at scale | Snap battle-tested at extreme pub/sub fanout | Less community pub/sub tooling than Redis | You need to debug a slow subscriber under load; troubleshooting docs are sparser than Redis | The Snap engineering blog has solid pub/sub posts. For non-Snap pub/sub patterns, you are reading source code.
No commercial backing for support | Free, no license cost, no vendor lock-in | No SLA, no paid escalation, no enterprise patching pipeline | You hit a production data-loss bug on a Saturday; only recourse is GitHub issues | For mission-critical workloads, the lack of a paid escalation path is the most under-discussed risk. Enterprise procurement will surface this in security review.
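
The active-active row says that last-write-wins silently drops one of two concurrent writes. The sketch below models generic last-write-wins merging to make that loss visible; the `Write` record and `merge_last_write_wins` function are hypothetical and do not reproduce KeyDB's replication code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp_ms: int
    origin: str  # which replica accepted the write


def merge_last_write_wins(writes: Iterable[Write]) -> Dict[str, Write]:
    """Keep, per key, only the write with the highest timestamp."""
    state: Dict[str, Write] = {}
    for write in writes:
        current = state.get(write.key)
        # Origin is used only as a tie-breaker so the result is deterministic.
        if current is None or (write.timestamp_ms, write.origin) > (
                current.timestamp_ms, current.origin):
            state[write.key] = write
    return state


# Two regions update the same cart within one millisecond of each other.
from_region_a = Write("cart:42", "add book", timestamp_ms=1_000, origin="region-a")
from_region_b = Write("cart:42", "add pen", timestamp_ms=1_001, origin="region-b")

merged = merge_last_write_wins([from_region_a, from_region_b])
merged_reversed = merge_last_write_wins([from_region_b, from_region_a])
```

Both replicas converge on "add pen", and nothing records that "add book" ever happened; neither writer receives an error.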

Group B — HTTP / Edge Caches

Amazon CloudFront Edge / CDN

~1600 edge locations, integrated with Lambda@Edge and CloudFront Functions. Origin Shield, real-time logs.

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
Deep AWS integration | IAM-native, S3 OAC, Lambda@Edge, WAF, Shield, signed URLs, KMS | Vendor lock-in; moving off CloudFront means rebuilding edge logic | Multi-cloud mandate hits, CloudFront-only features (OAC, Lambda@Edge) have no obvious port | The integration is the moat. If your origin is S3 + ALB + Lambda + WAF + Shield, CloudFront is roughly free engineering. If your origin is on GCP, you are paying for an awkward seam.
Two edge runtimes (Functions vs Lambda@Edge) | CloudFront Functions for sub-ms header rewrites; Lambda@Edge for full Node/Python with AWS SDK | Two mental models, two deployment paths, two billing models | You start in Lambda@Edge for everything; bill hits $10K+/month for header rewrites that belong in Functions | The cost difference is real: Functions at $0.10 per million requests vs Lambda@Edge at $0.60 plus duration. For 1B requests/month, that is $100 vs $600+. Default to Functions; escalate to Lambda@Edge only on demonstrated need.
Origin Shield ($0.0060/GB) | Consolidates origin fetches to one regional cache, cuts origin load 50-90% | Extra hop in the path, extra per-GB charge, single-region failure exposure | Origin Shield region degrades; your traffic that should hit the edge instead routes through a failing intermediary | Origin Shield is right when your origin is expensive (Lambda, complex queries) and wrong when your origin is itself a CDN-ish thing (S3 with high cache hit). Calculate before enabling.
Tiered pricing by region | Lower cost in NA/EU, predictable per-GB | Indian and South American regions are 2-3x more expensive; bill becomes geography-dependent | You launch in APAC, your egress cost line per user is 3x what you modeled on NA-only test traffic | The "10 regions, 10 prices" model means cost projection requires real geo-distribution data. Get a 30-day sample of egress by region before committing.
Real-time logs to Kinesis | Sub-minute log latency, custom field selection, replay to Athena/Redshift | Extra Kinesis costs, 1% sampling minimum, more pipelines to operate | You enable 100% sampling on a high-traffic distribution; Kinesis cost spikes, downstream consumer falls behind | Real-time logs at 1-5% sampling for security analytics is the right default. 100% is for short debugging windows, not steady state.
Functions Key-Value Store | Stateful edge logic (A/B tests, redirects) without origin round-trip | Eventually-consistent global propagation; small per-distribution limits | Marketing flips a feature flag; some users see it 30+ seconds before others, browser refresh shows inconsistency | KVS is roughly Cloudflare KV with worse limits. Useful for low-cardinality config, not for high-write workloads.
Anycast routing on AWS network | Same TLS termination as ALB, predictable AWS-internal performance | Less aggressive last-mile peering than Cloudflare or Fastly in some geos | India users get worse RTT than your test from us-east-1 suggested; CloudFront cannot fix bad transit | CloudFront's POP density is high but its peering is AWS-network-first. For markets with patchy AWS peering, Cloudflare and Fastly often win by 30-100ms.
Streaming and live video | MediaPackage, MediaTailor, native HLS/DASH support, low-latency HLS | Reinforces AWS lock-in for the entire media pipeline | Cost optimization moves you to a specialist CDN (BlazingCDN, Bunny); you have to rebuild signing and DRM | For pure media delivery at scale, hyperscaler CDNs are rarely the cost winner. CloudFront wins only when integration with the rest of your AWS media stack offsets the premium.
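
The Functions vs Lambda@Edge row quotes $100 vs $600+ for 1B requests per month. The arithmetic is easy to check with the per-million request prices stated in that row; the sketch below uses only those two figures and deliberately leaves out Lambda@Edge duration charges, which depend on memory and execution time not given here.

```python
from decimal import Decimal

# Per-million request prices as quoted in the table row above.
FUNCTIONS_PER_MILLION = Decimal("0.10")
LAMBDA_EDGE_PER_MILLION = Decimal("0.60")  # request charge only; duration is extra


def monthly_request_cost(requests: int, price_per_million: Decimal) -> Decimal:
    """Request-count charge in USD for one month of traffic."""
    return Decimal(requests) / Decimal(1_000_000) * price_per_million


requests_per_month = 1_000_000_000
functions_cost = monthly_request_cost(requests_per_month, FUNCTIONS_PER_MILLION)
lambda_edge_cost = monthly_request_cost(requests_per_month, LAMBDA_EDGE_PER_MILLION)
ratio = lambda_edge_cost / functions_cost
```

The request charge alone is a 6x difference, so any duration billing on top only widens the gap.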

Cloudflare Edge / CDN

335 cities, 125+ countries. Workers (V8 isolates), KV, R2, D1, Durable Objects, Pages. Reported as fastest in ~48% of top eyeball networks.

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
V8 isolates instead of containers | Near-zero cold start (5ms), runs in every POP, cheaper per request | JavaScript/TypeScript primary path; Wasm support exists but most idioms assume JS | You want native binary perf for image transforms; Wasm works but Lambda@Edge plus Node is sometimes simpler | The isolate model is genuinely a step ahead of container-based edge (Lambda@Edge). Cold-start cost is structural, not just optimization.
Unmetered bandwidth tier | Cost-predictable for static content, free CDN basics, "fair use" | Section 2.8 ToS lets Cloudflare bump high-egress-non-website traffic to enterprise | Your video streaming app grows past "website-like" usage profile; sales calls you for an enterprise contract | The "unmetered" claim is real until your traffic looks like a media CDN. Read 2.8 before standardizing.
Single-vendor blast radius | One config plane, one bill, one support call | Their bad day is your bad day, globally and uncorrelated with origin | 2025-11-18: a database permission change doubled a Bot Management config file, crashed proxy processes globally for hours | This is the single largest production concern with Cloudflare. The 2025 incident sequence (Feb R2, June KV, August DDoS, November Bot Management) confirms blast radius is a recurring failure mode. Plan a CDN failover.
Workers KV (eventually consistent) | Global key-value store, sub-5ms hot reads, persists across POPs | 1 write per second per key cap; eventual consistency on writes (up to 60s) | You use KV for rate-limiting counters; writes get throttled, limits become advisory not enforced | KV is for read-heavy reference data, full stop. For counters use Durable Objects; for transactional state use D1 or your origin DB.
R2 (S3-compatible, zero egress) | Free egress to internet, killer pricing vs S3 for high-egress workloads | S3 API compatibility is partial (some APIs, ACL semantics, multipart edge cases differ) | You lift-and-shift from S3; a niche API call (S3 Select, Object Lambda) silently no-ops | R2 is genuinely disruptive for egress-heavy use cases (media, model weights, software downloads). For S3-as-feature-platform, the gap matters more than the price.
Workers ecosystem (D1, Queues, Durable Objects) | Build full apps at edge without a separate origin | Each product has its own limits, billing, and maturity curve | You build an app on Workers + D1 + Durable Objects; D1 hits its alpha-stage size limit, you replatform | The platform is real but the surface area changed faster than most teams' code refactoring budget allows. Treat as Cloudflare-native, not portable.
Cache API (in-POP, ephemeral) | Local POP cache for any Worker response, separate from KV | Each POP has independent cache; no global purge for Cache API entries by default | You cache personalized API responses by user-id; cache lives only in one POP, hit rate is much lower than expected | The two-tier story (Cache API ephemeral, KV persistent) is powerful but the mental model is exactly inverted from CloudFront. Document the convention in your runbook.
Free DDoS protection | Layer 3/4/7 mitigation included on every plan, absorbed 3.8 Tbps in 2024 | Mitigation is opinionated; some "legitimate but unusual" patterns get rate-limited | Your monitoring scraper, load-test runner, or AI training crawler gets challenged or blocked | The defaults are right for 95% of sites. For API-only or B2B integrations, tune Bot Management rules; do not just trust the defaults.

Fastly Edge / CDN

Heavily customized Varnish 2.1 fork at core. ~150 Tbps capacity, fewer but bigger POPs. VCL programmable, Compute@Edge (Wasm; originally on Lucet, since moved to Wasmtime).

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
VCL programmability | Full caching logic as code; complex routing, A/B, request enrichment at edge | VCL is a niche DSL; team has to learn a non-portable language | Senior engineer who owns VCL leaves; nobody can debug a misbehaving subroutine | VCL is genuinely the most powerful per-request hook in any CDN. Worth the learning cost if you do real edge logic; overkill for static-only delivery.
Sub-150ms global instant purge | Invalidate cache by URL or surrogate key in milliseconds worldwide | Premium price tier; instant purge is part of why Fastly costs more per GB | You move from Fastly to a cheaper CDN; suddenly your inventory update lag goes from 1s to 60s, business breaks | Instant purge with surrogate keys is the killer feature for high-write workloads (news, sports scores, inventory). For static long-TTL content, you are overpaying.
Compute@Edge on Wasm (Lucet, now Wasmtime) | Multi-language at edge (Rust, JS, Go, Wasm), strong isolation, fast cold start | Smaller community than Cloudflare Workers; more "you are early" feel | You hit a Wasm runtime quirk; community answer is "open a support ticket" | Compute@Edge is technically excellent. The reason Cloudflare Workers feels bigger is community size, not technical superiority.
Fewer, larger POPs | Higher cache hit rates per POP, lower origin load, simpler debugging | Geographic coverage thinner than Cloudflare in long-tail markets | You launch in Africa or smaller APAC markets; user latency is higher than expected vs Cloudflare | The POP-density argument is more nuanced than it looks. Big POPs with good peering often outperform many small POPs with mediocre peering for top-100 cities.
2021 outage legacy | Postmortem culture forced real investment in safer config rollout | Reputation cost: that outage took down Reddit, NYT, Amazon for ~1 hour | Procurement security review surfaces the 2021 incident; you have to explain mitigation in writing | The 2021 outage was a latent software bug triggered by one customer's valid config change, and it propagated globally. Post-incident, Fastly added staged config rollout. Cloudflare's 2025-11-18 outage was structurally similar; this is a class of failure, not vendor-specific.
Real-time analytics and logs | Per-second stats, syslog streaming to S3/Logentries, JSON feeds | Logs and analytics priced separately; cost can sneak up | You enable full request-level logging; monthly bill grows faster than traffic | Fastly's logging is the best in class for debuggability. Use it; just sample once you are past the initial debugging phase.
Origin Shielding | Consolidate origin fetches to one shield POP, dramatically reduce origin RPS | Adds a hop, single POP failure exposure for the shield path | Shield POP has a bad day; origin sees its full pre-CDN traffic for the duration | Always pair Origin Shielding with multi-POP shield fallback if you cannot tolerate origin spikes. Often missed in initial configurations.
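
The instant-purge row depends on surrogate keys: each cached response carries tags, and one purge call invalidates every response with that tag. A minimal in-process sketch of that indexing idea follows; `TaggedCache` and its methods are hypothetical names and this is not Fastly's API or implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Response cache with a tag-to-URL index for purge-by-tag (illustrative)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: bytes, surrogate_keys: Iterable[str]) -> None:
        self._objects[url] = body
        for tag in surrogate_keys:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._objects.get(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every cached URL carrying this tag; return how many were removed."""
        purged = 0
        for url in self._urls_by_tag.pop(tag, set()):
            # Other tags may still reference the URL; a missing object is harmless.
            if self._objects.pop(url, None) is not None:
                purged += 1
        return purged


cache = TaggedCache()
cache.put("/p/1", b"shoe one", ["product-1", "category-shoes"])
cache.put("/p/2", b"shoe two", ["product-2", "category-shoes"])
cache.put("/about", b"about us", ["static"])

purged_count = cache.purge_tag("category-shoes")  # one call, two URLs gone
```

An inventory update can purge by the product or category tag without knowing which URLs rendered that data, which is why the feature matters for high-write content.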

Varnish Cache Edge / Self-Hosted

Open-source HTTP accelerator. v7.7 current (BSD-2-Clause). VCL-driven, reverse proxy cache, designed for HTTP. Varnish Software offers Varnish Enterprise (proprietary).

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
VCL on your own hardware | Total control over cache policy, can run in any data center or cloud | You operate the cache fleet: capacity, patching, monitoring, on-call | Cache fleet patching slips, a known CVE is unmitigated, security review fails | Self-hosting Varnish is right when latency or compliance forbids a SaaS edge. Otherwise you are reinventing what Fastly already runs at higher quality.
Built for HTTP, only HTTP | Best-in-class HTTP cache semantics, ESI, range requests, conditional GETs | TLS termination is via Hitch or another sidecar; not native in open-source Varnish | You try to use Varnish for TLS termination; discover you need to operate Hitch alongside, doubles ops surface | The "no TLS" stance is deliberate (HTTP is hard enough). In practice every production deploy has a TLS sidecar; budget for it.
VMOD extensibility | C-level extensions for custom logic; rich ecosystem of community modules | Custom VMODs are C code in your cache process; bugs crash Varnish | A bad VMOD pointer dereference takes down your cache fleet during peak; rollback is not automatic | Treat custom VMODs like kernel modules. Code review, fuzz test, canary deploy. Most outages "in Varnish" are actually in custom VMODs.
Single-machine ops model | No clustering, no consensus, no shared state to corrupt | High availability is on you (LB in front, sticky sessions, hot standby) | One Varnish node dies; LB does not detect fast enough; user gets origin direct for 30 seconds, origin falls over | The "no cluster" simplicity is genuinely an asset. Pair with an L4 LB that does proper health checks (not just TCP-up).
Grace, keep, and stale serving | Serve stale during origin outage, separate background fetch from delivery | VCL complexity grows quickly; grace logic is the second-biggest source of "why is this stale" tickets | Grace period is 5 minutes, you fix a bug at origin, users still see the bug for 5 minutes after deploy | Grace plus stale-while-revalidate is Varnish's killer feature for origin resilience. Use surrogate keys to purge by tag rather than relying on grace alone.
No native cluster purge | Each node is independent, predictable | Cluster-wide purge requires Varnish Broadcaster or similar add-on | You PURGE on one node; the other 9 nodes still serve the stale object; users see inconsistency | Production deploys always need Varnish Broadcaster, custom HTTP-fanout, or Varnish Enterprise. Plan it in from day one.
Open source (BSD-2-Clause) | No license cost, total transparency, large community, debuggable | No SLA, no commercial escalation; pay Varnish Software for enterprise support | Critical bug in 7.7, fix lands in 7.8; you have to upgrade an entire fleet to take it | Most production Varnish fleets pay Varnish Software for either Enterprise or support. Pure community Varnish works but is for teams that genuinely own this code path.
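
The grace row is easier to reason about as a three-way decision on object age. The sketch below is a simplified model of TTL plus grace, assuming a 60-second TTL alongside the 5-minute grace from the table; the names are hypothetical, the keep timer is omitted, and this is not how Varnish is implemented internally.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    HIT = "deliver fresh object"
    STALE_WHILE_REFRESH = "deliver stale object, refresh in background"
    MISS = "fetch from origin before delivering"


@dataclass
class CachedObject:
    stored_at: float  # seconds, same clock as `now`
    ttl: float        # freshness lifetime in seconds
    grace: float      # extra window in which stale delivery is allowed


def decide(obj: Optional[CachedObject], now: float) -> Decision:
    if obj is None:
        return Decision.MISS
    age = now - obj.stored_at
    if age < obj.ttl:
        return Decision.HIT
    if age < obj.ttl + obj.grace:
        return Decision.STALE_WHILE_REFRESH
    return Decision.MISS


page = CachedObject(stored_at=0.0, ttl=60.0, grace=300.0)
at_30s = decide(page, now=30.0)    # inside TTL
at_200s = decide(page, now=200.0)  # past TTL, inside grace
at_400s = decide(page, now=400.0)  # past TTL + grace (360s)
```

The middle branch is both the resilience feature and the source of the "why is this stale" tickets: a fixed origin is invisible to users until the background refresh lands or the object is purged.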

NGINX Reverse Proxy / Cache

F5-owned (since 2019), BSD-2-Clause open source. Web server, reverse proxy, load balancer, mail proxy, HTTP cache. v1.30 stable (April 2026). NGINX Plus is the commercial tier.

Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
Jack-of-all-trades | One binary does TLS term, LB, cache, reverse proxy, web server | Specialized features (instant cluster purge, VCL-grade programmability) are weaker | You need surrogate-key purge across 50 NGINX boxes; build it yourself with consul-template plus HTTP fanout | NGINX is the right answer when caching is one of many jobs the box does. It is the wrong answer when caching is the only job and you need top-end semantics.
proxy_cache on local disk | Disk-backed cache survives restart, persists past RAM size | Cache directory format is internal, no easy cross-node sharing, no in-memory cache for hot keys | SSD wears out faster than expected under high write rate; replacing it requires fleet rotation | NGINX cache is functional, not specialized. For high-write-rate caches, mount on fast NVMe and tune proxy_cache_lock.
Configuration via directives | Easy to learn, plain text, version controlled, predictable | Complex logic forces nested if/map/regex; no real programmability without Lua (OpenResty) | You need conditional caching on a JWT claim; pure nginx.conf does not handle it, you reach for Lua, now you have two languages | If you find yourself writing complex maps and embedded Lua, you have already outgrown plain NGINX. Either move logic upstream or switch to Varnish/Fastly.
F5 ownership (post-2019) | Enterprise support pipeline, sustained investment in NGINX Plus features | F5 prioritizes Plus features; open-source NGINX advances more conservatively | A feature you assumed was open-source (active health checks, dynamic config) turns out to be Plus-only | The Plus vs OSS gap is real and grew under F5. For production caching at any scale, NGINX Plus or OpenResty is usually the right pick.
Massive ecosystem and tooling | Largest install base of any web server (35%+ market share) | Stack Overflow answers often outdated or wrong; many "best practice" blog posts ignore modern features | You copy a config from a 2018 blog; it has a CVE-prone TLS cipher list, security scan fails | Always start from the F5 docs (docs.nginx.com), not Google. The ecosystem is huge but quality varies.
Multiple roles share one process | Lower memory and connection overhead than running separate boxes for LB/cache/TLS | Cache I/O contends with TLS handshake on the same worker; tuning is harder | Heavy cache write load slows TLS handshakes; users perceive slow first-byte time, not a cache problem | Worker-process tuning matters more than the docs suggest. worker_processes auto is rarely the right answer at scale; tune to physical cores and pin if needed.
No native distributed cache | Per-node state is simple, no consensus, no replication bugs | Cluster-wide cache invalidation has to be built externally | You deploy a config change; cache TTL is still 10 minutes; users see stale content while you wait | Pair NGINX with a separate cache purge fanout (RabbitMQ, NATS, HTTP broadcast). Production deploys almost always do; budget for it.

Use Cases

Real production scenarios per technology, with the driving property that picked it over alternatives.

Group A — In-Memory Caches

Redis

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Session store | Twitter/X session validation | Sub-ms lookup of session token to user context | 500M+ DAU, millions of session checks per second | Memcached lacks rich types for storing user-context blob; DB lookup is 10ms+ minimum
Leaderboards | Stack Overflow reputation, game scoreboards | Sorted sets with O(log N) ZADD and O(log N + M) ZRANGE | 10M+ scored entries, real-time updates | SQL ORDER BY plus LIMIT does not scale to millions of writes/sec; Memcached has no native sorted set (Valkey and KeyDB inherit the Redis type)
Rate limiting | API gateways, login attempts, abuse mitigation | Atomic INCR with EXPIRE plus Lua for sliding window | 100K+ rate-checks/sec per region | DB-based rate limit is too slow; Memcached lacks atomic compound operations
Pub/Sub fanout | Internal notifications, chat presence | Sub-ms publish to N subscribers, no broker setup | 10K+ concurrent subscribers per channel | Kafka is overkill for ephemeral fanout; RabbitMQ adds operational weight
Vector search (Redis 8) | RAG prototypes, semantic cache | HNSW index in RAM, sub-ms approximate kNN | 1-10M vectors, 768-1536 dim | pgvector is fine but adds a query latency dependency on Postgres availability; dedicated vector DBs are heavier to operate
Distributed locks (Redlock) | Distributed cron, deduplication, election | SET NX EX as a one-line single-instance lock; Redlock repeats it across a majority of independent nodes | 10K+ lock acquisitions/sec | ZooKeeper is heavier; etcd is fine but adds a separate dependency
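The sliding-window logic behind the rate-limiting row can be sketched in-process. This is an illustration of the algorithm only: in Redis the same steps would typically run atomically in a Lua script over a sorted set (trim old entries, count, add), and `SlidingWindowLimiter` and its parameters are hypothetical names.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the last `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the example is deterministic
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        cutoff = now - self.window
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True


class FakeClock:
    """Manually advanced clock used only for the demonstration."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
limiter = SlidingWindowLimiter(limit=2, window_seconds=10, clock=clock)
decisions = []
for t in (0.0, 1.0, 2.0, 10.5, 10.6):
    clock.now = t
    decisions.append(limiter.allow("login:alice"))
print(decisions)
```

Unlike a fixed-window INCR plus EXPIRE counter, the sliding window does not let a client burst at the boundary between two windows, which is why the table pairs INCR with a Lua variant.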

Memcached

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Hyperscale object cache | Meta (Facebook) caches roughly 5B GETs/sec on Memcached at peak | Multithreaded raw GET/SET throughput on huge nodes | Trillions of cached objects, petabyte-scale in-memory data | Redis single-thread cap forces 10x more shards; pure GET/SET does not need Redis types
Database query cache | WordPress, Drupal at scale; MediaWiki | One-line "cache this query result" pattern, dirt-simple invalidation | Wikipedia-class read traffic, multi-TB cached | Redis adds operational weight unjustified by use case; you do not need persistence for a query cache
Rendered HTML / page fragment cache | Pinterest, Etsy product page fragments | 1KB-100KB blob storage; the slab allocator avoids external heap fragmentation | 10K+ requests/sec/node, P99 under 1ms | Redis equivalent works but is slower per core; HTTP cache (Varnish) adds invalidation complexity
Hot dataset acceleration in front of slow stores | Hadoop/HBase fronted by Memcached for read-heavy workloads | Pure GET path is dominant, no need for compound ops | Petabyte HBase, hot working set in 1-10TB Memcached fleet | Redis cluster cost-prohibitive at this scale; HBase block cache alone insufficient
Multi-tenant SaaS cache layer | Heroku-style platforms exposing cache as a service | Shared-nothing nodes, trivial horizontal scale, no replication concerns | 10K tenants per cluster | Redis multi-tenancy is harder; no Pub/Sub or scripting noise across tenants

Valkey

Use CaseCompany / ScenarioDriving PropertyScale DimensionWhy Not Alternative
License-clean Redis migrationAWS migrating ElastiCache Redis OSS customers to Valkey by default in 2025BSD license satisfies enterprise IP reviewHundreds of thousands of managed customer fleetsRedis 8 AGPLv3 is unviable for many; Memcached lacks rich types
Cloud-vendor neutral cacheMulti-cloud SaaS (e.g., Aiven) offering Valkey across AWS/GCP/AzureSingle OSS SKU runs identically across hyperscalersThousands of provisioned clustersRedis OSS license depends on managed-service-prohibition clause; legally awkward across clouds
Linux distro default in-memory KVDebian/Ubuntu shipping Valkey as the redis-server replacementOSI-approved license required by Debian policyTens of millions of Linux installsRedis SSPL is incompatible with Debian's Free Software Guidelines
Billion-RPS cluster (Valkey 9.0)High-throughput SaaS infrastructure at scaleAtomic slot migration, multi-DB cluster mode, advanced threading1B+ requests/sec at cluster scaleMemcached lacks rich types and cluster mode; Redis Cluster has older slot-migration semantics
RDMA-accelerated low-latency workload (experimental)HPC and AI/ML workloads needing kernel-bypass networkingSub-100µs P99 over RDMA fabricMicrosecond-class P99 requirementsRedis OSS lacks RDMA support; Memcached has no roadmap for it

KeyDB

Use CaseCompany / ScenarioDriving PropertyScale DimensionWhy Not Alternative
Snap's internal caching infrastructureSnapchat backendMulti-threaded throughput on big nodes, in-house stewardSnap-scale daily traffic, billions of requestsInternal investment in KeyDB makes it cheaper than re-evaluating Redis or Valkey
Redis migration for multi-core utilizationTeams hitting single-threaded Redis ceiling but unwilling to shardDrop-in Redis API plus 2-4x throughput per node50K-200K ops/sec single-node ceiling raised to 200K-500KSharding adds app-side complexity; Memcached lacks types
Active-active geo-replicationMulti-region SaaS wanting fast cross-region failoverBoth regions accept writes; lower RTO than leader-follower promotion2-5 regions, last-write-wins acceptableRedis has no native active-active in OSS; Redis Enterprise CRDB is commercial
Flash-tier cache for cold dataDatasets larger than RAM, hot-cold ratio favorableRAM-priced perf for hot keys, NVMe-priced bulk for cold10TB+ working set on 1TB RAM nodesRedis OSS has no integrated flash tier; commercial Redis on Flash is the only Redis option
Cost-optimized Redis at small scaleStartups maximizing single-node performance before shardingFree, multi-threaded, no per-instance licensing1-3 large nodes, <1M ops/sec totalManaged Redis costs add up; Valkey works but is newer than KeyDB at the time of adoption

Group B — HTTP / Edge Caches

Amazon CloudFront

Use CaseCompany / ScenarioDriving PropertyScale DimensionWhy Not Alternative
S3-fronted static asset deliveryPrime Video, Disney+, every AWS-hosted siteS3 OAC plus signed URLs plus regional edge caches, all IAM-nativePB-scale storage, global CDN egressCloudflare/Fastly need separate S3 auth flow; integration friction
HLS/DASH video streamingPrime Video, FuboTV, Twitch event simulcastsMediaPackage + CloudFront low-latency HLS chainMillions of concurrent viewers, sub-3s glass-to-glassSpecialist CDNs are cheaper per GB but require rebuilding the entire media pipeline
API acceleration with WAF and ShieldBanking and fintech APIs (Capital One, Robinhood)WAF rules, Shield Advanced DDoS, KMS-encrypted log delivery, all integrated100K+ requests/sec API with strict complianceCloudflare WAF is good but requires duplicating security policy outside the AWS account boundary
Edge personalization via CloudFront FunctionsE-commerce A/B variant routing, geo-redirects, header normalizationSub-ms execution, $0.10/M requests, runs at every edge1B+ requests/month at minimal added costLambda@Edge for the same work costs 6x more; full Workers stack requires platform change
Origin Shield in front of regional ALBMulti-region ALB consolidating to single shield POPCuts origin RPS by 80%+ for cacheable workloads10K+ origin RPS reduced to under 2KCloudflare Tiered Caching achieves similar; not available in non-Cloudflare stacks
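The cost gap in the CloudFront Functions row can be worked through as plain arithmetic. The sketch below uses only the two figures from the table ($0.10 per million requests and the "6x" multiplier); the Lambda@Edge per-request rate is derived from that multiplier rather than quoted from a price list, and duration billing and any free allowance are deliberately left out.

```python
def monthly_request_cost(requests: int, price_per_million: float) -> float:
    """Request charges only, rounded to cents; no duration or free-tier terms."""
    return round(requests / 1_000_000 * price_per_million, 2)


CLOUDFRONT_FUNCTIONS_PER_M = 0.10                      # figure from the table
LAMBDA_EDGE_PER_M = CLOUDFRONT_FUNCTIONS_PER_M * 6     # derived from the "6x" claim

requests_per_month = 1_000_000_000                     # the table's 1B+ scale

functions_cost = monthly_request_cost(requests_per_month, CLOUDFRONT_FUNCTIONS_PER_M)
lambda_edge_cost = monthly_request_cost(requests_per_month, LAMBDA_EDGE_PER_M)

print(f"CloudFront Functions: ${functions_cost:,.2f}")
print(f"Lambda@Edge (requests only): ${lambda_edge_cost:,.2f}")
```

At a billion requests a month the request charge alone differs by a few hundred dollars; the larger effect in practice tends to be the duration billing that this sketch omits.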

Cloudflare

Use CaseCompany / ScenarioDriving PropertyScale DimensionWhy Not Alternative
Free-tier CDN for personal/SMB sitesMillions of small sites, indie hackers, blogsUnmetered bandwidth, free TLS, free DDoSCloudflare claims to serve a large share of top eyeball networks fastestCloudFront has no free tier; Fastly has no real SMB plan
Workers-native edge appsDiscord (parts), Shopify (parts), AI inference proxiesV8 isolates, near-zero cold start, code runs in every POP10B+ requests/day, multi-regionLambda@Edge cold start is 10-100x worse; AWS region-bound vs Cloudflare's per-POP execution
R2 zero-egress object storageHugging Face model weights, large software downloads, video archivesS3 API plus zero egress cost to the internetPB-scale, high-egress, predictable billsS3 egress costs roughly $0.05-$0.09/GB to internet; R2 is genuinely free egress
DDoS mitigation as front doorSites that have been attacked, target-rich verticals (crypto, gambling)Layer 3/4/7 mitigation absorbed 3.8 Tbps in 2024 attackMulti-Tbps attack absorptionAWS Shield Advanced is comparable but more expensive per protected resource
Global Workers KV for feature flags / configsMobile app config delivery, feature flag systemsSub-5ms hot reads globally, eventual consistency acceptable1M+ reads/sec global; writes infrequentS3 + CloudFront is slower for writes; LaunchDarkly etc. add latency and cost

Fastly

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Real-time news, sports, finance | The New York Times, NPR, Reddit, Vimeo, Shopify storefronts | Sub-150ms instant global purge by surrogate key | Millions of pages, high write rate, must update fast | CloudFront purge takes minutes; Cloudflare cache purge is best-effort
E-commerce stock and pricing accuracy | Shopify Plus stores, GitHub Marketplace, ticketing sites | Cache hit for the hot view, purge in ms when inventory changes | 10K+ purges/sec sustained | CDNs without instant purge force shorter TTL, which kills hit rate
API caching with VCL | GitHub API in front of Rails, fintech APIs with per-tenant rules | VCL allows complex per-request caching logic at edge | Billions of API requests/day | Cloudflare Workers can express it but VCL is closer to the cache, fewer abstraction layers
Fastly Compute (formerly Compute@Edge) for personalization | News personalization, dynamic header injection, request enrichment | Wasm runtime, multi-language, near-zero cold start | 10M+ personalized responses/day at edge | Lambda@Edge has higher cold start; CloudFront Functions too limited for complex logic
Image and video on-the-fly transformation | Spotify cover art delivery, news photo pipelines | Edge image optimization plus VCL routing to origin variants | PB-scale image bandwidth, sub-second purge | CloudFront image transforms are bolted on; lower control over caching keys

Varnish Cache

Use CaseCompany / ScenarioDriving PropertyScale DimensionWhy Not Alternative
On-prem reverse proxy cacheBanks and healthcare providers with PII not allowed off-networkVCL programmability without sending traffic to a SaaS CDN10-100K req/sec on-prem, regulated industrySaaS CDNs (Cloudflare, Fastly) need data egress; compliance does not allow
Origin shield for SaaS CDNSites running Varnish in their own data centers in front of Fastly or CloudflareExtra layer of cache outside SaaS CDN, customizable VCLMulti-tier cache, additional purge controlsSaaS-only deployment leaves no programmable cache layer under your control
News and media internal CDNMajor publishers (e.g., Wikimedia historically) running Varnish at edgeESI (Edge Side Includes), grace, surrogate keys all out of the boxWikimedia served much of Wikipedia from Varnish layers historicallyBuilding this on NGINX requires Lua or external coordination; building on Apache is a non-starter
API gateway-style cachingInternal APIs cached in front of slow servicesConditional GET, ETag, full HTTP semantics, programmable in VCL100K+ API RPS, low origin RPSNGINX proxy_cache is adequate but lacks ESI and grace semantics
Cache layer for legacy app modernizationStrangler pattern: Varnish in front of legacy monolithVCL routes some paths to new microservices, others to legacyMigration period: months to yearsNGINX can route but VCL's caching semantics during routing are more powerful

NGINX

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Reverse proxy with opportunistic cache | Many production microservice stacks, Kubernetes Ingress (Ingress NGINX) | Single binary handles TLS, LB, cache, all of it | 10K+ RPS per node, multi-service routing | Varnish does not do TLS natively; HAProxy's built-in cache only covers small in-memory objects; running both is more ops surface
API gateway entry point | Banking APIs; gateways such as Kong and Apache APISIX built on top of NGINX/OpenResty | Programmable routing plus rate-limit plus auth plus cache, all in one | 10K+ tenant APIs behind one NGINX fleet | Pure CDN does not handle internal API gateway needs; dedicated API gateways are more expensive
Cache in front of slow CMS | WordPress sites using NGINX fastcgi_cache | Disk-backed cache reduces PHP-FPM load by 95%+ | 1M+ pageviews/day on $20/mo VPS | Varnish works but adds an operational layer for a workload NGINX already handles
TLS termination plus cache for static | Single-tenant SaaS, internal portals, dev environments | One binary, configured via plain text, predictable behavior | 1-100 RPS per box, dozens of boxes | Cloudflare's free tier works but adds an external dependency for internal use
Edge of a private CDN | Self-built CDN using NGINX boxes per geographic POP | Full control over routing, caching, peering decisions | 10s of POPs, GB/sec aggregate | SaaS CDN cheaper unless geo-presence requires it; Varnish lacks TLS without sidecar

Limitations

Per-technology limitation tables with severity badges and workaround cost.

Group A — In-Memory Caches

Redis

Limitation | Severity | Workaround | Workaround Cost
Single-threaded command execution | High | Shard across cluster mode, or use Valkey/KeyDB | App-side cluster-awareness, possible Lua rewrites, hash-tag changes
AGPLv3 / RSALv2 / SSPLv1 license | High | Pay Redis Enterprise, or migrate to Valkey | Either ongoing license fee or migration project (weeks for ops, more for surface-area dependencies)
RAM is the ceiling | High | Redis on Flash (commercial), or aggressive eviction policies | Commercial license; or hit rate degradation under allkeys-lru
BGSAVE forks the process | Medium | Take snapshots on a replica; or disable RDB and rely on AOF only, accepting that AOF rewrite also forks | Replica overhead; AOF-only has different recovery semantics
Cluster multi-key ops restricted | Medium | Hash tags ({tag}key1, {tag}key2) to colocate keys | Schema design constraints; risk of a hot slot if too many keys share one tag
Lua scripts are blocking | Medium | Keep scripts short; use Functions (Redis 7+) for organized libraries | Forced discipline; a long script stalls every client on that node

Memcached

LimitationSeverityWorkaroundWorkaround Cost
No persistenceHighTreat cache loss as a planned event; do rolling restartsApplication-level warmup logic; coordinated deploys
No replicationHighUse consistent hashing on the client; tolerate partition loss on node failureDatabase needs to be able to serve full miss rate during failure
1MB default value limitMediumChunk large values client-side; or raise -I to 128MB maxClient complexity; slab allocator inefficiency at large values
String values onlyMediumSerialize with MessagePack/Protobuf client-sideApp-side serialization complexity; schema migration is harder
250-byte key limitMediumHash long keys (SHA256) on the clientLose key debuggability; cannot use SCAN-like introspection
No atomic compound opsMediumUse CAS for single-key updates; design around per-key atomicityApp-side retry loops; cannot do multi-key transactions
Slab calcificationMediumEnable slab_reassign and slab_automoveBrief throughput dip during reassignment; needs tuning
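The "consistent hashing on the client" workaround in the replication row can be sketched as a hash ring with virtual nodes. This is a Ketama-style illustration, not a wire-compatible Ketama implementation; `HashRing`, the node names, and the choice of 100 virtual nodes are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: each node owns many points on a 64-bit circle."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _point(label):
        digest = hashlib.md5(label.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (self._point(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("cache-b")  # simulate a node failure
after = {k: ring.node_for(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys changed owner")
```

Only keys that lived on the failed node change owner, so the miss spike the database has to absorb is roughly the failed node's share of the keyspace rather than the whole cache.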

Valkey

Limitation | Severity | Workaround | Workaround Cost
Forked from Redis 7.2.4; missing Redis 8 features | Medium | Wait for community port, or accept feature gap | Lose vector sets, newer modules; community port arrives months later
Smaller ecosystem of third-party tools | Medium | Use Redis-compatible tools (RedisInsight, etc.) | Some Redis-specific tooling assumes commercial features; minor compat seams
Multi-DB cluster mode (9.0) is new | Medium | Stay on single-DB cluster mode until 9.x has more production mileage | Lose the multi-tenancy benefit; harder logical separation
Module compatibility incomplete | Medium | Stick to first-party modules (JSON, Bloom) plus core types | RediSearch, RedisTimeSeries equivalents need separate evaluation
RDMA support is experimental | Medium | Use TCP only for production until RDMA hardens | Cannot exploit kernel-bypass perf yet
Command execution is still single-threaded, as in Redis (I/O threads offload network work, not command execution) | High | Cluster mode for horizontal scale | Same cluster-aware client cost as Redis

KeyDB

LimitationSeverityWorkaroundWorkaround Cost
Snap is sole steward; uncertain roadmapCriticalHave a Valkey or Redis migration plan readyMigration project sitting in the backlog; license decision pre-made
No paid support / SLAHighSelf-support via GitHub, or migrate to a supported alternativeEngineering on-call burden grows; no escalation path on data-loss bugs
Diverging from Redis API over timeMediumPin to Redis commands available in 6.2 era; avoid post-7.0 commandsLose feature velocity from Redis ecosystem
Spinlock-based threading is tuning-sensitiveMediumPin CPUs, tune server-threads to physical coresHardware-aware deployment; harder in shared k8s environments
Active-active replication uses LWWHighDesign data model to avoid concurrent same-key writesApp-level partition-by-region or write-quorum logic
Flash tier adds tail-latency varianceMediumProfile workload; disable Flash if access is uniformLose cost savings of flash tier

Group B — HTTP / Edge Caches

Amazon CloudFront

LimitationSeverityWorkaroundWorkaround Cost
Lambda@Edge must deploy from us-east-1MediumAccept the constraint; treat us-east-1 as the deploy regionLambda@Edge availability tied to us-east-1 IAM; us-east-1 outages affect deploys
Purge propagation is minutes, not millisecondsHighUse short TTL and rely on revalidation; or use Origin Shield + invalidations sparinglyShort TTL kills cache hit rate; invalidations have request quota
Per-region pricing is opaqueMediumSample your real geo-distribution; price out by region tierEngineering time on cost modeling; surprise bills in expensive regions
CloudFront Functions limited (CPU, no network)MediumUse Lambda@Edge for complex logic; live with the cost difference6x cost per request, plus duration billing
Vendor lock-in to AWSMediumAbstract edge logic into portable Wasm modules where possibleEngineering effort to keep things portable; mostly aspirational at scale
Real-time logs require KinesisMediumUse 1% sampling at steady state; ramp during incidentsSampling means you miss low-frequency events at full fidelity

Cloudflare

Limitation | Severity | Workaround | Workaround Cost
Single-vendor blast radius (Nov 2025 outage) | Critical | Multi-CDN failover (Cloudflare + CloudFront or Fastly) | Doubled CDN bill plus DNS-failover complexity; meaningful engineering project
KV: 1 write/sec per key cap | High | Use Durable Objects for high-write workloads; KV is read-heavy only | Different mental model; Durable Object cost and availability characteristics differ
Workers limits: 128 MB memory per isolate, 30s CPU per request by default on paid plans (10 ms on Free) | Medium | Decompose into chained workers; offload heavy work to origin | Code complexity; worker-to-worker calls add latency
Cache API ephemeral per-POP | Medium | Use Tiered Cache or fall back to KV for shared cache | Tiered Cache adds latency; KV is eventually consistent
ToS 2.8 unmetered-bandwidth ambiguity | Medium | Pre-negotiate enterprise contract if traffic profile is non-website | Enterprise contract pricing; loss of free-tier predictability
Bot Management can over-block legitimate traffic | Medium | Tune bot rules; whitelist known crawlers and monitoring | Ongoing rule maintenance; risk of regression after default changes

Fastly

LimitationSeverityWorkaroundWorkaround Cost
Premium per-GB pricing vs CloudFront/CloudflareHighUse Fastly for dynamic content; offload static media to cheaper CDNMulti-CDN management; per-asset CDN routing decisions
Fewer POPs than CloudflareMediumCombine with another CDN in markets where Fastly is thinMulti-CDN routing logic and DNS
VCL is a niche languageMediumInvest in team VCL training; document patterns internallyHiring is harder; senior VCL engineers are rare
2021 outage reputationMediumDemonstrate post-incident config-rollout improvements during security reviewProcurement friction; need to write the explanation up
Real-time analytics priced separatelyMediumSample; use rolled-up metrics for steady stateReduced debuggability for low-frequency issues
Smaller free tier than CloudflareMediumUse Fastly's developer tier for evaluation; commit on productionNo "free forever" path; cost starts day one for production

Varnish Cache

LimitationSeverityWorkaroundWorkaround Cost
No native TLS terminationHighRun Hitch, HAProxy, or NGINX in frontDoubled ops surface; another process to monitor and patch
No cluster purgeHighUse Varnish Broadcaster or build HTTP-fanout; or pay for Varnish EnterpriseCustom infrastructure or commercial license
You operate the fleetHighUse a SaaS like Fastly (built on customized Varnish) insteadLose self-hosted control and customization depth
VCL is C-level powerful and C-level dangerousMediumStrict VCL review; canary every changeEngineering review process; harder to ship quickly
Custom VMODs are unsafe C extensionsHighStick to community-maintained VMODs; avoid bespoke C in cache processLose extensibility benefit that drove the choice to Varnish
No persistent cache across restart (in OSS)MediumUse Varnish Enterprise MSE (massive storage engine)Commercial license required

NGINX

LimitationSeverityWorkaroundWorkaround Cost
Caching is a feature, not the focusMediumUse Varnish or Fastly for cache-heavy workloadsExtra process or vendor in the stack
No real programmability without LuaMediumUse OpenResty for Lua; or NGINX Plus dynamic modulesEither Lua language overhead, or NGINX Plus license
No native cluster purgeHighBuild HTTP fanout for invalidation; or use NGINX Plus cache APICustom infrastructure; or commercial license
Some features Plus-only (active health, dynamic config)MediumPay for NGINX Plus, or use OpenResty/custom scriptingLicense cost; or engineering time
proxy_cache uses disk; flash wears under heavy writesMediumUse enterprise-grade NVMe; rotate disks in fleet maintenanceHardware cost; fleet management overhead
Worker tuning is non-obviousMediumTune worker_processes and worker_connections to actual cores and connection patternPerformance engineering time; load testing

Fault Tolerance

Per-group matrix tables. Rows are fault-tolerance dimensions; columns are technologies.

Group A — In-Memory Caches
Dimension | Redis | Memcached | Valkey | KeyDB
Replication model | Leader-follower async (semi-sync via WAIT) | None in OSS; repcached is third-party | Leader-follower async (PSYNC-based, inherited from Redis) | Active-active (multi-master) or leader-follower
Failure detection | Sentinel quorum-based heartbeats; Cluster gossip in cluster mode | Client-side only; no server detection | Same as Redis (Sentinel or Cluster gossip) | Cluster gossip plus active-replication heartbeats
Failover mechanism | Sentinel-driven leader election; Cluster mode slot reassignment | Client rehashes; affected keys cold-miss | Same as Redis | Promote replica or rely on active-active peer
RTO (typical) | 10-30s (Sentinel); 1-15s (Cluster failover) | 0s for survivors; cold-start time for replaced node | Similar to Redis; ~10-15s | Sub-second on active-active failover
RPO (typical) | Async replication lag (ms-seconds); higher with AOF everysec | Full loss of failed node's data | Same as Redis | Last-write-wins may cause partial data inconsistency
Split-brain behavior | Cluster quorum prevents most; Sentinel min-replicas-to-write helps | N/A (no replication) | Same as Redis | Both sides accept writes; conflicts resolved via LWW (data loss possible)
Blast radius of single-node failure | Slots owned by failed node (1/N of keyspace) until replica promoted | Keys hashed to failed node (1/N) become misses until rehash | Same as Redis | Lower (peer keeps serving); depends on hash slot ownership
Cross-region failover | Manual; or use Redis Enterprise CRDB (commercial) | Application-level (multi-cluster, route on miss) | Same as Redis | Active-active across regions is the headline feature
Data loss scenarios | Writes not yet replicated are lost on leader failure; crash in the middle of an AOF rewrite | Any node restart, any process crash | Same as Redis | Conflicting concurrent writes; LWW silently drops one
Group B — HTTP / Edge Caches
DimensionCloudFrontCloudflareFastlyVarnish (self-hosted)NGINX (self-hosted)
Replication modelPer-POP independent cache; tiered/Origin Shield optionalPer-POP cache plus Tiered Cache and ArgoPer-POP cache plus shield POPsNone native (per-node); use Broadcaster for fanoutNone native (per-node); manual sync needed
Failure detectionAWS-internal anycast withdrawal on POP failureAnycast plus DNS plus health-aware routingBGP anycast plus health checksYour LB does itYour LB does it
Failover mechanismAnycast routes around bad POPs automaticallyAnycast plus traffic engineeringAnycast plus traffic engineeringL4 LB removes unhealthy node from poolL4 LB removes unhealthy node from pool
RTO (typical)Seconds for anycast convergenceSeconds for anycast convergenceSeconds for anycast convergence5-30s depending on LB health-check interval5-30s depending on LB health-check interval
RPO (typical)N/A (cache rebuilds from origin)N/A (cache rebuilds from origin)N/A (cache rebuilds from origin)N/AN/A
Split-brain behaviorDifferent POPs may serve different cached versions brieflySame; eventual consistency on cache stateSame; instant purge mitigatesEach node has independent cache; client may see version skewSame as Varnish
Blast radius of single-POP failureUsers in that geo route to next POP; brief RTT increaseSameSame; with fewer POPs, the next-POP RTT delta is largerLocal to the data center; all traffic routes through LBSame as Varnish
Cross-region failoverAnycast; or Route 53 health-aware recordsAnycast handles it transparentlyAnycast handles it transparentlyDNS-based (Route 53, etc.) at slower RTODNS-based at slower RTO
Data loss scenariosN/A for cache; origin is source of truth2025-11-18: bad config crashed proxy processes globally for hours2021: bad config triggered global outage ~1hrCrash plus bad VMOD can corrupt local cache stateCrash without graceful shutdown can leave partial cache files

Sharding

Group A — In-Memory Caches
Dimension | Redis | Memcached | Valkey | KeyDB
Sharding model | Hash slots (16384); client- or proxy-aware in cluster mode | Client-side consistent hashing (Ketama) | Hash slots (16384) inherited from Redis Cluster | Hash slots (16384) Redis-compatible
Shard key constraints | Multi-key ops require keys in same slot (hash tag pattern) | Each key independent; no cross-key ops anyway | Same as Redis | Same as Redis
Rebalancing mechanism | CLUSTER SETSLOT migration, slot-by-slot | Add/remove node, client recomputes ring | Atomic slot migration (9.0); previously slot-by-slot | Slot-by-slot Redis-compatible migration
Rebalancing cost / impact | Live migration; brief ASK redirects while a slot is moving, MOVED once it has moved | Cache miss spike during ring change; warmup required | Atomic migration in 9.0 reduces redirect churn | Similar to Redis
Hot-shard behavior | One slot saturates one CPU; hot key has no automatic mitigation | One node saturates; consistent hashing spreads keys across nodes but not the load of a single hot key | Same as Redis | Multi-threading helps within a hot shard, but hot key still limited
Maximum shards (practical) | ~1000 nodes per cluster; gossip overhead grows quadratically | Hundreds of nodes (no inter-node coordination) | Same as Redis | Hundreds of nodes
Resharding without downtime? | Yes, online with brief ASK/MOVED redirects | Yes, but warm-up cost on rehash | Yes, faster with atomic slot migration | Yes
Cross-shard query support | None natively; multi-key ops require same slot | None (no compound ops anyway) | Same as Redis | Same as Redis
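The hash-slot and hash-tag rules in the table can be made concrete with a small slot calculator. This is a sketch following the published Redis Cluster key-distribution rule (CRC16 of the key, or of the hash tag if one is present, modulo 16384); the function names are illustrative, and Python's `binascii.crc_hqx` with an initial value of 0 is used as the CRC16 variant.

```python
import binascii

SLOTS = 16384


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed.

    If the key contains '{' followed later by '}', with at least one byte
    between them, only that substring is hashed. Otherwise the whole key is.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Slot number in the range 0..16383 for a key."""
    return binascii.crc_hqx(hash_tag(key.encode("utf-8")), 0) % SLOTS


profile = key_slot("user:{42}:profile")
sessions = key_slot("user:{42}:sessions")
untagged = key_slot("user:42:profile")
print(profile == sessions)   # keys sharing the tag land in the same slot
print(profile, untagged)     # an untagged key hashes on the full key instead
```

Because only the tag is hashed, every key that shares a tag is pinned to one slot and therefore one node, which is what makes multi-key operations legal and also what creates a hot slot when the tag is too popular.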
Group B — HTTP / Edge Caches
DimensionCloudFrontCloudflareFastlyVarnish (self-hosted)NGINX (self-hosted)
Sharding modelPer-POP independent cache; same key may exist on 100s of POPsPer-POP plus optional Tiered Cache hierarchyPer-POP plus shield POP hierarchyPer-node (you decide cluster topology)Per-node (you decide cluster topology)
Shard key constraintsCache key built from URL plus Vary headersSame; URL plus cache-key rulesVCL controls cache key explicitlyVCL controls cache key explicitlyNGINX proxy_cache_key directive
Rebalancing mechanismN/A; cache rebuilds organically from originN/A; sameN/A; sameManual (deploy new fleet, drain old)Manual (deploy new fleet, drain old)
Rebalancing cost / impactAdding POPs has no migration cost (each warms from origin)SameSameCold-cache warmup as traffic shiftsSame
Hot-shard behaviorSingle hot URL multiplied across all POPs; rarely a problemSame; Tiered Cache helpsSame; shield POP consolidatesHot URL on one node; LB can rebalanceSame as Varnish
Maximum shards (practical)1600+ POPs; bounded by AWS335+ cities; bounded by Cloudflare~100 POPs but each is biggerSelf-imposed; typically 5-50 nodes per DCSame
Resharding without downtime?Implicit (AWS adds POPs without customer action)ImplicitImplicitYes, by draining one node at a timeYes, by draining one node at a time
Cross-shard query supportN/A (cache is per-POP)Tiered Cache promotes a fetch to a higher-tier POPOrigin Shielding consolidates fetchesN/A; LB chooses one nodeN/A; LB chooses one node
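The "cache key built from URL plus Vary headers" row can be illustrated with a small key builder. This is a sketch of the idea rather than any CDN's actual algorithm: `cache_key` is a hypothetical helper, and sorting query parameters and lowercasing the host are normalization choices assumed here, not defaults you can rely on in every product.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, request_headers, vary=()):
    """Build a cache key from host, path, normalized query, and varied headers."""
    parts = urlsplit(url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    varied = [
        f"{name.lower()}={headers.get(name.lower(), '')}"
        for name in sorted(vary, key=str.lower)
    ]
    return "|".join([parts.netloc.lower(), parts.path or "/", query, *varied])


key_a = cache_key(
    "https://Example.com/a?b=2&a=1",
    {"Accept-Encoding": "gzip", "User-Agent": "curl/8"},
    vary=["Accept-Encoding"],
)
key_b = cache_key(
    "https://example.com/a?a=1&b=2",
    {"accept-encoding": "gzip", "User-Agent": "Mozilla/5.0"},
    vary=["Accept-Encoding"],
)
print(key_a)
print(key_a == key_b)
```

Headers that are not listed in `vary` (User-Agent here) do not fragment the cache, which is the main lever for hit rate: every header added to the key multiplies the number of copies each POP or node has to hold.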

Replication

Group A — In-Memory Caches
DimensionRedisMemcachedValkeyKeyDB
Replication topologyLeader-follower (single-leader)None in OSSLeader-follower (single-leader)Multi-leader (active-active) or leader-follower
Sync vs asyncAsync by default; WAIT for semi-sync guaranteeN/AAsync by default; WAIT supportedAsync multi-master replication
Replication factor (default / max)Default 1 replica; practical max 3-5 replicas per masterN/ASame as RedisMultiple active peers (2-5 typical)
Consistency level optionsEventual (default); semi-sync via WAIT N TIMEOUTN/ASame as RedisEventual (LWW conflict resolution)
Replication lag (typical)Sub-ms in same AZ; ms-seconds cross-region asyncN/ASame as RedisSub-ms to ms (depends on network)
Conflict resolutionN/A (single leader prevents conflicts)N/AN/A (single leader)Last-write-wins by timestamp
Cross-region replicationManual or Redis Enterprise CRDB (CRDTs, commercial)Application-level onlyManual same as RedisBuilt-in active-active replication is the headline
Replication during partitionReplica stops receiving updates; can be promoted by SentinelN/ASame as RedisBoth sides accept writes; reconciled via LWW on heal
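The last-write-wins behavior in the KeyDB column can be sketched as a merge of two regions' writes after a partition heals. This illustrates the general LWW rule only; it is not KeyDB's implementation, and the `Write` record, the region-name tie-break, and the sample data are assumptions made for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Write(NamedTuple):
    value: str
    timestamp: float
    region: str


def lww_merge(a: Dict[str, Write], b: Dict[str, Write]) -> Tuple[Dict[str, Write], List[Tuple[str, Write]]]:
    """Merge two replicas; the newer write per key wins, the older is dropped."""
    merged = dict(a)
    dropped: List[Tuple[str, Write]] = []
    for key, incoming in b.items():
        current = merged.get(key)
        if current is None:
            merged[key] = incoming
        elif (incoming.timestamp, incoming.region) > (current.timestamp, current.region):
            dropped.append((key, current))
            merged[key] = incoming
        elif incoming != current:
            dropped.append((key, incoming))
    return merged, dropped


us = {"cart:7": Write("2 items", 100.0, "us")}
eu = {
    "cart:7": Write("3 items", 100.2, "eu"),
    "cart:9": Write("1 item", 99.0, "eu"),
}

merged, dropped = lww_merge(us, eu)
print(merged["cart:7"].value)
print(dropped)
```

The `dropped` list is the part a real system never shows you: the losing write was acknowledged to its client and then discarded, which is why the tables flag LWW as a data-loss scenario rather than a consistency nuance.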
Group B — HTTP / Edge Caches
Dimension | CloudFront | Cloudflare | Fastly | Varnish (self-hosted) | NGINX (self-hosted)
Replication topology | None for cache (each POP independent); origin is source of truth | Same; Tiered Cache adds a hierarchy | Same; Origin Shielding adds a hierarchy | None native | None native
Sync vs async | N/A (no inter-POP replication of cache state) | N/A | N/A | N/A | N/A
Replication factor (default / max) | Effectively unlimited (each POP can cache same object) | Same | Same | Per-node; typically 2-5 nodes per DC | Per-node; typically 2-5 nodes per DC
Consistency level options | TTL plus purge (invalidation request, seconds to minutes to propagate) | TTL plus instant purge plus tag-based purge | TTL plus sub-150ms instant purge by surrogate key | TTL plus PURGE (per node) plus surrogate keys | TTL plus PURGE (per node, via NGINX Plus or a third-party module); no native tags
Replication lag (typical) | Invalidations: 5-60s typical; up to minutes | Cache purge: seconds; KV writes: up to 60s globally | Instant purge: under 150ms globally | N/A (no replication); per-node purge is local | Same as Varnish
Conflict resolution | N/A (cache is a read-only copy of origin) | N/A | N/A | N/A | N/A
Cross-region replication | Inherent (every POP independently caches from origin) | Inherent | Inherent | Build your own (multi-DC fleet) | Build your own
Replication during partition | POPs operate independently; a POP partitioned from origin keeps serving cached objects until their TTL expires | Same; stale-while-revalidate is configurable | Grace and stale-while-revalidate handle origin partitions cleanly | Grace + stale-while-revalidate (VCL) | NGINX has proxy_cache_use_stale for similar behavior
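The grace and stale-while-revalidate behavior in the last row reduces to a small decision over an object's age. The sketch below is a simplified model, not Varnish's or Fastly's actual state machine; `decide`, the `Decision` names, and the `stale_if_error` window are illustrative assumptions.

```python
from enum import Enum


class Decision(Enum):
    FRESH = "serve from cache"
    STALE_REVALIDATE = "serve stale, refresh in background"
    STALE_IF_ERROR = "serve stale because origin is unreachable"
    MISS = "fetch from origin"


def decide(age, ttl, grace, origin_healthy=True, stale_if_error=0.0):
    """Pick a serving strategy for a cached object that is `age` seconds old."""
    if age <= ttl:
        return Decision.FRESH
    overdue = age - ttl
    if overdue <= grace:
        # Within the grace window the client never waits on the origin.
        return Decision.STALE_REVALIDATE
    if not origin_healthy and overdue <= stale_if_error:
        # Past grace, stale content is only acceptable while the origin is down.
        return Decision.STALE_IF_ERROR
    return Decision.MISS


for age in (30, 90, 300):
    print(age, decide(age, ttl=60, grace=120).name)
print(300, decide(300, ttl=60, grace=120, origin_healthy=False, stale_if_error=600).name)
```

Separating the two windows is the design point: grace hides refresh latency on a healthy origin, while the longer error window trades freshness for availability only during a partition.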

Better Usage Patterns

Patterns most teams miss. What gets called out in code review at L6/L7.

Group A — In-Memory Caches

Redis

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| Cache stampede prevention | TTL expires, every request races to recompute, origin gets hammered | Use SET NX EX lock or probabilistic early refresh (XFetch) | One cold key during peak traffic can cascade to a full outage; this pattern is the most common Redis-related production fire |
| Pipelining for bulk operations | One SET per call, app-side latency dominated by RTT | Pipeline 100-1000 ops; or use MSET/MGET for atomic batches | Single-trip latency to Redis is ~0.5ms; 1000 separate calls is 500ms; pipelined is 1-2ms |
| Avoid O(N) commands on production | KEYS *, SMEMBERS on huge sets, HGETALL on huge hashes | Use SCAN, SSCAN, HSCAN with COUNT bounds | O(N) commands block the single thread; one large SMEMBERS can stall every other client for seconds |
| Use hash tags for cluster colocation | Cluster mode adopted, multi-key ops break, half the Lua scripts return CROSSSLOT | Design keys as user:{user_id}:profile, user:{user_id}:sessions with shared tag | An unprepared app fails quietly in cluster mode: multi-key ops return CROSSSLOT errors and the app sees half-success |
| Connection pooling, not per-request | App opens a connection per request; Redis hits maxclients ceiling | Use lazy pool (Jedis pool, lettuce shared, ioredis cluster client) | Connection setup is 0.5-2ms; on a hot path, it doubles every cache call |
| Set maxmemory and a sane eviction policy | Default noeviction; writes start failing when memory fills | maxmemory set to 80% of node RAM, maxmemory-policy allkeys-lru for cache use | Under noeviction, Redis rejects writes with OOM errors once memory fills; apps that swallow cache errors never notice, and the cache becomes a write-failure layer |
| Use replicas for read-scaling carefully | Reads go to async replicas; users see stale data, application doesn't expect it | Mark replica reads as "read-stale-acceptable"; route consistent reads to leader | Replicas can lag seconds under load; if your app assumes read-your-writes, it will silently break |
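
The probabilistic early refresh (XFetch) idea from the stampede row can be sketched in a few lines. This is a minimal model, not a production client: the module-level `_store` dict stands in for Redis, and `fetch`, `should_refresh_early`, and the `beta` default are illustrative names and values, not part of any Redis library. The rule is: recompute early when `now - delta * beta * ln(rand)` reaches the expiry, where `delta` is how long the last recompute took.

```python
import math
import random
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Stand-in for Redis (assumption): key -> (value, recompute_seconds, expiry_epoch).
_store: Dict[str, Tuple[Any, float, float]] = {}


def should_refresh_early(expiry: float, delta: float, beta: float = 1.0,
                         now: Optional[float] = None,
                         rand: Callable[[], float] = random.random) -> bool:
    """XFetch test: True when this caller should recompute before expiry."""
    now = time.time() if now is None else now
    r = max(rand(), 1e-12)  # avoid log(0)
    # log(r) is negative, so the subtraction pushes "now" forward by a random
    # amount that scales with how expensive the recompute is.
    return now - delta * beta * math.log(r) >= expiry


def fetch(key: str, ttl: float, recompute: Callable[[], Any], beta: float = 1.0) -> Any:
    """Return the cached value, recomputing on a miss or an early-refresh roll."""
    entry = _store.get(key)
    if entry is not None:
        value, delta, expiry = entry
        if not should_refresh_early(expiry, delta, beta):
            return value
    started = time.time()
    value = recompute()
    delta = time.time() - started
    _store[key] = (value, delta, time.time() + ttl)
    return value
```

Because each caller rolls independently, refreshes spread out ahead of the expiry instead of all landing on the same instant. Raising `beta` above 1.0 refreshes earlier; lowering it refreshes later.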

Memcached

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| Use CAS for safe updates | GET then SET, two clients overwrite each other's update | Use gets (returns CAS token) then cas (conditional set) | Without CAS, multi-step updates are inherently racy; lost-update bugs hide for months |
| Cache version prefix in key | Schema change requires full cache flush; thundering herd | Prefix key with serialization version: v3:user:123 | Version bump invalidates old keys gracefully; new and old can coexist during deploy |
| Enable slab automove | Slab rebalancing left off (the default before memcached 1.5.0); one slab class fills, others sit empty, hit rate degrades | Start memcached with -o slab_reassign,slab_automove=1; confirm with stats settings | Slab calcification silently drops hit rate by 20-40% over weeks; few teams notice |
| Use binary protocol for high RPS | Text protocol default; parsing overhead at high throughput | Use binary protocol clients (libmemcached, pylibmc with binary=True); on memcached 1.6+ also evaluate the meta text protocol, since upstream has deprecated the binary protocol | Binary protocol is roughly 2x faster on parse; matters at 100K+ ops/sec/client |
| Plan for rolling restart | Fleet-wide restart means 100% cache miss, and the DB falls over | Restart one node at a time; warm new fleet before swapping | Deploys are routine; cold-cache outage on deploy is the most preventable Memcached fire |
| Set TTL aggressively | No TTL set; LRU eviction silently drops keys at unpredictable times | Always set explicit TTL (even 86400 for "daily") | Explicit TTL makes cache behavior predictable; with LRU only, whichever key was touched least recently gets evicted, which is hard to reason about |
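
The gets-then-cas loop from the CAS row, sketched against an in-memory stand-in so it runs without a server. `FakeCasStore` and `update_with_cas` are hypothetical names for illustration; the return convention (True on success, False on a stale token, None when the key is gone) is modeled on how common Python clients report `cas` results, so check your client's documentation before relying on it.

```python
from typing import Callable, Dict, Optional, Tuple


class FakeCasStore:
    """In-memory stand-in (assumption) exposing Memcached-style gets/cas."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, int]] = {}
        self._next_token = 1

    def set(self, key: str, value: bytes) -> None:
        # Every write gets a fresh token, like Memcached's CAS unique.
        self._data[key] = (value, self._next_token)
        self._next_token += 1

    def gets(self, key: str) -> Tuple[Optional[bytes], Optional[int]]:
        return self._data.get(key, (None, None))

    def cas(self, key: str, value: bytes, token: int) -> Optional[bool]:
        current = self._data.get(key)
        if current is None:
            return None       # key was evicted or deleted
        if current[1] != token:
            return False      # another writer got there first
        self.set(key, value)
        return True


def update_with_cas(store, key: str, mutate: Callable[[bytes], bytes],
                    max_retries: int = 5) -> bool:
    """Read-modify-write that retries when another client wins the race."""
    for _ in range(max_retries):
        value, token = store.gets(key)
        if value is None:
            return False      # nothing to update; caller decides how to seed
        if store.cas(key, mutate(value), token):
            return True
    return False
```

The retry cap matters: under heavy contention on one key, an unbounded loop turns a lost-update bug into a latency bug.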

Valkey

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| All Redis patterns apply | Treat Valkey as a different product, miss decades of Redis lore | Apply Redis best practices (stampede prevention, pipelining, hash tags) directly | Valkey is wire-compatible; the operational patterns transfer 1:1 |
| Pin to features in your fork window | Use Redis 8 features (vector sets) and assume they will land in Valkey | Stick to features available in Valkey 8.x at adoption; track Valkey roadmap for newer needs | Valkey forked from Redis 7.2.4; anything Redis shipped after that has to be ported or reimplemented. Don't build on features that don't exist yet. |
| Use first-party modules | Assume RediSearch / RedisGraph will work; surprise on adoption | Use Valkey-Bloom and Valkey-JSON (AWS/Google contributed); evaluate alternatives for search | First-party modules are BSD; commercial Redis modules have license incompatibility |
| Enable I/O threading deliberately | Set io-threads high "for performance"; spinlocks degrade throughput | Set io-threads to ~half of physical cores; benchmark; iterate | I/O threading is a single dial that can make or break perf; default is conservative |
| Adopt 9.0 features carefully | Atomic slot migration and multi-DB cluster mode adopted on day one for prod | Stay on 8.1 LTS for critical workloads; pilot 9.0 features on a non-critical cluster first | 9.0 shipped in October 2025; major features need 6+ months of fleet miles before mission-critical use |
| Plan migration from Redis with TLS settings in mind | Drop-in migration breaks because TLS ciphers or auth modes differ | Test in staging with prod TLS config; many managed-Valkey vendors wrap auth differently | The "drop-in" promise is true for wire protocol; auth and TLS often need glue code |
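
Hash tags are one of the Redis patterns that carry over unchanged, and the slot math is small enough to check locally. The sketch below follows the key-slot rule in the Redis Cluster specification (CRC16/XMODEM of the key, or of the first non-empty `{...}` section, modulo 16384); `key_hash_slot` is an illustrative helper name, and the example keys are made up.

```python
import binascii


def key_hash_slot(key: str) -> int:
    """Compute the cluster slot for a key, honoring the hash-tag rule."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is CRC16/XMODEM.
    return binascii.crc_hqx(data, 0) % 16384


# Keys sharing a tag land in the same slot, so multi-key ops stay legal.
same_slot = key_hash_slot("user:{42}:profile") == key_hash_slot("user:{42}:sessions")
```

Running this over a sample of real key names before enabling cluster mode is a cheap way to find the multi-key operations that would return CROSSSLOT.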

KeyDB

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| Tune server-threads to physical cores | Set threads equal to vCPUs in cloud VMs; spinlocks contend | Set threads to physical cores; pin CPUs; avoid noisy-neighbor VMs | KeyDB's spinlock model assumes exclusive cores; oversubscription destroys the perf claim |
| Plan for migration off KeyDB | Treat KeyDB as a long-term stable choice | Have a Valkey or Redis migration plan documented; pin to Redis 6.2 compatible features | Snap's strategic ambiguity means roadmap risk; the option value of being migration-ready is real |
| Avoid concurrent same-key writes across active-active peers | Trust active-active to "just work"; LWW silently loses writes | Partition writes by key prefix per region; or use single-leader with fast promotion | LWW is fine for cache (recompute on miss); for anything close to durable state, it's a foot-gun |
| Treat Flash tier as cost optimization, not capacity | Enable Flash to "fit more data"; tail latency suffers | Use Flash only when the workload is genuinely hot-cold; benchmark P99 vs RAM-only | Flash hits are 10-50x slower; for uniform access, you've added variance for no benefit |
| Self-support readiness | Assume GitHub issues will get answered | Internal runbook for common KeyDB failures; budget for self-debugging | No SLA, no commercial escalation; production support is your team's responsibility |
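
Why last-write-wins (LWW) loses writes is easiest to see in a toy merge. This is a simplified model of timestamp-based conflict resolution, not KeyDB's replication code; `lww_merge`, the region names, and the sample values are all illustrative.

```python
from typing import Dict, Tuple

Write = Tuple[float, str]  # (timestamp, value)


def lww_merge(a: Dict[str, Write], b: Dict[str, Write]) -> Dict[str, Write]:
    """Merge two replicas; for each key, the later timestamp wins outright."""
    merged = dict(a)
    for key, write in b.items():
        if key not in merged or write[0] > merged[key][0]:
            merged[key] = write
    return merged


# Two regions accept a write to the same key 200ms apart.
us_east = {"cart:7": (100.0, "add item A")}
eu_west = {"cart:7": (100.2, "add item B")}
converged = lww_merge(us_east, eu_west)
# converged["cart:7"] holds only "add item B"; item A is gone, with no error.

# Partitioning writes by key prefix per region removes the conflict entirely.
us_only = {"us:cart:7": (100.0, "add item A")}
eu_only = {"eu:cart:7": (100.2, "add item B")}
partitioned = lww_merge(us_only, eu_only)
```

For a pure cache, the lost write just means one extra recompute. For carts, counters, or sessions, it is data loss.
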

Group B — HTTP / Edge Caches

Amazon CloudFront

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| Default to CloudFront Functions over Lambda@Edge | Use Lambda@Edge for everything because it's "more capable" | Start with CloudFront Functions; escalate to Lambda@Edge only on network or SDK need | For 1B requests/month, Functions cost $100 vs Lambda@Edge $600+; the default choice is a 6-10x cost gap |
| Use Origin Shield deliberately | Enable everywhere for "better cache hit" | Enable only when origin is expensive (Lambda, complex queries) or cache hit ratio is low | Origin Shield adds a hop and a per-GB charge; for high-cache-hit static workloads it's a net negative |
| Normalize cache keys to reduce variance | Default cache key includes all query strings, headers; cache hit ratio is terrible | Use CloudFront Functions to canonicalize; whitelist only meaningful query params | One uncontrolled query param can cut hit rate by 80%; classic engineering miss |
| Use signed cookies, not signed URLs, for sessions | Sign every URL; user shares URL, downstream cache hit suffers | Sign cookies for session-bounded access; URLs stay cacheable | Signed URLs are per-user; signed cookies allow caching the same URL across the user's session |
| Route via Origin Failover | Single origin, no fallback; origin outage equals user-visible 5xx | Configure origin groups with failover criteria (502, 504, etc.) | One config change, multi-origin resilience without app changes |
| Use cache policies, not legacy "Cache Based on Selected Request Headers" | Use the legacy radio buttons; cache key is messy | Define explicit cache policies as IaC (CDK, Terraform); version them | Cache policy is reusable across distributions; legacy mode forces per-distribution tweaks and drift |
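
The cache-key normalization row reduces to one rule: keep only the query parameters that change the response, and emit them in a fixed order. The sketch below models that rule in Python for clarity; real CloudFront Functions are written in JavaScript, and `ALLOWED_PARAMS` and `normalize_cache_key` are hypothetical names with a made-up allowlist.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: the only params that actually change the response.
ALLOWED_PARAMS = frozenset({"page", "sort", "lang"})


def normalize_cache_key(url: str, allowed=ALLOWED_PARAMS) -> str:
    """Drop non-allowlisted query params and sort the rest for a stable key."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed
    )
    # Scheme and host are omitted here on the assumption that they are
    # constant per distribution; add them back if they vary.
    return urlunsplit(("", "", parts.path or "/", urlencode(kept), ""))


key_a = normalize_cache_key("https://example.com/list?utm_source=x&sort=asc&page=2")
key_b = normalize_cache_key("https://example.com/list?page=2&sort=asc&fbclid=abc")
# key_a and key_b are identical, so both requests share one cache entry.
```

The same logic applies to the NGINX map-based normalization later in this section; only the implementation language changes.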

Cloudflare

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| Multi-CDN failover, not Cloudflare-only | Trust the SLA; nothing else needed | Active-passive with CloudFront or Fastly; DNS-failover or active-active anycast | 2025-11-18 outage took down sites globally for hours; multi-CDN is the proven mitigation |
| Use Cache API for hot per-POP data, KV for global reference | Use KV for everything because it's "the data layer" | Cache API for fresh fetches in same POP; KV for config / reference data | KV has 1 write/sec/key and up to ~60s of global propagation lag; Cache API has neither |
| Workers Smart Placement for origin-heavy workloads | Workers run at every POP, including ones far from origin | Enable Smart Placement to colocate Worker with origin region | For workloads where the Worker mostly calls origin, placing it next to origin reduces P99 significantly |
| Use Tiered Cache for low-hit-rate origins | Trust default per-POP cache | Enable Tiered Cache (Smart Tiered Cache topology) so misses funnel through fewer POPs | Per-POP miss multiplies origin load by N; tiered cache reduces it by an order of magnitude |
| Pin Bot Management whitelist for known crawlers | Trust the defaults; monitoring or AI crawlers get challenged | Whitelist known user-agents and IPs; tune rules per route | Cloudflare's Bot Management defaults block more than people realize; over-blocking is a silent bug source |
| Plan for ToS section 2.8 implications | Build a media-streaming product on Free / Pro plan | Negotiate enterprise contract before traffic profile shifts | The unmetered-bandwidth promise has limits; surprise enterprise sales call mid-launch is bad |

Fastly

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| Use surrogate keys for cache invalidation | Invalidate by URL pattern; brittle, hard to map content to URL | Tag responses with surrogate keys; purge by tag | Surrogate-key purge is Fastly's killer feature; URL-based purge is fragile and slow |
| Origin Shielding placement | Use the default shield POP; not necessarily closest to origin | Set shield to the POP closest to origin region | Shield-to-origin latency directly affects cache miss tail latency |
| VCL discipline | Pile logic into vcl_recv; complex if/else trees | Use Fastly's subroutine convention; split logic by phase (recv, hash, fetch, deliver) | VCL phases have semantic meaning; mixing them causes subtle bugs (e.g., cache key set after lookup) |
| Compute@Edge for the right workloads | Rewrite everything in Compute@Edge because "Wasm is the future" | Use Compute@Edge for request enrichment, A/B routing, complex auth; keep simple caching in VCL | Wasm cold-start is near-zero but each invocation has cost; VCL is free per-request |
| Use Edge Dictionaries for config | Hardcode redirect maps and feature flags in VCL | Use Edge Dictionaries; update without redeploying VCL | VCL deploys go through compile; dictionary updates are near-instant via API |
| Stale-while-revalidate aggressively | Short TTL plus no stale handling; origin sees miss-storms | Long TTL plus stale-while-revalidate plus surrogate-key purge | Combining long TTL with fast invalidation is the whole point of Fastly; lots of teams underuse this |
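
Surrogate-key purge is, at its core, a reverse index from tag to cached URLs. The sketch below models that index in memory to show why purge-by-tag beats purge-by-URL-pattern; `TaggedCache` and the example tags are illustrative and say nothing about how Fastly implements it internally. On the wire, the tags would arrive as a space-separated `Surrogate-Key` response header from origin.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy cache with a tag -> URLs index for purge-by-tag."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: bytes, surrogate_keys: Iterable[str]) -> None:
        self._objects[url] = body
        for tag in surrogate_keys:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._objects.get(url)

    def purge_tag(self, tag: str) -> int:
        """Remove every object carrying the tag; return how many were purged."""
        purged = 0
        for url in self._urls_by_tag.pop(tag, set()):
            if self._objects.pop(url, None) is not None:
                purged += 1
        return purged


cache = TaggedCache()
cache.put("/p/1", b"shoe page", ["product-1", "category-shoes"])
cache.put("/c/shoes", b"listing", ["category-shoes"])
cache.put("/p/2", b"hat page", ["product-2"])
purged = cache.purge_tag("category-shoes")  # drops /p/1 and /c/shoes only
```

The application never needs to know which URLs render a given product; it purges the tag and the index does the mapping. That is what makes long TTLs safe.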

Varnish Cache

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| VCL phase discipline | Cram all logic into vcl_recv | Place logic in the right phase: vcl_recv, vcl_hash, vcl_backend_fetch, vcl_deliver | Phases run in a specific order with specific available variables; misplacement causes silent misbehavior |
| Use surrogate keys (xkey VMOD) | Purge by URL pattern; brittle, slow | Tag responses with surrogate keys via xkey; purge by tag | The single highest-leverage Varnish pattern; most production Varnish fleets that don't use it should |
| Cluster purge fanout | PURGE on one node; assume cluster-wide effect | Use Varnish Broadcaster or custom HTTP-fanout to propagate | Without fanout, multi-node clusters serve inconsistent content; user-visible bug |
| Use std.log for VCL debugging, not regsub-heavy headers | Add 20 debug headers in vcl_deliver; pollute responses | Use std.log + varnishlog for debugging; ship logs separately | Production responses with debug headers are noise; logs are a better audit trail |
| Grace and keep tuning | Default 10s grace; origin outage triggers user-visible errors | Set beresp.grace to minutes-hours; combine with stale-while-revalidate | Grace is Varnish's resilience superpower; underused by teams used to TTL-only thinking |
| Always run a TLS sidecar (Hitch, NGINX, HAProxy) | Try to terminate TLS in Varnish; discover it's not supported | Hitch on same host for TLS; pass cleartext to Varnish via UDS | The pattern is so universal it should be the default in any Varnish runbook |
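
The grace row comes down to a three-way decision on object age. The sketch below is a simplified model of that decision, written in Python rather than VCL; it ignores `keep`, backend health, and request coalescing, and the `Decision`, `CachedObject`, and `lookup` names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    HIT = "hit"      # fresh: serve from cache
    STALE = "stale"  # past TTL but within grace: serve stale, refresh in background
    MISS = "miss"    # past TTL + grace: the client waits on origin


@dataclass
class CachedObject:
    stored_at: float  # epoch seconds when the object was cached
    ttl: float        # seconds the object is considered fresh
    grace: float      # extra seconds it may be served stale


def lookup(obj: Optional[CachedObject], now: float) -> Decision:
    if obj is None:
        return Decision.MISS
    age = now - obj.stored_at
    if age <= obj.ttl:
        return Decision.HIT
    if age <= obj.ttl + obj.grace:
        return Decision.STALE
    return Decision.MISS


# Ten minutes into an origin outage, with a 60s TTL:
short_grace = lookup(CachedObject(stored_at=0.0, ttl=60.0, grace=10.0), now=600.0)
long_grace = lookup(CachedObject(stored_at=0.0, ttl=60.0, grace=3600.0), now=600.0)
```

With the 10s default the request falls through to a dead origin; with an hour of grace the user still gets a response. That gap is the argument for tuning grace.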

NGINX

| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
| --- | --- | --- | --- |
| Enable proxy_cache_lock for stampede prevention | Cache key expires; 1000 requests hit origin simultaneously | Set proxy_cache_lock on to serialize cache fills | The single most missed NGINX cache directive; one line prevents the most common origin overload |
| Use proxy_cache_use_stale for origin outages | Origin 500s; users see 500s | Configure proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504 | Two lines of config turn cache into resilience layer; the equivalent of Varnish grace |
| Cache key normalization | Default cache key includes full query string; low hit rate | Use map to normalize; cache key includes only meaningful query params | Same impact as on CloudFront: uncontrolled query params kill hit rate |
| Tune worker count to actual cores | worker_processes auto; not always right under containerized limits | Set explicitly to the core count the container is actually allotted; pin with worker_cpu_affinity at high scale | In containers, auto can size to the host's CPU count rather than the cgroup limit; perf suffers silently |
| Health checks via NGINX Plus or third-party | Trust upstream is up; one failing backend gets traffic | NGINX Plus active health checks; or the third-party nginx_upstream_check_module for OSS | Without active health checks, NGINX only knows about failures after a request fails; first request after failure pays the cost |
| Use open_file_cache for static-heavy workloads | Static file metadata lookup per request; stat() bottleneck | Enable open_file_cache to cache fd metadata | For high-RPS static workloads, file metadata cache is the difference between 10K and 100K RPS per worker |
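
What proxy_cache_lock does is usually called single-flight: for a given key, one request fills the cache and the rest wait for that fill instead of hitting origin. The sketch below models the behavior with threads; it is not NGINX's implementation, it omits TTLs and the lock timeout NGINX applies, and `SingleFlightCache` is an illustrative name.

```python
import threading
from typing import Any, Callable, Dict


class SingleFlightCache:
    """Per-key fill lock: concurrent misses trigger exactly one load."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects both dicts

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        with self._guard:
            if key in self._values:
                return self._values[key]
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the key while we waited.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = load()  # only the first thread through reaches origin
            with self._guard:
                self._values[key] = value
                self._locks.pop(key, None)
            return value
```

Without the per-key lock, every concurrent miss calls `load()`, which is the 1000-requests-hit-origin failure from the table. With it, the cost of a cold key is one origin fetch plus some waiting.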

Advanced / Next-Gen Alternatives

What's replacing or augmenting each technology, and what to watch.

Group A — In-Memory Caches

Redis

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Valkey | BSD license, faster community velocity, atomic slot migration | Production | Low (wire-compatible) | If AGPL/RSAL is a blocker; new builds default to Valkey |
| DragonflyDB | Multi-threaded from the ground up, claims 25x throughput on big nodes | Emerging | Medium (Redis API mostly; some semantic gaps) | If single-node throughput is the bottleneck and full Redis-API parity isn't required |
| Microsoft Garnet | C# implementation, RESP-compatible, high throughput, advanced storage tiers | Emerging | Medium | Microsoft-stack shops; experimental for high-end research-grade perf |
| Amazon MemoryDB | Multi-AZ durable Redis with consensus-backed writes | Production | Low | When you need cache plus database guarantees, not just cache |

Memcached

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Valkey / Redis with threaded I/O | Adds replication, persistence, rich types while approaching Memcached throughput | Production | Medium (different API, but more capable) | When you've outgrown raw GET/SET |
| Meta's CacheLib (open-sourced) | In-process embedded cache, used to power Meta's caching | Production | High (embedded library, not service) | When you want cache as a library inside your service, not a separate fleet |
| Aerospike | Hybrid memory plus SSD, sub-ms even at TB scale, multi-DC replication | Production | High (different data model) | When cache size outgrows RAM economics and you still need sub-ms reads |
| Hyperscale alternatives (CacheLib, Meta TAO) | Purpose-built for scale beyond what Memcached envisioned | Production | Very high | When you operate at Meta or Google scale; otherwise stick with Memcached |

Valkey

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Redis 8 (AGPLv3) | Vector sets, newer modules, faster feature velocity | Production | Low (wire-compatible) | If AGPLv3 is acceptable and you need Redis 8 features |
| DragonflyDB | Multi-threaded from scratch, BSL license, drop-in API | Emerging | Medium | When threading model is the bottleneck and BSL is acceptable |
| Amazon MemoryDB for Valkey | Multi-AZ durable Valkey with consensus writes | Production | Low | When you need stronger durability than async replication offers |

KeyDB

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Valkey 8+ with I/O threading | Active LF stewardship, faster community velocity | Production | Low | Anytime; this is the natural migration path |
| DragonflyDB | True multi-threaded execution, not just I/O | Emerging | Low (Redis-compatible) | When you adopted KeyDB specifically for threading and want to push further |
| Redis Enterprise (commercial) | Production-grade Redis with paid support and CRDB for multi-DC | Production | Low | If active-active was the KeyDB feature you needed and now you need vendor support |

Group B — HTTP / Edge Caches

Amazon CloudFront

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Cloudflare | Bigger POP footprint, V8 isolates, free tier, R2 zero egress | Production | Medium (auth re-design, IAM equivalents) | Cost-sensitive workloads, multi-cloud, developer experience matters |
| Fastly | Instant purge, VCL programmability, fewer-larger POPs | Production | Medium (VCL learning curve) | High-write content (news, sports, inventory) where purge speed matters |
| Specialized media CDNs (BlazingCDN, Bunny) | Order-of-magnitude lower per-GB cost for media workloads | Production | Medium (signed URL, DRM rework) | Pure video / large-file delivery where cost dominates the decision |

Cloudflare

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Multi-CDN (Cloudflare + CloudFront/Fastly) | Eliminates single-vendor blast radius (Nov 2025 outage was global) | Production | Medium-high (DNS, config drift, multi-bill) | Mission-critical workloads where 4+ hours of outage is unacceptable |
| Fastly Compute@Edge | Wasm runtime, multi-language, comparable performance | Production | High (rewrite from Workers JS to Wasm) | When you want to move off Cloudflare to a single replacement vendor |
| AWS CloudFront + Lambda@Edge / Functions | AWS-native integration, less single-vendor risk than Cloudflare for AWS-heavy stacks | Production | Medium | If your origin is AWS-deep, CloudFront is the obvious second CDN |

Fastly

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Cloudflare Workers | Larger ecosystem, V8 isolates, more POPs in long-tail markets | Production | High (VCL to JS rewrite) | When VCL programmability is no longer worth the language overhead |
| AWS CloudFront + Lambda@Edge | AWS-native integration, predictable enterprise sales cycle | Production | Medium-high | When AWS-deep origin makes CloudFront's integration story compelling |
| Self-hosted Varnish + own POPs | Total control, no SaaS vendor risk | Production | Very high (operate own CDN) | Very rare: compliance or geography forces self-hosting |

Varnish Cache

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Fastly | Same VCL lineage, fully managed, global anycast | Production | Low-medium (VCL ports with some Fastly-specific changes) | When operating Varnish is no longer worth the engineering effort |
| Varnish Enterprise (commercial) | MSE persistent storage, native clustering, paid support | Production | Low | When you want to stay self-hosted but need enterprise features |
| NGINX Plus | Add caching to existing NGINX without learning VCL | Production | Medium | When team already runs NGINX and Varnish's specialized cache features aren't critical |
| Apache Traffic Server | Yahoo-scale-proven, similar HTTP-cache focus, more permissive config | Production | High (different config language, different ops model) | Rare; mostly for very large CDN-like internal deployments |

NGINX

| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
| --- | --- | --- | --- | --- |
| Envoy | Modern data-plane for service mesh, xDS dynamic config, gRPC native | Production | High (different config model, different operational story) | Microservices / service-mesh context; NGINX feels heavyweight |
| Cloudflare Pingora | Rust-based proxy, multi-threaded, used to power Cloudflare's edge | Emerging | Very high (library, not config; Rust) | When you want NGINX-class perf but the C codebase is the issue |
| HAProxy plus Varnish | Best-of-breed: HAProxy for LB/TLS, Varnish for cache | Production | Medium (operate two services) | When neither role is a side concern; you want specialists |
| Caddy | Automatic HTTPS, simpler config, modern Go-based | Production | Low for simple use cases | SMB and developer-facing deployments where NGINX config feels heavy |

Trade-off analysis built with /tech-tradeoffs-analyzer skill. Each cell is operator-grounded, not vendor marketing. Verify against your specific workload before architectural commitments.