Caching Stack Trade-Offs: In-Memory + HTTP/Edge

Two layers of caching, one analysis. In-memory data caches sit between app and database. HTTP/edge caches sit between user and origin. They compose; they do not compete. This artifact compares the options within each layer honestly.

Caching L6/L7 Depth

As of 2026-06-08

PE Verdict

In-memory layer: Redis won the API war but lost the license war. Valkey is now the default open-source choice for new builds. Redis 8 (AGPLv3) is fine for AGPL-compatible workloads or paying customers. Memcached remains the right answer when your cache is genuinely just a hash map and you want raw multithreaded throughput. KeyDB is on borrowed time now that Valkey has the momentum.

HTTP/edge layer: CloudFront wins when you are AWS-deep and need integration; Cloudflare wins on developer experience and reach (with single-vendor blast-radius risk demonstrated again on 2025-11-18); Fastly wins when you need VCL-grade programmability and instant purge; Varnish wins when you need to own the cache layer; NGINX is the right answer when caching is a side-effect of being a reverse proxy, not the main job.

Best default choices

Valkey for OSS Redis-like cache: Default for new open-source in-memory cache builds when Redis licensing is a concern.
Memcached for pure GET/SET: Use when the cache is truly a distributed hash map and raw multithreaded throughput matters.
CloudFront for AWS-heavy stacks: Use when origin, identity, logs, and operations already live inside AWS.
Cloudflare for edge reach: Use when global POP coverage, Workers, developer speed, and R2 economics matter most.

Cross-Layer Overview

Caching is not one decision; it is three to five decisions stacked in series. In-memory caches (Group A) hold hot data near the app process; HTTP/edge caches (Group B) hold rendered responses near the user. Most production stacks run both layers. Mixing them up in design review is the most common L6 interview tell.

Layer Characteristics

Dimension	Group A — In-Memory Cache	Group B — HTTP / Edge Cache
What gets cached	Application objects, sessions, computed results, leaderboards	HTTP responses, static assets, API responses with cache headers
Cache key	Application-defined string key	URL plus selected headers / query parameters (Vary)
Hit latency target	Sub-millisecond p99 in same VPC	5-30ms p99 from user (closest POP)
Capacity ceiling	Limited by node RAM (cluster scales out, expensive)	Effectively unlimited at provider; per-POP disk-backed
Invalidation model	Explicit delete or TTL, fully under app control	TTL plus purge API, often eventual across POPs
Failure mode if cache dies	Thundering herd onto database, cascading failure within seconds	Origin must absorb full user traffic, often impossible at scale
Who owns the box	You (self-hosted) or managed service (ElastiCache, MemoryDB)	Almost always provider-managed; Varnish/NGINX are the self-host exceptions
Cost driver	Memory GB-hours plus cross-AZ network	Bandwidth GB egress plus per-request charges
Where it lives on the path	App → cache → DB (server-side, behind app)	User → cache → app (client-facing, in front of app)
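
The Group A path (app, then cache, then DB) with TTL-plus-explicit-delete invalidation can be sketched as a cache-aside reader. This is a minimal in-process model, not a client for any real cache: `CacheAside`, `load_user`, and the fake clock are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Toy cache-aside reader: check cache, fall back to loader, store with a TTL."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # stands in for the database query
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # hit: the database is not touched
        value = self._loader(key)      # miss or expired: go to the source of truth
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)     # explicit delete, fully under app control


# Usage with a fake clock and a counting loader (both hypothetical).
calls = []
fake_now = [0.0]


def load_user(key: str) -> str:
    calls.append(key)
    return f"row-for-{key}"


cache = CacheAside(load_user, ttl_seconds=30, clock=lambda: fake_now[0])
cache.get("user:123")      # miss, loader runs
cache.get("user:123")      # hit, loader does not run
fake_now[0] = 31.0
cache.get("user:123")      # expired, loader runs again
```

Every miss in this sketch goes straight to the loader, which is exactly the thundering-herd exposure named in the failure-mode row: if the store is emptied, all traffic lands on the database at once.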

Trade-Offs

One table per technology. Each row is a real trade: you gain X by giving up Y.

Group A — In-Memory Caches

Redis In-Memory

v8.0 GA May 2025, tri-licensed (RSALv2 / SSPLv1 / AGPLv3). Single-threaded command execution with threaded I/O.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Single-threaded core execution	Atomic ops without locks, predictable per-op latency, lockless data structures	One CPU core caps your peak throughput per shard	Hot key with O(N) commands (KEYS, large SUNIONSTORE) freezes the entire instance	Redis 6+ multithreads only the I/O layer, not command dispatch. If your bottleneck is CPU on command execution, scale by sharding, not by adding cores.
Rich data structures	Lists, sets, sorted sets, streams, hashes, geo, vector sets (8.0)	Schema decisions baked into key naming, hard to evolve without migration	You build a sorted set leaderboard and later need range queries by a different field; only option is full rewrite	The richer the type, the more it becomes load-bearing schema. Treat key naming and structure choice as a one-way door.
Tri-license (AGPLv3 / RSALv2 / SSPLv1)	OSI-approved option exists again as of v8.0	AGPL copyleft creates legal review burden; many enterprises ban it outright	Security review or M&A due diligence flags AGPL; you migrate to Valkey under time pressure	Pick license at adoption, not in panic. If your legal team has even started writing a copyleft policy, pick Valkey now.
Persistence (RDB / AOF)	Survive process restart, point-in-time snapshots, replay log	Disk fsync cost, fork() RAM doubling during RDB snapshot	Large dataset (50GB+) on a small node, BGSAVE forks and OOM-kills the process	If you actually need durability, you do not want a cache; you want a database. Use AOF everysec as a recovery aid, not as a durability guarantee.
Cluster mode (16384 slots)	Horizontal scale, automatic slot migration, gossip-based topology	Multi-key ops fail unless keys share a hash tag, no MULTI across slots	You add cluster mode to an app that uses Lua scripts touching many keys; half of them break silently	Cluster mode is not a free upgrade. The application has to be cluster-aware. Many teams add it then regret it; benchmark single-instance with replicas first.
In-memory by design	Sub-millisecond reads, no disk in the hot path	$/GB is 10-30x the cost of SSD storage	Dataset grows past planned size, eviction starts pruning live keys, hit rate craters	Set `maxmemory` and `maxmemory-policy` at provisioning time, not after the SEV. `noeviction` turns the cache into a write-failing surprise.
Pub/Sub plus Streams plus Functions	One process replaces a message broker, a cron, and a script runner	Mixing concerns makes capacity planning and blast radius messy	Cache and pub/sub on the same cluster, pub/sub clients hit slow-consumer, cache latency spikes	Operationally you should treat Redis-as-broker and Redis-as-cache as separate clusters. They share code but they do not share failure modes.
Vector sets (8.0 new data type)	RAG and embedding workloads without a separate vector DB	HNSW index sits in RAM, dataset size is gated by node memory	Embeddings explode past 100M vectors at 1536 dims, RAM cost forces a real vector DB anyway	Vector sets are a fine bridge from prototype to production. They are not a substitute for Pinecone or pgvector at 10M+ vector scale.
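
The cluster-mode row turns on how a key maps to one of the 16384 slots and why a shared hash tag keeps multi-key operations on one shard. A minimal sketch of that mapping, assuming the CRC16/XMODEM variant (which Python's `binascii.crc_hqx` computes with initial value 0); `key_hash_slot` is a name chosen here, not a client-library API.

```python
import binascii

SLOT_COUNT = 16384


def key_hash_slot(key: bytes) -> int:
    """Return the cluster slot for a key, honouring {hash tag} syntax."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


# Keys sharing the tag {user:123} land in the same slot, so a multi-key
# command or script touching both is not rejected for crossing slots.
same_slot = key_hash_slot(b"{user:123}:profile") == key_hash_slot(b"{user:123}:cart")
```

Running existing key names through a function like this before enabling cluster mode is a cheap way to find the scripts that would break.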

Memcached In-Memory

BSD license, multithreaded, deliberately simple. Meta serves around 5 billion requests per second on Memcached.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Multithreaded architecture	Linear scale-up across cores on a big node, simpler hardware-utilization story	No atomicity beyond single-key ops, no scripting, no transactions	You need compare-and-swap on multi-field record; CAS only works per-key, you build a distributed lock yourself	Memcached vertical scale is the cleanest of any cache. r6g.16xlarge with 64 vCPUs and 512GB RAM utilized cleanly is hard to beat for pure GET/SET throughput.
String values only	Server stays simple, fast, lock-free hash table	Application owns all serialization (JSON, MessagePack, Protobuf)	Schema change in your serialized blob breaks every cached entry; no migration path other than full flush	Use a version prefix in the cache key (`v3:user:123`) so a serialization change cycles cleanly without poison entries.
1 MB default value limit	Forces good cache hygiene, prevents one bad key from eating a slab class	Larger blobs (rendered HTML pages, image bytes) need chunking or a different cache	Cache-stampede mitigation logic that stores a "rendered" response works in dev, fails in prod once the page grows	The limit is tunable to 128MB via `-I`, but the slab allocator was designed for small objects. Pushing it past 1MB usually means you picked the wrong tool.
No persistence by design	No fork pauses, no fsync stalls, predictable steady-state latency	Restart equals total data loss, cold cache problem on every deploy	You deploy a fleet-wide config change that restarts Memcached; database absorbs 100% of read traffic, falls over	Always have application-level warmup logic or rolling restart by node. Treat the cold-cache problem as a planned event, not a surprise.
LRU with slab classes	Predictable memory footprint, no fragmentation problems	Slab calcification (one slab class fills, others sit empty)	Workload shifts to mostly 4KB objects, slab class 6 evicts hot keys while slab class 4 has free space	Modern Memcached has `slab_reassign` and `slab_automove`, but most ops teams never enable them and quietly bleed hit rate.
No replication or clustering	Stateless from the cluster's perspective, client-side sharding is trivial (consistent hashing)	Node failure equals partition loss until cache refills from origin	One node in a 20-node fleet dies; 5% of cache evaporates, your database P99 doubles for 30 minutes	The Memcached failure model assumes the database can serve the miss rate. That assumption is the entire architecture; verify it under load test.
250-byte key length cap	Server stays fast, predictable hash table behavior	Long composite keys (tenant-id plus user-id plus feature-flag plus locale) get truncated or hashed	Multi-tenant SaaS where keys naturally grow long; you hash, then lose cache-key debuggability	If you have to hash the key client-side to fit, you have already lost the ability to inspect what is cached. Worth picking Redis for any key longer than ~150 bytes.
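
Two rows above interact: the version prefix that lets a serialization change cycle cleanly, and the 250-byte key cap that forces hashing of long composite keys. A small sketch of a key builder that does both, assuming SHA-256 as the fallback hash; `cache_key` and the `h:` marker are hypothetical conventions, not part of Memcached.

```python
import hashlib

MAX_KEY_BYTES = 250  # Memcached key length cap cited in the table


def cache_key(schema_version: int, *parts: str) -> str:
    """Build 'v<N>:part:part'; hash the tail if the key would exceed the cap."""
    prefix = f"v{schema_version}:"
    raw = prefix + ":".join(parts)
    if len(raw.encode("utf-8")) <= MAX_KEY_BYTES:
        return raw
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    # The version prefix stays readable; the hashed tail is no longer inspectable.
    return f"{prefix}h:{digest}"


short_key = cache_key(3, "user", "123")
long_key = cache_key(3, "tenant-" + "x" * 300, "user-123", "locale-en-US")
```

Bumping `schema_version` makes every old entry unreachable without a flush. The hashed branch shows the debuggability cost directly: nothing about the tenant or user survives in the key.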

Valkey In-Memory

Linux Foundation fork of Redis 7.2.4 (March 2024). BSD-3-Clause. v9.0 GA October 2025 with billion-RPS cluster claims.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
BSD license under Linux Foundation	Vendor-neutral governance, OSI-approved permissive, no AGPL contagion	Less concentrated commercial development velocity than a single-vendor project	An enterprise feature you needed (advanced security, RBAC) lands in Redis Enterprise first	The governance argument is stronger than it looks. AWS, Google, Oracle, Ericsson, Huawei, Tencent contributing means feature flow is broad, not blocked on one company's roadmap.
Wire-protocol compatible with Redis 7.2	Drop-in client replacement, all existing redis-cli, libraries, and Lua scripts work	Compatibility is a moving target; Redis 8 features (vector sets) are not in Valkey yet	Your team writes against Redis 8 vector sets, then a security policy forces Valkey migration	The fork date is locked at Redis 7.2.4. Anything in Redis 7.4+ commercial or Redis 8 is either ported (slowly) to Valkey or never lands. Pin your client library to RESP3 features, not Redis-specific commands.
Enhanced I/O threading (8.0)	Better multi-core utilization without breaking single-threaded execution model	Configuration complexity, IO threads count tuning under load	You set `io-threads` too high on a small instance, threads contend on shared queues, throughput drops below single-threaded baseline	Threading helps the I/O layer only. If your bottleneck is CPU on EXPIRE scans or large MGETs, threading does not help; you still need to shard.
Atomic slot migration (9.0)	Resharding moves entire slot atomically, simpler operations	Younger feature; Redis Cluster's slot-by-slot migration has more battle scars	You hit a corner case during resharding under heavy write load; less community precedent for the fix	9.0 is October 2025. For mission-critical clusters, stay on 8.1 LTS until 9.x has 6+ months of production miles outside the Linux Foundation labs.
Module ecosystem (JSON, Bloom)	First-party modules contributed by AWS and Google, BSD-licensed	Smaller than Redis Stack; RediSearch, RedisGraph not directly available	You need full-text search; have to either port RediSearch (license complication) or run a separate search engine	Module gap closes monthly. If your need is JSON or basic indexing, Valkey is fine; if it is RediSearch or Redis Graph, you are picking between Redis 8 AGPL or a different tool entirely.
Adopted by managed services fast	ElastiCache, MemoryDB, Memorystore, Aiven all support Valkey by mid-2025	Vendor implementations differ subtly (TLS modes, IAM auth, snapshot formats)	You migrate from ElastiCache Valkey to self-hosted Valkey; auth integration breaks, snapshot import fails	The cloud-vendor support is real but each one wraps Valkey differently. Treat managed Valkey as a different product from upstream Valkey for ops purposes.
Multiple databases in cluster mode (9.0)	Logical tenant separation without separate clusters, cheaper multi-tenancy	Multi-DB was historically a Redis foot-gun (clients confused, replication scope)	You enable multi-DB cluster mode for tenant isolation, a client library bug routes cross-tenant	Test the client library you use against this feature explicitly. Many clients made assumptions in single-DB cluster mode that break here.

KeyDB In-Memory

Multithreaded Redis fork (2019), acquired by Snap May 2022. Runs critical Snap infra; no commercial product or paid support.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Fully multithreaded command execution	Multi-core scale on a single node, often 2-4x throughput vs Redis at same hardware	Spinlock-based concurrency; tuning `server-threads` wrong is worse than single-threaded	You set threads to match cores on a noisy-neighbor VM; spinlocks burn CPU, throughput collapses	KeyDB explicitly requires exclusive use of assigned cores. On Kubernetes with CPU limits, you will not get the documented numbers.
Active-active replication	Multi-master, both nodes accept writes, lower failover RTO	Last-write-wins conflict resolution; no CRDTs, no causal ordering	Two writers in two regions update the same key concurrently; one of the writes is silently lost	Active-active is convenient until you have a write conflict. Most teams reach for it for cross-AZ failover but actually need a leader-follower with fast promotion.
Snap is the sole steward	Real production validation at Snap scale, code is open	Roadmap is driven by Snap's needs, not the broader community	You need a feature Snap does not; PR sits for months with no review	Snap was listed as a Valkey contributor when the fork launched, signaling strategic ambiguity. Risk: KeyDB receives only Snap-internal fixes going forward.
Flash storage tier (FLASH)	Spill less-hot data to NVMe, dataset sizes beyond RAM	Flash hits are 10-50x slower than RAM hits; tail latency degrades	P99 latency for cold reads jumps from 0.5ms to 20ms after spillover; downstream timeouts trigger	Flash tier is only useful if your access pattern is genuinely tiered hot vs cold. For uniform-access workloads, it just adds variance without saving cost.
Redis API compatibility	Drop-in for most existing Redis applications	Diverges from Redis 7+; vector sets, newer modules not available	You depend on a Redis 7.4 command, only to find KeyDB is locked closer to Redis 6.2 semantics	The compatibility window narrows each year. Use `INFO` to see actual Redis-compatibility version reported; do not trust the docs.
Subscriber-publisher fanout at scale	Snap battle-tested at extreme pub/sub fanout	Less community pub/sub tooling than Redis	You need to debug a slow subscriber under load; troubleshooting docs are sparser than Redis	The Snap engineering blog has solid pub/sub posts. For non-Snap pub/sub patterns, you are reading source code.
No commercial backing for support	Free, no license cost, no vendor lock-in	No SLA, no paid escalation, no enterprise patching pipeline	You hit a production data-loss bug on a Saturday; only recourse is GitHub issues	For mission-critical workloads, the lack of a paid escalation path is the most under-discussed risk. Enterprise procurement will surface this in security review.
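
The advice to check `INFO` rather than trust the docs can be automated. A minimal sketch, assuming the reply is text made of `field:value` lines under `# Section` headers and that it carries a `redis_version` field; the sample string is invented for illustration and is not real KeyDB output.

```python
from typing import Dict, Tuple


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO-style 'key:value' lines, skipping blanks and '# Section' headers."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '6.2.6' into (6, 2, 6) so versions compare numerically."""
    return tuple(int(p) for p in version.split(".") if p.isdigit())


# Hypothetical sample; a real reply has many more fields and different values.
sample = "# Server\r\nredis_version:6.2.6\r\nredis_mode:standalone\r\n"
info = parse_info(sample)
supports_7_commands = version_tuple(info["redis_version"]) >= (7, 0, 0)
```

A startup check like this fails fast in staging instead of surfacing as an unknown-command error in production.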

Group B — HTTP / Edge Caches

Amazon CloudFront Edge / CDN

~1600 edge locations, integrated with Lambda@Edge and CloudFront Functions. Origin Shield, real-time logs.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Deep AWS integration	IAM-native, S3 OAC, Lambda@Edge, WAF, Shield, signed URLs, KMS	Vendor lock-in; moving off CloudFront means rebuilding edge logic	Multi-cloud mandate hits, CloudFront-only features (OAC, Lambda@Edge) have no obvious port	The integration is the moat. If your origin is S3 + ALB + Lambda + WAF + Shield, CloudFront is roughly free engineering. If your origin is on GCP, you are paying for an awkward seam.
Two edge runtimes (Functions vs Lambda@Edge)	CloudFront Functions for sub-ms header rewrites; Lambda@Edge for full Node/Python with AWS SDK	Two mental models, two deployment paths, two billing models	You start in Lambda@Edge for everything; bill hits $10K+/month for header rewrites that belong in Functions	The cost difference is real: Functions at $0.10 per million requests vs Lambda@Edge at $0.60 plus duration. For 1B requests/month, that is $100 vs $600+. Default to Functions; escalate to Lambda@Edge only on demonstrated need.
Origin Shield ($0.0060/GB)	Consolidates origin fetches to one regional cache, cuts origin load 50-90%	Extra hop in the path, extra per-GB charge, single-region failure exposure	Origin Shield region degrades; your traffic that should hit the edge instead routes through a failing intermediary	Origin Shield is right when your origin is expensive (Lambda, complex queries) and wrong when your origin is itself a CDN-ish thing (S3 with high cache hit). Calculate before enabling.
Tiered pricing by region	Lower cost in NA/EU, predictable per-GB	Indian and South American regions are 2-3x more expensive; bill becomes geography-dependent	You launch in APAC, your egress cost line per user is 3x what you modeled on NA-only test traffic	The "10 regions, 10 prices" model means cost projection requires real geo-distribution data. Get a 30-day sample of egress by region before committing.
Real-time logs to Kinesis	Sub-minute log latency, custom field selection, replay to Athena/Redshift	Extra Kinesis costs, 1% sampling minimum, more pipelines to operate	You enable 100% sampling on a high-traffic distribution; Kinesis cost spikes, downstream consumer falls behind	Real-time logs at 1-5% sampling for security analytics is the right default. 100% is for short debugging windows, not steady state.
Functions Key-Value Store	Stateful edge logic (A/B tests, redirects) without origin round-trip	Eventually-consistent global propagation; small per-distribution limits	Marketing flips a feature flag; some users see it 30+ seconds before others, browser refresh shows inconsistency	KVS is roughly Cloudflare KV with worse limits. Useful for low-cardinality config, not for high-write workloads.
Anycast routing on AWS network	Same TLS termination as ALB, predictable AWS-internal performance	Less aggressive last-mile peering than Cloudflare or Fastly in some geos	India users get worse RTT than your test from us-east-1 suggested; CloudFront cannot fix bad transit	CloudFront's POP density is high but its peering is AWS-network-first. For markets with patchy AWS peering, Cloudflare and Fastly often win by 30-100ms.
Streaming and live video	MediaPackage, MediaTailor, native HLS/DASH support, low-latency HLS	Reinforces AWS lock-in for the entire media pipeline	Cost optimization moves you to a specialist CDN (BlazingCDN, Bunny); you have to rebuild signing and DRM	For pure media delivery at scale, hyperscaler CDNs are rarely the cost winner. CloudFront wins only when integration with the rest of your AWS media stack offsets the premium.
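
The Functions versus Lambda@Edge comparison is plain arithmetic on the per-million prices quoted in the table. A sketch of that calculation, covering request charges only; Lambda@Edge duration charges are deliberately left out, so the second figure is a floor.

```python
def request_cost_usd(requests: int, usd_per_million: float) -> float:
    """Per-request charge only; duration-based charges are excluded."""
    return requests / 1_000_000 * usd_per_million


monthly_requests = 1_000_000_000
functions_cost = request_cost_usd(monthly_requests, 0.10)     # CloudFront Functions
lambda_edge_cost = request_cost_usd(monthly_requests, 0.60)   # Lambda@Edge, before duration
```

The same function applied to a per-region request sample would give the geography-dependent projection the tiered-pricing row asks for, once real regional prices are plugged in.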

Cloudflare Edge / CDN

335 cities, 125+ countries. Workers (V8 isolates), KV, R2, D1, Durable Objects, Pages. Reported as fastest in ~48% of top eyeball networks.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
V8 isolates instead of containers	Near-zero cold start (5ms), runs in every POP, cheaper per request	JavaScript/TypeScript primary path; Wasm support exists but most idioms assume JS	You want native binary perf for image transforms; Wasm works but Lambda@Edge plus Node is sometimes simpler	The isolate model is genuinely a step ahead of container-based edge (Lambda@Edge). Cold-start cost is structural, not just optimization.
Unmetered bandwidth tier	Cost-predictable for static content, free CDN basics, "fair use"	Section 2.8 ToS lets Cloudflare bump high-egress-non-website traffic to enterprise	Your video streaming app grows past "website-like" usage profile; sales calls you for an enterprise contract	The "unmetered" claim is real until your traffic looks like a media CDN. Read 2.8 before standardizing.
Single-vendor blast radius	One config plane, one bill, one support call	Their bad day is your bad day, globally and uncorrelated with origin	2025-11-18: a database permission change doubled a Bot Management config file, crashed proxy processes globally for hours	This is the single largest production concern with Cloudflare. The 2025 incident sequence (Feb R2, June KV, August DDoS, November Bot Management) confirms blast radius is a recurring failure mode. Plan a CDN failover.
Workers KV (eventually consistent)	Global key-value store, sub-5ms hot reads, persists across POPs	1 write per second per key cap; eventual consistency on writes (up to 60s)	You use KV for rate-limiting counters; writes get throttled, limits become advisory not enforced	KV is for read-heavy reference data, full stop. For counters use Durable Objects; for transactional state use D1 or your origin DB.
R2 (S3-compatible, zero egress)	Free egress to internet, killer pricing vs S3 for high-egress workloads	S3 API compatibility is partial (some APIs, ACL semantics, multipart edge cases differ)	You lift-and-shift from S3; a niche API call (S3 Select, Object Lambda) silently no-ops	R2 is genuinely disruptive for egress-heavy use cases (media, model weights, software downloads). For S3-as-feature-platform, the gap matters more than the price.
Workers ecosystem (D1, Queues, Durable Objects)	Build full apps at edge without a separate origin	Each product has its own limits, billing, and maturity curve	You build an app on Workers + D1 + Durable Objects; D1 hits its per-database size limit, you replatform	The platform is real but the surface area changes faster than most teams' code refactoring budget allows. Treat as Cloudflare-native, not portable.
Cache API (in-POP, ephemeral)	Local POP cache for any Worker response, separate from KV	Each POP has independent cache; no global purge for Cache API entries by default	You cache personalized API responses by user-id; cache lives only in one POP, hit rate is much lower than expected	The two-tier story (Cache API ephemeral, KV persistent) is powerful but the mental model is exactly inverted from CloudFront. Document the convention in your runbook.
Free DDoS protection	Layer 3/4/7 mitigation included on every plan, absorbed 3.8 Tbps in 2024	Mitigation is opinionated; some "legitimate but unusual" patterns get rate-limited	Your monitoring scraper, load-test runner, or AI training crawler gets challenged or blocked	The defaults are right for 95% of sites. For API-only or B2B integrations, tune Bot Management rules; do not just trust the defaults.

Fastly Edge / CDN

Heavily customized Varnish 2.1 fork at core. ~150 Tbps capacity, fewer but bigger POPs. VCL programmable, Compute@Edge (Wasm; originally on Lucet, now Wasmtime).

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
VCL programmability	Full caching logic as code; complex routing, A/B, request enrichment at edge	VCL is a niche DSL; team has to learn a non-portable language	Senior engineer who owns VCL leaves; nobody can debug a misbehaving subroutine	VCL is genuinely the most powerful per-request hook in any CDN. Worth the learning cost if you do real edge logic; overkill for static-only delivery.
Sub-150ms global instant purge	Invalidate cache by URL or surrogate key in milliseconds worldwide	Premium price tier; instant purge is part of why Fastly costs more per GB	You move from Fastly to a cheaper CDN; suddenly your inventory update lag goes from 1s to 60s, business breaks	Instant purge with surrogate keys is the killer feature for high-write workloads (news, sports scores, inventory). For static long-TTL content, you are overpaying.
Compute@Edge on Wasm (Lucet, later Wasmtime)	Multi-language at edge (Rust, JS, Go, other Wasm targets), strong isolation, fast cold start	Smaller community than Cloudflare Workers; more "you are early" feel	You hit a Wasm runtime quirk; community answer is "open a support ticket"	Compute@Edge is technically excellent. The reason Cloudflare Workers feels bigger is community size, not technical superiority.
Fewer, larger POPs	Higher cache hit rates per POP, lower origin load, simpler debugging	Geographic coverage thinner than Cloudflare in long-tail markets	You launch in Africa or smaller APAC markets; user latency is higher than expected vs Cloudflare	The POP-density argument is more nuanced than it looks. Big POPs with good peering often outperform many small POPs with mediocre peering for top-100 cities.
2021 outage legacy	Postmortem culture forced real investment in safer config rollout	Reputation cost: that outage took down Reddit, NYT, Amazon for ~1 hour	Procurement security review surfaces the 2021 incident; you have to explain mitigation in writing	The 2021 outage was a latent software bug triggered by a single customer's valid configuration change, which then propagated globally. Post-incident, Fastly added staged config rollout. Cloudflare's 2025-11-18 outage was structurally similar; this is a class of failure, not vendor-specific.
Real-time analytics and logs	Per-second stats, syslog streaming to S3/Logentries, JSON feeds	Logs and analytics priced separately; cost can sneak up	You enable full request-level logging; monthly bill grows faster than traffic	Fastly's logging is the best in class for debuggability. Use it; just sample once you are past the initial debugging phase.
Origin Shielding	Consolidate origin fetches to one shield POP, dramatically reduce origin RPS	Adds a hop, single POP failure exposure for the shield path	Shield POP has a bad day; origin sees its full pre-CDN traffic for the duration	Always pair Origin Shielding with multi-POP shield fallback if you cannot tolerate origin spikes. Often missed in initial configurations.
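
Purge by surrogate key works because the cache keeps an index from tag to cached URLs, so one purge call invalidates every response that carried the tag. A toy in-process model of that index, assuming tags arrive as a space-separated header value; `TaggedCache` is hypothetical and says nothing about how Fastly implements it.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class TaggedCache:
    """Toy cache that indexes entries by surrogate key so one purge hits many URLs."""

    def __init__(self) -> None:
        self._bodies: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def store(self, url: str, body: bytes, surrogate_keys: str) -> None:
        self._bodies[url] = body
        for tag in surrogate_keys.split():      # space-separated tag list
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._bodies.get(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every cached URL carrying the tag; return how many were removed."""
        removed = 0
        for url in self._urls_by_tag.pop(tag, set()):
            # Other tags may still list this URL; a real index would clean those up too.
            if self._bodies.pop(url, None) is not None:
                removed += 1
        return removed


edge = TaggedCache()
edge.store("/p/1", b"shoe one", "product-1 category-shoes")
edge.store("/p/2", b"shoe two", "product-2 category-shoes")
edge.store("/about", b"about us", "static")
purged = edge.purge_tag("category-shoes")
```

An inventory update then becomes one purge of `category-shoes` rather than an enumeration of every affected URL, which is why this pattern suits high-write content.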

Varnish Cache Edge / Self-Hosted

Open-source HTTP accelerator. v7.7 current (BSD-2-Clause). VCL-driven, reverse proxy cache, designed for HTTP. Varnish Software offers Varnish Enterprise (proprietary).

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
VCL on your own hardware	Total control over cache policy, can run in any data center or cloud	You operate the cache fleet: capacity, patching, monitoring, on-call	Cache fleet patching slips, a known CVE is unmitigated, security review fails	Self-hosting Varnish is right when latency or compliance forbids a SaaS edge. Otherwise you are reinventing what Fastly already runs at higher quality.
Built for HTTP, only HTTP	Best-in-class HTTP cache semantics, ESI, range requests, conditional GETs	TLS termination is via Hitch or another sidecar; not native in open-source Varnish	You try to use Varnish for TLS termination; discover you need to operate Hitch alongside, doubles ops surface	The "no TLS" stance is deliberate (HTTP is hard enough). In practice every production deploy has a TLS sidecar; budget for it.
VMOD extensibility	C-level extensions for custom logic; rich ecosystem of community modules	Custom VMODs are C code in your cache process; bugs crash Varnish	A bad VMOD pointer dereference takes down your cache fleet during peak; rollback is not automatic	Treat custom VMODs like kernel modules. Code review, fuzz test, canary deploy. Most outages "in Varnish" are actually in custom VMODs.
Single-machine ops model	No clustering, no consensus, no shared state to corrupt	High availability is on you (LB in front, sticky sessions, hot standby)	One Varnish node dies; LB does not detect fast enough; user gets origin direct for 30 seconds, origin falls over	The "no cluster" simplicity is genuinely an asset. Pair with an L4 LB that does proper health checks (not just TCP-up).
Grace, keep, and stale serving	Serve stale during origin outage, separate background fetch from delivery	VCL complexity grows quickly; grace logic is the second-biggest source of "why is this stale" tickets	Grace period is 5 minutes, you fix a bug at origin, users still see the bug for 5 minutes after deploy	Grace plus stale-while-revalidate is Varnish's killer feature for origin resilience. Use surrogate keys to purge by tag rather than relying on grace alone.
No native cluster purge	Each node is independent, predictable	Cluster-wide purge requires Varnish Broadcaster or similar add-on	You PURGE on one node; other 9 nodes still serve the stale; users see inconsistency	Production deploys always need Varnish Broadcaster, custom HTTP-fanout, or Varnish Enterprise. Plan it in from day one.
Open source (BSD-2-Clause)	No license cost, total transparency, large community, debuggable	No SLA, no commercial escalation; pay Varnish Software for enterprise support	Critical bug in 7.7, fix lands in 7.8; you have to upgrade an entire fleet to take it	Most production Varnish fleets pay Varnish Software for either Enterprise or support. Pure community Varnish works but is for teams that genuinely own this code path.
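
The grace row reduces to a three-way decision on object age against TTL and grace. A minimal sketch of that decision, written as plain Python rather than VCL; `classify` and `Decision` are illustrative names and the thresholds are examples only.

```python
from enum import Enum


class Decision(Enum):
    FRESH = "serve from cache"
    STALE = "serve stale, refresh in background"
    MISS = "fetch from origin before responding"


def classify(age_s: float, ttl_s: float, grace_s: float) -> Decision:
    """Decide how to answer a request for an object that is age_s seconds old."""
    if age_s < ttl_s:
        return Decision.FRESH
    if age_s < ttl_s + grace_s:
        return Decision.STALE
    return Decision.MISS


# With a 60 s TTL and a 5 minute grace, a 2 minute old object is still served.
two_minutes_old = classify(age_s=120, ttl_s=60, grace_s=300)
```

This makes the "fixed at origin, still broken for users" scenario concrete: anything inside the STALE band keeps being delivered until a purge removes it.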

NGINX Reverse Proxy / Cache

F5-owned (since 2019), BSD-2-Clause open source. Web server, reverse proxy, load balancer, mail proxy, HTTP cache. v1.30 stable (April 2026). NGINX Plus is the commercial tier.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Jack-of-all-trades	One binary does TLS term, LB, cache, reverse proxy, web server	Specialized features (instant cluster purge, VCL-grade programmability) are weaker	You need surrogate-key purge across 50 NGINX boxes; build it yourself with consul-template plus HTTP fanout	NGINX is the right answer when caching is one of many jobs the box does. It is the wrong answer when caching is the only job and you need top-end semantics.
proxy_cache on local disk	Disk-backed cache survives restart, persists past RAM size	Cache directory format is internal, no easy cross-node sharing, no in-memory cache for hot keys	SSD wears out faster than expected under high write rate; replacing it requires fleet rotation	NGINX cache is functional, not specialized. For high-write-rate caches, mount on fast NVMe and tune `proxy_cache_lock`.
Configuration via directives	Easy to learn, plain text, version controlled, predictable	Complex logic forces nested if/map/regex; no real programmability without Lua (OpenResty)	You need conditional caching on a JWT claim; pure nginx.conf does not handle it, you reach for Lua, now you have two languages	If you find yourself writing complex maps and embedded Lua, you have already outgrown plain NGINX. Either move logic upstream or switch to Varnish/Fastly.
F5 ownership (post-2019)	Enterprise support pipeline, sustained investment in NGINX Plus features	F5 prioritizes Plus features; open-source NGINX advances more conservatively	A feature you assumed was open-source (active health checks, dynamic config) turns out to be Plus-only	The Plus vs OSS gap is real and grew under F5. For production caching at any scale, NGINX Plus or OpenResty is usually the right pick.
Massive ecosystem and tooling	Largest install base of any web server (35%+ market share)	Stack Overflow answers often outdated or wrong; many "best practice" blog posts ignore modern features	You copy a config from a 2018 blog; it has a CVE-prone TLS cipher list, security scan fails	Always start from the F5 docs (docs.nginx.com), not Google. The ecosystem is huge but quality varies.
Multiple roles share one process	Lower memory and connection overhead than running separate boxes for LB/cache/TLS	Cache I/O contends with TLS handshake on the same worker; tuning is harder	Heavy cache write load slows TLS handshakes; users perceive slow first-byte time, not a cache problem	Worker-process tuning matters more than the docs suggest. `worker_processes auto` is rarely the right answer at scale; tune to physical cores and pin if needed.
No native distributed cache	Per-node state is simple, no consensus, no replication bugs	Cluster-wide cache invalidation has to be built externally	You deploy a config change; cache TTL is still 10 minutes; users see stale content while you wait	Pair NGINX with a separate cache purge fanout (RabbitMQ, NATS, HTTP broadcast). Production deploys almost always do; budget for it.

Use Cases

Real production scenarios per technology, with the driving property that picked it over alternatives.

Group A — In-Memory Caches

Redis

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Session store	Twitter/X session validation	Sub-ms lookup of session token to user context	500M+ DAU, millions of session checks per second	Memcached lacks rich types for storing user-context blob; DB lookup is 10ms+ minimum
Leaderboards	Stack Overflow reputation, game scoreboards	Sorted sets with O(log N) ZADD and O(log N + M) ZRANGE	10M+ scored entries, real-time updates	SQL ORDER BY plus LIMIT does not scale to millions of writes/sec; Memcached has no native sorted set
Rate limiting	API gateways, login attempts, abuse mitigation	Atomic INCR with EXPIRE plus Lua for sliding window	100K+ rate-checks/sec per region	DB-based rate limit is too slow; Memcached lacks atomic compound operations
Pub/Sub fanout	Internal notifications, chat presence	Sub-ms publish to N subscribers, no broker setup	10K+ concurrent subscribers per channel	Kafka is overkill for ephemeral fanout; RabbitMQ adds operational weight
Vector search (Redis 8)	RAG prototypes, semantic cache	HNSW index in RAM, sub-ms approximate kNN	1-10M vectors, 768-1536 dim	pgvector is fine but adds a query latency dependency on Postgres availability; dedicated vector DBs are heavier to operate
Distributed locks (Redlock)	Distributed cron, deduplication, election	SET NX EX as a one-line lock primitive on a single instance; Redlock repeats it across independent nodes	10K+ lock acquisitions/sec	ZooKeeper is heavier; etcd is fine but adds a separate dependency
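
The rate-limiting row pairs INCR with EXPIRE. A minimal sketch of the fixed-window variant of that idea follows, assuming a client that exposes `incr(key)` and `expire(key, seconds)` the way redis-py does. `InMemoryStub`, `allow_request`, and the `ratelimit:` key format are hypothetical names for illustration; the stub only stands in for a server so the snippet runs on its own.

```python
import time


class InMemoryStub:
    """Hypothetical stand-in for a Redis client; supports only incr/expire."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}    # key -> integer counter
        self._expiry = {}  # key -> absolute expiry timestamp

    def _evict_if_expired(self, key):
        deadline = self._expiry.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._expiry.pop(key, None)

    def incr(self, key):
        self._evict_if_expired(key)
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def expire(self, key, seconds):
        if key in self._data:
            self._expiry[key] = self._clock() + seconds
            return True
        return False


def allow_request(client, caller_id, limit, window_seconds):
    """Fixed-window limiter: at most `limit` calls per window per caller."""
    key = f"ratelimit:{caller_id}"
    count = client.incr(key)
    if count == 1:
        # The first hit in a window starts the TTL. INCR and EXPIRE are two
        # separate commands here, so a crash between them leaves a key with
        # no TTL; wrapping both in one Lua script closes that gap.
        client.expire(key, window_seconds)
    return count <= limit
```

A fixed window admits bursts at window boundaries; the sliding-window version named in the table needs the Lua step so the read, trim, and write happen atomically.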

Memcached

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Hyperscale object cache	Meta (Facebook) serves roughly 5B GETs/sec from Memcached at peak	Multithreaded raw GET/SET throughput on huge nodes	Trillions of cached objects, exabytes of in-memory data	Redis single-thread cap forces 10x more shards; pure GET/SET does not need Redis types
Database query cache	WordPress, Drupal at scale; MediaWiki	One-line "cache this query result" pattern, dirt-simple invalidation	Wikipedia-class read traffic, multi-TB cached	Redis adds operational weight unjustified by use case; you do not need persistence for a query cache
Rendered HTML / page fragment cache	Pinterest, Etsy product page fragments	1KB-100KB blob storage; slab allocation limits memory fragmentation	10K+ requests/sec/node, P99 under 1ms	Redis equivalent works but is slower per core; HTTP cache (Varnish) adds invalidation complexity
Hot dataset acceleration in front of slow stores	Hadoop/HBase fronted by Memcached for read-heavy workloads	Pure GET path is dominant, no need for compound ops	Petabyte HBase, hot working set in 1-10TB Memcached fleet	Redis cluster cost-prohibitive at this scale; HBase block cache alone insufficient
Multi-tenant SaaS cache layer	Heroku-style platforms exposing cache as a service	Stateless nodes, trivial horizontal scale, no replication concerns	10K tenants per cluster	Redis multi-tenancy is harder; no Pub/Sub or scripting noise across tenants

Valkey

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
License-clean Redis migration	AWS migrating ElastiCache Redis OSS customers to Valkey by default in 2025	BSD license satisfies enterprise IP review	Hundreds of thousands of managed customer fleets	Redis 8 AGPLv3 is unviable for many; Memcached lacks rich types
Cloud-vendor neutral cache	Multi-cloud SaaS (e.g., Aiven) offering Valkey across AWS/GCP/Azure	Single OSS SKU runs identically across hyperscalers	Thousands of provisioned clusters	Redis source-available licenses (RSALv2/SSPLv1) restrict managed-service offerings; legally awkward across clouds
Linux distro default in-memory KV	Debian/Ubuntu shipping Valkey as the redis-server replacement	OSI-approved license required by Debian policy	Tens of millions of Linux installs	Redis SSPL is incompatible with Debian's Free Software Guidelines
Billion-RPS cluster (Valkey 9.0)	High-throughput SaaS infrastructure at scale	Atomic slot migration, multi-DB cluster mode, advanced threading	1B+ requests/sec at cluster scale	Memcached lacks rich types and cluster mode; Redis Cluster has older slot-migration semantics
RDMA-accelerated low-latency workload (experimental)	HPC and AI/ML workloads needing kernel-bypass networking	Sub-100µs P99 over RDMA fabric	Microsecond-class P99 requirements	Redis OSS lacks RDMA support; Memcached has no roadmap for it

KeyDB

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Snap's internal caching infrastructure	Snapchat backend	Multi-threaded throughput on big nodes, in-house steward	Snap-scale daily traffic, billions of requests	Internal investment in KeyDB makes it cheaper than re-evaluating Redis or Valkey
Redis migration for multi-core utilization	Teams hitting single-threaded Redis ceiling but unwilling to shard	Drop-in Redis API plus 2-4x throughput per node	50K-200K ops/sec single-node ceiling raised to 200K-500K	Sharding adds app-side complexity; Memcached lacks types
Active-active geo-replication	Multi-region SaaS wanting fast cross-region failover	Both regions accept writes; lower RTO than leader-follower promotion	2-5 regions, last-write-wins acceptable	Redis has no native active-active in OSS; Redis Enterprise CRDB is commercial
Flash-tier cache for cold data	Datasets larger than RAM, hot-cold ratio favorable	RAM-priced perf for hot keys, NVMe-priced bulk for cold	10TB+ working set on 1TB RAM nodes	Redis OSS has no integrated flash tier; commercial Redis on Flash is the only Redis option
Cost-optimized Redis at small scale	Startups maximizing single-node performance before sharding	Free, multi-threaded, no per-instance licensing	1-3 large nodes, <1M ops/sec total	Managed Redis costs add up; Valkey works but is newer than KeyDB at the time of adoption

Group B — HTTP / Edge Caches

Amazon CloudFront

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
S3-fronted static asset delivery	Prime Video, Disney+, every AWS-hosted site	S3 OAC plus signed URLs plus regional edge caches, all IAM-native	PB-scale storage, global CDN egress	Cloudflare/Fastly need separate S3 auth flow; integration friction
HLS/DASH video streaming	Prime Video, FuboTV, Twitch event simulcasts	MediaPackage + CloudFront low-latency HLS chain	Millions of concurrent viewers, sub-3s glass-to-glass	Specialist CDNs are cheaper per GB but require rebuilding the entire media pipeline
API acceleration with WAF and Shield	Banking and fintech APIs (Capital One, Robinhood)	WAF rules, Shield Advanced DDoS, KMS-encrypted log delivery, all integrated	100K+ requests/sec API with strict compliance	Cloudflare WAF is good but requires duplicating security policy outside the AWS account boundary
Edge personalization via CloudFront Functions	E-commerce A/B variant routing, geo-redirects, header normalization	Sub-ms execution, $0.10/M requests, runs at every edge	1B+ requests/month at minimal added cost	Lambda@Edge for the same work costs 6x more; full Workers stack requires platform change
Origin Shield in front of regional ALB	Multi-region ALB consolidating to single shield POP	Cuts origin RPS by 80%+ for cacheable workloads	10K+ origin RPS reduced to under 2K	Cloudflare Tiered Caching achieves similar; not available in non-Cloudflare stacks

Cloudflare

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Free-tier CDN for personal/SMB sites	Millions of small sites, indie hackers, blogs	Unmetered bandwidth, free TLS, free DDoS	Cloudflare claims to be the fastest provider in a large share of top eyeball networks	CloudFront's free tier is capped (1 TB/month of egress); Fastly has no real SMB plan
Workers-native edge apps	Discord (parts), Shopify (parts), AI inference proxies	V8 isolates, near-zero cold start, code runs in every POP	10B+ requests/day, multi-region	Lambda@Edge cold start is 10-100x worse; AWS region-bound vs Cloudflare's per-POP execution
R2 zero-egress object storage	Hugging Face model weights, large software downloads, video archives	S3 API plus zero egress cost to the internet	PB-scale, high-egress, predictable bills	S3 egress costs roughly $0.05-$0.09/GB to internet; R2 is genuinely free egress
DDoS mitigation as front door	Sites that have been attacked, target-rich verticals (crypto, gambling)	Layer 3/4/7 mitigation absorbed a 3.8 Tbps attack in 2024	Multi-Tbps attack absorption	AWS Shield Advanced is comparable but more expensive per protected resource
Global Workers KV for feature flags / configs	Mobile app config delivery, feature flag systems	Sub-5ms hot reads globally, eventual consistency acceptable	1M+ reads/sec global; writes infrequent	S3 + CloudFront is slower for writes; LaunchDarkly etc. add latency and cost

Fastly

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Real-time news, sports, finance	The New York Times, NPR, Reddit, Vimeo, Shopify storefronts	Sub-150ms instant global purge by surrogate key	Millions of pages, high write rate, must update fast	CloudFront purge takes minutes; Cloudflare cache purge is best-effort
E-commerce stock and pricing accuracy	Shopify Plus stores, GitHub Marketplace, ticketing sites	Cache hit for the hot view, purge in ms when inventory changes	10K+ purges/sec sustained	CDNs without instant purge force shorter TTL, which kills hit rate
API caching with VCL	GitHub API in front of Rails, fintech APIs with per-tenant rules	VCL allows complex per-request caching logic at edge	Billions of API requests/day	Cloudflare Workers can express it but VCL is closer to the cache, fewer abstraction layers
Compute@Edge for personalization	News personalization, dynamic header injection, request enrichment	Wasm runtime, multi-language, near-zero cold start	10M+ personalized responses/day at edge	Lambda@Edge has higher cold start; CloudFront Functions too limited for complex logic
Image and video on-the-fly transformation	Spotify cover art delivery, news photo pipelines	Edge image optimization plus VCL routing to origin variants	PB-scale image bandwidth, sub-second purge	CloudFront image transforms are bolted on; lower control over caching keys
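
The purge-by-surrogate-key rows above depend on one bookkeeping idea: every cached object carries tags, and a purge removes everything that shares a tag. A minimal in-memory sketch of that index, not Fastly's implementation; `SurrogateKeyCache` and its method names are hypothetical.

```python
from collections import defaultdict


class SurrogateKeyCache:
    """Toy cache that can purge every object sharing a surrogate key."""

    def __init__(self):
        self._objects = {}                    # url -> cached body
        self._keys_by_url = {}                # url -> set of surrogate keys
        self._urls_by_key = defaultdict(set)  # surrogate key -> set of urls

    def put(self, url, body, surrogate_keys):
        self._drop(url)  # replace any earlier entry along with its old tags
        self._objects[url] = body
        self._keys_by_url[url] = set(surrogate_keys)
        for key in surrogate_keys:
            self._urls_by_key[key].add(url)

    def get(self, url):
        return self._objects.get(url)

    def purge_key(self, key):
        """Remove every object tagged with `key`; return how many were removed."""
        urls = list(self._urls_by_key.get(key, ()))
        for url in urls:
            self._drop(url)
        return len(urls)

    def _drop(self, url):
        self._objects.pop(url, None)
        for key in self._keys_by_url.pop(url, ()):
            self._urls_by_key[key].discard(url)
```

Tagging a product page with both its product key and its category key lets one inventory change purge the page and every listing that embeds it, which is what keeps long TTLs safe.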

Varnish Cache

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
On-prem reverse proxy cache	Banks and healthcare providers with PII not allowed off-network	VCL programmability without sending traffic to a SaaS CDN	10-100K req/sec on-prem, regulated industry	SaaS CDNs (Cloudflare, Fastly) need data egress; compliance does not allow
Origin shield for SaaS CDN	Sites running Varnish in their own data centers in front of Fastly or Cloudflare	Extra layer of cache outside SaaS CDN, customizable VCL	Multi-tier cache, additional purge controls	SaaS-only deployment leaves no programmable cache layer under your control
News and media internal CDN	Major publishers (e.g., Wikimedia historically) running Varnish at edge	ESI (Edge Side Includes), grace, surrogate keys all out of the box	Wikimedia served much of Wikipedia from Varnish layers historically	Building this on NGINX requires Lua or external coordination; building on Apache is a non-starter
API gateway-style caching	Internal APIs cached in front of slow services	Conditional GET, ETag, full HTTP semantics, programmable in VCL	100K+ API RPS, low origin RPS	NGINX `proxy_cache` is adequate but lacks ESI and grace semantics
Cache layer for legacy app modernization	Strangler pattern: Varnish in front of legacy monolith	VCL routes some paths to new microservices, others to legacy	Migration period: months to years	NGINX can route but VCL's caching semantics during routing are more powerful

NGINX

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Reverse proxy with opportunistic cache	Most production microservice stacks, Kubernetes Ingress (Ingress NGINX)	Single binary handles TLS, LB, cache, all of it	10K+ RPS per node, multi-service routing	Varnish does not do TLS natively; HAProxy lacks caching; running both is more ops surface
API gateway entry point	Banking APIs, Kong/Tyk built on top of NGINX/OpenResty	Programmable routing plus rate-limit plus auth plus cache, all in one	10K+ tenant APIs behind one NGINX fleet	Pure CDN does not handle internal API gateway needs; dedicated API gateways are more expensive
Cache in front of slow CMS	WordPress sites using NGINX `fastcgi_cache`	Disk-backed cache reduces PHP-FPM load by 95%+	1M+ pageviews/day on $20/mo VPS	Varnish works but adds an operational layer for a workload NGINX already handles
TLS termination plus cache for static	Single-tenant SaaS, internal portals, dev environments	One binary, configured via plain text, predictable behavior	1-100 RPS per box, dozens of boxes	Cloudflare for free tier works but adds external dependency for internal use
Edge of a private CDN	Self-built CDN using NGINX boxes per geographic POP	Full control over routing, caching, peering decisions	10s of POPs, GB/sec aggregate	SaaS CDN cheaper unless geo-presence requires it; Varnish lacks TLS without sidecar

Limitations

Per-technology limitation tables with severity badges and workaround cost.

Group A — In-Memory Caches

Redis

Limitation	Severity	Workaround	Workaround Cost
Single-threaded command execution	High	Shard across cluster mode, or use Valkey/KeyDB	App-side cluster-awareness, possible Lua rewrites, hash-tag changes
AGPLv3 / RSALv2 / SSPLv1 license	High	Pay Redis Enterprise, or migrate to Valkey	Either ongoing license fee or migration project (weeks for ops, more for surface-area dependencies)
RAM is the ceiling	High	Redis on Flash (commercial), or aggressive eviction policies	Commercial license; or hit rate degradation under `allkeys-lru`
BGSAVE forks the process	Medium	Disable RDB, rely on AOF only; or take snapshots on replica	Replica overhead; AOF-only has different recovery semantics
Cluster multi-key ops restricted	Medium	Hash tags ({tag}key1, {tag}key2) to colocate keys	Schema design constraints; risk of hot slot if tag is too narrow
Lua scripts are blocking	Medium	Keep scripts short; use Functions (Redis 7+) for organized libraries	Forced discipline; long script equals cluster-wide slowdown
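
The hash-tag workaround is easier to reason about with the slot computation in view: Redis Cluster maps a key to CRC16 (XMODEM variant) modulo 16384, and when the key contains a non-empty `{...}` section only that section is hashed. A minimal sketch; the function names `crc16_xmodem` and `hash_slot` are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Return the cluster slot (0-16383) for a key, honoring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "foo{}bar" hashes the whole key.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

Keys such as `user:{42}:profile` and `user:{42}:sessions` hash only `42`, so they share a slot and multi-key operations on them avoid CROSSSLOT. The cost noted in the table follows directly: a tag shared by too many keys concentrates them all in one slot.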

Memcached

Limitation	Severity	Workaround	Workaround Cost
No persistence	High	Treat cache loss as a planned event; do rolling restarts	Application-level warmup logic; coordinated deploys
No replication	High	Use consistent hashing on the client; tolerate partition loss on node failure	Database needs to be able to serve full miss rate during failure
1MB default value limit	Medium	Chunk large values client-side; or raise `-I` to 128MB max	Client complexity; slab allocator inefficiency at large values
String values only	Medium	Serialize with MessagePack/Protobuf client-side	App-side serialization complexity; schema migration is harder
250-byte key limit	Medium	Hash long keys (SHA256) on the client	Lose key debuggability; cannot use SCAN-like introspection
No atomic compound ops	Medium	Use CAS for single-key updates; design around per-key atomicity	App-side retry loops; cannot do multi-key transactions
Slab calcification	Medium	Enable `slab_reassign` and `slab_automove`	Brief throughput dip during reassignment; needs tuning
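
The 250-byte key workaround can be sketched in a few lines. This assumes keys are otherwise valid for Memcached and only need shortening; `safe_key` and the `h:` prefix are hypothetical choices for illustration.

```python
import hashlib

MAX_KEY_BYTES = 250  # the key limit from the table above


def safe_key(key: str, prefix: str = "h:") -> str:
    """Return the key unchanged if it fits, else a fixed-length SHA-256 digest."""
    raw = key.encode("utf-8")
    if len(raw) <= MAX_KEY_BYTES:
        return key
    # 64 hex characters plus the prefix; deterministic, so every client that
    # hashes the same logical key reaches the same cache entry.
    return prefix + hashlib.sha256(raw).hexdigest()
```

Leaving short keys untouched keeps them readable while debugging; only the hashed ones lose that, which is the workaround cost the table names.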

Valkey

Limitation	Severity	Workaround	Workaround Cost
Forked from Redis 7.2.4; missing Redis 8 features	Medium	Wait for community port, or accept feature gap	Lose vector sets, newer modules; community port arrives months later
Smaller ecosystem of third-party tools	Medium	Use Redis-compatible tools (RedisInsight, etc.)	Some Redis-specific tooling assumes commercial features; minor compat seams
Multi-DB cluster mode (9.0) is new	Medium	Stay on single-DB cluster mode until 9.x has miles	Lose the multi-tenancy benefit; harder logical separation
Module compatibility incomplete	Medium	Stick to first-party modules (JSON, Bloom) plus core types	RediSearch, RedisTimeSeries equivalents need separate evaluation
RDMA support is experimental	Medium	Use TCP only for production until RDMA hardens	Cannot exploit kernel-bypass perf yet
Same single-threaded command execution as Redis (I/O threads offload only network work)	High	Cluster mode for horizontal scale	Same cluster-aware client cost as Redis

KeyDB

Limitation	Severity	Workaround	Workaround Cost
Snap is sole steward; uncertain roadmap	Critical	Have a Valkey or Redis migration plan ready	Migration project sitting in the backlog; license decision pre-made
No paid support / SLA	High	Self-support via GitHub, or migrate to a supported alternative	Engineering on-call burden grows; no escalation path on data-loss bugs
Diverging from Redis API over time	Medium	Pin to Redis commands available in 6.2 era; avoid post-7.0 commands	Lose feature velocity from Redis ecosystem
Spinlock-based threading is tuning-sensitive	Medium	Pin CPUs, tune `server-threads` to physical cores	Hardware-aware deployment; harder in shared k8s environments
Active-active replication uses LWW	High	Design data model to avoid concurrent same-key writes	App-level partition-by-region or write-quorum logic
Flash tier adds tail-latency variance	Medium	Profile workload; disable Flash if access is uniform	Lose cost savings of flash tier

Group B — HTTP / Edge Caches

Amazon CloudFront

Limitation	Severity	Workaround	Workaround Cost
Lambda@Edge must deploy from us-east-1	Medium	Accept the constraint; treat us-east-1 as the deploy region	Lambda@Edge availability tied to us-east-1 IAM; us-east-1 outages affect deploys
Purge propagation is minutes, not milliseconds	High	Use short TTL and rely on revalidation; or use Origin Shield + invalidations sparingly	Short TTL kills cache hit rate; invalidations have request quota
Per-region pricing is opaque	Medium	Sample your real geo-distribution; price out by region tier	Engineering time on cost modeling; surprise bills in expensive regions
CloudFront Functions limited (CPU, no network)	Medium	Use Lambda@Edge for complex logic; live with the cost difference	6x cost per request, plus duration billing
Vendor lock-in to AWS	Medium	Abstract edge logic into portable Wasm modules where possible	Engineering effort to keep things portable; mostly aspirational at scale
Real-time logs require Kinesis	Medium	Use 1% sampling at steady state; ramp during incidents	Sampling means you miss low-frequency events at full fidelity
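
The request-price gap between CloudFront Functions and Lambda@Edge is simple arithmetic. The sketch below uses only the figures quoted in these tables ($0.10 per million requests, and a 6x multiplier for Lambda@Edge); it ignores duration billing, and real prices should be checked against current AWS pricing. `monthly_request_cost` is a hypothetical helper.

```python
from decimal import Decimal

FUNCTIONS_PRICE_PER_MILLION = Decimal("0.10")  # figure quoted in the use-case table
LAMBDA_EDGE_MULTIPLIER = Decimal("6")          # the "6x" quoted in the tables


def monthly_request_cost(requests: int, price_per_million: Decimal) -> Decimal:
    """Request charge only; Lambda@Edge also bills for duration."""
    return Decimal(requests) / Decimal(1_000_000) * price_per_million


functions_cost = monthly_request_cost(1_000_000_000, FUNCTIONS_PRICE_PER_MILLION)
lambda_edge_cost = monthly_request_cost(
    1_000_000_000, FUNCTIONS_PRICE_PER_MILLION * LAMBDA_EDGE_MULTIPLIER
)
```

At 1B requests per month that is $100 against $600 before duration charges, which is why the workaround of moving complex logic to Lambda@Edge has a visible cost.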

Cloudflare

Limitation	Severity	Workaround	Workaround Cost
Single-vendor blast radius (Nov 2025 outage)	Critical	Multi-CDN failover (Cloudflare + CloudFront or Fastly)	Doubled CDN bill plus DNS-failover complexity; meaningful engineering project
KV: 1 write/sec per key cap	High	Use Durable Objects for high-write workloads; KV is read-heavy only	Different mental model; Durable Object cost and availability characteristics differ
Workers limits: 50-128 MB memory, 30s CPU	Medium	Decompose into chained workers; offload heavy work to origin	Code complexity; worker-to-worker calls add latency
Cache API ephemeral per-POP	Medium	Use Tiered Cache or fall back to KV for shared cache	Tiered Cache adds latency; KV is eventually consistent
ToS 2.8 unmetered-bandwidth ambiguity	Medium	Pre-negotiate enterprise contract if traffic profile is non-website	Enterprise contract pricing; loss of free-tier predictability
Bot Management can over-block legitimate traffic	Medium	Tune bot rules; whitelist known crawlers and monitoring	Ongoing rule maintenance; risk of regression after default changes

Fastly

Limitation	Severity	Workaround	Workaround Cost
Premium per-GB pricing vs CloudFront/Cloudflare	High	Use Fastly for dynamic content; offload static media to cheaper CDN	Multi-CDN management; per-asset CDN routing decisions
Fewer POPs than Cloudflare	Medium	Combine with another CDN in markets where Fastly is thin	Multi-CDN routing logic and DNS
VCL is a niche language	Medium	Invest in team VCL training; document patterns internally	Hiring is harder; senior VCL engineers are rare
2021 outage reputation	Medium	Demonstrate post-incident config-rollout improvements during security review	Procurement friction; need to write the explanation up
Real-time analytics priced separately	Medium	Sample; use rolled-up metrics for steady state	Reduced debuggability for low-frequency issues
Smaller free tier than Cloudflare	Medium	Use Fastly's developer tier for evaluation; commit on production	No "free forever" path; cost starts day one for production

Varnish Cache

Limitation	Severity	Workaround	Workaround Cost
No native TLS termination	High	Run Hitch, HAProxy, or NGINX in front	Doubled ops surface; another process to monitor and patch
No cluster purge	High	Use Varnish Broadcaster or build HTTP-fanout; or pay for Varnish Enterprise	Custom infrastructure or commercial license
You operate the fleet	High	Use a SaaS like Fastly (built on customized Varnish) instead	Lose self-hosted control and customization depth
VCL is C-level powerful and C-level dangerous	Medium	Strict VCL review; canary every change	Engineering review process; harder to ship quickly
Custom VMODs are unsafe C extensions	High	Stick to community-maintained VMODs; avoid bespoke C in cache process	Lose extensibility benefit that drove the choice to Varnish
No persistent cache across restart (in OSS)	Medium	Use Varnish Enterprise MSE (massive storage engine)	Commercial license required

NGINX

Limitation	Severity	Workaround	Workaround Cost
Caching is a feature, not the focus	Medium	Use Varnish or Fastly for cache-heavy workloads	Extra process or vendor in the stack
No real programmability without Lua	Medium	Use OpenResty for Lua; or NGINX Plus dynamic modules	Either Lua language overhead, or NGINX Plus license
No native cluster purge	High	Build HTTP fanout for invalidation; or use NGINX Plus cache API	Custom infrastructure; or commercial license
Some features Plus-only (active health, dynamic config)	Medium	Pay for NGINX Plus, or use OpenResty/custom scripting	License cost; or engineering time
proxy_cache uses disk; flash wears under heavy writes	Medium	Use enterprise-grade NVMe; rotate disks in fleet maintenance	Hardware cost; fleet management overhead
Worker tuning is non-obvious	Medium	Tune `worker_processes` and `worker_connections` to actual cores and connection pattern	Performance engineering time; load testing
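
The HTTP-fanout workaround for cluster purge reduces to: send the same purge to every node and track which nodes did not confirm. A minimal sketch, assuming a caller-supplied `send(node, path)` callable that performs whatever purge mechanism the fleet exposes and returns a truthy value on success; `fan_out_purge` and `send` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out_purge(nodes, path, send, max_workers=8):
    """Purge `path` on every node in parallel; return {node: succeeded}."""

    def purge_one(node):
        try:
            return node, bool(send(node, path))
        except Exception:
            # An unreachable node is recorded as a failure, not raised, so one
            # bad node cannot hide the result for the rest of the fleet.
            return node, False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(purge_one, nodes))
```

Nodes that map to `False` still hold the stale object, so a real fanout needs a retry queue or a TTL short enough to bound the damage.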

Fault Tolerance

Per-group matrix tables. Rows are fault-tolerance dimensions; columns are technologies.

Group A — In-Memory Caches

Dimension	Redis	Memcached	Valkey	KeyDB
Replication model	Leader-follower async (semi-sync via WAIT)	None in OSS; repcached is third-party	Leader-follower async (RESP3-based)	Active-active (multi-master) or leader-follower
Failure detection	Sentinel quorum-based heartbeats; Cluster gossip in cluster mode	Client-side only; no server detection	Same as Redis (Sentinel or Cluster gossip)	Cluster gossip plus active-replication heartbeats
Failover mechanism	Sentinel-driven leader election; Cluster mode slot reassignment	Client rehashes; keys on the failed node cold-miss	Same as Redis	Promote replica or rely on active-active peer
RTO (typical)	10-30s (Sentinel); 1-15s (Cluster failover)	0s for survivors; cold-start time for replaced node	Similar to Redis; ~10-15s	Sub-second on active-active failover
RPO (typical)	Async replication lag (ms-seconds); higher with AOF everysec	Full loss of failed node's data	Same as Redis	Last-write-wins may cause partial data inconsistency
Split-brain behavior	Cluster quorum prevents most; Sentinel min-replicas-to-write helps	N/A (no replication)	Same as Redis	Both sides accept writes; conflicts resolved via LWW (data loss possible)
Blast radius of single-node failure	Slots owned by failed node (1/N of keyspace) until replica promoted	Keys hashed to failed node (1/N) become misses until rehash	Same as Redis	Lower (peer keeps serving); depends on hash slot ownership
Cross-region failover	Manual; or use Redis Enterprise CRDB (commercial)	Application-level (multi-cluster, route on miss)	Same as Redis	Active-active across regions is the headline feature
Data loss scenarios	Async-replicated writes lost on leader failure; AOF rewrite mid-crash	Any node restart, any process crash	Same as Redis	Conflicting concurrent writes; LWW silently drops one

Group B — HTTP / Edge Caches

Dimension	CloudFront	Cloudflare	Fastly	Varnish (self-hosted)	NGINX (self-hosted)
Replication model	Per-POP independent cache; tiered/Origin Shield optional	Per-POP cache plus Tiered Cache and Argo	Per-POP cache plus shield POPs	None native (per-node); use Broadcaster for fanout	None native (per-node); manual sync needed
Failure detection	AWS-internal anycast withdrawal on POP failure	Anycast plus DNS plus health-aware routing	BGP anycast plus health checks	Your LB does it	Your LB does it
Failover mechanism	Anycast routes around bad POPs automatically	Anycast plus traffic engineering	Anycast plus traffic engineering	L4 LB removes unhealthy node from pool	L4 LB removes unhealthy node from pool
RTO (typical)	Seconds for anycast convergence	Seconds for anycast convergence	Seconds for anycast convergence	5-30s depending on LB health-check interval	5-30s depending on LB health-check interval
RPO (typical)	N/A (cache rebuilds from origin)	N/A (cache rebuilds from origin)	N/A (cache rebuilds from origin)	N/A	N/A
Split-brain behavior	Different POPs may serve different cached versions briefly	Same; eventual consistency on cache state	Same; instant purge mitigates	Each node has independent cache; client may see version skew	Same as Varnish
Blast radius of single-POP failure	Users in that geo route to next POP; brief RTT increase	Same	Same; with fewer POPs, the next-POP RTT delta is larger	Local to the data center; all traffic routes through LB	Same as Varnish
Cross-region failover	Anycast; or Route 53 health-aware records	Anycast handles it transparently	Anycast handles it transparently	DNS-based (Route 53, etc.) at slower RTO	DNS-based at slower RTO
Data loss scenarios	N/A for cache; origin is source of truth	2025-11-18: bad config crashed proxy processes globally for hours	2021: bad config triggered global outage ~1hr	Crash plus bad VMOD can corrupt local cache state	Crash without graceful shutdown can leave partial cache files

Sharding

Group A — In-Memory Caches

Dimension	Redis	Memcached	Valkey	KeyDB
Sharding model	Hash slots (16384); client- or proxy-aware in cluster mode	Client-side consistent hashing (Ketama)	Hash slots (16384) inherited from Redis Cluster	Hash slots (16384) Redis-compatible
Shard key constraints	Multi-key ops require keys in same slot (hash tag pattern)	Each key independent; no cross-key ops anyway	Same as Redis	Same as Redis
Rebalancing mechanism	CLUSTER SETSLOT migration, slot-by-slot	Add/remove node, client recomputes ring	Atomic slot migration (9.0); previously slot-by-slot	Slot-by-slot Redis-compatible migration
Rebalancing cost / impact	Live migration; brief MOVED redirects per slot	Cache miss spike during ring change; warmup required	Atomic migration in 9.0 reduces redirect churn	Similar to Redis
Hot-shard behavior	One slot saturates one CPU; hot key has no automatic mitigation	One node saturates; consistent hashing spreads keys but cannot split a single hot key	Same as Redis	Multi-threading helps within a hot shard, but hot key still limited
Maximum shards (practical)	~1000 nodes per cluster; gossip overhead grows quadratically	Hundreds of nodes (no inter-node coordination)	Same as Redis	Hundreds of nodes
Resharding without downtime?	Yes, online with brief MOVED responses	Yes, but warm-up cost on rehash	Yes, faster with atomic slot migration	Yes
Cross-shard query support	None natively; multi-key ops require same slot	None (no compound ops anyway)	Same as Redis	Same as Redis
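
The Memcached column relies on client-side consistent hashing, whose key property is that adding a node moves only the keys the new node takes over. A minimal Ketama-style sketch; it uses SHA-256 for ring positions as an assumption (not necessarily what a given client library uses), and `HashRing` with its `vnodes` parameter is a hypothetical name.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(label: str) -> int:
        digest = hashlib.sha256(label.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each node claims many points so load spreads evenly around the ring.
        for i in range(self._vnodes):
            pos = self._position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._points, pos)

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect(self._points, self._position(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Going from three nodes to four remaps roughly a quarter of the keys, all of them to the new node; those keys are the miss spike the rebalancing row describes.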

Group B — HTTP / Edge Caches

Dimension	CloudFront	Cloudflare	Fastly	Varnish (self-hosted)	NGINX (self-hosted)
Sharding model	Per-POP independent cache; same key may exist on 100s of POPs	Per-POP plus optional Tiered Cache hierarchy	Per-POP plus shield POP hierarchy	Per-node (you decide cluster topology)	Per-node (you decide cluster topology)
Shard key constraints	Cache key built from URL plus Vary headers	Same; URL plus cache-key rules	VCL controls cache key explicitly	VCL controls cache key explicitly	NGINX `proxy_cache_key` directive
Rebalancing mechanism	N/A; cache rebuilds organically from origin	N/A; same	N/A; same	Manual (deploy new fleet, drain old)	Manual (deploy new fleet, drain old)
Rebalancing cost / impact	Adding POPs has no migration cost (each warms from origin)	Same	Same	Cold-cache warmup as traffic shifts	Same
Hot-shard behavior	Single hot URL multiplied across all POPs; rarely a problem	Same; Tiered Cache helps	Same; shield POP consolidates	Hot URL on one node; LB can rebalance	Same as Varnish
Maximum shards (practical)	1600+ POPs; bounded by AWS	335+ cities; bounded by Cloudflare	~100 POPs but each is bigger	Self-imposed; typically 5-50 nodes per DC	Same
Resharding without downtime?	Implicit (AWS adds POPs without customer action)	Implicit	Implicit	Yes, by draining one node at a time	Yes, by draining one node at a time
Cross-shard query support	N/A (cache is per-POP)	Tiered Cache promotes a fetch to a higher-tier POP	Origin Shielding consolidates fetches	N/A; LB chooses one node	N/A; LB chooses one node
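
The shard-key row says the cache key is built from the URL plus Vary headers. A minimal sketch of such a key builder follows. Sorting query parameters and lowercasing the host are normalization choices assumed here to raise hit rate, not default behavior of any listed product; `cache_key` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, request_headers, vary=()):
    """Build a cache key from a normalized URL plus the headers named in Vary."""
    parts = urlsplit(url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    base = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"
    if query:
        base += "?" + query
    lowered = {name.lower(): value.strip() for name, value in request_headers.items()}
    varied = [
        f"{name.lower()}={lowered.get(name.lower(), '')}"
        for name in sorted(vary, key=str.lower)
    ]
    return "|".join([base, *varied])
```

Every header added to the vary list multiplies the number of variants per URL, so a key that includes something like a raw User-Agent can push hit rate toward zero.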

Replication

Group A — In-Memory Caches

Dimension	Redis	Memcached	Valkey	KeyDB
Replication topology	Leader-follower (single-leader)	None in OSS	Leader-follower (single-leader)	Multi-leader (active-active) or leader-follower
Sync vs async	Async by default; WAIT for semi-sync guarantee	N/A	Async by default; WAIT supported	Async multi-master replication
Replication factor (default / max)	Default 1 replica; practical max 3-5 replicas per master	N/A	Same as Redis	Multiple active peers (2-5 typical)
Consistency level options	Eventual (default); semi-sync via WAIT N TIMEOUT	N/A	Same as Redis	Eventual (LWW conflict resolution)
Replication lag (typical)	Sub-ms in same AZ; ms-seconds cross-region async	N/A	Same as Redis	Sub-ms to ms (depends on network)
Conflict resolution	N/A (single leader prevents conflicts)	N/A	N/A (single leader)	Last-write-wins by timestamp
Cross-region replication	Manual or Redis Enterprise CRDB (CRDTs, commercial)	Application-level only	Manual same as Redis	Built-in active-active replication is the headline
Replication during partition	Replica stops receiving updates; can be promoted by Sentinel	N/A	Same as Redis	Both sides accept writes; reconciled via LWW on heal
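
Last-write-wins is simple enough to show in a few lines, including the way it silently drops one of two concurrent writes. This is a sketch of the general technique, not KeyDB's internals; the `Write` record and the tie-break on region name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_ms: int
    region: str


def lww_merge(a: Write, b: Write) -> Write:
    """Keep the write with the later timestamp; break ties by region name."""
    return max(a, b, key=lambda w: (w.timestamp_ms, w.region))


# Two regions update the same key 3 ms apart during a partition.
us = Write(value="cart=[A]", timestamp_ms=1000, region="us-east")
eu = Write(value="cart=[B]", timestamp_ms=1003, region="eu-west")
winner = lww_merge(us, eu)  # the us-east write is discarded on heal
```

Neither client sees an error: the earlier write is simply gone after reconciliation. That is why the limitations table recommends avoiding concurrent same-key writes rather than trying to detect conflicts afterward.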

Group B — HTTP / Edge Caches

Dimension	CloudFront	Cloudflare	Fastly	Varnish (self-hosted)	NGINX (self-hosted)
Replication topology	None for cache (each POP independent); origin is source of truth	Same; Tiered Cache adds a hierarchy	Same; Origin Shielding adds a hierarchy	None native	None native
Sync vs async	N/A (no inter-POP replication of cache state)	N/A	N/A	N/A	N/A
Replication factor (default / max)	Effectively unlimited (each POP can cache same object)	Same	Same	Per-node; typically 2-5 nodes per DC	Per-node; typically 2-5 nodes per DC
Consistency level options	TTL plus purge (invalidation request, minutes to propagate)	TTL plus instant purge plus tag-based purge	TTL plus sub-150ms instant purge by surrogate key	TTL plus PURGE (per node) plus surrogate keys	TTL plus PURGE (per node); no native tags
Replication lag (typical)	Invalidations: 5-60s typical; up to minutes	Cache purge: seconds; KV writes: up to 60s globally	Instant purge: under 150ms globally	N/A (no replication); per-node purge is local	Same as Varnish
Conflict resolution	N/A (cache is read-only of origin)	N/A	N/A	N/A	N/A
Cross-region replication	Inherent (every POP independently caches from origin)	Inherent	Inherent	Build your own (multi-DC fleet)	Build your own
Replication during partition	POPs operate independently; a POP partitioned from origin keeps serving cached objects until their TTL expires	Same; stale-while-revalidate is configurable	Grace and stale-while-revalidate handle origin partitions cleanly	Grace + stale-while-revalidate (VCL)	NGINX has `proxy_cache_use_stale` for similar behavior
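
The grace and stale-while-revalidate behavior in the last row comes down to a decision on object age. A simplified model, assuming a single TTL and a single grace window per object; `cache_decision` and its return labels are hypothetical, and real products add more states.

```python
def cache_decision(age_s, ttl_s, grace_s, origin_healthy):
    """Decide how to answer a request for a cached object of a given age."""
    if age_s < ttl_s:
        return "serve_fresh"
    if age_s < ttl_s + grace_s:
        # Past TTL but inside the grace window: answer from cache right away,
        # and refresh in the background when the origin can be reached.
        return "serve_stale_and_revalidate" if origin_healthy else "serve_stale"
    # Too old to serve: the request has to wait on the origin.
    return "fetch_from_origin" if origin_healthy else "error"
```

The grace window is what turns an origin outage shorter than `grace_s` into a non-event for users, at the price of serving content up to `ttl_s + grace_s` seconds old.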

Better Usage Patterns

Patterns most teams miss. What gets called out in code review at L6/L7.

Group A — In-Memory Caches

Redis

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Cache stampede prevention	TTL expires, every request races to recompute, origin gets hammered	Use SET NX EX lock or probabilistic early refresh (XFetch)	One cold key during peak traffic can cascade to a full outage; this pattern is the most common Redis-related production fire
Pipelining for bulk operations	One `SET` per call, app-side latency dominated by RTT	Pipeline 100-1000 ops; or use `MSET`/`MGET` for atomic batches	Single-trip latency to Redis is ~0.5ms; 1000 separate calls is 500ms; pipelined is 1-2ms
Avoid O(N) commands on production	`KEYS *`, `SMEMBERS` on huge sets, `HGETALL` on huge hashes	Use `SCAN`, `SSCAN`, `HSCAN` with COUNT bounds	O(N) commands block the single thread; one large SMEMBERS can stall every other client for seconds
Use hash tags for cluster colocation	Cluster mode adopted, multi-key ops break, half the Lua scripts return CROSSSLOT	Design keys with a shared hash tag, e.g. `user:{123}:profile` and `user:{123}:sessions`, so only the braced part picks the slot	An unprepared app fails quietly in cluster mode: multi-key ops return CROSSSLOT errors and the app sees half-success
Connection pooling, not per-request	App opens a connection per request; Redis hits `maxclients` ceiling	Use lazy pool (Jedis pool, lettuce shared, ioredis cluster client)	Connection setup is 0.5-2ms; on a hot path, it doubles every cache call
Set `maxmemory` and a sane eviction policy	Default `noeviction`; writes start failing when memory fills	`maxmemory` set to 80% of node RAM, `maxmemory-policy allkeys-lru` for cache use	`noeviction` makes Redis reject writes with OOM errors once `maxmemory` is reached; if the app swallows those errors, the cache becomes a silent write-failure layer
Use replicas for read-scaling carefully	Reads go to async replicas; users see stale data, application doesn't expect it	Mark replica reads as "read-stale-acceptable"; route consistent reads to leader	Replicas can lag seconds under load; if your app assumes read-your-writes, it will silently break
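
The probabilistic early refresh (XFetch) named in the stampede row can be sketched as follows. This is a minimal in-memory illustration, not a Redis client: the `XFetchCache` class, its `beta` parameter, and the injectable `clock` and `rng` arguments are assumptions made for the example. Each read may volunteer to recompute slightly before expiry, with a probability that grows as expiry approaches and with the measured recompute cost, so a hot key is unlikely to expire for every caller at once.

```python
import math
import random
import time


class XFetchCache:
    """In-memory stand-in for a cache; stores (value, delta, expiry) per key."""

    def __init__(self, beta=1.0, clock=time.monotonic, rng=random.random):
        self.beta = beta      # > 1.0 refreshes earlier, < 1.0 later
        self.clock = clock    # injectable so the behavior can be tested
        self.rng = rng
        self.store = {}

    def get(self, key, recompute, ttl):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None:
            value, delta, expiry = entry
            # -log(u) is an exponential sample; scaling it by the last
            # recompute time (delta) makes expensive keys refresh earlier.
            u = 1.0 - self.rng()  # in (0, 1], avoids log(0)
            if now - delta * self.beta * math.log(u) < expiry:
                return value
        # Miss, expired, or this caller volunteered for an early refresh.
        start = self.clock()
        value = recompute()
        delta = self.clock() - start
        self.store[key] = (value, delta, self.clock() + ttl)
        return value
```

With a real Redis or Valkey backend, the stored tuple would live in the cached value itself (or a sibling key), and the same comparison would run client-side on each read.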

Memcached

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Use CAS for safe updates	GET then SET, two clients overwrite each other's update	Use `gets` (returns CAS token) then `cas` (conditional set)	Without CAS, multi-step updates are inherently racy; lost-update bugs hide for months
Cache version prefix in key	Schema change requires full cache flush; thundering herd	Prefix key with serialization version: `v3:user:123`	Version bump invalidates old keys gracefully; new and old can coexist during deploy
Enable slab automove	Default config; one slab class fills, others sit empty, hit rate degrades	Start Memcached with `-o slab_reassign,slab_automove=1`; confirm both are active on your version	Slab calcification silently drops hit rate by 20-40% over weeks; few teams notice
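Pick the wire protocol deliberately for high RPS	Classic text protocol everywhere; or new builds on the binary protocol, which upstream has deprecated	Prefer the meta text protocol on Memcached 1.6+ where the client supports it; keep the binary protocol (libmemcached) only for existing clients; note that pymemcache speaks the text protocol only	Protocol parsing and round trips start to matter at 100K+ ops/sec/client; benchmark rather than assume a fixed speedup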
Plan for rolling restart	Fleet restart equals 100% cache miss equals DB falls over	Restart one node at a time; warm new fleet before swapping	Deploys are routine; cold-cache outage on deploy is the most preventable Memcached fire
Set TTL aggressively	No TTL set; LRU eviction silently drops keys at unpredictable times	Always set explicit TTL (even 86400 for "daily")	Explicit TTL makes cache behavior predictable; with LRU only, the least recently used key is evicted whenever memory runs short, which is hard to reason about
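
The `gets` then `cas` pattern from the first row can be sketched as a retry loop. The `FakeMemcache` class below is a hypothetical in-memory stand-in that only mimics the token semantics; a real client exposes similar calls, but check your client's exact signatures before copying this shape.

```python
class FakeMemcache:
    """Hypothetical stand-in that mimics gets/add/cas token semantics."""

    def __init__(self):
        self._data = {}    # key -> (value, cas_token)
        self._version = 0

    def gets(self, key):
        # Returns (value, token), or (None, None) when the key is missing.
        return self._data.get(key, (None, None))

    def add(self, key, value):
        # Succeeds only if the key does not exist yet.
        if key in self._data:
            return False
        self._version += 1
        self._data[key] = (value, self._version)
        return True

    def cas(self, key, value, token):
        # Succeeds only if nobody has written since `token` was read.
        current = self._data.get(key)
        if current is None or current[1] != token:
            return False
        self._version += 1
        self._data[key] = (value, self._version)
        return True


def cas_update(client, key, fn, max_retries=5):
    """Apply fn(old_value) -> new_value without losing concurrent updates."""
    for _ in range(max_retries):
        value, token = client.gets(key)
        if token is None:
            if client.add(key, fn(None)):
                return True
            continue  # someone else created the key first; re-read
        if client.cas(key, fn(value), token):
            return True
    return False  # caller decides whether to give up or fall back
```

A bounded retry count keeps a hot key from spinning a client forever; what to do on exhaustion (drop the update, delete the key, log) is an application decision.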

Valkey

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
All Redis patterns apply	Treat Valkey as a different product, miss more than a decade of Redis lore	Apply Redis best practices (stampede prevention, pipelining, hash tags) directly	Valkey is wire-compatible; the operational patterns transfer 1:1
Pin to features in your fork window	Use Redis 8 features (vector sets) and assume they will land in Valkey	Stick to features available in Valkey 8.x at adoption; track Valkey roadmap for newer needs	The fork is at 7.2.4; anything past that has to be ported. Don't build on features that don't exist yet.
Use first-party modules	Assume RediSearch / RedisGraph will work; surprise on adoption	Use Valkey-Bloom and Valkey-JSON (AWS/Google contributed); evaluate alternatives for search	First-party modules are BSD; commercial Redis modules have license incompatibility
Enable I/O threading deliberately	Set `io-threads` high "for performance"; spinlocks degrade throughput	Set `io-threads` to ~half of physical cores; benchmark; iterate	I/O threading is a single dial that can make or break perf; default is conservative
Adopt 9.0 features carefully	Atomic slot migration and multi-DB cluster mode adopted on day one for prod	Stay on 8.1 LTS for critical workloads; pilot 9.0 features on a non-critical cluster first	9.0 shipped in October 2025; major features need 6+ months of fleet miles before mission-critical use
Plan migration from Redis with TLS settings in mind	Drop-in migration breaks because TLS ciphers or auth modes differ	Test in staging with prod TLS config; many managed-Valkey vendors wrap auth differently	The "drop-in" promise is true for wire protocol; auth and TLS often need glue code

KeyDB

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Tune server-threads to physical cores	Set threads equal to vCPUs in cloud VMs; spinlocks contend	Set threads to physical cores; pin CPUs; avoid noisy-neighbor VMs	KeyDB's spinlock model assumes exclusive cores; oversubscription destroys the perf claim
Plan for migration off KeyDB	Treat KeyDB as a long-term stable choice	Have a Valkey or Redis migration plan documented; pin to Redis 6.2-compatible features	Snap's strategic ambiguity means roadmap risk; the option value of being migration-ready is real
Avoid concurrent same-key writes across active-active peers	Trust active-active to "just work"; LWW silently loses writes	Partition writes by key prefix per region; or use single-leader with fast promotion	LWW is fine for cache (recompute on miss); for anything close to durable state, it's a foot-gun
Treat Flash tier as cost optimization, not capacity	Enable Flash to "fit more data"; tail latency suffers	Use Flash only when the workload is genuinely hot-cold; benchmark P99 vs RAM-only	Flash hits are 10-50x slower; for uniform access, you've added variance for no benefit
Self-support readiness	Assume GitHub issues will get answered	Internal runbook for common KeyDB failures; budget for self-debugging	No SLA, no commercial escalation; production support is your team's responsibility
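
The lost-write risk in the active-active row is easy to see in a small simulation. This sketch assumes each peer holds `key -> (timestamp, value)` and that reconciliation keeps the higher timestamp; the `lww_merge` function and the sample data are illustrative, not KeyDB's actual implementation.

```python
def lww_merge(a, b):
    """Merge two peers' states; for each key the higher timestamp wins.

    Ties keep the entry from `a`, an arbitrary choice for this sketch.
    """
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Both regions start from an empty cart and write during a partition.
east = {"cart:42": (100, ["book"])}   # east adds "book" at t=100
west = {"cart:42": (101, ["lamp"])}   # west adds "lamp" at t=101

healed = lww_merge(east, west)
# The whole value is replaced, so east's "book" is gone after the heal.
```

For a pure cache this is harmless because the value is recomputed on the next miss; for a cart or counter the write from the losing region simply disappears, which is why the table recommends partitioning writes by key prefix per region.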

Group B — HTTP / Edge Caches

Amazon CloudFront

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Default to CloudFront Functions over Lambda@Edge	Use Lambda@Edge for everything because it's "more capable"	Start with CloudFront Functions; escalate to Lambda@Edge only when you need network calls or an SDK	For 1B requests/month, Functions cost about $100 in request charges vs $600+ for Lambda@Edge; the default choice is a 6-10x cost gap
Use Origin Shield deliberately	Enable everywhere for "better cache hit"	Enable only when origin is expensive (Lambda, complex queries) or cache hit ratio is low	Origin Shield adds a hop and a per-GB charge; for high-cache-hit static workloads it's a net negative
Normalize cache keys to reduce variance	Default cache key includes all query strings, headers; cache hit ratio is terrible	Use CloudFront Functions to canonicalize; whitelist only meaningful query params	One uncontrolled query param can cut hit rate by 80%; classic engineering miss
Use signed cookies, not signed URLs, for sessions	Sign every URL; user shares URL, downstream cache hit suffers	Sign cookies for session-bounded access; URLs stay cacheable	Signed URLs are per-user; signed cookies allow caching the same URL across the user's session
Route via Origin Failover	Single origin, no fallback; origin outage equals user-visible 5xx	Configure origin groups with failover criteria (502, 504, etc.)	One config change, multi-origin resilience without app changes
Use cache policies, not legacy "Cache Based on Selected Request Headers"	Use the legacy radio buttons; cache key is messy	Define explicit cache policies as IaC (CDK, Terraform); version them	Cache policy is reusable across distributions; legacy mode forces per-distribution tweaks and drift
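
The cache key normalization row boils down to a small canonicalization step. CloudFront Functions themselves are written in JavaScript; the Python sketch below only illustrates the logic, and the `ALLOWED_PARAMS` allowlist and the `normalize_cache_key` name are assumptions for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical allowlist: only these query params change the response.
ALLOWED_PARAMS = frozenset({"page", "lang"})


def normalize_cache_key(url, allowed=ALLOWED_PARAMS):
    """Drop non-allowlisted query params and sort the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in allowed
    )
    path = parts.path or "/"
    query = urlencode(kept)
    return f"{path}?{query}" if query else path
```

Sorting matters as much as filtering: `?page=2&lang=en` and `?lang=en&page=2` would otherwise be cached as two separate objects.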

Cloudflare

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Multi-CDN failover, not Cloudflare-only	Trust the SLA; nothing else needed	Active-passive with CloudFront or Fastly; DNS-failover or active-active anycast	2025-11-18 outage took down sites globally for hours; multi-CDN is the proven mitigation
Use Cache API for hot per-POP data, KV for global reference	Use KV for everything because it's "the data layer"	Cache API for fresh fetches in same POP; KV for config / reference data	KV allows about 1 write/sec per key and can take up to 60s to propagate globally; Cache API has neither limit
Workers Smart Placement for origin-heavy workloads	Workers run at every POP, including ones far from origin	Enable Smart Placement to colocate Worker with origin region	For workloads where the Worker mostly calls origin, placing it next to origin reduces P99 significantly
Use Tiered Cache for low-hit-rate origins	Trust default per-POP cache	Enable Tiered Cache (Smart Tiered Cache topology) so misses funnel through fewer upper-tier POPs	Per-POP miss multiplies origin load by N; tiered cache reduces it by an order of magnitude
Pin Bot Management whitelist for known crawlers	Trust the defaults; monitoring or AI crawlers get challenged	Whitelist known user-agents and IPs; tune rules per route	Cloudflare's Bot Management defaults block more than people realize; over-blocking is a silent bug source
Plan for ToS limits on non-HTML content (the former Section 2.8)	Build a media-streaming product on Free / Pro plan	Negotiate enterprise contract before traffic profile shifts	The unmetered-bandwidth promise has limits; surprise enterprise sales call mid-launch is bad

Fastly

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Use surrogate keys for cache invalidation	Invalidate by URL pattern; brittle, hard to map content to URL	Tag responses with surrogate keys; purge by tag	Surrogate-key purge is Fastly's killer feature; URL-based purge is fragile and slow
Origin Shielding placement	Use the default shield POP; not necessarily closest to origin	Set shield to the POP closest to origin region	Shield-to-origin latency directly affects cache miss tail latency
VCL discipline	Pile logic into `vcl_recv`; complex if/else trees	Use Fastly's subroutine convention; split logic by phase (recv, hash, fetch, deliver)	VCL phases have semantic meaning; mixing them causes subtle bugs (e.g., cache key set after lookup)
Compute@Edge for the right workloads	Rewrite everything in Compute@Edge because "Wasm is the future"	Use Compute@Edge for request enrichment, A/B routing, complex auth; keep simple caching in VCL	Wasm cold-start is near-zero but each invocation has cost; VCL is free per-request
Use Edge Dictionaries for config	Hardcode redirect maps and feature flags in VCL	Use Edge Dictionaries; update without redeploying VCL	VCL deploys go through compile; dictionary updates are near-instant via API
Stale-while-revalidate aggressively	Short TTL plus no stale handling; origin sees miss-storms	Long TTL plus `stale-while-revalidate` plus surrogate-key purge	Combining long TTL with fast invalidation is the whole point of Fastly; lots of teams underuse this
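
Surrogate-key purge is essentially a reverse index from tag to cached objects. The sketch below is a toy in-memory model of that idea; the `TaggedCache` class is hypothetical, and the only detail borrowed from Fastly is that the `Surrogate-Key` header carries space-separated keys.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache that supports purge-by-tag via a reverse index."""

    def __init__(self):
        self.objects = {}                # url -> body
        self.by_tag = defaultdict(set)   # surrogate key -> set of urls

    def put(self, url, body, surrogate_keys):
        # `surrogate_keys` mirrors the header value: "product-1 category-shoes"
        self.objects[url] = body
        for tag in surrogate_keys.split():
            self.by_tag[tag].add(url)

    def purge_tag(self, tag):
        """Remove every object carrying `tag`; return how many were dropped."""
        removed = 0
        for url in self.by_tag.pop(tag, set()):
            if url in self.objects:
                del self.objects[url]
                removed += 1
        return removed
```

The origin only has to know which entities went into a response, not which URLs embed them, which is what makes long TTLs safe to combine with fast invalidation.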

Varnish Cache

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
VCL phase discipline	Cram all logic into `vcl_recv`	Place logic in the right phase: `vcl_recv`, `vcl_hash`, `vcl_backend_fetch`, `vcl_deliver`	Phases run in a specific order with specific available variables; misplacement causes silent misbehavior
Use surrogate keys (xkey VMOD)	Purge by URL pattern; brittle, slow	Tag responses with surrogate keys via xkey; purge by tag	The single highest-leverage Varnish pattern; most production Varnish fleets that don't use it should
Cluster purge fanout	PURGE on one node; assume cluster-wide effect	Use Varnish Broadcaster or custom HTTP-fanout to propagate	Without fanout, multi-node clusters serve inconsistent content; user-visible bug
Use `std.log` for VCL debugging, not regsub-heavy headers	Add 20 debug headers in `vcl_deliver`; pollute responses	Use `std.log` + `varnishlog` for debugging; ship logs separately	Production responses with debug headers are noise; logs are a better audit trail
Grace and keep tuning	Default 10s grace; origin outage triggers user-visible errors	Set `beresp.grace` to minutes-hours; combine with `stale-while-revalidate`	Grace is Varnish's resilience superpower; underused by teams used to TTL-only thinking
Always run a TLS sidecar (Hitch, NGINX, HAProxy)	Try to terminate TLS in Varnish; discover it's not supported	Hitch on same host for TLS; pass cleartext to Varnish via UDS	The pattern is so universal it should be the default in any Varnish runbook
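
The grace row describes a three-way decision per cached object. As a rough mental model (the function name and return labels are assumptions, and real VCL has more states, such as keep and backend health), it can be written as:

```python
def cache_decision(age, ttl, grace):
    """Classify a cached object by its age in seconds.

    fresh  -> serve from cache
    stale  -> serve the stale copy now, refresh in the background
    fetch  -> too old even for grace; go to the backend synchronously
    """
    if age < ttl:
        return "fresh"
    if age < ttl + grace:
        return "stale"
    return "fetch"
```

With a 60s TTL and the 10s default grace from the table, an origin outage becomes user-visible after about 70 seconds; raising grace to an hour stretches that window to roughly 61 minutes for objects already in cache.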

NGINX

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Enable `proxy_cache_lock` for stampede prevention	Cache key expires; 1000 requests hit origin simultaneously	Set `proxy_cache_lock on` to serialize cache fills	The single most missed NGINX cache directive; one line prevents the most common origin overload
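Use `proxy_cache_use_stale` for origin outages	Origin 500s; users see 500s	Configure `proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504`, plus `proxy_cache_background_update on`	Two lines of config turn the cache into a resilience layer; `error` and `timeout` alone do not cover 5xx responses, so list the status codes explicitly; the equivalent of Varnish grace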
Cache key normalization	Default cache key includes full query string; low hit rate	Use `map` to normalize; cache key includes only meaningful query params	Same impact as on CloudFront: uncontrolled query params kill hit rate
Tune worker count to actual cores	`worker_processes auto`; not always right under containerized limits	Set explicitly to the CPUs the container is actually allowed to use (physical cores on bare metal); pin with `worker_cpu_affinity` at high scale	In containers, `auto` sizes from the CPUs visible to the process and can ignore cgroup CPU quotas; perf suffers silently
Health checks via NGINX Plus or third-party	Trust upstream is up; one failing backend gets traffic	NGINX Plus active health checks; for OSS, passive checks via `max_fails` / `fail_timeout` or a third-party active-check module such as `nginx_upstream_check_module`	Without active health checks, NGINX only learns about a failure after a request fails; the first request after the failure pays the cost
Use `open_file_cache` for static-heavy workloads	Static file metadata lookup per request; `stat()` bottleneck	Enable `open_file_cache` to cache fd metadata	For high-RPS static workloads, file metadata cache is the difference between 10K and 100K RPS per worker
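
What `proxy_cache_lock` does is request coalescing: only one request per cache key goes to the origin while the others wait for its result. The sketch below shows the same idea in application code; the `CoalescingCache` class is hypothetical, has no TTL, and never removes per-key locks, so treat it as an illustration of the mechanism rather than a production cache.

```python
import threading


class CoalescingCache:
    """Lets only one caller per key run the fetch; others wait and reuse it."""

    def __init__(self, fetch):
        self._fetch = fetch             # function: key -> value (the "origin")
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def get(self, key):
        with self._guard:
            if key in self._values:
                return self._values[key]
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the key while we waited.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = self._fetch(key)
            with self._guard:
                self._values[key] = value
            return value
```

If the fetch raises, the per-key lock is released and the next waiter retries, so one failed fill does not poison the key.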

Advanced / Next-Gen Alternatives

What's replacing or augmenting each technology, and what to watch.

Group A — In-Memory Caches

Redis

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Valkey	BSD license, faster community velocity, atomic slot migration	Production	Low (wire-compatible)	If AGPL/RSAL is a blocker; new builds default to Valkey
DragonflyDB	Multi-threaded from the ground up, claims 25x throughput on big nodes	Emerging	Medium (Redis API mostly; some semantic gaps)	If single-node throughput is the bottleneck and full Redis-API parity isn't required
Microsoft Garnet	C# implementation, RESP-compatible, high throughput, advanced storage tiers	Emerging	Medium	Microsoft-stack shops; experimental for high-end research-grade perf
Amazon MemoryDB	Multi-AZ durable Redis with consensus-backed writes	Production	Low	When you need cache plus database guarantees, not just cache

Memcached

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Valkey / Redis with threaded I/O	Adds replication, persistence, rich types while approaching Memcached throughput	Production	Medium (different API, but more capable)	When you've outgrown raw GET/SET
Meta's CacheLib (open-sourced)	In-process embedded cache, used to power Meta's caching	Production	High (embedded library, not service)	When you want cache as a library inside your service, not a separate fleet
Aerospike	Hybrid memory plus SSD, sub-ms even at TB scale, multi-DC replication	Production	High (different data model)	When cache size outgrows RAM economics and you still need sub-ms reads
Hyperscale alternatives (CacheLib, Meta TAO)	Purpose-built for scale beyond what Memcached envisioned	Production	Very high	When you operate at Meta or Google scale; otherwise stick with Memcached

Valkey

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Redis 8 (AGPLv3)	Vector sets, newer modules, faster feature velocity	Production	Low (wire-compatible)	If AGPLv3 is acceptable and you need Redis 8 features
DragonflyDB	Multi-threaded from scratch, BSL license, drop-in API	Emerging	Medium	When threading model is the bottleneck and BSL is acceptable
Amazon MemoryDB for Valkey	Multi-AZ durable Valkey with consensus writes	Production	Low	When you need stronger durability than async replication offers

KeyDB

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Valkey 8+ with I/O threading	Active LF stewardship, faster community velocity	Production	Low	Anytime; this is the natural migration path
DragonflyDB	True multi-threaded execution, not just I/O	Emerging	Low (Redis-compatible)	When you adopted KeyDB specifically for threading and want to push further
Redis Enterprise (commercial)	Production-grade Redis with paid support and CRDB for multi-DC	Production	Low	If active-active was the KeyDB feature you needed and now you need vendor support

Group B — HTTP / Edge Caches

Amazon CloudFront

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Cloudflare	Bigger POP footprint, V8 isolates, free tier, R2 zero egress	Production	Medium (auth re-design, IAM equivalents)	Cost-sensitive workloads, multi-cloud, developer experience matters
Fastly	Instant purge, VCL programmability, fewer-larger POPs	Production	Medium (VCL learning curve)	High-write content (news, sports, inventory) where purge speed matters
Specialized media CDNs (BlazingCDN, Bunny)	Order-of-magnitude lower per-GB cost for media workloads	Production	Medium (signed URL, DRM rework)	Pure video / large-file delivery where cost dominates the decision

Cloudflare

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Multi-CDN (Cloudflare + CloudFront/Fastly)	Eliminates single-vendor blast radius (Nov 2025 outage was global)	Production	Medium-high (DNS, config drift, multi-bill)	Mission-critical workloads where 4+ hours of outage is unacceptable
Fastly Compute@Edge	Wasm runtime, multi-language, comparable performance	Production	High (rewrite from Workers JS to Wasm)	When you want to move off Cloudflare to a single alternative vendor
AWS CloudFront + Lambda@Edge / Functions	AWS-native integration, less single-vendor risk than Cloudflare for AWS-heavy stacks	Production	Medium	If your origin is AWS-deep, CloudFront is the obvious second CDN

Fastly

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Cloudflare Workers	Larger ecosystem, V8 isolates, more POPs in long-tail markets	Production	High (VCL to JS rewrite)	When VCL programmability is no longer worth the language overhead
AWS CloudFront + Lambda@Edge	AWS-native integration, predictable enterprise sales cycle	Production	Medium-high	When AWS-deep origin makes CloudFront's integration story compelling
Self-hosted Varnish + own POPs	Total control, no SaaS vendor risk	Production	Very high (operate own CDN)	Very rare: compliance or geography forces self-hosting

Varnish Cache

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Fastly	Same VCL lineage, fully managed, global anycast	Production	Low-medium (VCL ports with some Fastly-specific changes)	When operating Varnish is no longer worth the engineering effort
Varnish Enterprise (commercial)	MSE persistent storage, native clustering, paid support	Production	Low	When you want to stay self-hosted but need enterprise features
NGINX Plus	Add caching to existing NGINX without learning VCL	Production	Medium	When team already runs NGINX and Varnish's specialized cache features aren't critical
Apache Traffic Server	Yahoo-scale-proven, similar HTTP-cache focus, more permissive config	Production	High (different config language, different ops model)	Rare; mostly for very large CDN-like internal deployments

NGINX

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Envoy	Modern data-plane for service mesh, xDS dynamic config, gRPC native	Production	High (different config model, different operational story)	Microservices / service-mesh contexts where NGINX's static config feels heavyweight
Cloudflare Pingora	Rust-based proxy, multi-threaded, used to power Cloudflare's edge	Emerging	Very high (library, not config; Rust)	When you want NGINX-class perf but the C codebase is the issue
HAProxy plus Varnish	Best-of-breed: HAProxy for LB/TLS, Varnish for cache	Production	Medium (operate two services)	When neither role is a side concern; you want specialists
Caddy	Automatic HTTPS, simpler config, modern Go-based	Production	Low for simple use cases	SMB and developer-facing deployments where NGINX config feels heavy
