Caching Stack Trade-Offs: In-Memory + HTTP/Edge
Two layers of caching, one analysis. In-memory data caches sit between app and database. HTTP/edge caches sit between user and origin. They compose, they do not compete. This artifact compares them honestly within each layer.
Caching L6/L7 Depth. As of 2026-06-08
In-memory layer: Redis won the API war but lost the license war. Valkey is now the default open-source choice for new builds; Redis 8 (AGPLv3) is fine for AGPL-compatible workloads or paying customers; Memcached remains the right answer when your cache is genuinely just a hash map and you want multithreaded raw throughput; KeyDB is on borrowed time post-Valkey momentum.
HTTP/edge layer: CloudFront wins when you are AWS-deep and need integration; Cloudflare wins on developer experience and reach (with single-vendor blast-radius risk demonstrated again on 2025-11-18); Fastly wins when you need VCL-grade programmability and instant purge; Varnish wins when you need to own the cache layer; NGINX is the right answer when caching is a side-effect of being a reverse proxy, not the main job.
Best default choices
Cross-Layer Overview
Caching is not one decision; it is three to five decisions stacked in series. In-memory caches (Group A) hold hot data near the app process; HTTP/edge caches (Group B) hold rendered responses near the user. Most production stacks run both layers. Mixing them up in design review is the most common L6 interview tell.
Layer Characteristics
| Dimension | Group A — In-Memory Cache | Group B — HTTP / Edge Cache |
|---|---|---|
| What gets cached | Application objects, sessions, computed results, leaderboards | HTTP responses, static assets, API responses with cache headers |
| Cache key | Application-defined string key | URL plus selected headers / query parameters (Vary) |
| Hit latency target | Sub-millisecond p99 in same VPC | 5-30ms p99 from user (closest POP) |
| Capacity ceiling | Limited by node RAM (cluster scales out, expensive) | Effectively unlimited at provider; per-POP disk-backed |
| Invalidation model | Explicit delete or TTL, fully under app control | TTL plus purge API, often eventual across POPs |
| Failure mode if cache dies | Thundering herd onto database, cascading failure within seconds | Origin must absorb full user traffic, often impossible at scale |
| Who owns the box | You (self-hosted) or managed service (ElastiCache, MemoryDB) | Almost always provider-managed; Varnish/NGINX are the self-host exceptions |
| Cost driver | Memory GB-hours plus cross-AZ network | Bandwidth GB egress plus per-request charges |
| Where it lives on the path | App → cache → DB (server-side, behind app) | User → cache → app (client-facing, in front of app) |
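
The Group A rows on invalidation ("explicit delete or TTL") and failure mode ("thundering herd onto database") can be illustrated with a minimal cache-aside sketch. It is an in-process toy, not a client for any real cache server: `CacheAside`, `loader`, and `invalidate` are hypothetical names, and the per-key lock is one common way (often called single-flight) to make concurrent misses on the same key trigger only one backing read.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Toy in-process cache-aside with TTL and per-key single-flight."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float) -> None:
        self._loader = loader            # hypothetical stand-in for the database read
        self._ttl = ttl_seconds
        self._data: Dict[str, Tuple[float, str]] = {}   # key -> (expires_at, value)
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()   # protects the lock table itself

    def _fresh(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]
        return None

    def get(self, key: str) -> str:
        value = self._fresh(key)
        if value is not None:
            return value                 # hit: no lock taken on the hot path
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            value = self._fresh(key)     # re-check: another caller may have loaded it
            if value is None:
                value = self._loader(key)
                self._data[key] = (time.monotonic() + self._ttl, value)
            return value

    def invalidate(self, key: str) -> None:
        """Explicit delete, the app-controlled half of the invalidation model."""
        self._data.pop(key, None)
```

With eight threads missing on the same key at once, the loader runs once and the other seven callers wait on the lock and then read the freshly stored value. Across processes or hosts the same idea needs a shared lock or request coalescing at the cache tier, which this sketch does not attempt.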
Trade-Offs
One table per technology. Each row is a real trade: you gain X by giving up Y.
Redis In-Memory
v8.0 GA May 2025, tri-licensed (RSALv2 / SSPLv1 / AGPLv3). Single-threaded command execution with threaded I/O.
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Single-threaded core execution | Atomic ops without locks, predictable per-op latency, lockless data structures | One CPU core caps your peak throughput per shard | Hot key with O(N) commands (KEYS, large SUNIONSTORE) freezes the entire instance | Redis 6+ multithreads only the I/O layer, not command dispatch. If your bottleneck is CPU on command execution, scale by sharding, not by adding cores. |
| Rich data structures | Lists, sets, sorted sets, streams, hashes, geo, vector sets (8.0) | Schema decisions baked into key naming, hard to evolve without migration | You build a sorted set leaderboard and later need range queries by a different field; only option is full rewrite | The richer the type, the more it becomes load-bearing schema. Treat key naming and structure choice as a one-way door. |
| Tri-license (AGPLv3 / RSALv2 / SSPLv1) | OSI-approved option exists again as of v8.0 | AGPL copyleft creates legal review burden; many enterprises ban it outright | Security review or M&A due diligence flags AGPL; you migrate to Valkey under time pressure | Pick license at adoption, not in panic. If your legal team has even started writing a copyleft policy, pick Valkey now. |
| Persistence (RDB / AOF) | Survive process restart, point-in-time snapshots, replay log | Disk fsync cost, fork() RAM doubling during RDB snapshot | Large dataset (50GB+) on a small node, BGSAVE forks and OOM-kills the process | If you actually need durability, you do not want a cache; you want a database. Use AOF everysec as a recovery aid, not as a durability guarantee. |
| Cluster mode (16384 slots) | Horizontal scale, automatic slot migration, gossip-based topology | Multi-key ops fail unless keys share a hash tag, no MULTI across slots | You add cluster mode to an app that uses Lua scripts touching many keys; half of them break silently | Cluster mode is not a free upgrade. The application has to be cluster-aware. Many teams add it then regret it; benchmark single-instance with replicas first. |
| In-memory by design | Sub-millisecond reads, no disk in the hot path | $/GB is 10-30x the cost of SSD storage | Dataset grows past planned size, eviction starts pruning live keys, hit rate craters | Set maxmemory and maxmemory-policy at provisioning time, not after the SEV. noeviction turns the cache into a write-failing surprise. |
| Pub/Sub plus Streams plus Functions | One process replaces a message broker, a cron, and a script runner | Mixing concerns makes capacity planning and blast radius messy | Cache and pub/sub on the same cluster, pub/sub clients hit slow-consumer, cache latency spikes | Operationally you should treat Redis-as-broker and Redis-as-cache as separate clusters. They share code but they do not share failure modes. |
| Vector sets (8.0 new data type) | RAG and embedding workloads without a separate vector DB | HNSW index sits in RAM, dataset size is gated by node memory | Embeddings explode past 100M vectors at 1536 dims, RAM cost forces a real vector DB anyway | Vector sets are a fine bridge from prototype to production. They are not a substitute for Pinecone or pgvector at 10M+ vector scale. |
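
The cluster-mode row says multi-key operations fail unless keys share a hash tag. A small sketch of the slot calculation makes that concrete: the key (or only the text inside the first non-empty `{...}` when present) is hashed with CRC16 and taken modulo 16384. The function name `key_slot` is illustrative, and the sketch assumes Python's `binascii.crc_hqx` with an initial value of 0, which computes the XMODEM variant of CRC16 that the Redis Cluster specification describes.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in cluster mode


def key_slot(key: str) -> int:
    """Return the cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


# Keys sharing a tag land in the same slot, so multi-key commands can work.
same_slot = key_slot("{user:42}:profile") == key_slot("{user:42}:sessions")
# Untagged keys hash on the full name and may land on different shards.
maybe_different = (key_slot("user:42:profile"), key_slot("user:42:sessions"))
```

The design consequence is the one the table names: hash tags have to be chosen when keys are named, because retrofitting them means renaming (and therefore migrating) every affected key.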
Memcached In-Memory
BSD license, multithreaded, deliberately simple. Meta serves around 5 billion requests per second on Memcached.
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Multithreaded architecture | Linear scale-up across cores on a big node, simpler hardware-utilization story | No atomicity beyond single-key ops, no scripting, no transactions | You need compare-and-swap on multi-field record; CAS only works per-key, you build a distributed lock yourself | Memcached vertical scale is the cleanest of any cache. r6g.16xlarge with 64 vCPUs and 512GB RAM utilized cleanly is hard to beat for pure GET/SET throughput. |
| String values only | Server stays simple, fast, lock-free hash table | Application owns all serialization (JSON, MessagePack, Protobuf) | Schema change in your serialized blob breaks every cached entry; no migration path other than full flush | Use a version prefix in the cache key (v3:user:123) so a serialization change cycles cleanly without poison entries. |
| 1 MB default value limit | Forces good cache hygiene, prevents one bad key from eating a slab class | Larger blobs (rendered HTML pages, image bytes) need chunking or a different cache | Cache-stampede mitigation logic that stores a "rendered" response works in dev, fails in prod once the page grows | The limit is tunable to 128MB via -I, but the slab allocator was designed for small objects. Pushing it past 1MB usually means you picked the wrong tool. |
| No persistence by design | No fork pauses, no fsync stalls, predictable steady-state latency | Restart equals total data loss, cold cache problem on every deploy | You deploy a fleet-wide config change that restarts Memcached; database absorbs 100% of read traffic, falls over | Always have application-level warmup logic or rolling restart by node. Treat the cold-cache problem as a planned event, not a surprise. |
| LRU with slab classes | Predictable memory footprint, no fragmentation problems | Slab calcification (one slab class fills, others sit empty) | Workload shifts to mostly 4KB objects, slab class 6 evicts hot keys while slab class 4 has free space | Modern Memcached has slab_reassign and slab_automove, but most ops teams never enable them and quietly bleed hit rate. |
| No replication or clustering | Stateless from the cluster's perspective, client-side sharding is trivial (consistent hashing) | Node failure equals partition loss until cache refills from origin | One node in a 20-node fleet dies; 5% of cache evaporates, your database P99 doubles for 30 minutes | The Memcached failure model assumes the database can serve the miss rate. That assumption is the entire architecture; verify it under load test. |
| 250-byte key length cap | Server stays fast, predictable hash table behavior | Long composite keys (tenant-id plus user-id plus feature-flag plus locale) get truncated or hashed | Multi-tenant SaaS where keys naturally grow long; you hash, then lose cache-key debuggability | If you have to hash the key client-side to fit, you have already lost the ability to inspect what is cached. Worth picking Redis for any key longer than ~150 bytes. |
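
Two rows above suggest concrete key conventions: a version prefix so serialization changes cycle cleanly, and hashing only when a key would exceed the 250-byte cap. The helper below is a sketch of one way to combine them; `cache_key`, the `v<N>:` prefix layout, and the `h:` marker for hashed keys are assumptions for illustration, not a Memcached convention. It also assumes the parts contain no whitespace or control characters.

```python
import hashlib

MAX_KEY_BYTES = 250  # key length cap cited in the table above


def cache_key(schema_version: int, *parts: str) -> str:
    """Build a versioned key; fall back to a digest only when too long."""
    raw = "v%d:%s" % (schema_version, ":".join(parts))
    if len(raw.encode("utf-8")) <= MAX_KEY_BYTES:
        return raw                      # readable, inspectable key
    # Too long: keep the version prefix readable, hash the remainder.
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return "v%d:h:%s" % (schema_version, digest)


short = cache_key(3, "user", "123")                     # "v3:user:123"
long_key = cache_key(3, "tenant-" + "x" * 300, "user", "123")
```

Bumping `schema_version` changes every key at once, so old serialized blobs are simply never read again and age out by eviction. Hashed keys still lose debuggability, as the table warns; logging the raw key next to the digest at write time is one way to recover some of it.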
Valkey In-Memory
Linux Foundation fork of Redis 7.2.4 (March 2024). BSD-3-Clause. v9.0 GA October 2025 with billion-RPS cluster claims.
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| BSD license under Linux Foundation | Vendor-neutral governance, OSI-approved permissive, no AGPL contagion | Less concentrated commercial development velocity than a single-vendor project | An enterprise feature you needed (advanced security, RBAC) lands in Redis Enterprise first | The governance argument is stronger than it looks. AWS, Google, Oracle, Ericsson, Huawei, Tencent contributing means feature flow is broad, not blocked on one company's roadmap. |
| Wire-protocol compatible with Redis 7.2 | Drop-in client replacement, all existing redis-cli, libraries, and Lua scripts work | Compatibility is a moving target; Redis 8 features (vector sets) are not in Valkey yet | Your team writes against Redis 8 vector sets, then a security policy forces Valkey migration | The fork date is locked at Redis 7.2.4. Anything in Redis 7.4+ commercial or Redis 8 is either ported (slowly) to Valkey or never lands. Pin your client library to RESP3 features, not Redis-specific commands. |
| Enhanced I/O threading (8.0) | Better multi-core utilization without breaking single-threaded execution model | Configuration complexity, IO threads count tuning under load | You set io-threads too high on a small instance, threads contend on shared queues, throughput drops below single-threaded baseline | Threading helps the I/O layer only. If your bottleneck is CPU on EXPIRE scans or large MGETs, threading does not help; you still need to shard. |
| Atomic slot migration (9.0) | Resharding moves entire slot atomically, simpler operations | Younger feature; Redis Cluster's slot-by-slot migration has more battle scars | You hit a corner case during resharding under heavy write load; less community precedent for the fix | 9.0 is October 2025. For mission-critical clusters, stay on 8.1 LTS until 9.x has 6+ months of production miles outside the Linux Foundation labs. |
| Module ecosystem (JSON, Bloom) | First-party modules contributed by AWS and Google, BSD-licensed | Smaller than Redis Stack; RediSearch, RedisGraph not directly available | You need full-text search; have to either port RediSearch (license complication) or run a separate search engine | Module gap closes monthly. If your need is JSON or basic indexing, Valkey is fine; if it is RediSearch or Redis Graph, you are picking between Redis 8 AGPL or a different tool entirely. |
| Adopted by managed services fast | ElastiCache, MemoryDB, Memorystore, Aiven all support Valkey by mid-2025 | Vendor implementations differ subtly (TLS modes, IAM auth, snapshot formats) | You migrate from ElastiCache Valkey to self-hosted Valkey; auth integration breaks, snapshot import fails | The cloud-vendor support is real but each one wraps Valkey differently. Treat managed Valkey as a different product from upstream Valkey for ops purposes. |
| Multiple databases in cluster mode (9.0) | Logical tenant separation without separate clusters, cheaper multi-tenancy | Multi-DB was historically a Redis foot-gun (clients confused, replication scope) | You enable multi-DB cluster mode for tenant isolation, a client library bug routes cross-tenant | Test the client library you use against this feature explicitly. Many clients made assumptions in single-DB cluster mode that break here. |
KeyDB In-Memory
Multithreaded Redis fork (2019), acquired by Snap May 2022. Runs critical Snap infra; no commercial product or paid support.
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Fully multithreaded command execution | Multi-core scale on a single node, often 2-4x throughput vs Redis at same hardware | Spinlock-based concurrency; tuning server-threads wrong is worse than single-threaded | You set threads to match cores on a noisy-neighbor VM; spinlocks burn CPU, throughput collapses | KeyDB explicitly requires exclusive use of assigned cores. On Kubernetes with CPU limits, you will not get the documented numbers. |
| Active-active replication | Multi-master, both nodes accept writes, lower failover RTO | Last-write-wins conflict resolution; no CRDTs, no causal ordering | Two writers in two regions update the same key concurrently; one of the writes is silently lost | Active-active is convenient until you have a write conflict. Most teams reach for it for cross-AZ failover but actually need a leader-follower with fast promotion. |
| Snap is the sole steward | Real production validation at Snap scale, code is open | Roadmap is driven by Snap's needs, not the broader community | You need a feature Snap does not; PR sits for months with no review | Snap was listed as a Valkey contributor when the fork launched, signaling strategic ambiguity. Risk: KeyDB receives only Snap-internal fixes going forward. |
| Flash storage tier (FLASH) | Spill less-hot data to NVMe, dataset sizes beyond RAM | Flash hits are 10-50x slower than RAM hits; tail latency degrades | P99 latency for cold reads jumps from 0.5ms to 20ms after spillover; downstream timeouts trigger | Flash tier is only useful if your access pattern is genuinely tiered hot vs cold. For uniform-access workloads, it just adds variance without saving cost. |
| Redis API compatibility | Drop-in for most existing Redis applications | Diverges from Redis 7+; vector sets, newer modules not available | You depend on a Redis 7.4 command, only to find KeyDB is locked closer to Redis 6.2 semantics | The compatibility window narrows each year. Use INFO to see actual Redis-compatibility version reported; do not trust the docs. |
| Subscriber-publisher fanout at scale | Snap battle-tested at extreme pub/sub fanout | Less community pub/sub tooling than Redis | You need to debug a slow subscriber under load; troubleshooting docs are sparser than Redis | The Snap engineering blog has solid pub/sub posts. For non-Snap pub/sub patterns, you are reading source code. |
| No commercial backing for support | Free, no license cost, no vendor lock-in | No SLA, no paid escalation, no enterprise patching pipeline | You hit a production data-loss bug on a Saturday; only recourse is GitHub issues | For mission-critical workloads, the lack of a paid escalation path is the most under-discussed risk. Enterprise procurement will surface this in security review. |
Amazon CloudFront Edge / CDN
~1600 edge locations, integrated with Lambda@Edge and CloudFront Functions. Origin Shield, real-time logs.
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Deep AWS integration | IAM-native, S3 OAC, Lambda@Edge, WAF, Shield, signed URLs, KMS | Vendor lock-in; moving off CloudFront means rebuilding edge logic | Multi-cloud mandate hits, CloudFront-only features (OAC, Lambda@Edge) have no obvious port | The integration is the moat. If your origin is S3 + ALB + Lambda + WAF + Shield, CloudFront is roughly free engineering. If your origin is on GCP, you are paying for an awkward seam. |
| Two edge runtimes (Functions vs Lambda@Edge) | CloudFront Functions for sub-ms header rewrites; Lambda@Edge for full Node/Python with AWS SDK | Two mental models, two deployment paths, two billing models | You start in Lambda@Edge for everything; bill hits $10K+/month for header rewrites that belong in Functions | The cost difference is real: Functions at $0.10 per million requests vs Lambda@Edge at $0.60 plus duration. For 1B requests/month, that is $100 vs $600+. Default to Functions; escalate to Lambda@Edge only on demonstrated need. |
| Origin Shield ($0.0060/GB) | Consolidates origin fetches to one regional cache, cuts origin load 50-90% | Extra hop in the path, extra per-GB charge, single-region failure exposure | Origin Shield region degrades; your traffic that should hit the edge instead routes through a failing intermediary | Origin Shield is right when your origin is expensive (Lambda, complex queries) and wrong when your origin is itself a CDN-ish thing (S3 with high cache hit). Calculate before enabling. |
| Tiered pricing by region | Lower cost in NA/EU, predictable per-GB | Indian and South American regions are 2-3x more expensive; bill becomes geography-dependent | You launch in APAC, your egress cost line per user is 3x what you modeled on NA-only test traffic | The "10 regions, 10 prices" model means cost projection requires real geo-distribution data. Get a 30-day sample of egress by region before committing. |
| Real-time logs to Kinesis | Sub-minute log latency, custom field selection, replay to Athena/Redshift | Extra Kinesis costs, 1% sampling minimum, more pipelines to operate | You enable 100% sampling on a high-traffic distribution; Kinesis cost spikes, downstream consumer falls behind | Real-time logs at 1-5% sampling for security analytics is the right default. 100% is for short debugging windows, not steady state. |
| Functions Key-Value Store | Stateful edge logic (A/B tests, redirects) without origin round-trip | Eventually-consistent global propagation; small per-distribution limits | Marketing flips a feature flag; some users see it 30+ seconds before others, browser refresh shows inconsistency | KVS is roughly Cloudflare KV with worse limits. Useful for low-cardinality config, not for high-write workloads. |
| Anycast routing on AWS network | Same TLS termination as ALB, predictable AWS-internal performance | Less aggressive last-mile peering than Cloudflare or Fastly in some geos | India users get worse RTT than your test from us-east-1 suggested; CloudFront cannot fix bad transit | CloudFront's POP density is high but its peering is AWS-network-first. For markets with patchy AWS peering, Cloudflare and Fastly often win by 30-100ms. |
| Streaming and live video | MediaPackage, MediaTailor, native HLS/DASH support, low-latency HLS | Reinforces AWS lock-in for the entire media pipeline | Cost optimization moves you to a specialist CDN (BlazingCDN, Bunny); you have to rebuild signing and DRM | For pure media delivery at scale, hyperscaler CDNs are rarely the cost winner. CloudFront wins only when integration with the rest of your AWS media stack offsets the premium. |
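
The Functions versus Lambda@Edge comparison in the table is simple arithmetic, sketched here so it can be rerun with other volumes. The per-million prices are the ones quoted in the row above; Lambda@Edge duration charges are deliberately left out, so the second figure is a lower bound. `monthly_request_cost` is a hypothetical helper name.

```python
FUNCTIONS_PER_MILLION = 0.10     # USD per million requests, from the table
LAMBDA_EDGE_PER_MILLION = 0.60   # USD per million requests, excluding duration


def monthly_request_cost(requests: int, price_per_million: float) -> float:
    """Request charge only; duration and data transfer are not modeled."""
    return requests / 1_000_000 * price_per_million


requests = 1_000_000_000  # 1B requests per month
functions_cost = monthly_request_cost(requests, FUNCTIONS_PER_MILLION)
lambda_edge_cost = monthly_request_cost(requests, LAMBDA_EDGE_PER_MILLION)
print(round(functions_cost, 2), round(lambda_edge_cost, 2))
```

At 1B requests this reproduces the $100 versus $600 figures in the table, a 6x gap before duration is added.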
Cloudflare Edge / CDN
335 cities, 125+ countries. Workers (V8 isolates), KV, R2, D1, Durable Objects, Pages. Reported as fastest in ~48% of top eyeball networks.
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| V8 isolates instead of containers | Near-zero cold start (5ms), runs in every POP, cheaper per request | JavaScript/TypeScript primary path; Wasm support exists but most idioms assume JS | You want native binary perf for image transforms; Wasm works but Lambda@Edge plus Node is sometimes simpler | The isolate model is genuinely a step ahead of container-based edge (Lambda@Edge). Cold-start cost is structural, not just optimization. |
| Unmetered bandwidth tier | Cost-predictable for static content, free CDN basics, "fair use" | Section 2.8 ToS lets Cloudflare bump high-egress-non-website traffic to enterprise | Your video streaming app grows past "website-like" usage profile; sales calls you for an enterprise contract | The "unmetered" claim is real until your traffic looks like a media CDN. Read 2.8 before standardizing. |
| Single-vendor blast radius | One config plane, one bill, one support call | Their bad day is your bad day, globally and uncorrelated with origin | 2025-11-18: a database permission change doubled a Bot Management config file, crashed proxy processes globally for hours | This is the single largest production concern with Cloudflare. The 2025 incident sequence (Feb R2, June KV, August DDoS, November Bot Management) confirms blast radius is a recurring failure mode. Plan a CDN failover. |
| Workers KV (eventually consistent) | Global key-value store, sub-5ms hot reads, persists across POPs | 1 write per second per key cap; eventual consistency on writes (up to 60s) | You use KV for rate-limiting counters; writes get throttled, limits become advisory not enforced | KV is for read-heavy reference data, full stop. For counters use Durable Objects; for transactional state use D1 or your origin DB. |
| R2 (S3-compatible, zero egress) | Free egress to internet, killer pricing vs S3 for high-egress workloads | S3 API compatibility is partial (some APIs, ACL semantics, multipart edge cases differ) | You lift-and-shift from S3; a niche API call (S3 Select, Object Lambda) silently no-ops | R2 is genuinely disruptive for egress-heavy use cases (media, model weights, software downloads). For S3-as-feature-platform, the gap matters more than the price. |
| Workers ecosystem (D1, Queues, Durable Objects) | Build full apps at edge without a separate origin | Each product has its own limits, billing, and maturity curve | You build an app on Workers + D1 + Durable Objects; D1 hits its alpha-stage size limit, you replatform | The platform is real but the surface area changed faster than most teams' code refactoring budget allows. Treat as Cloudflare-native, not portable. |
| Cache API (in-POP, ephemeral) | Local POP cache for any Worker response, separate from KV | Each POP has independent cache; no global purge for Cache API entries by default | You cache personalized API responses by user-id; cache lives only in one POP, hit rate is much lower than expected | The two-tier story (Cache API ephemeral, KV persistent) is powerful but the mental model is exactly inverted from CloudFront. Document the convention in your runbook. |
| Free DDoS protection | Layer 3/4/7 mitigation included on every plan, absorbed 3.8 Tbps in 2024 | Mitigation is opinionated; some "legitimate but unusual" patterns get rate-limited | Your monitoring scraper, load-test runner, or AI training crawler gets challenged or blocked | The defaults are right for 95% of sites. For API-only or B2B integrations, tune Bot Management rules; do not just trust the defaults. |
Fastly Edge / CDN
Heavily customized Varnish 2.1 fork at core. ~150 Tbps capacity, fewer but bigger POPs. VCL programmable, Compute@Edge (Wasm/Lucet).
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| VCL programmability | Full caching logic as code; complex routing, A/B, request enrichment at edge | VCL is a niche DSL; team has to learn a non-portable language | Senior engineer who owns VCL leaves; nobody can debug a misbehaving subroutine | VCL is genuinely the most powerful per-request hook in any CDN. Worth the learning cost if you do real edge logic; overkill for static-only delivery. |
| Sub-150ms global instant purge | Invalidate cache by URL or surrogate key in milliseconds worldwide | Premium price tier; instant purge is part of why Fastly costs more per GB | You move from Fastly to a cheaper CDN; suddenly your inventory update lag goes from 1s to 60s, business breaks | Instant purge with surrogate keys is the killer feature for high-write workloads (news, sports scores, inventory). For static long-TTL content, you are overpaying. |
| Compute@Edge on Wasm (Lucet) | Multi-language at edge (Rust, JS, Go, Wasm), strong isolation, fast cold start | Smaller community than Cloudflare Workers; more "you are early" feel | You hit a Wasm runtime quirk; community answer is "open a support ticket" | Compute@Edge is technically excellent. The reason Cloudflare Workers feels bigger is community size, not technical superiority. |
| Fewer, larger POPs | Higher cache hit rates per POP, lower origin load, simpler debugging | Geographic coverage thinner than Cloudflare in long-tail markets | You launch in Africa or smaller APAC markets; user latency is higher than expected vs Cloudflare | The POP-density argument is more nuanced than it looks. Big POPs with good peering often outperform many small POPs with mediocre peering for top-100 cities. |
| 2021 outage legacy | Postmortem culture forced real investment in safer config rollout | Reputation cost: that outage took down Reddit, NYT, Amazon for ~1 hour | Procurement security review surfaces the 2021 incident; you have to explain mitigation in writing | The 2021 outage was a single-customer config error that propagated. Post-incident, Fastly added staged config rollout. Cloudflare's 2025-11-18 outage was structurally similar; this is a class of failure, not vendor-specific. |
| Real-time analytics and logs | Per-second stats, syslog streaming to S3/Logentries, JSON feeds | Logs and analytics priced separately; cost can sneak up | You enable full request-level logging; monthly bill grows faster than traffic | Fastly's logging is the best in class for debuggability. Use it; just sample once you are past the initial debugging phase. |
| Origin Shielding | Consolidate origin fetches to one shield POP, dramatically reduce origin RPS | Adds a hop, single POP failure exposure for the shield path | Shield POP has a bad day; origin sees its full pre-CDN traffic for the duration | Always pair Origin Shielding with multi-POP shield fallback if you cannot tolerate origin spikes. Often missed in initial configurations. |
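
Surrogate-key purge, named above as the killer feature for high-write workloads, amounts to keeping a reverse index from tag to cached URLs so that one purge call invalidates every response built from the changed record. The class below is a toy in-memory model of that idea only; `TaggedCache` and its methods are hypothetical and do not reflect Fastly's API or internals.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy response cache with purge-by-tag (surrogate-key style)."""

    def __init__(self) -> None:
        self._bodies: Dict[str, str] = {}
        self._tags_by_url: Dict[str, Set[str]] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: str, tags: Iterable[str]) -> None:
        self.purge_url(url)                       # drop old tag links first
        self._bodies[url] = body
        self._tags_by_url[url] = set(tags)
        for tag in self._tags_by_url[url]:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[str]:
        return self._bodies.get(url)

    def purge_url(self, url: str) -> None:
        self._bodies.pop(url, None)
        for tag in self._tags_by_url.pop(url, set()):
            self._urls_by_tag[tag].discard(url)

    def purge_tag(self, tag: str) -> int:
        """Invalidate every URL carrying this tag; return how many were purged."""
        urls = list(self._urls_by_tag.get(tag, ()))
        for url in urls:
            self.purge_url(url)
        return len(urls)


cache = TaggedCache()
cache.put("/product/7", "<html>7</html>", ["product-7", "category-shoes"])
cache.put("/category/shoes", "<html>list</html>", ["category-shoes"])
purged = cache.purge_tag("product-7")   # inventory change touches one product
```

Purging `product-7` removes only the product page, while purging `category-shoes` would remove both. That granularity is what lets short update lag coexist with long TTLs.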
Varnish Cache Edge / Self-Hosted
Open-source HTTP accelerator. v7.7 current (BSD-2-Clause). VCL-driven, reverse proxy cache, designed for HTTP. Varnish Software offers Varnish Enterprise (proprietary).
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| VCL on your own hardware | Total control over cache policy, can run in any data center or cloud | You operate the cache fleet: capacity, patching, monitoring, on-call | Cache fleet patching slips, a known CVE is unmitigated, security review fails | Self-hosting Varnish is right when latency or compliance forbids a SaaS edge. Otherwise you are reinventing what Fastly already runs at higher quality. |
| Built for HTTP, only HTTP | Best-in-class HTTP cache semantics, ESI, range requests, conditional GETs | TLS termination is via Hitch or another sidecar; not native in open-source Varnish | You try to use Varnish for TLS termination; discover you need to operate Hitch alongside, doubles ops surface | The "no TLS" stance is deliberate (HTTP is hard enough). In practice every production deploy has a TLS sidecar; budget for it. |
| VMOD extensibility | C-level extensions for custom logic; rich ecosystem of community modules | Custom VMODs are C code in your cache process; bugs crash Varnish | A bad VMOD pointer dereference takes down your cache fleet during peak; rollback is not automatic | Treat custom VMODs like kernel modules. Code review, fuzz test, canary deploy. Most outages "in Varnish" are actually in custom VMODs. |
| Single-machine ops model | No clustering, no consensus, no shared state to corrupt | High availability is on you (LB in front, sticky sessions, hot standby) | One Varnish node dies; LB does not detect fast enough; user gets origin direct for 30 seconds, origin falls over | The "no cluster" simplicity is genuinely an asset. Pair with an L4 LB that does proper health checks (not just TCP-up). |
| Grace, keep, and stale serving | Serve stale during origin outage, separate background fetch from delivery | VCL complexity grows quickly; grace logic is the second-biggest source of "why is this stale" tickets | Grace period is 5 minutes, you fix a bug at origin, users still see the bug for 5 minutes after deploy | Grace plus stale-while-revalidate is Varnish's killer feature for origin resilience. Use surrogate keys to purge by tag rather than relying on grace alone. |
| No native cluster purge | Each node is independent, predictable | Cluster-wide purge requires Varnish Broadcaster or similar add-on | You PURGE on one node; other 9 nodes still serve the stale; users see inconsistency | Production deploys always need Varnish Broadcaster, custom HTTP-fanout, or Varnish Enterprise. Plan it in from day one. |
| Open source (BSD-2-Clause) | No license cost, total transparency, large community, debuggable | No SLA, no commercial escalation; pay Varnish Software for enterprise support | Critical bug in 7.7, fix lands in 7.8; you have to upgrade an entire fleet to take it | Most production Varnish fleets pay Varnish Software for either Enterprise or support. Pure community Varnish works but is for teams that genuinely own this code path. |
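
The grace row describes three states for a cached object: fresh, stale but still servable while a background fetch runs, and too old to serve. The function below is a simplified sketch of that decision, not Varnish's actual implementation or VCL; `Decision`, `decide`, and the parameter names are illustrative, and `keep` (used for conditional revalidation) is not modeled.

```python
from enum import Enum


class Decision(Enum):
    HIT = "serve fresh object"
    STALE_REFRESH = "serve stale object, refresh in background"
    FETCH = "fetch from origin before responding"


def decide(age_s: float, ttl_s: float, grace_s: float) -> Decision:
    """Classify a cached object by its age relative to TTL and grace."""
    if age_s < ttl_s:
        return Decision.HIT
    if age_s < ttl_s + grace_s:
        return Decision.STALE_REFRESH   # the client does not wait on origin
    return Decision.FETCH


# TTL of 60s with a 5-minute grace window, as in the table's example.
states = [decide(age, ttl_s=60, grace_s=300) for age in (30, 200, 400)]
```

The sketch shows why grace alone is a blunt tool after an origin fix: any object younger than TTL plus grace can still be served without consulting origin, which is the case the table suggests handling with an explicit purge.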
NGINX Reverse Proxy / Cache
F5-owned (since 2019), BSD-2-Clause open source. Web server, reverse proxy, load balancer, mail proxy, HTTP cache. v1.30 stable (April 2026). NGINX Plus is the commercial tier.
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Jack-of-all-trades | One binary does TLS term, LB, cache, reverse proxy, web server | Specialized features (instant cluster purge, VCL-grade programmability) are weaker | You need surrogate-key purge across 50 NGINX boxes; build it yourself with consul-template plus HTTP fanout | NGINX is the right answer when caching is one of many jobs the box does. It is the wrong answer when caching is the only job and you need top-end semantics. |
| proxy_cache on local disk | Disk-backed cache survives restart, persists past RAM size | Cache directory format is internal, no easy cross-node sharing, no in-memory cache for hot keys | SSD wears out faster than expected under high write rate; replacing it requires fleet rotation | NGINX cache is functional, not specialized. For high-write-rate caches, mount on fast NVMe and tune proxy_cache_lock. |
| Configuration via directives | Easy to learn, plain text, version controlled, predictable | Complex logic forces nested if/map/regex; no real programmability without Lua (OpenResty) | You need conditional caching on a JWT claim; pure nginx.conf does not handle it, you reach for Lua, now you have two languages | If you find yourself writing complex maps and embedded Lua, you have already outgrown plain NGINX. Either move logic upstream or switch to Varnish/Fastly. |
| F5 ownership (post-2019) | Enterprise support pipeline, sustained investment in NGINX Plus features | F5 prioritizes Plus features; open-source NGINX advances more conservatively | A feature you assumed was open-source (active health checks, dynamic config) turns out to be Plus-only | The Plus vs OSS gap is real and grew under F5. For production caching at any scale, NGINX Plus or OpenResty is usually the right pick. |
| Massive ecosystem and tooling | Largest install base of any web server (35%+ market share) | Stack Overflow answers often outdated or wrong; many "best practice" blog posts ignore modern features | You copy a config from a 2018 blog; it has a CVE-prone TLS cipher list, security scan fails | Always start from the F5 docs (docs.nginx.com), not Google. The ecosystem is huge but quality varies. |
| Multiple roles share one process | Lower memory and connection overhead than running separate boxes for LB/cache/TLS | Cache I/O contends with TLS handshake on the same worker; tuning is harder | Heavy cache write load slows TLS handshakes; users perceive slow first-byte time, not a cache problem | Worker-process tuning matters more than the docs suggest. worker_processes auto is rarely the right answer at scale; tune to physical cores and pin if needed. |
| No native distributed cache | Per-node state is simple, no consensus, no replication bugs | Cluster-wide cache invalidation has to be built externally | You deploy a config change; cache TTL is still 10 minutes; users see stale content while you wait | Pair NGINX with a separate cache purge fanout (RabbitMQ, NATS, HTTP broadcast). Production deploys almost always do; budget for it. |
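The HTTP broadcast option in the last row can be sketched in a few lines. This is a minimal illustration, assuming each cache node has been configured to accept an HTTP `PURGE` request for a path (stock open-source NGINX needs extra configuration or a module for that); the node names and the `purge_everywhere` helper are hypothetical.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_purge_requests(nodes, path):
    """Return one PURGE request per cache node for the same path."""
    return [
        urllib.request.Request(f"http://{node}{path}", method="PURGE")
        for node in nodes
    ]


def send(request, timeout=2.0):
    """Send one purge; return (url, HTTP status or an error string)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return request.full_url, resp.status
    except (urllib.error.URLError, OSError) as exc:
        return request.full_url, f"failed: {exc}"


def purge_everywhere(nodes, path, sender=send):
    """Fan the purge out in parallel and collect per-node results."""
    requests = build_purge_requests(nodes, path)
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        return dict(pool.map(sender, requests))
```

A node that fails its purge keeps serving the old object until its TTL expires, so the per-node results need a retry or alerting path. That bookkeeping is most of the real cost of building this yourself.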
Use Cases
Real production scenarios per technology, with the driving property that picked it over alternatives.
Redis
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Session store | Twitter/X session validation | Sub-ms lookup of session token to user context | 500M+ DAU, millions of session checks per second | Memcached lacks rich types for storing user-context blob; DB lookup is 10ms+ minimum |
| Leaderboards | Stack Overflow reputation, game scoreboards | Sorted sets with O(log N) ZADD and O(log N + M) ZRANGE | 10M+ scored entries, real-time updates | SQL ORDER BY plus LIMIT does not scale to millions of writes/sec; no other cache has native sorted set |
| Rate limiting | API gateways, login attempts, abuse mitigation | Atomic INCR with EXPIRE plus Lua for sliding window | 100K+ rate-checks/sec per region | DB-based rate limit is too slow; Memcached lacks atomic compound operations |
| Pub/Sub fanout | Internal notifications, chat presence | Sub-ms publish to N subscribers, no broker setup | 10K+ concurrent subscribers per channel | Kafka is overkill for ephemeral fanout; RabbitMQ adds operational weight |
| Vector search (Redis 8) | RAG prototypes, semantic cache | HNSW index in RAM, sub-ms approximate kNN | 1-10M vectors, 768-1536 dim | pgvector is fine but adds a query latency dependency on Postgres availability; dedicated vector DBs are heavier to operate |
| Distributed locks (Redlock) | Distributed cron, deduplication, election | SET NX EX as a one-line lock primitive | 10K+ lock acquisitions/sec | ZooKeeper is heavier; etcd is fine but adds a separate dependency |
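The rate-limiting row relies on `INCR` plus `EXPIRE`. A minimal sketch of the simpler fixed-window variant follows (not the Lua sliding window the table mentions). It assumes a client object exposing `incr(key)` and `expire(key, seconds)`, as redis-py does; `FakeRedis`, `allow_request`, and the key layout are hypothetical and exist only so the example runs without a server.

```python
import time


class FakeRedis:
    """In-memory stand-in exposing only incr() and expire(); illustration only."""

    def __init__(self):
        self.counts = {}
        self.ttls = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds
        return True


def allow_request(client, caller_id, limit, window_seconds, now=None):
    """Fixed-window limiter: one counter per caller per time window."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"ratelimit:{caller_id}:{window}"
    count = client.incr(key)  # atomic on a real server
    if count == 1:
        # First hit in this window: let the counter clean itself up.
        client.expire(key, window_seconds)
    return count <= limit


client = FakeRedis()
decisions = [allow_request(client, "user-1", 3, 60, now=120.0) for _ in range(5)]
print(decisions)
```

Because the window number is part of the key, a counter that misses its `EXPIRE` (for example, the caller crashes between the two commands) leaks memory but does not block the caller forever. A fixed window can admit up to twice the limit across a window boundary, which is the reason to move to a sliding window.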
Memcached
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Hyperscale object cache | Meta (Facebook) serves roughly 5B GETs/sec from Memcached at peak | Multithreaded raw GET/SET throughput on huge nodes | Trillions of cached objects, exabytes of in-memory data | Redis single-thread cap forces 10x more shards; pure GET/SET does not need Redis types |
| Database query cache | WordPress, Drupal at scale; MediaWiki | One-line "cache this query result" pattern, dirt-simple invalidation | Wikipedia-class read traffic, multi-TB cached | Redis adds operational weight unjustified by use case; you do not need persistence for a query cache |
| Rendered HTML / page fragment cache | Pinterest, Etsy product page fragments | 1KB-100KB blob storage, immune to fragmentation | 10K+ requests/sec/node, P99 under 1ms | Redis equivalent works but is slower per core; HTTP cache (Varnish) adds invalidation complexity |
| Hot dataset acceleration in front of slow stores | Hadoop/HBase fronted by Memcached for read-heavy workloads | Pure GET path is dominant, no need for compound ops | Petabyte HBase, hot working set in 1-10TB Memcached fleet | Redis cluster cost-prohibitive at this scale; HBase block cache alone insufficient |
| Multi-tenant SaaS cache layer | Heroku-style platforms exposing cache as a service | Stateless nodes, trivial horizontal scale, no replication concerns | 10K tenants per cluster | Redis multi-tenancy is harder; no Pub/Sub or scripting noise across tenants |
Valkey
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| License-clean Redis migration | AWS migrating ElastiCache Redis OSS customers to Valkey by default in 2025 | BSD license satisfies enterprise IP review | Hundreds of thousands of managed customer fleets | Redis 8 AGPLv3 is unviable for many; Memcached lacks rich types |
| Cloud-vendor neutral cache | Multi-cloud SaaS (e.g., Aiven) offering Valkey across AWS/GCP/Azure | Single OSS SKU runs identically across hyperscalers | Thousands of provisioned clusters | Redis OSS license depends on managed-service-prohibition clause; legally awkward across clouds |
| Linux distro default in-memory KV | Debian/Ubuntu shipping Valkey as the redis-server replacement | OSI-approved license required by Debian policy | Tens of millions of Linux installs | Redis SSPL is incompatible with Debian's Free Software Guidelines |
| Billion-RPS cluster (Valkey 9.0) | High-throughput SaaS infrastructure at scale | Atomic slot migration, multi-DB cluster mode, advanced threading | 1B+ requests/sec at cluster scale | Memcached lacks rich types and cluster mode; Redis Cluster has older slot-migration semantics |
| RDMA-accelerated low-latency workload (experimental) | HPC and AI/ML workloads needing kernel-bypass networking | Sub-100µs P99 over RDMA fabric | Microsecond-class P99 requirements | Redis OSS lacks RDMA support; Memcached has no roadmap for it |
KeyDB
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Snap's internal caching infrastructure | Snapchat backend | Multi-threaded throughput on big nodes, in-house steward | Snap-scale daily traffic, billions of requests | Internal investment in KeyDB makes it cheaper than re-evaluating Redis or Valkey |
| Redis migration for multi-core utilization | Teams hitting single-threaded Redis ceiling but unwilling to shard | Drop-in Redis API plus 2-4x throughput per node | 50K-200K ops/sec single-node ceiling raised to 200K-500K | Sharding adds app-side complexity; Memcached lacks types |
| Active-active geo-replication | Multi-region SaaS wanting fast cross-region failover | Both regions accept writes; lower RTO than leader-follower promotion | 2-5 regions, last-write-wins acceptable | Redis has no native active-active in OSS; Redis Enterprise CRDB is commercial |
| Flash-tier cache for cold data | Datasets larger than RAM, hot-cold ratio favorable | RAM-priced perf for hot keys, NVMe-priced bulk for cold | 10TB+ working set on 1TB RAM nodes | Redis OSS has no integrated flash tier; commercial Redis on Flash is the only Redis option |
| Cost-optimized Redis at small scale | Startups maximizing single-node performance before sharding | Free, multi-threaded, no per-instance licensing | 1-3 large nodes, <1M ops/sec total | Managed Redis costs add up; Valkey works but is newer than KeyDB at the time of adoption |
Amazon CloudFront
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| S3-fronted static asset delivery | Prime Video, Disney+, every AWS-hosted site | S3 OAC plus signed URLs plus regional edge caches, all IAM-native | PB-scale storage, global CDN egress | Cloudflare/Fastly need separate S3 auth flow; integration friction |
| HLS/DASH video streaming | Prime Video, FuboTV, Twitch event simulcasts | MediaPackage + CloudFront low-latency HLS chain | Millions of concurrent viewers, sub-3s glass-to-glass | Specialist CDNs are cheaper per GB but require rebuilding the entire media pipeline |
| API acceleration with WAF and Shield | Banking and fintech APIs (Capital One, Robinhood) | WAF rules, Shield Advanced DDoS, KMS-encrypted log delivery, all integrated | 100K+ requests/sec API with strict compliance | Cloudflare WAF is good but requires duplicating security policy outside the AWS account boundary |
| Edge personalization via CloudFront Functions | E-commerce A/B variant routing, geo-redirects, header normalization | Sub-ms execution, $0.10/M requests, runs at every edge | 1B+ requests/month at minimal added cost | Lambda@Edge for the same work costs 6x more; full Workers stack requires platform change |
| Origin Shield in front of regional ALB | Multi-region ALB consolidating to single shield POP | Cuts origin RPS by 80%+ for cacheable workloads | 10K+ origin RPS reduced to under 2K | Cloudflare Tiered Caching achieves similar; not available in non-Cloudflare stacks |
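The cost claim in the edge personalization row is easy to check by hand. The sketch below uses only the figures from the table ($0.10 per million requests and the 6x ratio) and counts request fees only; duration billing and data transfer are left out, and the function name is hypothetical.

```python
def monthly_request_cost(requests_per_month, price_per_million):
    """Request-only cost in dollars; ignores duration and data transfer."""
    return requests_per_month / 1_000_000 * price_per_million


FUNCTIONS_PRICE = 0.10                   # $/M requests, from the table above
LAMBDA_EDGE_PRICE = FUNCTIONS_PRICE * 6  # the table's 6x ratio, request fee only

requests = 1_000_000_000
functions_cost = monthly_request_cost(requests, FUNCTIONS_PRICE)
lambda_cost = monthly_request_cost(requests, LAMBDA_EDGE_PRICE)
print(f"CloudFront Functions: ${functions_cost:,.2f}")
print(f"Lambda@Edge (requests only): ${lambda_cost:,.2f}")
```

At 1B requests per month that is about $100 versus about $600 before duration charges, so the gap only starts to matter once request volume is well into the billions.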
Cloudflare
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Free-tier CDN for personal/SMB sites | Millions of small sites, indie hackers, blogs | Unmetered bandwidth, free TLS, free DDoS | Cloudflare claims to serve a large share of top eyeball networks fastest | CloudFront has no free tier; Fastly has no real SMB plan |
| Workers-native edge apps | Discord (parts), Shopify (parts), AI inference proxies | V8 isolates, near-zero cold start, code runs in every POP | 10B+ requests/day, multi-region | Lambda@Edge cold start is 10-100x worse; AWS region-bound vs Cloudflare's per-POP execution |
| R2 zero-egress object storage | Hugging Face model weights, large software downloads, video archives | S3 API plus zero egress cost to the internet | PB-scale, high-egress, predictable bills | S3 egress costs roughly $0.05-$0.09/GB to internet; R2 is genuinely free egress |
| DDoS mitigation as front door | Sites that have been attacked, target-rich verticals (crypto, gambling) | Layer 3/4/7 mitigation; absorbed a 3.8 Tbps attack in 2024 | Multi-Tbps attack absorption | AWS Shield Advanced is comparable but more expensive per protected resource |
| Global Workers KV for feature flags / configs | Mobile app config delivery, feature flag systems | Sub-5ms hot reads globally, eventual consistency acceptable | 1M+ reads/sec global; writes infrequent | S3 + CloudFront is slower for writes; LaunchDarkly etc. add latency and cost |
Fastly
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Real-time news, sports, finance | The New York Times, NPR, Reddit, Vimeo, Shopify storefronts | Sub-150ms instant global purge by surrogate key | Millions of pages, high write rate, must update fast | CloudFront purge takes minutes; Cloudflare cache purge is best-effort |
| E-commerce stock and pricing accuracy | Shopify Plus stores, GitHub Marketplace, ticketing sites | Cache hit for the hot view, purge in ms when inventory changes | 10K+ purges/sec sustained | CDNs without instant purge force shorter TTL, which kills hit rate |
| API caching with VCL | GitHub API in front of Rails, fintech APIs with per-tenant rules | VCL allows complex per-request caching logic at edge | Billions of API requests/day | Cloudflare Workers can express it but VCL is closer to the cache, fewer abstraction layers |
| Compute@Edge for personalization | News personalization, dynamic header injection, request enrichment | Wasm runtime, multi-language, near-zero cold start | 10M+ personalized responses/day at edge | Lambda@Edge has higher cold start; CloudFront Functions too limited for complex logic |
| Image and video on-the-fly transformation | Spotify cover art delivery, news photo pipelines | Edge image optimization plus VCL routing to origin variants | PB-scale image bandwidth, sub-second purge | CloudFront image transforms are bolted on; lower control over caching keys |
Varnish Cache
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| On-prem reverse proxy cache | Banks and healthcare providers with PII not allowed off-network | VCL programmability without sending traffic to a SaaS CDN | 10-100K req/sec on-prem, regulated industry | SaaS CDNs (Cloudflare, Fastly) need data egress; compliance does not allow |
| Origin shield for SaaS CDN | Sites running Varnish in their own data centers in front of Fastly or Cloudflare | Extra layer of cache outside SaaS CDN, customizable VCL | Multi-tier cache, additional purge controls | SaaS-only deployment leaves no programmable cache layer under your control |
| News and media internal CDN | Major publishers (e.g., Wikimedia historically) running Varnish at edge | ESI (Edge Side Includes), grace, surrogate keys all out of the box | Wikimedia served much of Wikipedia from Varnish layers historically | Building this on NGINX requires Lua or external coordination; building on Apache is a non-starter |
| API gateway-style caching | Internal APIs cached in front of slow services | Conditional GET, ETag, full HTTP semantics, programmable in VCL | 100K+ API RPS, low origin RPS | NGINX proxy_cache is adequate but lacks ESI and grace semantics |
| Cache layer for legacy app modernization | Strangler pattern: Varnish in front of legacy monolith | VCL routes some paths to new microservices, others to legacy | Migration period: months to years | NGINX can route but VCL's caching semantics during routing are more powerful |
NGINX
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Reverse proxy with opportunistic cache | Most production microservice stacks, Kubernetes Ingress (Ingress NGINX) | Single binary handles TLS, LB, cache, all of it | 10K+ RPS per node, multi-service routing | Varnish does not do TLS natively; HAProxy lacks caching; running both is more ops surface |
| API gateway entry point | Banking APIs, Kong/Tyk built on top of NGINX/OpenResty | Programmable routing plus rate-limit plus auth plus cache, all in one | 10K+ tenant APIs behind one NGINX fleet | Pure CDN does not handle internal API gateway needs; dedicated API gateways are more expensive |
| Cache in front of slow CMS | WordPress sites using NGINX fastcgi_cache | Disk-backed cache reduces PHP-FPM load by 95%+ | 1M+ pageviews/day on $20/mo VPS | Varnish works but adds an operational layer for a workload NGINX already handles |
| TLS termination plus cache for static | Single-tenant SaaS, internal portals, dev environments | One binary, configured via plain text, predictable behavior | 1-100 RPS per box, dozens of boxes | Cloudflare for free tier works but adds external dependency for internal use |
| Edge of a private CDN | Self-built CDN using NGINX boxes per geographic POP | Full control over routing, caching, peering decisions | 10s of POPs, GB/sec aggregate | SaaS CDN cheaper unless geo-presence requires it; Varnish lacks TLS without sidecar |
Limitations
Per-technology limitation tables with severity badges and workaround cost.
Redis
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| Single-threaded command execution | High | Shard across cluster mode, or use Valkey/KeyDB | App-side cluster-awareness, possible Lua rewrites, hash-tag changes |
| AGPLv3 / RSALv2 / SSPLv1 license | High | Pay Redis Enterprise, or migrate to Valkey | Either ongoing license fee or migration project (weeks for ops, more for surface-area dependencies) |
| RAM is the ceiling | High | Redis on Flash (commercial), or aggressive eviction policies | Commercial license; or hit rate degradation under allkeys-lru |
| BGSAVE forks the process | Medium | Disable RDB, rely on AOF only; or take snapshots on replica | Replica overhead; AOF-only has different recovery semantics |
| Cluster multi-key ops restricted | Medium | Hash tags ({tag}key1, {tag}key2) to colocate keys | Schema design constraints; risk of hot slot if tag is too narrow |
| Lua scripts are blocking | Medium | Keep scripts short; use Functions (Redis 7+) for organized libraries | Forced discipline; a long script stalls every other client on that node |
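The hash-tag workaround is easier to reason about once the slot calculation is visible. The sketch below follows the Redis Cluster rule (CRC16/XMODEM of the key, or of the first non-empty `{...}` section, modulo 16384); the helper names are hypothetical, and Python's `binascii.crc_hqx` is used because it implements the same CRC polynomial.

```python
import binascii


def hash_tag_part(key: bytes) -> bytes:
    """Return the bytes that get hashed, honouring {...} hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag counts
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots."""
    return binascii.crc_hqx(hash_tag_part(key.encode()), 0) % 16384


# Keys sharing a tag land in the same slot, so multi-key ops stay legal.
print(key_slot("user:{42}:profile") == key_slot("user:{42}:sessions"))
```

The same calculation explains the hot-slot risk in the table: every key carrying the same tag lands in one slot, and therefore on one node.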
Memcached
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| No persistence | High | Treat cache loss as a planned event; do rolling restarts | Application-level warmup logic; coordinated deploys |
| No replication | High | Use consistent hashing on the client; tolerate partition loss on node failure | Database needs to be able to serve full miss rate during failure |
| 1MB default value limit | Medium | Chunk large values client-side; or raise -I to 128MB max | Client complexity; slab allocator inefficiency at large values |
| String values only | Medium | Serialize with MessagePack/Protobuf client-side | App-side serialization complexity; schema migration is harder |
| 250-byte key limit | Medium | Hash long keys (SHA256) on the client | Lose key debuggability; cannot use SCAN-like introspection |
| No atomic compound ops | Medium | Use CAS for single-key updates; design around per-key atomicity | App-side retry loops; cannot do multi-key transactions |
| Slab calcification | Medium | Enable slab_reassign and slab_automove | Brief throughput dip during reassignment; needs tuning |
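The "consistent hashing on the client" workaround is what keeps a node failure from reshuffling the whole keyspace. A minimal ring with virtual nodes is sketched below; the class name, the node names, and the choice of SHA-256 and 100 virtual nodes are assumptions for illustration, not what any particular Memcached client does.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative, not a client library)."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first virtual node at or after the key's hash."""
        index = bisect.bisect_left(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


nodes = ["mc-a", "mc-b", "mc-c", "mc-d"]
before = HashRing(nodes)
after = HashRing([n for n in nodes if n != "mc-c"])
keys = [f"user:{i}" for i in range(2000)]
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys remapped after losing mc-c")
```

Only keys that lived on the failed node move, which is why the database has to absorb roughly 1/N of the miss traffic rather than all of it.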
Valkey
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| Forked from Redis 7.2.4; missing Redis 8 features | Medium | Wait for community port, or accept feature gap | Lose vector sets, newer modules; community port arrives months later |
| Smaller ecosystem of third-party tools | Medium | Use Redis-compatible tools (RedisInsight, etc.) | Some Redis-specific tooling assumes commercial features; minor compat seams |
| Multi-DB cluster mode (9.0) is new | Medium | Stay on single-DB cluster mode until 9.x has miles | Lose the multi-tenancy benefit; harder logical separation |
| Module compatibility incomplete | Medium | Stick to first-party modules (JSON, Bloom) plus core types | RediSearch, RedisTimeSeries equivalents need separate evaluation |
| RDMA support is experimental | Medium | Use TCP only for production until RDMA hardens | Cannot exploit kernel-bypass perf yet |
| Same single-threaded core as Redis | High | Cluster mode for horizontal scale | Same cluster-aware client cost as Redis |
KeyDB
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| Snap is sole steward; uncertain roadmap | Critical | Have a Valkey or Redis migration plan ready | Migration project sitting in the backlog; license decision pre-made |
| No paid support / SLA | High | Self-support via GitHub, or migrate to a supported alternative | Engineering on-call burden grows; no escalation path on data-loss bugs |
| Diverging from Redis API over time | Medium | Pin to Redis commands available in 6.2 era; avoid post-7.0 commands | Lose feature velocity from Redis ecosystem |
| Spinlock-based threading is tuning-sensitive | Medium | Pin CPUs, tune server-threads to physical cores | Hardware-aware deployment; harder in shared k8s environments |
| Active-active replication uses LWW | High | Design data model to avoid concurrent same-key writes | App-level partition-by-region or write-quorum logic |
| Flash tier adds tail-latency variance | Medium | Profile workload; disable Flash if access is uniform | Lose cost savings of flash tier |
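The last-write-wins limitation is worth seeing concretely, because nothing in the data path reports the loss. The sketch below is a generic LWW merge, not KeyDB's actual implementation; the `Write` type, the region names, and the tie-break by region name are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp_ms: int
    region: str


def merge_lww(a: Write, b: Write) -> Write:
    """Keep the write with the later timestamp; break ties by region name."""
    return max(a, b, key=lambda w: (w.timestamp_ms, w.region))


us = Write("cart:9", "add item A", 1000, "us-east")
eu = Write("cart:9", "add item B", 1005, "eu-west")
winner = merge_lww(us, eu)
print(winner.value)  # the us-east write is discarded without any error
```

Both regions acknowledged their write to the client, yet only one survives the merge. That is why the workaround column pushes the problem into the data model (partition by region, or avoid concurrent same-key writes).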
Amazon CloudFront
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| Lambda@Edge must deploy from us-east-1 | Medium | Accept the constraint; treat us-east-1 as the deploy region | Lambda@Edge availability tied to us-east-1 IAM; us-east-1 outages affect deploys |
| Purge propagation is minutes, not milliseconds | High | Use short TTL and rely on revalidation; or use Origin Shield + invalidations sparingly | Short TTL kills cache hit rate; invalidations have request quota |
| Per-region pricing is opaque | Medium | Sample your real geo-distribution; price out by region tier | Engineering time on cost modeling; surprise bills in expensive regions |
| CloudFront Functions limited (CPU, no network) | Medium | Use Lambda@Edge for complex logic; live with the cost difference | 6x cost per request, plus duration billing |
| Vendor lock-in to AWS | Medium | Abstract edge logic into portable Wasm modules where possible | Engineering effort to keep things portable; mostly aspirational at scale |
| Real-time logs require Kinesis | Medium | Use 1% sampling at steady state; ramp during incidents | Sampling means you miss low-frequency events at full fidelity |
Cloudflare
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| Single-vendor blast radius (Nov 2025 outage) | Critical | Multi-CDN failover (Cloudflare + CloudFront or Fastly) | Doubled CDN bill plus DNS-failover complexity; meaningful engineering project |
| KV: 1 write/sec per key cap | High | Use Durable Objects for high-write workloads; KV is read-heavy only | Different mental model; Durable Object cost and availability characteristics differ |
| Workers limits: 50-128 MB memory, 30s CPU | Medium | Decompose into chained workers; offload heavy work to origin | Code complexity; worker-to-worker calls add latency |
| Cache API ephemeral per-POP | Medium | Use Tiered Cache or fall back to KV for shared cache | Tiered Cache adds latency; KV is eventually consistent |
| ToS 2.8 unmetered-bandwidth ambiguity | Medium | Pre-negotiate enterprise contract if traffic profile is non-website | Enterprise contract pricing; loss of free-tier predictability |
| Bot Management can over-block legitimate traffic | Medium | Tune bot rules; whitelist known crawlers and monitoring | Ongoing rule maintenance; risk of regression after default changes |
Fastly
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| Premium per-GB pricing vs CloudFront/Cloudflare | High | Use Fastly for dynamic content; offload static media to cheaper CDN | Multi-CDN management; per-asset CDN routing decisions |
| Fewer POPs than Cloudflare | Medium | Combine with another CDN in markets where Fastly is thin | Multi-CDN routing logic and DNS |
| VCL is a niche language | Medium | Invest in team VCL training; document patterns internally | Hiring is harder; senior VCL engineers are rare |
| 2021 outage reputation | Medium | Demonstrate post-incident config-rollout improvements during security review | Procurement friction; need to write the explanation up |
| Real-time analytics priced separately | Medium | Sample; use rolled-up metrics for steady state | Reduced debuggability for low-frequency issues |
| Smaller free tier than Cloudflare | Medium | Use Fastly's developer tier for evaluation; commit on production | No "free forever" path; cost starts day one for production |
Varnish Cache
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| No native TLS termination | High | Run Hitch, HAProxy, or NGINX in front | Doubled ops surface; another process to monitor and patch |
| No cluster purge | High | Use Varnish Broadcaster or build HTTP-fanout; or pay for Varnish Enterprise | Custom infrastructure or commercial license |
| You operate the fleet | High | Use a SaaS like Fastly (built on customized Varnish) instead | Lose self-hosted control and customization depth |
| VCL is C-level powerful and C-level dangerous | Medium | Strict VCL review; canary every change | Engineering review process; harder to ship quickly |
| Custom VMODs are unsafe C extensions | High | Stick to community-maintained VMODs; avoid bespoke C in cache process | Lose extensibility benefit that drove the choice to Varnish |
| No persistent cache across restart (in OSS) | Medium | Use Varnish Enterprise MSE (massive storage engine) | Commercial license required |
NGINX
| Limitation | Severity | Workaround | Workaround Cost |
|---|---|---|---|
| Caching is a feature, not the focus | Medium | Use Varnish or Fastly for cache-heavy workloads | Extra process or vendor in the stack |
| No real programmability without Lua | Medium | Use OpenResty for Lua; or NGINX Plus dynamic modules | Either Lua language overhead, or NGINX Plus license |
| No native cluster purge | High | Build HTTP fanout for invalidation; or use NGINX Plus cache API | Custom infrastructure; or commercial license |
| Some features Plus-only (active health, dynamic config) | Medium | Pay for NGINX Plus, or use OpenResty/custom scripting | License cost; or engineering time |
| proxy_cache uses disk; flash wears under heavy writes | Medium | Use enterprise-grade NVMe; rotate disks in fleet maintenance | Hardware cost; fleet management overhead |
| Worker tuning is non-obvious | Medium | Tune worker_processes and worker_connections to actual cores and connection pattern | Performance engineering time; load testing |
Fault Tolerance
Per-group matrix tables. Rows are fault-tolerance dimensions; columns are technologies.
Group A: In-Memory Cache
| Dimension | Redis | Memcached | Valkey | KeyDB |
|---|---|---|---|---|
| Replication model | Leader-follower async (semi-sync via WAIT) | None in OSS; repcached is third-party | Leader-follower async (same as Redis) | Active-active (multi-master) or leader-follower |
| Failure detection | Sentinel quorum-based heartbeats; Cluster gossip in cluster mode | Client-side only; no server detection | Same as Redis (Sentinel or Cluster gossip) | Cluster gossip plus active-replication heartbeats |
| Failover mechanism | Sentinel-driven leader election; Cluster mode slot reassignment | Client rehashes; affected slots cold-miss | Same as Redis | Promote replica or rely on active-active peer |
| RTO (typical) | 10-30s (Sentinel); 1-15s (Cluster failover) | 0s for survivors; cold-start time for replaced node | Similar to Redis; ~10-15s | Sub-second on active-active failover |
| RPO (typical) | Async replication lag (ms-seconds); higher with AOF everysec | Full loss of failed node's data | Same as Redis | Last-write-wins may cause partial data inconsistency |
| Split-brain behavior | Cluster quorum prevents most; Sentinel min-replicas-to-write helps | N/A (no replication) | Same as Redis | Both sides accept writes; conflicts resolved via LWW (data loss possible) |
| Blast radius of single-node failure | Slots owned by failed node (1/N of keyspace) until replica promoted | Keys hashed to failed node (1/N) become misses until rehash | Same as Redis | Lower (peer keeps serving); depends on hash slot ownership |
| Cross-region failover | Manual; or use Redis Enterprise CRDB (commercial) | Application-level (multi-cluster, route on miss) | Same as Redis | Active-active across regions is the headline feature |
| Data loss scenarios | Async-replicated writes lost on leader failure; AOF rewrite mid-crash | Any node restart, any process crash | Same as Redis | Conflicting concurrent writes; LWW silently drops one |
Group B: HTTP / Edge Cache
| Dimension | CloudFront | Cloudflare | Fastly | Varnish (self-hosted) | NGINX (self-hosted) |
|---|---|---|---|---|---|
| Replication model | Per-POP independent cache; tiered/Origin Shield optional | Per-POP cache plus Tiered Cache and Argo | Per-POP cache plus shield POPs | None native (per-node); use Broadcaster for fanout | None native (per-node); manual sync needed |
| Failure detection | AWS-internal anycast withdrawal on POP failure | Anycast plus DNS plus health-aware routing | BGP anycast plus health checks | Your LB does it | Your LB does it |
| Failover mechanism | Anycast routes around bad POPs automatically | Anycast plus traffic engineering | Anycast plus traffic engineering | L4 LB removes unhealthy node from pool | L4 LB removes unhealthy node from pool |
| RTO (typical) | Seconds for anycast convergence | Seconds for anycast convergence | Seconds for anycast convergence | 5-30s depending on LB health-check interval | 5-30s depending on LB health-check interval |
| RPO (typical) | N/A (cache rebuilds from origin) | N/A (cache rebuilds from origin) | N/A (cache rebuilds from origin) | N/A | N/A |
| Split-brain behavior | Different POPs may serve different cached versions briefly | Same; eventual consistency on cache state | Same; instant purge mitigates | Each node has independent cache; client may see version skew | Same as Varnish |
| Blast radius of single-POP failure | Users in that geo route to next POP; brief RTT increase | Same | Same; with fewer POPs, the next-POP RTT delta is larger | Local to the data center; all traffic routes through LB | Same as Varnish |
| Cross-region failover | Anycast; or Route 53 health-aware records | Anycast handles it transparently | Anycast handles it transparently | DNS-based (Route 53, etc.) at slower RTO | DNS-based at slower RTO |
| Data loss scenarios | N/A for cache; origin is source of truth | 2025-11-18: bad config crashed proxy processes globally for hours | 2021: bad config triggered global outage ~1hr | Crash plus bad VMOD can corrupt local cache state | Crash without graceful shutdown can leave partial cache files |
Replication
Group A: In-Memory Cache
| Dimension | Redis | Memcached | Valkey | KeyDB |
|---|---|---|---|---|
| Replication topology | Leader-follower (single-leader) | None in OSS | Leader-follower (single-leader) | Multi-leader (active-active) or leader-follower |
| Sync vs async | Async by default; WAIT for semi-sync guarantee | N/A | Async by default; WAIT supported | Async multi-master replication |
| Replication factor (default / max) | Default 1 replica; practical max 3-5 replicas per master | N/A | Same as Redis | Multiple active peers (2-5 typical) |
| Consistency level options | Eventual (default); semi-sync via WAIT N TIMEOUT | N/A | Same as Redis | Eventual (LWW conflict resolution) |
| Replication lag (typical) | Sub-ms in same AZ; ms-seconds cross-region async | N/A | Same as Redis | Sub-ms to ms (depends on network) |
| Conflict resolution | N/A (single leader prevents conflicts) | N/A | N/A (single leader) | Last-write-wins by timestamp |
| Cross-region replication | Manual or Redis Enterprise CRDB (CRDTs, commercial) | Application-level only | Manual same as Redis | Built-in active-active replication is the headline |
| Replication during partition | Replica stops receiving updates; can be promoted by Sentinel | N/A | Same as Redis | Both sides accept writes; reconciled via LWW on heal |
Group B: HTTP / Edge Cache
| Dimension | CloudFront | Cloudflare | Fastly | Varnish (self-hosted) | NGINX (self-hosted) |
|---|---|---|---|---|---|
| Replication topology | None for cache (each POP independent); origin is source of truth | Same; Tiered Cache adds a hierarchy | Same; Origin Shielding adds a hierarchy | None native | None native |
| Sync vs async | N/A (no inter-POP replication of cache state) | N/A | N/A | N/A | N/A |
| Replication factor (default / max) | Effectively unlimited (each POP can cache same object) | Same | Same | Per-node; typically 2-5 nodes per DC | Per-node; typically 2-5 nodes per DC |
| Consistency level options | TTL plus purge (invalidation request, minutes to propagate) | TTL plus instant purge plus tag-based purge | TTL plus sub-150ms instant purge by surrogate key | TTL plus PURGE (per node) plus surrogate keys | TTL plus PURGE (per node); no native tags |
| Replication lag (typical) | Invalidations: 5-60s typical; up to minutes | Cache purge: seconds; KV writes: up to 60s globally | Instant purge: under 150ms globally | N/A (no replication); per-node purge is local | Same as Varnish |
| Conflict resolution | N/A (cache is read-only of origin) | N/A | N/A | N/A | N/A |
| Cross-region replication | Inherent (every POP independently caches from origin) | Inherent | Inherent | Build your own (multi-DC fleet) | Build your own |
| Replication during partition | POPs operate independently; if a POP is partitioned from origin, serves stale until TTL | Same; stale-while-revalidate is configurable | Grace and stale-while-revalidate handle origin partitions cleanly | Grace + stale-while-revalidate (VCL) | NGINX has proxy_cache_use_stale for similar behavior |
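The grace and stale-while-revalidate behavior in the last row comes down to a small decision on object age. The sketch below is a simplified model under stated assumptions: a single TTL, a single grace window, and a boolean origin health flag. Real Varnish, Fastly, and NGINX configurations expose more knobs, and the function name and return labels are hypothetical.

```python
def cache_decision(age_s, ttl_s, grace_s, origin_healthy):
    """Classify a cached object by age, in the spirit of grace mode."""
    if age_s < ttl_s:
        return "serve-fresh"
    if age_s < ttl_s + grace_s:
        # Past TTL but inside the grace window: answer from cache now.
        return "serve-stale-and-revalidate" if origin_healthy else "serve-stale"
    return "fetch-from-origin" if origin_healthy else "error"


for age in (30, 90, 400):
    print(age, cache_decision(age, ttl_s=60, grace_s=300, origin_healthy=False))
```

The grace window is effectively the length of origin outage the cache can hide, so it is usually sized against the origin's recovery time rather than against content freshness.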
Better Usage Patterns
Patterns most teams miss. What gets called out in code review at L6/L7.
Redis
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Cache stampede prevention | TTL expires, every request races to recompute, origin gets hammered | Use SET NX EX lock or probabilistic early refresh (XFetch) | One cold key during peak traffic can cascade to a full outage; this pattern is the most common Redis-related production fire |
| Pipelining for bulk operations | One SET per call, app-side latency dominated by RTT | Pipeline 100-1000 ops; or use MSET/MGET for atomic batches | Single-trip latency to Redis is ~0.5ms; 1000 separate calls is 500ms; pipelined is 1-2ms |
| Avoid O(N) commands in production | KEYS *, SMEMBERS on huge sets, HGETALL on huge hashes | Use SCAN, SSCAN, HSCAN with COUNT bounds | O(N) commands block the single thread; one large SMEMBERS can stall every other client for seconds |
| Use hash tags for cluster colocation | Cluster mode adopted, multi-key ops break, half the Lua scripts return CROSSSLOT | Design keys with a shared tag, e.g. user:{123}:profile and user:{123}:sessions; the braces are literal and only the text inside them is hashed to a slot | An unprepared app does not fail loudly in cluster mode: individual multi-key ops fail with CROSSSLOT and the app sees half-success |
| Connection pooling, not per-request | App opens a connection per request; Redis hits maxclients ceiling | Use a lazily initialized pool or shared connection (Jedis pool, Lettuce shared connection, ioredis cluster client) | Connection setup is 0.5-2ms; on a hot path, it can double the latency of every cache call |
| Set maxmemory and a sane eviction policy | Default noeviction; writes start failing when memory fills | maxmemory set to 80% of node RAM, maxmemory-policy allkeys-lru for cache use | With noeviction, Redis rejects writes with OOM errors once memory fills; apps that swallow cache errors never notice, and the cache becomes a write-failure layer |
| Use replicas for read-scaling carefully | Reads go to async replicas; users see stale data, application doesn't expect it | Mark replica reads as "read-stale-acceptable"; route consistent reads to leader | Replicas can lag seconds under load; if your app assumes read-your-writes, it will silently break |
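
The probabilistic early refresh named in the stampede row can be sketched as follows. This is a minimal illustration of the XFetch check, not a client-library API: `should_refresh_early`, `fetch`, the `beta` default, and the entry layout are hypothetical names chosen here, and a plain dict stands in for Redis.

```python
import math
import random
import time


def should_refresh_early(now, expiry, recompute_seconds, beta=1.0, rng=random.random):
    """XFetch check: True when this caller should recompute before expiry."""
    # 1 - rng() lies in (0, 1], so log() is defined and never positive.
    head_start = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + head_start >= expiry


def fetch(cache, key, recompute, ttl, clock=time.time, rng=random.random):
    """Read-through lookup against a dict standing in for Redis (assumption)."""
    entry = cache.get(key)
    if entry is not None and not should_refresh_early(
        clock(), entry["expiry"], entry["recompute_seconds"], rng=rng
    ):
        return entry["value"]
    started = clock()
    value = recompute()
    finished = clock()
    # Remember how long the recompute took; slow keys get a wider refresh window.
    cache[key] = {
        "value": value,
        "recompute_seconds": finished - started,
        "expiry": finished + ttl,
    }
    return value
```

Slower recomputations and a larger `beta` both widen the early-refresh window, so expensive keys tend to be refreshed by a single caller before the TTL expires for everyone.
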
Memcached
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Use CAS for safe updates | GET then SET, two clients overwrite each other's update | Use gets (returns CAS token) then cas (conditional set) | Without CAS, multi-step updates are inherently racy; lost-update bugs hide for months |
| Cache version prefix in key | Schema change requires full cache flush; thundering herd | Prefix key with serialization version: v3:user:123 | Version bump invalidates old keys gracefully; new and old can coexist during deploy |
| Enable slab automove | Default config; one slab class fills, others sit empty, hit rate degrades | Set slab_reassign=1 and slab_automove=1 | Slab calcification silently drops hit rate by 20-40% over weeks; few teams notice |
| Use binary protocol for high RPS | Text protocol default; parsing overhead at high throughput | Use binary protocol clients (libmemcached, pymemcache binary mode) | Binary protocol is roughly 2x faster on parse; matters at 100K+ ops/sec/client |
| Plan for rolling restart | Fleet restart equals 100% cache miss equals DB falls over | Restart one node at a time; warm new fleet before swapping | Deploys are routine; cold-cache outage on deploy is the most preventable Memcached fire |
| Set TTL aggressively | No TTL set; LRU eviction silently drops keys at unpredictable times | Always set explicit TTL (even 86400 for "daily") | Explicit TTL makes cache behavior predictable; with LRU alone, keys disappear only when memory pressure evicts the least recently used ones, which is hard to reason about |
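
The gets/cas row can be sketched as a retry loop. `InMemoryCasStore` is a hypothetical in-memory stand-in that models only the token check, and `cas_update` and `max_attempts` are names chosen for illustration; real Memcached clients differ in method names and return values.

```python
import itertools


class InMemoryCasStore:
    """Stand-in for a Memcached client; models only gets/cas token semantics."""

    def __init__(self):
        self._data = {}
        self._tokens = itertools.count(1)

    def set(self, key, value):
        self._data[key] = (value, next(self._tokens))

    def gets(self, key):
        """Return (value, cas_token), or (None, None) when the key is absent."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        """Store only if nobody has written the key since `token` was issued."""
        current = self._data.get(key)
        if current is None or current[1] != token:
            return False
        self._data[key] = (value, next(self._tokens))
        return True


def cas_update(store, key, mutate, max_attempts=5):
    """Read-modify-write that retries when another writer got in first."""
    for _ in range(max_attempts):
        value, token = store.gets(key)
        if token is None:
            raise KeyError(key)
        if store.cas(key, mutate(value), token):
            return True
    return False
```

A plain GET then SET would overwrite the concurrent write; here the stale token makes the first attempt fail and the loop re-reads the fresh value before writing.
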
Valkey
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| All Redis patterns apply | Treat Valkey as a different product, miss decades of Redis lore | Apply Redis best practices (stampede prevention, pipelining, hash tags) directly | Valkey is wire-compatible; the operational patterns transfer 1:1 |
| Pin to features in your fork window | Use Redis 8 features (vector sets) and assume they will land in Valkey | Stick to features available in Valkey 8.x at adoption; track Valkey roadmap for newer needs | The fork is at 7.2.4; anything past that has to be ported. Don't build on features that don't exist yet. |
| Use first-party modules | Assume RediSearch / RedisGraph will work; surprise on adoption | Use Valkey-Bloom and Valkey-JSON (AWS/Google contributed); evaluate alternatives for search | First-party modules are BSD; commercial Redis modules have license incompatibility |
| Enable I/O threading deliberately | Set io-threads high "for performance"; spinlocks degrade throughput | Set io-threads to ~half of physical cores; benchmark; iterate | I/O threading is a single dial that can make or break perf; default is conservative |
| Adopt 9.0 features carefully | Atomic slot migration and multi-DB cluster mode adopted on day one for prod | Stay on 8.1 LTS for critical workloads; pilot 9.0 features on a non-critical cluster first | 9.0 is October 2025; major features need 6+ months of fleet miles before mission-critical use |
| Plan migration from Redis with TLS settings in mind | Drop-in migration breaks because TLS ciphers or auth modes differ | Test in staging with prod TLS config; many managed-Valkey vendors wrap auth differently | The "drop-in" promise is true for wire protocol; auth and TLS often need glue code |
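
Hash tags are one of the Redis patterns that carry over. The sketch below assumes Valkey Cluster keeps the Redis Cluster key-hashing rule (CRC16 of the key, or of the first non-empty `{...}` section, modulo 16384) and uses Python's `binascii.crc_hqx` with an initial value of 0 as the CRC16 implementation. `hash_slot` is a name chosen here, not a client API.

```python
import binascii


def hash_slot(key: str) -> int:
    """Compute the cluster slot for a key, honoring a {hash tag} if present."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag narrows what gets hashed; "{}" hashes the whole key.
        if end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Keys sharing the tag {123} land in one slot, so multi-key ops on them are legal.
same_slot = hash_slot("user:{123}:profile") == hash_slot("user:{123}:sessions")
```

Running candidate key names through a helper like this before a cluster migration is a cheap way to find the multi-key operations that would otherwise return CROSSSLOT.
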
KeyDB
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Tune server-threads to physical cores | Set threads equal to vCPUs in cloud VMs; spinlocks contend | Set threads to physical cores; pin CPUs; avoid noisy-neighbor VMs | KeyDB's spinlock model assumes exclusive cores; oversubscription destroys the perf claim |
| Plan for migration off KeyDB | Treat KeyDB as a long-term stable choice | Have a Valkey or Redis migration plan documented; pin to Redis 6.2 compatible features | Snap's strategic ambiguity means roadmap risk; the option value of being migration-ready is real |
| Avoid concurrent same-key writes across active-active peers | Trust active-active to "just work"; LWW silently loses writes | Partition writes by key prefix per region; or use single-leader with fast promotion | LWW is fine for cache (recompute on miss); for anything close to durable state, it's a foot-gun |
| Treat Flash tier as cost optimization, not capacity | Enable Flash to "fit more data"; tail latency suffers | Use Flash only when the workload is genuinely hot-cold; benchmark P99 vs RAM-only | Flash hits are 10-50x slower; for uniform access, you've added variance for no benefit |
| Self-support readiness | Assume GitHub issues will get answered | Internal runbook for common KeyDB failures; budget for self-debugging | No SLA, no commercial escalation; production support is your team's responsibility |
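
To make the active-active row concrete, here is a toy model of last-write-wins merging and of routing writes by key prefix. It does not reproduce KeyDB's replication internals: the `(timestamp, value)` tuples, `lww_merge`, `write_region`, and the region names are all illustrative assumptions.

```python
def lww_merge(local, remote):
    """Each version is (timestamp, value); the later timestamp wins outright."""
    return remote if remote[0] > local[0] else local


def write_region(key, prefix_owners, default_region):
    """Pick the single region allowed to write a key, based on its prefix."""
    for prefix, region in prefix_owners.items():
        if key.startswith(prefix):
            return region
    return default_region


# Two regions read the same cart, then each appends an item concurrently.
base = (0.0, ["book"])
from_us = (1.0, base[1] + ["pen"])
from_eu = (1.2, base[1] + ["lamp"])
merged = lww_merge(from_us, from_eu)  # the "pen" write is dropped, with no error

# Routing every write for a prefix to one region avoids the conflict entirely.
owners = {"eu:": "eu-west", "us:": "us-east"}
target = write_region("eu:cart:7", owners, default_region="us-east")
```

For a pure cache the dropped write is recomputed on the next miss; for anything resembling durable state it is simply gone, which is why the table recommends partitioning writes.
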
Amazon CloudFront
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Default to CloudFront Functions over Lambda@Edge | Use Lambda@Edge for everything because it's "more capable" | Start with CloudFront Functions; escalate to Lambda@Edge only on network or SDK need | For 1B requests/month, Functions cost $100 vs Lambda@Edge $600+; the default choice is a 6-10x cost gap |
| Use Origin Shield deliberately | Enable everywhere for "better cache hit" | Enable only when origin is expensive (Lambda, complex queries) or cache hit ratio is low | Origin Shield adds a hop and a per-GB charge; for high-cache-hit static workloads it's a net negative |
| Normalize cache keys to reduce variance | Default cache key includes all query strings, headers; cache hit ratio is terrible | Use CloudFront Functions to canonicalize; whitelist only meaningful query params | One uncontrolled query param can cut hit rate by 80%; classic engineering miss |
| Use signed cookies, not signed URLs, for sessions | Sign every URL; user shares URL, downstream cache hit suffers | Sign cookies for session-bounded access; URLs stay cacheable | Signed URLs are per-user; signed cookies allow caching the same URL across the user's session |
| Route via Origin Failover | Single origin, no fallback; origin outage equals user-visible 5xx | Configure origin groups with failover criteria (502, 504, etc.) | One config change, multi-origin resilience without app changes |
| Use cache policies, not legacy "Cache Based on Selected Request Headers" | Use the legacy radio buttons; cache key is messy | Define explicit cache policies as IaC (CDK, Terraform); version them | Cache policy is reusable across distributions; legacy mode forces per-distribution tweaks and drift |
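
The cache-key normalization row comes down to dropping irrelevant query parameters and fixing the order of the rest. CloudFront Functions are written in JavaScript; the sketch below uses Python only to show the canonicalization logic, and `normalize_cache_key` and the allowlist are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalize_cache_key(url, allowed_params):
    """Reduce a URL to path plus allowlisted, sorted query parameters."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed_params
    )
    query = urlencode(kept)
    path = parts.path or "/"
    return path + ("?" + query if query else "")


allowed = {"page", "sort"}
key_a = normalize_cache_key("https://example.com/products?utm_source=x&sort=price&page=2", allowed)
key_b = normalize_cache_key("https://example.com/products?page=2&sort=price&fbclid=abc", allowed)
```

Both URLs collapse to the same key, so tracking parameters and parameter order no longer fragment the cache. Whether the host belongs in the key depends on the distribution; it is left out here as a simplifying assumption.
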
Cloudflare
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Multi-CDN failover, not Cloudflare-only | Trust the SLA; nothing else needed | Active-passive with CloudFront or Fastly; DNS-failover or active-active anycast | 2025-11-18 outage took down sites globally for hours; multi-CDN is the proven mitigation |
| Use Cache API for hot per-POP data, KV for global reference | Use KV for everything because it's "the data layer" | Cache API for fresh fetches in same POP; KV for config / reference data | KV has 1 write/sec/key and seconds of replication lag; Cache API has neither |
| Workers Smart Placement for origin-heavy workloads | Workers run at every POP, including ones far from origin | Enable Smart Placement to colocate Worker with origin region | For workloads where the Worker mostly calls origin, placing it next to origin reduces P99 significantly |
| Use Tiered Cache for low-hit-rate origins | Trust default per-POP cache | Enable Tiered Cache (Smart Tiered Cache topology) so misses funnel through fewer upper-tier POPs | Per-POP miss multiplies origin load by N; tiered cache reduces it by an order of magnitude |
| Pin Bot Management whitelist for known crawlers | Trust the defaults; monitoring or AI crawlers get challenged | Whitelist known user-agents and IPs; tune rules per route | Cloudflare's Bot Management defaults block more than people realize; over-blocking is a silent bug source |
| Plan for 2.8 ToS implications | Build a media-streaming product on Free / Pro plan | Negotiate enterprise contract before traffic profile shifts | The unmetered-bandwidth promise has limits; surprise enterprise sales call mid-launch is bad |
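
The "multiplies origin load by N" claim in the Tiered Cache row can be checked with a toy simulation. This is an idealized model under stated assumptions (one cold object, unlimited cache space, a fixed POP-to-tier mapping), not a description of Cloudflare's actual topology; `count_origin_fetches` and the POP numbering are hypothetical.

```python
def count_origin_fetches(requests, tier_of=None):
    """Count origin fetches for (pop, url) requests, with optional upper tiers.

    tier_of maps each edge POP to its upper-tier POP; None means no tiering.
    """
    edge_cache = set()
    tier_cache = set()
    origin_fetches = 0
    for pop, url in requests:
        if (pop, url) in edge_cache:
            continue  # edge hit, nothing leaves the POP
        edge_cache.add((pop, url))
        if tier_of is None:
            origin_fetches += 1  # every edge miss goes straight to origin
            continue
        tier = tier_of[pop]
        if (tier, url) not in tier_cache:
            tier_cache.add((tier, url))
            origin_fetches += 1  # only the first miss per upper tier reaches origin
    return origin_fetches


requests = [(pop, "/video.mp4") for pop in range(200)]
untiered = count_origin_fetches(requests)
tiered = count_origin_fetches(requests, tier_of={pop: pop % 4 for pop in range(200)})
```

In this model 200 edge POPs produce 200 origin fetches without tiering and 4 with four upper-tier POPs; real ratios depend on traffic distribution and eviction.
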
Fastly
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Use surrogate keys for cache invalidation | Invalidate by URL pattern; brittle, hard to map content to URL | Tag responses with surrogate keys; purge by tag | Surrogate-key purge is Fastly's killer feature; URL-based purge is fragile and slow |
| Origin Shielding placement | Use the default shield POP; not necessarily closest to origin | Set shield to the POP closest to origin region | Shield-to-origin latency directly affects cache miss tail latency |
| VCL discipline | Pile logic into vcl_recv; complex if/else trees | Use Fastly's subroutine convention; split logic by phase (recv, hash, fetch, deliver) | VCL phases have semantic meaning; mixing them causes subtle bugs (e.g., cache key set after lookup) |
| Compute@Edge for the right workloads | Rewrite everything in Compute@Edge because "Wasm is the future" | Use Compute@Edge for request enrichment, A/B routing, complex auth; keep simple caching in VCL | Wasm cold-start is near-zero but each invocation has cost; VCL is free per-request |
| Use Edge Dictionaries for config | Hardcode redirect maps and feature flags in VCL | Use Edge Dictionaries; update without redeploying VCL | VCL deploys go through compile; dictionary updates are near-instant via API |
| Stale-while-revalidate aggressively | Short TTL plus no stale handling; origin sees miss-storms | Long TTL plus stale-while-revalidate plus surrogate-key purge | Combining long TTL with fast invalidation is the whole point of Fastly; lots of teams underuse this |
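
Surrogate-key purging is easiest to see as a reverse index from tag to cached URLs. The class below is an in-memory illustration of that idea only; `TaggedCache` and its methods are hypothetical and do not mirror Fastly's API or its purge propagation.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache that indexes objects by surrogate key for tag-based purge."""

    def __init__(self):
        self._objects = {}
        self._urls_by_key = defaultdict(set)

    def store(self, url, body, surrogate_keys):
        self._objects[url] = body
        for key in surrogate_keys:
            self._urls_by_key[key].add(url)

    def get(self, url):
        return self._objects.get(url)

    def purge_key(self, key):
        """Drop every object tagged with `key`; return how many were removed."""
        removed = 0
        for url in self._urls_by_key.pop(key, set()):
            if url in self._objects:
                del self._objects[url]
                removed += 1
        return removed


cache = TaggedCache()
cache.store("/p/1", "shoe page", ["product-1", "category-shoes"])
cache.store("/c/shoes", "shoe listing", ["category-shoes"])
cache.store("/p/2", "hat page", ["product-2"])
purged = cache.purge_key("category-shoes")
```

One purge call removes both the product page and the listing without the caller knowing their URLs, which is what makes long TTLs safe to combine with fast invalidation.
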
Varnish Cache
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| VCL phase discipline | Cram all logic into vcl_recv | Place logic in the right phase: vcl_recv, vcl_hash, vcl_backend_fetch, vcl_deliver | Phases run in a specific order with specific available variables; misplacement causes silent misbehavior |
| Use surrogate keys (xkey VMOD) | Purge by URL pattern; brittle, slow | Tag responses with surrogate keys via xkey; purge by tag | The single highest-leverage Varnish pattern; most production Varnish fleets that don't use it should |
| Cluster purge fanout | PURGE on one node; assume cluster-wide effect | Use Varnish Broadcaster or custom HTTP-fanout to propagate | Without fanout, multi-node clusters serve inconsistent content; user-visible bug |
| Use std.log for VCL debugging, not regsub-heavy headers | Add 20 debug headers in vcl_deliver; pollute responses | Use std.log + varnishlog for debugging; ship logs separately | Production responses with debug headers are noise; logs are a better audit trail |
| Grace and keep tuning | Default 10s grace; origin outage triggers user-visible errors | Set beresp.grace to minutes-hours; combine with stale-while-revalidate | Grace is Varnish's resilience superpower; underused by teams used to TTL-only thinking |
| Always run a TLS sidecar (Hitch, NGINX, HAProxy) | Try to terminate TLS in Varnish; discover it's not supported | Hitch on same host for TLS; pass cleartext to Varnish via UDS | The pattern is so universal it should be the default in any Varnish runbook |
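
The value of a long grace window shows up when the origin is down. The sketch below models only that half of the behavior: a synchronous fetch that falls back to a stale object inside the grace window. Varnish itself can also serve the stale object immediately while a background fetch runs, which this toy does not model. `GraceCache`, its parameters, and the status strings are hypothetical.

```python
class GraceCache:
    """Toy TTL cache that serves stale objects during origin failure."""

    def __init__(self, fetch, ttl, grace, clock):
        self._fetch = fetch
        self._ttl = ttl
        self._grace = grace
        self._clock = clock
        self._entries = {}  # key -> (stored_at, body)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], "fresh"
        try:
            body = self._fetch(key)
        except Exception:
            # Origin failed: fall back to the expired object if still within grace.
            if entry is not None and now - entry[0] < self._ttl + self._grace:
                return entry[1], "stale"
            raise
        self._entries[key] = (now, body)
        return body, "fetched"
```

With a 10-second grace the fallback branch is almost never reachable during a real outage; with grace set to minutes or hours, users keep getting the last good response until the origin recovers.
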
NGINX
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Enable proxy_cache_lock for stampede prevention | Cache key expires; 1000 requests hit origin simultaneously | Set proxy_cache_lock on to serialize cache fills | The single most missed NGINX cache directive; one line prevents the most common origin overload |
| Use proxy_cache_use_stale for origin outages | Origin 500s; users see 500s | Configure proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504 | Two lines of config turn the cache into a resilience layer; without the http_5xx parameters, stale content covers only connection errors and timeouts, not 5xx responses. The equivalent of Varnish grace |
| Cache key normalization | Default cache key includes full query string; low hit rate | Use map to normalize; cache key includes only meaningful query params | Same impact as on CloudFront: uncontrolled query params kill hit rate |
| Tune worker count to actual cores | worker_processes auto; not always right under containerized limits | Set explicitly to the core count actually available to the process; pin with worker_cpu_affinity at high scale | Under container CPU limits, auto can size workers to the host's core count rather than the container's quota; perf suffers silently |
| Health checks via NGINX Plus or third-party | Trust upstream is up; one failing backend gets traffic | NGINX Plus active health checks; or a third-party module such as nginx_upstream_check_module for OSS | Without active health checks, NGINX only learns about a failure after a request fails; the first request after the failure pays the cost |
| Use open_file_cache for static-heavy workloads | Static file metadata lookup per request; stat() bottleneck | Enable open_file_cache to cache fd metadata | For high-RPS static workloads, file metadata cache is the difference between 10K and 100K RPS per worker |
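
The effect of serializing cache fills, as in the proxy_cache_lock row, is that concurrent misses for one key result in a single origin request. The sketch below shows that idea as an in-process Python cache; it is not how NGINX implements the directive, it has no TTL or lock timeout, and `SingleFlightCache` is a hypothetical name.

```python
import threading


class SingleFlightCache:
    """Toy cache where concurrent misses for one key trigger a single load."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            if key in self._values:
                return self._values[key]
            lock = self._key_locks.setdefault(key, threading.Lock())
        # Only one thread per key gets past this point at a time.
        with lock:
            with self._guard:
                if key in self._values:
                    return self._values[key]  # filled while we were waiting
            value = self._loader(key)
            with self._guard:
                self._values[key] = value
            return value
```

Twenty threads requesting the same cold key produce one loader call; the other nineteen wait for the fill and read the cached value.
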
Advanced / Next-Gen Alternatives
What's replacing or augmenting each technology, and what to watch.
Redis
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Valkey | BSD license, faster community velocity, atomic slot migration | Production | Low (wire-compatible) | If AGPL/RSAL is a blocker; new builds default to Valkey |
| DragonflyDB | Multi-threaded from the ground up, claims 25x throughput on big nodes | Emerging | Medium (Redis API mostly; some semantic gaps) | If single-node throughput is the bottleneck and full Redis-API parity isn't required |
| Microsoft Garnet | C# implementation, RESP-compatible, high throughput, advanced storage tiers | Emerging | Medium | Microsoft-stack shops; experimental for high-end research-grade perf |
| Amazon MemoryDB | Multi-AZ durable Redis with consensus-backed writes | Production | Low | When you need cache plus database guarantees, not just cache |
Memcached
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Valkey / Redis with threaded I/O | Adds replication, persistence, rich types while approaching Memcached throughput | Production | Medium (different API, but more capable) | When you've outgrown raw GET/SET |
| Meta's CacheLib (open-sourced) | In-process embedded cache, used to power Meta's caching | Production | High (embedded library, not service) | When you want cache as a library inside your service, not a separate fleet |
| Aerospike | Hybrid memory plus SSD, sub-ms even at TB scale, multi-DC replication | Production | High (different data model) | When cache size outgrows RAM economics and you still need sub-ms reads |
| Hyperscale alternatives (CacheLib, Meta TAO) | Purpose-built for scale beyond what Memcached envisioned | Production | Very high | When you operate at Meta or Google scale; otherwise stick with Memcached |
Valkey
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Redis 8 (AGPLv3) | Vector sets, newer modules, faster feature velocity | Production | Low (wire-compatible) | If AGPLv3 is acceptable and you need Redis 8 features |
| DragonflyDB | Multi-threaded from scratch, BSL license, drop-in API | Emerging | Medium | When threading model is the bottleneck and BSL is acceptable |
| Amazon MemoryDB for Valkey | Multi-AZ durable Valkey with consensus writes | Production | Low | When you need stronger durability than async replication offers |
KeyDB
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Valkey 8+ with I/O threading | Active Linux Foundation stewardship, faster community velocity | Production | Low | Anytime; this is the natural migration path |
| DragonflyDB | True multi-threaded execution, not just I/O | Emerging | Low (Redis-compatible) | When you adopted KeyDB specifically for threading and want to push further |
| Redis Enterprise (commercial) | Production-grade Redis with paid support and CRDB for multi-DC | Production | Low | If active-active was the KeyDB feature you needed and now you need vendor support |
Amazon CloudFront
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Cloudflare | Bigger POP footprint, V8 isolates, free tier, R2 zero egress | Production | Medium (auth re-design, IAM equivalents) | Cost-sensitive workloads, multi-cloud, developer experience matters |
| Fastly | Instant purge, VCL programmability, fewer-larger POPs | Production | Medium (VCL learning curve) | High-write content (news, sports, inventory) where purge speed matters |
| Specialized media CDNs (BlazingCDN, Bunny) | Order-of-magnitude lower per-GB cost for media workloads | Production | Medium (signed URL, DRM rework) | Pure video / large-file delivery where cost dominates the decision |
Cloudflare
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Multi-CDN (Cloudflare + CloudFront/Fastly) | Eliminates single-vendor blast radius (Nov 2025 outage was global) | Production | Medium-high (DNS, config drift, multi-bill) | Mission-critical workloads where 4+ hours of outage is unacceptable |
| Fastly Compute@Edge | Wasm runtime, multi-language, comparable performance | Production | High (rewrite from Workers JS to Wasm) | When you need a single vendor switch off Cloudflare |
| AWS CloudFront + Lambda@Edge / Functions | AWS-native integration, less single-vendor risk than Cloudflare for AWS-heavy stacks | Production | Medium | If your origin is AWS-deep, CloudFront is the obvious second CDN |
Fastly
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Cloudflare Workers | Larger ecosystem, V8 isolates, more POPs in long-tail markets | Production | High (VCL to JS rewrite) | When VCL programmability is no longer worth the language overhead |
| AWS CloudFront + Lambda@Edge | AWS-native integration, predictable enterprise sales cycle | Production | Medium-high | When AWS-deep origin makes CloudFront's integration story compelling |
| Self-hosted Varnish + own POPs | Total control, no SaaS vendor risk | Production | Very high (operate own CDN) | Very rare: compliance or geography forces self-hosting |
Varnish Cache
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Fastly | Same VCL lineage, fully managed, global anycast | Production | Low-medium (VCL ports with some Fastly-specific changes) | When operating Varnish is no longer worth the engineering effort |
| Varnish Enterprise (commercial) | MSE persistent storage, native clustering, paid support | Production | Low | When you want to stay self-hosted but need enterprise features |
| NGINX Plus | Add caching to existing NGINX without learning VCL | Production | Medium | When team already runs NGINX and Varnish's specialized cache features aren't critical |
| Apache Traffic Server | Yahoo-scale-proven, similar HTTP-cache focus, more permissive config | Production | High (different config language, different ops model) | Rare; mostly for very large CDN-like internal deployments |
NGINX
| Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Envoy | Modern data-plane for service mesh, xDS dynamic config, gRPC native | Production | High (different config model, different operational story) | Microservices / service-mesh context; NGINX feels heavyweight |
| Cloudflare Pingora | Rust-based proxy, multi-threaded, used to power Cloudflare's edge | Emerging | Very high (library, not config; Rust) | When you want NGINX-class perf but the C codebase is the issue |
| HAProxy plus Varnish | Best-of-breed: HAProxy for LB/TLS, Varnish for cache | Production | Medium (operate two services) | When neither role is a side concern; you want specialists |
| Caddy | Automatic HTTPS, simpler config, modern Go-based | Production | Low for simple use cases | SMB and developer-facing deployments where NGINX config feels heavy |