L7 Proxies / Load Balancers / API Gateways

Principal Engineer trade-off analysis across Nginx, HAProxy, Envoy, Pingora, and Kong.

Category Sweep L7 Data Plane API Gateway

As of 2026-06-02

PE Verdict

The proxy market split into three lanes by 2026. HAProxy stays the throughput king for stable backend topologies (around 42K RPS in one Kubernetes ingress benchmark, with the lowest p99). Envoy won the service mesh and AI gateway race on the strength of xDS, dynamic reconfiguration without restart, and a healthy filter chain (WASM, ext_proc, Lua). Pingora is the post-Nginx future for organizations that own their proxy code: Cloudflare reports roughly 70% less CPU and 67% less memory than its previous Nginx-based proxy at trillion-request-per-day scale.

Nginx remains dominant by deployment count, but governance pressure is real. The community-maintained ingress-nginx Kubernetes project was scheduled for retirement in March 2026, and developer-led forks (Freenginx, Angie) emerged after F5 acquisition friction. Treat Nginx as the safe incumbent for greenfield small/medium workloads, but assume migration risk on the 3-5 year horizon. Kong is the right choice when you want a pre-built plugin marketplace (auth, rate limit, transformations) and can accept the OpenResty/Lua event-loop ceiling under CPU-bound plugins.

The non-obvious call: if you are doing AI inference gateways in 2026, start with Envoy AI Gateway, not Kong. Token-aware rate limiting and per-model priority routing land natively on Envoy ExtProc, and the ecosystem is moving there.
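To make "token-aware rate limiting" concrete, here is a minimal sketch of a per-model budget that is charged in LLM tokens rather than in requests. It is not Envoy AI Gateway's implementation. The class name, the model names, and the capacity and refill numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Token bucket charged in LLM tokens instead of request counts."""

    capacity: float        # maximum tokens that can accumulate
    refill_per_sec: float  # tokens restored per second
    available: float       # tokens currently spendable
    last_refill: float     # timestamp of the last refill, in seconds

    def try_spend(self, tokens: int, now: float) -> bool:
        """Refill for the elapsed time, then spend `tokens` if the budget allows."""
        elapsed = max(0.0, now - self.last_refill)
        self.available = min(self.capacity, self.available + elapsed * self.refill_per_sec)
        self.last_refill = now
        if tokens > self.available:
            return False
        self.available -= tokens
        return True


def admit(budgets: dict[str, TokenBudget], model: str, tokens: int, now: float) -> bool:
    """Per-model admission decision; models without a budget are rejected."""
    budget = budgets.get(model)
    return budget is not None and budget.try_spend(tokens, now)


# Hypothetical budgets: the expensive model gets a smaller token allowance.
budgets = {
    "large-model": TokenBudget(capacity=1000, refill_per_sec=100, available=1000, last_refill=0.0),
    "small-model": TokenBudget(capacity=5000, refill_per_sec=500, available=5000, last_refill=0.0),
}
```

A request-count limiter would treat an 800-token prompt and an 8-token prompt identically. Charging by tokens is what lets the gateway enforce cost-shaped quotas per model.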

Best default choices

Nginx	Static + reverse proxy on teams that know it; plan your ingress-nginx migration path
HAProxy	Pure L4/L7 load balancing and database TCP proxying where tail latency is paramount
Envoy	Service mesh, AI inference gateways, and dynamic API gateways with xDS and OpenTelemetry
Pingora	Custom proxy products at hyperscale where Rust safety and lock-free pooling justify the curve
Kong	REST API gateway with self-service plugin needs: auth, rate limiting, AI routing out of the box

1. Trade-Offs (per technology)

Each row is a deliberate "X for Y" choice. Gains are specific. Costs are operational. PE Nuance is what most engineers miss until on-call.

Nginx

Default for static serving combined with reverse proxying on teams that know it well. Governance risk from F5 is real, and anyone on ingress-nginx in Kubernetes needs a migration plan for its March 2026 retirement.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Static config + reload model over dynamic API	Simple mental model, predictable behavior, file-based GitOps works trivially	No service-discovery-driven endpoint updates without 3rd party (consul-template, nginx-plus API, nginx-mesh)	Microservices fleet with 200+ services churning endpoints every minute. Reload storms cause connection drops and 502s during deploys.	Open-source Nginx `reload` spawns new workers and drains old ones, but during the drain window memory doubles. At 5K+ worker config, this is enough to OOM on small instances.
C-based event loop over thread pool model	Tiny memory per connection (around 2-4 KB), 50K+ idle connections per worker	Single CPU-bound module call blocks the whole worker, kneecapping latency for every other connection on that worker	A Lua script doing JSON parsing on every request. p99 latency triples under load even though average looks fine.	Memory CVEs in C-based proxies are a structural risk, not a tooling problem. The post-2020 push to Rust (Pingora, linkerd2-proxy) is precisely a response to this.
Lua plugin model via OpenResty over native modules	Customization without C compilation, large ecosystem (Kong sits on this)	LuaJIT memory ceiling around 2 GB per worker on non-GC64 builds, CPU-bound Lua starves the event loop	Heavy auth plugin under traffic spike. Workers run Lua GC, p99 spikes 10x for the duration.	LuaJIT profiling and GC introspection (for example the `lj-vm-states` and `lj-gc` samplers in OpenResty's stapxx toolkit) are mandatory for any serious Lua deployment. Most teams discover this only after the first production incident.
Isolated worker processes over a shared multithreaded runtime	Deterministic CPU pinning, NUMA-friendly, easy capacity planning	Connection reuse across workers is impossible without shared memory tricks	Connection cost compounds: 4 workers with separate upstream connection pools can mean up to 4x the TLS handshake load on origin.	This is the specific reason Cloudflare built Pingora. At 1T requests/day, per-worker isolation became dominant overhead.
F5 corporate stewardship over community-led project	Commercial support, FIPS builds, dedicated security team, integration with F5 BIG-IP	Strategic direction outside community control (Freenginx fork, Angie fork)	Critical CVE released against a feature you depend on, and disclosure timing was a corporate decision rather than community-aligned.	For 2026+ greenfield projects, this is a real signal. The retirement of `ingress-nginx` in March 2026 is the canary. Anyone deploying Nginx in k8s should be planning a migration path.
Battle-tested over feature-current	15+ years in production, every edge case documented somewhere on Stack Overflow	HTTP/3 support arrived late and is still flagged experimental in many builds; gRPC support is functional but unloved	Greenfield team wants HTTP/3 + gRPC streaming + WebSocket multiplex in one proxy. Nginx is the wrong starting point.	"Battle-tested" cuts both ways at PE level. The code is stable but conservative. Innovation has migrated to Envoy and Pingora.
Single binary over framework	Configurable as a web server, reverse proxy, mail proxy, TCP load balancer, all from one binary	Jack-of-all-trades architecture leaves performance gaps versus specialized proxies	You need pure L4 load balancing at line rate. HAProxy or kernel-bypass options beat Nginx by 30-40%.	The breadth was a 2010-era win when teams had one tool budget. In 2026, separation of concerns (Envoy for mesh, Nginx for web, HAProxy for L4) is the more defensible architecture.
Configuration DSL over declarative API	Powerful pattern matching with map/geo/regex, conditional logic via `if`	Conditional `if` blocks are notoriously broken in location contexts ("if is evil" is official guidance)	New team member writes a seemingly correct `if`-based redirect rule, and traffic silently disappears for certain header combinations.	The Nginx `if` behavior is one of the most-cited examples of a config DSL leaking implementation details to operators. Compare to Envoy's RouteConfiguration which is explicit but verbose.
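The event-loop rows above claim that a CPU-bound handler hurts every other connection on the same worker while the average still looks fine. The sketch below is a deterministic toy model of one run-to-completion worker, not a measurement of Nginx. The arrival rate, the 0.2 ms fast handler, and the 20 ms slow handler are assumed values.

```python
import math


def simulate_worker(arrivals_ms, costs_ms):
    """Single event-loop worker: handlers run to completion in arrival order."""
    latencies = []
    free_at = 0.0  # time at which the worker finishes its current handler
    for arrival, cost in zip(arrivals_ms, costs_ms):
        start = max(arrival, free_at)  # wait if the worker is still busy
        free_at = start + cost
        latencies.append(free_at - arrival)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def workload(n, slow_every=None, fast_ms=0.2, slow_ms=20.0):
    """One request per millisecond; every `slow_every`-th handler is CPU-bound."""
    arrivals = [float(i) for i in range(n)]
    costs = [
        slow_ms if slow_every and i % slow_every == 0 else fast_ms
        for i in range(n)
    ]
    return arrivals, costs


baseline = simulate_worker(*workload(1000))
with_slow = simulate_worker(*workload(1000, slow_every=200))

print("median ms:", percentile(baseline, 50), percentile(with_slow, 50))
print("p99 ms:   ", percentile(baseline, 99), percentile(with_slow, 99))
```

With only one request in 200 being slow, the median is unchanged, but the p99 is set by the requests queued behind the slow handler. That is the shape of the failure described in the "When It Bites You" column.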

HAProxy

Default for pure load balancing workloads where throughput ceiling and sub-ms tail latency matter more than dynamic routing, plugin extensibility, or service mesh integration.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Specialized load balancer over web server	Best-in-class throughput, around 42K RPS per Kubernetes ingress benchmark, around 50% CPU vs 73% for Envoy at the same load	No static file serving, no caching layer (must front it with Varnish or Nginx if needed)	You wanted "one proxy" that also serves static assets. HAProxy is not that. You end up running HAProxy + Nginx + Varnish.	The narrow scope is the reason for the throughput. At PE scale, this is a feature: do one thing, do it at line rate.
Configuration file over dynamic API (with Runtime API as escape hatch)	Predictable, GitOps-friendly, no control plane needed for stable backends	Dynamic endpoint updates require Runtime API socket commands or DPAPI (HAProxy 2.4+)	Service mesh use case with constantly-changing endpoint lists. Possible but feels grafted on, not native.	The Data Plane API gets HAProxy closer to Envoy parity, but the design center is still static config. If you need xDS-grade dynamism, you are fighting the tool.
Multi-threaded over single-threaded with workers	True multithreading (since HAProxy 1.8), better core utilization, shared session table	Thread synchronization overhead under heavy lock contention on shared maps	Sticky session table with millions of entries and high churn. Lock contention becomes visible.	HAProxy's threading is more sophisticated than Nginx's worker isolation but less aggressive than Envoy's per-thread isolation. The middle path is good for L4 but compromises L7 customization.
Mature TCP/HTTP focus over service mesh feature set	Rock-solid L4 (TCP proxy), unbeatable for non-HTTP protocols (MySQL, PostgreSQL, Redis)	No native gRPC streaming optimization, no service mesh sidecar story, limited service discovery integration	Internal microservices comm needs proxying. HAProxy works but Envoy is the architecturally aligned choice.	HAProxy at the edge plus Envoy in the mesh is a common, defensible PE-level topology. Don't try to make HAProxy do mesh work.
Built-in observability (stats page, Prometheus exporter) over distributed tracing	Excellent low-cost metrics out of the box, robust admin socket for live introspection	Distributed tracing integration is grafted on, weaker than Envoy's native OpenTelemetry	You need per-request span context propagation across a 12-hop service path. Envoy gives this natively; HAProxy needs work.	HAProxy 3.2 (May 2025) significantly improved tracing, but the design center is still aggregate metrics, not request-level traces.
ACL-based routing over per-route filter chains	Highly composable, fast to evaluate, easy to read once you internalize the syntax	Complex per-route policy chains require ACL composition that gets unreadable past 5-6 conditions	Auth policy varies by tenant, region, and feature flag. The ACL config grows to thousands of lines and ownership becomes opaque.	For routing logic that complex, you have outgrown HAProxy and should be on a gateway (Kong, Envoy AI Gateway) where policy is a first-class data model.
Open source with commercial HAProxy Enterprise option	Vendor model is rational: enterprise features (WAF, advanced rate limiting, admin UI) cost money, core is fully free	Some operationally-valuable features (multi-cluster sync, dashboard) are paid tier only	You build a free-tier deployment, hit operational friction, and the upgrade path is "buy HAProxy Enterprise" with non-trivial pricing.	Compared to Nginx (F5 acquisition friction) and Envoy (no commercial entity), HAProxy's commercial model is the most predictable. That stability is itself a PE-relevant signal.
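The Runtime API "escape hatch" mentioned in the table is a line-oriented protocol over the stats socket. The sketch below builds two `set server` commands and sends one over a UNIX socket. The backend and server names and the socket path are hypothetical, and it assumes the stats socket is configured with `level admin`.

```python
import socket


def drain_command(backend: str, server: str) -> str:
    """Runtime API command that stops sending new connections to one server."""
    return f"set server {backend}/{server} state drain\n"


def set_addr_command(backend: str, server: str, addr: str, port: int) -> str:
    """Runtime API command that repoints a server slot at a new endpoint."""
    return f"set server {backend}/{server} addr {addr} port {port}\n"


def send_runtime_command(socket_path: str, command: str, timeout: float = 2.0) -> str:
    """Send one command to the HAProxy stats socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HAProxy closes the socket after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


# Hypothetical usage (socket path and names are assumptions):
# send_runtime_command("/var/run/haproxy.sock", drain_command("be_api", "srv1"))
```

This works well for occasional changes. Doing it for every endpoint change in a fast-churning fleet is where the "grafted on" feeling comes from, since each update is an imperative command rather than a subscribed resource.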

Envoy

Default for service meshes and AI inference gateways where xDS dynamic config, filter chain extensibility, and native OpenTelemetry justify the operational cost of running a control plane.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
xDS dynamic config over static files	Zero-restart endpoint, route, listener, and secret updates from a control plane. The defining feature.	You need a control plane (Istio, Consul, Envoy Gateway, custom). Significant operational surface.	Greenfield team picks Envoy without realizing they also need to operate a control plane. Most of the cost is there, not in Envoy itself.	Incremental xDS (Delta) is now default in Istio 1.22+. If you are on classic SotW (State of the World), config push storms on large meshes are still a real failure mode.
C++ over Rust or C	Mature C++17, well-trodden ground for high-performance networking, large contributor pool	Memory-safety class of CVEs still possible, harder to onboard contributors versus Go or Rust	Custom filter authored in C++ has a use-after-free bug, hits production after fuzz testing missed the edge case.	The shift from C++ to Rust at the proxy layer is a multi-year trend (Pingora, linkerd2-proxy). Envoy is unlikely to follow but the gap will matter for security-conscious orgs.
Filter chain extensibility over single-purpose proxy	WASM, Lua, ext_proc (gRPC), ext_authz, native C++ filters: any logic at any point	WASM filters have non-trivial perf cost (often 20-40% latency add), ext_proc adds a network hop	Auth via ext_proc adds 5-15ms p99 per request. At 50 RPS per user, that's a noticeable user-facing slowdown.	ext_proc is the right choice for stateful auth where logic changes weekly. WASM is the right choice for stable, perf-sensitive logic. Native C++ filters are the right choice when both apply and you can staff the maintenance.
Per-thread isolation over shared worker model	No locks on hot paths, predictable tail latency, easy to reason about thread pinning	Connection pool fragmentation across threads, more memory overhead than Nginx	Keepalive-heavy workload. Each thread maintains its own pool, so total memory usage scales with thread count.	Connection pool sharing is one of the specific pain points Pingora was designed around. Envoy exposes pool tuning options (such as `connection_pool_per_downstream_connection`), but its pools stay per worker thread, which is not as elegant as Pingora's lock-free hot pool.
Native observability over add-on metrics	Built-in OpenTelemetry, statsd, Prometheus, admin endpoint with config_dump for diagnostics	High cardinality stats can balloon memory; admin endpoint is a security risk if exposed	Production Envoy with thousands of clusters and per-route stats hits 4-6 GB memory just for the stats subsystem.	Use a `stats_matcher` inclusion list in production rather than the default of emitting every stat. Most teams find out about stats cardinality cost only after the OOM.
Service mesh DNA over standalone proxy	Cleanest fit for mTLS-everywhere, identity-based routing, traffic shifting, fault injection	Standalone use feels heavyweight; you carry mesh complexity even if you only need an LB	Single-team need for a basic reverse proxy ends up running an Envoy + control plane stack worth a quarter of engineering time.	For single-LB use cases, Envoy is overkill. For multi-team microservices with identity, traffic policies, and observability requirements, nothing else competes.
HTTP/2, HTTP/3, gRPC first-class over HTTP/1.1 first	Modern protocols natively, gRPC streaming, HTTP/3 QUIC stable	Some HTTP/1.1 edge cases (chunked encoding, trailer handling) get less attention	Legacy HTTP/1.1 client with non-RFC-compliant headers (common in mobile SDKs and old CDN paths) gets rejected or misparsed.	Cloudflare specifically called this out as a reason for Pingora: "bizarre and non-RFC compliant HTTP traffic." If your traffic includes a lot of weird clients, Envoy is stricter than Nginx, which is stricter than Pingora.
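The "config push storm" remark in the xDS row comes down to simple arithmetic. In State-of-the-World mode a Cluster (CDS) update carries the full cluster set, while Delta xDS carries only what changed. The sketch below is a back-of-the-envelope model with assumed mesh sizes. It counts resources sent and ignores message size and control-plane batching.

```python
def sotw_cds_resources_sent(total_clusters: int, updates: int, proxies: int) -> int:
    """State-of-the-World: each cluster update resends the full cluster set."""
    return total_clusters * updates * proxies


def delta_cds_resources_sent(changed_per_update: int, updates: int, proxies: int) -> int:
    """Delta xDS: each update carries only the clusters that changed."""
    return changed_per_update * updates * proxies


# Assumed mesh: 2,000 clusters, 500 sidecars, 10 single-cluster changes.
sotw = sotw_cds_resources_sent(total_clusters=2000, updates=10, proxies=500)
delta = delta_cds_resources_sent(changed_per_update=1, updates=10, proxies=500)
print(f"SotW resources pushed:  {sotw:,}")
print(f"Delta resources pushed: {delta:,}")
```

The gap grows with the total number of clusters, which is why large meshes feel the difference first.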

Pingora

Default for teams building a custom proxy product in Rust, where the lock-free cross-thread connection pool and memory safety by construction justify the absence of declarative config.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Library/framework over binary product	You compose the proxy you want from Rust callbacks; total control over request lifecycle	No drop-in `nginx.conf` equivalent; every deployment is a custom Rust application	Team wants a quick reverse proxy with a YAML config. Pingora is the wrong choice for that use case.	Pingora competes with libraries (Tower, hyper) more than with binary proxies. The right comparison for "deploy a proxy" is Nginx/Envoy/HAProxy. The right comparison for "build a custom proxy product" is Pingora.
Rust memory safety over C/C++ performance ceiling	Eliminates buffer overflow, use-after-free, dangling pointer CVE classes by construction	Steeper learning curve, smaller hiring pool versus C++ or Go	Team needs to extend the proxy and you discover Rust async (Tokio runtime, lifetime juggling) is a real cliff for new engineers.	The hiring constraint is real. If you don't have at least 2 senior Rust engineers, picking Pingora means paying ramp cost in addition to development cost.
Async multithreaded over per-process workers	Cloudflare reports about 70% less CPU and 67% less memory versus their Nginx baseline at trillion-request-per-day scale	Tokio async-Rust has its own learning curve and debugging surface (stack traces are async-aware but uglier)	Race condition in shared state shows up only at high concurrency. Reproducing in test is hard.	The async-Rust ecosystem matured significantly 2024-2026 (tokio-console, tracing). It's no longer the "experimental" tier it was in 2022.
Lock-free connection pool shared across threads over per-worker pools	Massively higher connection reuse, fewer TLS handshakes to origin, lower tail latency on warm paths	Implementation complexity is real; lock-free data structures are a source of subtle correctness bugs	Custom transport layer extension introduces a race that triggers only under specific connection-recycling patterns.	This is the specific design choice that makes Pingora superior to Nginx at Cloudflare scale. Lower-scale orgs may not see the same gains; the architecture pays off most where origin TLS cost is dominant.
Programmable callbacks over declarative config	Total flexibility: every phase (request, upstream selection, response, logging) is a Rust function you own	No GUI, no YAML, no "ops team configures it" model. Every change is a code change and a deploy.	Non-engineer ops team needs to add a redirect rule. With Nginx they'd edit a file; with Pingora they file a ticket.	For the "platform team builds proxy, app teams consume" model this is a feature, not a bug. For the "self-service via config" model, it's wrong.
Cloudflare-led roadmap over CNCF or community governance	Battle-tested at Cloudflare's 40M+ HTTP requests per second; bugs found in real traffic, fixed fast	Pre-1.0 API stability not guaranteed; roadmap optimizes for Cloudflare's needs first	You bet on a specific Pingora API, then Cloudflare changes the trait signature in v0.8 and your downstream code breaks.	ISRG (Internet Security Research Group) is funding adoption and the River project to provide a higher-level abstraction. The ecosystem is maturing but it is not yet ready for "anyone can adopt" status.
HTTP/1, HTTP/2, gRPC, WebSocket support over universal protocol	Covers the proxy protocols 95% of services need	HTTP/3 / QUIC was on the 2025 roadmap, status varies by build	Team needs production HTTP/3 today; check current Pingora support before committing.	If HTTP/3 is mandatory, Envoy and Nginx Plus are ahead. If you can wait 6-12 months, Pingora is catching up fast.
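The connection-pool rows above can be illustrated with a toy count of origin handshakes under per-worker pools versus one pool shared by all threads. This is a simplified model, not Pingora's implementation: requests are sequential, idle connections never expire, and the worker and origin counts are assumptions.

```python
def count_handshakes(requests, shared_pool: bool) -> int:
    """Count new origin connections for a sequence of (worker, origin) requests."""
    warm = set()  # pools that already hold an idle connection
    handshakes = 0
    for worker, origin in requests:
        # A shared pool is keyed by origin only; isolated pools by (worker, origin).
        key = origin if shared_pool else (worker, origin)
        if key not in warm:
            handshakes += 1  # nothing to reuse: new TCP + TLS handshake
            warm.add(key)
    return handshakes


WORKERS, ORIGINS, REQUESTS = 4, 25, 1000
# Round-robin spread so that every worker eventually talks to every origin.
traffic = [(i % WORKERS, i % ORIGINS) for i in range(REQUESTS)]

per_worker = count_handshakes(traffic, shared_pool=False)
shared = count_handshakes(traffic, shared_pool=True)
print("per-worker pools:", per_worker, "handshakes")
print("shared pool:     ", shared, "handshakes")
```

In this model the isolated design pays one handshake per worker per origin, so the cost scales with worker count. Real traffic adds idle timeouts and concurrency, which is where the payoff depends on how dominant origin TLS cost is.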

Kong

Default for REST API gateways where auth (OIDC, JWT), rate limiting, request transformations, and AI routing are needed from a plugin marketplace rather than built from scratch.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
API gateway abstraction over raw proxy	First-class concepts (Service, Route, Consumer, Plugin) match how API teams think about gateways	Heavier than Nginx alone; you carry the gateway abstraction even when you only need a reverse proxy	Single-service deploy uses Kong because "it's the standard," carries database + admin API + control plane overhead unnecessarily.	If the answer to "what does this proxy do?" is "route traffic," skip Kong. If it's "auth, rate-limit, transform, observe, then route," Kong starts to pay off.
OpenResty / Lua plugin model over native or WASM	Plugins are quick to write, hot-reload friendly, accessible to non-systems engineers	Lua plugins share Nginx's event-loop ceiling; CPU-bound plugins (regex, JWT verify, JSON transform) starve other requests	Heavy auth plugin under traffic spike, p99 latency spikes for the entire gateway, not just authed requests.	Plugin ordering matters; light filtering plugins (ACL, IP restriction) should run before heavy plugins. This is operator knowledge that doesn't show up in docs prominently enough.
DB-backed config (Postgres) over DB-less mode	Multi-node config sharing, admin API works across cluster, declarative GitOps via decK	Postgres becomes a critical dependency; outage of Postgres can degrade Kong control plane (data plane usually stays up)	Postgres connection pool exhausted under config-change storm during a deploy, admin API stops responding.	DB-less mode (config from YAML/JSON, no Postgres) is the right default for new deployments. DB-backed mode is legacy and adds operational surface most teams don't need.
Open-source core with Kong Enterprise paid features	Free tier is functional for most use cases (basic auth, rate limit, transformations)	The plugins most teams reach for at scale (OIDC, advanced rate limiting, plugin ordering, FIPS) are enterprise-only	Team builds on free tier, hits an OIDC requirement, has to either reimplement or upgrade. Upgrade cost is significant.	Kong's commercial model is more aggressive than HAProxy's. Plan for the enterprise upgrade decision as part of adoption, not as a surprise later.
L7 HTTP/REST/WebSocket focus over universal L4 + L7	Optimized for the dominant use case (REST API gateway)	Native gRPC, TCP, UDP support is limited; for pure L4 work, HAProxy or Envoy is better	Team needs gRPC streaming with bidirectional flow and full HTTP/2 features. Kong gets you 70% there, the rest is a struggle.	For full gRPC service mesh, Envoy is the right tool. Kong is for REST APIs and is unapologetic about it.
Plugin marketplace over build-everything-yourself	300+ plugins covering common needs (JWT, OAuth2, OIDC, Datadog, transformations, Kafka, AI plugins)	Plugins vary in quality; most-needed plugins (advanced rate limit, OIDC) are enterprise; some community plugins are unmaintained	Team picks a community plugin, it has a memory leak, no maintainer to fix it.	Evaluate plugins like dependencies: maintained, tested at your scale, security-reviewed. The "300+ plugins" headline number hides quality variance.
AI Gateway features added 2024-2026 over pure REST gateway	Multi-LLM routing, semantic caching, prompt guards, AI-specific rate limiting now in Kong	Envoy AI Gateway is also evolving fast, with deeper integration into the Envoy ext_proc model	Bet on Kong AI plugins, then Envoy AI Gateway ships features Kong doesn't have for 12 months.	In 2026, both products are racing for the AI gateway segment. Pick based on existing infrastructure (Envoy mesh → Envoy AI Gateway; Kong-centric API platform → Kong AI plugins).
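The plugin-ordering nuance in the table (cheap filters before heavy plugins) can be sketched as a short-circuiting chain. This is a generic model rather than Kong's plugin runtime, where execution order is governed by plugin priorities. The plugin names, per-call costs, and traffic mix are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Plugin:
    name: str
    cost_us: int                   # assumed CPU cost per invocation, microseconds
    allow: Callable[[dict], bool]  # returns False to reject the request


def run_chain(plugins, requests) -> int:
    """Run plugins in order, stopping at the first rejection; return total CPU cost."""
    total_us = 0
    for request in requests:
        for plugin in plugins:
            total_us += plugin.cost_us
            if not plugin.allow(request):
                break  # rejected: later plugins never run
    return total_us


BLOCKED = {"10.0.0.1"}
ip_restriction = Plugin("ip-restriction", 5, lambda r: r["ip"] not in BLOCKED)
jwt_verify = Plugin("jwt", 300, lambda r: True)

# 40 of 100 requests come from a blocked address.
requests = [{"ip": "10.0.0.1" if i < 40 else "192.0.2.10"} for i in range(100)]

cheap_first = run_chain([ip_restriction, jwt_verify], requests)
heavy_first = run_chain([jwt_verify, ip_restriction], requests)
print("cheap filter first:", cheap_first, "us")
print("heavy plugin first:", heavy_first, "us")
```

The saving is proportional to the share of traffic the cheap filter rejects, so ordering matters most during abusive spikes, which is exactly when the event loop has the least headroom.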

2. Use Cases (per technology)

Driving Property is the single attribute that ruled out alternatives. Why Not Alternative is the specific reason another reasonable choice was rejected.

Nginx

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Static site + reverse proxy on a single VM	Mid-size SaaS shipping a marketing site plus API	One binary serves static, TLS-terminates, and proxies to backend	5-20K RPS per node, single instance	HAProxy doesn't serve static; Envoy needs a control plane; Kong is overkill
Ingress controller for legacy Kubernetes deployments	Hundreds of teams on the soon-retiring `ingress-nginx` project (community version retires March 2026)	Familiar Ingress resource model, mature ecosystem	10K-50K services across thousands of clusters globally	HAProxy ingress is faster but less ubiquitous; F5 NGINX Ingress is the migration target now that community version is retiring
TLS termination + caching layer in front of origin	Mid-size media site with CDN miss path	Built-in HTTP cache, microcaching for hot paths, low memory	30K RPS sustained, 200 GB local cache	Varnish has a stronger cache model but a separate config language; Nginx wins on operational simplicity
Lua-extended platform layer	Internal platform team extending Nginx via OpenResty for custom routing	Lua scripting at every phase, runtime modification of behavior	50K RPS with custom logic on every request	Envoy WASM has higher overhead per call; Kong adds gateway abstractions you don't need
Multi-tenant SaaS L7 routing by Host header	SaaS with thousands of customer subdomains routed to backends	`map` directive scales to millions of entries with predictable lookup cost	20K+ tenant domains, single-digit ms p99 routing decisions	HAProxy's ACL-based routing is comparable but less ergonomic; Envoy needs runtime config sync

HAProxy

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
L4 database load balancer (MySQL, PostgreSQL, Redis)	Fintech routing reads across replicas, writes to primary	TCP proxy at line rate with health-check awareness	50K connections, sub-ms proxy overhead	Nginx stream module works but lacks the same health-check granularity; Envoy's L4 story is functional but secondary
High-throughput Kubernetes ingress	Workloads where ingress is a bottleneck	42K RPS per benchmark, lowest p99 across competitors	10K+ pods, hundreds of services per cluster	Nginx ingress retiring March 2026; Envoy uses more CPU at the same throughput
Active-passive load balancer for legacy enterprise apps	Banks, telcos with stateful TCP services in front of WebSphere or similar	Mature health checks, session persistence, predictable failover	Hundreds of backends per LB, multi-decade uptime expectations	F5 hardware LB costs more; Nginx requires Plus tier for advanced health checks
Edge LB for trading platforms	Low-latency financial venues (for example, Deutsche Bank-style deployments)	Lowest tail latency under heavy concurrency, predictable jitter	Sub-ms p99 at 50K+ concurrent connections	Envoy's filter chain adds latency; Nginx tail latency degrades faster under stress
SSL/TLS terminator front end with WAF integration	Compliance-heavy industry (PCI, HIPAA) needing rigorous TLS termination	FIPS-validated builds (HAProxy Enterprise), tight observability of TLS handshake metrics	5-20K TLS handshakes/sec with full mTLS	Nginx Plus is comparable but more expensive; OpenResty-based Kong adds Lua overhead on the TLS hot path

Envoy

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Service mesh data plane	Istio, Consul Service Mesh, AWS App Mesh users	xDS dynamic config, mTLS, identity-based routing, traffic shifting	10K+ sidecar instances per mesh, sub-second config propagation	Linkerd2-proxy is leaner but less feature-rich; Nginx mesh products are second-tier
API gateway with dynamic routing	Cloud-native companies (Lyft was the origin), platform teams building self-service APIs	Filter chain extensibility (WASM, ext_proc), HTTP/3, gRPC streaming	100K RPS, hundreds of upstream clusters, route changes per minute	Kong is more abstraction-heavy but easier to onboard; HAProxy lacks dynamic config story
AI inference gateway	Teams routing across LLM providers (OpenAI, Anthropic, internal models)	Per-model rate limiting, token-aware quotas, ExtProc for prompt processing, semantic routing	Tens of thousands of LLM requests/sec, multi-second response streaming	Kong AI plugins are catching up; native ext_proc integration in Envoy AI Gateway is currently ahead
Multi-region edge gateway	Global SaaS routing to nearest region with health-aware failover	Locality-aware load balancing, outlier detection, circuit breakers built-in	50+ regions, p99 routing decision under 1ms	HAProxy lacks built-in locality awareness; Nginx requires plugins/scripting
Observability-heavy enterprise gateway	SREs needing per-request tracing with OpenTelemetry to Tempo or Jaeger	Native OpenTelemetry exporters, distributed tracing built into the request lifecycle	10K services with full trace propagation, 1B+ spans/day	HAProxy tracing is newer and less feature-complete; Nginx tracing requires plugins

Pingora

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Hyperscale edge proxy	Cloudflare's own edge network	Lock-free connection pool across threads, about 70% less CPU and 67% less memory vs Nginx	40M+ HTTP req/sec globally, 1T+ requests/day	Nginx hit per-worker isolation ceiling; Envoy is more memory-heavy at the same throughput
Custom proxy product	CDN companies, security vendors building proxy-as-a-product	Rust library/framework model gives full control over request lifecycle	Varies by product; typically 10K-1M+ RPS per node	Building on Nginx or Envoy means fighting their config models; Pingora is designed to be embedded
Memory-safety-critical edge	Security vendors, government infrastructure, compliance-heavy industries	Rust eliminates entire classes of memory-safety CVEs by construction	Comparable to Nginx workloads with stronger CVE posture	C-based proxies carry residual memory-safety risk; auditing alternatives (BoringSSL, sandboxing) add complexity
Programmable per-request logic at scale	Internal platforms where every request hits custom routing, auth, transformation logic	Rust callbacks at every phase with zero overhead versus dynamic config	50K-500K RPS with custom logic on every request	OpenResty/Lua hits event-loop ceiling; Envoy WASM has per-call overhead
Ultra-low-latency financial proxy	Trading platforms, market data distribution	No GC pauses, predictable Rust async runtime, lock-free hot paths	Sub-100us p99 proxy overhead at 100K+ msg/sec	Go-based proxies (Traefik) have GC pauses; C++ Envoy has unpredictable tail latency from filter chains

Kong

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Multi-tenant REST API gateway	SaaS exposing APIs to external developers with per-tenant rate limits, OAuth, observability	Native Consumer, Service, Route, Plugin abstractions match the problem domain	Hundreds to thousands of consumers, hundreds of services	Envoy requires building the abstractions yourself; raw Nginx requires Lua scripting for everything
Internal developer platform gateway	Platform team offering self-service API publishing for internal teams	Admin API + decK GitOps + plugin marketplace covers common needs out of the box	50-500 internal services, ops team of 2-5 engineers	Envoy AI Gateway is the modern alternative but newer; Nginx + custom tooling is more work
OIDC / OAuth2 termination point	Enterprise integrating with Okta, Auth0, Azure AD for API access control	Kong Enterprise OIDC plugin handles full OAuth2 flow including refresh, introspection	10K+ tokens/sec verified, sub-10ms auth overhead	Envoy ext_authz needs an external auth service; Nginx + Lua requires custom OIDC implementation
Multi-LLM AI gateway (2024-2026 use case)	Teams routing across OpenAI, Anthropic, Bedrock, on-prem LLMs with cost controls	AI plugins handle prompt guards, semantic cache, multi-LLM routing, token-aware billing	1K-10K LLM requests/sec, multi-provider failover	Envoy AI Gateway is the direct competitor; choice depends on existing infrastructure
Hybrid mode: control plane in central cluster, data plane at edge	Enterprise with global presence wanting central policy + edge data plane	Kong hybrid mode separates CP and DP, config flows via mTLS	Dozens of edge clusters, central control plane	Envoy + custom control plane is more work; Nginx lacks hybrid mode entirely

3. Limitations (multi-tech matrix)

Each cell starts with a severity tag (High or Medium), followed by how that limitation manifests in that tech. Scan the tags for blockers in your context.

Limitation Category	Nginx	HAProxy	Envoy	Pingora	Kong
Dynamic config without restart	High OSS requires reload; Nginx Plus has API	Medium Runtime API socket + DPAPI works but feels grafted	Medium xDS native but requires control plane	Medium Programmable but code-level changes need redeploy	Medium Admin API native, hybrid mode adds complexity
Memory safety	High C-based, structural CVE class risk	High C-based, same residual risk	High C++17, better than C but not memory-safe	Medium Rust eliminates the class; `unsafe` blocks still possible	High C (Nginx) + Lua; carries Nginx's memory risk
HTTP/3 / QUIC maturity	Medium Stable but late to land	Medium HAProxy 3.0+ has HTTP/3	Medium Production-grade since 2023	Medium On 2025 roadmap, status varies by build	Medium Inherits Nginx's HTTP/3 maturity
Service mesh integration	High Not designed for mesh; F5 mesh products exist but second-tier	High Not a mesh data plane	Medium Native mesh data plane (Istio, Consul)	Medium Possible but requires custom integration	Medium Kong Mesh exists (built on Kuma/Envoy)
Filter / plugin extensibility	Medium OpenResty/Lua mature but event-loop bound	Medium Lua + SPOE + native filters; less rich than Envoy	Medium WASM, ext_proc, ext_authz, Lua, native C++	Medium Rust callbacks; full control but no plugin ecosystem	Medium 300+ plugins, mix of OSS and enterprise
HTTP/1.1 lenient parsing for legacy clients	Medium Reasonably lenient, decades of accumulated tolerance	Medium Strict by default, configurable	High Strict RFC compliance can reject legacy clients	Medium Designed to handle Cloudflare's "bizarre" HTTP traffic; relatively lenient	Medium Inherits Nginx's leniency
Operational governance	High F5 ownership, community fork pressure (Freenginx, Angie), ingress-nginx retired March 2026	Medium HAProxy Technologies stable, predictable	Medium CNCF graduated, broad governance	Medium Cloudflare-led, Apache 2.0; pre-1.0 API stability not guaranteed	Medium Kong Inc. led; open core with significant enterprise lock-in
Per-instance memory at 50K+ idle conns	Medium Around 2-4 KB per conn, low	Medium Comparable to Nginx	High Higher per-conn overhead, around 4-8 KB plus thread overhead	Medium Best-in-class for high-concurrency; Cloudflare's whole point	Medium Nginx base + Lua VM overhead per worker
Learning curve for new engineers	Medium Familiar to most ops engineers; `if` gotchas catch newcomers	Medium ACL syntax has a learning curve but well-documented	High xDS, YAML structure, filter chains, control plane: steep	High Rust async + Tokio + Pingora traits; very steep	Medium Plugin model is approachable; Lua development is niche
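
The per-connection figures in the memory row translate into instance sizing with simple arithmetic. A rough sketch using only the ranges quoted in the table; the function name is illustrative, and the estimate ignores buffers, TLS session state, and per-thread overhead:

```python
def idle_conn_memory_mib(connections, kb_low, kb_high):
    """Return a (low, high) estimate in MiB for idle-connection memory."""
    low = connections * kb_low / 1024
    high = connections * kb_high / 1024
    return low, high

# 50K idle connections at the ranges quoted in the matrix
nginx_like = idle_conn_memory_mib(50_000, 2, 4)   # (97.65625, 195.3125)
envoy_like = idle_conn_memory_mib(50_000, 4, 8)   # (195.3125, 390.625)
print(nginx_like, envoy_like)
```

At this scale the gap is a few hundred MiB per instance, which matters mostly when the proxy runs as a sidecar multiplied across many pods.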

4. Fault Tolerance (multi-tech matrix)

Most cells describe how the proxy behaves under partial failure. RTO (recovery time objective) and RPO (recovery point objective) apply in the HA pair / cluster context.

Dimension	Nginx	HAProxy	Envoy	Pingora	Kong
Process / instance model	Master + workers; worker crash respawned by master	Single process, multi-threaded since 1.8	Single process, multi-threaded with per-thread isolation	Single process, async multi-threaded Rust	Nginx master + workers + Lua VM per worker
Failure detection (upstream)	Passive on request error; active via ngx_http_upstream_check_module or Plus	Active health checks built-in, configurable check intervals, layer 4/7 checks	Active + passive (outlier detection), configurable failure thresholds, statistical eviction	Programmable via Rust callbacks; whatever you implement	Active + passive checks, configurable per upstream
Failover mechanism	Mark upstream down on N failures, retry next in pool	Backup servers, weighted retry, configurable retry-on conditions	Outlier detection ejects host; circuit breaker prevents cascading failures	Application-defined via load balancer callback	Healthchecks + ring balancer; failed nodes excluded automatically
RTO (upstream failure)	1-3s with health checks tuned; 5-10s with defaults	Sub-second with aggressive health check tuning	Sub-second with outlier detection	Depends on implementation, can be sub-second	1-3s typical
RPO (data loss on proxy failure)	In-flight requests lost; clients retry	In-flight requests lost; client retry expected	Same; ext_proc state requires external store	Same; design choice for connection-affinity state	In-flight requests lost; rate-limit counters may lose precision
HA pair / cluster topology	Typically deployed behind keepalived (VRRP) or cloud LB; no native clustering	keepalived VRRP common; HAProxy 2.4+ has limited peer sync	Stateless; rely on external LB or DNS for HA	Stateless; you build the HA topology	Multi-node cluster with shared Postgres or DB-less + config push
Split-brain behavior	N/A — single instance per host; cluster is external	VRRP-based; can split-brain if network partition isolates LBs	Control plane (Istio) handles consistency; data plane is stateless	Application-defined; same constraints as a custom service	Postgres-backed: standard DB consistency; DB-less: last-write-wins per node until sync
Blast radius of single-instance failure	All requests on that instance fail; cloud LB or VRRP shifts traffic	Same; classic VRRP failover (1-3s)	Same; control plane reroutes via xDS	Same; depends on deployment topology	Same; cluster reroutes via shared config
Cross-region failover	External: GeoDNS, anycast, or upstream LB handles it	External: same as Nginx; no native multi-region	Native locality-weighted LB, priority-failover within a single cluster	Application-defined; can implement any policy	Hybrid mode supports multi-region control + local data planes
Data loss scenarios	Stats/access logs lost on crash before flush	Same; configurable buffer flush intervals	Same; admin endpoint stats lost on crash	Same; depends on logger implementation	Postgres-backed config persists; rate-limit counters in-memory may diverge briefly
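
The "mark upstream down on N failures, retry next in pool" behavior in the failover row can be sketched as passive failure detection over a sliding window. This is a minimal illustration, not any proxy's actual implementation; `Pool`, `max_fails`, and `fail_timeout` are hypothetical names chosen to echo the table's wording:

```python
from dataclasses import dataclass, field

@dataclass
class Upstream:
    name: str
    fail_times: list = field(default_factory=list)
    down_until: float = 0.0

class Pool:
    """Mark a peer down after max_fails errors within fail_timeout seconds."""

    def __init__(self, names, max_fails=3, fail_timeout=10.0):
        self.peers = [Upstream(n) for n in names]
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self._next = 0

    def pick(self, now):
        """Round-robin over peers that are not currently marked down."""
        for _ in range(len(self.peers)):
            peer = self.peers[self._next % len(self.peers)]
            self._next += 1
            if now >= peer.down_until:
                return peer
        return None  # every peer is ejected

    def report_failure(self, peer, now):
        # Keep only failures that fall inside the sliding window.
        peer.fail_times = [t for t in peer.fail_times if now - t < self.fail_timeout]
        peer.fail_times.append(now)
        if len(peer.fail_times) >= self.max_fails:
            peer.down_until = now + self.fail_timeout
            peer.fail_times.clear()
```

Because detection depends on real requests failing, the first `max_fails` clients absorb the errors. That is why the table pairs passive detection with retries or active checks.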

5. Sharding / Horizontal Scale (multi-tech matrix)

For proxies, "sharding" means how the deployment scales horizontally (multiple instances, traffic distribution, config partitioning). Some rows are reinterpreted from the canonical database sharding axes.

Dimension	Nginx	HAProxy	Envoy	Pingora	Kong
Horizontal scaling model	Stateless replicas behind external LB (cloud LB, DNS round-robin, anycast)	Same; VRRP pairs or N+1 behind L4 LB	Stateless data plane replicas; control plane handles config	Stateless replicas; you build deployment topology	Cluster of stateless data planes; shared Postgres or DB-less config sync
Traffic distribution / sharding key	External: usually IP-hash, round-robin via cloud LB	Source IP hash, header hash, URL hash, leastconn — built-in algorithms	Configurable: ring hash, maglev, round-robin, least-request, locality-aware	Programmable; whatever load balancing logic you implement	Ring balancer with consistent hashing, weighted round-robin
Config partitioning across instances	All instances run identical config; partition by deployment	Same; identical config across cluster	xDS can push partitioned config to subsets of instances	Application-defined; can shard config by region or tenant	Workspaces (enterprise) allow logical partitioning of config
Adding / removing instances	Manual or autoscaling group with health-checked rollout	Same; ASG + health checks	Stateless; ASG-friendly; control plane handles config distribution	Same; stateless instances scale horizontally	Add to cluster, config syncs via DB or hybrid push
Hot upstream / hot path behavior	Per-worker isolation means one hot upstream can saturate one worker; others stay healthy	Threading means hot path is shared; lock contention possible on hot maps	Per-thread isolation; hot upstream affects only threads serving it	Lock-free hot paths; designed to share connections across threads efficiently	Lua VM per worker means hot plugin can saturate a worker; others stay healthy
Practical max instances per cluster	No practical limit; cloud LBs manage thousands	Same; thousands routinely	Istio supports 10K+ sidecars; xDS push storms become a concern past that	Depends on deployment; Cloudflare runs at planetary scale	Postgres-backed Kong: hundreds of nodes; DB-less or hybrid: thousands
Resharding / config change without downtime	Reload spawns new workers; old drain; brief memory spike	Hot reload supported; seamless config swap	xDS pushes config without restart; truly hot	Code change requires redeploy; runtime config can hot-swap	Admin API changes propagate; data plane no restart needed
Cross-shard / cross-instance state	External: Redis/memcached for rate-limit counters, session affinity	peers section for stick-table sync (limited); external for serious state	External: global rate limit service (gRPC) via the rate limit filter, distributed cache for shared state	Application-defined; you bring your own state store	Redis-backed plugins (rate limit, sessions); Postgres for config
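
The ring-hash and consistent-hashing entries in the traffic distribution row share one property worth seeing concretely: removing an instance remaps only the keys that instance owned. A minimal sketch, assuming MD5-derived ring points and 100 virtual nodes per instance (both arbitrary choices for illustration, not what any of these proxies uses internally):

```python
import bisect
import hashlib

def _point(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_point(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["proxy-a", "proxy-b", "proxy-c"])
owner_before = ring.lookup("user-42")
ring.remove("proxy-b")
owner_after = ring.lookup("user-42")  # unchanged unless proxy-b owned the key
```

Plain modulo hashing would reshuffle most keys on the same change, which is what makes ring-style hashing attractive for cache-affine or session-affine upstreams.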

6. State & Config Replication (multi-tech matrix)

For proxies, "replication" maps to two things: (1) configuration replication across data plane instances, (2) runtime state replication (stick tables, rate limit counters, session data).

Dimension	Nginx	HAProxy	Envoy	Pingora	Kong
Config replication topology	File-based; GitOps (Ansible, ConfigMap) pushes to all instances	Same file-based; orchestration handles distribution	xDS control plane fans out to data planes; gRPC streaming	Application choice; you build the distribution mechanism	Postgres (DB mode) or hybrid mode (CP pushes to DPs via mTLS)
Sync vs async config propagation	Async via reload; some seconds between push and effect	Same; or socket-based runtime API for sync updates	Near-real-time async via xDS streaming; SotW or Delta xDS	Async via redeploy or hot reload	DB-backed: 5-10s polling; hybrid: streaming push over mTLS WebSocket, near-real-time
Replication factor (config copies)	One per instance; no shared source of truth in OSS Nginx	Same as Nginx	One control plane state, fanned out to N data planes	Application-defined	Postgres = one source of truth; replicated to all DPs
Consistency level (config across instances)	Eventually consistent (depends on rollout speed)	Same	Eventually consistent within seconds; configurable via ADS for strong ordering	Application choice	Eventually consistent; DB mode has stronger ordering than hybrid push
Config propagation lag (typical)	Seconds to minutes (deployment pipeline-bound)	Same	Sub-second via xDS streaming on healthy mesh	Depends on implementation	5-10s DB-backed; sub-second hybrid mode
Runtime state replication (rate limit, sessions)	External: Redis, memcached	peers / stick-tables for limited sync; external for serious workloads	External: ratelimit service, Redis	External: you build it	Redis-backed plugins; per-node counters in DB-less mode
Conflict resolution on config divergence	N/A — single source via GitOps; last-deploy-wins	Same	Control plane is source of truth; data plane has no conflict	Application-defined	Postgres-backed: DB version wins; DB-less: file version wins
Multi-region config replication	External: Git mirroring, multi-region ConfigMaps	Same	Cross-region Istio control planes (multi-cluster mesh) supported	Application-defined	Hybrid mode is purpose-built for this (central CP, regional DPs)
Behavior during control plane partition	N/A — file-based, no live control plane	N/A — same	Data plane continues with last-good config; new config blocked	Application-defined; common: last-good behavior	Data plane continues; admin API unavailable; new policy changes blocked
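
The "continues with last-good config" behavior in the control plane partition row reduces to a small invariant: a failed fetch never changes what is being served. A minimal sketch under that assumption; `DataPlaneConfig` and the `fetch` callback are hypothetical and stand in for whatever transport (xDS stream, hybrid-mode push) a real data plane uses:

```python
class DataPlaneConfig:
    """Serve the last-good snapshot while the control plane is unreachable."""

    def __init__(self, initial, version=0):
        self.active = initial
        self.version = version

    def sync(self, fetch):
        """fetch() returns (version, config) or raises ConnectionError."""
        try:
            version, config = fetch()
        except ConnectionError:
            return False  # partition: keep serving, apply nothing new
        if version > self.version:  # ignore stale or duplicate snapshots
            self.active, self.version = config, version
        return True

def partitioned():
    raise ConnectionError("control plane unreachable")

dp = DataPlaneConfig({"routes": ["v1"]}, version=1)
dp.sync(partitioned)                          # traffic keeps flowing on v1
dp.sync(lambda: (2, {"routes": ["v2"]}))      # partition heals, v2 applies
```

The operational consequence matches the table: availability is preserved, but policy changes (including urgent ones such as blocking a route) wait until the partition heals.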

7. Better Usage Patterns (per technology)

PE-grade patterns most teams miss until on-call. This is where the artifact earns its keep.

Nginx

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Avoid `if` inside `location` blocks	Use `if` for conditional redirects and rewrites, then hit Nginx's broken-by-design behavior	Use `map`, `error_page`, or restructure to multiple `location` blocks	Silent traffic loss when the `if` evaluates differently than expected; "if is evil" is official guidance for a reason
Microcaching for hot paths	Disable caching entirely because backend is "dynamic"	Enable 1-5 second microcache for backend responses; even tiny TTLs absorb traffic storms	Order-of-magnitude reduction in backend load during traffic spikes with negligible staleness
Tune worker_connections per `ulimit -n`	Leave the stock value (512 compiled default, 1024 in many packaged configs) and wonder why connections drop at scale	Set `worker_connections` to match system `ulimit -n` minus headroom, set `worker_rlimit_nofile` explicitly	Without this, you cap throughput far below what the box could handle
Separate stats endpoint from public traffic	Expose `/nginx_status` or Prometheus exporter on the same listener as public traffic	Internal listener on a different port, locked to internal IPs; never on the public listener	Avoids information leak and stats endpoint becoming a DDoS amplification surface
Reload-safe upstream changes	Hot-reload during traffic spikes assuming graceful drain	Use Nginx Plus dynamic API or consul-template with quiet windows; avoid reload storms	Reload spawns new workers and doubles memory transiently; on undersized boxes this is OOM territory
Use `proxy_next_upstream` conservatively	Set to retry on all errors and timeouts	Retry only on specific conditions (connection refused, timeout); avoid retrying 5xx by default	Aggressive retries amplify backend outages instead of mitigating them; classic retry storm anti-pattern
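
The microcaching row's claim (tiny TTLs absorb traffic storms) is easiest to see with a toy TTL cache in front of a counted backend. This is a sketch of the idea only, not of Nginx's cache; `MicroCache` is a hypothetical name, and the injectable clock exists purely so the behavior can be simulated without sleeping:

```python
import time

class MicroCache:
    """Cache each key for a very short TTL so bursts collapse into one fetch."""

    def __init__(self, ttl_seconds=1.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]  # still fresh: backend is not touched
        value = fetch()
        self._store[key] = (now + self.ttl, value)
        return value

# Simulate 1000 requests spread over roughly 3 seconds against a 1s microcache.
fake_now = [0.0]
backend_calls = [0]

def backend():
    backend_calls[0] += 1
    return "rendered page"

cache = MicroCache(ttl_seconds=1.0, clock=lambda: fake_now[0])
for i in range(1000):
    fake_now[0] = i * 0.003
    cache.get("/home", backend)
print(backend_calls[0])  # 3 backend fetches for 1000 requests
```

The sketch is single-threaded; under real concurrency several requests can miss at once when an entry expires, which is the stampede that Nginx's `proxy_cache_lock` directive is meant to collapse.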

HAProxy

Default for pure load balancing workloads where throughput ceiling and sub-ms tail latency matter more than dynamic routing, plugin extensibility, or service mesh integration.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Use Runtime API instead of reload for dynamic changes	Reload on every config change, lose in-flight connections	Use stats socket or Data Plane API to add/drain servers without reload	Reload is graceful but not free; for high-churn topologies, Runtime API is the right tool
Aggressive health-check tuning for failover	Leave default 2s intervals, accept slow failover	Sub-second intervals (200-500ms) with fall/rise thresholds tuned per backend characteristics	Failover time is dominated by detection time; tuning this is the highest-ROI HAProxy optimization
Stick-tables only for what truly needs them	Use stick-tables for analytics, abuse detection, everything	Reserve stick-tables for session persistence; push analytics to external systems	Stick-tables are in-memory and don't replicate well; misuse leads to memory pressure and data loss on restart
Use ACLs for explicit, scannable routing	Long lists of `use_backend` with embedded conditions, hard to audit	Define named ACLs, then route by ACL combinations; treat ACLs as building blocks	Routing logic becomes auditable, testable, and maintainable; reduces "what does this config actually do" cognitive load
Isolate TLS offload with thread pinning at scale	One unpinned instance for both TLS termination and L7 routing	Multi-thread (since 1.8) with thread pinning; or separate HAProxy instances for TLS termination	TLS offload is CPU-bound; isolating it from L7 routing prevents one workload from starving the other
Use observe + on-error for connection-level circuit breaking	Rely only on health checks for failure detection	Add `observe layer7 error-limit N on-error mark-down` per server	Active health checks have detection lag; passive observation catches failures the second they happen
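
The health-check tuning row says failover time is dominated by detection time. The arithmetic behind that can be sketched as below, assuming a fixed check interval (`inter`) and a `fall` threshold of consecutive failed checks, and ignoring check timeouts and any faster interval used during state transitions:

```python
def detection_window_seconds(inter, fall):
    """Best and worst case time from backend failure to mark-down."""
    best = (fall - 1) * inter   # failure lands just before a check fires
    worst = fall * inter        # failure lands just after a check passed
    return best, worst

# 2s interval from the table, with fall=3 (HAProxy's documented default)
print(detection_window_seconds(2.0, 3))    # (4.0, 6.0)
# Tuned: 250ms interval, mark down after 2 failed checks
print(detection_window_seconds(0.25, 2))   # (0.25, 0.5)
```

The tuned case lands inside the sub-second RTO quoted in the fault tolerance matrix. The trade-off is more check traffic per backend and a higher chance of flapping on a briefly slow server, which is what the `rise` threshold is for.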

Envoy

Default for service meshes and AI inference gateways where xDS dynamic config, filter chain extensibility, and native OpenTelemetry justify the operational cost of running a control plane.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Use Incremental (Delta) xDS, not SotW	Stay on default SotW config push for compatibility	Migrate to Delta xDS (default in Istio 1.22+); only changed resources are pushed	Config push storms on large meshes are dramatically reduced; control plane stays responsive under churn
Limit stats cardinality explicitly	Accept default stats sinks with all clusters and routes emitting metrics	Configure `stats_config` with inclusion list of match patterns	Stats subsystem can consume 4-6 GB in large deployments; explicit inclusion lists keep it bounded
Pick the right extension model per use case	Use WASM for everything because it's "the modern way"	Native filter for perf-critical paths, ext_proc for stateful and frequently changing logic, WASM for stable, sandboxed logic that can absorb the per-call overhead	WASM has 20-40% latency overhead per call; using it for hot-path logic is needless cost
Set circuit breakers explicitly per cluster	Leave defaults, which are surprisingly permissive (1024 max connections)	Tune `max_connections`, `max_pending_requests`, `max_retries` per upstream's actual capacity	Circuit breakers prevent cascading failures; defaults will not protect you in a real incident
Outlier detection for fast eviction	Rely on active health checks alone	Configure `outlier_detection` with consecutive_5xx and success_rate thresholds	Active checks have intervals; outlier detection ejects in real-time on observed failures
Lock down the admin endpoint	Expose admin on 0.0.0.0:9901 because tutorials say so	Bind admin to 127.0.0.1 only; expose specific endpoints via secure proxy if needed	Admin endpoint exposes config_dump, stats, and full Envoy state; public exposure is a security incident waiting to happen
Use locality-weighted load balancing	Round-robin across all upstreams regardless of locality	Configure locality_weighted_lb_config with priority-based failover	Cross-AZ or cross-region traffic costs money and adds latency; locality awareness reduces both
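
The Delta versus State-of-the-World row comes down to what a single push carries. A toy comparison, with resources modeled as a plain name-to-spec dictionary (an assumption for illustration; real xDS resources are typed protobuf messages, and the function names here are hypothetical):

```python
def sotw_push(new_state):
    """State-of-the-World: every push carries the full resource set."""
    return dict(new_state)

def delta_push(old_state, new_state):
    """Delta: only added or changed resources, plus names of removed ones."""
    changed = {
        name: spec
        for name, spec in new_state.items()
        if name not in old_state or old_state[name] != spec
    }
    removed = sorted(name for name in old_state if name not in new_state)
    return changed, removed

old = {f"cluster-{i}": {"endpoints": 3} for i in range(1000)}
new = dict(old)
new["cluster-7"] = {"endpoints": 4}      # one cluster scaled up
del new["cluster-9"]                     # one cluster removed
new["cluster-new"] = {"endpoints": 1}    # one cluster added

full = sotw_push(new)
changed, removed = delta_push(old, new)
print(len(full), len(changed), len(removed))  # 1000 2 1
```

With a thousand clusters and three changes, the full-state push resends everything while the delta carries three entries. That ratio is what keeps the control plane responsive under churn.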

Pingora

Default for teams building a custom proxy product in Rust, where the lock-free cross-thread connection pool and memory safety by construction justify the absence of declarative config.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Treat the framework as a library, not a server	Try to find `pingora.conf` and configure it like Nginx	Embrace the Rust callback model; structure your proxy as a service crate with explicit phases	Pingora's value is in the programmability; fighting it for declarative config defeats the purpose
Leverage shared connection pool deliberately	Treat connection pooling as a black box	Tune pool sizing per upstream; monitor pool stats to verify reuse is happening	The lock-free hot pool is Pingora's headline feature; without monitoring, you can't tell if you're getting the value
Use the Service / Server model for multi-tenant proxies	One monolithic ProxyHttp per use case	Multiple Service instances within one Server, each isolated; share crypto and conn pool	Resource efficiency improves; failure isolation between tenants becomes natural
Async-aware error handling and timeouts	Bubble errors up via `?` without phase-aware context	Capture timeouts and errors per phase (upstream_peer, response_filter, etc.) and return structured errors with status codes	Debugging async chains is hard; structured per-phase errors are the difference between solvable incidents and 4am pages
Cache upstream selection decisions	Recompute upstream selection on every request	Use upstream_peer with internal state to cache routing decisions per session	Reduces per-request CPU; matters most for high-RPS workloads with stable routing
Monitor Tokio runtime metrics, not just request metrics	Watch only request-level p99 latency	Also watch tokio-console / runtime metrics: worker utilization, task wakeups, blocking time	Async runtime issues (blocking on a sync call, task starvation) show up in runtime metrics well before they surface as user-visible latency

Kong

Default for REST API gateways where auth (OIDC, JWT), rate limiting, request transformations, and AI routing are needed from a plugin marketplace rather than built from scratch.

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Use DB-less mode for new deployments	Default to Postgres-backed Kong because tutorials use it	Use DB-less with declarative config (decK + Git); reserve DB mode for legacy multi-node needs	Postgres is a critical dependency that adds operational surface; DB-less is simpler and well-supported in 2026
Order plugins from light to heavy	Default plugin order, heavy plugins (transformations) before light filters (ACL, IP restriction)	Place early-termination plugins (ACL, IP rate limit) first; heavy plugins last	Failed requests should reject fast and cheap; running heavy plugins on requests that get blocked downstream wastes CPU
Profile and limit Lua plugin CPU	Enable any plugin that looks useful; assume Lua is "fast enough"	Profile plugins under representative load; cap concurrent Lua execution per worker	One CPU-bound Lua plugin can saturate a worker and degrade p99 for unrelated traffic on the same worker
Use hybrid mode for multi-region deployments	Run separate Kong clusters per region with manual config sync	Central control plane + regional data planes via Kong hybrid mode (config push over mTLS WebSocket)	Single source of truth, faster config propagation, lower operational burden than independent clusters
Reserve workspaces for true tenant isolation	Use workspaces for arbitrary organizational grouping	Use workspaces when teams need RBAC + config isolation; use tags for non-isolated grouping	Workspaces add admin API complexity; using them for cosmetic grouping makes everyday admin tasks harder
Plan for the enterprise upgrade decision upfront	Build on OSS Kong, discover OIDC / advanced rate limit / FIPS requirements later, scramble to upgrade	Evaluate the enterprise feature list during initial architecture; budget for it if requirements are likely	Reactive enterprise upgrades are more expensive than planned ones; pricing pressure increases under deadline
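
The "order plugins from light to heavy" row is a cost argument, and a toy model makes the size of the effect visible. The cost units, plugin tuples, and `run_chain` helper below are all hypothetical; the sketch models a generic filter chain and says nothing about how Kong itself assigns plugin priorities:

```python
def run_chain(plugins, requests):
    """Run each request through plugins in order, stopping at the first rejection.

    Each plugin is a (cost_units, accepts) pair. Returns total cost spent.
    """
    spent = 0
    for req in requests:
        for cost, accepts in plugins:
            spent += cost
            if not accepts(req):
                break  # rejected: later plugins never run for this request
    return spent

ip_restriction = (1, lambda req: req["ip"].startswith("10."))  # cheap, can reject
transform = (50, lambda req: True)                             # expensive, never rejects

# 10 allowed requests and 90 that the IP check will block
traffic = [{"ip": "10.0.0.1"}] * 10 + [{"ip": "203.0.113.9"}] * 90

light_first = run_chain([ip_restriction, transform], traffic)
heavy_first = run_chain([transform, ip_restriction], traffic)
print(light_first, heavy_first)  # 600 5100
```

With 90% of traffic destined to be blocked, running the cheap check first cuts total work by roughly 8.5x in this model. The gain shrinks as the rejected share of traffic shrinks.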

8. Advanced / Next-Gen Alternatives

Where the field is moving. Maturity badges: Production = ready for adoption, Emerging = watch and pilot, Early = bet only if you're the case study.

Nginx

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Pingora	Memory safety, connection pool reuse across threads, 70% memory + 60% CPU savings at scale	Emerging	High — Rust rewrite of all custom logic, no config-file path	Edge workloads at planetary scale, especially TLS-heavy; teams with Rust capacity
Angie (fork)	Drop-in Nginx replacement with active development from former Nginx core team; faster cadence on features (HTTP/3, ACME, dynamic upstreams)	Production	Low — wire-compatible, drop-in replacement for nginx.conf	When you want Nginx-the-tool without F5 governance; Russian-company ownership is a factor to evaluate
Freenginx (fork)	Community-led, lighter feature drift from upstream Nginx; security-policy alignment with original developers	Early	Low — drop-in	When you want minimal drift but escape F5 governance; smaller community than Angie
F5 NGINX Ingress Controller 5.x	Migration path from community ingress-nginx (retired March 2026)	Production	Medium — Ingress API mostly compatible but feature differences exist	Kubernetes clusters still running ingress-nginx; this is F5's vendor-supported migration path

HAProxy

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Envoy	Dynamic xDS config, service mesh integration, native filter chain, observability	Production	High — different config model, control plane required	When dynamic backends and service mesh become primary requirements
Pingora	Memory safety, programmability for custom proxy products	Emerging	High — Rust rewrite	Building a proxy product rather than deploying a proxy
Kernel-bypass options (DPDK-based, XDP-based)	Sub-microsecond L4 forwarding for ultra-low-latency trading and edge	Emerging	Very High — requires DPDK or eBPF expertise, NIC and kernel coupling	Trading systems and CDN edge where every microsecond counts; specialized hardware and team

Envoy

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
linkerd2-proxy (Rust)	Memory safety, simpler operational model, lighter resource footprint than Envoy sidecar	Production	High — Linkerd mesh is a full alternative, not drop-in	When the Envoy + Istio operational complexity exceeds team capacity; Linkerd is the simpler mesh
Envoy Gateway / Envoy AI Gateway	Higher-level abstractions over raw Envoy, Kubernetes Gateway API support, AI-specific features (ext_proc for prompt processing, token-aware rate limiting)	Emerging	Low — built on Envoy, easier onboarding	Greenfield gateways, especially AI inference gateways; this is where the Envoy ecosystem is investing in 2026
Pingora	Memory safety, programmability, lower per-instance resource cost	Emerging	Very High — Rust rewrite, no mesh integration story	Custom proxy products at scale; not a service mesh replacement
Cilium Service Mesh (eBPF)	Sidecar-less mesh via eBPF, lower overhead per pod, kernel-level enforcement	Emerging	Medium — Cilium CNI replacement, mesh features layered on top	Kubernetes clusters where Cilium is already the CNI; mesh-without-sidecar is an attractive simplification

Pingora

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
ISRG River (built on Pingora)	Higher-level reverse proxy abstraction over Pingora; binary product rather than framework	Early	Low — when ready, intended as a Pingora-powered Nginx alternative	When you want Pingora's memory safety without writing Rust application code
Rama (Tower-based Rust proxy)	Layered middleware model (Tower / tower-http ecosystem); more familiar to Rust web devs	Early	Medium — different API model from Pingora; same Rust foundation	When you prefer Tower's layered middleware over Pingora's callback model
hyper + custom Rust proxy	Maximum flexibility, lowest abstraction; raw HTTP library	Production	Very High — you build everything Pingora gives you for free	Niche custom requirements where Pingora's framework doesn't fit

Kong

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Envoy AI Gateway	Native ext_proc integration, per-model rate limiting, deeper Envoy ecosystem integration, no Lua event-loop ceiling	Emerging	Medium — different abstractions; Gateway API for routing	AI inference gateway use cases where Envoy is the architectural fit
Apache APISIX	etcd-based config (no Postgres), claims 2x Kong throughput in benchmarks, fewer enterprise-locked features	Production	Medium — similar plugin model, conceptually similar	When Kong's Postgres dependency or enterprise pricing is a problem; APISIX is the closest direct alternative
Tyk	Go-based, smaller resource footprint, simpler architecture	Production	Medium — different config model	When you want a lighter API gateway with fewer plugin dependencies
Cloud-managed API gateways (AWS API Gateway, Apigee, Azure APIM)	No infrastructure to operate; pay-per-request pricing	Production	Medium-High — vendor lock-in, different config and observability	When operational simplicity trumps cost predictability and feature flexibility; works well for low-to-mid traffic gateways
