Server-Sent Events (SSE) Protocol

A unidirectional server-to-client streaming protocol over standard HTTP, formalized in the HTML Living Standard as text/event-stream. The protocol that quietly powers ChatGPT's typing effect, Cloudflare's edge AI, and Uber's original real-time map.

Streaming Protocol · HTTP-Native · Single-Tech Deep Dive

As of 2026-06-09 · Aligned to WHATWG HTML Living Standard, HTTP/2 (RFC 9113), and current EventSource API

PE Verdict

SSE is the most underrated protocol of the AI era. The "WebSocket is strictly better" consensus aged badly the moment LLM token streaming became the default UX: SSE's one-way model, HTTP-native plumbing, and built-in reconnect map directly onto the workload. The trap is treating it as "WebSocket lite". The right framing is: SSE is a streaming primitive over HTTP, and every gotcha (proxy buffering, idle timeouts, the 6-connection cap, no custom headers in EventSource) is a function of that fact, not a defect. Pick SSE when the data flow is one-way, the payloads are text or text-encoded, and you want a Layer-7 streaming abstraction that load balancers, CDNs, and edge runtimes already understand. Reach for WebSockets when bidirectional, binary, or sub-50ms RTT control loops are non-negotiable.

Best default choices

Use SSE for one-way streams: Default for LLM tokens, notifications, progress, dashboards, and server-driven browser updates
Always support resume: Pair browser reconnect with id and Last-Event-ID backed by a replay buffer
Disable proxy buffering: Set text/event-stream, no-transform, heartbeats, and Nginx X-Accel-Buffering: no
Use HTTP/2 in production: Avoid HTTP/1.1 per-origin connection caps and load-balancer idle timeout surprises

Overview

Server-Sent Events is a server-push protocol where a client opens a single long-lived HTTP request and the server streams a sequence of UTF-8 text messages over the open response body. The wire format is a tiny line-based grammar (field: value\n lines, with each message ended by a blank line), the MIME type is text/event-stream, and the browser-side API is the event-driven EventSource interface. Everything else (reconnection, ID tracking, the client half of missed-event recovery) is built into the user agent.

SSE was specified in 2009 as part of HTML5 and shipped behind WebSockets in the public imagination for a decade. Two industry shifts inverted the narrative: HTTP/2 multiplexing killed the 6-connection-per-origin cap, and LLM token streaming made unidirectional server push the dominant real-time workload on the web. SSE went from "WebSocket Lite" to the default transport for AI products.

The protocol is intentionally narrow. It does not do binary. It does not do bidirectional. It does not do per-message acknowledgement. What it does is give you a streaming primitive that is indistinguishable from a normal HTTP response to every load balancer, CDN, WAF, and serverless runtime ever built. That single property is why it scales on platforms (edge workers, ALB-fronted clusters, API gateways) where WebSockets either fail outright or carry punitive cost.

Architecture

SSE end-to-end: client opens long-lived GET, proxy must disable buffering, server streams chunked text/event-stream, reconnect handled by browser via Last-Event-ID.

Three architectural facts drive every other property of the protocol:

1. It is plain HTTP. An SSE response is a 200 OK with Content-Type: text/event-stream and a body that never ends (chunked on HTTP/1.1, a sequence of DATA frames on HTTP/2). There is no upgrade handshake (unlike WebSockets), no framing layer, no per-message headers. Every L7 device on the path treats it as a normal slow response. That is both its superpower (works everywhere HTTP works) and the source of most production failures (the same path will helpfully buffer, compress, or close it).

2. Reconnection is in the browser, not the application. The EventSource API tracks the last received id: field, and on disconnect it automatically reissues the GET with a Last-Event-ID header. This is enormously useful and a trap: if the server does not implement resumption, the browser will silently restart the stream from scratch on every blip — and in an LLM context, that means billing the user twice for the same generation.

3. The protocol is a streaming primitive, not a messaging system. There is no schema, no backpressure signal, no delivery acknowledgement. Those properties have to be layered on top — typically with a message bus (Kafka, Redis Streams) on the server side and application-level sequence numbers in the payload. Uber's original RAMEN system is the canonical example: it wrapped SSE with a sequence number protocol and per-message ACK over a side channel to get at-least-once delivery semantics.
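The third point, layering sequence numbers on top of the stream, can be sketched as a small client-side gap detector. This is a minimal illustration, not Uber's actual RAMEN protocol: the `GapDetector` name, the convention that sequence numbers start at 1, and the idea that the caller re-requests whatever ends up in `missing` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GapDetector:
    """Tracks application-level sequence numbers carried in each event payload."""

    expected: int = 1  # next sequence number we expect (assumed to start at 1)
    missing: list = field(default_factory=list)  # gaps still to be re-requested

    def observe(self, seq: int) -> bool:
        """Record one event; return True if the app should process it."""
        if seq < self.expected:
            if seq in self.missing:
                # A late or replayed event that fills a known gap.
                self.missing.remove(seq)
                return True
            return False  # plain duplicate, drop it
        # Everything skipped between `expected` and `seq` is a gap.
        self.missing.extend(range(self.expected, seq))
        self.expected = seq + 1
        return True
```

A client would call `observe()` for every event and, when `missing` is non-empty, ask a side-channel endpoint (hypothetical here) to resend those sequence numbers.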

Core Concepts

Wire format

Every message is a sequence of field: value lines terminated by a single blank line. Lines starting with : are comments (used for heartbeats). Four field names are defined by the spec; everything else is ignored.

: this is a comment / keepalive heartbeat

event: token
id: 42
retry: 3000
data: {"choices":[{"delta":{"content":"Hello"}}]}

event: token
id: 43
data: {"choices":[{"delta":{"content":" world"}}]}

event: done
data: [DONE]
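A server has to produce exactly this framing, so a small serializer is a useful illustration. The sketch below is one possible helper (the `format_event` name and its parameters are made up for this example); it assumes the `event` and `event_id` values contain no line breaks and splits multi-line payloads into one `data:` line each.

```python
from typing import Optional


def format_event(data: str,
                 event: Optional[str] = None,
                 event_id: Optional[str] = None,
                 retry_ms: Optional[int] = None) -> str:
    """Serialize one SSE message; the trailing blank line terminates it."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")
    # CR, LF, and CRLF all end a line on the wire, so normalize first and
    # emit one data: line per payload line.
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    for part in normalized.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"
```

For example, `format_event('{"ok":true}', event="token", event_id="42")` yields the `event:`, `id:`, and `data:` lines followed by the blank-line terminator.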

The four protocol fields

Field	Purpose	Client Behavior	PE Notes
`data:`	The payload. Multiple `data:` lines in a single message are concatenated with `\n` on the client.	Becomes `event.data` string. Application must parse if JSON.	Multi-line `data:` is the only way to send a payload containing a literal newline. Most SDKs use a single line per message and JSON-encode anything structured.
`event:`	Event type. Lets the client subscribe to named channels on the same stream.	Dispatched to `addEventListener('foo', ...)` instead of `onmessage`.	Used heavily in LLM streams to separate token deltas from tool calls from errors. OpenAI uses event types like `response.output_text.delta`, `response.function_call_arguments.delta`, `response.completed`.
`id:`	Sequence identifier. Stored by the browser as the "last event ID seen".	Sent back on the next reconnect via the `Last-Event-ID` HTTP header.	The only piece of state the protocol gives you for resumption. If you skip it, reconnect == restart. Most LLM APIs do NOT emit `id:` by default and instead implement resumption via application-level request IDs.
`retry:`	Reconnection delay in milliseconds.	Overrides the browser's default reconnection delay (left to the user agent by the spec, commonly about 3000ms) for the rest of the session.	Underused. Set this high (10-30s) for non-critical streams to avoid reconnect storms after a regional outage. Server can emit a single `retry:` early in the stream and never again.
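How a client turns these fields back into events can be sketched with a simplified parser. This is an illustration of the rules in the table, not a complete implementation of the spec's algorithm: it ignores `retry:` (which would only adjust the reconnect delay), takes already-decoded lines as input, and the `Event` and `parse_sse` names are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    type: str           # "message" unless an event: field was present
    data: str           # data: lines joined with "\n"
    last_event_id: str  # most recent id: seen so far on the stream


def parse_sse(lines: Iterable[str]) -> Iterator[Event]:
    """Yield events from an iterable of text lines (simplified sketch)."""
    event_type = ""
    data_lines = []
    last_id = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch if any data was collected, then reset.
            if data_lines:
                yield Event(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment / heartbeat
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # only one leading space is stripped
        if name == "data":
            data_lines.append(value)
        elif name == "event":
            event_type = value
        elif name == "id" and "\x00" not in value:
            last_id = value  # persists across events until overwritten
        # retry: and unknown field names are ignored in this sketch
```

Note that the last seen ID carries over to later events that omit `id:`, which is what makes it usable as a resume cursor.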

Required HTTP headers

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache, no-transform
Connection: keep-alive          // HTTP/1.1 only
X-Accel-Buffering: no            // Nginx specific, disables response buffering

no-transform is the underused one. Without it, intermediaries can gzip-buffer your stream and you will see events arrive in 15-second bursts in production while working perfectly on localhost.
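To make the header set concrete, here is a minimal sketch of a server that sends them and flushes after every event. It uses Python's standard-library `http.server` purely for illustration (it is not a production server), and the `SSEHandler` and `make_server` names, the three-tick payload, and the 50 ms pacing are all assumptions of the example.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SSEHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache, no-transform")
        self.send_header("X-Accel-Buffering", "no")  # only meaningful behind Nginx
        self.end_headers()
        try:
            for n in range(1, 4):
                self.wfile.write(f"id: {n}\ndata: tick {n}\n\n".encode("utf-8"))
                # http.server's wfile is unbuffered by default, but most
                # frameworks buffer, so flushing per event is the habit to keep.
                self.wfile.flush()
                time.sleep(0.05)
            # Sentinel so the client can tell "finished" from "interrupted".
            self.wfile.write(b"event: done\ndata: [DONE]\n\n")
            self.wfile.flush()
        except (BrokenPipeError, ConnectionResetError):
            pass  # client went away mid-stream

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), SSEHandler)
```

Running `make_server(8000).serve_forever()` and then `curl -N http://127.0.0.1:8000/` should show the events arriving one at a time rather than in a single burst.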

Execution Model

The lifecycle is a single state machine on each side:

Phase	Client Side	Server Side	Failure Mode
1. Connect	`new EventSource(url)` sends GET with `Accept: text/event-stream`. On a reconnect, also sends `Last-Event-ID` if an `id:` was received earlier.	Returns 200 with `text/event-stream` MIME and a chunked body. Reads `Last-Event-ID` header to position the cursor.	Any response other than a 200 with the `text/event-stream` MIME type closes the stream permanently (no auto-retry on 4xx/5xx; a 204 is the conventional way to tell the client to stop). The "permanently" is the trap.
2. Stream	Parses incoming chunks line by line. Fires `onmessage` (or named event) on each blank-line-terminated block.	Writes events to the response body and flushes after each one. Sends a comment line periodically as a heartbeat.	Forgetting to flush means events queue in the application buffer. Forgetting to heartbeat means the load balancer closes the idle TCP connection (ALB: 60s, Cloudflare: 100s, GCP LB: 600s).
3. Disconnect	Network error or dropped stream: `readyState` goes to `CONNECTING` and `onerror` fires. A non-200 status instead sets `readyState` to `CLOSED` with no retry.	Server-side cleanup must happen on the disconnect signal. Without it, you leak per-connection resources (Redis subscriptions, goroutines, file handles).	The disconnect signal in long-lived requests is notoriously delayed in some HTTP stacks. Node.js fires `req.on('close')`; Go cancels the request context (`r.Context().Done()`); Starlette-based Python ASGI apps poll `await request.is_disconnected()`. All have edge cases.
4. Reconnect	After `retry:` interval (default 3000ms), automatically reissues GET with `Last-Event-ID` header.	Reads the ID, replays from the appropriate position in the source-of-truth buffer (Kafka offset, Redis Stream ID, DB cursor, LLM resume token).	If the server does not implement resumption, the user experiences a "Groundhog Day" stream that restarts on every blip. For LLMs this doubles billing.
5. Close	`eventSource.close()` for explicit termination. Once closed, no reconnect.	Server signals end by closing the response. Convention: emit an `event: done` or `data: [DONE]` sentinel just before closing so the client knows it was intentional.	No graceful close in the protocol. The sentinel pattern is essential to distinguish "stream completed" from "stream interrupted".
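The reconnect phase depends on a server-side replay buffer, which can be sketched as an in-memory ring buffer keyed by a monotonic counter. This is an assumption-laden toy (the `ReplayBuffer` class, integer IDs starting at 1, and returning `None` when the cursor has been evicted are all choices made for the example); a production system would more likely back this with Redis Streams or Kafka.

```python
from collections import deque
from typing import Optional


class ReplayBuffer:
    """Keeps the most recent events so a reconnecting client can resume."""

    def __init__(self, maxlen: int = 1000) -> None:
        self._events = deque(maxlen=maxlen)  # (event_id, data) pairs
        self._next_id = 1

    def append(self, data: str) -> int:
        """Store one event and return the id to emit in its id: field."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return event_id

    def replay_after(self, last_event_id: Optional[str]) -> Optional[list]:
        """Events newer than the Last-Event-ID header value.

        Returns None when the cursor is unparseable or so old that events
        after it were already evicted, so the caller must restart the stream.
        """
        if last_event_id is None:
            return list(self._events)  # fresh connection, no cursor
        try:
            cursor = int(last_event_id)
        except ValueError:
            return None
        oldest = self._events[0][0] if self._events else self._next_id
        if cursor < oldest - 1:
            return None  # gap between cursor and oldest retained event
        return [(i, d) for i, d in self._events if i > cursor]
```

The explicit `None` return matters: it lets the handler distinguish "nothing new" (an empty list) from "cannot resume" and pick a restart or error path deliberately.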

Critical Gotcha: The EventSource constructor cannot set custom headers. There is no Authorization header on the initial GET. Production deployments work around this with (a) cookies, (b) query-string tokens, or (c) fetch-based polyfills like Microsoft's @microsoft/fetch-event-source, which uses fetch() plus a manual SSE parser to get full header control.

Feature Reference

The twelve protocol-level features that define SSE, each with the PE-grade nuance most surface-level guides miss.

Feature	What It Does	How It Works	PE Nuance
Unidirectional data flow (server → client)	One-way streaming from server to browser. No upstream channel after the initial GET.	The connection is a single HTTP request that the server never finishes responding to. Client cannot push data on the same connection.	If you need to send anything from the client (e.g. cancel a stream), you use a separate HTTP request to a sibling endpoint. This is fine — most products discover they didn't actually need bidirectional after they built it. WebSocket users frequently overpay for "bidirectional" they never use.
Text-based data format (UTF-8)	All payloads are UTF-8 text. No native binary support.	The protocol parses the response body line-by-line as text. Binary bytes break the parser.	For small binary payloads, base64 in a JSON field works fine. For real binary streams (video, audio, large blobs), you are using the wrong protocol. The "no binary" objection is overweighted — most real-time data is JSON anyway.
Native HTTP foundation (`text/event-stream`)	Plain HTTP response with a specific MIME type. No upgrade handshake.	Server returns 200 OK with `Content-Type: text/event-stream`. The body is a long-lived chunked response.	This is the single biggest reason SSE survived. CDNs, WAFs, API gateways, edge runtimes, OAuth proxies, and corporate firewalls all understand HTTP. WebSockets require explicit configuration on every hop, and many serverless platforms simply don't support them.
Open, long-lived persistent connections	The connection stays open indefinitely while the server has data to send.	Server holds the response body open and writes events as they occur. Client treats it as a normal HTTP response that never terminates.	"Indefinitely" is a lie at the infrastructure layer. AWS ALB closes idle connections at 60s by default; Cloudflare at ~100s; Heroku's router after a rolling 55s without data (and 30s for the first byte). Heartbeats are non-optional in production. The protocol has no concept of "long enough"; every load balancer disagrees.
Automatic client-side reconnection	Browser automatically reconnects when the stream drops.	On a network error or dropped stream, the browser waits `retry:` ms then reissues the GET. No application code required. (An HTTP error status such as 5xx ends the stream instead of triggering a retry.)	A feature and a footgun. Without server-side resume support, "auto-reconnect" means "auto-restart-from-scratch". For idempotent streams (live tickers) this is fine. For LLM token streams it means re-billing. Always pair with Last-Event-ID handling.
Configurable reconnection delay (`retry:`)	Server can specify how long the browser waits before reconnecting.	One `retry: 5000` line tells the browser to wait 5s on next disconnect. Persists for the session.	Underused. Default 3s means a regional outage produces a thundering herd of reconnects. Setting `retry:` to 10-30s with jitter on the client side is the right move for any stream with more than a few thousand connections. The spec doesn't define jitter — bolt it on yourself.
Message identification tracking (`id:`)	Each event can carry an ID. The browser remembers the last one seen.	The browser stores the last `id:` value and exposes it via `Last-Event-ID` on reconnect.	The ID is opaque to the browser — anything that uniquely identifies a position in your stream works (Kafka offset, Redis Stream ID, monotonic counter, timestamp+seq). Don't use random UUIDs for this; you need them to be ordered or addressable in your replay buffer.
Missed message recovery (`Last-Event-ID` header)	On reconnect, browser sends the last seen ID so server can resume.	HTTP header on the reconnect GET. Server reads it, queries its replay buffer, streams from that position.	Resumption requires a server-side replay buffer with retention that exceeds your worst-case disconnect duration. Redis Streams (XADD/XREAD with MAXLEN) is the standard pattern; Kafka with a topic-per-stream works at scale; in-memory ring buffer works for ephemeral streams.
Custom event grouping (`event:`)	Logical channels multiplexed on one connection via named event types.	`event: progress` dispatches to `addEventListener('progress', ...)` on the client. Default is `message`.	Use this for protocol versioning and for separating error/metadata from data. OpenAI emits ~40 distinct event types on a single stream to disambiguate text deltas, tool calls, audio chunks, citations, and lifecycle markers. Treating everything as `data:` and switching on a JSON field works but loses static analyzability.
Multi-line data payloads	A single event can have multiple `data:` lines concatenated with `\n`.	Parser joins consecutive `data:` lines until the blank-line terminator. `\n` inserted between them.	Rarely needed since most payloads are JSON-encoded on a single line. Useful when emitting raw markdown or pre-formatted text where the newlines matter for rendering. Watch out: some HTTP/2 implementations have buffered chunks at 16KB boundaries, so a single giant multi-line event can stall.
Low protocol-overhead streaming	Minimal framing: no per-message headers, no opcodes, no masking.	Wire format is plain text with field names. A token-sized event is ~30-50 bytes overhead vs 2-10 bytes for an unmasked server-to-client WebSocket frame header (6-14 with client-side masking).	Overhead is higher than WebSockets per message but the protocol is much simpler to terminate at the proxy. On HTTP/2 the per-stream overhead is amortized via HPACK header compression and stream multiplexing over a single TCP connection. The "overhead" objection is mostly theoretical at modern message sizes.
Built-in heartbeat capability (comment lines)	Lines starting with `:` are ignored by the parser, can be used as keepalives.	Server writes `:\n\n` (or `:heartbeat\n\n`) every N seconds. Keeps the TCP connection and intermediary state alive without triggering an event on the client.	Set the interval to at most ~50% of the most aggressive intermediary timeout in your path. For AWS ALB (60s), that means 30s or less; 15-20s leaves margin for a delayed or lost beat. The cost is one TCP packet per interval per connection; at 100K connections with 15s heartbeats, this is ~6,700 packets/sec, which fits on a single moderate instance. The opposite failure (no heartbeat) creates a silent stall that's very hard to debug.
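The heartbeat sizing in the last row is simple arithmetic, sketched below so the numbers can be rechecked for other fleets. The function names and the default `fraction` of 0.5 are assumptions for illustration; the timeout values passed in are whatever your own intermediaries are configured with.

```python
def heartbeat_interval_s(timeouts_s: list, fraction: float = 0.5) -> float:
    """Upper bound on the heartbeat interval: a fraction of the tightest timeout."""
    return min(timeouts_s) * fraction


def heartbeat_packets_per_sec(connections: int, interval_s: float) -> float:
    """Fleet-wide keepalive rate, assuming one packet per beat per connection."""
    return connections / interval_s
```

With idle timeouts of 60s, 100s, and 600s on the path, `heartbeat_interval_s([60, 100, 600])` gives a 30s ceiling, and 100,000 connections at a 15s interval work out to roughly 6,667 packets per second.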

Trade-Offs

The structural tensions in SSE — what you gain, what you give up, and when it actually bites you on-call.

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
HTTP-native plumbing vs feature ceiling	Works through every CDN, WAF, L7 LB, edge runtime, and corporate proxy with zero special config; deploys on serverless platforms where WebSockets fail outright.	No binary, no bidirectional, no compression negotiation, no per-message ACK; everything beyond text streaming must be layered on.	When PM asks for "in-stream client input" (e.g. a "stop" button or live edit) and you realize you need a second endpoint for the back-channel, breaking your clean single-connection model.	Most teams discover they don't actually need the missing features; the cost of WebSocket complexity (sticky LB, idle timeouts, framing bugs, retries) often outweighs the bidirectional benefit.
Unidirectional simplicity vs bidirectional power	Server is the only actor with state on the stream; trivial to fan out, replay, and reason about; no concurrent read/write races.	Any client→server interaction during the stream needs a separate request; latency floor of ~1 round trip on the side channel.	Building collaborative editing, multiplayer games, or sub-50ms control loops; the side-channel pattern adds 50-200ms RTT per interaction and breaks your latency budget.	For 80% of real-time workloads (notifications, tickers, LLM streams, dashboards, progress bars), unidirectional is genuinely sufficient. The remaining 20% is where WebSockets earn their complexity.
Browser-managed reconnect vs control over replay semantics	Zero application code for connection recovery; handles cellular handoff, sleep/wake, and tab switching transparently.	If you don't implement Last-Event-ID server-side, "reconnect" means "start over", silently and indefinitely.	First production outage where users see "duplicate" LLM responses because the stream restarted at every TCP hiccup; finance dashboards re-running the same query 50x/minute under flaky wifi.	The browser's auto-reconnect is a feature only if you also build server-side resumption. Skipping the resumption layer turns the feature into a cost source that grows with every reconnect. Always pair the two.
Text-only payloads vs binary efficiency	Human-readable wire format; trivial to debug with `curl -N`; gzip compression remains effective even with deeply nested JSON.	~33% size penalty for base64-encoded binary; can't stream raw protobuf, audio frames, or compressed snapshots.	Real-time audio/video pipelines where every kilobyte and millisecond matter; high-frequency tick data where binary protobuf would halve bandwidth costs.	For non-audio/video workloads, the JSON-over-text penalty is overstated. Where compression is enabled and flushed per event, gzip absorbs most of it; HTTP/2 HPACK helps only with headers. If you're at the scale where this matters, you're probably also at the scale where you should be running gRPC streaming or WebTransport anyway.
Long-lived connections vs serverless economics	Cloudflare Workers, AWS Lambda response streaming, Vercel Edge Functions all natively support SSE; no Durable Object or persistent process required.	Each open stream consumes a request slot for the lifetime of the connection; long streams (LLM generation, batch processing) can hit per-invocation limits.	Free-tier Cloudflare Worker hits the 30s CPU limit mid-LLM-stream; AWS Lambda response streaming caps at 15 min hard.	Streaming inverts the serverless cost model. You pay for wall-clock holding the connection, not just compute. For long streams (>5 min), Durable Objects, ECS, or always-on backends are often cheaper than streaming Lambda. Model the cost before committing.
Per-origin connection cap (HTTP/1.1) vs HTTP/2 mandatory	On HTTP/2, multiplexed streams over a single TCP connection — default 100 streams, no per-origin tab limit.	On HTTP/1.1 you're locked to 6 connections per origin per browser; tabs 7+ block. WebSockets bypass this entirely.	Power user opens 8 dashboard tabs, connections 7 and 8 silently hang; debugging surfaces only via screen recording from the user. (Real failure reported by multiple AI products.)	HTTP/2 is non-optional for production SSE. If your infra is still HTTP/1.1 at the LB, you're shipping a broken experience to power users. The fix is infra-side, not code-side, which makes it easy to miss in code review.
Protocol simplicity vs no built-in flow control	No backpressure protocol to implement; client TCP backpressure naturally throttles a slow consumer.	If the consumer is slower than the producer, events queue in the server's send buffer indefinitely. Server has no way to know the client is falling behind.	High-frequency market data stream with 10K events/sec hitting a mobile client on cellular; server buffer grows to GB, OOM, restart, every client reconnects in a thundering herd.	Application-level coalescing is the answer. Don't send every tick; debounce or batch on the server side. The protocol gives you no help here — it's a design discipline question, not a feature question.
EventSource simplicity vs no custom headers	Two-line client code: `const es = new EventSource(url); es.onmessage = ...`; built into every modern browser.	Cannot set Authorization, X-API-Key, or any custom header on the initial GET; auth must use cookies or query-string tokens.	Bearer-token auth pattern (standard for APIs) doesn't work; teams either compromise security with tokens in URLs (logged everywhere) or abandon EventSource for fetch-based polyfills.	The Microsoft `@microsoft/fetch-event-source` polyfill is the standard escape hatch; it gives you fetch-level header control with EventSource-shaped parsing. For LLM products, this polyfill is effectively mandatory.
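The "application-level coalescing" answer in the flow-control row can be sketched as a latest-value-per-key buffer that the server drains on a timer. This is one possible shape, suitable only for data where intermediate values may be dropped (price ticks, not LLM tokens); the `Coalescer` class and its method names are hypothetical.

```python
class Coalescer:
    """Keeps only the newest payload per key between flushes."""

    def __init__(self) -> None:
        self._latest = {}  # key -> most recent payload

    def offer(self, key: str, payload: str) -> None:
        # Overwriting drops the superseded value; the dict keeps the key's
        # original insertion position, so ordering across keys stays stable.
        self._latest[key] = payload

    def drain(self) -> list:
        """Return pending (key, payload) pairs and reset the buffer."""
        batch = list(self._latest.items())
        self._latest.clear()
        return batch
```

A stream handler would call `offer()` for every upstream tick and `drain()` once per flush interval, so a slow consumer receives at most one event per key per interval regardless of the producer's rate.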

Use Cases

Concrete production deployments where SSE was the deliberate choice over WebSockets, polling, or push notifications.

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
LLM token streaming	OpenAI, Anthropic, Google Gemini, every wrapper product	Sub-200ms time-to-first-token; perceived latency dominates real latency; one-way data flow matches generation semantics.	~5-50 tokens/sec per stream; tens of millions of concurrent streams during peak; 10s-10min duration per stream.	WebSockets add upgrade handshake + framing overhead with no benefit (generation is unidirectional). HTTP polling adds RTT per token (unacceptable). gRPC streaming requires Connect-Web or grpc-web bridges in the browser.
Edge AI inference	Cloudflare Workers AI, Vercel AI SDK, Fly.io edge inference	Compatibility with stateless serverless runtimes; no Durable Object required; native ReadableStream → Response pattern.	200+ edge data centers; 4,000% YoY growth in inference requests reported by Cloudflare in 2025.	WebSockets need stateful workers (Durable Objects) which add latency and cost; gRPC bidirectional streaming isn't supported in most edge runtimes.
Real-time market data (read-only)	TradingView Data API, brokerage dashboards, crypto exchanges' public ticker feeds	One-way price updates; battery-efficient on mobile; no need for client→server messaging on the data path.	10-20 symbols per stream typical; sub-second update cadence; thousands of concurrent retail viewers per popular asset.	WebSockets are used when subscribe/unsubscribe must be on the same connection (active trading); SSE wins when the subscription set is fixed at connect time. Many platforms offer both.
Live delivery/driver tracking	Uber (original RAMEN system), DoorDash, Instacart customer apps	Server-driven location updates at 4s cadence; at-least-once delivery via sequence numbers; works through carrier proxies.	1M+ location requests/sec at Uber scale; tens of millions of concurrent tracked deliveries.	Uber's RAMEN initially shipped on SSE because WebSocket support across the carrier landscape in 2015 was uneven; later migrated to gRPC bidirectional streams for the driver side as the network stack matured.
Long-running job progress	eBay/Amazon seller portals, GitHub Actions logs, CI/CD pipeline UIs, AWS CloudFormation events	Progress bar without polling; natural fit for stateless backend (job state lives in a DB/queue); resumable on tab refresh.	Thousands of concurrent uploads per seller; minute-to-hour duration; sparse update cadence (1-10/sec).	Polling wastes RTT and DB load on idle jobs; WebSockets are overkill for unidirectional progress; SSE with Redis Streams as the replay buffer is the canonical pattern.
Notifications and activity feeds	Mastodon, Slack-clones, social activity streams, LinkedIn notification dot	Always-on connection; auto-reconnect handles mobile background-to-foreground; broadcast-friendly with a pub/sub backend.	10K-1M concurrent subscribers per server; very sparse messages (often <1/min/user).	WebSockets work but add complexity for a strictly one-way notification path; mobile push handles app-closed but doesn't help the in-app feed.
MCP server transport for LLM tools	Model Context Protocol implementations on Cloudflare Workers, Anthropic's MCP servers, third-party MCP integrations	SSE carries server-to-client messages in MCP's HTTP transports (the original HTTP+SSE transport and its successor, Streamable HTTP); integrates with HTTP-native edge environments; built-in resumption for long-running tool calls.	Per-agent connections to tools; expected to grow with agentic AI adoption.	stdio transport is local-only; WebSockets would require explicit support in every MCP runtime; the MCP spec built its remote transport on SSE for HTTP compatibility.

Limitations

The ceilings and constraints — and the workarounds teams actually use in production.

Limitation	Severity	Workaround	Workaround Cost
EventSource cannot set custom headers (no Authorization header)	High	Use `@microsoft/fetch-event-source` polyfill, cookie-based auth, or query-string tokens.	Polyfill adds ~3KB bundle and bypasses native browser auto-reconnect logic (you reimplement it); query-string tokens leak in logs.
6 connections per origin on HTTP/1.1	Critical	Deploy HTTP/2 or HTTP/3 end-to-end (browser → LB → app); single TCP connection multiplexes 100+ streams.	Infra-side fix; requires ALB/CloudFront/nginx to terminate HTTP/2 and pass to backend correctly. Common to miss in older clusters.
No binary payloads	Medium	base64-encode small binary in JSON data field; switch to WebSocket / WebTransport / gRPC for true binary streams.	~33% size overhead from base64; CPU cost on both ends; if "small" grows to "large" (audio frames), the encoding becomes the bottleneck.
Proxy buffering breaks real-time delivery	Critical	Set `X-Accel-Buffering: no` response header; configure `proxy_buffering off` on nginx; disable gzip for `text/event-stream`.	Each intermediary needs configuration. Works on localhost, fails in production (the canonical "delayed bursts" bug). Add to deploy checklist or it bites you eventually.
Load balancer idle timeout closes "idle" SSE connections	High	Server-side heartbeat (comment line `:keepalive\n\n`) every 15-20s; configure LB timeout to be longer than your maximum quiet period.	Extra packets per connection per interval; at 1M concurrent connections with 15s heartbeats, ~67K packets/sec just for keepalive.
No backpressure / flow control	High	Application-level coalescing (debounce, batch, sample); server-side per-connection queue with bounded size + drop policy.	Lossy by design; requires UX decisions about which updates can be dropped (price tick: drop OK; LLM token: must not drop).
No native message acknowledgement	Medium	Implement application-level sequence numbers + side-channel ACK endpoint (Uber's RAMEN pattern).	Doubles the request count; complicates the protocol; client must implement gap detection. At Uber's scale, justified; for most products, overkill.
Connection state is per-server (no built-in clustering)	High	Pub/sub backend (Redis pub/sub, Kafka, NATS) so any app server can deliver any user's events; sticky sessions only for the connection itself.	Adds one network hop per event; introduces a backend dependency that becomes the new bottleneck for fan-out scale.
Browser puts SSE connections to sleep in background tabs	Medium	Use Page Visibility API to detect background, switch to less-frequent polling or pause; for cross-tab dedup, use BroadcastChannel + SharedWorker.	Adds significant client-side complexity; SharedWorker has its own browser quirks (Safari support is partial).
Mobile network handoffs and OS-level connection killing	High	Aggressive reconnect with exponential backoff + jitter; rely on Last-Event-ID for resume; treat reconnection as the normal case, not an exception.	Increases server load from reconnect events; requires server-side resumption buffer with retention longer than typical handoff (30s+).
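The "exponential backoff + jitter" workaround in the last row can be sketched as a delay calculator. Native EventSource does not expose a hook for this, so the sketch applies to fetch-based or non-browser clients that run their own reconnect loop. The function name, the 3s base, the 30s cap, and the "full jitter" strategy (uniform between zero and the capped exponential) are all assumptions of the example.

```python
import random
from typing import Optional


def reconnect_delay_ms(attempt: int,
                       base_ms: int = 3000,
                       cap_ms: int = 30000,
                       rng: Optional[random.Random] = None) -> float:
    """Delay before reconnect number `attempt` (0-based), with full jitter."""
    rng = rng or random.Random()
    # Exponential growth, clamped so a long outage does not push delays
    # out indefinitely.
    ceiling = min(cap_ms, base_ms * (2 ** attempt))
    # Uniform jitter spreads clients out instead of synchronizing them.
    return rng.uniform(0, ceiling)
```

Resetting `attempt` to zero after the first successfully received event keeps a single blip from inflating later delays.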

Fault Tolerance

What happens when things go wrong — and the operational reality versus the spec.

Dimension	Behavior	Operational Reality
Replication model	N/A at protocol level — SSE is a transport. Replication lives in the message bus (Kafka, Redis Streams) or DB behind the stream.	The transport gives you no durability guarantees. Whatever durability you have is what the backing bus provides. Plan accordingly.
Failure detection	TCP RST or read timeout on the client; `req.on('close')` or equivalent on the server.	Server-side detection can lag 30-60s on cellular networks (TCP retransmit budget); client-side detection is immediate. Don't rely on server-side disconnect for billing decisions.
Failover mechanism	Browser automatic reconnect after `retry:` interval (default 3s); DNS or LB routes to a healthy backend.	"Failover" means "reconnect to whatever LB sends you to next". If that server doesn't know your session, you've effectively lost it unless you have a shared replay buffer.
RTO (typical)	3-10s for the client-side reconnect; 0s on the server if a sibling instance is healthy.	For LLM streams, 3s feels like an eternity to the user. UX should show a reconnecting state rather than freezing.
RPO (typical)	0 with Last-Event-ID + replay buffer; full session loss without resumption.	Most teams underbuild this. The default "0 events lost" only holds if your backing bus has retention covering your worst-case disconnect duration. Tune Redis Streams MAXLEN or Kafka retention accordingly.
Split-brain behavior	Not applicable — there is no consensus state in the protocol. Two clients reconnecting to two servers each get whatever those servers can replay.	Split-brain shows up at the message bus level. If Kafka loses a leader and re-elects, you may see duplicate events across the partition; the SSE layer has no idea.
Blast radius of single-node failure	All connections on that node drop; clients reconnect; LB routes to surviving nodes.	Reconnect storm is the real failure mode. Without retry jitter, 10K dropped clients hit a single replacement instance within ~3s and overload it. Always randomize reconnection on both server `retry:` and client side.
Cross-region failover story	DNS-level (Route 53 / Cloudflare GeoDNS); clients reconnect to the new region.	Cross-region resumption requires a globally-replicated replay buffer; few teams build this. Most accept session loss on regional failover and design the UX to recover gracefully.
Data loss scenarios	(a) Server crashes before flushing buffered events; (b) Client disconnects between event emission and TCP ACK; (c) Replay buffer evicts old IDs before client reconnects.	Mitigation: flush after every event, treat the bus as source of truth (not in-process state), set replay retention > max realistic disconnect duration (5-10 min for mobile).
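The retention rule in the last row (replay retention greater than the maximum realistic disconnect duration) translates into a buffer length once an event rate is assumed. The helper below is a back-of-the-envelope sketch; the function name and the `safety` multiplier are invented for illustration, and the result would feed something like a Redis Streams MAXLEN setting.

```python
import math


def replay_buffer_maxlen(events_per_sec: float,
                         max_disconnect_s: float,
                         safety: float = 1.5) -> int:
    """Entries to retain so a client can resume after the longest expected gap."""
    return math.ceil(events_per_sec * max_disconnect_s * safety)
```

For a stream emitting 50 events per second and a 10-minute worst-case mobile disconnect, this suggests retaining about 45,000 entries per stream with the default 1.5x margin.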

Sharding

SSE doesn't shard data — it manages connections. "Sharding" here is about connection placement and fan-out topology.

Dimension	Behavior	Operational Reality
Sharding model	Connection-affinity sharding — clients are placed on a server based on routing rules (sticky session by user ID, consistent hash on subscription topic, geo-region).	Most teams default to "no sharding" until they hit C10K problems; then they reach for sticky sessions or per-topic sharding. The right model depends on whether you fan out broadly (any server can deliver to anyone) or narrowly (a topic lives on one server).
Shard key constraints	Typically user ID (sticky LB) or topic/channel ID (for broadcast streams). Must be derivable from the connection request (URL path or auth context).	You set this at the LB or service mesh layer. Wrong choice (hashing on something with low cardinality) creates hot shards. Picking user ID is safe but loses any fan-out efficiency for broadcast events.
Rebalancing mechanism	None at the protocol level. Rebalancing means draining connections from one node and letting clients reconnect (browser auto-reconnect handles this).	"Drain" pattern: stop accepting new connections, send a polite `retry: 30000` to existing ones, wait, then shut down. Most LBs (ALB, Envoy) have connection draining built in but you must trigger it explicitly.
Rebalancing cost / impact	All affected clients reconnect; brief blip of ~3-10s per client; replay buffer must serve their resumption.	At scale (100K+ connections per node), draining a node creates a reconnect spike. Always rebalance during low-traffic windows or use slow drain (over minutes).
Hot-shard behavior	If one topic has 90% of subscribers and lives on one node, that node bottlenecks. CPU saturates on event serialization or memory exhausts from per-connection buffers.	The fan-out pattern that solves this: separate the "connection-holder" tier (stateless, scales horizontally) from the "event source" tier (Kafka/Redis); any connection-holder can deliver any event. This is the standard architecture for Slack, Discord, and post-RAMEN Uber.
Maximum connections (practical)	~10K-100K concurrent connections per modest Linux instance (subject to ulimit, kernel sysctl, GC pressure of your runtime).	Reported scaling: a tuned Node.js instance hits ~50K idle SSE connections per box; Go and Rust easily reach 100K+. The limit is rarely the protocol itself; it is the cost of fanning each bus message out to all of those connections in real time.
Resharding without downtime?	Yes — drain pattern is graceful. Browser auto-reconnect plus Last-Event-ID replay yields zero observable disruption to users if implemented correctly.	"If implemented correctly" hides a lot. The teams that get this right have invested in replay buffers, retry jitter, and capacity headroom for reconnect spikes. The teams that don't see thundering herds.
Cross-shard event delivery	Via the pub/sub backend. Any node subscribed to the relevant topic gets the event and can deliver to any of its connections.	This is where the architecture earns its complexity. The pub/sub layer (Redis, Kafka, NATS, Pulsar) becomes the new bottleneck. At 1M+ events/sec, choose carefully — Redis pub/sub is fastest but lossy; Kafka is durable but adds latency.
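
Connection-affinity sharding on a topic or user ID is often implemented with a consistent hash ring, so that draining one node only moves the keys that lived on it. The following is a minimal sketch under that assumption. The class name, the virtual-node count, and the node labels are hypothetical, and a real deployment would usually configure this at the LB or service mesh rather than in application code.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping a shard key to a connection-holder node."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring many times (virtual nodes) to even out load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, shard_key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, self._hash(shard_key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    for topic in ["orders", "presence", "user-42"]:
        print(topic, "->", ring.node_for(topic))
```

Removing a node from the ring leaves every key on the surviving nodes where it was, which is what keeps a drain from turning into a fleet-wide reconnect.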

Replication

SSE has no native replication. Replication means how events are propagated from the source through the message bus to all relevant connection-holders.

Dimension	Behavior	Operational Reality
Replication topology	Fan-out via pub/sub backend. Common patterns: Redis pub/sub (fastest, lossy), Redis Streams (durable, replay-capable), Kafka (durable, ordered), NATS JetStream (durable, low-latency).	The choice is dominated by replay needs and fan-out scale. For LLM token streams (must not lose tokens, single consumer per stream): Redis Streams. For broadcast (many consumers, OK to drop): Redis pub/sub. For audit-grade: Kafka.
Sync vs async	Always async from the source's perspective; the connection-holder asynchronously reads from the bus and pushes to SSE.	End-to-end latency budget: source → bus → connection-holder → TCP → client. Each hop adds 1-10ms typically. For sub-100ms feel, every hop matters.
Replication factor	Determined by the bus, not SSE. Kafka: commonly 3 in production (the broker's `default.replication.factor` is 1 unless you change it); Redis Streams: 1 (single shard); Redis Cluster replicates per shard.	Single-replica Redis Streams is a common production anti-pattern: fast and clean until you lose the node. Either accept the data loss explicitly or move to a replicated alternative.
Consistency level options	Not exposed at the SSE layer. Determined by the source-of-truth and how the connection-holder reads from it.	The default-and-most-common pattern: the connection-holder reads its own pub/sub subscription, so any event published while the connection is active is delivered. Events published before the connection (or during disconnect without resumption) are not.
Replication lag (typical)	1-10ms for in-memory pub/sub within a region; 10-100ms for durable bus (Kafka).	For LLM streams the lag is invisible against per-token model inference time (50-200ms). For high-frequency trading it dominates the budget and pushes teams toward in-process fan-out instead of bus-based.
Conflict resolution	N/A — SSE delivers events in arrival order from the bus. No multi-writer conflict at the transport.	Conflict resolution lives upstream of the bus. If you have multiple writers (multiple LLM workers generating the same response), you must dedup or pick a single source before publishing to the SSE topic.
Cross-region replication	Via cross-region bus replication (Kafka MirrorMaker, Redis Enterprise Active-Active, Confluent Cluster Linking).	Adds 50-200ms of cross-region replication lag. For interactive streams, most products keep users pinned to a region (geo-DNS) and only replicate state for failover, not steady-state.
Replication during partition	Determined by the bus. Kafka: prefers consistency (minority partition rejects writes); Redis Streams in cluster mode: minority unavailable; Redis pub/sub: silently drops.	"Silently drops" is the trap with vanilla Redis pub/sub. During a network partition, subscribers on the wrong side simply stop receiving events with no error. Test this explicitly with chaos engineering before relying on it.
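
The delivery semantics in the "Consistency level options" row (events reach a connection only while its subscription is active) can be illustrated with an in-process stand-in for a lossy pub/sub bus. This is a sketch for reasoning about the behavior, not a Redis or Kafka client. The `FanOutHub` class and its method names are hypothetical.

```python
from collections import deque


class FanOutHub:
    """In-process stand-in for a lossy pub/sub bus: no history, no replay."""

    def __init__(self):
        self._subscribers = {}  # connection id -> queue of pending events

    def subscribe(self, conn_id):
        inbox = deque()
        self._subscribers[conn_id] = inbox
        return inbox

    def unsubscribe(self, conn_id):
        self._subscribers.pop(conn_id, None)

    def publish(self, event):
        # Deliver to whoever is subscribed right now; nobody else ever sees it.
        for inbox in self._subscribers.values():
            inbox.append(event)
        return len(self._subscribers)


if __name__ == "__main__":
    hub = FanOutHub()
    hub.publish("before-connect")      # no subscribers yet: dropped
    inbox = hub.subscribe("conn-1")
    hub.publish("while-connected")     # delivered
    hub.unsubscribe("conn-1")
    hub.publish("during-disconnect")   # dropped
    print(list(inbox))
```

Closing the two gaps shown here (before connect, during disconnect) is exactly what a durable, replay-capable bus adds.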

Better Usage Patterns

The PE-grade patterns that separate "SSE works on localhost" from "SSE works under load with proxies and flaky mobile clients."

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Heartbeat every 15-20s with comment lines	Skip heartbeats; assume connections stay alive indefinitely.	Server writes `:keepalive\n\n` on a timer set to roughly a third of the most aggressive intermediary timeout (AWS ALB default idle timeout of 60s → heartbeat every 15-20s), so one delayed heartbeat does not cost the connection.	Without heartbeats, connections silently die at the LB after ~60s of quiet, triggering reconnect storms. Symptoms look like "the app randomly disconnects" with no log signal, which makes this among the hardest production bugs to debug.
Server-side resumption with Last-Event-ID + replay buffer	Emit events without IDs; treat reconnect as restart-from-scratch.	Emit a monotonic `id:` on every event (Redis Stream ID, Kafka offset, or DB cursor); on reconnect, read `Last-Event-ID` header and replay from that position; size the replay buffer to cover your worst-case disconnect (5-10 minutes for mobile).	For LLM streams, restart-on-reconnect means re-billing the user for the same generation. For market data, it means an inconsistent view (skipped or duplicated ticks). The resumption pattern is the single biggest production differentiator.
Disable proxy buffering and compression on the stream path	Default nginx/CloudFront settings; gzip enabled globally.	Set `X-Accel-Buffering: no` response header; `proxy_buffering off` in nginx; `Cache-Control: no-cache, no-transform`; exclude `text/event-stream` from gzip.	Without these, events arrive in 15-30s bursts in production while working perfectly on localhost. This is the canonical "works on my machine" bug for SSE.
Use HTTP/2 end-to-end and lean on multiplexing	Terminate HTTP/2 at the LB then HTTP/1.1 to the backend; ignore the per-origin 6-connection cap.	HTTP/2 from browser to backend; one TCP connection per origin, shared across tabs, multiplexes many SSE streams (the limit is the server's SETTINGS_MAX_CONCURRENT_STREAMS, commonly around 100); capacity planning shifts from connections to streams.	The 6-conn cap is invisible during development and breaks for power users with multiple tabs. Once you go HTTP/2, the cap is replaced by the much higher stream limit and the protocol becomes practical for multi-stream UIs (Slack-style notification + activity + presence on one page).
Use named events for protocol versioning	Stuff everything into `data:` with a JSON discriminator field.	Use `event:` field for type (token, tool_call, error, done, ping); reserve `data:` for the payload. Use a generic `onmessage` handler only when you genuinely don't care about types.	Named events give you cheap protocol evolution (add a new event type without breaking existing parsers); they're easier to filter and observe in tooling; and they map directly to typed handlers in TypeScript clients.
Set `retry:` explicitly with jitter on the client	Accept the 3000ms default; rely entirely on the browser's reconnect cadence.	Emit `retry: 15000` early in the stream; in client code (especially fetch-based polyfills) add ±50% random jitter so 10K clients don't reconnect in lockstep after an outage.	3s reconnect with no jitter means a regional outage produces a thundering herd at exactly t+3s. Jittered reconnects spread the load over the retry window and let the recovered system handle the spike.
Treat the message bus as source of truth, not the connection	Generate events inside the SSE request handler; lose them on disconnect.	Always publish to a bus (Redis Streams, Kafka) first; the SSE handler reads from the bus and forwards. Generation continues even if the client disconnects.	For LLM streams, this is the only way to avoid wasted inference cost: the model keeps generating into the bus, the client can reconnect and pick up. For progress streams, it lets you survive worker restarts mid-job.
Coalesce / batch events under high frequency	Forward every backend event to the client 1:1.	Aggregate by time window (every 100ms) or count (every 50 ticks); send a single composite event; show the user the latest, not every interim state.	Without coalescing, a 10K-event/sec source overwhelms slow clients (cellular, low-power devices), creating server-side queue buildup and eventual OOM. Coalescing is a UX call dressed as an engineering call — most users see no difference.
Use a fetch-based SSE polyfill for auth-protected streams	Stuff auth tokens in query strings to work around EventSource's header limit.	Use `@microsoft/fetch-event-source` or equivalent; gain full header control, ability to set Bearer tokens, ability to send POST bodies for the initial request.	Tokens in URLs leak everywhere (server logs, browser history, referer headers, CDN access logs). The cost is ~3KB of client code plus owning the reconnect logic yourself, both well worth it for any auth'd stream.
Emit an explicit completion sentinel before closing	Just close the TCP connection when done.	Send `event: done` or `data: [DONE]` as the last message; flush; then close. Client distinguishes "finished cleanly" from "connection lost" and, with native EventSource, calls `close()` on the sentinel so the browser does not auto-reconnect.	Without a sentinel, the client cannot tell if a stream ended successfully or was cut off; reconnect logic kicks in needlessly, wasting cycles and confusing users with a "reconnecting..." flash after a completed stream.
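
The resumption pattern in the table (monotonic `id:` on every event, replay from `Last-Event-ID` on reconnect) can be sketched with an in-memory buffer. This is a minimal sketch, assuming integer IDs and a single process. The `format_event` and `ReplayBuffer` names are hypothetical, and a production system would back this with Redis Streams, Kafka offsets, or a DB cursor as the table suggests.

```python
from collections import deque
from typing import List, Optional


def format_event(data: str, event_id: Optional[int] = None,
                 event: Optional[str] = None, retry_ms: Optional[int] = None) -> str:
    """Serialize one SSE frame; multi-line data becomes one `data:` line per line."""
    lines = []
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event


class ReplayBuffer:
    """Bounded buffer of recent frames, keyed by a monotonic integer id."""

    def __init__(self, max_events: int = 1000):
        self._events = deque(maxlen=max_events)  # (id, frame); oldest evicted first
        self._next_id = 1

    def append(self, data: str, event: Optional[str] = None) -> str:
        event_id = self._next_id
        self._next_id += 1
        frame = format_event(data, event_id=event_id, event=event)
        self._events.append((event_id, frame))
        return frame

    def replay_after(self, last_event_id: str) -> Optional[List[str]]:
        """Frames newer than Last-Event-ID, or None if the gap was already evicted."""
        try:
            last_seen = int(last_event_id)
        except ValueError:
            return None
        if self._events and self._events[0][0] > last_seen + 1:
            return None  # caller must fall back to a full resync
        return [frame for eid, frame in self._events if eid > last_seen]
```

Returning `None` for an evicted gap forces the handler to make the "buffer too small" case explicit (for example by sending a resync event) instead of silently skipping events.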

Advanced / Next-Gen Alternatives

What's emerging that improves on SSE for specific use cases — and when to consider migration.

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
WebSockets	True bidirectional, binary frames, lower per-message overhead, no proxy-buffering issues.	Production at scale	High — different client API, different LB config, sticky sessions, no native auto-reconnect.	When the client legitimately needs to send messages on the same connection (real-time collaboration, multiplayer games, voice/video signalling, financial order entry).
HTTP/2 server push	Server-initiated streams for ancillary resources alongside the main response.	Effectively deprecated	High and getting higher — Chrome removed support in 2022; Firefox followed.	Don't. Server push was a different problem (resource preloading) misread as a streaming solution; the browsers walked it back. SSE survived; server push didn't.
gRPC server-streaming / bidirectional streaming	Strongly-typed protobuf payloads, binary, multiplexed streams, lower overhead per message, ecosystem of generated clients.	Production at scale	High — requires Connect-Web, grpc-web, or gRPC-Gateway to work in browsers (no native gRPC over HTTP/2 in browsers); mobile clients prefer native gRPC.	For internal service-to-service streaming, gRPC is strictly better than SSE. For browser-facing, the bridging cost rarely justifies the upgrade unless you're already deep in protobuf.
WebTransport (over HTTP/3 + QUIC)	Native bidirectional streaming over UDP/QUIC; unreliable datagrams plus reliable streams; sub-RTT connection establishment; survives network changes.	Emerging	Very high: shipped in Chromium browsers (Chrome, Edge) and in Firefox since version 114; Safari support has lagged; CDN and server-side support is still narrow.	For applications where mobile network handoffs and 0-RTT reconnect matter (live streaming, cloud gaming, AR/VR sync). Not yet ready as a default; watch for the next 2-3 years.
MoQ (Media over QUIC)	Subscription-based pub/sub over QUIC; designed for low-latency video and large fan-out; explicit cache hierarchy.	Standardization in progress (IETF)	Very high — protocol still in draft; few production deployments.	For media streaming at huge fan-out scale (sports, live concerts, gaming streaming). Overkill for text and JSON workloads where SSE already wins.
Long polling	Predecessor pattern. Works everywhere SSE works (and where SSE doesn't, e.g. some ancient corporate proxies that strip chunked encoding).	Production fallback	Low — implementation is trivial; reconnect logic is trivial.	As a fallback for environments where SSE genuinely doesn't work; for very-low-frequency notification systems (1 event per minute), the simplicity may win over the per-event overhead.
Resumable HTTP streaming over Redis Streams	Application-level pattern, not a new protocol. SSE on the wire, Redis Streams as the durable replay buffer, monotonic stream IDs as `id:` values.	Production-proven	Low — additive to existing SSE; just adds resumption.	Always, for any SSE stream where you cannot tolerate restart-on-reconnect. This is the canonical LLM streaming architecture and the right default for new SSE deployments.

Production Case Studies

How five categories of company actually use SSE — the architecture, the scale, the optimizations they shipped, and the structural reason they chose SSE over WebSockets.

OpenAI, Anthropic, Google — LLM Token Streaming

Use Case: LLM token streaming · Scale: 10M+ concurrent streams · Optim: Sub-200ms TTFT

The OpenAI Chat Completions API, the Responses API, the Anthropic Messages API, and Google's Gemini streaming endpoint all use SSE as the default wire transport when stream=true is set. The protocol shape is consistent across vendors: a single POST (these endpoints take a JSON request body, so clients read the stream with fetch or an SDK rather than the GET-only EventSource) returns an SSE response body, a series of events whose data: fields each carry a JSON delta. OpenAI's format emits data: {"choices":[{"delta":{"content":"The"}}]} chunks token-by-token until a final data: [DONE] sentinel; Anthropic uses named events (event: content_block_delta) for richer routing on the client.
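
The chunk format quoted above can be consumed with a few lines of parsing. This is a simplified sketch, assuming the stream has already been decoded into text lines and that every non-sentinel payload is a JSON object shaped like the example in this section. The function names are hypothetical, and the parser handles only comment lines and the `data:` field, not the full event-stream grammar.

```python
import json


def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of decoded lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # blank line ends the current event
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):            # comment line, e.g. a heartbeat
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):           # one optional space after the colon
            value = value[1:]
        if field == "data":
            data_parts.append(value)


def collect_text(lines):
    """Concatenate delta.content pieces; report whether [DONE] was seen."""
    pieces = []
    finished = False
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            finished = True
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            pieces.append(choice.get("delta", {}).get("content") or "")
    return "".join(pieces), finished
```

The `finished` flag is what lets a caller tell a cleanly completed generation from a stream that was cut off before the sentinel arrived.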

Why SSE and not WebSockets: LLM generation is fundamentally unidirectional — the model produces, the client consumes. There is nothing for the client to send on the same connection. SSE matches the workload exactly. Layering WebSockets on top would add the upgrade handshake (extra RTT), framing overhead (negligible but real), and infrastructure pain (sticky sessions, idle timeouts harder to manage) for zero benefit. The HTTP-nativeness also means OpenAI's API works through every corporate proxy, every CDN, every fetch-compatible client without special configuration.

Optimizations shipped in production:

(1) Multiple event types per stream for granular client logic — text deltas, tool call deltas, audio chunks, citations, lifecycle markers (response.created, response.completed, response.failed) are all separate event: types. This lets clients build typed handlers and skip events they don't care about.

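(2) Server-side continuation via opaque response IDs, not Last-Event-ID directly. OpenAI's previous_response_id parameter on the Responses API chains a new request onto a stored prior response. That is conversation-level continuation rather than mid-stream resumption, but it is structurally similar to Last-Event-ID: the server keeps the state, and the client presents an opaque ID to pick up from it, across full disconnections and across sessions.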

(3) Aggressive heartbeat tuning for the long-tail of slow generations. When a generation pauses (e.g. while a tool call resolves), the API emits a periodic comment line to prevent intermediaries from closing the connection.

(4) Explicit usage chunks at the end: setting stream_options: {include_usage: true} appends a final event with token counts before the [DONE] sentinel, letting clients display billing data without a separate request.

(5) Pre-published failure-mode documentation — every vendor explicitly tells you about the proxy buffering trap and the X-Accel-Buffering header. They learned the hard way and now bake it into the docs.

Why not WebSockets: Adds bidirectional capability the workload never uses; adds upgrade handshake latency; adds sticky-LB complexity; breaks on serverless and edge runtimes; bypasses the HTTP feature set (caching, logging, observability) every team already has. For LLM streaming, SSE is structurally the right answer and the entire industry has converged on it.

Cloudflare — Edge AI Streaming and MCP Transport

Use Case: Edge AI inference streaming · Scale: 200+ data centers, 4,000% YoY inference growth · Optim: Stateless serverless fit

Cloudflare Workers AI runs inference at 200+ edge data centers. Workers are stateless by design: any persistent state requires Durable Objects (a separate, more expensive primitive). The standard pattern for AI streaming on Workers is to convert the model's ReadableStream output into an SSE response via a TransformStream, and return it as a normal Response object. The entire connection lifecycle fits within a stateless Worker invocation because the Worker holds the connection open for as long as the stream runs — no need for Durable Objects, no per-connection state.

Why SSE and not WebSockets: WebSockets on Cloudflare Workers require Durable Objects to hold the bidirectional connection state. Durable Objects cost more (10x+ for a hot connection), are pinned to a specific colo location (losing the "run close to the user" benefit), and add operational complexity. SSE sidesteps all of this: the connection lives inside a stateless Worker, and any Worker in any colo can serve it. This is why Cloudflare's MCP server template uses SSE as the default transport — it's the only sane choice for edge-native multi-tenant runtimes.

Optimizations shipped:

(1) Native ReadableStream → Response pattern — no framework needed. The Workers AI binding returns a ReadableStream; the Worker wraps it in new Response(stream, {headers: {'content-type': 'text/event-stream'}}). Total code is single-digit lines.

(2) Pricing model rewards SSE — Workers bill on CPU time, not wall-clock. An SSE stream waiting on the upstream model accrues almost no CPU cost while the stream is open; this makes long-running streams economically viable in a way that traditional per-second compute would not.

(3) AI Gateway integration — Cloudflare's AI Gateway product can transparently proxy SSE streams from any upstream LLM provider (OpenAI, Anthropic, Workers AI itself), adding caching for repeated prompts, rate limiting, and observability without breaking the streaming contract.

(4) Compatibility with OpenAI SDK — Workers AI emits the exact SSE wire format OpenAI uses, so any OpenAI-compatible client library works unchanged.

(5) HTTP/2 and HTTP/3 by default — Cloudflare's edge speaks both, so the 6-connection-per-origin cap is a non-issue for any client behind Cloudflare's CDN.

Why not WebSockets: WebSockets on serverless are fundamentally an impedance mismatch — they assume long-lived stateful endpoints, while serverless assumes stateless ephemeral invocations. Cloudflare's solution (Durable Objects) works but is a separate, more expensive primitive. SSE eliminates the impedance entirely.

Robinhood, TradingView — Market Data Streaming

Use Case: Live tickers, market data feeds · Scale: Sub-second updates, thousands of concurrent viewers per symbol · Optim: Battery-efficient mobile reads

The financial industry is a mixed-protocol space: actively trading users hit WebSocket APIs (because they need to send orders on the same connection as receiving quotes), but read-only market data — ticker dashboards, watchlists, price-only widgets — is increasingly served over SSE. TradingView's Data API exposes both WebSocket and SSE endpoints for the same data, with SSE explicitly recommended for "one-way server-to-client data flow" because of its battery efficiency on mobile and simpler implementation. Robinhood's mobile clients are reported to use WebSockets for the active trading path (where order entry shares the connection) but the read-only ticker and watchlist updates fit the SSE shape.

Why SSE for read-only feeds: A ticker widget that just displays prices has zero need for an upstream channel. Using WebSockets here would mean paying for bidirectional capability that's never exercised, plus the operational tax of WebSocket-specific LB config, plus the battery drain of an active WebSocket on mobile (which is non-trivial compared to a quiescent SSE connection with TCP keepalives).

Optimizations shipped:

(1) Symbol multiplexing per connection — instead of one SSE connection per symbol (which would exhaust the 6-connection cap on HTTP/1.1), one connection subscribes to a comma-separated list of symbols passed in the URL. The server fans out from a Kafka topic per symbol.

(2) Server-side coalescing — high-frequency tick data is debounced at the server before being pushed; clients receive at most ~10 updates/sec per symbol regardless of underlying tick rate. This keeps mobile clients from being overwhelmed and prevents queue buildup on the server.

(3) JWT auth via query parameters — because EventSource can't set headers, JWTs are passed in the URL with short TTLs (typically 5-15 minutes). The client refreshes the token via a sibling endpoint and reconnects when needed.

(4) Kafka-backed replay — each symbol is a Kafka topic; the SSE handler reads from a consumer offset that corresponds to the client's last seen ID. Reconnects resume from the right offset, which keeps the price ladder consistent even through brief disconnects.

(5) Selective dropping under load — for ticker streams, dropping interim ticks is fine (the latest is what matters); for order book deltas, dropping is not OK and the system either keeps up or disconnects the client.
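
The server-side coalescing in item (2) amounts to "keep the latest value per symbol, release at most one update per interval". The following is a minimal sketch of that idea, assuming the caller supplies timestamps and drives `drain` from its own timer. The `TickCoalescer` class and the 0.1s interval (about 10 updates/sec per symbol, matching the figure above) are illustrative, not taken from any platform's code.

```python
class TickCoalescer:
    """Keep only the latest price per symbol; emit at most one update per interval."""

    def __init__(self, min_interval_s: float = 0.1):
        self._min_interval = min_interval_s
        self._pending = {}    # symbol -> latest price not yet sent
        self._last_sent = {}  # symbol -> time of the last emission

    def offer(self, symbol: str, price: float) -> None:
        # Newer ticks overwrite older ones: interim states are dropped on purpose.
        self._pending[symbol] = price

    def drain(self, now: float) -> dict:
        """Return the updates that may be sent at time `now` (in seconds)."""
        ready = {}
        for symbol in list(self._pending):
            last = self._last_sent.get(symbol)
            if last is None or now - last >= self._min_interval:
                ready[symbol] = self._pending.pop(symbol)
                self._last_sent[symbol] = now
        return ready


if __name__ == "__main__":
    coalescer = TickCoalescer(min_interval_s=0.1)
    for price in (189.10, 189.12, 189.11):
        coalescer.offer("AAPL", price)
    print(coalescer.drain(now=1.0))   # only the latest of the three ticks
```

This overwrite-on-arrival behavior suits ticker streams only. As item (5) notes, order book deltas cannot be dropped this way.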

Why not WebSockets: For read-only feeds, WebSockets add cost without benefit. The trading platforms that use WebSockets do so because they need order entry on the same connection. The pure-display use case (watchlist, public ticker, price chart) lines up cleanly with SSE.

Uber, DoorDash, Instacart — Real-Time Delivery Tracking (RAMEN)

Use Case: Live driver/courier location updates · Scale: 1M+ location requests/sec at Uber · Optim: At-least-once via sequence numbers

Uber's RAMEN (Real-time Asynchronous Messaging Network) was originally built on SSE, and it's one of the most-cited real-world SSE production architectures. The driver app pushes GPS coordinates to Uber's backend; the rider app subscribes to those coordinates over an SSE connection at /ramen/receive?seq=N. The server holds the connection open and streams location updates as they arrive from the driver. Heartbeats run at 4-second intervals (one byte per heartbeat) to keep the connection alive across carrier proxies. The "seq" query parameter implements Last-Event-ID semantics at the application layer.

Why SSE originally: In 2015, WebSocket support across mobile carriers and corporate proxies was unreliable. SSE's HTTP-native shape meant it worked through every carrier, every proxy, every NAT — wherever HTTP worked, SSE worked. The carriers were the binding constraint, not the protocol theory. Uber's later migration to gRPC bidirectional streams (for the driver side) happened only after the network stack matured and after gRPC tooling matured enough to handle mobile.

Optimizations Uber shipped on top of SSE:

(1) Application-level sequence numbers — every message carries a seq field. The client reconnects with ?seq=N in the URL, telling the server where to resume. This is Last-Event-ID semantics implemented in the URL because they wanted explicit control over the format.

(2) 4-second single-byte heartbeats — aggressive heartbeat cadence designed for cellular networks where idle connections die fast. The one-byte size minimizes battery and bandwidth impact (millions of devices * one byte per 4s adds up).

(3) At-least-once delivery via TCP + sequence checks — Uber relies on TCP for in-order delivery and uses the seq number to detect gaps; if a gap is detected, the client reconnects with the last-seen seq.

(4) Geographic region sharding — drivers are mapped to geographic regions; SSE subscribers receive only events for relevant regions. This prevents the fan-out problem of "every rider sees every driver".

(5) Eventual migration to gRPC bidirectional for the driver side once the workload outgrew unidirectional semantics (drivers needed to receive dispatch commands on the same connection). The rider-tracking path, which is genuinely one-way, stayed on SSE for longer.
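
The sequence-number handling in items (1) and (3) can be sketched as a small client-side tracker. This is an illustration of the described behavior, not Uber's code. The `SeqTracker` class is hypothetical, and it assumes consecutive integer sequence numbers and that `seq` in the URL carries the last sequence number the client has seen.

```python
from urllib.parse import urlencode


class SeqTracker:
    """Track the last-seen sequence number and classify each incoming message."""

    def __init__(self, last_seq: int = 0):
        self.last_seq = last_seq

    def accept(self, seq: int) -> str:
        if seq <= self.last_seq:
            return "duplicate"   # already seen (at-least-once delivery): ignore
        if seq > self.last_seq + 1:
            return "gap"         # something was missed: reconnect and resume
        self.last_seq = seq
        return "deliver"

    def resume_url(self, base: str = "/ramen/receive") -> str:
        # Reconnect telling the server where to resume from.
        return f"{base}?{urlencode({'seq': self.last_seq})}"


if __name__ == "__main__":
    tracker = SeqTracker()
    for seq in (1, 2, 2, 4):
        print(seq, tracker.accept(seq))
    print(tracker.resume_url())
```

Because a gap does not advance `last_seq`, the resume URL always points at the last message that was actually delivered in order.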

DoorDash and Instacart use architecturally similar patterns. Modern DoorDash documentation explicitly cites SSE as one of two acceptable transports (the other being WebSockets) for location streaming, choosing per-product based on whether the bidirectional capability is needed.

Why not WebSockets (in 2015): Carrier and proxy support was uneven; the operational tax of debugging "works on home WiFi but not on T-Mobile" outages was much higher than the cost of SSE's protocol limitations. By the time WebSockets became universally reliable (~2018+), Uber had built so much application infrastructure around SSE that the migration cost was high; they moved selectively where the bidirectional benefit justified it.

eBay, Amazon — Batch Upload and Bulk Operation Progress

Use Case: Progress bars for long-running merchant operations · Scale: Thousands of concurrent uploads per portal; minute-to-hour duration · Optim: Stateless backend + sparse update cadence

Seller-facing portals on eBay and Amazon Seller Central handle high-volume bulk operations: spreadsheet uploads with thousands of product configurations, bulk price updates, inventory imports, image processing jobs. The work happens asynchronously on a backend job queue; the seller's browser needs to show a live progress bar without polling the backend every second per active job. SSE is the canonical pattern here: the seller opens an SSE connection to a progress endpoint keyed by the job ID, and the backend pushes progress events as the job advances.

Why SSE and not polling: Polling at 1-second intervals across thousands of concurrent jobs creates significant database load and wastes most of the requests (most polls find no change). SSE inverts this: the server pushes only when there is actual progress, often very sparse (1-10 events per minute for a slow job). For the seller, the UI feels live without the backend cost of polling.

Why SSE and not WebSockets: Progress streams are one-way by nature (the server reports, the client displays). WebSockets add complexity for nothing in return. SSE's HTTP-nativeness also means the progress endpoint composes cleanly with the existing seller portal authentication (cookie-based), with no special LB config.

Optimizations shipped:

(1) Job state in DB, SSE handler reads from change stream — DynamoDB Streams, Postgres LISTEN/NOTIFY, or Redis Streams provide change notifications; the SSE handler subscribes and forwards. This makes the SSE layer purely a transport, with all durability in the DB.

(2) Resumable from any point via the job ID; reconnecting reads the current job state from the DB and replays missed progress events. The Last-Event-ID pattern maps to a sequence number stored on each progress event.

(3) Sparse heartbeats for slow jobs — a job that's slow but not stuck still sends a heartbeat every 30s so the connection survives LB timeouts and the seller doesn't see "Connection lost".

(4) Completion sentinel — every job emits a final event: complete with the result summary (rows imported, errors, warnings) before closing the stream; the client knows the job finished cleanly.

(5) Multi-job multiplexing — a single SSE connection can carry events for multiple jobs the seller has running; the event: field carries the job ID, and the client dispatches updates to the right UI element.

Why not polling: Across the seller-portal scale, polling at one read per second per active job puts 60 reads per minute per job on the job-state DB for no UX benefit; SSE reduces that to ~1 read per actual progress event. At the 1-10 events per minute cadence described above, that is a 6-60x reduction in DB load, and more for sparser jobs.
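
A back-of-the-envelope check of the read-load comparison, using only the cadences stated in this section (one poll per second per job versus 1-10 pushed progress events per minute). The function name is illustrative.

```python
def read_reduction(poll_interval_s: float, events_per_minute: float) -> float:
    """Ratio of DB reads under polling to DB reads under push, per job per minute."""
    polls_per_minute = 60.0 / poll_interval_s
    return polls_per_minute / events_per_minute


if __name__ == "__main__":
    for events in (1, 10):
        ratio = read_reduction(poll_interval_s=1.0, events_per_minute=events)
        print(f"{events} events/min -> {ratio:.0f}x fewer reads")
```

The ratio depends entirely on how sparse the real progress events are, so it is worth measuring the actual event rate before quoting a savings figure.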
