Kafka vs Kinesis vs Pulsar

Three distributed logs, three storage philosophies, three operational regimes. The cells below explain when each one stops being the right answer.

Streaming · Head-to-Head

As of 2026-06-01 · Kafka 4.3.0 · Kinesis OD-Advantage · Pulsar 4.0 + Oxia

PE Verdict

Storage coupling is the master variable. Kafka couples (highest throughput, painful rebalance), Pulsar separates (elastic, three-tier ops cost), Kinesis hides (predictable bill, no tuning surface).

The decision rarely turns on throughput. It turns on topic count, headcount, and how many AZs you're willing to pay AWS to cross.

Best default choices

01 · Trade-Offs

One row per distinct trade-off. A trade-off is something where you give up X to get Y. "Fast" is not a trade-off.

Kafka 4.3 · Trade-Offs

Trade-Off | What You Gain | What You Give Up | When It Bites | PE Nuance
Coupled compute + storage on broker | Highest throughput-per-dollar at sustained load; zero-copy sendfile path; locality wins for hot reads. | Adding capacity requires moving partition data; rebalancing is a planned ops event. | A failed broker on a 50 TB cluster takes hours to fully re-replicate. Cruise Control automates it, but the bytes still move. | Tiered storage (KIP-405) softens but does not remove this. Hot tier still on broker SSD; rebalance only copies hot tier.
Pull-based consumer model | Free backpressure; consumer controls rate; replay from arbitrary offset is trivial. | Baseline poll latency adds 1-5ms on idle/low-volume topics. | Low-QPS topic with sub-5ms SLO. Tune fetch.min.bytes and fetch.max.wait.ms. | Long-polling masks the latency for most workloads. Pull is the correct default; push systems pay this cost in flow-control complexity instead.
Partition as the parallelism unit | Total order per partition; clear ownership; one consumer per partition gives easy correctness. | Cannot scale parallelism above the partition count without re-keying everything. | Under-partitioned topic at year 2 needs full producer re-key migration to add consumers. | Over-partition early. Cost is per-partition overhead (controller load, replication threads); cost of under-partitioning is offline migration.
KRaft replaces ZooKeeper (4.0+) | Single-system operation; partition ceiling jumped from ~200K to 1M+; faster failover. | Newer code path than ZK; controller bugs at extreme partition counts still surfacing through 4.x series. | Edge cases at >500K partitions where controller fetch saturates. KIP-1219 (4.3) added tuning knobs. | 4.3 removed the ZK migration path entirely. Upgrade story is cleaner now but 4.x is still the maturing release line.
Transactional EOS | Atomic multi-partition writes; read-process-write pipelines without app-level dedup. | ~20% throughput overhead; coordinator state to manage; longer recovery on coordinator failover. | Hung transactions blocking partition progress. KIP-890 (partition verification) closed the worst patterns. | EOS is "exactly-once within Kafka", not "across systems". The application is still responsible for idempotency at every external sink.
Tiered storage (KIP-405) | Unbounded retention at S3 prices; broker disk stays small; rebalance becomes cheap. | Cold-read latency increases 10x or more; remote log manager adds operational surface. | Incident replay of week-old data saturates broker fetch threads and blocks live traffic. | Per-topic config. Only enable on high-retention topics. KIP-1235 (4.3) fixed default ISR for metadata topic, which was a foot-gun before.
ISR durability model | Tunable durability via min.insync.replicas; strong by default with RF=3. | Write availability tied to ISR size; an AZ blip can shrink ISR below min and reject writes. | Regional network event shrinks the ISR below min.insync.replicas; producers see write rejections on partitions whose leader is healthy. | Canonical config is RF=3, min.insync.replicas=2, unclean.leader.election.enable=false. The "lose availability vs lose data" knob is here.
JVM-based broker | Massive ecosystem; mature tools; well-understood tuning; every APM vendor speaks JMX. | GC tuning matters; page cache discipline required; off-heap allocations need deliberate config. | G1 pauses at high throughput cause producer timeouts. Page cache thrashing under multi-topic load. | Provision 2x active topic working set as page cache headroom. Consider ZGC for low-pause requirements. Redpanda exists because of this.
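
The "partition as the parallelism unit" row is easier to see with numbers. The sketch below assumes a plain hash-mod-N partitioner and uses `zlib.crc32` as a deterministic stand-in hash (Kafka's default partitioner uses murmur2, so the exact partitions differ); `partition_for` and `moved_keys` are hypothetical helper names. It counts how many keys change partition when a topic grows from 6 to 12 partitions.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with hash(key) mod N.

    zlib.crc32 is a stand-in for Kafka's murmur2; only the mod-N
    behaviour matters for this illustration.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose partition changes when the count changes."""
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]


keys = [f"user-{i}" for i in range(10_000)]
moved = moved_keys(keys, old_count=6, new_count=12)
print(f"{len(moved)} of {len(keys)} keys change partition going 6 -> 12")
```

Any key that moves has its older records in one partition and its newer records in another, which is why per-key ordering across the change needs a migration rather than a config flip.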

Kinesis Data Streams · Trade-Offs

Trade-Off | What You Gain | What You Give Up | When It Bites | PE Nuance
Fully managed serverless | Zero broker operation; auto-scaling without ops review; AWS-grade availability SLA. | Per-GB markup; no protocol-level optimization; no visibility into broker behavior. | Sustained 10 TB/hour, where cost exceeds self-managed Kafka by 3-5x even including headcount. | TCO-with-headcount wins for Kinesis below ~5-10 TB/day. Above that, the per-GB premium dwarfs the on-call savings.
Shard as capacity + billing unit | Predictable per-shard limits: 1 MB/s in, 2 MB/s out, 1000 records/s. Easy math. | Hard ceiling. No oversubscription. Cannot tune around hot keys. | A skewed partition key triggers ProvisionedThroughputExceededException, and there is no server-side knob to relieve it. | Pick partition key with care. High-cardinality random suffix is the workaround, then group downstream.
On-Demand auto-scale to 2x prior 30-day peak | Survives normal traffic spikes without manual resharding; "set and forget" sizing. | Does not handle cold-start spikes >2x or 5x sudden ramps; scale-up window is ~15 min. | Black Friday cold-start; new product launch traffic; viral event. Writes are throttled until capacity catches up. | Pre-warm by writing synthetic traffic before known events. Or stay on provisioned mode for known-spiky workloads.
AWS-only deep integration | Firehose to S3/Redshift in one click; Lambda event source mapping; native KMS, IAM, CloudWatch. | Vendor lock-in; cross-cloud replication requires custom plumbing. | Multi-cloud strategy mandates or post-M&A integration with non-AWS shop. | KCL is open source. Some portability via Kafka-on-Kinesis tools but operationally expensive. Build assuming you stay on AWS.
At-least-once delivery semantics | Simpler producer/consumer code; no transactional state to manage; smaller surface. | No exactly-once. Dedup is your job, in your consumer, every time. | Payments, financial event sourcing, idempotency-critical pipelines. Retry storms generate duplicates. | Canonical pattern: DynamoDB conditional write keyed on shard ID plus the Kinesis SequenceNumber (sequence numbers are only unique within a shard). Costs a DDB call per record.
365-day max retention | Long enough for most audit, replay, and ML feature-store windows. Tiered retrieval pricing. | Hard ceiling. Cannot retain forever. Long-term retrieval is charged separately per GB. | Regulatory 7-year retention; ML retraining on multi-year data; full event-sourcing replay. | Pair Kinesis with Firehose-to-S3 archive for true long-term. Treat Kinesis as the working window, S3 as the cold tier.
Enhanced Fan-Out (EFO) | Push-based; per-consumer dedicated 2 MB/s; sub-second latency; bypasses shared shard egress. | Per-consumer-hour charge; 20 consumer cap on Standard, 50 on OD-Advantage. | Many-consumer fan-out (ML feature store with N readers, multi-team analytics). | Pool consumers behind a single EFO subscription; fan out further downstream via SNS or in-app routing.
KCL/KPL client libraries | Best-in-class checkpoint management, lease balancing, automatic resharding handling. | JVM-heavy; complex to operate; KCL 2.x checkpoint semantics non-trivial. | Non-JVM language teams. Debugging lease races. Checkpoint races on consumer restart. | KCL 3.x improvements ongoing. For simple workloads consider direct SDK + manual checkpointing in DynamoDB.
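
The random-suffix workaround for hot keys can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the `#` separator, the bucket count of 8, and the helper names `salted_key`, `unsalted_key`, and `regroup` are all hypothetical, and records are modeled as `(partition_key, timestamp, payload)` tuples rather than real Kinesis records.

```python
import random
from collections import defaultdict

SALT_SEPARATOR = "#"  # hypothetical delimiter; must not occur in real keys


def salted_key(key: str, buckets: int, rng: random.Random) -> str:
    """Append a random bucket suffix so one hot key spreads over `buckets` hash values."""
    return f"{key}{SALT_SEPARATOR}{rng.randrange(buckets)}"


def unsalted_key(salted: str) -> str:
    """Strip the suffix to recover the original key downstream."""
    return salted.rsplit(SALT_SEPARATOR, 1)[0]


def regroup(records):
    """Group (salted_key, timestamp, payload) records by original key, ordered by timestamp."""
    groups = defaultdict(list)
    for salted, ts, payload in records:
        groups[unsalted_key(salted)].append((ts, payload))
    return {key: [p for _, p in sorted(items)] for key, items in groups.items()}


rng = random.Random(7)
events = [(salted_key("tenant-42", 8, rng), ts, f"event-{ts}") for ts in range(5)]
random.Random(1).shuffle(events)  # simulate interleaved arrival from several shards
ordered = regroup(events)
```

The cost named in the table shows up in `regroup`: ordering is no longer given by the shard and has to be rebuilt from a timestamp the producer supplies.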

Pulsar 4.0 · Trade-Offs

Trade-Off | What You Gain | What You Give Up | When It Bites | PE Nuance
Stateless brokers + BookKeeper storage | Brokers scale in seconds; storage scales independently; failure isolation between layers. | Three components to operate (brokers, bookies, Oxia). Upgrade choreography. | Bookie under-replication alerts. Cross-component performance debugging. | Elasticity win is real. Ops headcount is ~1.5x Kafka's. Decision is whether you want to pay for that elasticity.
Segment-based replication via BookKeeper ensemble | Topic data spreads across many bookies; no broker-pinned partition; faster recovery from bookie loss. | Write path adds bookie quorum hop; 2-3ms latency penalty vs Kafka's direct local write. | Sub-5ms p99 SLOs. Kafka's tuned local SSD path is hard to beat for raw latency. | Worth it for elasticity and multi-tenancy. Not worth it for single-tenant low-latency workloads.
Native multi-tenancy (tenant / namespace / topic) | 1M+ topics per cluster; per-namespace quotas, retention, replication, authentication. | Mental model complexity; up-front design of tenant/namespace boundaries. | Teams skipping namespace design end up with messy isolation 18 months in. | SaaS platforms get isolation free. Single-tenant workloads pay the complexity tax without the benefit.
Both streaming and queueing in one system | Four subscription modes (Exclusive, Shared, Failover, Key_Shared) cover queue and stream patterns. | API complexity; mode is per-subscription; misconfiguration is silent. | Subscription mode mismatch produces either out-of-order delivery or single-consumer bottleneck. | Key_Shared for ordered parallelism. Shared for queue-style work. Exclusive for "consumer group" semantics. Document the choice per topic.
Push-based consumer with flow control | Lower idle latency than pull; broker manages dispatch rate; less network chatter. | Backpressure is the consumer's job via receiveQueueSize; getting it wrong is silent. | Slow consumer with high receiveQueueSize causes broker memory pressure and topic backlog. | Tune receiveQueueSize to working set, not "max". Monitor broker dispatcher metrics, not just consumer lag.
Native geo-replication | Configured at namespace level; async or sync; survives full-region loss with no MirrorMaker. | Instance-level configuration store (extra coordination layer) at the global scope. | WAN brownout causes replication lag spikes; rare-path failover code seldom drilled. | Async is the default and correct choice. Sync only for low-write financial workloads where DC loss must not lose committed data.
Oxia replaces ZooKeeper (4.0+) | Million-topic ceiling; faster metadata operations; cleaner ops than ZK. | Newest piece in the stack. Less production-tested than ZK or KRaft. | Split-brain edge cases not yet fully mapped at the longest tail of failure scenarios. | Run 5-node Oxia ensemble minimum. Pin versions carefully through the 4.x line until major-version stability is proven.
Tiered offload (S3 / GCS / Azure) | Unbounded retention; offload entire ledgers transparently; reads pass through broker via offloader. | Cold-read latency penalty; offloader plugin tuning surface; per-namespace policy. | Year-old data replay during incident pulls hard on bookies plus S3 simultaneously. | Schedule offload during off-peak windows. Size broker managed-ledger cache for hot working set.
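
The subscription-mode mismatch is easiest to see in a toy dispatcher. The sketch below is a deliberate simplification: Shared dispatch is modeled as strict round-robin and Key_Shared as hash-mod-N over consumers, whereas a real broker dispatches by consumer permits and hash ranges. The helper names and the `zlib.crc32` hash are assumptions for illustration.

```python
import zlib
from itertools import cycle


def dispatch_shared(messages, consumers):
    """Shared-style dispatch: round-robin across consumers, ignoring the key."""
    assignment = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        assignment[consumer].append(msg)
    return assignment


def dispatch_key_shared(messages, consumers):
    """Key_Shared-style dispatch: all messages with one key go to one consumer."""
    assignment = {c: [] for c in consumers}
    for key, seq in messages:
        index = zlib.crc32(key.encode("utf-8")) % len(consumers)
        assignment[consumers[index]].append((key, seq))
    return assignment


def consumers_per_key(assignment):
    """Count how many distinct consumers received each key."""
    seen = {}
    for consumer, msgs in assignment.items():
        for key, _ in msgs:
            seen.setdefault(key, set()).add(consumer)
    return {key: len(owners) for key, owners in seen.items()}


messages = [(f"order-{i % 3}", i) for i in range(12)]
consumers = ["c1", "c2"]
shared = consumers_per_key(dispatch_shared(messages, consumers))
key_shared = consumers_per_key(dispatch_key_shared(messages, consumers))
```

In this model every key reaches both consumers under Shared, so per-key order depends on how fast each consumer runs, while Key_Shared keeps each key on a single consumer.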

02 · Use Cases

Real deployments with named drivers. The Driving Property is the one thing that pinned the choice; everything else is consequence.

Kafka · Use Cases

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Event sourcing + CDC backbone | LinkedIn, Stripe, Shopify | Unbounded retention + sub-10ms p99 at sustained TB/hour | ~7 trillion events/day at LinkedIn; 100s of MB/s sustained | Kinesis: 365-day retention ceiling. Pulsar: ecosystem still catching up on CDC connectors (Debezium is Kafka-native).
Real-time analytics pipelines | Netflix, Pinterest, Uber | Sub-10ms producer-to-consumer p99 at 100+ MB/s sustained | 1.6M msg/s at tier-1 banks; 100s of brokers per cluster | Kinesis: 70-200ms floor disqualifies. Pulsar: viable but Flink/Spark integration leans Kafka.
Microservice async backbone | Uber, Airbnb, Amazon Prime | Connect ecosystem + 17 language clients + community tooling | 1000+ topics, 100s of services, cross-team contracts | Kinesis: cross-region support requires custom plumbing. Pulsar: client coverage gaps for non-JVM polyglot orgs.
Log aggregation | Datadog ingest, internal logging pipelines | Cost-per-GB at TB/hour ingest with multi-day retention | PB-scale ingest; 7-30 day retention | Kinesis: per-GB cost untenable at this volume. Pulsar: team familiarity tax, no clear architectural win.
Financial event sourcing with EOS | Robinhood-style trade pipelines, ledger systems | Multi-partition atomic writes via transactional producer | 10s of K msg/s with strict idempotency | Kinesis: at-least-once requires app-layer dedup table per consumer. Pulsar: dedup is simpler but no multi-partition atomicity.
ML feature pipelines | Uber Michelangelo, Spotify ML, Pinterest | Replay-from-offset + Connect ecosystem + Streams DSL | 1000s of feature topics; weeks of replay window | Kinesis: replay window capped; KCL checkpoint races on consumer restart. Pulsar: viable but tooling gap on ML side.
Stream processing input layer | Flink jobs, Spark Streaming, Kafka Streams | Native partition model is the de facto stream processing primitive | 100s of topics feeding 10s of processors | Kinesis: Flink Kinesis connector works but partition mapping is awkward. Pulsar: Flink connector mature but ecosystem leans Kafka.

Kinesis · Use Cases

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Mobile / IoT ingest | AWS Mobile SDK shops, consumer IoT vendors | Cognito auth + zero broker ops + AWS-native PKI | 100K-1M devices; bursty per-device traffic | Kafka: PKI/auth integration for device fleets is a project. Pulsar: same plus newer to mobile SDK landscape.
CloudWatch Logs streaming | Enterprise AWS shops, observability pipelines | Native CloudWatch subscription filter integration | Account-wide log volume; TB/day | Kafka: integration is build-it-yourself. Pulsar: same.
Clickstream into Redshift / S3 | E-commerce on AWS, content platforms | Firehose 1-click sink to S3/Redshift/OpenSearch | 10K-100K events/sec sustained | Kafka: at <1TB/day TCO including headcount loses by 3-4x. Pulsar: same TCO problem at smaller scale.
Lambda fan-out trigger | Serverless-first AWS shops | Native event source mapping with parallelization factor | 1000s of invocations/sec per stream | Kafka: MSK Lambda trigger exists but brittle on rebalance. Pulsar: no native Lambda trigger.
Startup MVP with streaming need | Early-stage product with no streaming engineer on staff | 0.05 FTE ops cost; ship before raising Series A | <100 GB/day; 5-10 microservices | Kafka: needs a dedicated platform engineer. Pulsar: same plus less hiring pool.
Bounded retention compliance | Healthcare, payments, audit trails on AWS | 365-day retention out of the box; HIPAA / PCI-eligible | Regulatory window; not throughput-driven | Kafka: same retention possible via tiered storage but compliance docs are your problem. Pulsar: similar.
Real-time ML inference pipeline | SageMaker-based inference pipelines | Native SageMaker + Lambda + DynamoDB integration | 10K-100K inference req/sec | Kafka: SageMaker doesn't integrate natively; you build the bridge. Pulsar: same.

Pulsar · Use Cases

Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Multi-tenant SaaS event platform | Yahoo Japan, Tencent, Splunk Observability | Native tenant/namespace isolation at 100K+ topics | 1M+ topics per cluster; 1000s of tenants | Kafka: topic ceiling + isolation needs ACL gymnastics or cluster sprawl. Kinesis: no topic abstraction at this scale.
Geo-replicated event bus (3+ regions) | Iterable, Verizon Media (formerly Yahoo) | First-class async geo-replication configured per namespace | 3-5 active regions; cross-region failover drilled | Kafka: MirrorMaker 2 works but adds an ops layer with its own failure modes. Kinesis: no primitive; build custom Lambda fan-out.
IoT platforms with millions of device topics | Comcast video platforms, large telco IoT | Million-topic ceiling; one topic per device or session | 10M+ devices; topic-per-device | Kafka: partition ceiling and per-topic overhead make this uneconomic. Kinesis: stream-per-device is API-call-limited.
Unified queue + stream workloads | Iterable email/SMS infrastructure, marketing platforms | Shared subscription for queue semantics plus offset replay for stream | 1000s of queues with stream replay needs | Kafka: no queue semantics natively (Share Groups in 4.x still maturing). Kinesis: no queue semantics; replay is limited to the retention window.
Financial transaction processing | Tencent Billing (migrated from Kafka) | Multi-region sync replication with strong durability per ledger | Per-tenant ledger isolation at 100s of tenants | Kafka: multi-tenancy + sync geo-replication forces cluster-per-tenant sprawl. Kinesis: no cross-region primitive.
AI agent platforms with per-session memory | Emerging agentic AI platforms; per-agent or per-session topic isolation | Topic-per-session at >100K sessions; isolation + lifecycle per namespace | 100K-1M concurrent sessions; short-lived topics | Kafka: per-topic broker overhead at million scale. Kinesis: stream-per-session is API-cost-prohibitive.
Mixed retention per tenant | Analytics SaaS with tier-based retention SLAs | Per-namespace retention policy; trial users get 7d, enterprise gets 365d | 10s of tiers; 1000s of customers | Kafka: retention is topic-level; would require topic-per-tier. Kinesis: retention is stream-level with same problem.

03 · Limitations

Cross-tech limitation matrix. Where each system is constrained, how badly, and what the workaround costs you.

Limitation Axis | Kafka 4.3 | Kinesis | Pulsar 4.0
Topic / partition ceiling | Medium: ~1M partitions per KRaft cluster (4.3 tested). Controller load grows with partition count. Workaround: cluster sharding (multi-cluster). Cost: ops surface multiplies. | High: Stream is the unit, not topic. Per-account shard limits (20K in US-East/West/Ireland, 6K elsewhere). Workaround: limit increase ticket. Cost: AWS support latency. | Low: Designed for 1M+ unique topics per cluster (Oxia-backed). Workaround: N/A at most scales.
p99 latency floor | Low: ~5ms achievable end-to-end at sustained 100MB/s+. Workaround: N/A. This is the floor. | High: ~70-200ms end-to-end typical. EFO improves read, not write. Workaround: EFO + KPL aggregation. Cost: per-EFO-consumer fee. | Medium: ~5-15ms typical. Bookie quorum hop adds 2-3ms vs Kafka. Workaround: tune ensemble + Qack. Cost: reduced durability margin.
Multi-region cost | High: MirrorMaker 2 ops burden + cross-region egress. Workaround: Confluent Cluster Linking. Cost: vendor license. | Critical: No native cross-region replication. Workaround: Lambda cross-region forwarder. Cost: custom code + per-record Lambda invocation. | Low: Native per-namespace geo-replication. Workaround: N/A.
Hot key / partition skew | Medium: Hot partition saturates one broker leader. Workaround: custom partitioner + key salting. Cost: downstream re-grouping. | High: 1 MB/s hard cap per shard, no exception. Workaround: random suffix + reshard. Cost: reshard latency, downstream complexity. | Medium: Hot topic concentrates broker load. Workaround: Key_Shared + manual bundle split. Cost: tuning effort.
Vendor lock-in | Low: Open source protocol; many distributions. Workaround: N/A. | Critical: AWS-only; proprietary API and SDKs. Workaround: Kafka-on-Kinesis proxy. Cost: performance hit, build effort. | Low: Open source; KoP for Kafka client compatibility. Workaround: N/A.
Ecosystem breadth | Low: 17+ official clients; every APM, ETL, and stream processor speaks Kafka. Workaround: N/A. | Medium: Strong AWS-native, thin elsewhere. Workaround: KCL + custom adapters. Cost: integration work. | Medium: 6 official clients; growing but smaller than Kafka. Workaround: KoP proxy for Kafka clients. Cost: translation hop.
Operational FTE cost | High: ~0.5-1 FTE at 10 TB/day self-managed. Workaround: Confluent Cloud or MSK. Cost: 2-3x infra premium. | Low: ~0.05 FTE; AWS owns ops. Workaround: N/A. | High: ~0.75-1.5 FTE for the three components. Workaround: StreamNative Cloud. Cost: managed-service premium.
Storage retention ceiling | Low: Unbounded via tiered storage. Workaround: N/A. | High: 365-day hard ceiling. Workaround: Firehose to S3 archive. Cost: dual ingest pipeline. | Low: Unbounded via tiered offload. Workaround: N/A.
Throughput ceiling | Low: Linearly scales with brokers. ~605 MB/s on commodity cloud nodes. Workaround: N/A. | Medium: OD: 2 GB/s in, 4 GB/s out per stream (with limit raise). Workaround: stream sharding at app level. Cost: routing logic. | Low: Comparable to Kafka in tuned benchmarks; bookie scaling is independent. Workaround: N/A.
Exactly-once semantics | Low: Native EOS within Kafka via transactional producer + read_committed. Workaround: N/A. | Critical: At-least-once only. Workaround: DDB conditional write on SequenceNumber. Cost: per-record DDB call. | Low: Producer-side dedup per namespace. No multi-partition atomicity but simpler model. Workaround: N/A for single-partition.
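
The conditional-write workaround in the Kinesis exactly-once cell can be sketched without AWS at all. `ConditionalStore` below is a hypothetical in-memory stand-in for a table with put-if-absent semantics (the role a DynamoDB conditional write would play), and the record dictionaries are illustrative shapes, not SDK types. The dedup key pairs the shard ID with the sequence number, on the assumption that sequence numbers are only unique within a shard.

```python
class ConditionalStore:
    """In-memory stand-in for a table that supports put-if-absent (hypothetical helper)."""

    def __init__(self):
        self._seen = set()

    def put_if_absent(self, key) -> bool:
        """Return True if the key was newly recorded, False if it already existed."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def process_once(records, store, handler):
    """Apply handler to each record at most once, keyed on (shard_id, sequence_number)."""
    applied = 0
    for record in records:
        dedup_key = (record["shard_id"], record["sequence_number"])
        if store.put_if_absent(dedup_key):
            handler(record)
            applied += 1
    return applied


store = ConditionalStore()
ledger = []
batch = [
    {"shard_id": "shardId-000", "sequence_number": "101", "amount": 5},
    {"shard_id": "shardId-000", "sequence_number": "102", "amount": 7},
]
first = process_once(batch, store, lambda r: ledger.append(r["amount"]))
replayed = process_once(batch, store, lambda r: ledger.append(r["amount"]))
```

Two caveats on this sketch: the key is recorded before the handler runs, so a handler crash would skip that record on replay, and a producer-side retry arrives with a new sequence number, so it would need a producer-supplied ID to be caught.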

04 · Fault Tolerance

How each system survives the failure modes that show up in real on-call rotations.

Dimension | Kafka 4.3 | Kinesis | Pulsar 4.0
Replication model | Leader-follower per partition, ISR-based. Typical production config is RF=3 with min.insync.replicas=2. | Hidden. AWS-internal multi-AZ; documented as "synchronously replicated across 3 AZs". No tuning surface. | BookKeeper ensemble per ledger. Write quorum and ack quorum configurable per namespace (e.g., E=3 W=3 A=2).
Failure detection | KRaft controller heartbeats; replica.lag.time.max.ms drives ISR shrink. | AWS-internal. You see PutRecords failures and ProvisionedThroughputExceeded, not node-level failure. | ZK/Oxia session timeout for broker; bookie health checks; broker dispatcher heartbeat to client.
Failover mechanism | Controller elects new partition leader from ISR. Producer metadata refresh triggers reconnect. | Transparent to clients. SDK retries are automatic; shard reassignment is invisible. | Topic ownership reassigned to surviving broker (stateless, instant). Bookie failure triggers BookKeeper auto-recovery.
RTO (typical) | 10-30s for partition leader failover. Longer if controller is the failing node. | Seconds for shard reassignment. RTO from AWS-internal failures is not user-visible. | Sub-second for broker topic ownership transfer (stateless). Seconds-to-minutes for bookie quorum reformation.
RPO (typical) | Zero with acks=all + min.insync.replicas=2. Up to last unacked batch otherwise. | Zero for acknowledged PutRecord calls. AWS commits synchronously across 3 AZs. | Zero with E=3 W=3 A=2. Configurable per namespace.
Split-brain behavior | KRaft's Raft quorum prevents split-brain by design. ZK era could see brief stale-leader windows. | N/A at user level. AWS-internal concern. | Oxia leader lease prevents broker-side split-brain. BookKeeper ledger fencing prevents storage-side split-brain.
Blast radius of single-node failure | All partitions led by that broker re-elect. Followers continue serving. Cross-broker work for new leaders. | Single-shard or single-AZ failure invisible to users. AWS handles it. | Broker failure: topics shift to others (cheap). Bookie failure: only ledgers in its ensemble affected (narrow blast radius).
Cross-region failover story | Not built-in. MirrorMaker 2 or Cluster Linking required. Failover is an app-layer concern. | Not built-in. Cross-region replication requires Lambda forwarder pattern. | Native via geo-replication. Configure async or sync at the namespace level. Failover is consumer-side reconnect.
Data loss scenarios | unclean.leader.election.enable=true + ISR shrink to one node + that node fails = data loss. Mitigate with min.insync.replicas=2 and unclean.leader.election.enable=false. | Producer not retrying on ProvisionedThroughputExceeded; SDK default has bounded retry. Data loss possible if app drops the retry. | Quorum loss in a ledger (e.g., 2 of 3 bookies down) blocks writes until recovery; data loss only with E=2 W=2 A=1 and double failure.
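
The E/W/A notation in the Pulsar column reduces to two small subtractions. The sketch below is a worst-case model under stated assumptions: an acknowledged entry is guaranteed to be on at least A bookies at ack time, and up to W minus A bookies in the write set can lag without delaying the ack. The function names are hypothetical, and the model ignores timing and auto-recovery.

```python
def validate_quorum(ensemble: int, write_quorum: int, ack_quorum: int) -> None:
    """Check the ordering constraint 1 <= A <= W <= E."""
    if not (1 <= ack_quorum <= write_quorum <= ensemble):
        raise ValueError("require 1 <= ack quorum <= write quorum <= ensemble")


def loss_free_bookie_failures(ensemble: int, write_quorum: int, ack_quorum: int) -> int:
    """Worst-case bookie losses an acknowledged entry survives (it sits on >= A bookies)."""
    validate_quorum(ensemble, write_quorum, ack_quorum)
    return ack_quorum - 1


def lagging_bookies_tolerated(ensemble: int, write_quorum: int, ack_quorum: int) -> int:
    """Bookies in the write set that can be slow without holding up the ack."""
    validate_quorum(ensemble, write_quorum, ack_quorum)
    return write_quorum - ack_quorum


# The configuration used as the example throughout the tables: E=3, W=3, A=2.
print(loss_free_bookie_failures(3, 3, 2))   # 1
print(lagging_bookies_tolerated(3, 3, 2))   # 1
```

Raising A buys durability margin at the cost of write latency, which is the "reduced durability margin" trade named in the latency row of the limitations matrix when A is tuned down.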

05 · Sharding

The unit of parallelism, how it's keyed, and what changing it costs.

Dimension | Kafka 4.3 | Kinesis | Pulsar 4.0
Sharding model | Hash partitioning by key (default murmur2). Custom partitioners allowed. | Hash range over MD5(partition key) mapped to shard hash space. | Hash by routing key (Java/native client). For non-partitioned topics, single-broker ownership.
Shard key constraints | Any byte sequence. Null key = sticky batching across partitions (round-robin before 2.4). Same key = same partition (ordering guarantee). | Up to 256 char string. Optional ExplicitHashKey to override. Same key = same shard. | Any byte sequence. Routing modes: Round-robin, Single-partition, Custom.
Rebalancing mechanism | Manual partition reassignment via admin tool or Cruise Control. Cooperative rebalancing for consumer groups (KIP-429). | Shard splits and merges are API-driven on provisioned and automatic on OD, where traffic triggers them. | Broker bundle redistribution is automatic. BookKeeper auto-recovery rebalances ledger segments on bookie failure.
Rebalancing cost / impact | Multi-hour to multi-day for TB-scale partition moves. Network and disk I/O intensive. Cruise Control rate-limits. | Shard split takes ~30s; merge ~30s. Some records may be processed twice during the transition. | Broker bundle moves are seconds (stateless brokers). Bookie ledger rebalance is background, no traffic impact.
Hot-shard behavior | Hot partition saturates leader broker. Producer latency on that partition spikes; others unaffected. | Hard 1 MB/s ingest cap. Excess returns ProvisionedThroughputExceededException. No oversubscription. | Hot bundle migrates to less-loaded broker via load shedding (configurable thresholds).
Maximum shards (practical) | ~1M partitions per cluster (KRaft, 4.3). Per-partition cost grows with replication threads + file handles. | 20K shards per account in tier-1 regions; 6K elsewhere. Per-stream practical limit ~1000s. | Topic partitions effectively unbounded at cluster level; bundles are the load-balancing unit, not partitions.
Resharding without downtime? | Adding partitions: yes, online (but it changes the key-to-partition mapping, so existing keys can land on a different partition). Reducing: no, requires migration. | Yes. UpdateShardCount API on provisioned; automatic on OD. Brief duplicate processing during transition. | Yes. Increase partition count via admin API. Decrease requires topic migration.
Cross-shard query support | No, by design. Streams DSL provides app-layer joins. Cross-partition transactions via EOS. | No. Consumer pulls per-shard; aggregation is app-side. | No native cross-partition queries. Pulsar Functions provides per-message routing/aggregation.
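
The Kinesis sharding model in the table (MD5 of the partition key mapped into a shard hash space) can be reproduced locally. The sketch assumes shards split the 128-bit hash space into equal contiguous ranges, which holds for a freshly created stream but not necessarily after splits and merges; `even_shard_ranges` and `shard_for` are hypothetical helper names.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer hash key


def hash_key(partition_key: str) -> int:
    """Interpret the MD5 digest of the partition key as a 128-bit unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int):
    """Split the hash space into contiguous (start, end) ranges, both ends inclusive."""
    width = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * width
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * width - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges) -> int:
    """Return the index of the shard whose hash range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key falls outside every range")


ranges = even_shard_ranges(4)
placement = {key: shard_for(key, ranges) for key in ("user-1", "user-2", "tenant-42")}
```

Because the mapping is a pure function of the key, a single hot key always lands in one range, which is why the hot-shard row has no server-side remedy.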

06 · Replication

How each system makes durability promises and what those promises cost.

Dimension | Kafka 4.3 | Kinesis | Pulsar 4.0
Replication topology | Leader-follower per partition. Each partition has one leader; followers pull from leader. | Hidden multi-AZ replication. From outside: single logical stream. | Leaderless quorum at the BookKeeper layer. Broker writes to multiple bookies in parallel.
Sync vs async | Sync to ISR (acks=all). Async beyond ISR (lagging followers catch up). | Sync across 3 AZs (documented). Async user-side via consumer apps. | Sync to write quorum (W bookies). Ack returned after A bookies confirm.
Replication factor (default / max) | Broker default is 1; production convention is 3; max bounded by broker count. | 3 AZs, fixed, not user-configurable. | Typical production setting is ensemble=3, write=3, ack=2; max bounded by bookie count per namespace.
Consistency level options | acks=0 (fire-and-forget), acks=1 (leader-only), acks=all (ISR). min.insync.replicas sets how many in-sync replicas acks=all requires. | Single option: synchronous PutRecord ack after multi-AZ commit. | Per-namespace: configure E (ensemble), W (write quorum), A (ack quorum). Highly tunable.
Replication lag (typical) | Sub-ms within AZ; 1-10ms cross-AZ for ISR followers. | Not exposed. Inferred from PutRecord latency (~10-20ms typical). | Single-digit ms for write quorum ack.
Conflict resolution | Single-writer (leader) eliminates conflicts. Producer idempotence (KIP-98) handles retry dedup. | N/A. Single ordered stream per shard. | Single-writer per topic via broker ownership. Producer idempotence via dedup ID.
Cross-region replication | MirrorMaker 2 (async, eventual). Cluster Linking (Confluent commercial) for log-level mirroring. | Not native. Build custom via Lambda or third-party tools. | Native, per-namespace. Async by default; sync available at extra latency cost.
Replication during partition | ISR shrinks to surviving brokers. Writes succeed until ISR drops below min.insync.replicas, then reject. | AWS-internal; user-visible result is potential throttling, not split-brain. | If write quorum unreachable, writes block until BookKeeper opens new ensemble. Auto-recovery re-replicates affected ledgers without operator action.
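
The Kafka column's acks and min.insync.replicas interaction is a small decision table. The function below is a simplified model, not broker code: it assumes min.insync.replicas is only enforced for acks=all, and that acks=0 and acks=1 need nothing more than a live leader. `write_accepted` is a hypothetical name.

```python
def write_accepted(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Model whether a partition leader accepts a produce request.

    isr_size counts the leader itself. acks="all" is rejected when the ISR
    is smaller than min.insync.replicas; acks="0" and acks="1" only need a leader.
    """
    if isr_size < 1:
        return False  # no leader, nothing can be written
    if acks == "all":
        return isr_size >= min_insync_replicas
    if acks in ("0", "1"):
        return True
    raise ValueError(f"unknown acks setting: {acks!r}")


# RF=3 with min.insync.replicas=2, as the ISR shrinks during a partition.
for isr in (3, 2, 1):
    print(isr, write_accepted("all", isr, 2), write_accepted("1", isr, 2))
```

With the canonical settings the partition keeps accepting acks=all writes after one replica drops out and starts rejecting them after the second, which is the availability half of the durability knob.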

07 · Better Usage Patterns

What most teams do wrong, the right way to do it, and why the difference matters at scale.

Kafka · Patterns

Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
Partition count sizing | Pick a small number (say 6 or 12) and hope to scale up later. | Start at 2-3x your forecasted peak consumer count; over-provision early. | Adding partitions changes key-to-partition mapping. Re-keying old data is offline work; doing it right once saves a migration project.
Cross-AZ consumer fetch | Consumers default to fetching from the leader; with leaders spread over 3 AZs, roughly two-thirds of fetches cross AZ. | Enable rack-aware fetch from followers (KIP-392). Set client.rack on the consumer, and broker.rack plus replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector on the brokers. | Cross-AZ traffic is often the largest line item in a Kafka AWS bill. Rack-aware fetch can cut consumer egress by 60-70%.
Tiered storage segment sizing | Leave segment.bytes at the default (1 GB). | Tune segment.bytes per topic to match offload cadence; smaller for fast-moving topics, larger for archival. | Offload happens at segment roll. Wrong size = either too-frequent S3 churn (cost) or slow promotion to cold tier (disk pressure).
KRaft controller quorum size | Run 3 controllers (the documentation minimum). | Run 5 controllers for production. Tolerates 2 simultaneous failures. | Controller availability is the single most critical dependency in KRaft. 3-node quorum has zero margin during a planned restart plus an unplanned failure.
Topic creation policy | Leave auto.create.topics.enable on; let producers create topics ad hoc. | Disable auto-create. Provision topics via CI with explicit config (RF, partitions, retention). | Auto-created topics get default config you don't want (RF=1 often). Explicit creation is auditable; topic configs become reviewable infra.
Page cache sizing | Provision broker memory based on heap + headroom; treat page cache as automatic. | Provision 2x active topic working set as page cache, separate from JVM heap. | Kafka reads from page cache; cache misses go to disk and trash latency. Right-sizing the OS cache is more important than heap tuning.
Consumer rebalance protocol | Use eager rebalancing (the legacy default). | Use cooperative rebalancing (KIP-429). Set partition.assignment.strategy to CooperativeStickyAssignor. | Eager rebalance is stop-the-world for the consumer group. Cooperative is incremental. For groups with many consumers, the difference is minutes of consumer downtime per rebalance.
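
The 3-versus-5 controller argument is plain majority-quorum arithmetic. The sketch below assumes a standard Raft-style majority rule; the function names are hypothetical.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be down while a majority is still reachable."""
    return voters - majority(voters)


def quorum_available(voters: int, down: int) -> bool:
    """True if the remaining voters still form a majority."""
    return voters - down >= majority(voters)


# One controller down for a planned restart plus one unplanned failure.
for voters in (3, 5):
    print(voters, tolerated_failures(voters), quorum_available(voters, down=2))
```

A 4-node quorum tolerates the same single failure as a 3-node one, so odd sizes are the only ones that buy margin.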

Kinesis · Patterns

Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
Capacity mode selection | Pick On-Demand Standard by default. | Use OD-Advantage if you have 3+ EFO consumers or sustained fan-out workloads; its per-GB data charges are 60% lower. | OD-Advantage removes the per-stream fixed charge and the per-EFO-consumer-shard-hour cost. It becomes the cheaper option at modest consumer counts.
Cold-start handling | Trust OD auto-scale to handle a known launch event. | Pre-warm 15-30 min before with synthetic traffic at expected load, or pre-provision shards for the launch window. | OD scales to 2x the prior 30-day peak with a ~15 min ramp. Cold-start spikes exceed this. Synthetic warming is the documented AWS workaround.
Partition key strategy | Use user_id or tenant_id as partition key for "ordering". | Salt the key with a random suffix; re-group downstream by the unsalted key for ordering. | Hot tenants saturate one shard. Salting distributes load. Ordering can be reconstructed downstream by event timestamp + unsalted key.
Consumer library | Poll directly with GetRecords from a custom consumer. | Use KCL 2.x or 3.x. Let it handle lease management, checkpointing, and resharding. | GetRecords has retry, throttling, and shard-iterator semantics that are non-trivial. KCL handles them correctly. Custom consumers tend to fail on resharding.
Producer batching | Call PutRecord one record at a time. | Use PutRecords with up to 500 records or 5 MB per call. Add KPL aggregation if records are small. | Each PutRecord call pays a full HTTP round trip for one record. PutRecords amortizes that overhead, but every record still counts against the shard's 1,000 records/s limit. KPL aggregation packs many small user records into one Kinesis record, which is what raises effective records/s.
Checkpoint cadence | Checkpoint after every record (correctness-first instinct). | Checkpoint after successful batch processing, with a max time bound. Tune for replay tolerance. | Per-record checkpointing hammers the DynamoDB table that holds KCL state. Batch checkpointing trades a few seconds of replay window for roughly 100x lower checkpoint cost.
Retention sizing | Set max retention (365 days) "just in case". | Set retention to the minimum your replay use case requires. Archive to S3 via Firehose for the rest. | Long-term retention is billed per GB-month. At 1 TB/day with 365-day retention, the storage bill exceeds the ingest bill within months.
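
The key-salting row is easier to judge with the downstream half in view. A minimal sketch, assuming each record carries a producer-assigned event timestamp that is monotonic per unsalted key; `SALT_SEPARATOR`, `salted_key`, `unsalted_key`, and `regroup` are hypothetical names, and no AWS SDK calls are involved.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SALT_SEPARATOR = "#"  # hypothetical delimiter; must never appear in a real key


def salted_key(key: str, buckets: int, rng: random.Random) -> str:
    """Append a random suffix so one hot key spreads over up to `buckets` partition keys."""
    return f"{key}{SALT_SEPARATOR}{rng.randrange(buckets)}"


def unsalted_key(partition_key: str) -> str:
    """Strip the suffix added by salted_key."""
    return partition_key.rsplit(SALT_SEPARATOR, 1)[0]


def regroup(records: Iterable[Tuple[str, int, str]]) -> Dict[str, List[str]]:
    """Group (partition_key, event_ts, payload) by unsalted key, ordered by event_ts."""
    grouped: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for partition_key, event_ts, payload in records:
        grouped[unsalted_key(partition_key)].append((event_ts, payload))
    return {
        key: [payload for _, payload in sorted(events, key=lambda e: e[0])]
        for key, events in grouped.items()
    }


if __name__ == "__main__":
    rng = random.Random(7)  # seeded only to make the demo repeatable
    produced = [(salted_key("tenant-a", 4, rng), ts, f"e{ts}") for ts in range(6)]
    # Records on different shards arrive in no particular order relative to each other.
    arrived = sorted(produced, key=lambda r: r[0])
    print(regroup(arrived))
```

The cost of salting is visible here: ordering is no longer free from the shard, so the consumer has to buffer and sort. If the timestamp assumption does not hold, the order cannot be recovered this way.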

Pulsar · Patterns

Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
Tenant / namespace hierarchy | Treat tenant and namespace as bureaucratic; put everything in public/default. | Plan the hierarchy upfront. One tenant per business domain, namespaces by lifecycle (prod / staging) or by retention class. | Quotas, retention, replication, and auth are all per-namespace. Skipping the design means flat permissions, no per-team quotas, and a reorganization project later.
Subscription mode selection | Default to Shared subscription and lose ordering. | Use Key_Shared when you need ordered parallel processing, Exclusive for a single ordered consumer, and Failover for active-passive consumers (on partitioned topics, the closest match to Kafka "consumer group" semantics). | Mode mismatch is silent. Shared subscription on an ordered workload produces interleaved messages downstream and breaks invariants days later.
Retention policy scope | Set retention per topic, repeated across hundreds of topics. | Set retention at the namespace level. Group topics by retention class into namespaces. | Per-topic retention is a maintenance burden and a config-drift surface. Namespace-level retention scales to thousands of topics.
Oxia ensemble sizing | Run 3-node Oxia in production, mirroring the ZK recipe. | Run 5-node Oxia for prod. Keep the prod ensemble separate from dev/staging. | Oxia is new, and its long tail of production failure modes is still being mapped. 5-node tolerates a rolling restart plus an unplanned failure. Shared dev/prod ensembles cross-pollinate incidents.
Tiered offload threshold | Leave default offload policy off, then hit bookie capacity limits. | Enable tiered offload for any namespace with >30-day retention. Tune offloadAfterElapsedMs per namespace. | Bookies are expensive storage. Tiered offload to S3 is 10x cheaper per GB. Tuning the offload threshold is the difference between bookie sprawl and a small bookie fleet.
Broker managed-ledger cache sizing | Leave cache size at default; treat broker memory as a JVM heap problem. | Size managedLedgerCacheSizeMB to your hot working set, not topic count. | Cache misses go to bookies. Bookie load = network + disk hit. A right-sized cache absorbs the hot read path entirely.
BookKeeper ensemble configuration | Set E=W=A=3 "for safety" on every namespace. | Tune E (ensemble) larger than W (write) for elasticity. E=5 W=3 A=2 spreads ledgers wider without slowing writes. | Larger E means smaller per-bookie blast radius. The gap between W and A trades latency against durability: with W=A every write waits for the slowest bookie. Bigger isn't always safer; it's a per-namespace trade-off.
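
The E/W/A row reduces to three numbers per configuration. A small sketch of that arithmetic, assuming entries are striped evenly across the ensemble; the `Quorum` class and its property names are hypothetical, not part of any Pulsar or BookKeeper client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quorum:
    ensemble: int  # E: bookies a ledger is striped across
    write: int     # W: bookies each entry is written to
    ack: int       # A: acknowledgements required before the write is confirmed

    def __post_init__(self) -> None:
        if not (self.ensemble >= self.write >= self.ack >= 1):
            raise ValueError("expected E >= W >= A >= 1")

    @property
    def per_bookie_write_share(self) -> float:
        """Fraction of a ledger's entries that lands on each bookie in the ensemble."""
        return self.write / self.ensemble

    @property
    def slow_bookies_tolerated(self) -> int:
        """Bookies per entry that may lag without delaying the acknowledgement."""
        return self.write - self.ack

    @property
    def copies_guaranteed_at_ack(self) -> int:
        """Copies known to exist at the moment the producer sees success."""
        return self.ack


if __name__ == "__main__":
    for q in (Quorum(3, 3, 3), Quorum(5, 3, 2)):
        print(q, q.per_bookie_write_share, q.slow_bookies_tolerated, q.copies_guaranteed_at_ack)
```

E=W=A=3 puts every entry on every bookie and lets one slow bookie stall every write. E=5 W=3 A=2 cuts the per-bookie share to 0.6 and tolerates one laggard per entry, at the price of only two copies being guaranteed when the ack returns.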

08 · Advanced / Next-Gen Alternatives

Where each system might be displaced, what the successor improves, and when migration is worth the cost.

Kafka · Successors / Alternatives

Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider
AutoMQ (diskless Kafka on S3) | 10x infra cost reduction; eliminates cross-AZ replication tax; second-scale elasticity; Kafka API compatible. | Emerging | Low: protocol-compatible; client code unchanged. Operational re-learning required. | Cloud Kafka where cross-AZ traffic dominates the bill; teams comfortable trading 10-30ms tail latency for cost.
WarpStream (S3-direct) | BYOC architecture (data plane in customer VPC, control plane managed); zero inter-AZ traffic; Kafka API compatible. | Emerging | Low: client-compatible. Higher latency floor (~400ms p99) limits use cases. | Log aggregation, analytics ingest, cost-sensitive workloads where the latency budget is generous.
Redpanda (C++ Kafka-compatible) | No JVM/GC pauses; sub-ms tail latency; thread-per-core architecture; single binary deployment. | Production | Low: Kafka API compatible. Operational differences (no separate ZK/KRaft quorum to operate, different tuning model). | Sub-ms latency requirements; teams that want fewer moving parts; smaller-scale deployments where C++ ops is feasible.
Bufstream (Buf's S3-direct) | Protocol-compatible; S3-native storage; integrated schema management via Buf Schema Registry. | Early | Low: protocol-compatible. Newest entrant; ecosystem still small. | Greenfield deployments where schema-first matters; teams already using Buf for Protobuf.

Kinesis · Successors / Alternatives

Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider
MSK Serverless | Kafka protocol + AWS-managed scaling. Same "no ops" story with a broader ecosystem. | Production | Medium: protocol migration plus client SDK changes; integration patterns shift from KPL/KCL to Kafka clients. | Hitting Kinesis limits (latency, retention) but committed to AWS; want managed Kafka without operating MSK.
Confluent Cloud on AWS | Fully managed Kafka with Schema Registry, ksqlDB, Connect, Stream Governance. More features than MSK. | Production | Medium: protocol migration; commercial license; PrivateLink configuration. | Teams needing the full Confluent feature set (Cluster Linking, governance) on AWS; willing to pay a 2-3x infra premium.
AWS Firehose alone | Removes the stream abstraction entirely for ingest-to-S3 pipelines. Simpler, cheaper. | Production | Low: drop the Kinesis layer for pure ingest pipelines. | You ran Kinesis only to land data in S3 via Firehose anyway. Skip the intermediate hop.
EventBridge | Event-routing semantics with content-based filtering. Better fit for fan-out to multiple services. | Production | Medium: API and conceptual change from log to event bus. | Workloads that look more like async RPC than streaming. Microservices integration.

Pulsar · Successors / Alternatives

Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider
StreamNative Cloud | Fully managed Pulsar; removes three-component ops burden. | Production | Low: same Pulsar, hosted. | You want Pulsar's architecture without operating it. Same trade as Confluent Cloud for Kafka.
Pulsar Functions (embedded stream processing) | Lightweight Functions for stateless processing without a separate Flink/Spark cluster. | Production | None (additive, not replacement). | Simple stream transforms (filter, route, enrich) where dragging in Flink is overkill.
Kafka-on-Pulsar (KoP) | Run Kafka protocol on Pulsar storage. Migrate clients incrementally. | Emerging | Low for protocol; medium for operational learning of Pulsar internals. | Want Pulsar's multi-tenancy and elasticity, but cannot rewrite Kafka clients yet.
Astra Streaming (DataStax) | Managed Pulsar with native Cassandra integration. Streaming + wide-column store in one console. | Production | Low. | Workloads that span streaming and wide-column persistence; existing DataStax customers.

SOURCES · Apache Kafka 4.3.0 release notes (May 2026) · Apache Pulsar 4.0 architecture docs · AWS Kinesis Data Streams pricing & FAQ · OpenMessaging Benchmark · StreamNative, Confluent, AutoMQ engineering blogs 06.2026 · For Staff+ interview prep