Real-Time OLAP — ClickHouse vs Apache Pinot

User-facing analytics at high QPS, sub-second p99, streaming ingest from Kafka. Two engines that converge on the same use cases from opposite architectural ends.

Real-Time OLAP PE / Staff+ Depth

As of 2026-05-31

PE Verdict

ClickHouse vs Pinot is not "which is faster." It is "what is your query concurrency, do you need first-class upserts, and is your team SQL-native or Java-native." ClickHouse wins on compression, flexible analytical SQL, ecosystem (Tinybird, Posthog, observability), and operational simplicity — one binary, fewer moving parts. Pinot wins on user-facing p99 under high concurrency, native real-time upserts, star-tree pre-aggregation for low-cardinality dashboards, and deterministic Kafka partition assignment (no consumer-group rebalancing storms). The 2026 reality: ClickHouse Cloud has narrowed Pinot's user-facing moat with separated compute/storage and lightweight updates; Pinot's remaining moat is upserts, the star-tree index, and the LinkedIn/Uber/Stripe pattern of millions of end-users hitting analytical queries. Constant Contact migrated off Pinot to ClickHouse for the flexibility win at their workload shape; LinkedIn/Stripe stay on Pinot for the concurrency win at theirs. Both are correct, in their respective workload shapes.

Best default choices

ClickHouseFlexible analytical SQL, compression, simpler ops, observability and product analytics Apache PinotUser-facing p99, native upserts, star-tree dashboards, deterministic Kafka ingest Kafka-backed freshnessUse Pinot for deterministic partition assignment; use external consumers for ClickHouse at scale Managed cloudsClickHouse Cloud for ops relief; StarTree Cloud for Pinot upsert and ingestion scale

1. Trade-Offs

Per-technology trade-off lists. The trade space is genuinely different: ClickHouse trades LSM merge-amplification for compression and flexible queries; Pinot trades operational complexity for sub-second p99 and native upserts.

ClickHouse

Columnar LSM OLAP

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Append-only MergeTree LSM architecture	Extreme write throughput; 10-20x compression on typical event data; columnar scans on 1B rows in milliseconds	Single-row updates are anti-pattern; deletes use tombstones with eventual consistency on disk reclamation	Team treats ClickHouse like Postgres for a small operational use case; UPDATE WHERE id = X for each event creates one part per row and compaction can't keep up	The mental model is "you write batches, the engine merges in background." Any pattern that produces one part per write at high frequency degrades the merge pipeline.
Async inserts with server-side batching	High-frequency small writes work without client-side batching; thousands of small clients can write directly	wait_for_async_insert=0 trades durability between buffer and flush; wait=1 eliminates the perf gain	Telemetry agent fires 1KB writes at 10K/sec with wait=0; ClickHouse process crashes between buffer and flush, you lose a few seconds of in-flight data	26.1+ extended async insert deduplication to materialized views. Always pair async_insert with idempotent INSERT statements (use block_id or insert hash) for production durability.
Per-column codec compression	Tunable per column: DoubleDelta, Gorilla, LZ4, ZSTD; 30-50% less disk than Pinot equivalents	More configuration burden; default codecs leave significant perf on the table	You leave defaults; storage bill is 2x what it would be with Gorilla on timestamps and LowCardinality on string dimensions	LowCardinality(String) on columns with under ~10K distinct values is the single highest-leverage tuning in ClickHouse. 10x compression and faster filter on the same column.
Projections as alternative sort orders or pre-aggregations	Multiple query patterns supported by one table; 25.6+ projection-as-secondary-index reduces storage overhead	Write amplification on insert; projection storage cost; complexity in query optimization	Six projections on a high-throughput table cause merge backpressure during traffic spikes; ingest lag balloons	In 25.6/25.11, projections can store only the sort key plus a _part_offset pointer back to the base table — true secondary-index semantics. Adopt new style before legacy data-copy projections.
Materialized views aggregate at insert time	Sub-second queries on pre-aggregated rollups; trigger-style chained MVs let you build aggregation hierarchies	MV-as-trigger semantics are subtle; deduplication scoped per table since 26.1; failures in MV chains cascade	MV chain has 3 levels; backfill produces unexpected double-counts because deduplication is per-table not per-chain	In 26.1, deduplication is on by default for both sync and async inserts. If you explicitly disabled it, you need deduplicate_insert='backward_compatible_choice' to keep old behavior.
Single-binary architecture	One process handles ingest, query, replication coordination; simpler ops than Pinot's 5-component cluster	Replication still needs ZooKeeper or ClickHouse Keeper; cluster management is non-trivial at scale	Self-hosted cluster of 50 servers; ZK ensemble becomes the operational bottleneck; you migrate to Keeper mid-project	ClickHouse Keeper is the modern default for new deployments. C++ implementation, lower latency, fewer ops surprises than the ZooKeeper Java pairing.
Lightweight Updates (Patch Parts)	UPDATE without table locks; instant read consistency vs old mutation-based updates	Still not OLTP; high-volume updates degrade query performance as patches accumulate	Use case sends 1K updates/sec to a billion-row table; patch parts accumulate faster than merge can collapse them; SELECT performance regresses 30%	Patch Parts is a structural improvement over the old mutation model but does not turn ClickHouse into an OLTP store. For high-rate update workloads, use ReplacingMergeTree or move the workload upstream.
Kafka engine via consumer groups	Built-in Kafka ingest as an engine; SELECT FROM Kafka materialized to target table	Consumer group rebalancing pauses ingestion when consumers join or leave; server count limited by Kafka partition count	Cluster autoscale event triggers a Kafka rebalance storm; ingest pauses for tens of seconds; downstream alerting fires	For high-throughput, run an external consumer (Vector, Bytewax, or custom) doing INSERTs instead of the Kafka engine. Better operational control, easier debugging, scales beyond partition count.
preads I/O instead of MMAP	Better random read perf; no TLB shootdowns; benefits from CPU prefetching and caching	Requires OS page cache tuning at large scale; cache-cold queries pay disk latency	Server with insufficient RAM for working set; queries hit disk and p99 jumps under load	The Constant Contact migration story (Pinot to ClickHouse) cited preads vs MMAP as a key win for them. Pinot's MMAP shines when segments fit in RAM, hurts otherwise.

Apache Pinot

Distributed Real-Time OLAP

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Star-tree index for multi-dimensional pre-aggregation	Real-time aggregation of billions of rows in milliseconds without explicit materialized views; covers many low-cardinality dimensions in one tree	Storage overhead for the tree; must be configured per query pattern; not free for arbitrary queries	Team enables default star-tree on a wide table; tree doesn't match actual query shapes; storage bill rises but query perf barely moves	Star-tree is purpose-built for "GROUP BY a, b, c with filters on the same columns" — the LinkedIn 'Who viewed your profile' shape. For arbitrary analytical queries, you're better off with ClickHouse's flexible projections.
Native real-time upserts via per-segment bitmaps	PK-based upserts at ingestion; in-memory bitmap marks obsolete rows; queries filter at read time; full and partial upserts supported	JVM heap-bound: PK count per server limited by heap size; OSS Pinot tops out at hundreds of millions of PKs per server	Upsert table grows past heap limit; server OOMs; you scale horizontally and discover sticky partition assignment caps your scaling	StarTree Cloud's off-heap upsert metadata lifts the heap ceiling roughly 10x. If your upsert PK count is in the billions, StarTree is the route; OSS Pinot won't scale there.
Deterministic Kafka partition-to-server mapping (LLC)	No consumer-group rebalancing storms; predictable ingestion latency; coordinated segment completion across replicas	Sticky assignment: num_partitions × num_replicas must be ≥ num_servers or some nodes go idle	You scale to 10 servers expecting load balancing; you have 6 Kafka partitions and 1 replica, so only 6 of 10 servers ingest anything	Pauseless Ingestion (StarTree Cloud) takes this further to 40M+ events/sec/table. For high-throughput ingest where rebalancing pauses are a hard SLO violation, this is the structural advantage over ClickHouse's Kafka engine.
Segment-based architecture with deep store	Clear cold/warm separation; S3 or HDFS as source of truth; servers can be rebuilt from deep store	Five moving pieces: Controller + Broker + Server + Minion + ZK/Helix; ops complexity is the highest in the OLAP space	Team picks Pinot for a 5-engineer startup; spends three engineer-quarters operating the cluster before any feature ships	For self-hosted, the operational depth required is real. Either commit to it with dedicated platform engineers or use StarTree Cloud and pay the managed-service tax.
Sub-second p99 at high concurrency	Thousands of QPS user-facing dashboards; LinkedIn 'Who Viewed Your Profile' pattern; Stripe merchant dashboards	Less flexible SQL surface than ClickHouse; v2 engine improves joins but still trails for full-scan analytical patterns	Analyst tries an ad-hoc multi-CTE query against Pinot expecting ClickHouse-like flexibility; query plan is suboptimal or unsupported	Pinot is purpose-built for "many users querying simple analytical questions over fresh data." If your queries look like SQL the analyst wrote, ClickHouse is closer to what you want.
MMAP-based segment loading	Fast access when segments fit in memory; OS handles paging	If segments exceed RAM, TLB shootdowns and memory contention degrade performance	Workload grows past RAM working set; queries get slow under contention; you can't tune your way out without adding servers or moving data to tiered storage	The Constant Contact migration cited MMAP scaling issues as a real driver to ClickHouse. Sizing rule: keep hot segments in RAM. Tiered storage in StarTree mitigates this for cold data.
Rich indexing toolkit per column	JSON, text, range, bloom, inverted, timestamp, and Array text-index all coexist per column	Configuration burden; choosing wrong indexes costs both storage and query perf	Default everything; segments balloon 3x; query speedup is marginal because indexes don't match access patterns	Indexes in Pinot are a per-column-per-query decision. Profile queries against a representative segment before enabling indexes broadly. Default star-tree config (≤10K cardinality dims, all DateTime cols) is often the wrong shape for production.
Pauseless Ingestion (StarTree Cloud)	40M+ events/sec per table; no pauses on segment completion	Managed-tier feature; OSS Pinot tops out lower; lock-in to StarTree	Self-hosted Pinot hits ingestion ceiling at sustained 1M events/sec/table; migration to StarTree Cloud or re-architect	The OSS-vs-managed gap on Pinot is meaningful for high-throughput workloads. ClickHouse OSS and ClickHouse Cloud have a smaller gap on raw ingest throughput.
Off-heap upsert metadata (StarTree)	Reduces JVM heap pressure ~10x; enables billions of PKs per server	StarTree-only differentiator; OSS Pinot upserts hit heap ceiling first	OSS Pinot upserts work for hundreds of millions of PKs; you scale past, GC pauses start tripling p99	If upsert scale is your driver and you're committed to OSS, you'll hit this. If you need upsert scale without StarTree, ClickHouse's ReplacingMergeTree + lightweight updates is the OSS-friendly alternative.

2. Use Cases

Concrete production workloads with the driving property that ruled out the alternative. The pattern is consistent: ClickHouse wins on analyst-flexible workloads with high cardinality and complex queries; Pinot wins on user-facing dashboards with predictable query shapes and update semantics.

ClickHouse

Columnar LSM OLAP

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Observability and logs backend	Uber — Logging platform processing PB-scale logs/traces	Flexible analytical SQL over high-cardinality time-series data; massive compression on log fields	Multi-PB log volume; thousands of fields; ad-hoc multi-dimensional queries from on-call engineers	Pinot's star-tree assumes known query dimensions; log analysis needs ad-hoc dimensionality the engineer didn't pre-define
Web analytics SaaS backend	Cloudflare Analytics — request-level visibility for millions of customers	Sub-second queries over high-cardinality dimensions (URL, ASN, country, device); flexible filters per customer	Tens of billions of events per day; petabytes hot; millions of tenants with varied query patterns	Pinot's tenant isolation is harder at this multi-tenancy scale; ClickHouse's flexibility lets each customer query their own shape
Migrated from Pinot for flexibility + ops	Constant Contact — moved Automated Path Builder analytics off Pinot to ClickHouse	10x speedup and 10x cost reduction on their analytical query shape; LSM + preads vs MMAP for their working-set size	Marketing automation event data; complex multi-join queries that didn't match Pinot's star-tree shape	Pinot had been the original choice but operational complexity and MMAP scaling issues drove the migration; ClickHouse's flexibility won for their workload
SaaS analytics platform built on ClickHouse	Tinybird — real-time analytics API platform; ClickHouse-as-a-service	Single-binary ops; SQL surface analysts and developers both speak; predictable cost model on managed ClickHouse	Multi-tenant SaaS; thousands of customers building their own analytics products on top	Pinot's operational depth is a SaaS-builder anti-feature; the platform engineer experience matters as much as raw perf
Product analytics platform	PostHog — open-source product analytics; events, funnels, retention	Open-source; flexible SQL surface for funnel/retention queries; cost-efficient self-host option for enterprise	Billions of events per day for large customers; multi-step funnel queries; arbitrary cohort definitions	Pinot's licensing is also open but operational complexity is higher; for an OSS product analytics company the simpler ops path matters

Apache Pinot

Distributed Real-Time OLAP

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
User-facing analytics for millions of end-users	LinkedIn — 'Who viewed your profile,' notifications, member analytics (founding use case)	Sub-second p99 over millions of concurrent users hitting analytical queries; star-tree pre-aggregation makes counts and aggregates instant	Hundreds of millions of members; queries per user per session; freshness measured in seconds	ClickHouse can do high-QPS but the deterministic Kafka mapping + star-tree advantage are LinkedIn-shaped problems
Restaurant manager analytics for couriers	Uber — Restaurant Manager dashboards; UberEats restaurant-facing real-time metrics	Sub-second freshness from order events to restaurant-facing dashboard; per-merchant slice queries at high concurrency	Tens of thousands of merchants; per-merchant dashboards; fresh-within-seconds order data	ClickHouse Kafka engine rebalance pauses would violate freshness SLO; Pinot's LLC determinism wins here
Merchant dashboards	Stripe — real-time merchant payment analytics	Per-merchant queries fan out to thousands of concurrent dashboards; deterministic latency under load	Millions of merchants; transaction-level events; sub-second dashboard refresh	ClickHouse could serve this but Pinot's purpose-built user-facing pattern is harder to beat at the p99 tail
Real-time pricing intelligence	Walmart — pricing and inventory analytics across stores and SKUs	Star-tree index over (store × SKU × time) for sub-second aggregation; upsert support for changing prices	Tens of thousands of stores; millions of SKUs; price updates flow through Kafka	ClickHouse ReplacingMergeTree handles upserts but star-tree on the multi-dim aggregation is where Pinot pulls ahead
Connected vehicle telemetry	Mercedes-Benz / automotive — connected fleet analytics	Upsert-friendly model for slowly-changing vehicle state; per-vehicle queries at scale	Millions of vehicles; per-vehicle state with frequent updates; geographic and fleet-level rollups	ClickHouse handles append-heavy IoT well; Pinot's native upserts make the "current state per vehicle" pattern cleaner

3. Limitations

Per-technology limitation tables. Severity reflects blast radius on production workloads. Workaround Cost is what you actually pay to dodge each one.

ClickHouse

Columnar LSM OLAP

Limitation	Severity	Workaround	Workaround Cost
Single-row INSERTs without async are an anti-pattern	High	Always batch (1000+ rows) client-side OR enable async_insert with appropriate busy_timeout	Client-side batching adds latency; async_insert without wait risks durability between buffer flush
Joins historically weaker than DW SQL surface	High	Denormalize at ingest; use dictionaries for small lookup tables; use the new query planner for distributed joins	Storage duplication on denormalized data; complexity at ingest layer
Updates and deletes via mutations are expensive on hot data	High	Use Lightweight Updates (Patch Parts) for low-volume updates; ReplacingMergeTree for upsert-like patterns	Patch parts accumulate under high-volume updates and degrade SELECT perf; ReplacingMergeTree returns multiple versions until merge
Kafka consumer group rebalancing pauses ingestion	Medium	External consumer process (Vector, Bytewax, custom) doing batched INSERTs instead of Kafka engine	Extra service to operate; loses the built-in Kafka engine convenience
Concurrent query handling less optimized than DW engines	Medium	Set max_concurrent_queries; partition workloads via resource_pool / user profiles; scale horizontally	Operational complexity in tuning concurrency limits; horizontal scale requires more cluster ops
Tombstone-based deletes are eventually consistent on disk reclamation	Medium	OPTIMIZE TABLE ... FINAL for explicit compaction; partition-level TTL for time-bound data	OPTIMIZE FINAL is heavy; partition TTL only works for time-series-shaped data
Cluster scaling requires careful Kafka partition planning if using Kafka engine	Medium	External consumer pattern (separate ingestion from storage scaling)	One more service to operate and monitor; ops surface grows

Apache Pinot

Distributed Real-Time OLAP

Limitation	Severity	Workaround	Workaround Cost
OSS upsert tables are JVM-heap-bound for PK count per server	High	Move to StarTree Cloud for off-heap upsert metadata; partition tables more aggressively to spread PK count	Managed service cost (StarTree); or operational overhead in partitioning + monitoring heap pressure
Operational complexity: 5 components plus ZK/Helix	High	Use StarTree Cloud, or commit a 2-3 engineer platform team to operate it	Managed service premium; or engineer headcount in perpetuity
Less mature SQL surface than ClickHouse for complex analytics	Medium	Use v2 query engine (multi-stage); for analyst-facing workloads, layer Trino or another engine; for product-side queries, stick to Pinot-idiomatic shapes	Two-engine complexity; analyst learning curve on Pinot SQL dialect
30-50% more disk than equivalent ClickHouse	Medium	Tune index choices; drop default star-tree on tables that don't need it; use tiered storage for cold	Configuration effort; tiered storage adds query latency variance
MMAP I/O causes issues when segments exceed RAM	High	Size servers so hot segments fit in memory; move cold to tiered storage; partition more aggressively	Cluster sizing overhead; cold-tier query latency degrades; possible re-architecture at scale
Kubernetes operator less mature than ClickHouse Altinity Operator	Medium	Use StarTree Cloud K8s tooling; or operate on VMs; or invest in custom K8s operator config	Managed service cost or platform engineering effort
Join performance trails ClickHouse for complex multi-table queries	Medium	Denormalize at ingest; use lookup joins for small dimension tables; structure data for star-tree-friendly queries	Storage duplication; data model complexity at ingest; less flexibility for ad-hoc joins

4. Fault Tolerance

Both engines replicate at the data layer, but the durability semantics differ. ClickHouse async inserts have a known in-flight risk window; Pinot's source-of-truth-in-Kafka pattern means recovery is bounded by Kafka offset retention.

Dimension	ClickHouse	Apache Pinot
Replication model	ReplicatedMergeTree with async multi-master replication coordinated by ClickHouse Keeper (or ZooKeeper). Write to any replica, propagated to others.	Segment-based replication; Controller assigns segments to multiple Servers via Helix; deep store (S3/HDFS) is the source of truth.
Failure detection	Keeper-based heartbeats; replica failure detected via session timeout (default 30s, tunable).	ZooKeeper + Helix manage cluster state; Controller detects Server failure via heartbeat.
Failover mechanism	Other replicas serve reads; client retries to other shard via load balancer or smart driver. Distributed table routes around failed replicas.	Broker routes queries to surviving replicas; on persistent Server failure, segments redistributed from deep store to other Servers.
RTO (typical)	Seconds for read failover (other replica serves); minutes for full server recovery and replica sync.	Seconds for broker re-routing; minutes for segment redownload from deep store to replacement server.
RPO (typical)	Sub-second for replicated commits; async insert buffer at risk between buffer and flush if using wait_for_async_insert=0.	0 for committed segments (Kafka offsets replayable); sub-second for current consuming segment which is regenerated from Kafka.
Split-brain behavior	Keeper quorum prevents split-brain; minority side becomes read-only until partition heals.	ZK quorum prevents controller split-brain; servers in minority partition stop accepting writes until partition heals.
Blast radius of single-node failure	Single shard impacted; other shards continue. If single-replica, the shard is unavailable until recovery.	Segments on failed server are redistributed; brief query degradation; deep store as recovery source.
Cross-region failover story	Manual: async replica in another region OR Distributed table over WAN; no managed cross-region replication outside ClickHouse Cloud.	Manual: tiered storage + multi-cluster setup; deep store can be regional; StarTree Cloud offers cross-region as a tier.
Data loss scenarios	Async insert buffer loss between buffer flush; replica divergence under aggressive K8s autoscaling without graceful shutdown.	Mutable real-time segment loss before commit (rare with Kafka source-of-truth); deep store corruption (extremely rare on managed object stores).

5. Sharding

ClickHouse's sharding is manual and operator-defined; Pinot's is segment-based and controller-managed. The trade is operational simplicity (Pinot's automation) vs flexibility (ClickHouse's hand-tuned distribution).

Dimension	ClickHouse	Apache Pinot
Sharding model	Distributed table over local shards; sharding via expression (hash, rand, or custom) defined per Distributed table.	Hash by partition column for offline tables; for real-time tables, segments map to Kafka partitions (1:1 typical).
Shard key constraints	Sharding expression set at Distributed table creation; changing it means rebuilding the Distributed table.	Real-time upsert tables require Kafka stream to be partitioned by primary key for correctness; partition column is set in table config.
Rebalancing mechanism	Manual: INSERT INTO ... SELECT to new sharded table, or external rebalance scripts; no automatic rebalancer.	Helix-managed automatic segment assignment; rebalance API; Minion-based segment movement.
Rebalancing cost / impact	High — typically full data movement; can require downtime or shadow-table cutover for clean resharding.	Online — segments move via deep store; queries continue during rebalance; minor performance variance during the move.
Hot-shard behavior	Manual repartition; no auto-rebalance; hot shard is operator-detected and operator-fixed via shard expression change.	Segment splits via Minion compaction; star-tree mitigates hot aggregation paths; sticky Kafka partition assignment for real-time upserts can create unbalanced server load.
Maximum shards (practical)	10s to 100s of shards typical; operational complexity rises with shard count due to manual rebalance requirement.	Hundreds of servers with tens of thousands of segments; Helix handles segment-to-server assignment automatically.
Resharding without downtime?	Difficult — typically requires shadow table + cutover; some shops use rolling resharding tooling but it's custom.	Yes — segment-level moves orchestrated by Minion; deep store ensures durability during move.
Cross-shard query support	Distributed engine performs scatter-gather; result merge happens at the initiator; new query planner improves distributed join performance.	Native Broker scatter-gather; v2 multi-stage engine handles distributed joins; star-tree can answer aggregation queries from a single tree without scatter-gather.

6. Replication

ClickHouse replication is async multi-master with optional quorum writes. Pinot replication is replica-group-based with Kafka as the durable source of truth for real-time tables — a meaningfully different durability model.

Dimension	ClickHouse	Apache Pinot
Replication topology	Multi-master async via ReplicatedMergeTree + Keeper/ZK; writes to any replica propagate to others; Distributed table layers sharding on top.	Replica-group-based; Controller-managed via Helix; deep store as durable source of truth; LLC coordinates Kafka offsets across replicas.
Sync vs async	Async by default; quorum INSERT setting (insert_quorum=N) forces sync to N replicas before ACK.	Sync within replica-group on segment commit (LLC + Kafka offset coordination); async push to deep store.
Replication factor (default / max)	Configurable per ReplicatedMergeTree table; typical 2-3; bounded by cluster size.	Configurable per table; typical 2-3; sticky Kafka partition assignment constrains practical maximum.
Consistency level options	Eventual by default; insert_quorum + select_sequential_consistency for stronger guarantees; consistency tunable per query.	Per-query consistency strict within replica-group view; upsert tables provide read-your-writes within partition.
Replication lag (typical)	Sub-second within region for ReplicatedMergeTree; cross-region uses async replica configurations.	Sub-second (Kafka offset-coordinated for real-time tables); deep store sync is also sub-second to seconds.
Conflict resolution	No conflict resolution per se — ReplacingMergeTree, CollapsingMergeTree, AggregatingMergeTree provide engine-level dedup or rollup semantics at merge time.	Per-PK upsert resolution at ingestion time using configurable strategy (last-write-wins, partial-upsert with column-level merge).
Cross-region replication	Manual: Distributed table over WAN, or async replica in another region; ClickHouse Cloud adds managed cross-region.	Manual: tiered storage + WAN sync; StarTree Cloud provides managed cross-region as a tier.
Replication during partition	Replicas continue accepting writes; divergence resolved on heal via merge; insert_quorum prevents writes on minority side.	Replica-group writes continue if quorum maintained; minority side rejects writes to preserve consistency.

7. Better Usage Patterns

The patterns that distinguish a production deployment from a prototype. ClickHouse's are tuning-heavy (codecs, projections, MV chains); Pinot's are model-heavy (index choice per query, partition-by-PK for upserts, replica-group sizing).

ClickHouse

Columnar LSM OLAP

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Batch inserts or use async_insert	Single-row INSERTs from app code; one part per row; merge pipeline collapses	Batch ≥1000 rows client-side, OR enable async_insert with wait_for_async_insert=1 for durability	Merge throughput stays healthy; storage-level part count stays bounded; SELECT performance doesn't degrade over time
LowCardinality(String) for high-repeat string columns	Plain String columns for status, country, device_type, etc.	LowCardinality(String) on any column with <10K distinct values	10x compression on that column; faster filter and group-by; lowest-effort win in ClickHouse tuning
ORDER BY clause as the index	Choose ORDER BY for "logical" sort order without thinking about filter columns	Choose ORDER BY for the columns you filter on most often, in order of cardinality (low-card first)	Primary key skip-index works on ORDER BY; wrong choice means full-table scans on filter predicates
ReplacingMergeTree for upsert-like patterns	Try to UPDATE rows on regular MergeTree; mutations queue up; SELECT performance degrades	ReplacingMergeTree with version column; SELECT FINAL or argMax aggregation for current state	Engine handles dedup at merge; no mutation backlog; production-ready upsert semantics
Materialized Views for pre-aggregation	GROUP BY at query time on huge tables; latency varies with query parameters	Materialized View with AggregatingMergeTree target; queries hit pre-aggregated data; chain MVs for hierarchical rollups	Order-of-magnitude latency reduction on common aggregations; predictable cost
ClickHouse Keeper over ZooKeeper for new deployments	Keep using ZooKeeper because "it's what the docs showed"	ClickHouse Keeper (C++ implementation, lower latency, simpler ops)	Removes JVM operational complexity; lower coordination latency; fewer surprises at scale
Avoid SELECT * on wide tables	SELECT * habit from row-oriented databases	Always select the specific columns needed; columnar means column count drives I/O	Scan I/O scales with column count selected, not row count; SELECT * negates the columnar benefit
External Kafka consumer for high throughput	Use Kafka engine; hit rebalance pauses and partition-count scaling limits	External consumer (Vector, Bytewax, Benthos) doing batched INSERTs; separates ingest scaling from storage scaling	Eliminates rebalance pauses; scales beyond Kafka partition count; better observability into ingest pipeline

Apache Pinot

Distributed Real-Time OLAP

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Star-tree for multi-dim aggregations, not arbitrary queries	Enable default star-tree on every table; expect uniform speedup	Configure star-tree explicitly for the (dimensions, aggregations) of your top dashboard queries; profile before broad enablement	Storage overhead pays back only when star-tree matches actual query shape; default config rarely matches production patterns
Partition Kafka by PK for upsert tables	Use round-robin or random Kafka partitioning; upsert correctness breaks	Kafka stream MUST be partitioned by primary key; all messages for a PK land on the same Pinot server	Upsert correctness requires this; otherwise PK metadata maps don't converge; this is a hard correctness requirement, not an optimization
Cluster sizing: num_partitions × num_replicas ≥ num_servers	Scale to 10 servers with 6 Kafka partitions and 1 replica; 4 servers go idle	Always plan: num_partitions × num_replicas must equal or exceed planned server count; over-provision Kafka partitions for headroom	Sticky partition-to-server mapping means under-partitioning leaves capacity unused; this is non-obvious until you try to scale
Minion-based segment compaction for upsert-heavy tables	Let segments accumulate; obsolete-row ratio rises; queries slow over time	Schedule SegmentRefreshTask via Minion to compact segments based on obsolete-row ratio	StarTree Cloud automates this; OSS Pinot requires explicit task scheduling but the same primitive works
Bloom filter on high-selectivity columns, inverted on low-cardinality	Default everything; segments balloon 3x; query speedup is marginal	Bloom on columns with high selectivity for point queries; inverted on low-cardinality filter columns; range index on time and numeric ranges	Index choice is per-column-per-query; profile actual query patterns against representative segments before broad enablement
Offline + real-time hybrid tables for historical + fresh	Single real-time table; historical data accumulates in real-time tier; cost grows	Real-time table for fresh window (e.g., last 7 days); offline table for historical data, fed by daily batch jobs	Resource isolation; cost optimization (offline servers can be smaller); query routing handles the split transparently
Set retention explicitly	OSS Pinot default has no retention; segments accumulate indefinitely	Set retentionTimeUnit + retentionTimeValue in segmentsConfig; verify periodically	Storage cost spirals on long-running tables without retention; this is a "you'll forget about it" footgun
Replica-group routing for tenant isolation	Single replica group; all tenants share resources; noisy neighbor under load	Per-tenant or per-workload replica groups; broker routes queries to dedicated replica group	Isolates noisy tenants; predictable p99 per tenant; chargeback becomes possible

8. Advanced / Next-Gen Alternatives

Where each engine may be augmented or displaced. The real-time OLAP space is consolidating; Druid is fading, ClickHouse and Pinot are the two viable open engines plus their managed services (ClickHouse Cloud, StarTree Cloud).

ClickHouse

Columnar LSM OLAP

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
ClickHouse Cloud (managed)	Separated storage and compute on object store; auto-scaling; managed Keeper; lower ops burden	Production	Low — lift-and-shift from self-hosted; SQL surface is identical; cost model shifts to managed-service line-item	When self-hosted operational burden exceeds the managed-service premium; when you need rapid scaling without operating Keeper + cluster
Apache Pinot / StarTree Cloud	Native upserts at scale; sub-second p99 for user-facing dashboards; star-tree for pre-aggregated multi-dim queries	Production	Medium-High — different data model; star-tree shape vs MergeTree shape; rewrite of materialized view logic	When user-facing dashboard p99 becomes the SLO and ClickHouse can't hit it under load; when upsert workload outgrows ReplacingMergeTree
DuckDB / Polars (embedded analytical)	Single-node analytical performance for small-to-medium data; no cluster to operate; embeddable	Production	Low for specific workloads — complementary, not replacement; useful for edge analytics or notebook-scale workloads	When data fits on one node (low TB scale) and you'd rather not operate a cluster; for ad-hoc analytics adjacent to ClickHouse
Apache Druid	Streaming ingest flexibility for diverse sources and out-of-order events; mature for time-series-shaped workloads	Production	High — different segment model; operational complexity comparable to Pinot; Druid mindshare has been declining	Mostly when inheriting an existing Druid deployment; greenfield, ClickHouse and Pinot are the more active choices in 2026

Apache Pinot

Distributed Real-Time OLAP

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
StarTree Cloud (managed Pinot)	Off-heap upsert metadata (10x PK scale); Pauseless Ingestion (40M+ events/sec/table); tiered storage; managed K8s	Production	Low from OSS Pinot — same engine, managed control plane and add-ons	When OSS Pinot's heap-bound upserts limit scaling; when ingest throughput approaches OSS ceiling; when operating 5 components in-house is the bottleneck
ClickHouse / ClickHouse Cloud	Operational simplicity (single binary); higher compression; more flexible SQL surface; richer ecosystem	Production	Medium-High — different data model; rewrite of star-tree-shaped queries; index strategy rethink	When ad-hoc analytical queries dominate over pre-defined dashboard queries; when ops headcount is the binding constraint; Constant Contact pattern
Apache Druid	More flexible streaming ingest; better tooling polish in some areas (Imply Polaris UI); mature K8s operator	Production	Medium — adjacent architecture but different segment + upsert model; Pinot's upsert maturity is hard to match	When streaming ingest from diverse sources matters more than upsert semantics; mostly relevant for greenfield Druid evaluations
Snowflake Hybrid Tables or Databricks Lakebase	OLTP + OLAP unified on a warehouse; no separate operational store needed for some user-facing patterns	Emerging	High — different architecture; not a 1:1 replacement; only viable for specific patterns where warehouse-grade latency is enough	When user-facing latency requirements relax to 100ms+ and the unified-platform pitch outweighs the dedicated OLAP engine perf

As-of disclaimer. Specifics reflect public documentation, vendor blogs, and production reports through May 2026. ClickHouse 26.1 (Jan 2026) extended async insert deduplication to materialized views and made deduplication default-on for all inserts. ClickHouse 25.6/25.11 introduced projection-as-secondary-index. Apache Pinot 2026: native upserts mature, v2 query engine for distributed joins, JSON/text/range index toolkit. StarTree Cloud adds Pauseless Ingestion and off-heap upsert metadata as managed-tier differentiators. Constant Contact's 2026 case study documents a real-world Pinot-to-ClickHouse migration. Verify against current docs before architectural commitments.

Best default choices

Search and compare

1. Trade-Offs

ClickHouse

Apache Pinot

2. Use Cases

ClickHouse

Apache Pinot

3. Limitations

ClickHouse

Apache Pinot

4. Fault Tolerance

5. Sharding

6. Replication

7. Better Usage Patterns

ClickHouse

Apache Pinot

8. Advanced / Next-Gen Alternatives

ClickHouse

Apache Pinot