Time Series Databases
Five contenders, one workload class, very different bets on storage, cardinality, and what "production" means.
Category Sweep · PE / L7 Depth · As of 2026-06-04 · InfluxDB 3.8 · TimescaleDB 2.26+ · Prometheus 3.x · VictoriaMetrics v1.x · ClickHouse 25.x
The category has split. Prometheus and VictoriaMetrics are operational metrics engines built around a pull model and a label cardinality budget. ClickHouse and InfluxDB 3 are columnar OLAP engines retrofitted for time series, with effectively unbounded cardinality but real operational complexity. TimescaleDB bridges the two camps by paying a row-store tax in exchange for full SQL and the Postgres ecosystem. For commodity service monitoring, default to VictoriaMetrics. For observability at scale across metrics, logs, and traces, ClickHouse has won the market. Choose InfluxDB only for greenfield work where the FDAP (Flight, DataFusion, Arrow, Parquet) ecosystem matters: the 1→2→3 architecture pivots have cost users a five-year migration tax.
1. Trade-Offs
Each table is one technology, one row per distinct trade-off.
InfluxDB 3
Use when object storage economics, Parquet/Arrow interoperability, and SQL over time-series data matter more than raw ingest speed or mature HA in Core.
FDAP / Rust / Parquet on S3
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Object storage as primary persistence | Storage cost drops to S3 economics, ~$23/TB/month vs ~$80 for EBS gp3 | Cold-tier query latency jumps from sub-second to seconds; predicate pushdown only helps for filtered scans | Dashboards that scan 30+ days of historical data with wide GROUP BY | The diskless story sells the architecture but the hot tier is still in-memory Arrow; cache misses are brutal |
| Complete rewrite from Go to Rust on FDAP stack | Unlimited cardinality (the v1/v2 ceiling at ~10M series is gone) | v3 Core is a different product; v1/v2 migration is rewrite-grade, not upgrade-grade | Existing InfluxQL/Flux pipelines, TSM-specific tooling, Telegraf plugins assuming v2 semantics | InfluxData has burned the boats three times. Customers learned. New adoption has slowed visibly |
| Parquet on disk, Arrow in memory | Interop with Pandas, DuckDB, Spark, Snowflake for free; no export ETL | Ingest throughput trails purpose-built engines; ~320K rows/sec vs QuestDB's 11M rows/sec on the same hardware | High-frequency telemetry, financial tick data, sub-second sensor arrays | The architectural bet is on long-term ecosystem leverage, not raw ingest. Reasonable bet, but it's a bet |
| SQL as primary query language | DataFusion handles standard SQL, joins, window functions; analysts don't need Flux | Flux ecosystem (tasks, transforms) is not fully replaced; InfluxQL is compatibility-only | Teams that built on Flux scripting for downsampling and alerts | The Flux walk-back is the most painful UX regression. Users who invested are stranded |
| Tag/field model carried over from v1/v2 | Line Protocol still works; existing Telegraf collectors require minimal changes | The data model is still string-tag-heavy; schema design still matters for performance | Schemas with high-cardinality tags treated as free dimensions, then queried with wide aggregations | Unlimited cardinality at the engine doesn't mean unlimited cardinality is wise. Tag hygiene still matters |
| Open-core licensing split between Core and Enterprise | Core is free and permissively licensed (MIT or Apache 2.0); covers single-node deployments | Replication, clustering, RBAC, and durability features are Enterprise-only | Production deployments needing HA without paying for Enterprise or Timestream | The "free for single-node" model has historically driven users to alternatives once they hit scale |
| Embedded Python runtime for transforms | Zero-copy from Arrow to Python via PyArrow; transforms run in-process | Python in the hot path is a new failure mode; UDF crashes block ingest | Heavy compute UDFs run inline rather than via downstream Kafka consumers | This is genuinely novel but operationally risky. Bound resource limits hard |
| Compaction to Parquet runs in the background | Recent data is immediately queryable from WAL + in-memory Arrow | Background compactor can fall behind under sustained ingest, growing both the on-disk WAL and query latency | Sustained ingest at 100% of node capacity for multiple hours | Watch the compaction queue depth metric like a hawk. Falling behind is a slow-motion outage |
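
The storage-economics row above is simple arithmetic, sketched below using only the two per-TB prices quoted in the table. The function name and result keys are hypothetical, and the sketch ignores request, egress, and cache costs, so read the result as a rough ceiling on savings and not a bill estimate.

```python
# Per-TB monthly prices as quoted in the trade-off table (USD).
S3_USD_PER_TB_MONTH = 23.0
EBS_GP3_USD_PER_TB_MONTH = 80.0


def monthly_storage_cost(total_tb: float) -> dict:
    """Compare the monthly cost of keeping total_tb on S3 vs EBS gp3.

    Assumption: capacity pricing only; request and egress charges are ignored.
    """
    if total_tb < 0:
        raise ValueError("total_tb must be non-negative")
    s3 = total_tb * S3_USD_PER_TB_MONTH
    ebs = total_tb * EBS_GP3_USD_PER_TB_MONTH
    return {"s3_usd": s3, "ebs_gp3_usd": ebs, "savings_usd": ebs - s3}


if __name__ == "__main__":
    # Example: a 10 TB retention set.
    print(monthly_storage_cost(10))
```

The capacity saving is real, but the cold-tier latency row in the same table is the price paid for it.
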
TimescaleDB (Tiger Data)
Best when the workload benefits from Postgres semantics, joins, extensions, and operational familiarity while accepting single-node or cloud-managed scale limits.
Postgres extension / Hypertables
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Postgres extension instead of standalone | Full SQL, joins, foreign keys, transactions, pgvector, every Postgres tool just works | You inherit Postgres's WAL, autovacuum, single-writer-per-row limits, and connection pool fragility | 10K+ concurrent writers; autovacuum storms during compression cycles | The Postgres dependency is the entire value prop and the entire ceiling. Both at once |
| Hybrid row/column storage (Hypercore) | Recent data in rowstore for low-latency reads; old data in columnstore for 90%+ compression | Conversion is policy-driven background work; can lag, leaving rowstore chunks consuming 7x more disk than expected | Compression policies failing silently while disk fills (known bug 8771 affects 2.22.1) | Hypercore TAM was deprecated in 2.21 in favor of sparse indexes. The story keeps changing, so watch the release notes |
| Compressed chunks are functionally immutable | Predictable compression ratios; LZ4/gorilla/delta-of-delta encoding by type | DML on compressed data triggers decompress-modify-recompress cycles per affected segment | Backfills, late-arriving data, frequent corrections on historical metrics | If you correct historical data routinely, set compress_after to multiples of your correction window |
| Automatic time-based chunking | Chunk pruning makes time-range queries O(chunks-in-range), not O(table-size) | Chunk interval is chosen at hypertable creation; resizing live tables requires migration | Wrong chunk_interval (too small: index bloat; too large: poor pruning) discovered at 6 months of data | Default is 7 days. For typical IoT, 1 day. For high-cardinality observability, 1-6 hours. Get this right at design time |
| Continuous aggregates as materialized views | Pre-rolled hourly/daily aggregates; concurrent refresh in 2.26+ with sparse index backing | Refresh policies consume CPU; if you have 50 continuous aggregates, refresh windows overlap | Continuous aggregate refresh jobs starve normal ingest CPU during business hours | Stagger refresh schedules. Cap parallel jobs. This is the most common ops surprise in TimescaleDB |
| SQL is the only query interface | No PromQL/Flux to learn; existing BI tools (Tableau, Metabase, Looker) work natively | No native pull-based scrape model; you need Prometheus Adapter or similar for /metrics endpoints | Kubernetes observability shops expecting Prometheus-style scrape | This is why TimescaleDB rarely replaces Prometheus; it replaces InfluxDB or sits next to Prometheus via remote_write |
| Direct Compress ingest (tech preview) | 37x ingestion speedup by compressing in memory before WAL write | Requires pre-sorted COPY input; incorrect sort silently corrupts data | Pipelines that can't guarantee sort order at ingest time | Read the warning. Wrong sort key gives wrong query results without erroring. Not safe yet for arbitrary workloads |
| Vertical scaling primary, horizontal limited | A single 64-core, 256GB node handles 1M+ rows/sec with proper schema | Multi-node TimescaleDB was deprecated; horizontal scale is now read replicas + sharding-by-app-logic | Workloads beyond a single beefy node, or HA across regions | The honest answer is "Tiger Cloud or shard yourself." Multi-node distributed hypertables are gone as of 2.14 |
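
The chunk-interval and compress_after guidance in the table can be turned into a small planning helper. The sketch below is illustrative only. The workload keys and function names are hypothetical, the 6-hour observability value is the upper end of the 1-6 hour range quoted above, and the generated SQL uses the classic `create_hypertable` / `add_compression_policy` calls, which newer releases may expose under columnstore-named equivalents, so check the release notes for your version.

```python
from datetime import timedelta

# Starting points quoted in the table; the dictionary keys are hypothetical labels.
CHUNK_INTERVALS = {
    "default": timedelta(days=7),
    "iot": timedelta(days=1),
    "observability": timedelta(hours=6),
}


def compress_after(correction_window: timedelta, multiple: int = 2) -> timedelta:
    """Delay compression to a multiple of the window in which corrections arrive."""
    if multiple < 1:
        raise ValueError("multiple must be at least 1")
    return correction_window * multiple


def to_pg_interval(delta: timedelta) -> str:
    """Render a timedelta as a Postgres INTERVAL literal in whole hours."""
    hours = int(delta.total_seconds() // 3600)
    return f"INTERVAL '{hours} hours'"


def hypertable_ddl(table: str, time_col: str, workload: str,
                   correction_window: timedelta) -> list:
    """Build the statements to create a hypertable and schedule compression."""
    chunk = to_pg_interval(CHUNK_INTERVALS[workload])
    after = to_pg_interval(compress_after(correction_window))
    return [
        f"SELECT create_hypertable('{table}', '{time_col}', chunk_time_interval => {chunk});",
        f"ALTER TABLE {table} SET (timescaledb.compress);",
        f"SELECT add_compression_policy('{table}', {after});",
    ]


if __name__ == "__main__":
    for statement in hypertable_ddl("metrics", "time", "iot", timedelta(days=3)):
        print(statement)
```

With a 3-day correction window and the default multiple of 2, chunks stay in rowstore for 6 days, so routine corrections should not trigger decompress-modify-recompress cycles.
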
Prometheus
Choose for service monitoring, scrape-based collection, PromQL alerting, and the exporter ecosystem when single-node retention and HA trade-offs are acceptable.
Pull-based / Single-node / Local TSDB
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Pull-based scrape model | Targets self-describe via /metrics; service discovery via Kubernetes API or Consul is trivial | Cannot ingest short-lived jobs reliably; needs Pushgateway as a workaround | Batch jobs, serverless functions, anything that exits before the next scrape interval | Pushgateway is universally hated but universally deployed. The pull model's one real weakness |
| In-memory head block + WAL + 2h compaction | Newest 2h of data lives in RAM; queries on recent data are sub-millisecond | RAM is the cardinality budget; ~5GB RAM per 100K samples/sec ingest rate | Cardinality explosion from unbounded labels (user_id, trace_id) OOMs the process | Healthy instance: 100K-2M active series. North of 5M is a cardinality bomb. North of 10M is on fire |
| Single-node by design | Zero distributed-system complexity; one process, one disk, one runbook | No replication, no HA, NFS unsupported, drive loss equals data loss for that retention window | Anything requiring durability or HA without Thanos/Cortex/Mimir bolted on | The architecture choice is honest: "we are a single-node store; if you need more, here are the federation patterns" |
| PromQL as the query language | Range vectors, rate/irate, histogram_quantile are purpose-built for monitoring | Not SQL; analysts and BI tools need an adapter; joins across metrics are awkward | Cross-cutting analytical queries; reporting that needs JOIN semantics | PromQL is excellent at what it's designed for and bad at everything else. Don't try to make it a data warehouse query language |
| Compaction at fixed intervals [2h, 6h, 18h, 54h, 162h, 486h] | Predictable I/O pattern; compactions don't run mid-query | Compaction spikes (10s of seconds to minutes at high cardinality) starve the scrape loop | Cardinality > 5M during compaction window causes scrape timeouts and gaps | prometheus_tsdb_compaction_duration_seconds above 30s is the canonical warning sign. Alert on it |
| 15-day default retention | Disk usage stays bounded; storage planning is straightforward | Long-term storage requires remote_write to Thanos, Cortex, Mimir, or VictoriaMetrics | Capacity planning, year-over-year comparisons, compliance retention requirements | Prometheus is a 2-week buffer, not a data warehouse. Treat it accordingly |
| Block deletion is whole-block | No per-series GC; retention is trivial cleanup | Cannot delete individual high-cardinality metrics without rewriting blocks; tombstones are partial | "Drop this one rogue label that's polluting our TSDB" — there's no clean way | The cleanest fix is metric_relabel_configs at scrape time. Server-side deletion is for emergencies |
| Local-disk only, no S3 | No network in the storage path; reads stay local and fast | No object storage offload; cold data is as expensive as hot data per GB | Long retention on Prometheus-only deployments; >50GB local disk pressure | This is the single biggest reason teams add Thanos sidecar. Worth knowing before you scale |
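
Two numbers from the table lend themselves to a quick check: the compaction ladder, where each level is three times the previous one starting from 2h, and the active-series thresholds. The sketch below reproduces both. The function names and the "elevated" label for the 2M-5M gap are assumptions of this example, since the table does not name that band.

```python
def block_ranges_hours(base_hours: int = 2, factor: int = 3, levels: int = 6) -> list:
    """Return the compaction ladder from the table: each range is factor x the previous."""
    return [base_hours * factor ** level for level in range(levels)]


def cardinality_status(active_series: int) -> str:
    """Classify an instance using the active-series thresholds quoted in the table."""
    if active_series > 10_000_000:
        return "on fire"
    if active_series > 5_000_000:
        return "cardinality bomb"
    if active_series > 2_000_000:
        # Assumption: the table does not label the 2M-5M band.
        return "elevated"
    return "healthy"


if __name__ == "__main__":
    print(block_ranges_hours())
    print(cardinality_status(1_500_000))
```
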
VictoriaMetrics
Best when Prometheus compatibility, high-cardinality tolerance, and simple shared-nothing scaling matter more than consensus-driven replication semantics.
Prometheus-compatible / Shared-nothing cluster
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| Three-component cluster (vminsert, vmselect, vmstorage) | Each layer scales independently; stateless front-ends, stateful storage | Three deployments to operate instead of one; observability stack now has its own observability problem | Small teams running single-node Prometheus and adopting the cluster too early | Single-node VictoriaMetrics handles ~1M samples/sec on a beefy box. Don't reach for cluster prematurely |
| Consistent hashing on metric+labels for sharding | Deterministic placement; vmselect knows which vmstorage holds which series | No automatic resharding when adding nodes; new nodes only receive new series | Capacity expansion in growing clusters; manual rebalance via vmctl is required for even distribution | This is the deliberate KISS choice. Knowing it up front prevents nasty surprises at month 6 |
| Inverted-index design for high cardinality | Handles 10M+ active series comfortably where Prometheus dies at 5M | Index size grows with unique label combinations; 10x more efficient but not infinite | Truly unbounded labels (request_id as a metric label) still kill it eventually | VictoriaMetrics raises the cardinality ceiling by 10x. It does not remove the ceiling |
| PromQL + MetricsQL extensions | Drop-in Prometheus compatibility plus useful extensions like rollup_rate and quantiles_over_time | MetricsQL queries don't run on Prometheus; lock-in if you adopt the extensions | Migrations between VictoriaMetrics and Prometheus, federated query setups | Use extensions sparingly. Treat MetricsQL features as vendor-only sugar, not standard |
| Replication via -replicationFactor flag | Each sample written to N consecutive storage nodes; vmselect tolerates N-1 down | Replication is at application layer, not storage; linearly increases resource needs across components | Multi-AZ deployments where you want 3x replication and underestimated the resource multiplier | RF=2 doubles your storage and CPU. RF=3 triples it. Plan capacity accordingly |
| No gossip, no Paxos, no consensus protocol | No split-brain; no leader election; operators can reason about the system end-to-end | No automated failover semantics; failure handling is "retry the other node" | Cross-region coordination, ordered writes, anything requiring distributed consensus | The KISS architecture is the actual product. Compare to Cortex/Mimir's gossip-based complexity |
| Storage replication offloaded to underlying disk | Use GCE persistent disks or EBS replication; no app-level replication overhead | App-level replication is opt-in; default is RF=1 and you can lose data on disk failure | Cost-optimized single-AZ deployments where someone assumed replication was default | The RF=1 default is easy to miss because it is only spelled out in the documentation. Verify your replicationFactor flag in prod |
| Single-binary single-node mode also available | Same engine as cluster; can scale up before scaling out | No HA in single-node mode; one process is one point of failure | Production deployments where ops assumed cluster would magically appear | Most teams stay on single-node much longer than they expect. That's by design and it's correct |
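
The replication row says each sample lands on N consecutive storage nodes and reads tolerate N-1 nodes down. The sketch below illustrates that placement rule only. It uses a plain SHA-256 modulo placement chosen for this example, not VictoriaMetrics' actual hashing, and all function names are hypothetical.

```python
import hashlib


def replica_nodes(series_key: str, node_count: int, replication_factor: int) -> list:
    """Pick replication_factor consecutive node indexes for a series.

    Assumption: simple modulo placement for illustration; the real engine's
    hashing scheme differs.
    """
    if not 1 <= replication_factor <= node_count:
        raise ValueError("replication_factor must be between 1 and node_count")
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % node_count
    return [(first + offset) % node_count for offset in range(replication_factor)]


def survives(series_key: str, node_count: int, replication_factor: int, down: set) -> bool:
    """A series stays readable while at least one of its replicas is up."""
    replicas = replica_nodes(series_key, node_count, replication_factor)
    return any(node not in down for node in replicas)


def capacity_multiplier(replication_factor: int) -> int:
    """Every sample is stored replication_factor times, so resources scale linearly."""
    return replication_factor


if __name__ == "__main__":
    key = 'http_requests_total{job="api"}'
    print(replica_nodes(key, node_count=5, replication_factor=2))
    print(capacity_multiplier(3))
```

With RF=1 the loss of the single owning node makes the series unreadable, which is the failure the default-RF row warns about.
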
ClickHouse
Use when the time-series workload is really high-volume analytical data and schema design can be tuned around MergeTree, ORDER BY, and materialized views.
Columnar OLAP / MergeTree
| Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance |
|---|---|---|---|---|
| General-purpose columnar OLAP, not TSDB-specific | Same engine handles metrics, logs, traces, events, business analytics; one stack for all | Not opinionated about time-series; schema design and ORDER BY choice are your problem | Teams expecting it to work out of the box without thinking about partitioning, ordering, and TTL | This is why ClickHouse needs a wrapper like ClickStack/SigNoz/Uptrace for observability; the raw engine is too general |
| MergeTree with sparse primary index | Index granularity 8192 keeps memory tiny; index for 1B rows fits in MB | Point lookups are O(granule), not O(1); single-row reads cost 8K-row scan minimum | Workloads expecting OLTP-style point lookup latency | Wrong tool for transactional reads. Right tool for aggregations over millions of rows |
| ORDER BY clause permanently determines layout | Optimal ORDER BY (e.g., metric_name, service, time) makes prefix-filter queries scan minimal data | Wrong ORDER BY can make queries 100x slower; schema changes require rewriting tables | Discovering at month 12 that your dashboards filter by service first, but you ordered by host first | This is the highest-leverage schema decision in ClickHouse. Treat it with the seriousness of choosing a partition key in DynamoDB |
| Materialized views for pre-aggregation | AggregatingMergeTree maintains rolling counts/sums/quantiles; dashboards query aggregated views | Each materialized view is a write amplification multiplier; 10 MVs equals 11x write cost | Adding new MVs on hot tables in production; the write fanout surprises teams | Plan MVs at schema design. Adding them later means backfilling, which is expensive |
| LowCardinality(String) and Map types | Dictionary encoding for low-cardinality columns; flexible labels via Map(String,String) | Mixing LowCardinality with high-cardinality data degrades to dictionary thrashing | Using LowCardinality on columns with >10K distinct values; performance silently regresses | LowCardinality threshold is ~10K. Above that, plain String is faster |
| Distributed table = view over sharded replicas | SELECT pushes down to shards; results merged at coordinator | No automatic sharding on ingest; you choose sharding key at table creation | Default round-robin sharding hurts when queries filter by service_id but data isn't co-located | Sharding key = your most common filter dimension. Get it right or accept fanout queries |
| ReplicatedMergeTree via ZooKeeper / ClickHouse Keeper | Multi-master replication; any replica accepts writes; eventual consistency | ZooKeeper / Keeper is a hard dependency; quorum loss blocks writes | Keeper performance issues during high merge activity; the cluster's bottleneck moves to Keeper | Move to ClickHouse Keeper if still on ZK. The Keeper transition has been stable since 2024 |
| S3-backed disk for cold storage | Tiered storage moves cold parts to S3; storage cost drops dramatically | S3-tier queries are 10x slower than local; predicate pushdown helps but isn't magic | Dashboards scanning 90-day windows where most data lives in S3 tier | Tier boundary matters more than raw storage cost. Most teams set 7-30 day hot tier and live with cold latency |
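
The sparse-index and materialized-view rows reduce to two small calculations, sketched below. The granularity of 8192 and the "10 MVs equals 11x" rule come from the table. The 32 bytes per index mark is an assumed figure for illustration only, since the real size depends on the key columns.

```python
INDEX_GRANULARITY = 8192  # rows per granule, as quoted in the table


def granule_count(rows: int, granularity: int = INDEX_GRANULARITY) -> int:
    """Number of granules (and index marks) needed to cover the given rows."""
    return -(-rows // granularity)  # integer ceiling division


def sparse_index_bytes(rows: int, bytes_per_mark: int) -> int:
    """Rough primary-index size; bytes_per_mark is an assumption supplied by the caller."""
    return granule_count(rows) * bytes_per_mark


def write_amplification(materialized_views: int) -> int:
    """One write to the base table plus one per attached materialized view."""
    return 1 + materialized_views


if __name__ == "__main__":
    rows = 1_000_000_000
    print(granule_count(rows))
    print(sparse_index_bytes(rows, bytes_per_mark=32))  # assumed 32 bytes per mark
    print(write_amplification(10))
```

Under the assumed mark size, a billion rows need roughly 122K marks and a few megabytes of index, which matches the "fits in MB" claim in the table. The same granule size is why a single-row read still scans up to 8192 rows.
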
2. Use Cases
Production-grade scenarios with named operators, the driving property, and why the alternative fails.
InfluxDB 3
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| IoT sensor telemetry with high-cardinality device IDs | Industrial IoT platforms migrating from InfluxDB 1.x | Unlimited cardinality without v1's series cardinality OOM | 10M+ devices, 1Hz sensor readings, multi-year retention | InfluxDB 1/2 hit the cardinality wall; Prometheus pull model doesn't fit device push |
| Time-series analytics with Parquet interop | Data science teams running pandas/DuckDB downstream | Zero-copy export to Arrow; direct Parquet queries from notebooks | 10-100GB time-series datasets queried from multiple engines | Native TSDB formats require ETL to Parquet; ClickHouse needs export |
| Managed time-series on AWS via Timestream for InfluxDB 3 | AWS shops standardizing on managed services (since Oct 2025) | Managed Parquet-on-S3 with InfluxDB API compatibility | Multi-TB workloads with elastic compute and decoupled storage | Self-managed VictoriaMetrics is more work; Timestream original (LSM) has different semantics |
| Real-time analytics with embedded Python transforms | Trading and quant teams transforming tick data inline | Zero-copy Arrow→Python UDF without process boundary | 1K-100K ticks/sec with custom indicator computation | External stream processors (Flink, Spark) add latency; in-DB UDF is novel here |
| Greenfield SQL-native time-series workloads | New observability platforms choosing SQL over PromQL | DataFusion's standard SQL with time-series-aware optimizations | Moderate scale (sub-billion-row tables) with SQL-skilled team | InfluxQL/Flux were proprietary; SQL avoids re-skilling cost |
| Hybrid hot/cold tiered storage on object store | Cost-sensitive multi-year retention scenarios | Hot data in memory, cold in S3-class object storage | 10TB+ data with 95% queried within 30 days | Prometheus has no cold tier; TimescaleDB tiering requires Tiger Cloud or app-level work |
TimescaleDB (Tiger Data)
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Financial market data with JOINs to reference tables | Trading firms storing tick data alongside instrument/account tables | SQL joins between time-series and relational data without external systems | Millions of ticks/sec; reference tables in same DB; ACID across them | InfluxDB has no real JOIN; ClickHouse joins are awkward; Prometheus can't store reference data |
| SaaS product analytics dashboards | B2B SaaS exposing usage analytics to customers | Multi-tenant Postgres with RLS and continuous aggregates | 1K-10K tenants, 100M-1B events/day, sub-second dashboard queries | ClickHouse RLS is weak; Prometheus is not multi-tenant; InfluxDB tenancy model is awkward |
| IoT analytics with geospatial queries | Fleet management, asset tracking, smart agriculture | PostGIS + TimescaleDB in one database; joins between location and metrics | 10K-1M devices with lat/lon + sensor data; spatial-temporal queries | No other TSDB has first-class geospatial; InfluxDB's geo support is minimal |
| DeFi/blockchain analytics (e.g., Solana DEX swap streams) | Crypto exchanges and DEX analytics platforms (per Tiger case study) | 1M+ tx/sec ingestion with Direct Compress; SkipScan for distinct queries | Real-time swap ingestion; FIFO P/L calculations; multi-timeframe aggregates | ClickHouse lacks ACID for ledger; InfluxDB lacks JOIN for instrument linkage |
| Application metrics from Postgres-shop monitoring | Teams already on RDS Postgres adding time-series workloads | Add timescaledb extension; no new ops surface | Sub-1M series; one DB cluster instead of two | Prometheus would require new ops surface and lacks SQL |
| Energy and utilities meter data | Smart meter pipelines (15-min interval reads, regulatory retention) | 97% compression with delta-of-delta + ACID retention guarantees | 1M+ meters, 15-min reads, 7-year regulatory retention | Prometheus has 15-day retention; InfluxDB 1/2 hit cardinality ceiling at meter scale |
Prometheus
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Kubernetes infrastructure monitoring | Default for every K8s cluster on the planet | Native service discovery via K8s API; /metrics endpoint convention | 10K-100K pods per cluster; 15s scrape interval | Nothing else has the same K8s ecosystem gravity; Prometheus Operator is the standard |
| SRE alerting and SLO tracking | Every SRE team running Alertmanager + recording rules | PromQL for SLO math; histogram_quantile for latency percentiles | ~1M-2M active series per Prometheus, alerts in single-digit-second eval times | SQL-based alternatives lack range vector semantics; ClickHouse-based stacks need extra layers |
| Application-level metrics via client libraries | Every microservices stack with prometheus_client instrumentation | Counter/Gauge/Histogram/Summary primitives with strong typing | Thousands of services exposing /metrics; instrumentation is convention | The instrumentation ecosystem is unmatched; OpenTelemetry caught up but adoption still trails |
| Edge/embedded device monitoring | Resource-constrained edge nodes with local Prometheus and remote_write | Single binary, small footprint, no dependencies | Single-digit-MB metrics; remote_write upstream to central long-term store | VictoriaMetrics is comparable; ClickHouse/InfluxDB 3 are too heavyweight for edge |
| Network and infrastructure exporters (node, blackbox, snmp) | Hardware monitoring via the exporter ecosystem | 250+ official and community exporters covering every hardware/protocol | 1K-10K targets per Prometheus; pull cadence at 30s-1m | The exporter ecosystem is the moat; no other TSDB has equivalent coverage |
| Short-term operational metrics (under 2 weeks) | On-call dashboards and incident triage | Sub-millisecond queries on the in-memory head block | Real-time troubleshooting at 15s resolution for current sprint | Cold-tier stores (Mimir, Thanos, VM cluster) have longer query latency for fresh data |
VictoriaMetrics
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Long-term Prometheus storage | Teams with Prometheus + remote_write to VM for 1-5 year retention | 10x compression vs Prometheus local blocks; Prometheus-compatible query API | 10M+ active series, multi-year retention, sub-second query response | Thanos/Cortex/Mimir add gossip and consensus complexity; VM avoids both |
| Multi-tenant metrics platform | SaaS observability vendors and internal platform teams | Native accountID/projectID tenancy in cluster URLs | 1K-10K tenants on shared cluster; tenant isolation at API level | Prometheus has no tenancy; Cortex requires gateway + ACLs; VM has it built in |
| High-cardinality service metrics (per-customer, per-endpoint) | Platforms with intrinsically high-cardinality dimensions | Inverted-index handles 10M+ series where Prometheus OOMs at 5M | 10M+ active series per node; per-customer SLO tracking | Prometheus dies; ClickHouse needs full schema redesign; InfluxDB 3 is more ops-heavy |
| Edge-to-central metrics aggregation | Multi-region or multi-cluster setups federating to a single global view | vmagent at edge, cluster at center; cluster-to-cluster vminsert chaining | 50-500 remote Prometheus instances writing to a global VM cluster | Thanos Receiver is comparable but more complex; Mimir gossip protocol is fragile at this scale |
| Cost-optimized observability replacement | Teams replacing commercial Datadog/New Relic to cut spend | Self-hosted cost per metric 10-50x lower than commercial vendors | 10K-1M time series; cost-per-GB-stored down from $0.50 to $0.05 | Datadog/New Relic licensing economics force the eject; VM is the operationally simplest drop-in |
| Single-node high-volume metrics | Teams that prefer vertical scaling and operational simplicity | Single binary handles 1M+ samples/sec on a beefy box | Up to single-node hardware ceiling, then go cluster | Prometheus single-node ceiling is much lower; cluster setup is heavier |
ClickHouse
| Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative |
|---|---|---|---|---|
| Web-scale observability (logs+metrics+traces) | Cloudflare, Uber, Shopify, GitLab, OpenAI run bespoke ClickHouse-backed platforms | One engine handles logs, metrics, and traces with 90% storage reduction vs JVM-based stores | Billions of events/day; petabyte-scale retention; sub-second dashboard queries | Purpose-built TSDBs can't handle logs; commercial vendors cost 100x more at this scale |
| Real-time analytics with materialized view rollups | SigNoz, HyperDX, ClickStack, Uptrace as commercial offerings | AggregatingMergeTree pre-aggregates metrics at write time; dashboards query rollups | 1M+ events/sec ingest; queries on rolled-up data in tens of ms | Prometheus can't ingest events; TimescaleDB continuous aggregates have more refresh lag |
| Security analytics and SIEM | Cloudflare's network analytics, security event processing | Fast scan over high-cardinality event data with arbitrary filtering | Trillions of events queried with millisecond-to-second latency | Elasticsearch is more expensive; Splunk economics are punishing; Prometheus is wrong tool entirely |
| Financial analytics and tick storage | Deutsche Bank, Bloomberg-style analytics on market data | Vectorized query engine on columnar data; window functions for time-series math | Trillions of trades; aggregations over years of history | InfluxDB lacks SQL window functions; Prometheus doesn't fit the data model |
| OpenTelemetry storage at scale | OTel-native platforms (ClickStack with HyperDX UI) | Native OTLP ingest path; trace_id-correlated logs/metrics/spans queryable in SQL | 10K-1M spans/sec; 30-90 day retention with S3 cold tier | Jaeger/Tempo are trace-only; Loki is logs-only; ClickHouse unifies them |
| Product analytics and user behavior | Yandex.Metrica origins, now widely used for clickstream | Funnel analysis, cohort retention, session reconstruction in SQL | Trillions of events; sub-second analytical queries over high-cardinality user dimensions | Amplitude/Mixpanel costs scale linearly; ClickHouse is fixed-cost; Snowflake is more expensive per query |
3. Limitations
Rows are limitation categories; columns are technologies.
| Limitation | InfluxDB 3 | TimescaleDB | Prometheus | VictoriaMetrics | ClickHouse |
|---|---|---|---|---|---|
| Cardinality ceiling | Medium: Unbounded by architecture but ingest throughput degrades on wide schemas | High: Index bloat past ~10M series; per-chunk indexes don't fully avoid metadata pressure | Critical: OOM at 5-10M active series; linear RAM scaling | Medium: Handles 10-100M; still has a ceiling on truly unbounded labels | Medium: Cardinality lives in row data, not metadata; trade-off is column proliferation if you flatten labels |
| Backfill / late-arriving data | Medium: Out-of-order writes accepted; compaction reorders; cost in extra Parquet rewrites | High: Backfill into compressed chunks triggers decompress-modify-recompress; expensive at scale | High: Backfilling beyond 3-hour window requires offline block generation; current head can't be backfilled | Medium: Configurable -dedup.minScrapeInterval; out-of-order accepted within deduplication window | Medium: Inserts to old partitions are cheap; merges handle ordering; materialized views may need backfill |
| Real-time / point-update support | High: Append-mostly; updates are essentially rewrites of affected Parquet files | Medium: Full UPDATE/DELETE in rowstore; compressed chunks need decompress cycle | Critical: No update semantics; data is immutable once written | Critical: Append-only; no updates or per-row deletes | High: ALTER ... UPDATE/DELETE are mutations (async, heavy); ReplacingMergeTree is workaround |
| Multi-tenancy | Medium: Database/namespace isolation; RBAC is Enterprise-only | Medium: Postgres roles + RLS; per-tenant performance isolation is weak | Critical: No native multi-tenancy; one Prometheus per tenant is the workaround | Medium: Native accountID/projectID in cluster URLs; logical isolation, shared resources | Medium: Database/user-level isolation; row policies for fine-grained; performance isolation is up to you |
| Query language ergonomics | Medium: SQL primary, InfluxQL compat; Flux is being walked back | Medium: Full Postgres SQL plus time_bucket and hyperfunctions; verbose for some metrics operations | Medium: PromQL is excellent for range queries, weak for joins and reports | Medium: PromQL + MetricsQL extensions; useful additions but cause lock-in if used | Medium: Full SQL with time-series functions; lacks native PromQL operators like rate() |
| Operational maturity at scale | High: v3 Core GA April 2025; production operator experience is still being written | Medium: Postgres maturity for OLTP, but compression/columnstore subsystem still has open bugs (e.g., issue 8771) | Medium: Mature single-node; "at scale" means federation, which is a different beast | Medium: Mature cluster; KISS choices mean fewer surprises but also fewer features | Medium: Battle-tested at Cloudflare/Uber/Yandex scale; learning curve is steep |
| HA / replication out of box | Critical: Replication is Enterprise-only; Core single-node is a SPOF | Medium: Postgres streaming replication for read replicas; no native multi-master | Critical: Run two Prometheus side-by-side for HA; no built-in replication | Medium: Cluster mode native; -replicationFactor flag; cells/AZ architecture for DR | Medium: ReplicatedMergeTree multi-master; requires Keeper quorum |
| Resource isolation between writers/readers | Medium: Separate compactor process; coordinator and ingester can be split | High: Same Postgres backend serves both; OLAP queries can starve OLTP writes | High: Single process; heavy queries can starve scrape loop | Medium: vminsert/vmselect/vmstorage split is the value prop here | Medium: Resource pools and quotas; Cloud separates compute and storage entirely |
| Schema evolution | Medium: Schema-on-write via Line Protocol; adding tags/fields is implicit | Medium: ALTER TABLE works; new columns on compressed hypertables limited to nullable types | Medium: Labels are dynamic; no schema enforcement; cardinality risk is inherent | Medium: Same dynamic labels as Prometheus; same cardinality risk | High: Schema is explicit; ALTER TABLE on large tables is slow; ORDER BY immutable post-creation |
| Ecosystem maturity | Medium: Telegraf, Chronograf, Kapacitor exist but v3 has reset the integration surface | Medium: Inherits Postgres ecosystem entirely; Grafana, BI tools, ORMs all work | Medium: 250+ exporters, Grafana-native, Alertmanager; the gold standard for monitoring ecosystem | Medium: Drop-in for Prometheus ecosystem; vmagent, vmalert, vmauth as native equivalents | Medium: Strong analytics ecosystem; observability ecosystem (ClickStack, SigNoz, etc) is growing fast |
4. Fault Tolerance
Failure modes and operational reality. The interesting cells are where defaults differ from production-grade settings.
| Dimension | InfluxDB 3 | TimescaleDB | Prometheus | VictoriaMetrics | ClickHouse |
|---|---|---|---|---|---|
| Replication model | Object-store-backed (S3 durability); Enterprise adds node-level replication | Postgres streaming replication (leader-follower); sync or async | None native; run 2 Prometheus side-by-side, query both | App-level via -replicationFactor on vminsert; storage-level optional via persistent disk | ReplicatedMergeTree multi-master via Keeper; any replica accepts writes |
| Failure detection | Health endpoints + S3 durability; node failure detected via standard probes | Postgres heartbeats + WAL replay status; pg_stat_replication | External (Alertmanager / your monitoring of your monitoring) | vminsert detects vmstorage unreachable; reroutes; no gossip | Keeper-based; replicas heartbeat; replica lag detected via system.replicas |
| Failover mechanism | Restart on healthy node; data already on S3; WAL replay for recent writes | External (Patroni, repmgr, pg_auto_failover); not built in | Manual; you serve from the surviving instance until both rejoin | Automatic reroute; vminsert skips down vmstorage; vmselect tolerates RF-1 down | Reads served from any replica; writes blocked if quorum lost |
| RTO (typical) | Minutes (process restart + WAL replay); seconds if compute is pre-warmed | 30s-5min depending on failover tooling; Patroni can hit ~10s | N/A (no failover); query the other replica immediately | Sub-second to seconds; vminsert retries in flight | Sub-second for reads (other replica); writes may block during Keeper election |
| RPO (typical) | Sub-second (WAL on object store); seconds if WAL is local-only | Zero with sync replication; sub-second async; minutes if relying on WAL archive | Up to ~2 hours of unflushed head block if the host or disk is lost; a process crash alone is recovered by WAL replay | Zero with RF>=2; otherwise depends on disk durability | Near-zero with insert_quorum; sub-second async; minutes if Keeper unreachable |
| Split-brain behavior | S3 strong consistency prevents true split-brain; conflicts surface on read | Postgres doesn't allow multi-master; split-brain requires app-level prevention | Not applicable; instances are independent | No consensus, no split-brain; data may be duplicated across cells | Keeper quorum prevents split-brain; minority partition becomes read-only |
| Blast radius of single-node failure | In-memory hot data lost; cold tier in S3 unaffected | Single instance loss = all data unreachable until failover | Up to retention window of data lost if disk unrecoverable | 1/N of shards unreachable until node rejoins; queries get partial results unless RF>=2 | One shard's data unreachable; other replicas of same shard serve |
| Cross-region failover | S3 cross-region replication; compute redeployed in DR region | Postgres streaming replica in DR region; manual promote | Federation or remote_write to central; not a failover model | Cell-based architecture replicates across regions; explicit DR pattern | ReplicatedMergeTree across regions or external (DR cluster with CDC) |
| Data loss scenarios | WAL not yet uploaded to S3 when node dies; Core single-node = all unflushed data | async replication lag at failover; WAL archive gaps | Unflushed head block, NFS storage (unsupported), filesystem corruption | RF=1 + disk failure = local shard data lost | Async insert without quorum; Keeper partition combined with replica failure |
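Two of the rules in the table above reduce to simple arithmetic, sketched here in Python. This is a simplified model, not either project's implementation: the function names are hypothetical, and the node-loss check is only the sufficient condition implied by the "tolerates RF-1 down" cell.

```python
def keeper_has_quorum(reachable: int, ensemble_size: int) -> bool:
    """A consensus ensemble can make progress only with a strict majority."""
    return reachable > ensemble_size // 2


def survives_node_loss(nodes_down: int, replication_factor: int) -> bool:
    """With RF copies of every series, losing up to RF-1 nodes leaves one copy."""
    return nodes_down <= replication_factor - 1


# A 3-node ensemble keeps quorum with one node down, a 4-node one split 2/2 does not.
three_node_ok = keeper_has_quorum(2, 3)
four_node_split = keeper_has_quorum(2, 4)
# RF=1 cannot absorb a single storage node failure; RF=2 can.
rf1_ok = survives_node_loss(1, 1)
rf2_ok = survives_node_loss(1, 2)
```

The 2/2 case is why even-sized ensembles buy no extra fault tolerance over the next smaller odd size.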
6. Replication
Replication is where the "single-node by design" philosophies show. Treat default settings as marketing, not architecture.
| Dimension | InfluxDB 3 | TimescaleDB | Prometheus | VictoriaMetrics | ClickHouse |
|---|---|---|---|---|---|
| Replication topology | Storage replication via S3; compute-level replication is Enterprise | Leader-follower (Postgres streaming) | Independent replicas (no protocol between them) | Leaderless; vminsert writes to N consecutive vmstorage nodes | Multi-leader (any replica accepts writes for the same shard) |
| Sync vs async | Async to S3 from WAL; configurable upload cadence | Both supported; sync replication requires synchronous_standby_names plus synchronous_commit=on or stronger | N/A; two independent ingestors | Sync write to N nodes from vminsert; ack after all succeed | Async by default; insert_quorum makes it sync |
| Replication factor | S3 default 3x (provider-managed) | 1 leader + N read replicas (no upper bound, practical 5-10) | Just run 2; no in-protocol concept | RF=1 default; recommended RF=2 or 3; flag-controlled | Configurable per shard; commonly 2-3 replicas per shard |
| Consistency level options | Read-your-writes within node; cross-node depends on S3 visibility | Read-from-leader strong; read-from-replica eventual | Eventual within each instance; cross-instance not guaranteed | Eventual; -search.denyPartialResponse rejects incomplete results; dedup interval resolves duplicate replica samples | insert_quorum + select_sequential_consistency for strong reads |
| Replication lag (typical) | Seconds for WAL-to-S3; sub-second within hot path | Sub-second for streaming async replicas in same region | Up to 1 scrape interval skew between independent instances | Near-zero for sync RF; vmselect merges replica responses | Tens of ms within DC; seconds cross-region for ReplicatedMergeTree |
| Conflict resolution | Last-write-wins within a series; idempotent for point-with-same-timestamp | Not applicable (single-writer) | Up to query layer (HA pair deduplication via Thanos or VictoriaMetrics) | Deduplication on read (configurable interval) | ReplacingMergeTree / CollapsingMergeTree for explicit conflict handling |
| Cross-region replication | S3 cross-region; Enterprise has explicit cluster-level cross-region | Postgres streaming over WAN; latency-sensitive | remote_write to central cluster; federation patterns | Native cell-based architecture; cells per region with global routing layer | ReplicatedMergeTree across regions; common but bandwidth-heavy |
| Replication during partition | S3 unavailability halts uploads; WAL grows locally until S3 returns | Async replicas lag; sync replicas block writes if quorum lost | Each instance keeps writing locally; no coordination needed | vminsert reroutes around unreachable vmstorage; data goes to remaining replicas | Minority partition becomes read-only; writes blocked without Keeper majority |
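The "deduplication on read" entry in the conflict-resolution row can be pictured with a short Python sketch. It assumes a simplified rule: one sample kept per fixed window, the one with the largest timestamp. The `dedup` function and the sample layout are illustrative only; real engines differ in how they align window boundaries.

```python
from typing import Iterable

Sample = tuple[int, float]  # (timestamp in milliseconds, value)


def dedup(samples: Iterable[Sample], interval_ms: int) -> list[Sample]:
    """Keep one sample per interval_ms window: the one with the largest timestamp."""
    kept: dict[int, Sample] = {}
    for ts, value in sorted(samples):
        # Sorted order means later timestamps overwrite earlier ones in a window.
        kept[ts // interval_ms] = (ts, value)
    return [kept[window] for window in sorted(kept)]


# Two independent replicas scraping the same target every 15 s, 2 s apart.
replica_a = [(0, 10.0), (15_000, 11.0), (30_000, 12.0)]
replica_b = [(2_000, 10.0), (17_000, 11.0), (32_000, 12.0)]
merged = dedup(replica_a + replica_b, interval_ms=15_000)
```

Without this step the merged series would carry six samples instead of three, which is what inflates rate calculations over an HA pair.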
7. Better Usage Patterns
Where PE depth shows up. The patterns most engineers miss, the anti-patterns that show up in code review.
InfluxDB 3
Use when object storage economics, Parquet/Arrow interoperability, and SQL over time-series data matter more than raw ingest speed or mature HA in Core.
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Tag vs field discipline | Treat tags as free-form metadata; promote anything that might be filtered to a tag | Tags are indexed, and indexed dimensions multiply storage cost. Reserve tags for dimensions you actually filter or group on; everything else is a field | v3 has unlimited cardinality but indexed-tag count still drives Parquet column count and metadata overhead |
| Schema-on-write reliance | Let Line Protocol auto-create schema; never look at what's been created | Pre-declare schema, monitor schema additions, alert on unexpected tag/field appearance | Schema drift is silent in InfluxDB. A typo creates a new measurement and a new schema branch |
| Compaction tuning | Accept defaults; wonder why queries on yesterday's data are slow | Tune compaction parameters based on retention and query patterns; pin hot ranges | Compaction backlog = slow queries on recent data; the canonical warning sign of capacity issues |
| Python UDF resource bounds | Run arbitrary Python in the embedded runtime with no limits | Set explicit memory and CPU bounds per UDF; treat UDFs as untrusted code | A bad UDF can OOM the database. The embedded runtime is convenience, not isolation |
| Migration from v1/v2 | Treat as upgrade; reuse Flux tasks and InfluxQL queries; assume API parity | Treat as new product. Plan schema migration, query rewrite, and tooling replacement | v3 is a different engine. The compatibility layer is partial. Pretending otherwise wastes months |
| Replication and HA | Run Core in production assuming "InfluxDB has HA" | Core single-node is a SPOF. Use Enterprise, Timestream, or accept SLO impact | Production deployments on Core require explicit HA strategy. Don't discover this during an outage |
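The schema-drift row suggests checking incoming writes against a declared schema. The following is a minimal Python sketch of that idea, not InfluxDB tooling: `parse_keys`, `schema_drift`, and the `declared` mapping are hypothetical, and the parser assumes Line Protocol lines with no escaped commas, spaces, or equals signs.

```python
def parse_keys(line: str) -> tuple[str, set[str], set[str]]:
    """Return (measurement, tag keys, field keys) for one Line Protocol line.

    Simplification: no escaping, and no spaces inside string field values.
    """
    head, fields = line.split(" ")[:2]
    measurement, *tags = head.split(",")
    tag_keys = {t.split("=", 1)[0] for t in tags}
    field_keys = {f.split("=", 1)[0] for f in fields.split(",")}
    return measurement, tag_keys, field_keys


def schema_drift(line: str, declared: dict[str, tuple[set[str], set[str]]]) -> list[str]:
    """List everything in the line that the declared schema does not expect."""
    measurement, tags, fields = parse_keys(line)
    if measurement not in declared:
        return [f"unknown measurement: {measurement}"]
    expected_tags, expected_fields = declared[measurement]
    problems = [f"unexpected tag: {t}" for t in sorted(tags - expected_tags)]
    problems += [f"unexpected field: {f}" for f in sorted(fields - expected_fields)]
    return problems


declared = {"cpu": ({"host", "region"}, {"usage", "idle"})}
clean = schema_drift("cpu,host=a,region=eu usage=0.5 1700000000000000000", declared)
typo_measurement = schema_drift("cpuu,host=a usage=0.5", declared)
typo_tag = schema_drift("cpu,host=a,hsot=b usage=1", declared)
```

Run in a write proxy or as a sampling job, a check like this turns a silent typo into an alert before it becomes a new schema branch.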
TimescaleDB (Tiger Data)
Best when the workload benefits from Postgres semantics, joins, extensions, and operational familiarity while accepting single-node or cloud-managed scale limits.
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Chunk interval choice | Accept default 7-day interval; discover problems at 100GB | Choose interval such that one chunk, including its indexes, fits in 25% of main memory; iterate | Wrong interval is the most common ops issue. Get it right at table-create time; a changed interval only applies to new chunks |
| Compression segmentby/orderby | Skip these settings; accept whatever compression you get | segmentby on dimension you filter on; orderby on time DESC; verify ratios | Without segmentby, point queries decompress entire segments. 8600-row decompression vs targeted row |
| Continuous aggregate scheduling | Create 20 continuous aggregates with default refresh; CPU spikes every 5 minutes | Stagger refresh windows; concurrent_refresh in 2.26+; cap parallel jobs | Refresh storms starve ingest. Visible as CPU spikes correlated with timer:5m |
| Backfill into compressed chunks | Run UPDATE on year-old compressed data; hit decompress-modify-recompress storm | Decompress affected chunks, modify, recompress as batch; or stage in temp table and swap | Per-row decompression is unbearably slow. Batch the operation |
| Indexes on hypertables | Create the same set of indexes as on a regular table | Index on (time DESC, dimension) for time-range + filter queries; verify per-chunk index sizes | Indexes are per-chunk; oversized indexes blow up disk usage. REINDEX is the maintenance cost |
| Direct Compress on unsorted data | Enable the flag because "37x faster"; ship the change | Verify data is sorted by segmentby+orderby before enabling; treat as tech preview only | Wrong sort causes silent data corruption. Read the warning, run benchmarks first |
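The chunk-interval rule in the first row is a sizing calculation. A minimal sketch in Python follows, assuming you already know your daily ingest volume including indexes and have picked a memory budget for one chunk; `chunk_interval` and both parameter names are hypothetical.

```python
from datetime import timedelta


def chunk_interval(ingest_bytes_per_day: float, memory_budget_bytes: float) -> timedelta:
    """Largest whole-hour interval whose chunk (data plus indexes) fits the budget."""
    bytes_per_hour = ingest_bytes_per_day / 24
    hours = int(memory_budget_bytes / bytes_per_hour)
    return timedelta(hours=max(hours, 1))  # never go below one hour


GIB = 2**30
# 20 GiB/day of data plus indexes, with 16 GiB of memory reserved for one chunk.
interval = chunk_interval(20 * GIB, 16 * GIB)
# The default 7-day interval would produce chunks of roughly this size instead.
default_chunk_gib = 7 * 20
```

In this example the budget supports a 19-hour interval, while the 7-day default would create 140 GiB chunks. The result can then be applied with `set_chunk_time_interval`, which affects only chunks created afterwards.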
Prometheus
Choose for service monitoring, scrape-based collection, PromQL alerting, and the exporter ecosystem when single-node retention and HA trade-offs are acceptable.
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Label hygiene | Include user_id, request_id, trace_id as labels because "they might be useful" | Drop unbounded labels at scrape via metric_relabel_configs; log unbounded values separately | The single highest-leverage thing you can do. Unbounded labels are the cardinality bomb root cause |
| Recording rules vs ad-hoc queries | Compute the same expensive aggregate in every dashboard panel | Push expensive aggregates into recording rules; dashboards query the precomputed series | Saves CPU, makes dashboards snappy, and ensures consistent values across panels |
| Long-term storage assumption | Increase retention to 1 year; watch disk fill and queries slow | Use Prometheus as 2-week buffer; remote_write to Thanos/Mimir/VictoriaMetrics for long-term | Prometheus is not a data warehouse. Treat it as a buffer; storage tier is a different product |
| HA strategy | Run one Prometheus and call it HA because it has WAL | Run two Prometheus side-by-side scraping the same targets; query both via Thanos/VM dedup | WAL protects against process crash, not host loss. HA requires two ingest paths |
| Histogram bucket design | Use default buckets for all latency histograms; get useless quantiles | Custom buckets per service; use native histograms (Prometheus 2.40+) for accurate quantiles | Default buckets span 5ms-10s. Your service has different SLOs. Native histograms give much finer resolution without hand-picked boundaries |
| Federation vs remote_write | Use federation for long-term centralization; hit query-time fanout pain | Use remote_write for centralization; federation only for selective high-value series | Federation fans out at query time; remote_write does it once at write time. Different cost profile entirely |
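The histogram row is easier to see with numbers. The Python sketch below re-implements the linear interpolation that classic bucketed quantile estimation relies on. It is an illustration of the technique, not Prometheus source code; `histogram_quantile` here is a local function and the bucket layouts are invented for the example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) pairs.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, below = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return lower  # no finite upper bound to interpolate toward
            # Assumes observations are spread evenly inside the bucket.
            return lower + (upper - lower) * (rank - below) / (count - below)
        lower, below = upper, count
    return lower


# 100 requests that all took exactly 0.3 s, recorded with two bucket layouts.
coarse = [(0.1, 0), (0.25, 0), (0.5, 100), (1.0, 100), (math.inf, 100)]
custom = [(0.25, 0), (0.28, 0), (0.32, 100), (0.5, 100), (math.inf, 100)]
p99_coarse = histogram_quantile(0.99, coarse)  # interpolates inside (0.25, 0.5]
p99_custom = histogram_quantile(0.99, custom)  # interpolates inside (0.28, 0.32]
```

The coarse layout reports a p99 near 0.5 s for traffic that never exceeded 0.3 s. The estimate can only be as precise as the bucket that contains it, which is the case for placing boundaries around your SLO thresholds.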
VictoriaMetrics
Best when Prometheus compatibility, high-cardinality tolerance, and simple shared-nothing scaling matter more than consensus-driven replication semantics.
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| Single-node vs cluster decision | Reach for cluster from day one because "production" | Run single-node until you exceed its capacity (often 1M+ samples/sec on big hardware) | Cluster adds three components to operate. Avoid until you actually need horizontal scale |
| Replication factor default | Accept RF=1 default; lose data on disk failure | Set RF=2 minimum for production; understand the resource multiplier (linear in N) | RF=1 is a "test it works" default. Production needs RF>=2 and capacity planned for it |
| vmagent vs direct scrape | Point Prometheus at VictoriaMetrics via remote_write | Replace Prometheus with vmagent for scrape + remote_write; cuts an entire moving part | vmagent is the lighter scrape-only sibling. Same scrape model, less overhead |
| MetricsQL extensions | Use MetricsQL extensions everywhere because they're more ergonomic | Use MetricsQL only when needed; keep dashboards on standard PromQL for portability | MetricsQL is vendor sugar. Portable PromQL keeps your options open |
| Deduplication interval | Run HA Prometheus pair into VM without setting dedup interval | Set -dedup.minScrapeInterval matching your scrape interval; both replicas write but only one wins | Without dedup, queries return duplicated samples and inflated rates |
| Capacity planning | Linear-scale from current ingest; assume you can add nodes later | Use the calculator: storage_bytes = ingestion_rate * retention_seconds * bytes_per_sample; double for safety | VM-Cluster has no auto-rebalance. Adding nodes doesn't help old series. Plan capacity, don't react |
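The capacity-planning and replication-factor rows combine into one calculation: ingest rate times retention gives a sample count, which needs a bytes-per-sample factor and the RF multiplier to become disk space. A minimal Python sketch follows. The default of 1 byte per sample is an assumption for illustration; measure the real figure on your own data. `storage_bytes` and its parameters are hypothetical names.

```python
def storage_bytes(
    samples_per_sec: float,
    retention_days: float,
    bytes_per_sample: float = 1.0,  # assumed; measure on your own data
    replication_factor: int = 1,
    headroom: float = 2.0,          # "double for safety"
) -> float:
    """Disk needed across the cluster for the given ingest rate and retention."""
    retention_seconds = retention_days * 86_400
    samples = samples_per_sec * retention_seconds
    return samples * bytes_per_sample * replication_factor * headroom


# 500K samples/sec, 90 days retention, RF=2.
needed = storage_bytes(500_000, 90, replication_factor=2)
needed_tb = needed / 1e12
```

Moving from RF=1 to RF=2 doubles the result, so the replication decision belongs in the capacity plan rather than after it.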
ClickHouse
Use when the time-series workload is really high-volume analytical data and schema design can be tuned around MergeTree, ORDER BY, and materialized views.
| Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters |
|---|---|---|---|
| ORDER BY clause design | Pick ORDER BY based on intuition; discover at 6 months that queries scan whole partitions | Design ORDER BY around the most selective filter prefix; benchmark before committing | Highest-leverage schema decision in ClickHouse. Wrong choice = 100x slower queries forever |
| LowCardinality misuse | Slap LowCardinality on every string column to "save space" | Use only when distinct count < 10K; verify with system.parts_columns | Above 10K distinct values, LowCardinality degrades; dictionary thrashing kills query performance |
| Materialized views as schema migration tool | Add MVs later when dashboards get slow; backfill manually | Design MV strategy at schema time; treat MVs as part of the table definition | MVs in ClickHouse are write-time triggers; adding them later means backfill, which is painful |
| Insert batching | Insert one row at a time via HTTP API; create millions of tiny parts | Batch inserts (100K-1M rows per insert); use Buffer engine or ClickPipes for streams | Each insert = new part; parts merge in background. Small inserts = merge storm, query slowdown |
| TTL design | Set TTL once at table creation; never revisit | Use multi-tier TTL: hot SSD → cold S3 → DELETE; align TTL boundaries with dashboard query windows | Cold-tier latency hits when dashboard windows cross hot/cold boundary. Align them |
| Distributed table sharding | Shard by rand() because "even distribution" | Shard by your primary filter column (user_id, tenant_id); query locality matters more than write balance | rand() makes writes balanced but reads fan out to all shards. Shard by query pattern |
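The insert-batching row describes client-side buffering so that each insert creates one reasonably sized part. Here is a minimal sketch in Python, assuming a caller-supplied `flush` callable that performs the actual insert. The `Batcher` class is hypothetical and not part of any ClickHouse client.

```python
import time
from typing import Any, Callable


class Batcher:
    """Buffer rows and hand them to `flush` in large batches."""

    def __init__(
        self,
        flush: Callable[[list[Any]], None],
        max_rows: int = 100_000,
        max_age_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: list[Any] = []
        self._started = 0.0

    def add(self, row: Any) -> None:
        if not self._rows:
            self._started = self._clock()  # age is measured from the first row
        self._rows.append(row)
        too_big = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._started >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            rows, self._rows = self._rows, []
            self._flush(rows)


# Stand-in sink: in practice `flush` would issue one INSERT per batch.
sent: list[list[int]] = []
batcher = Batcher(sent.append, max_rows=3, max_age_s=3600)
for i in range(7):
    batcher.add(i)
batcher.flush()  # drain the remainder on shutdown
```

The age limit is checked only when a row arrives, so a quiet stream needs a periodic `flush()` call from a timer. Seven rows here become three inserts instead of seven.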
8. Advanced / Next-Gen Alternatives
Successors, adjacent technologies that do specific cases better, and patterns that obviate the original need.
InfluxDB 3
Use when object storage economics, Parquet/Arrow interoperability, and SQL over time-series data matter more than raw ingest speed or mature HA in Core.
| Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| QuestDB | 12-36x faster ingest, 17-418x faster queries on benchmarks; SIMD-vectorized columnar | Production | High; different schema and ILP/SQL semantics | Pure ingestion throughput matters; high-frequency sensor or financial data |
| Amazon Timestream for InfluxDB 3 | Managed InfluxDB 3 with elastic storage and compute; AWS billing integration | Production (Oct 2025) | Low; same API, schema, query language | AWS shops standardizing on managed services; capacity planning offload |
| Apache IoTDB | Optimized for industrial IoT; tree-structured device model | Production | Medium; different data model conventions | Industrial IoT scenarios with tree-hierarchical device topology |
| ClickHouse + ClickStack | Same FDAP-style benefits via established columnar engine; broader ecosystem | Production | High; full rewrite of schemas and queries | Want columnar but with deeper observability ecosystem support |
TimescaleDB (Tiger Data)
Best when the workload benefits from Postgres semantics, joins, extensions, and operational familiarity while accepting single-node or cloud-managed scale limits.
| Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Citus + Postgres time-series | Horizontal scale for Postgres; works for time-series with manual partitioning | Production | Medium; Citus model is different from hypertables | Need horizontal Postgres scale without hypertable abstraction |
| QuestDB | 10-50x faster on tick-data workloads; native Postgres wire protocol | Production | Medium; queries port mostly with minor adjustments | High-throughput tick or sensor data without need for Postgres ecosystem |
| ClickHouse with ReplacingMergeTree | Full SQL and analytics; columnar; better at high-cardinality wide tables | Production | High; full schema redesign, no Postgres extensions | Analytics workloads dominating over OLTP; willing to give up ACID transactions |
| DuckDB + Parquet on S3 | Embedded analytics; no server to operate; great for offline analysis | Production | High; not a server, completely different deployment model | Analytical workloads where data lake + DuckDB is acceptable |
Prometheus
Choose for service monitoring, scrape-based collection, PromQL alerting, and the exporter ecosystem when single-node retention and HA trade-offs are acceptable.
| Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| VictoriaMetrics (single or cluster) | 10x compression; 10x cardinality ceiling; native long-term storage | Production | Low; drop-in for Prometheus ingest and query APIs | You're hitting cardinality limits or storage costs are unsustainable |
| Grafana Mimir | Horizontally scalable Prometheus with object storage; multi-tenant native | Production | Medium; new ops surface; gossip-based architecture | Grafana Labs ecosystem; large multi-tenant deployments |
| Thanos | Long-term storage on object store; cross-cluster query | Production | Medium; sidecar architecture adds components | Existing Prometheus deployment; want object storage offload |
| OpenTelemetry + ClickHouse | Unified logs/metrics/traces; vendor-neutral instrumentation; analytics-grade backend | Emerging | High; new instrumentation, new query language, new ops | Starting fresh or doing observability platform consolidation |
VictoriaMetrics
Best when Prometheus compatibility, high-cardinality tolerance, and simple shared-nothing scaling matter more than consensus-driven replication semantics.
| Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Grafana Mimir | Object-storage-native; gossip-based clustering; multi-tenant by design | Production | Medium; same query API, different ops model | Object-storage-first architecture; willing to operate gossip protocol |
| ClickHouse + qryn | Same TSDB workload on general OLAP; broader analytics ecosystem | Emerging | High; full migration of schemas and queries | Want metrics + logs + traces unified; willing to leave PromQL behind |
| VictoriaLogs (sibling product) | Same architecture for logs; pairs naturally with VictoriaMetrics | Production | Low; complementary, not replacement | Looking to extend VM stack into logs without adding Elasticsearch |
| Cortex | Original distributed Prometheus; predates Mimir but has shrunk in adoption | Production (declining) | Medium; similar to Mimir transition | Existing Cortex deployment; likely transitioning to Mimir |
ClickHouse
Use when the time-series workload is really high-volume analytical data and schema design can be tuned around MergeTree, ORDER BY, and materialized views.
| Alternative | What It Improves | Maturity | Migration Cost | When To Consider |
|---|---|---|---|---|
| Apache Doris / StarRocks | MPP analytics with stronger update semantics and primary-key model | Production | High; different SQL dialect and ops model | Need real-time updates and primary-key semantics that ClickHouse doesn't do well |
| Apache Pinot | Real-time analytics with sub-second SLAs; tighter latency guarantees | Production | High; different model entirely (segments, deep-store) | Sub-second analytics with strict latency SLAs; willing to operate ZK + servers + brokers |
| Apache Druid | Stream-first ingestion with native rollup; real-time analytics | Production | High; very different architecture (historical/middle-manager/broker) | Stream-first workloads with heavy use of rollup and approximate algorithms |
| DuckDB on Iceberg/Parquet | Lakehouse-native analytics; no server to operate | Emerging | High; different deployment model entirely | Data-lake-centric organizations; analytics workloads that fit serverless query |
| Snowflake + materialized views | Fully managed; broader BI ecosystem integration | Production | High; commercial pricing model | Cost-insensitive enterprise; willing to trade self-host control for managed simplicity |