Cloud Data Warehouse / Lakehouse — PE Trade-Offs

Snowflake, Databricks, Amazon Redshift, Google BigQuery — what to give up and what you get, from the operator's seat.

Cloud DW / Lakehouse PE / Staff+ Depth

As of 2026-05-31

PE Verdict

In 2026, the DW/Lakehouse choice has collapsed to one decisive question: where does your data already live, and what is the shape of your AI/ML workload. Snowflake wins on SQL ergonomics, cross-org data sharing, and governance posture. Databricks wins on ML, open formats, and engineering teams that want one engine for ETL+BI+ML. BigQuery wins on serverless simplicity when your bytes-scanned math holds and your team is GCP-native. Redshift rarely wins greenfield, but the AWS Zero-ETL gravity and FedRAMP/GovCloud surface keep it the default for AWS-locked compliance shops. The differences narrow every quarter (Iceberg everywhere, in-DW AI everywhere, serverless everywhere), so the decision is increasingly about cost model + ecosystem gravity, not raw capability.

Best default choices

SnowflakeSQL-first analytics, governance, sharing, marketplace data products, and multi-cloud warehouse ergonomics DatabricksLakehouse workloads where ETL, streaming, notebooks, ML, Delta, and open data formats drive the architecture Amazon RedshiftAWS-native analytics, Zero-ETL, GovCloud/FedRAMP, Lake Formation, Glue, and Aurora/RDS gravity Google BigQueryGCP-native serverless analytics, bytes-scanned economics, slots, Firebase, geospatial, and BigQuery ML

1. Trade-Offs

One row per distinct give-up-X-to-get-Y. Per-technology because the trade space differs. PE Nuance column is the insight most engineers miss until they hit it in production.

Snowflake

Cloud-native DW

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Separation of storage and compute via virtual warehouses	Workload isolation: BI dashboards never contend with overnight ETL	Per-warehouse cost stacking; 5 warehouses with auto-suspend off = 5x credit burn	Team copies the "create a dedicated warehouse" pattern across every workload, hits month-end with $80K in warehouses idling at 90% of their billable minute	The right granularity is one warehouse per workload class (BI, ETL, ad-hoc), not per team or per pipeline. Most teams over-fragment.
Credit-based pricing with auto-suspend	Pay-per-second after the 60s minimum; predictable line-item billing	60s minimum billing per warehouse resumption; warm-warehouse heuristic discipline required	Dashboard hits warehouse at minute 59, suspends at 60, refresh hits 1s later, you billed two full minutes for ~2s of work	Default auto-suspend is 600s (10 min). Drop to 60s and the math changes materially on bursty BI. The trade is cold-cache penalty on next query.
Iceberg tables GA (October 2025) for open format	Open table format, multi-engine read, narrows lock-in vs proprietary FDN micro-partitions	Externally-written Parquet without full statistics loses approx 2x perf vs Snowflake-managed Iceberg	Lakehouse migration team writes Iceberg from Spark, expects parity with native tables, finds 2x slower scans because stats coverage is incomplete	Snowflake-managed Iceberg gets the same caching as native tables. Externally-managed Iceberg pays a real performance tax even with the optimized Parquet scanner.
Hybrid Tables for OLTP+OLAP unification	Operational and analytical on one platform; eliminates write-back ETL	Approx 16K ops/sec/db ceiling; single-region only; FK constraints enforced at write	Team picks Hybrid Tables for a product analytics use case at 50K writes/sec sustained and hits throttling at quarter-end Black Friday	March 2026 pricing change dropped per-request billing; storage + warehouse compute only now. Cost calculator from 2025 is obsolete.
Snowpark + Cortex AI in-platform	No data movement for ML/LLM workloads; governance and access controls follow the data	Cortex pricing is opaque per-token; LLM inference on Cortex runs roughly 2-3x equivalent dedicated GPU infra	Cortex bill compounds quietly; finance asks "why is our Snowflake bill up 40% MoM" and the answer is "GenAI prototyping"	The $200M OpenAI partnership (Feb 2026) and Snowflake Intelligence position the platform for NL-to-SQL, but it deepens the ecosystem moat. Plan exit costs early.
Time Travel + Fail-safe	Point-in-time recovery to 90 days plus 7-day fail-safe; trivial rollback	Storage cost is silently 1.07-2x apparent table size on high-churn tables due to retained versions	You analyze storage cost by SELECT FROM TABLE_STORAGE_METRICS and find ACTIVE bytes are 30% of total billable bytes; the rest is time travel retention	DATA_RETENTION_TIME_IN_DAYS on heavy-update tables is a credit lever most teams never touch. For staging tables, set it to 1.
Automatic clustering on micro-partitions	Pruning without explicit index management; clustering keys are advisory hints	Re-clustering burns credits silently; no upfront tuning knobs for unusual access patterns	Cluster key change kicks off background re-clustering on a 10TB table, you wake to a $4K surprise on AUTOMATIC_CLUSTERING_HISTORY	Always set the cluster key BEFORE bulk load. Re-clustering an existing large table is expensive; loading into a clustered empty table is free.
Multi-cluster warehouses for concurrency	Auto-add clusters under queue depth; up to 10 clusters	Each added cluster is a credit-burning entity; auto-scaling can stack silently	BI dashboard refresh storm fans out across 10 clusters at peak, bill spikes 10x for one hour; you didn't set a max cluster count	MIN_CLUSTER_COUNT = 1, MAX_CLUSTER_COUNT = 3, SCALING_POLICY = STANDARD. Economy scaling is the trap default for cost-sensitive workloads.

Databricks

Lakehouse

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Photon vectorized engine on Delta	Up to 12x price/perf vs traditional DW per Databricks benchmarks on SQL/DataFrame workloads	Photon is a per-DBU surcharge; closed-source; UDF-heavy workloads see partial benefit	You enable Photon on a UDF-heavy ETL job expecting speedup, see 1.2x for 2x the DBU; Photon falls back outside its vectorized path	Photon excels on aggregations, joins, and Spark SQL. Pandas UDFs and Python UDFs do not vectorize the same way. Profile before flipping the flag globally.
Open Delta Lake format	Open-source table format; Iceberg interop; lakehouse pattern with no Parquet vendor tax	Vendor lock on the Databricks Runtime + Photon combination; OSS Spark on Delta is materially slower	You plan a "we can leave anytime, data is in S3" exit and find your Photon-tuned pipelines run 3x slower on OSS Spark elsewhere	Delta is open. The optimization gap to OSS is the moat. Plan the exit assuming you run on Photon, and budget the rewrite if you migrate.
Spark + DBSQL unified compute	One engine for ETL, BI, and ML; no separate query layer to operate	Classic cluster startup approx 4 minutes; cold-start latency hurts ad-hoc BI	Analyst opens dashboard at 9am, waits 4 minutes for cluster startup before first query runs; trust in the tool erodes	Serverless SQL warehouses fix cold start (sub-second) at premium pricing. The IWM (Intelligent Workload Management) ML autoscaler is Serverless-only — Pro/Classic still use reactive cluster autoscaling.
Unity Catalog unified governance	Single governance plane for tables, ML models, files, dashboards, AI assets, and Lakebase	UC enforcement adds a metadata roundtrip on every query; migration from Hive metastore is multi-quarter work	Mid-migration, you discover legacy notebooks bypass UC; permissions are inconsistent across catalogs and the audit team is unhappy	UC-from-day-one is the path of least pain. Mid-life migration costs 1-2 engineer-quarters per workspace and rarely catches all bypasses.
Lakebase Postgres OLTP integrated with lakehouse	Operational Postgres next to analytics; one-click sync from Delta tables; UC-governed	Single-region (as of Apr 2026); CMK still rolling out; two-system complexity persists	Team builds a real-time app on Lakebase expecting global multi-region failover, discovers it's not GA for cross-region yet	Lakebase competes with Snowflake Hybrid Tables on the same OLTP+OLAP unification thesis. Both are early. Pick based on which platform your team already runs day-to-day, not which is technically superior.
Intelligent Workload Management (IWM) on Serverless SQL	ML-predicted compute allocation per query; near real-time elasticity	Serverless tier only; Pro and Classic warehouses use static-threshold autoscaling	You compare Pro and Serverless costs on identical workload, find Serverless wins on throughput but loses on simple TCO math at low concurrency	IWM works best when concurrency is bursty. For steady-state ETL with predictable shape, Pro or Classic + manual sizing wins on DBU/query.
DBU + cloud-vendor compute stacking	You see DBU consumption per workload in Databricks; cloud vendor still bills VMs separately	Cost attribution is harder than Snowflake's single-line bill; chargeback requires joining DBU usage with EC2/ADV/GCE billing	FinOps team asks for monthly cost per team; you spend two weeks building a join across Databricks usage tables and AWS Cost Explorer to answer it	Photon Serverless hides the underlying VM cost inside DBU pricing — easier attribution but loses the visibility for capacity planning.
Multi-table transactions (March 2026)	ACID across multiple Delta tables; success or rollback as a unit	Requires catalog commits enabled per participating table; UC-managed Delta tables only	You design a financial reconciliation pipeline assuming multi-table atomicity, find one of your tables is still on Hive metastore and the transaction is silently single-table	Enabling catalog commits is a one-way migration in practice. Test reads from external Delta clients (Trino, Spark OSS) before flipping it on production tables.

Amazon Redshift

AWS-native MPP

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
RA3 nodes separate compute from RMS	Independent scale of compute and Redshift Managed Storage; data sharing across clusters	Still pick node type and count manually; less elastic than Snowflake's warehouse model	Workload doubles overnight, you need to resize the cluster, manual ALTER CLUSTER takes 30+ minutes and queries throttle during transition	DC2 nodes are deprecated; RA3 is the only modern path. If you're still on DC2, plan migration before the deprecation date, not after.
Concurrency Scaling transient clusters	Auto-add transient clusters under queue depth; first hour per cluster per day is free	Charged per-second after free hour; concurrency limit 50 per main cluster	BI workload bursts past 50 concurrent and queries queue; you turn on concurrency scaling, blow through free hour by mid-morning, surprise bill next day	The free-hour-per-cluster-per-day math is per cluster spawned, not per main cluster. High burst workloads can rack up multiple free hours daily.
Zero-ETL from Aurora, RDS, DynamoDB	Near-real-time CDC replication; no DMS plumbing; sub-minute lag typical	Replicated tables are read-only; cannot run CTAS or INSERT against them	dbt model tries to materialize off a zero-ETL table, fails with "Operation is not supported for the database from INTEGRATION"; you rebuild as views or stage-and-copy	Continuous CDC prevents Serverless auto-pause. Teams report 2-3x RPU bills on Serverless under zero-ETL. Either use RA3 provisioned or set strict Max RPU-Hours.
Deep AWS-native integration	IAM, S3, KMS, Lake Formation, Glue, EventBridge all first-class; FedRAMP and GovCloud surface	Cross-cloud is rare and painful; lock-in to AWS data plane is the deepest of the four	Org-wide multi-cloud strategy is set; Redshift can't follow workloads to Azure or GCP without a full re-platform	For AWS-only shops, the integration depth pays back as days saved on every adjacent project. For multi-cloud, it's the wrong default.
Spectrum for S3-resident parquet/Iceberg	Query external data without loading; lakehouse-style separation at the engine level	Slower than native RMS; pays per byte scanned similar to BigQuery on-demand	Cost-conscious team puts cold data on Spectrum, queries it daily for a year, Spectrum scan costs exceed what RMS would have cost	Spectrum is an extension, not a replacement for RMS. Use it for true cold archive or for joining lake data to warehouse data, not as a default tier.
Redshift Serverless RPUs scale by load	No cluster to size; auto-pause when idle; sub-minute scale-up	24/7 workloads keep cluster perpetually warm; Serverless can cost more than RA3 at sustained load	Team picks Serverless for "elasticity," then runs zero-ETL into it 24/7; Serverless never auto-pauses, monthly bill is 1.8x equivalent RA3	The break-even for Serverless vs RA3 is roughly 12-16 hours/day of active compute. Sustained workloads belong on provisioned.
Python UDF deprecation (Patch 198+, support ends June 30 2026)	N/A — this is a deprecation, no gain	No new Python UDFs; existing UDFs work until cutoff; refactor to Lambda UDFs or SQL UDFs	Pipeline depends on a Python UDF for a custom hash, you discover the deprecation 2 weeks before cutoff while planning a release	Signal that AWS is consolidating compute primitives. Lambda UDFs are the migration target but add network hop and IAM complexity. Inventory Python UDFs across all clusters now.
WLM queue-based workload management	Predictable resource carving between ETL and BI workloads	Static queue config; auto-WLM tuning lags real-world adaptation	You add a new high-priority dashboard, it runs in the wrong queue, queue saturates, p99 jumps 5x for that workload class	Auto-WLM (default in modern clusters) is the right starting point. Manual WLM is for shops that already have a tuned config and a person who owns it.

Google BigQuery

Fully serverless DW

Trade-Off	What You Gain	What You Give Up	When It Bites You	PE Nuance
Fully serverless slot-based architecture	No clusters, no warehouses, no nodes; zero ops	Slot contention under cross-edition prioritization; Standard and on-demand lose to Enterprise+ baselines under regional load	Region-wide demand spike during quarter-end; your on-demand workload queries take 4x longer because Enterprise Plus baselines get priority	BigQuery prioritization order: Ent+/Ent baselines, then Ent+ autoscale, then Ent autoscale, then Standard and on-demand. If you're cost-sensitive, you're at the back of the queue under contention.
On-demand bytes-scanned pricing	Zero idle cost; perfect for spiky workloads; nothing to provision	A single misconfigured query can scan 100TB and bill $625; no kill switch by default; on-demand rates rose approximately 25% in recent revisions	Analyst writes SELECT * from unpartitioned 50TB events table; bill arrives Monday; project is now over budget for the year	maximum_bytes_billed at query level is the kill switch most teams never set. Set it at the project level via custom quota for true protection.
Editions with autoscaling slots (Std/Ent/Ent+)	Predictable cost per slot-hour; commitments discount up to 40%; baseline + autoscale shape matches most workloads	1-minute minimum due to autoscaler 1-minute scale-down; a 10-second query bills a full minute of slots	Many small queries hitting Editions reservation; you expect proportional billing but actual cost is 6x because every short query rounds up to a full minute	For short-query-heavy workloads, on-demand often beats Editions despite the 25% on-demand price hike. The break-even is around 300-500 TiB monthly processing.
BigLake + Iceberg external tables	Lakehouse pattern; query GCS Parquet/Iceberg without loading; federated queries to other clouds via Omni	External tables are slower than native; metadata refresh patterns are version-dependent; not all features apply	You enable BI Engine acceleration assuming it covers BigLake tables, find it doesn't, dashboard p99 is 4x what you projected	BigLake metadata caching narrows the gap but doesn't close it. For high-QPS BI on lake data, materialize into native BQ tables.
BigQuery ML in-place SQL ML/AI	Train and predict in SQL; Vertex AI integration; Gemini in BQ for NL-to-SQL and code-gen	Good for tabular ML; not competitive for deep learning vs Databricks/SageMaker; pricing per model invocation	Team picks BQ ML for a recommendation model expecting feature parity with Vertex, hits limits on custom architectures and migrates mid-project	BQ ML excels at classical models (linear, XGBoost, ARIMA, k-means) plus pre-trained model invocation. For training transformer-scale models, send the data to Vertex.
Nested and repeated schema (STRUCT, ARRAY)	Idiomatic semi-structured data; no JSON parse overhead; powerful UNNEST queries	Schema lock-in; nested/repeated layout doesn't port cleanly to Snowflake/Redshift/Databricks without flattening	Two-year migration to another DW; your "data is just SQL, we can move it" plan stalls on rewriting hundreds of UNNEST patterns	Use STRUCT/ARRAY when the access pattern is "query the whole record" or "fan out on a sub-array." Avoid them for fields you'll join on; they kill join planning.
Storage Write API for streaming	gRPC, exactly-once delivery, high throughput, no DML quota impact	Separate quota; different mental model than DML; legacy streaming API is in maintenance	Team builds new pipeline on legacy tabledata.insertAll, runs into the per-table 1MB/sec quota at production scale	DML quotas were largely removed years ago; the surprise is now on streaming quotas. For new pipelines, default to Storage Write API.
Opaque slot scheduler	No tuning burden; Google's scheduler handles everything	No power-user knobs; query performance regressions are hard to diagnose	p99 regresses 30% one Tuesday; INFORMATION_SCHEMA.JOBS shows nothing actionable; you file a support ticket and wait	For deep performance work, BigQuery is the most opaque of the four. Snowflake and Databricks both expose more of the engine. Plan for less control.

2. Use Cases

Concrete workloads with the driving property that ruled out the obvious alternative. Scale dimension is the number that mattered for the architecture, not the marketing number.

Snowflake

Cloud-native DW

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Cross-business-unit secure data sharing	Capital One — modernizing data sharing across orgs and external partners	Define-once policies (row, column, masking) that travel with the data, no copy required	Hundreds of consumers; PB-scale shared datasets; SOX + financial regulation	Redshift Data Sharing exists but lacks the centralized policy layer; BQ Authorized Views require copy-and-replicate posture
Content analytics with partner studio sharing	Disney Streaming — viewership across services and partner studios	Cross-account live data sharing without ETL pipes between Snowflake accounts	10s of partner accounts; titles in the millions; daily refresh on engagement signals	Databricks Delta Sharing is open but requires both sides to operate Spark; Snowflake-to-Snowflake is zero-config
Actuarial modeling on policy data with strong governance	Mid-cap insurance carrier — claims and underwriting analytics	Row and column masking at policy boundary; auditable lineage; HIPAA-equivalent posture	10-20TB of structured policy data; 50-100 analysts; quarterly state-by-state regulatory submissions	Redshift Lake Formation does fine-grained access but lineage tooling is weaker; Databricks UC is comparable but team is SQL-first not Spark-first
Data product monetization via marketplace	Data vendors publishing live datasets on Snowflake Marketplace	Listing data products as live shares; consumers query without copy; billing routed through Snowflake	Hundreds of consumers per dataset; daily-refreshed; consumption-based revenue model	BQ Analytics Hub exists, smaller marketplace; Databricks Marketplace is younger and Spark-centric
Multi-cloud SQL surface for global org	Global retailer with workloads on AWS + Azure + GCP	Single SQL surface across clouds; account-to-account replication for cross-cloud disaster recovery	3 clouds; 4 regions; petabyte-scale; same dbt project against all environments	BQ is GCP-locked; Redshift is AWS-locked; Synapse is Azure-locked; only Snowflake and Databricks are truly multi-cloud

Databricks

Lakehouse

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Petabyte streaming + ML for video QoE	Comcast — video quality-of-experience analytics	One platform for streaming ingest, feature engineering, model training, and serving	Petabytes of telemetry; sub-minute freshness on quality signals; thousands of feature pipelines	Snowflake Snowpark handles ML but Spark heritage gives Databricks the streaming + ML pipeline edge
Feature engineering and ML on customer telemetry	AT&T — telecom customer experience and churn modeling	Notebooks + MLflow + Feature Store + serving in one platform; data scientists ship without DevOps	10s of TBs of customer signals; hundreds of features per model; weekly model refresh cadence	SageMaker on Redshift requires more glue; Vertex on BQ is good but the org is AWS-centric
IoT and ML on industrial sensor data	Shell — upstream and refining sensor analytics	High-volume time-series ingest with Spark structured streaming; OPC-UA and Kafka sources	Millions of sensors; multi-second freshness; multi-decade retention on operational data	Snowflake handles structured streams via Snowpipe Streaming but doesn't match Spark for custom protocol ingest
Genomics and bio-pharma feature engineering	Mid-cap pharma R&D — drug discovery pipelines on variant data	Massive Spark workloads on Parquet/Delta; Python/Scala/R coexistence; HIPAA-aligned	Multi-petabyte variant data; 100s of researchers; multi-cloud per project	BQ scales but SQL-only doesn't match the researcher tooling expectations
Mid-cap fintech ETL + BI	SaaS fintech replacing Hadoop + Hive + Presto stack	One platform that subsumes both data engineering and BI; lift-and-shift from HDFS to Delta is straightforward	100s of TB; dozens of pipelines; 100s of analysts on Tableau/Power BI via DBSQL	Snowflake is a clean alternative for the BI side but doesn't replace the Spark/Hive ETL surface as cleanly

Amazon Redshift

AWS-native MPP

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
AWS-native fintech operational reporting	Mid-cap fintech with Aurora as operational DB	Zero-ETL from Aurora to Redshift; sub-minute lag on operational data without DMS plumbing	50-100GB CDC volume per day; reporting SLA under 5 minutes from transaction commit	Snowflake requires Fivetran or Kafka Connect; BQ requires Datastream; both add cost + ops
AdTech with Spectrum querying S3 logs	Programmatic ad platform — billing reconciliation across raw event lake + warehouse	Native Redshift queries can join RMS hot data with S3 Iceberg cold data via Spectrum	Hot table 5-10TB; cold S3 lake 500TB+; daily reconciliation joins both	Snowflake external tables work but Spectrum is more deeply tuned for AWS S3; BQ Omni is more limited for cross-cloud S3
Compliance-heavy workloads on GovCloud	Federal contractor — analytics on classified or controlled data	FedRAMP High and GovCloud surface; Redshift is the broadest-deployed managed DW in those zones	Multi-TB; 100s of cleared analysts; auditable end-to-end	Snowflake supports FedRAMP High and GovCloud since 2024 but Redshift has more incumbency in those shops
AWS shop with deep Glue + Lake Formation investment	Org with 50+ Glue jobs, Lake Formation centralized governance, and Athena ad-hoc	Lake Formation permissions follow data into Redshift; no separate IAM model	100s of TB on Lake; dozens of Glue ETLs; Redshift as the curated mart	Cross-platform requires reimplementing Lake Formation in another governance plane
SaaS multi-tenant analytics with data sharing	B2B SaaS isolating tenant data via Redshift data sharing	Producer/consumer clusters share data without copy; per-tenant compute isolation	1000s of tenants; per-tenant 1-100GB; tenant clusters share from central producer	Snowflake reader accounts are an alternative but per-account billing model differs; BQ tenant isolation requires per-project setup

Google BigQuery

Fully serverless DW

Use Case	Company / Scenario	Driving Property	Scale Dimension	Why Not Alternative
Petabyte analytics on listening data	Spotify — analytics on music streaming events	Serverless slot scheduler handles petabyte scans without operator intervention; BQ ML for embeddings	Multi-PB; billions of events daily; nested/repeated schema for event records	Snowflake and Databricks scale similarly but require capacity planning; BQ requires literally none for variable workloads
Ad attribution at scale	Large adtech consumer with massive event volume	Massively parallel scans across petabyte fact tables; nested schema for event hierarchies	Multi-PB daily; 100s of analysts; queries scan TBs each	Snowflake at this scale needs heavy warehouse fleet; cost model becomes harder to predict than BQ slot reservations
GCP-native SaaS ops	Mid-cap SaaS born on GCP; no DBA on the team	Zero operational burden; deep integration with Firestore, Pub/Sub, Dataflow, Cloud Run	10-100TB; 10s of pipelines; small data team	Snowflake on GCP is an option but loses the deep-integration simplicity; Databricks on GCP requires Spark expertise
Geospatial analytics for logistics	Fleet / delivery platform — geospatial queries on routes and locations	Native ST_* GIS functions; BQ GIS is mature and competitive with PostGIS at scale	Billions of geospatial points; daily route analytics; sub-100ms BI Engine queries	Snowflake GIS exists but is less mature; Databricks needs Mosaic or H3 libraries; Redshift GIS works but smaller toolkit
Mobile gaming analytics via Firebase	Mid-cap mobile gaming studio — Firebase Analytics → BQ → ML	Firebase events flow into BQ natively; one-click; daily player segmentation	10s of millions of DAU; billions of events daily; BQ ML for LTV and churn	Other DWs require Fivetran or custom ETL from Firebase; the GCP integration is free

3. Limitations

Per-technology layout. Severity reflects blast radius. Workaround Cost is what you actually pay in money, complexity, or latency to dodge the limit.

Snowflake

Cloud-native DW

Limitation	Severity	Workaround	Workaround Cost
60-second warehouse billing minimum after suspend	Medium	Tune auto-suspend down to 60s; consolidate workloads to keep warehouse warm	Cold-cache penalty on first query; more cross-workload contention
Hybrid Tables limited to ~16K ops/sec/db, single region	High	Use external OLTP (RDS, Aurora, CockroachDB) and replicate to Snowflake	Two-system complexity; CDC pipeline + lag; two-system consistency burden
Externally-managed Iceberg tables ~2x slower than native or managed Iceberg	Medium	Use Snowflake-managed Iceberg or pay closer attention to writer-side Parquet statistics	Loss of multi-engine writer flexibility; or ops effort tuning external writers
Cross-cloud Iceberg data transfer cost	Medium	Co-locate Snowflake region with external volume region	Lose the cross-cloud freedom Iceberg was meant to offer
Snowsight notebook experience is mediocre vs Databricks	Medium	Use external notebooks (Hex, Deepnote, Jupyter) with Snowflake connector	Loss of in-platform governance for notebook artifacts; chargeback fragmentation
Cortex AI per-token pricing is opaque and changes	Medium	Set per-warehouse usage policies and resource monitors; track Cortex spend separately	Resource monitor management overhead; spend caps can block legitimate work
Query result cache TTL is 24h, not configurable	Medium	Build materialized views or persisted tables for repeatedly accessed results	Storage and compute for materialization; staleness management

Databricks

Lakehouse

Limitation	Severity	Workaround	Workaround Cost
Classic cluster startup ~4 minutes	High	Use Serverless SQL warehouses for interactive; keep job clusters for batch only	Serverless tier premium pricing; cost attribution split across cluster types
Photon is a per-DBU surcharge and closed-source	Medium	Use OSS Spark on Delta; tune by hand; accept slower performance	2-3x slower on equivalent workload; vendor exit becomes a rewrite project
Unity Catalog migration from Hive metastore	High	UC-from-day-one on new workspaces; phased migration with shadow catalogs	1-2 engineer-quarters per workspace; legacy notebooks may bypass UC silently
Multi-table transactions require catalog commits per table	Medium	Enable catalog commits on participating tables; test reads from external Delta clients before flipping	One-way migration in practice; external Delta clients may break on new commit format
Lakebase is Beta and single-region (as of Apr 2026)	High	Use managed Postgres (RDS, CloudSQL, Cosmos) for production OLTP; sync to Delta via CDC	Two-platform complexity; lose the unified governance pitch
Cost attribution split between DBU and underlying cloud spend	High	Tag clusters and jobs; join Databricks system tables with cloud billing exports in your BI layer	Build and maintain a chargeback pipeline; finite engineer time
Job clusters can run forever if not bounded	Medium	Set timeout_seconds on every job; set max_concurrent_runs; monitor abandoned clusters	Job design discipline; occasional false-positive timeouts on long-running ML training

Amazon Redshift

AWS-native MPP

Limitation	Severity	Workaround	Workaround Cost
Zero-ETL replicated tables are read-only	Critical	Wrap in views or materialize to a separate writeable table	Dual-table maintenance; rebuilds on schema change; staleness vs source
50 zero-ETL integrations limit per target warehouse	Medium	Consolidate sources; use AWS DMS for the long tail	Operational fragmentation; DMS adds plumbing and cost
Schema changes (add column) can trigger 20-90min table unavailability for zero-ETL	High	Schedule schema changes during maintenance windows; communicate to consumers	Coordination overhead; data availability gaps; not always predictable
Concurrency Scaling free hour is per spawned cluster per day, not per main cluster	Medium	Monitor concurrency_scaling_seconds and gate spend with usage limits	Usage limits can cause query queueing under unexpected bursts
DC2 nodes deprecated; RA3 is the only forward path	Medium	Migrate to RA3 or Serverless; snapshot-and-restore approach minimizes downtime	Migration project effort; cluster sizing rethink; possible app-side compatibility checks
Python UDFs deprecated; new creation blocked; existing UDFs work until June 30, 2026	Medium	Refactor to SQL UDFs or Lambda UDFs	Lambda UDFs add network hop, IAM, and cold-start latency; refactor effort
VACUUM and ANALYZE mostly automatic but can still need manual intervention	Medium	Monitor STV_BLOCKLIST + SVV_TABLE_INFO; manually VACUUM tables with high unsorted region	Periodic ops attention; learning curve on the relevant system tables

Google BigQuery

Fully serverless DW

Limitation	Severity	Workaround	Workaround Cost
On-demand bytes-scanned can blow budget on a single query	Critical	Set maximum_bytes_billed at query level; project-level custom quota; require partitioning on large tables	Custom quota blocks legitimate big queries; query-level limits add friction to ad-hoc analysis
Editions autoscaler 1-minute minimum billing per scale-up	Medium	For short-query-heavy workloads, keep some on-demand; reserve Editions for sustained workloads	Dual cost model; reservation sizing complexity
Slot contention under cross-edition prioritization	High	Upgrade to Enterprise+ baseline for guaranteed capacity in the region	Significantly higher cost; ties workload to predictable region demand
Streaming has its own quotas; legacy tabledata.insertAll deprecated path	Medium	Use Storage Write API (gRPC) for new pipelines; request quota increases for legacy	Migration effort; gRPC tooling is heavier than REST insertAll
No cluster control means no power-user tuning	Medium	Tune at the query level (partitioning, clustering, materialized views); accept the opacity	Performance regressions are harder to root-cause; Google's support is the only escalation
Vendor lock via STRUCT/ARRAY nested schema	Medium	Avoid deep nesting on tables likely to migrate; keep flat shape for portability-critical data	Loss of native semi-structured ergonomics; more joins; larger row sizes
Reservation pricing structure is complex (baseline, autoscale, commitments, edition tiers)	Medium	Build a workload-mapping model; assign projects to right-sized reservations; revisit quarterly	FinOps headcount; tooling investment; ongoing tuning vs other priorities

4. Fault Tolerance

Cross-platform matrix. Compute is ephemeral in all four; storage durability comes from cloud object stores or proprietary equivalents. The differences live in RTO, failover automation, and cross-region failover story.

Dimension	Snowflake	Databricks	Redshift	BigQuery
Replication model	3x sync replication across 3 AZs in the underlying cloud object store; compute is ephemeral and stateless	Underlying cloud storage (S3, ADLS, GCS) provides 3x sync; compute is ephemeral; Delta log is single-writer per table	Redshift Managed Storage 3x sync across 3 AZs (RA3, Serverless); provisioned cluster nodes are warm and stateful	Colossus 3x cross-zone within region; compute is ephemeral slot allocation
Failure detection	Cloud Services layer heartbeats; sub-30s typical	Driver/worker heartbeats via cluster manager; ~30s	Cluster supervisor monitors compute nodes; ~30s for node-level	Borg scheduler; sub-minute slot rebalancing
Failover mechanism	Automatic VM replacement on warehouse failure; query retried transparently	Auto cluster node replacement; in-flight Spark stage retried by driver	Auto node replacement for RA3 worker nodes; leader failure triggers cluster failover; Serverless re-provisions RPU	Transparent — slots reassigned by scheduler; user does not observe failover
RTO (typical)	Under 60s for compute; near-zero for storage-served queries	60-300s depending on cluster type; Serverless sub-second; Classic 4 min cold start	60-120s for RA3 worker node replacement; longer if leader node fails; Serverless seconds	Sub-second to seconds; user-visible RTO is essentially nil
RPO (typical)	0 for committed writes within region; seconds for cross-region Replication Groups	0 for Delta-committed writes; lag for streaming sources; geo-rep is async	0 for committed writes on RA3 RMS; zero-ETL CDC lag is sub-minute typical	0 for completed jobs; streaming buffer at risk until persisted
Split-brain behavior	N/A — Cloud Services layer is quorum-coordinated; single writer per micro-partition	N/A — Delta log uses optimistic concurrency control with single committer per table	N/A — single leader node coordinates writes	N/A — job coordinator pattern; no multi-writer for the same data
Blast radius of single-node failure	Single warehouse query retried; other warehouses on same account are unaffected	Single Spark task retried; ETL job may delay by retry duration	Worker node failure causes shard reassign and node replacement; leader failure impacts entire cluster briefly	Single slot — query may queue or be redistributed; rarely visible to user
Cross-region failover story	Replication Groups + Failover Groups; auto-failover on Business Critical and higher tiers; Client Redirect URL	Delta Sharing + UC replication; workspace failover is manual; geo-replication of catalog and tables	Cross-region snapshot copy with auto-cadence; manual restore for cluster cutover; data sharing across regions	Multi-region datasets (US, EU); Cross-region DR copies; manual cutover for project-level failover
Data loss scenarios	Possible if 2+ AZs fail simultaneously plus replication group lag is non-zero; rare and documented	Uncommitted Delta writes during cluster crash; checkpoint reset on structured streaming if state lost	RMS rare; concurrency scaling cluster transient state may be lost on rapid failover	Rare for completed jobs; streaming inserts in buffer state at risk during zonal failure

5. Sharding

All four hide sharding behind a managed abstraction, but the abstractions leak in different places. The interesting differences are in hot-shard behavior, resharding cost, and how much control the operator has.

Dimension	Snowflake	Databricks	Redshift	BigQuery
Sharding model	Automatic micro-partition (16MB compressed) units; immutable; metadata-driven pruning	Delta file-level (Parquet) + Z-order or Liquid Clustering for sort-based pruning	Hash distribution on DISTKEY column (or AUTO); slice-level data placement on RA3 nodes	Capacitor columnar format auto-managed; user-specified partitioning + clustering on top
Shard key constraints	No required shard key; clustering keys are advisory hints for re-clustering	Z-order: any subset of columns; Liquid Clustering: up to 4 columns, can be changed online	DISTKEY: single column, immutable post-create (ALTER DISTKEY rewrites); AUTO defers decision	Partition: 1 column (time/integer); Cluster: up to 4 columns; both can be changed but require rewrite
Rebalancing mechanism	Automatic re-clustering by background service when clustering key is set	Auto-Optimize + Auto-Compaction or manual OPTIMIZE / VACUUM commands	Auto Table Optimization (RA3) handles sort/dist; manual VACUUM possible	Fully managed by Google; no user action required
Rebalancing cost / impact	Credits charged for clustering operations; can be silent budget burn if not monitored	DBUs charged for OPTIMIZE jobs; visible as a separate job in workspace	Compute slice during VACUUM; minimal on RA3 with auto-optimization	Hidden inside slot accounting; not separately billed
Hot-shard behavior	Micro-partition pruning + automatic clustering; pruning is the safety valve, not hot-shard handling per se	File-level statistics-based pruning; OPTIMIZE rebalances; Liquid Clustering adapts to query patterns	Slice-level hashing — bad DISTKEY choice causes uneven node load; AUTO mitigates but does not eliminate	Hash-distributed at slot scheduler level; well-mitigated by the scheduler
Maximum shards (practical)	Effectively unbounded — billions of micro-partitions per large table	Tens of millions of Parquet files practical; small-file problem mitigated by Auto-Compaction	16-32 nodes typical; up to 128 RA3 nodes; Serverless abstracts node count	N/A — fully managed; users see slot count not shard count
Resharding without downtime?	Yes — fully transparent; clustering key change triggers background re-clustering	Yes — OPTIMIZE and re-cluster are online operations	DISTKEY change requires full table rewrite; ALTER DISTKEY exists but is expensive; resize is online but slow	N/A — Google handles; partition/cluster change requires CTAS or table rewrite
Cross-shard query support	Native — no penalty; planner handles micro-partition fan-out	Native via Spark distributed plan	Yes — distributed plan engine; broadcast joins or redistribute as needed	Yes — slot-based scatter-gather; planner optimizes shuffles

6. Replication

Storage durability is similar across all four — 3x cross-zone in the underlying cloud store. The interesting differences are in cross-region story, consistency level options, and how conflicts are handled under concurrent DML.

Dimension	Snowflake	Databricks	Redshift	BigQuery
Replication topology	Leader-managed per micro-partition over cloud storage; Replication Groups + Failover Groups for cross-region/cloud	Single-writer per Delta table via optimistic concurrency control; geo-replication via UC + Delta Sharing	Leader node coordinates; worker nodes hold local slices; RMS spans AZs in region	Colossus tri-zone; metadata-driven replication; no user-visible topology
Sync vs async	Sync within region; async cross-region via Replication Groups	Sync within region via cloud storage; async geo-replication	Sync within AZ for RMS; async cross-region snapshot copy	Sync within region; async across regions for multi-region datasets
Replication factor (default / max)	3x in cloud storage by default; can replicate to any number of additional Snowflake accounts	Underlying cloud storage 3x; UC tables can replicate to multiple workspaces	RMS 3x; cross-region snapshot is on-demand to N regions	3x default cross-zone; multi-region uses additional zone replicas
Consistency level options	Read-after-write consistency within region; eventual cross-region; snapshot isolation per transaction	Snapshot isolation per Delta version; cross-region is eventual	Read-committed; concurrent serialization for DML	Snapshot isolation at job start; consistent reads within a job; new for 2026 cross-table snapshot reads via session
Replication lag (typical)	Sub-second within region; seconds to minutes cross-region depending on volume	Sub-second commit visibility within region; geo-rep is seconds to minutes	Sub-second within AZ; minutes for cross-region snapshot copy	Sub-second within region; cross-region replication minutes
Conflict resolution	Last-writer-wins on micro-partition; transactional within table	Optimistic concurrency control on Delta log; second writer aborts and retries	MVCC + commit serialization; concurrent DML serialized at commit	Snapshot-based; concurrent DML against same partition retried up to 3x automatically by BQ
Cross-region replication	Replication Groups + Failover Groups; Business Critical tier auto-failover with Client Redirect	Delta Sharing + UC replication; manual workspace setup per region	Cross-region snapshot copy + restore for DR; data sharing across regions	Multi-region datasets (US, EU) plus Cross-region disaster recovery
Replication during partition	Single-region writes continue; cross-region pause until partition heals	Local writes continue; geo-replication pauses; resumes on heal	RMS writes continue if quorum maintained; CDC zero-ETL pauses on source disconnect	Multi-region tolerates zone failure transparently; cross-region failover requires manual cutover

7. Better Usage Patterns

Where PE depth shows up. The patterns most teams discover too late, the anti-patterns that survive review because they look reasonable, and the optimizations that compound at scale.

Snowflake

Cloud-native DW

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Warehouse sizing and auto-suspend	Default 10-minute auto-suspend; one warehouse per team	60-second auto-suspend; one warehouse per workload class (BI, ETL, ad-hoc); resource monitors on every warehouse	Credit burn drops 30-50% on bursty BI workloads with no perf regression
Right-size warehouses per workload	Pick "Medium" for everything; never revisit	X-Small for BI dashboards, Medium for ETL, Large only when concurrency or data volume justifies; profile via QUERY_HISTORY periodically	Larger warehouse is rarely the right answer; usually it's a slow query, not capacity
Use RESULT_SCAN to chain queries	Re-run the same query in a downstream step; pay twice	RESULT_SCAN(LAST_QUERY_ID()) reuses cached result for chained logic in same session	Free compute on chained analyst workflows; result cache TTL is 24h
Streams + Tasks for in-DB CDC and orchestration	External Airflow + Lambda to detect changes and trigger downstream	Snowflake Stream on source table + Task chain consuming the stream	Eliminate orchestration tier for in-Snowflake transformations; reduce ops surface
Iceberg vs native — when to choose	Default to Iceberg "for openness" even when no other engine reads it	Native FDN micro-partitions when Snowflake is sole consumer; managed Iceberg when interop matters; external Iceberg only when you control writers	Performance and cost gap is real; openness without consumers is a tax with no benefit
Search Optimization Service for selective lookups	Apply SOS broadly assuming it's free perf	SOS only on tables where queries are highly selective point-lookups on otherwise-large datasets	SOS has its own credit cost; broad application can exceed query-side savings
Snowpipe Streaming over classic Snowpipe	Default to classic Snowpipe with file-based loads	Snowpipe Streaming for sub-second freshness on streaming sources; classic for batch file-based	10x freshness improvement on streaming workloads; lower per-row ingest cost at sustained throughput

Databricks

Lakehouse

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Cluster type per workload	Use Classic clusters for everything; share interactive clusters across teams	Serverless SQL for BI, Photon Serverless for high-throughput SQL, Job clusters with auto-termination for batch, dedicated cluster pools for ML	Cold-start cost drops materially; chargeback becomes possible; resource isolation between workloads
OPTIMIZE and clustering strategy	Never run OPTIMIZE; suffer small-file problem and slow scans	Auto-Optimize on append-heavy tables; Liquid Clustering on tables with multi-dimensional filter patterns; OPTIMIZE on filter columns explicitly	Scan performance 2-10x on tables that would otherwise have thousands of small files
Streaming ingestion approach	Roll your own Spark Structured Streaming pipeline from scratch	Auto Loader + Delta Live Tables (Lakeflow) for declarative streaming ETL with built-in checkpointing, retries, and lineage	Operational burden drops; DLT handles failure semantics and data quality expectations natively
Delta auto-compaction on append-heavy tables	Append millions of small files; query performance degrades over time silently	SET spark.databricks.delta.autoCompact = true and tune autoCompact.maxFileSize per table shape	Prevents the small-file problem from compounding; query latency stays stable
Unity Catalog from day one	Start on Hive metastore; plan to migrate later; never get to it	UC from the first workspace; migration plan for legacy workspaces with shadow catalogs	Mid-life UC migration is 1-2 engineer-quarters; day-one adoption is free
Photon on vs off	Enable Photon everywhere assuming it's universal speedup	Photon ON for SQL and DataFrame workloads; OFF for UDF-heavy or pandas-heavy code that doesn't vectorize	You pay DBU premium for Photon regardless; you only get the speedup where vectorization applies
Job clusters with auto-termination	Share long-running clusters across teams; never set timeout	Job-specific clusters with timeout_seconds bound to expected runtime; isolated by workload	Prevents abandoned-cluster cost; isolates failures; simplifies chargeback

Amazon Redshift

AWS-native MPP

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
RA3 + Concurrency Scaling for spiky workloads	Stay on DC2 because "we're not at the limits yet"	RA3 with Concurrency Scaling enabled and usage limit configured; gives elastic burst without provisioning	Handles BI peak without over-provisioning; DC2 is deprecating anyway
AUTO distribution and sort keys	Hand-pick DISTKEY and SORTKEY on every table based on initial guesses	Let Auto Table Optimization choose for most tables; override only when query patterns are well-understood	Auto adapts as query patterns evolve; manual choices age poorly and require periodic re-tuning
Materialized views with auto-refresh	Re-aggregate the same data in every dashboard query	Materialized views on common aggregations with AUTO REFRESH; queries against MV are 10-100x faster	Compute saved on repeated aggregation queries; analyst experience improves materially
Spectrum for true cold data only	Default cold tier to Spectrum to "save storage cost"	Spectrum for archive (>90d) or for joining with truly external lake data; native RMS for anything queried daily	Spectrum per-byte scan costs add up; native RMS storage is cheap relative to repeated scans
Serverless vs RA3 decision	Use Serverless for everything because "elastic is better"	Serverless for spiky or unpredictable; RA3 for sustained 12+ hours/day or zero-ETL CDC sources	Break-even for Serverless is around 12-16 hours/day of active compute; sustained workloads cost more on Serverless
WLM queues separating ETL from BI	Single default WLM queue for everything	Auto-WLM (default for new clusters) handles most cases; manual WLM with separate ETL and BI queues for shops with predictable workload mix	p99 stays predictable under concurrent load; one runaway query can't starve dashboards
Zero-ETL with downstream materialization	Try to write back to zero-ETL replicated tables; fail with read-only errors	Treat zero-ETL tables as sources only; build cross-database materialized views or stage-to-target tables for transformation	Avoids the read-only surprise; gives you a clean transformation layer

Google BigQuery

Fully serverless DW

Pattern	What Most Teams Do Wrong	The Better Way	Why It Matters
Partition + cluster every wide table	Leave large tables unpartitioned; one analyst scans everything by accident	Require time-based partition + clustering on the top filter columns for every table over 100GB	Bytes scanned drops 10-100x on filtered queries; cost predictability dramatically improves
Editions vs on-demand decision	Pick one and stick with it forever; never revisit	On-demand for spiky exploration projects; Editions baseline + autoscale for predictable production; revisit quarterly as workload shape changes	The break-even moves; 25% on-demand price hike narrows the gap; Editions 1-min minimum eats short-query savings
Storage Write API over Streaming Inserts	Use legacy tabledata.insertAll for new pipelines	Storage Write API (gRPC) with exactly-once semantics; appendRows for high throughput	Exactly-once delivery; higher throughput per quota; legacy API is in maintenance mode
Dry-run cost estimation before big queries	Run queries and discover cost after the fact	--dry_run on every large query in development; preview total_bytes_processed in client tooling	Catches the multi-TB scan before it bills; ten seconds of friction saves hundreds of dollars
Materialized views for repeated aggregations	Re-aggregate in every dashboard tile	Materialized views with incremental refresh on common aggregations; BQ rewrites compatible queries automatically	Order-of-magnitude bytes-scanned reduction; pricing scales accordingly
BigQuery DataFrames for pandas-style at scale	Pull terabytes into Python pandas via the client library; OOM	BigQuery DataFrames pushes the pandas operations down into BQ; data never leaves the warehouse	Scales beyond local memory; respects governance; ML feature engineering stays close to the data
Capacity baseline + autoscale ceiling	Pure autoscale starting from zero; surprised by 1-minute minimum billing	Set a small baseline of committed slots for steady-state; autoscale ceiling for peak; commitments for additional discount	Avoid the 1-minute minimum penalty on every short query; predictable cost floor

8. Advanced / Next-Gen Alternatives

Where the platform may be displaced or augmented. Migration cost reflects the realistic effort, not the vendor's "easy migration" claim. When To Consider is the actual trigger, not the marketing positioning.

Snowflake

Cloud-native DW

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Apache Iceberg + DuckDB / Polars / Trino (OSS lakehouse)	Eliminates per-credit pricing; open table format; multi-engine	Emerging	High — different query engine, governance built from parts, no managed service	When credit bill exceeds $1M/yr and team has the engineering bandwidth to operate an OSS stack
Databricks SQL with Photon	Better ML integration; competitive SQL perf; open Delta with Iceberg interop	Production	High — different governance model, dbt-style transformations port but Streams + Tasks do not	When ML workloads grow to dominate the budget; when engineering team prefers Python/Spark over SQL-first
Microsoft Fabric + OneLake	Tighter Power BI integration; OneLake Security (universal ACLs, 2026) closing governance gap	Emerging	Medium — Direct Lake mode reduces data movement; concepts map cleanly from Snowflake	Org standardized on Microsoft stack (Entra ID, Power BI, Office 365) and Fabric pricing math works out

Databricks

Lakehouse

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Snowflake with Snowpark + Cortex	SQL-first ergonomics; governance posture; lower ops burden for non-Spark teams	Production	High — rewrite Spark pipelines as Snowpark or pure SQL; cost model is different	When SQL workloads dominate and ML use cases fit Snowpark/Cortex's tabular ML envelope
OSS Spark + Iceberg + Trino/Presto	Full open-source stack; no DBU surcharge; portability across clouds	Emerging	High — operate Spark, manage clusters, build governance from Polaris/Nessie/OpenMetadata	When Photon premium is unjustifiable and team has data platform engineering depth
Microsoft Fabric (Spark + Lakehouse)	OneLake + Synapse Spark; native Power BI; tighter Office 365 integration	Emerging	Medium — Spark on Fabric is real but tooling/MLflow story is younger than Databricks	Microsoft-shop with Power BI as primary BI tool; Office 365 governance posture

Amazon Redshift

AWS-native MPP

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Snowflake on AWS	Better elasticity; multi-cloud option; richer SQL surface; stronger governance	Production	Medium — both are SQL warehouses; ELT logic ports cleanly; deep AWS integrations (Lake Formation, Glue) need replanning	When elasticity and cross-cloud strategy matter more than AWS-native integration depth
Databricks SQL on AWS	ML-native; open Delta; Photon performance; Spark for ETL	Production	Medium — keep S3 data layer; replace cluster compute model; UC vs Lake Formation reconciliation	When ML workloads grow and Redshift ML's SageMaker round-trip is the bottleneck
ClickHouse Cloud on AWS	Sub-second analytics on user-facing dashboards; lower latency floor than Redshift	Emerging	Medium — for hot-path workloads only; complement Redshift, don't replace	When Redshift can't meet sub-second p99 on user-facing dashboards and you need a real-time tier

Google BigQuery

Fully serverless DW

Successor / Alternative	What It Improves	Maturity	Migration Cost	When To Consider
Snowflake on GCP	Better governance; cross-cloud portability; more predictable bill	Production	Medium-High — SQL ports cleanly; nested/repeated schemas require flattening; ML pipeline rewrite	When multi-cloud strategy emerges; when on-demand bytes-scanned predictability becomes a finance issue
Databricks on GCP	ML/data-engineering depth; open Delta; multi-cloud workspace	Production	Medium — keep GCS data layer; rewrite Spark pipelines that consumed BQ ML features	When ML workloads grow and BQ ML's tabular envelope is the bottleneck
ClickHouse Cloud + BigLake	Split hot/cold; ClickHouse for sub-second user-facing; BQ for batch + ML	Emerging	Medium — additive, not replacement; new query path for user-facing analytics	When BQ p99 is the user-facing experience bottleneck; high QPS user-facing dashboards

As-of disclaimer. Specifics in this artifact reflect public documentation, vendor blogs, and production reports through May 2026. Hybrid Tables pricing changed March 1, 2026 (per-request billing retired). Iceberg GA in Snowflake October 2025. Multi-table transactions in Databricks Unity Catalog March 2026. Redshift Python UDF creation blocked Patch 198+, existing UDFs supported through June 30 2026. BigQuery on-demand rate rose approximately 25%; DML quotas removed years ago but Streaming has its own quotas. Verify before architectural commitments.

Best default choices

Search and compare

1. Trade-Offs

Snowflake

Databricks

Amazon Redshift

Google BigQuery

2. Use Cases

Snowflake

Databricks

Amazon Redshift

Google BigQuery

3. Limitations

Snowflake

Databricks

Amazon Redshift

Google BigQuery

4. Fault Tolerance

5. Sharding

6. Replication

7. Better Usage Patterns

Snowflake

Databricks

Amazon Redshift

Google BigQuery

8. Advanced / Next-Gen Alternatives

Snowflake

Databricks

Amazon Redshift

Google BigQuery