Cloud Data Warehouse / Lakehouse — PE Trade-Offs

Snowflake, Databricks, Amazon Redshift, Google BigQuery — what to give up and what you get, from the operator's seat.


As of 2026-05-31

PE Verdict

In 2026, the DW/Lakehouse choice has collapsed to two decisive questions: where does your data already live, and what is the shape of your AI/ML workload? Snowflake wins on SQL ergonomics, cross-org data sharing, and governance posture. Databricks wins on ML, open formats, and engineering teams that want one engine for ETL+BI+ML. BigQuery wins on serverless simplicity when your bytes-scanned math holds and your team is GCP-native. Redshift rarely wins greenfield, but the AWS Zero-ETL gravity and FedRAMP/GovCloud surface keep it the default for AWS-locked compliance shops. The differences narrow every quarter (Iceberg everywhere, in-DW AI everywhere, serverless everywhere), so the decision is increasingly about cost model + ecosystem gravity, not raw capability.

Best default choices

1. Trade-Offs

One row per distinct give-up-X-to-get-Y. Tables are per-technology because the trade space differs. The PE Nuance column is the insight most engineers miss until they hit it in production.

Snowflake

Cloud-native DW
Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
Separation of storage and compute via virtual warehouses | Workload isolation: BI dashboards never contend with overnight ETL | Per-warehouse cost stacking; 5 warehouses with auto-suspend off = 5x credit burn | Team copies the "create a dedicated warehouse" pattern across every workload, hits month-end with $80K in warehouses that sat idle for 90% of their billable minutes | The right granularity is one warehouse per workload class (BI, ETL, ad-hoc), not per team or per pipeline. Most teams over-fragment.
Credit-based pricing with auto-suspend | Pay-per-second after the 60s minimum; predictable line-item billing | 60s minimum billing per warehouse resumption; warm-warehouse heuristic discipline required | Dashboard hits warehouse at minute 59, suspends at 60, refresh hits 1s later, you are billed two full minutes for ~2s of work | Default auto-suspend is 600s (10 min). Drop to 60s and the math changes materially on bursty BI. The trade is a cold-cache penalty on the next query.
Iceberg tables GA (October 2025) for open format | Open table format, multi-engine read, narrows lock-in vs proprietary FDN micro-partitions | Externally-written Parquet without full statistics loses approx 2x perf vs Snowflake-managed Iceberg | Lakehouse migration team writes Iceberg from Spark, expects parity with native tables, finds 2x slower scans because stats coverage is incomplete | Snowflake-managed Iceberg gets the same caching as native tables. Externally-managed Iceberg pays a real performance tax even with the optimized Parquet scanner.
Hybrid Tables for OLTP+OLAP unification | Operational and analytical on one platform; eliminates write-back ETL | Approx 16K ops/sec/db ceiling; single-region only; FK constraints enforced at write | Team picks Hybrid Tables for a product analytics use case at 50K writes/sec sustained and hits throttling during the Black Friday peak | March 2026 pricing change dropped per-request billing; storage + warehouse compute only now. Cost calculator from 2025 is obsolete.
Snowpark + Cortex AI in-platform | No data movement for ML/LLM workloads; governance and access controls follow the data | Cortex pricing is opaque per-token; LLM inference on Cortex costs roughly 2-3x equivalent dedicated GPU infra | Cortex bill compounds quietly; finance asks "why is our Snowflake bill up 40% MoM" and the answer is "GenAI prototyping" | The $200M OpenAI partnership (Feb 2026) and Snowflake Intelligence position the platform for NL-to-SQL, but it deepens the ecosystem moat. Plan exit costs early.
Time Travel + Fail-safe | Point-in-time recovery to 90 days plus 7-day fail-safe; trivial rollback | Storage cost is silently 1.07-2x apparent table size on high-churn tables due to retained versions | You analyze storage cost by querying TABLE_STORAGE_METRICS and find ACTIVE bytes are 30% of total billable bytes; the rest is Time Travel retention | DATA_RETENTION_TIME_IN_DAYS on heavy-update tables is a cost lever most teams never touch. For staging tables, set it to 1.
Automatic clustering on micro-partitions | Pruning without explicit index management; clustering keys are advisory hints | Re-clustering burns credits silently; no upfront tuning knobs for unusual access patterns | Cluster key change kicks off background re-clustering on a 10TB table, you wake to a $4K surprise in AUTOMATIC_CLUSTERING_HISTORY | Always set the cluster key BEFORE bulk load. Re-clustering an existing large table is expensive; loading pre-sorted data into an empty clustered table avoids most of that cost.
Multi-cluster warehouses for concurrency | Auto-add clusters under queue depth; up to 10 clusters | Each added cluster is a credit-burning entity; auto-scaling can stack silently | BI dashboard refresh storm fans out across 10 clusters at peak, bill spikes 10x for one hour; you didn't cap the max cluster count | Start from MIN_CLUSTER_COUNT = 1, MAX_CLUSTER_COUNT = 3, SCALING_POLICY = STANDARD. Economy scaling is the tempting trap for cost-sensitive workloads: it saves credits by letting queries queue.
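
The auto-suspend row is easier to reason about with numbers. Below is a minimal sketch of the billing model the table describes: per-second billing while the warehouse runs (idle time before suspend included), with a 60s minimum on every resume. The function name, the workload shape (one 2-second refresh every 15 minutes over 8 hours), and the simplification that queries never overlap are assumptions for illustration, not Snowflake's actual metering code.

```python
from typing import List, Tuple

MIN_BILLED_SECONDS = 60  # per-resume minimum, as stated in the table above


def billed_seconds(queries: List[Tuple[float, float]], auto_suspend: float) -> float:
    """Total billed seconds for one warehouse under a simplified model.

    queries: (start, duration) pairs in seconds, sorted by start, non-overlapping.
    auto_suspend: idle seconds before the warehouse suspends.
    The warehouse bills every second it is up, including the idle tail
    before it suspends, and at least MIN_BILLED_SECONDS per resume.
    """
    total = 0.0
    run_start = None  # when the current running period began
    last_end = None   # when the last query of that period finished
    for start, duration in queries:
        if run_start is None:
            run_start, last_end = start, start + duration
        elif start - last_end >= auto_suspend:
            # The warehouse suspended during the gap: close out the period.
            total += max(MIN_BILLED_SECONDS, last_end + auto_suspend - run_start)
            run_start, last_end = start, start + duration
        else:
            # Still warm: the running period simply extends.
            last_end = max(last_end, start + duration)
    if run_start is not None:
        total += max(MIN_BILLED_SECONDS, last_end + auto_suspend - run_start)
    return total


# Hypothetical bursty BI workload: a 2s refresh every 15 minutes for 8 hours.
refreshes = [(i * 900, 2) for i in range(32)]
default_suspend = billed_seconds(refreshes, auto_suspend=600)
tight_suspend = billed_seconds(refreshes, auto_suspend=60)
print(default_suspend, tight_suspend)
```

Under these assumptions the 600s default bills roughly ten times the seconds of a 60s setting for the same 64 seconds of real work. The model ignores the cold-cache penalty that the table names as the other side of the trade.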

Databricks

Lakehouse
Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
Photon vectorized engine on Delta | Up to 12x price/perf vs traditional DW per Databricks benchmarks on SQL/DataFrame workloads | Photon is a per-DBU surcharge; closed-source; UDF-heavy workloads see partial benefit | You enable Photon on a UDF-heavy ETL job expecting speedup, see 1.2x for 2x the DBU; Photon falls back outside its vectorized path | Photon excels on aggregations, joins, and Spark SQL. Pandas UDFs and Python UDFs do not vectorize the same way. Profile before flipping the flag globally.
Open Delta Lake format | Open-source table format; Iceberg interop; lakehouse pattern with no Parquet vendor tax | Vendor lock on the Databricks Runtime + Photon combination; OSS Spark on Delta is materially slower | You plan a "we can leave anytime, data is in S3" exit and find your Photon-tuned pipelines run 3x slower on OSS Spark elsewhere | Delta is open. The optimization gap to OSS is the moat. Plan the exit assuming you lose Photon, and budget the rewrite if you migrate.
Spark + DBSQL unified compute | One engine for ETL, BI, and ML; no separate query layer to operate | Classic cluster startup approx 4 minutes; cold-start latency hurts ad-hoc BI | Analyst opens dashboard at 9am, waits 4 minutes for cluster startup before first query runs; trust in the tool erodes | Serverless SQL warehouses fix cold start (startup in seconds) at premium pricing. The IWM (Intelligent Workload Management) ML autoscaler is Serverless-only; Pro/Classic still use reactive cluster autoscaling.
Unity Catalog unified governance | Single governance plane for tables, ML models, files, dashboards, AI assets, and Lakebase | UC enforcement adds a metadata roundtrip on every query; migration from Hive metastore is multi-quarter work | Mid-migration, you discover legacy notebooks bypass UC; permissions are inconsistent across catalogs and the audit team is unhappy | UC-from-day-one is the path of least pain. Mid-life migration costs 1-2 engineer-quarters per workspace and rarely catches all bypasses.
Lakebase Postgres OLTP integrated with lakehouse | Operational Postgres next to analytics; one-click sync from Delta tables; UC-governed | Single-region (as of Apr 2026); CMK still rolling out; two-system complexity persists | Team builds a real-time app on Lakebase expecting global multi-region failover, discovers it's not GA for cross-region yet | Lakebase competes with Snowflake Hybrid Tables on the same OLTP+OLAP unification thesis. Both are early. Pick based on which platform your team already runs day-to-day, not which is technically superior.
Intelligent Workload Management (IWM) on Serverless SQL | ML-predicted compute allocation per query; near real-time elasticity | Serverless tier only; Pro and Classic warehouses use static-threshold autoscaling | You compare Pro and Serverless costs on identical workload, find Serverless wins on throughput but loses on simple TCO math at low concurrency | IWM works best when concurrency is bursty. For steady-state ETL with predictable shape, Pro or Classic + manual sizing wins on DBU/query.
DBU + cloud-vendor compute stacking | You see DBU consumption per workload in Databricks; cloud vendor still bills VMs separately | Cost attribution is harder than Snowflake's single-line bill; chargeback requires joining DBU usage with EC2/Azure VM/GCE billing | FinOps team asks for monthly cost per team; you spend two weeks building a join across Databricks usage tables and AWS Cost Explorer to answer it | Serverless compute hides the underlying VM cost inside DBU pricing: attribution gets easier, but you lose the visibility needed for capacity planning.
Multi-table transactions (March 2026) | ACID across multiple Delta tables; success or rollback as a unit | Requires catalog commits enabled per participating table; UC-managed Delta tables only | You design a financial reconciliation pipeline assuming multi-table atomicity, find one of your tables is still on Hive metastore and the transaction is silently single-table | Enabling catalog commits is a one-way migration in practice. Test reads from external Delta clients (Trino, Spark OSS) before flipping it on production tables.
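
The Photon row's "1.2x for 2x the DBU" can be turned into a quick break-even check. This is a minimal sketch assuming DBU spend scales linearly with runtime; it covers the DBU portion of the bill only and ignores the separately billed VM hours. The function name and parameters are hypothetical.

```python
def photon_relative_dbu_cost(speedup: float, dbu_multiplier: float) -> float:
    """DBU cost of the Photon run relative to the non-Photon run.

    speedup: non-Photon runtime divided by Photon runtime (1.2 means 1.2x faster).
    dbu_multiplier: Photon DBU rate divided by the non-Photon DBU rate.
    A result above 1.0 means the Photon run costs more in DBUs.
    """
    if speedup <= 0 or dbu_multiplier <= 0:
        raise ValueError("speedup and dbu_multiplier must be positive")
    return dbu_multiplier / speedup


# The UDF-heavy job from the table: 1.2x faster at 2x the DBU rate.
udf_heavy = photon_relative_dbu_cost(speedup=1.2, dbu_multiplier=2.0)
print(f"{udf_heavy:.2f}x the DBU cost")
```

In this model a job breaks even on DBUs only when its measured speedup matches the DBU multiplier, which is why profiling per job matters more than a global flag.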

Amazon Redshift

AWS-native MPP
Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
RA3 nodes separate compute from RMS | Independent scale of compute and Redshift Managed Storage; data sharing across clusters | Still pick node type and count manually; less elastic than Snowflake's warehouse model | Workload doubles overnight, you need to resize the cluster, a manual resize takes 30+ minutes and queries throttle during transition | DC2 nodes are deprecated; RA3 is the only modern path. If you're still on DC2, plan migration before the deprecation date, not after.
Concurrency Scaling transient clusters | Auto-add transient clusters under queue depth; first hour per cluster per day is free | Charged per-second after free hour; concurrency limit 50 per main cluster | BI workload bursts past 50 concurrent and queries queue; you turn on concurrency scaling, blow through free hour by mid-morning, surprise bill next day | The free hour accrues per main cluster per day (one credit-hour per 24 hours of main-cluster runtime), not per transient cluster spawned. High-burst workloads that fan out across several transient clusters drain it quickly.
Zero-ETL from Aurora, RDS, DynamoDB | Near-real-time CDC replication; no DMS plumbing; sub-minute lag typical | Replicated tables are read-only; cannot run CTAS or INSERT against them | dbt model tries to materialize off a zero-ETL table, fails with "Operation is not supported for the database from INTEGRATION"; you rebuild as views or stage-and-copy | Continuous CDC prevents Serverless auto-pause. Teams report 2-3x RPU bills on Serverless under zero-ETL. Either use RA3 provisioned or set strict Max RPU-Hours.
Deep AWS-native integration | IAM, S3, KMS, Lake Formation, Glue, EventBridge all first-class; FedRAMP and GovCloud surface | Cross-cloud is rare and painful; lock-in to AWS data plane is the deepest of the four | Org-wide multi-cloud strategy is set; Redshift can't follow workloads to Azure or GCP without a full re-platform | For AWS-only shops, the integration depth pays back as days saved on every adjacent project. For multi-cloud, it's the wrong default.
Spectrum for S3-resident Parquet/Iceberg | Query external data without loading; lakehouse-style separation at the engine level | Slower than native RMS; pays per byte scanned similar to BigQuery on-demand | Cost-conscious team puts cold data on Spectrum, queries it daily for a year, Spectrum scan costs exceed what RMS would have cost | Spectrum is an extension, not a replacement for RMS. Use it for true cold archive or for joining lake data to warehouse data, not as a default tier.
Redshift Serverless RPUs scale by load | No cluster to size; auto-pause when idle; sub-minute scale-up | 24/7 workloads keep cluster perpetually warm; Serverless can cost more than RA3 at sustained load | Team picks Serverless for "elasticity," then runs zero-ETL into it 24/7; Serverless never auto-pauses, monthly bill is 1.8x equivalent RA3 | The break-even for Serverless vs RA3 is roughly 12-16 hours/day of active compute. Sustained workloads belong on provisioned.
Python UDF deprecation (Patch 198+, support ends June 30 2026) | N/A: this is a deprecation, no gain | No new Python UDFs; existing UDFs work until cutoff; refactor to Lambda UDFs or SQL UDFs | Pipeline depends on a Python UDF for a custom hash, you discover the deprecation 2 weeks before cutoff while planning a release | Signal that AWS is consolidating compute primitives. Lambda UDFs are the migration target but add network hop and IAM complexity. Inventory Python UDFs across all clusters now.
WLM queue-based workload management | Predictable resource carving between ETL and BI workloads | Static queue config; auto-WLM tuning lags real-world adaptation | You add a new high-priority dashboard, it runs in the wrong queue, queue saturates, p99 jumps 5x for that workload class | Auto-WLM (default in modern clusters) is the right starting point. Manual WLM is for shops that already have a tuned config and a person who owns it.
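
The Serverless-versus-provisioned break-even in the table can be sketched as simple arithmetic. The hourly prices below are hypothetical placeholders chosen for illustration, not AWS list prices; substitute your own RPU and node rates. The sketch assumes Serverless bills only for active hours and the provisioned cluster bills all 24.

```python
HOURS_PER_DAY = 24

# Hypothetical rates for illustration only; replace with your real pricing.
PROVISIONED_USD_PER_HOUR = 6.50   # assumed: whole RA3 cluster, billed 24/7
SERVERLESS_USD_PER_HOUR = 12.00   # assumed: RPU cost while actively running


def break_even_hours(provisioned_hourly: float, serverless_hourly: float) -> float:
    """Active hours per day at which Serverless cost equals provisioned cost."""
    hours = HOURS_PER_DAY * provisioned_hourly / serverless_hourly
    return min(hours, float(HOURS_PER_DAY))


def always_on_ratio(provisioned_hourly: float, serverless_hourly: float) -> float:
    """Serverless cost relative to provisioned when it never auto-pauses."""
    return serverless_hourly / provisioned_hourly


hours = break_even_hours(PROVISIONED_USD_PER_HOUR, SERVERLESS_USD_PER_HOUR)
ratio = always_on_ratio(PROVISIONED_USD_PER_HOUR, SERVERLESS_USD_PER_HOUR)
print(f"break-even at {hours:.1f} active hours/day; always-on costs {ratio:.2f}x")
```

With these placeholder rates the break-even lands inside the 12-16 hour band quoted above, and a workload that never pauses (the zero-ETL case) pays close to the 1.8x premium the table describes.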

Google BigQuery

Fully serverless DW
Trade-Off | What You Gain | What You Give Up | When It Bites You | PE Nuance
Fully serverless slot-based architecture | No clusters, no warehouses, no nodes; zero ops | Slot contention under cross-edition prioritization; Standard and on-demand lose to Enterprise+ baselines under regional load | Region-wide demand spike during quarter-end; your on-demand workload queries take 4x longer because Enterprise Plus baselines get priority | BigQuery prioritization order: Ent+/Ent baselines, then Ent+ autoscale, then Ent autoscale, then Standard and on-demand. If you're cost-sensitive, you're at the back of the queue under contention.
On-demand bytes-scanned pricing | Zero idle cost; perfect for spiky workloads; nothing to provision | A single misconfigured query can scan 100TB and bill $625; no kill switch by default; on-demand rates rose approximately 25% in recent revisions | Analyst runs SELECT * against an unpartitioned 50TB events table; bill arrives Monday; project is now over budget for the year | maximum_bytes_billed at query level is the kill switch most teams never set. Back it with a project-level custom quota for true protection.
Editions with autoscaling slots (Std/Ent/Ent+) | Predictable cost per slot-hour; commitments discount up to 40%; baseline + autoscale shape matches most workloads | 1-minute minimum due to autoscaler 1-minute scale-down; a 10-second query bills a full minute of slots | Many small queries hitting Editions reservation; you expect proportional billing but actual cost is 6x because every short query rounds up to a full minute | For short-query-heavy workloads, on-demand often beats Editions despite the 25% on-demand price hike. The break-even is around 300-500 TiB monthly processing.
BigLake + Iceberg external tables | Lakehouse pattern; query GCS Parquet/Iceberg without loading; federated queries to other clouds via Omni | External tables are slower than native; metadata refresh patterns are version-dependent; not all features apply | You enable BI Engine acceleration assuming it covers BigLake tables, find it doesn't, dashboard p99 is 4x what you projected | BigLake metadata caching narrows the gap but doesn't close it. For high-QPS BI on lake data, materialize into native BQ tables.
BigQuery ML in-place SQL ML/AI | Train and predict in SQL; Vertex AI integration; Gemini in BQ for NL-to-SQL and code-gen | Good for tabular ML; not competitive for deep learning vs Databricks/SageMaker; pricing per model invocation | Team picks BQ ML for a recommendation model expecting feature parity with Vertex, hits limits on custom architectures and migrates mid-project | BQ ML excels at classical models (linear, XGBoost, ARIMA, k-means) plus pre-trained model invocation. For training transformer-scale models, send the data to Vertex.
Nested and repeated schema (STRUCT, ARRAY) | Idiomatic semi-structured data; no JSON parse overhead; powerful UNNEST queries | Schema lock-in; nested/repeated layout doesn't port cleanly to Snowflake/Redshift/Databricks without flattening | Two-year migration to another DW; your "data is just SQL, we can move it" plan stalls on rewriting hundreds of UNNEST patterns | Use STRUCT/ARRAY when the access pattern is "query the whole record" or "fan out on a sub-array." Avoid them for fields you'll join on; they kill join planning.
Storage Write API for streaming | gRPC, exactly-once delivery, high throughput, no DML quota impact | Separate quota; different mental model than DML; legacy streaming API is in maintenance | Team builds new pipeline on legacy tabledata.insertAll, runs into the per-table 1MB/sec quota at production scale | DML quotas were largely removed years ago; the surprise is now on streaming quotas. For new pipelines, default to Storage Write API.
Opaque slot scheduler | No tuning burden; Google's scheduler handles everything | No power-user knobs; query performance regressions are hard to diagnose | p99 regresses 30% one Tuesday; INFORMATION_SCHEMA.JOBS shows nothing actionable; you file a support ticket and wait | For deep performance work, BigQuery is the most opaque of the four. Snowflake and Databricks both expose more of the engine. Plan for less control.
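
The on-demand row's numbers follow from a flat per-TiB rate, and the maximum_bytes_billed guard is a simple comparison against a dry-run estimate. The sketch below is a local model only, with no BigQuery client calls. The $6.25 rate is inferred from the table's "100TB bills $625" figure (treating TB as TiB), and the exception and function names are hypothetical.

```python
TIB = 2 ** 40
ON_DEMAND_USD_PER_TIB = 6.25  # inferred from the table: 100 TB scanned -> $625


class BytesBilledLimitExceeded(Exception):
    """Raised when an estimated scan exceeds the configured cap."""


def on_demand_cost_usd(bytes_scanned: int) -> float:
    """On-demand cost for a scan of the given size."""
    return bytes_scanned / TIB * ON_DEMAND_USD_PER_TIB


def guarded_cost_usd(estimated_bytes: int, maximum_bytes_billed: int) -> float:
    """Return the expected cost, or refuse the query if it would exceed the cap."""
    if estimated_bytes > maximum_bytes_billed:
        raise BytesBilledLimitExceeded(
            f"estimate {estimated_bytes} bytes exceeds cap {maximum_bytes_billed}"
        )
    return on_demand_cost_usd(estimated_bytes)


print(on_demand_cost_usd(100 * TIB))  # the misconfigured-query case
print(on_demand_cost_usd(50 * TIB))   # the SELECT * on a 50TB table case

try:
    guarded_cost_usd(50 * TIB, maximum_bytes_billed=1 * TIB)
except BytesBilledLimitExceeded as exc:
    print("blocked:", exc)
```

In the real service the cap is enforced server-side: a query that would read past maximum_bytes_billed fails without incurring a charge. In the google-cloud-bigquery Python client the setting is the maximum_bytes_billed field on QueryJobConfig.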

2. Use Cases

Concrete workloads with the driving property that ruled out the obvious alternative. Scale dimension is the number that mattered for the architecture, not the marketing number.

Snowflake

Cloud-native DW
Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Cross-business-unit secure data sharing | Capital One: modernizing data sharing across orgs and external partners | Define-once policies (row, column, masking) that travel with the data, no copy required | Hundreds of consumers; PB-scale shared datasets; SOX + financial regulation | Redshift Data Sharing exists but lacks the centralized policy layer; BQ Authorized Views require copy-and-replicate posture
Content analytics with partner studio sharing | Disney Streaming: viewership across services and partner studios | Cross-account live data sharing without ETL pipes between Snowflake accounts | 10s of partner accounts; titles in the millions; daily refresh on engagement signals | Databricks Delta Sharing is open but requires both sides to operate Spark; Snowflake-to-Snowflake is zero-config
Actuarial modeling on policy data with strong governance | Mid-cap insurance carrier: claims and underwriting analytics | Row and column masking at policy boundary; auditable lineage; HIPAA-equivalent posture | 10-20TB of structured policy data; 50-100 analysts; quarterly state-by-state regulatory submissions | Redshift Lake Formation does fine-grained access but lineage tooling is weaker; Databricks UC is comparable but team is SQL-first not Spark-first
Data product monetization via marketplace | Data vendors publishing live datasets on Snowflake Marketplace | Listing data products as live shares; consumers query without copy; billing routed through Snowflake | Hundreds of consumers per dataset; daily-refreshed; consumption-based revenue model | BQ Analytics Hub exists, smaller marketplace; Databricks Marketplace is younger and Spark-centric
Multi-cloud SQL surface for global org | Global retailer with workloads on AWS + Azure + GCP | Single SQL surface across clouds; account-to-account replication for cross-cloud disaster recovery | 3 clouds; 4 regions; petabyte-scale; same dbt project against all environments | BQ is GCP-locked; Redshift is AWS-locked; Synapse is Azure-locked; only Snowflake and Databricks are truly multi-cloud

Databricks

Lakehouse
Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Petabyte streaming + ML for video QoE | Comcast: video quality-of-experience analytics | One platform for streaming ingest, feature engineering, model training, and serving | Petabytes of telemetry; sub-minute freshness on quality signals; thousands of feature pipelines | Snowflake Snowpark handles ML but Spark heritage gives Databricks the streaming + ML pipeline edge
Feature engineering and ML on customer telemetry | AT&T: telecom customer experience and churn modeling | Notebooks + MLflow + Feature Store + serving in one platform; data scientists ship without DevOps | 10s of TBs of customer signals; hundreds of features per model; weekly model refresh cadence | SageMaker on Redshift requires more glue; Vertex on BQ is good but the org is AWS-centric
IoT and ML on industrial sensor data | Shell: upstream and refining sensor analytics | High-volume time-series ingest with Spark Structured Streaming; OPC-UA and Kafka sources | Millions of sensors; multi-second freshness; multi-decade retention on operational data | Snowflake handles structured streams via Snowpipe Streaming but doesn't match Spark for custom protocol ingest
Genomics and bio-pharma feature engineering | Mid-cap pharma R&D: drug discovery pipelines on variant data | Massive Spark workloads on Parquet/Delta; Python/Scala/R coexistence; HIPAA-aligned | Multi-petabyte variant data; 100s of researchers; multi-cloud per project | BQ scales but SQL-only doesn't match the researcher tooling expectations
Mid-cap fintech ETL + BI | SaaS fintech replacing Hadoop + Hive + Presto stack | One platform that subsumes both data engineering and BI; lift-and-shift from HDFS to Delta is straightforward | 100s of TB; dozens of pipelines; 100s of analysts on Tableau/Power BI via DBSQL | Snowflake is a clean alternative for the BI side but doesn't replace the Spark/Hive ETL surface as cleanly

Amazon Redshift

AWS-native MPP
Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
AWS-native fintech operational reporting | Mid-cap fintech with Aurora as operational DB | Zero-ETL from Aurora to Redshift; sub-minute lag on operational data without DMS plumbing | 50-100GB CDC volume per day; reporting SLA under 5 minutes from transaction commit | Snowflake requires Fivetran or Kafka Connect; BQ requires Datastream; both add cost + ops
AdTech with Spectrum querying S3 logs | Programmatic ad platform: billing reconciliation across raw event lake + warehouse | Native Redshift queries can join RMS hot data with S3 Iceberg cold data via Spectrum | Hot table 5-10TB; cold S3 lake 500TB+; daily reconciliation joins both | Snowflake external tables work but Spectrum is more deeply tuned for AWS S3; BQ Omni is more limited for cross-cloud S3
Compliance-heavy workloads on GovCloud | Federal contractor: analytics on classified or controlled data | FedRAMP High and GovCloud surface; Redshift is the broadest-deployed managed DW in those zones | Multi-TB; 100s of cleared analysts; auditable end-to-end | Snowflake supports FedRAMP High and GovCloud since 2024 but Redshift has more incumbency in those shops
AWS shop with deep Glue + Lake Formation investment | Org with 50+ Glue jobs, Lake Formation centralized governance, and Athena ad-hoc | Lake Formation permissions follow data into Redshift; no separate IAM model | 100s of TB on Lake; dozens of Glue ETLs; Redshift as the curated mart | Cross-platform requires reimplementing Lake Formation in another governance plane
SaaS multi-tenant analytics with data sharing | B2B SaaS isolating tenant data via Redshift data sharing | Producer/consumer clusters share data without copy; per-tenant compute isolation | 1000s of tenants; per-tenant 1-100GB; tenant clusters share from central producer | Snowflake reader accounts are an alternative but per-account billing model differs; BQ tenant isolation requires per-project setup

Google BigQuery

Fully serverless DW
Use Case | Company / Scenario | Driving Property | Scale Dimension | Why Not Alternative
Petabyte analytics on listening data | Spotify: analytics on music streaming events | Serverless slot scheduler handles petabyte scans without operator intervention; BQ ML for embeddings | Multi-PB; billions of events daily; nested/repeated schema for event records | Snowflake and Databricks scale similarly but require capacity planning; BQ requires literally none for variable workloads
Ad attribution at scale | Large adtech consumer with massive event volume | Massively parallel scans across petabyte fact tables; nested schema for event hierarchies | Multi-PB daily; 100s of analysts; queries scan TBs each | Snowflake at this scale needs heavy warehouse fleet; cost model becomes harder to predict than BQ slot reservations
GCP-native SaaS ops | Mid-cap SaaS born on GCP; no DBA on the team | Zero operational burden; deep integration with Firestore, Pub/Sub, Dataflow, Cloud Run | 10-100TB; 10s of pipelines; small data team | Snowflake on GCP is an option but loses the deep-integration simplicity; Databricks on GCP requires Spark expertise
Geospatial analytics for logistics | Fleet / delivery platform: geospatial queries on routes and locations | Native ST_* GIS functions; BQ GIS is mature and competitive with PostGIS at scale | Billions of geospatial points; daily route analytics; sub-100ms BI Engine queries | Snowflake GIS exists but is less mature; Databricks needs Mosaic or H3 libraries; Redshift GIS works but smaller toolkit
Mobile gaming analytics via Firebase | Mid-cap mobile gaming studio: Firebase Analytics → BQ → ML | Firebase events flow into BQ natively; one-click; daily player segmentation | 10s of millions of DAU; billions of events daily; BQ ML for LTV and churn | Other DWs require Fivetran or custom ETL from Firebase; the GCP integration is free

3. Limitations

Per-technology layout. Severity reflects blast radius. Workaround Cost is what you actually pay in money, complexity, or latency to dodge the limit.

Snowflake

Cloud-native DW
Limitation | Severity | Workaround | Workaround Cost
60-second warehouse billing minimum after suspend | Medium | Tune auto-suspend down to 60s; consolidate workloads to keep warehouse warm | Cold-cache penalty on first query; more cross-workload contention
Hybrid Tables limited to ~16K ops/sec/db, single region | High | Use external OLTP (RDS, Aurora, CockroachDB) and replicate to Snowflake | Two-system complexity; CDC pipeline + lag; two-system consistency burden
Externally-managed Iceberg tables ~2x slower than native or managed Iceberg | Medium | Use Snowflake-managed Iceberg or pay closer attention to writer-side Parquet statistics | Loss of multi-engine writer flexibility; or ops effort tuning external writers
Cross-cloud Iceberg data transfer cost | Medium | Co-locate Snowflake region with external volume region | Lose the cross-cloud freedom Iceberg was meant to offer
Snowsight notebook experience is mediocre vs Databricks | Medium | Use external notebooks (Hex, Deepnote, Jupyter) with Snowflake connector | Loss of in-platform governance for notebook artifacts; chargeback fragmentation
Cortex AI per-token pricing is opaque and changes | Medium | Set per-warehouse usage policies and resource monitors; track Cortex spend separately | Resource monitor management overhead; spend caps can block legitimate work
Query result cache TTL is 24h, not configurable | Medium | Build materialized views or persisted tables for repeatedly accessed results | Storage and compute for materialization; staleness management

Databricks

Lakehouse
Limitation | Severity | Workaround | Workaround Cost
Classic cluster startup ~4 minutes | High | Use Serverless SQL warehouses for interactive; keep job clusters for batch only | Serverless tier premium pricing; cost attribution split across cluster types
Photon is a per-DBU surcharge and closed-source | Medium | Use OSS Spark on Delta; tune by hand; accept slower performance | 2-3x slower on equivalent workload; vendor exit becomes a rewrite project
Unity Catalog migration from Hive metastore | High | UC-from-day-one on new workspaces; phased migration with shadow catalogs | 1-2 engineer-quarters per workspace; legacy notebooks may bypass UC silently
Multi-table transactions require catalog commits per table | Medium | Enable catalog commits on participating tables; test reads from external Delta clients before flipping | One-way migration in practice; external Delta clients may break on new commit format
Lakebase is Beta and single-region (as of Apr 2026) | High | Use managed Postgres (RDS, CloudSQL, Cosmos) for production OLTP; sync to Delta via CDC | Two-platform complexity; lose the unified governance pitch
Cost attribution split between DBU and underlying cloud spend | High | Tag clusters and jobs; join Databricks system tables with cloud billing exports in your BI layer | Build and maintain a chargeback pipeline; finite engineer time
Job clusters can run forever if not bounded | Medium | Set timeout_seconds on every job; set max_concurrent_runs; monitor abandoned clusters | Job design discipline; occasional false-positive timeouts on long-running ML training
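
The last row's workaround (timeout_seconds and max_concurrent_runs on every job) is easy to enforce with a small lint over job settings before they are deployed. The sketch below works on plain dictionaries shaped like job settings; the job names, the sample values, and the policy cap of 3 concurrent runs are hypothetical, and nothing here calls the Databricks API.

```python
from typing import Any, Dict, List

MAX_ALLOWED_CONCURRENT_RUNS = 3  # hypothetical team policy, not a platform limit


def unbounded_jobs(jobs: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map job name -> reasons the job could run or fan out without bound."""
    findings: Dict[str, List[str]] = {}
    for job in jobs:
        name = job.get("name", "<unnamed>")
        reasons = []
        # A missing or zero timeout_seconds means the run is never killed.
        if not job.get("timeout_seconds"):
            reasons.append("no timeout_seconds")
        # Treat an absent max_concurrent_runs as 1 (an assumed default here).
        if job.get("max_concurrent_runs", 1) > MAX_ALLOWED_CONCURRENT_RUNS:
            reasons.append("max_concurrent_runs above policy cap")
        if reasons:
            findings[name] = reasons
    return findings


# Hypothetical job settings for illustration.
sample_jobs = [
    {"name": "nightly_etl", "timeout_seconds": 7200, "max_concurrent_runs": 1},
    {"name": "ml_training", "max_concurrent_runs": 1},
    {"name": "backfill", "timeout_seconds": 0, "max_concurrent_runs": 20},
]
print(unbounded_jobs(sample_jobs))
```

Running a check like this in CI against job definitions catches unbounded jobs before they reach a workspace. Long ML training jobs may need a generous timeout rather than none, which is the false-positive cost the table mentions.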

Amazon Redshift

AWS-native MPP
Limitation | Severity | Workaround | Workaround Cost
Zero-ETL replicated tables are read-only | Critical | Wrap in views or materialize to a separate writeable table | Dual-table maintenance; rebuilds on schema change; staleness vs source
50 zero-ETL integrations limit per target warehouse | Medium | Consolidate sources; use AWS DMS for the long tail | Operational fragmentation; DMS adds plumbing and cost
Schema changes (add column) can trigger 20-90min table unavailability for zero-ETL | High | Schedule schema changes during maintenance windows; communicate to consumers | Coordination overhead; data availability gaps; not always predictable
Concurrency Scaling free hour accrues per main cluster per day, not per spawned transient cluster | Medium | Monitor concurrency_scaling_seconds and gate spend with usage limits | Usage limits can cause query queueing under unexpected bursts
DC2 nodes deprecated; RA3 is the only forward path | Medium | Migrate to RA3 or Serverless; snapshot-and-restore approach minimizes downtime | Migration project effort; cluster sizing rethink; possible app-side compatibility checks
Python UDFs deprecated; new creation blocked; existing UDFs work until June 30, 2026 | Medium | Refactor to SQL UDFs or Lambda UDFs | Lambda UDFs add network hop, IAM, and cold-start latency; refactor effort
VACUUM and ANALYZE mostly automatic but can still need manual intervention | Medium | Monitor STV_BLOCKLIST + SVV_TABLE_INFO; manually VACUUM tables with high unsorted region | Periodic ops attention; learning curve on the relevant system tables

Google BigQuery

Fully serverless DW
Limitation | Severity | Workaround | Workaround Cost
On-demand bytes-scanned can blow budget on a single query | Critical | Set maximum_bytes_billed at query level; project-level custom quota; require partitioning on large tables | Custom quota blocks legitimate big queries; query-level limits add friction to ad-hoc analysis
Editions autoscaler 1-minute minimum billing per scale-up | Medium | For short-query-heavy workloads, keep some on-demand; reserve Editions for sustained workloads | Dual cost model; reservation sizing complexity
Slot contention under cross-edition prioritization | High | Upgrade to Enterprise+ baseline for guaranteed capacity in the region | Significantly higher cost; ties workload to predictable region demand
Streaming has its own quotas; legacy tabledata.insertAll is a maintenance-mode path | Medium | Use Storage Write API (gRPC) for new pipelines; request quota increases for legacy | Migration effort; gRPC tooling is heavier than REST insertAll
No cluster control means no power-user tuning | Medium | Tune at the query level (partitioning, clustering, materialized views); accept the opacity | Performance regressions are harder to root-cause; Google's support is the only escalation
Vendor lock via STRUCT/ARRAY nested schema | Medium | Avoid deep nesting on tables likely to migrate; keep flat shape for portability-critical data | Loss of native semi-structured ergonomics; more joins; larger row sizes
Reservation pricing structure is complex (baseline, autoscale, commitments, edition tiers) | Medium | Build a workload-mapping model; assign projects to right-sized reservations; revisit quarterly | FinOps headcount; tooling investment; ongoing tuning vs other priorities
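
The 1-minute autoscaler minimum in the table above is the kind of limit that is clearer as arithmetic. The sketch below models the worst case, where each short query arrives far enough apart to trigger its own scale-up and is rounded up to a full minute. The function names, the 100-slot figure, and the query mix are assumptions for illustration; real autoscaler behavior overlaps scale-ups and will usually land below this bound.

```python
from typing import List

MINIMUM_BILLED_SECONDS = 60  # autoscaler minimum, as stated in the table above


def billed_slot_seconds(query_seconds: List[float], slots: int) -> float:
    """Worst-case billed slot-seconds: every query rounds up to the minimum."""
    return slots * sum(max(q, MINIMUM_BILLED_SECONDS) for q in query_seconds)


def used_slot_seconds(query_seconds: List[float], slots: int) -> float:
    """Slot-seconds the queries actually consumed."""
    return slots * sum(query_seconds)


# Hypothetical workload: one hundred 10-second queries on 100 autoscaled slots.
short_queries = [10] * 100
billed = billed_slot_seconds(short_queries, slots=100)
used = used_slot_seconds(short_queries, slots=100)
print(f"billed/used = {billed / used:.1f}x")
```

For 10-second queries the ratio works out to 60/10, matching the 6x surprise described in the trade-offs table. Queries that run a minute or longer see no rounding penalty in this model, which is why the workaround splits short-query traffic from sustained workloads.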

4. Fault Tolerance

Cross-platform matrix. Compute is ephemeral in all four; storage durability comes from cloud object stores or proprietary equivalents. The differences live in RTO, failover automation, and the cross-region failover story.

Dimension | Snowflake | Databricks | Redshift | BigQuery
Replication model | 3x sync replication across 3 AZs in the underlying cloud object store; compute is ephemeral and stateless | Underlying cloud storage (S3, ADLS, GCS) provides 3x sync; compute is ephemeral; Delta log serializes commits per table | Redshift Managed Storage 3x sync across 3 AZs (RA3, Serverless); provisioned cluster nodes are warm and stateful | Colossus 3x cross-zone within region; compute is ephemeral slot allocation
Failure detection | Cloud Services layer heartbeats; sub-30s typical | Driver/worker heartbeats via cluster manager; ~30s | Cluster supervisor monitors compute nodes; ~30s for node-level | Borg scheduler; sub-minute slot rebalancing
Failover mechanism | Automatic VM replacement on warehouse failure; query retried transparently | Auto cluster node replacement; in-flight Spark stage retried by driver | Auto node replacement for RA3 worker nodes; leader failure triggers cluster failover; Serverless re-provisions RPU | Transparent: slots reassigned by scheduler; user does not observe failover
RTO (typical) | Under 60s for compute; near-zero for storage-served queries | 60-300s depending on cluster type; Serverless sub-second; Classic 4 min cold start | 60-120s for RA3 worker node replacement; longer if leader node fails; Serverless seconds | Sub-second to seconds; user-visible RTO is essentially nil
RPO (typical) | 0 for committed writes within region; seconds for cross-region Replication Groups | 0 for Delta-committed writes; lag for streaming sources; geo-rep is async | 0 for committed writes on RA3 RMS; zero-ETL CDC lag is sub-minute typical | 0 for completed jobs; streaming buffer at risk until persisted
Split-brain behavior | N/A: Cloud Services layer is quorum-coordinated; single writer per micro-partition | N/A: Delta log uses optimistic concurrency control, with one winning commit per table version | N/A: single leader node coordinates writes | N/A: job coordinator pattern; no multi-writer for the same data
Blast radius of single-node failure | Single warehouse query retried; other warehouses on same account are unaffected | Single Spark task retried; ETL job may delay by retry duration | Worker node failure causes shard reassignment and node replacement; leader failure impacts entire cluster briefly | Single slot: query may queue or be redistributed; rarely visible to user
Cross-region failover story | Replication Groups + Failover Groups; auto-failover on Business Critical and higher tiers; Client Redirect URL | Delta Sharing + UC replication; workspace failover is manual; geo-replication of catalog and tables | Cross-region snapshot copy with auto-cadence; manual restore for cluster cutover; data sharing across regions | Multi-region datasets (US, EU); cross-region DR copies; manual cutover for project-level failover
Data loss scenarios | Possible if 2+ AZs fail simultaneously while replication group lag is non-zero; rare and documented | Uncommitted Delta writes during cluster crash; checkpoint reset on structured streaming if state lost | Rare on RMS; concurrency scaling cluster transient state may be lost on rapid failover | Rare for completed jobs; streaming inserts in buffer state at risk during zonal failure

5. Sharding

All four hide sharding behind a managed abstraction, but the abstractions leak in different places. The interesting differences are in hot-shard behavior, resharding cost, and how much control the operator has.

Dimension | Snowflake | Databricks | Redshift | BigQuery
Sharding model | Automatic micro-partition (16MB compressed) units; immutable; metadata-driven pruning | Delta file-level (Parquet) + Z-order or Liquid Clustering for sort-based pruning | Hash distribution on DISTKEY column (or AUTO); slice-level data placement on RA3 nodes | Capacitor columnar format auto-managed; user-specified partitioning + clustering on top
Shard key constraints | No required shard key; clustering keys are advisory hints for re-clustering | Z-order: any subset of columns; Liquid Clustering: up to 4 columns, can be changed online | DISTKEY: single column; changing it after creation (ALTER DISTKEY) rewrites the table; AUTO defers the decision | Partition: 1 column (time/integer); Cluster: up to 4 columns; both can be changed but require rewrite
Rebalancing mechanism | Automatic re-clustering by background service when clustering key is set | Auto-Optimize + Auto-Compaction or manual OPTIMIZE / VACUUM commands | Auto Table Optimization (RA3) handles sort/dist; manual VACUUM possible | Fully managed by Google; no user action required
Rebalancing cost / impact | Credits charged for clustering operations; can be silent budget burn if not monitored | DBUs charged for OPTIMIZE jobs; visible as a separate job in workspace | VACUUM consumes a share of cluster compute while it runs; minimal on RA3 with auto-optimization | Hidden inside slot accounting; not separately billed
Hot-shard behavior | Micro-partition pruning + automatic clustering; pruning is the safety valve, not hot-shard handling per se | File-level statistics-based pruning; OPTIMIZE rebalances; Liquid Clustering adapts to query patterns | Slice-level hashing: a bad DISTKEY choice causes uneven node load; AUTO mitigates but does not eliminate it | Hash-distributed at slot scheduler level; well-mitigated by the scheduler
Maximum shards (practical) | Effectively unbounded: billions of micro-partitions per large table | Tens of millions of Parquet files practical; small-file problem mitigated by Auto-Compaction | 16-32 nodes typical; up to 128 RA3 nodes; Serverless abstracts node count | N/A: fully managed; users see slot count, not shard count
Resharding without downtime? | Yes: fully transparent; clustering key change triggers background re-clustering | Yes: OPTIMIZE and re-cluster are online operations | DISTKEY change requires full table rewrite; ALTER DISTKEY exists but is expensive; resize is online but slow | N/A: Google handles it; partition/cluster change requires CTAS or table rewrite
Cross-shard query support | Native, no penalty; planner handles micro-partition fan-out | Native via Spark distributed plan | Yes: distributed plan engine; broadcast joins or redistribute as needed | Yes: slot-based scatter-gather; planner optimizes shuffles
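
The hot-shard row is easier to reason about with a toy model. The sketch below hash-distributes rows across a hypothetical 8 slices and compares a low-cardinality distribution key with a high-cardinality one. It is illustrative only: the hash function (SHA-256 here), the slice count, the sample data, and the names `slice_for` and `skew_ratio` are assumptions, not Redshift's actual placement logic.

```python
import hashlib
from collections import Counter

def slice_for(key: str, num_slices: int) -> int:
    """Map a distribution-key value to a slice using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices

def skew_ratio(keys, num_slices: int) -> float:
    """Rows on the busiest slice divided by the ideal even share (1.0 = even)."""
    counts = Counter(slice_for(k, num_slices) for k in keys)
    ideal = len(keys) / num_slices
    return max(counts.values()) / ideal

NUM_SLICES = 8  # hypothetical slice count

# Low-cardinality key: 90% of rows share one value (e.g. a default country code).
bad_keys = ["US"] * 9000 + [f"C{i % 50}" for i in range(1000)]
# High-cardinality key: one distinct value per row (e.g. an order id).
good_keys = [f"order-{i}" for i in range(10000)]

bad_skew = skew_ratio(bad_keys, NUM_SLICES)    # busiest slice holds >= 9000 rows
good_skew = skew_ratio(good_keys, NUM_SLICES)  # close to 1.0
```

With the low-cardinality key, one slice does at least 7.2x its fair share of the work, which is the uneven node load the table describes. No amount of scheduling fixes that; only a different key (or AUTO) does.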

6. Replication

Storage durability is similar across all four: 3x cross-zone in the underlying cloud store. The interesting differences are in the cross-region story, the consistency-level options, and how conflicts are handled under concurrent DML.

Dimension | Snowflake | Databricks | Redshift | BigQuery
Replication topology | Leader-managed per micro-partition over cloud storage; Replication Groups + Failover Groups for cross-region/cloud | Commits serialized per Delta table via optimistic concurrency control; geo-replication via UC + Delta Sharing | Leader node coordinates; worker nodes hold local slices; RMS spans AZs in region | Colossus tri-zone; metadata-driven replication; no user-visible topology
Sync vs async | Sync within region; async cross-region via Replication Groups | Sync within region via cloud storage; async geo-replication | Sync within region for RMS; async cross-region snapshot copy | Sync within region; async across regions for multi-region datasets
Replication factor (default / max) | 3x in cloud storage by default; can replicate to any number of additional Snowflake accounts | Underlying cloud storage 3x; UC tables can replicate to multiple workspaces | RMS 3x; cross-region snapshot is on-demand to N regions | 3x default cross-zone; multi-region uses additional zone replicas
Consistency level options | Read-after-write consistency within region; eventual cross-region; read committed isolation, with each statement reading a consistent snapshot | Snapshot isolation per Delta version; cross-region is eventual | Serializable by default, with snapshot isolation as an option; concurrent DML serialized at commit | Snapshot isolation at job start; consistent reads within a job; cross-table snapshot reads via sessions (new for 2026)
Replication lag (typical) | Sub-second within region; seconds to minutes cross-region depending on volume | Sub-second commit visibility within region; geo-rep is seconds to minutes | Sub-second within region; minutes for cross-region snapshot copy | Sub-second within region; minutes for cross-region replication
Conflict resolution | Last-writer-wins on micro-partition; transactional within table | Optimistic concurrency control on Delta log; second writer aborts and retries | MVCC + commit serialization; concurrent DML serialized at commit | Snapshot-based; concurrent DML against same partition retried up to 3x automatically by BQ
Cross-region replication | Replication Groups + Failover Groups; Business Critical tier auto-failover with Client Redirect | Delta Sharing + UC replication; manual workspace setup per region | Cross-region snapshot copy + restore for DR; data sharing across regions | Multi-region datasets (US, EU) plus cross-region disaster recovery
Replication during partition | Single-region writes continue; cross-region replication pauses until partition heals | Local writes continue; geo-replication pauses; resumes on heal | RMS writes continue if quorum maintained; CDC zero-ETL pauses on source disconnect | Multi-region tolerates zone failure transparently; cross-region failover requires manual cutover
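
The Delta conflict-resolution cell ("second writer aborts and retries") can be illustrated with a toy commit log. This is a minimal in-memory sketch of optimistic concurrency control, not Delta Lake's implementation; `CommitLog`, `CommitConflict`, and `commit_with_retry` are hypothetical names introduced for illustration.

```python
class CommitConflict(Exception):
    """Raised when another writer already committed the version we targeted."""

class CommitLog:
    """Toy append-only log: at most one winning commit per version number."""
    def __init__(self):
        self.entries = []

    @property
    def latest_version(self) -> int:
        return len(self.entries)

    def try_commit(self, read_version: int, payload: str) -> int:
        # The commit succeeds only if nobody committed since we read the log.
        if read_version != self.latest_version:
            raise CommitConflict(
                f"read v{read_version}, log is at v{self.latest_version}")
        self.entries.append(payload)
        return self.latest_version

def commit_with_retry(log: CommitLog, payload: str, max_attempts: int = 3) -> int:
    """Re-read the latest version and retry when a concurrent commit wins."""
    for _ in range(max_attempts):
        read_version = log.latest_version
        try:
            return log.try_commit(read_version, payload)
        except CommitConflict:
            continue
    raise CommitConflict(f"gave up after {max_attempts} attempts")

log = CommitLog()
# Two writers both read version 0, then race to commit version 1.
writer_a_read = log.latest_version
writer_b_read = log.latest_version
v_a = log.try_commit(writer_a_read, "writer A: append files")
try:
    log.try_commit(writer_b_read, "writer B: append files")
    b_conflicted = False
except CommitConflict:
    b_conflicted = True
# Writer B retries against the fresh version and lands as the next version.
v_b = commit_with_retry(log, "writer B: append files")
```

The sketch rejects every stale commit. A real engine can be more permissive and only abort when the winning commit actually touched the same data, which is why append-only writers rarely see conflicts in practice.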

7. Better Usage Patterns

Where PE depth shows up. The patterns most teams discover too late, the anti-patterns that survive review because they look reasonable, and the optimizations that compound at scale.

Snowflake

Cloud-native DW
Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
Warehouse sizing and auto-suspend | Default 10-minute auto-suspend; one warehouse per team | 60-second auto-suspend; one warehouse per workload class (BI, ETL, ad-hoc); resource monitors on every warehouse | Credit burn drops 30-50% on bursty BI workloads; the trade is a cold-cache penalty on the first query after resume
Right-size warehouses per workload | Pick "Medium" for everything; never revisit | X-Small for BI dashboards, Medium for ETL, Large only when concurrency or data volume justifies; profile via QUERY_HISTORY periodically | Larger warehouse is rarely the right answer; usually it's a slow query, not capacity
Use RESULT_SCAN to chain queries | Re-run the same query in a downstream step; pay twice | RESULT_SCAN(LAST_QUERY_ID()) reuses cached result for chained logic in same session | Free compute on chained analyst workflows; result cache TTL is 24h
Streams + Tasks for in-DB CDC and orchestration | External Airflow + Lambda to detect changes and trigger downstream | Snowflake Stream on source table + Task chain consuming the stream | Eliminate orchestration tier for in-Snowflake transformations; reduce ops surface
Iceberg vs native: when to choose | Default to Iceberg "for openness" even when no other engine reads it | Native FDN micro-partitions when Snowflake is sole consumer; managed Iceberg when interop matters; external Iceberg only when you control writers | Performance and cost gap is real; openness without consumers is a tax with no benefit
Search Optimization Service for selective lookups | Apply SOS broadly assuming it's free perf | SOS only on tables where queries are highly selective point-lookups on otherwise-large datasets | SOS has its own credit cost; broad application can exceed query-side savings
Snowpipe Streaming over classic Snowpipe | Default to classic Snowpipe with file-based loads | Snowpipe Streaming for sub-second freshness on streaming sources; classic for batch file-based | 10x freshness improvement on streaming workloads; lower per-row ingest cost at sustained throughput
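
The auto-suspend row is mostly arithmetic, so a small model helps when arguing for a setting. The sketch below assumes a simplified billing rule: a warehouse is billed per second from resume until it suspends, with a 60-second minimum per resume, and it suspends exactly `auto_suspend_s` after going idle. The function name and the dashboard workload are hypothetical.

```python
def billed_seconds(queries, auto_suspend_s, min_bill_s=60):
    """Total billed seconds for one warehouse under the simplified rule above.

    queries: list of (start_s, duration_s) tuples, sorted by start time.
    """
    total = 0
    session_start = None
    busy_until = None
    for start, duration in queries:
        if session_start is None:
            session_start, busy_until = start, start + duration
        elif start <= busy_until + auto_suspend_s:
            # Warehouse is still up: extend the current billing session.
            busy_until = max(busy_until, start + duration)
        else:
            # Warehouse suspended in the gap: close the session, open a new one.
            total += max(min_bill_s, busy_until + auto_suspend_s - session_start)
            session_start, busy_until = start, start + duration
    if session_start is not None:
        total += max(min_bill_s, busy_until + auto_suspend_s - session_start)
    return total

# Hypothetical bursty dashboard: a 5-second refresh every 15 minutes for 8 hours.
refreshes = [(i * 900, 5) for i in range(32)]
default_bill = billed_seconds(refreshes, auto_suspend_s=600)  # 32 x 605 s = 19360 s
tight_bill = billed_seconds(refreshes, auto_suspend_s=60)     # 32 x 65 s = 2080 s
```

This workload is deliberately extreme, so the gap is larger than the 30-50% cited in the table. Feeding in real start times and durations from QUERY_HISTORY gives a more honest estimate, and the model ignores the cold-cache cost of each extra resume.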

Databricks

Lakehouse
Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
Cluster type per workload | Use Classic clusters for everything; share interactive clusters across teams | Serverless SQL for BI, Photon Serverless for high-throughput SQL, Job clusters with auto-termination for batch, dedicated cluster pools for ML | Cold-start cost drops materially; chargeback becomes possible; resource isolation between workloads
OPTIMIZE and clustering strategy | Never run OPTIMIZE; suffer small-file problem and slow scans | Auto-Optimize on append-heavy tables; Liquid Clustering on tables with multi-dimensional filter patterns; explicit OPTIMIZE with ZORDER BY on filter columns where Liquid Clustering is not used | Scans run 2-10x faster on tables that would otherwise have thousands of small files
Streaming ingestion approach | Roll your own Spark Structured Streaming pipeline from scratch | Auto Loader + Delta Live Tables (Lakeflow) for declarative streaming ETL with built-in checkpointing, retries, and lineage | Operational burden drops; DLT handles failure semantics and data quality expectations natively
Delta auto-compaction on append-heavy tables | Append millions of small files; query performance degrades over time silently | SET spark.databricks.delta.autoCompact.enabled = true (or the delta.autoOptimize.autoCompact table property) and tune autoCompact.maxFileSize per table shape | Prevents the small-file problem from compounding; query latency stays stable
Unity Catalog from day one | Start on Hive metastore; plan to migrate later; never get to it | UC from the first workspace; migration plan for legacy workspaces with shadow catalogs | Mid-life UC migration is 1-2 engineer-quarters; day-one adoption is free
Photon on vs off | Enable Photon everywhere assuming it's a universal speedup | Photon ON for SQL and DataFrame workloads; OFF for UDF-heavy or pandas-heavy code that doesn't vectorize | You pay DBU premium for Photon regardless; you only get the speedup where vectorization applies
Job clusters with auto-termination | Share long-running clusters across teams; never set timeout | Job-specific clusters with timeout_seconds bound to expected runtime; isolated by workload | Prevents abandoned-cluster cost; isolates failures; simplifies chargeback
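
The Photon row reduces to a break-even check: the premium is paid on every DBU, so Photon only lowers the bill when the speedup exceeds the rate multiplier. The sketch below assumes a flat multiplier on the DBU rate; the multiplier, speedups, and prices are hypothetical placeholders rather than published Databricks rates, and `job_cost` and `photon_pays_off` are illustrative names.

```python
def job_cost(runtime_hours, dbu_per_hour, price_per_dbu,
             photon_multiplier=1.0, speedup=1.0):
    """Cost of one job run; Photon raises the DBU rate and may cut the runtime."""
    return (runtime_hours / speedup) * dbu_per_hour * photon_multiplier * price_per_dbu

def photon_pays_off(speedup, photon_multiplier):
    """Photon is cheaper only when the speedup exceeds the DBU rate multiplier."""
    return speedup > photon_multiplier

PHOTON_MULTIPLIER = 2.0  # hypothetical premium on the DBU rate

baseline = job_cost(2.0, 10.0, 0.50)
# SQL-heavy job that vectorizes well (hypothetical 3x speedup).
sql_job = job_cost(2.0, 10.0, 0.50, PHOTON_MULTIPLIER, speedup=3.0)
# UDF-heavy job that barely vectorizes (hypothetical 1.1x speedup).
udf_job = job_cost(2.0, 10.0, 0.50, PHOTON_MULTIPLIER, speedup=1.1)
```

Under these placeholder numbers the SQL job gets cheaper and the UDF-heavy job gets markedly more expensive, which is the asymmetry the table warns about. Measuring the real speedup per job class is the only input that matters.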

Amazon Redshift

AWS-native MPP
Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
RA3 + Concurrency Scaling for spiky workloads | Stay on DC2 because "we're not at the limits yet" | RA3 with Concurrency Scaling enabled and a usage limit configured; gives elastic burst without provisioning | Handles BI peak without over-provisioning; DC2 is being deprecated anyway
AUTO distribution and sort keys | Hand-pick DISTKEY and SORTKEY on every table based on initial guesses | Let Auto Table Optimization choose for most tables; override only when query patterns are well-understood | Auto adapts as query patterns evolve; manual choices age poorly and require periodic re-tuning
Materialized views with auto-refresh | Re-aggregate the same data in every dashboard query | Materialized views on common aggregations with AUTO REFRESH; queries against MV are 10-100x faster | Compute saved on repeated aggregation queries; analyst experience improves materially
Spectrum for true cold data only | Default cold tier to Spectrum to "save storage cost" | Spectrum for archive (>90d) or for joining with truly external lake data; native RMS for anything queried daily | Spectrum per-byte scan costs add up; native RMS storage is cheap relative to repeated scans
Serverless vs RA3 decision | Use Serverless for everything because "elastic is better" | Serverless for spiky or unpredictable workloads; RA3 for sustained 12+ hours/day or zero-ETL CDC sources | Break-even for Serverless is around 12-16 hours/day of active compute; sustained workloads cost more on Serverless
WLM queues separating ETL from BI | Single default WLM queue for everything | Auto-WLM (default for new clusters) handles most cases; manual WLM with separate ETL and BI queues for shops with predictable workload mix | p99 stays predictable under concurrent load; one runaway query can't starve dashboards
Zero-ETL with downstream materialization | Try to write back to zero-ETL replicated tables; fail with read-only errors | Treat zero-ETL tables as sources only; build cross-database materialized views or stage-to-target tables for transformation | Avoids the read-only surprise; gives you a clean transformation layer
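
The Serverless vs RA3 break-even follows from one asymmetry: a provisioned cluster bills 24 hours a day whether or not it is used, while Serverless bills only active hours. The sketch below works that through with hypothetical hourly rates chosen for illustration; they are not AWS list prices, and the function names are assumptions.

```python
def provisioned_daily_cost(hourly_usd):
    """An always-on provisioned cluster bills all 24 hours."""
    return 24 * hourly_usd

def serverless_daily_cost(hourly_usd, active_hours):
    """Serverless bills only for hours with active compute."""
    return active_hours * hourly_usd

def break_even_hours_per_day(provisioned_hourly_usd, serverless_hourly_usd):
    """Active hours/day above which the provisioned cluster is cheaper."""
    return 24 * provisioned_hourly_usd / serverless_hourly_usd

# Hypothetical rates for equivalent capacity (placeholders, not list prices).
PROVISIONED_USD_PER_HOUR = 10.0
SERVERLESS_USD_PER_HOUR = 18.0

break_even = break_even_hours_per_day(PROVISIONED_USD_PER_HOUR, SERVERLESS_USD_PER_HOUR)
spiky_day = serverless_daily_cost(SERVERLESS_USD_PER_HOUR, active_hours=8)
sustained_day = serverless_daily_cost(SERVERLESS_USD_PER_HOUR, active_hours=20)
always_on_day = provisioned_daily_cost(PROVISIONED_USD_PER_HOUR)
```

With these placeholder rates the break-even lands a little over 13 active hours per day, inside the 12-16 hour band in the table. Reserved-instance discounts on the provisioned side lower the break-even further, so rerun the numbers with your negotiated rates.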

Google BigQuery

Fully serverless DW
Pattern | What Most Teams Do Wrong | The Better Way | Why It Matters
Partition + cluster every wide table | Leave large tables unpartitioned; one analyst scans everything by accident | Require time-based partition + clustering on the top filter columns for every table over 100GB | Bytes scanned drops 10-100x on filtered queries; cost predictability dramatically improves
Editions vs on-demand decision | Pick one and stick with it forever; never revisit | On-demand for spiky exploration projects; Editions baseline + autoscale for predictable production; revisit quarterly as workload shape changes | The break-even moves; 25% on-demand price hike narrows the gap; Editions 1-min minimum eats short-query savings
Storage Write API over Streaming Inserts | Use legacy tabledata.insertAll for new pipelines | Storage Write API (gRPC) with exactly-once semantics; appendRows for high throughput | Exactly-once delivery; higher throughput per quota; legacy API is in maintenance mode
Dry-run cost estimation before big queries | Run queries and discover cost after the fact | --dry_run on every large query in development; preview total_bytes_processed in client tooling | Catches the multi-TB scan before it bills; ten seconds of friction saves hundreds of dollars
Materialized views for repeated aggregations | Re-aggregate in every dashboard tile | Materialized views with incremental refresh on common aggregations; BQ rewrites compatible queries automatically | Order-of-magnitude bytes-scanned reduction; pricing scales accordingly
BigQuery DataFrames for pandas-style at scale | Pull terabytes into Python pandas via the client library; OOM | BigQuery DataFrames pushes the pandas operations down into BQ; data never leaves the warehouse | Scales beyond local memory; respects governance; ML feature engineering stays close to the data
Capacity baseline + autoscale ceiling | Pure autoscale starting from zero; surprised by 1-minute minimum billing | Set a small baseline of committed slots for steady-state; autoscale ceiling for peak; commitments for additional discount | Avoid the 1-minute minimum penalty on every short query; predictable cost floor
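
The dry-run row can be wired into client tooling with a few lines. The sketch below uses the `google-cloud-bigquery` client's dry-run mode to read `total_bytes_processed` and turns it into an on-demand estimate and a budget check. The per-TiB rate is a placeholder assumption (check the current price for your region), and the helper names are hypothetical.

```python
TIB = 2 ** 40

# Placeholder on-demand rate; look up the current price for your region.
ASSUMED_USD_PER_TIB = 6.25

def estimated_on_demand_cost(total_bytes_processed: int, usd_per_tib: float) -> float:
    """Convert a dry-run byte count into an on-demand cost estimate."""
    return total_bytes_processed / TIB * usd_per_tib

def within_budget(total_bytes_processed: int, usd_per_tib: float,
                  budget_usd: float) -> bool:
    """True when the estimated cost fits the per-query budget."""
    return estimated_on_demand_cost(total_bytes_processed, usd_per_tib) <= budget_usd

def dry_run_bytes(client, sql: str) -> int:
    """Ask BigQuery how many bytes a query would scan, without running it.

    `client` is a google.cloud.bigquery.Client. The import is deferred so the
    pricing helpers above work without the library installed.
    """
    from google.cloud import bigquery
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed

# An unfiltered 3 TiB scan versus the same query pruned to about 1% of the data.
full_scan_cost = estimated_on_demand_cost(3 * TIB, ASSUMED_USD_PER_TIB)
pruned_ok = within_budget(3 * TIB // 100, ASSUMED_USD_PER_TIB, budget_usd=1.0)
```

In practice `dry_run_bytes(client, sql)` feeds `within_budget`, and the tooling refuses to submit anything over budget. The estimate applies to on-demand pricing only; under Editions the same byte count translates into slot time rather than dollars per byte.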

8. Advanced / Next-Gen Alternatives

Where the platform may be displaced or augmented. Migration cost reflects the realistic effort, not the vendor's "easy migration" claim. When To Consider is the actual trigger, not the marketing positioning.

Snowflake

Cloud-native DW
Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider
Apache Iceberg + DuckDB / Polars / Trino (OSS lakehouse) | Eliminates per-credit pricing; open table format; multi-engine | Emerging | High: different query engine, governance built from parts, no managed service | When credit bill exceeds $1M/yr and team has the engineering bandwidth to operate an OSS stack
Databricks SQL with Photon | Better ML integration; competitive SQL perf; open Delta with Iceberg interop | Production | High: different governance model, dbt-style transformations port but Streams + Tasks do not | When ML workloads grow to dominate the budget; when engineering team prefers Python/Spark over SQL-first
Microsoft Fabric + OneLake | Tighter Power BI integration; OneLake Security (universal ACLs, 2026) closing governance gap | Emerging | Medium: Direct Lake mode reduces data movement; concepts map cleanly from Snowflake | Org standardized on Microsoft stack (Entra ID, Power BI, Office 365) and Fabric pricing math works out

Databricks

Lakehouse
Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider
Snowflake with Snowpark + Cortex | SQL-first ergonomics; governance posture; lower ops burden for non-Spark teams | Production | High: rewrite Spark pipelines as Snowpark or pure SQL; cost model is different | When SQL workloads dominate and ML use cases fit Snowpark/Cortex's tabular ML envelope
OSS Spark + Iceberg + Trino/Presto | Full open-source stack; no DBU surcharge; portability across clouds | Emerging | High: operate Spark, manage clusters, build governance from Polaris/Nessie/OpenMetadata | When Photon premium is unjustifiable and team has data platform engineering depth
Microsoft Fabric (Spark + Lakehouse) | OneLake + Synapse Spark; native Power BI; tighter Office 365 integration | Emerging | Medium: Spark on Fabric is real but tooling/MLflow story is younger than Databricks | Microsoft shop with Power BI as primary BI tool; Office 365 governance posture

Amazon Redshift

AWS-native MPP
Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider
Snowflake on AWS | Better elasticity; multi-cloud option; richer SQL surface; stronger governance | Production | Medium: both are SQL warehouses; ELT logic ports cleanly; deep AWS integrations (Lake Formation, Glue) need replanning | When elasticity and cross-cloud strategy matter more than AWS-native integration depth
Databricks SQL on AWS | ML-native; open Delta; Photon performance; Spark for ETL | Production | Medium: keep S3 data layer; replace cluster compute model; UC vs Lake Formation reconciliation | When ML workloads grow and Redshift ML's SageMaker round-trip is the bottleneck
ClickHouse Cloud on AWS | Sub-second analytics on user-facing dashboards; lower latency floor than Redshift | Emerging | Medium: for hot-path workloads only; complement Redshift, don't replace | When Redshift can't meet sub-second p99 on user-facing dashboards and you need a real-time tier

Google BigQuery

Fully serverless DW
Successor / Alternative | What It Improves | Maturity | Migration Cost | When To Consider
Snowflake on GCP | Better governance; cross-cloud portability; more predictable bill | Production | Medium-High: SQL ports cleanly; nested/repeated schemas require flattening; ML pipeline rewrite | When multi-cloud strategy emerges; when on-demand bytes-scanned predictability becomes a finance issue
Databricks on GCP | ML/data-engineering depth; open Delta; multi-cloud workspace | Production | Medium: keep GCS data layer; rewrite Spark pipelines that consumed BQ ML features | When ML workloads grow and BQ ML's tabular envelope is the bottleneck
ClickHouse Cloud + BigLake | Split hot/cold; ClickHouse for sub-second user-facing; BQ for batch + ML | Emerging | Medium: additive, not replacement; new query path for user-facing analytics | When BQ p99 is the user-facing experience bottleneck; high QPS user-facing dashboards

As-of disclaimer. Specifics in this artifact reflect public documentation, vendor blogs, and production reports through May 2026. Hybrid Tables pricing changed on March 1, 2026 (per-request billing retired). Iceberg went GA in Snowflake in October 2025. Multi-table transactions arrived in Databricks Unity Catalog in March 2026. Redshift blocks Python UDF creation from Patch 198 onward; existing UDFs are supported through June 30, 2026. The BigQuery on-demand rate rose approximately 25%; DML quotas were removed years ago, but streaming has its own quotas. Verify before architectural commitments.