Just rejected a Snowflake or Redshift quote? Learn how to connect ClickHouse to Sourcetable for billion-row analytics at a fraction of the cost.
Andrew Grosser
May 15, 2026 • 11 min read
You just got a quote from Snowflake. $120,000 per year for the compute you need. Redshift came back at $95,000. You need the scale — your event table hit 800 million rows last quarter and Postgres is choking on GROUP BY queries. But those prices are absurd for what amounts to running SQL on your data.
There's a third option that costs 80-90% less: ClickHouse. It's an open-source columnar database built specifically for analytical queries over enormous datasets. A billion-row aggregation that would cost you $47 on Snowflake runs for $2.30 on ClickHouse. The catch? ClickHouse doesn't have the polished interface or AI tooling that makes Snowflake easy to use.
That's where Sourcetable comes in. We've built native ClickHouse support that gives you the cost efficiency of ClickHouse with an AI-powered spreadsheet interface. Your SQL stays the same, but computation routes directly to ClickHouse and brings back only the results. A GROUP BY over a billion-row events table takes two seconds.
Sourcetable's AI data analyst is free to try. Sign up here.
Snowflake and Redshift aren't expensive because they're technically superior. They're expensive because they've built successful businesses around convenience and vendor lock-in. Both charge based on compute time — every query you run, every warehouse you spin up, every second of processing gets metered and billed.
Here's what a typical mid-sized company pays for Snowflake in 2026:
| Resource Type | Configuration | Monthly Cost | Annual Cost |
|---|---|---|---|
| Medium warehouse (4 credits/hour) | Running 8 hours/day | $2,880 | $34,560 |
| Large warehouse (8 credits/hour) | Running 4 hours/day for heavy queries | $2,880 | $34,560 |
| X-Large warehouse (16 credits/hour) | Running 2 hours/day for dashboards | $2,880 | $34,560 |
| Storage (500TB compressed) | $23/TB/month | $11,500 | $138,000 |
| Total | | $20,140 | $241,680 |
That's $241,680 per year for what is fundamentally just running SQL queries on columnar data. Redshift is slightly cheaper but locks you into AWS and charges similar compute rates. The pricing model is designed to scale with your success — the more data you analyze, the more you pay.
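The warehouse rows in the table follow from simple credit arithmetic. The sketch below reproduces them under two assumptions that the article does not state but that the table implies: a price of $3 per credit and a 30-day month (4 credits/hour × 8 hours/day × 30 days = 960 credits, and $2,880 / 960 = $3). The function and constant names are illustrative only.

```python
# Assumptions inferred from the table, not stated in the article:
PRICE_PER_CREDIT_USD = 3          # $2,880 / 960 credits
DAYS_PER_MONTH = 30

# Taken directly from the table:
STORAGE_USD_PER_TB_MONTH = 23
STORAGE_TB = 500


def warehouse_monthly_cost(credits_per_hour, hours_per_day):
    """Monthly compute cost of one warehouse under the assumptions above."""
    credits = credits_per_hour * hours_per_day * DAYS_PER_MONTH
    return credits * PRICE_PER_CREDIT_USD


# (credits per hour, hours per day) for each warehouse in the table
warehouses = {"Medium": (4, 8), "Large": (8, 4), "X-Large": (16, 2)}

compute_costs = {
    name: warehouse_monthly_cost(credits, hours)
    for name, (credits, hours) in warehouses.items()
}
storage_cost = STORAGE_TB * STORAGE_USD_PER_TB_MONTH
monthly_total = sum(compute_costs.values()) + storage_cost
annual_total = monthly_total * 12

for name, cost in compute_costs.items():
    print(f"{name}: ${cost:,}/month")
print(f"Storage: ${storage_cost:,}/month")
print(f"Total: ${monthly_total:,}/month, ${annual_total:,}/year")
```

Changing `hours_per_day` for any warehouse shows how quickly metered compute moves the annual figure.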
The real problem isn't the absolute cost. It's that 70-80% of what you're paying for is infrastructure overhead you don't need. You're paying for Snowflake's multi-tenant orchestration, their proprietary storage format, their enterprise sales team, and their 60% gross margins. You don't need any of that. You need fast SQL on big data.
ClickHouse is a columnar database originally built at Yandex (Russia's Google) to power its web analytics platform, Yandex.Metrica. It's open source, brutally fast, and designed for a specific use case: analytical queries over billions of rows. It's not a transactional database like Postgres. It's not a general-purpose data warehouse like Snowflake. It's a specialized tool for one job, and it does that job better than anything else.
Here's what ClickHouse does well:
The performance difference is dramatic. Here's a real-world comparison running the same analytical query (COUNT DISTINCT users grouped by day over 90 days) on different systems:
| Database | Rows | Query Time | Monthly Cost (8hr/day usage) |
|---|---|---|---|
| Postgres (RDS db.r5.4xlarge) | 50 million | 47 seconds | $1,200 |
| Redshift (dc2.large, 2 nodes) | 500 million | 8.2 seconds | $3,600 |
| Snowflake (Medium warehouse) | 1 billion | 4.1 seconds | $2,880 |
| ClickHouse (c5.2xlarge, single node) | 1 billion | 1.8 seconds | $250 |
| ClickHouse (c5.2xlarge, 4 nodes) | 10 billion | 2.3 seconds | $1,000 |
ClickHouse on a single $250/month server outperforms Snowflake's $2,880/month warehouse on the same dataset. When you scale to 10 billion rows, ClickHouse on four nodes ($1,000/month total) still beats Snowflake — and costs 65% less.
ClickHouse is fast and cheap, but it's not easy. It's a database, not a platform. You get a SQL interface and some command-line tools. That's it. There's no built-in visualization layer, no natural-language query interface, no AI assistance, and no spreadsheet-style data exploration.
If you want to answer a question like 'Show me daily active users by country for the last 90 days,' you need to:
This workflow takes 10-15 minutes for a simple query. For complex analysis involving multiple tables, joins, and visualizations, it can take hours. Non-technical stakeholders can't do it at all — they need to submit requests to the data team and wait.
That's the trade-off companies face: pay $120,000/year for Snowflake's polished interface, or pay $12,000/year for ClickHouse and accept the friction. Most companies choose Snowflake because the productivity loss from ClickHouse's bare-bones tooling costs more than the price difference.
Sourcetable is an AI-powered spreadsheet that connects directly to your data sources — including ClickHouse. You add your ClickHouse credentials once (either through the connectors page or by pasting connection details into chat), and all your tables appear immediately in the spreadsheet's data picker with full column type information.
From there, you can query your data three ways:
Here's what the workflow looks like in practice:
Without Sourcetable (traditional ClickHouse):
```sql
SELECT
    country,
    toDate(event_time) as day,
    uniqExact(user_id) as dau
FROM events
WHERE event_time >= now() - INTERVAL 90 DAY
GROUP BY country, day
ORDER BY day DESC, dau DESC
```

With Sourcetable:
That's a 90x speed improvement on a routine analytical task. The time savings compound when you're iterating on analysis — changing date ranges, adding filters, switching aggregations. Each iteration takes 5-10 seconds instead of 10-15 minutes.
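To make explicit what the ClickHouse query above computes, here is a minimal pure-Python sketch of the same aggregation over a few in-memory rows: exact distinct users per (country, day) within a 90-day window, sorted by day and then by DAU, both descending. The sample rows and the function name are invented for illustration; this is a reference for the query's semantics, not how ClickHouse or Sourcetable executes it.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_active_users(events, now, window_days=90):
    """events: iterable of (country, event_time, user_id) tuples."""
    cutoff = now - timedelta(days=window_days)
    users_by_group = defaultdict(set)
    for country, event_time, user_id in events:
        if event_time >= cutoff:                      # WHERE event_time >= now() - 90 days
            key = (country, event_time.date())        # GROUP BY country, toDate(event_time)
            users_by_group[key].add(user_id)          # uniqExact(user_id)
    rows = [(country, day, len(ids)) for (country, day), ids in users_by_group.items()]
    rows.sort(key=lambda r: (r[1], r[2]), reverse=True)  # ORDER BY day DESC, dau DESC
    return rows


# Hypothetical sample data
now = datetime(2026, 5, 15, 12, 0)
events = [
    ("US", datetime(2026, 5, 14, 9, 0), 1),
    ("US", datetime(2026, 5, 14, 10, 0), 1),   # same user, same day: counted once
    ("US", datetime(2026, 5, 14, 11, 0), 2),
    ("DE", datetime(2026, 5, 14, 8, 0), 3),
    ("US", datetime(2026, 5, 13, 8, 0), 1),
    ("US", datetime(2025, 1, 1, 8, 0), 4),     # outside the 90-day window
]

rows = daily_active_users(events, now)
for country, day, dau in rows:
    print(country, day, dau)
```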
We've supported Postgres, MySQL, and Supabase for a while. ClickHouse is different — it's built for analytical workloads over massive datasets, not transactional queries. Supporting it properly required building new infrastructure.
Here's what 'first class citizen' means for ClickHouse in Sourcetable:
| Feature | What It Does | Why It Matters |
|---|---|---|
| Natural language to SQL | AI understands ClickHouse-specific syntax (uniqExact, toDate, INTERVAL) | You don't need to learn ClickHouse's dialect — just ask questions in English |
| Schema-aware queries | AI knows your table structure, column types, and relationships | Queries work on the first try — no manual schema lookup |
| Direct query routing | SQL executes directly on ClickHouse, not in a proxy layer | You get ClickHouse's full speed — no performance penalty |
| Result streaming | Large result sets stream back incrementally | Queries returning 100K+ rows don't freeze the interface |
| Cross-source joins | Join ClickHouse tables with Postgres, MySQL, or files in one query | Combine event data (ClickHouse) with user metadata (Postgres) without ETL |
| Credential management | Store ClickHouse credentials securely with browser-side encryption | Add credentials once, use across all workbooks |
| Table auto-discovery | All tables appear in @ picker immediately after connection | No manual table registration or configuration |
The key technical achievement is the federated query engine. When you write a query that references ClickHouse tables alongside Postgres tables, Sourcetable doesn't pull all the data into memory and join it locally. Instead, it:
This means a query like 'Join my ClickHouse events table (1 billion rows) with my Postgres users table (2 million rows) and show me conversion rates by acquisition channel' executes in 3-4 seconds. ClickHouse aggregates the billion-row events table down to a few thousand rows per channel. Postgres returns the 2 million user records with acquisition metadata. Sourcetable joins the two result sets (now just thousands of rows each) and returns the final answer.
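The idea of aggregating at each source before joining can be illustrated with a small Python sketch. The two `aggregate_*` functions stand in for queries pushed down to ClickHouse and Postgres; the record shapes, the `channel` key on both sides, and all names are assumptions made for this example, and none of it reflects Sourcetable's actual engine.

```python
from collections import Counter


def aggregate_conversions(events):
    """Stand-in for the ClickHouse side: reduce raw events to one count per channel."""
    return Counter(e["channel"] for e in events if e["type"] == "purchase")


def aggregate_signups(users):
    """Stand-in for the Postgres side: reduce user records to one count per channel."""
    return Counter(u["channel"] for u in users)


def conversion_rates(conversions, signups):
    """Join the two small, already-aggregated result sets on channel."""
    return {
        channel: conversions.get(channel, 0) / count
        for channel, count in signups.items()
        if count
    }


# Hypothetical sample data; the real tables would hold billions and millions of rows.
events = [
    {"channel": "ads", "type": "purchase"},
    {"channel": "organic", "type": "purchase"},
    {"channel": "organic", "type": "purchase"},
    {"channel": "ads", "type": "page_view"},
]
users = [{"channel": "ads"}] * 4 + [{"channel": "organic"}] * 4

rates = conversion_rates(aggregate_conversions(events), aggregate_signups(users))
print(rates)
```

The join step only ever sees one row per channel from each side, which is why its cost stays small no matter how large the underlying tables grow.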
Let's calculate the actual cost for a realistic analytical workload: a company with 500TB of event data, running 200 queries per day (a mix of dashboards, ad-hoc analysis, and scheduled reports).
Snowflake:
| Resource | Specification | Monthly Cost |
|---|---|---|
| Storage (500TB compressed) | $23/TB/month | $11,500 |
| Medium warehouse (dashboards) | 8 hours/day, 4 credits/hour | $2,880 |
| Large warehouse (ad-hoc queries) | 6 hours/day, 8 credits/hour | $4,320 |
| X-Large warehouse (reports) | 2 hours/day, 16 credits/hour | $2,880 |
| Total | | $21,580/month = $259,000/year |

ClickHouse + Sourcetable:

| Resource | Specification | Monthly Cost |
|---|---|---|
| ClickHouse cluster (6 nodes) | c5.4xlarge instances on AWS | $3,600 |
| Storage (500TB compressed to 25TB) | S3 storage at $23/TB/month | $575 |
| Sourcetable (10 users) | Max plan at $200/user/month | $2,000 |
| Total | | $6,175/month = $74,100/year |
That's a 71% cost reduction — $184,900 saved per year. The savings come from three sources:
The Sourcetable cost ($2,000/month for 10 users) is incremental, but it replaces the BI tools you'd otherwise need. Most companies running Snowflake also pay for Tableau ($840/user/year), Looker ($3,000/user/year), or Mode ($600/user/year). Sourcetable provides the same analytical capabilities at $200/user/month ($2,400/user/year) with AI assistance built in.
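The totals in the two cost tables and the 71% reduction can be checked with a few lines of Python. All line items are copied from the tables; the dictionary keys and function name are illustrative. The exact annual Snowflake figure works out to $258,960, which the table rounds to $259,000, so the exact savings come to $184,860 rather than $184,900.

```python
# Monthly line items in USD, copied from the two tables
snowflake_monthly = {
    "storage": 11_500,
    "medium_warehouse": 2_880,
    "large_warehouse": 4_320,
    "xlarge_warehouse": 2_880,
}
clickhouse_monthly = {
    "cluster_6_nodes": 3_600,
    "s3_storage": 575,
    "sourcetable_10_users": 2_000,
}


def annual_cost(monthly_items):
    """Sum the monthly line items and scale to twelve months."""
    return 12 * sum(monthly_items.values())


snowflake_annual = annual_cost(snowflake_monthly)
clickhouse_annual = annual_cost(clickhouse_monthly)
savings = snowflake_annual - clickhouse_annual
reduction = savings / snowflake_annual

print(f"Snowflake: ${snowflake_annual:,}/year")
print(f"ClickHouse + Sourcetable: ${clickhouse_annual:,}/year")
print(f"Savings: ${savings:,}/year ({reduction:.0%})")
```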
ClickHouse isn't the right choice for every workload. It's a specialized tool optimized for analytical queries over immutable event data. Here's when it makes sense:
Good fit for ClickHouse:
Poor fit for ClickHouse:
The typical pattern is to use Postgres for transactional data (users, orders, inventory) and ClickHouse for analytical data (events, logs, metrics). Sourcetable lets you query both in the same interface and join them when needed.
Adding ClickHouse to Sourcetable takes less than a minute. You need three pieces of information from your ClickHouse deployment:
You can add credentials two ways:
Method 1: Through the connectors page
Method 2: Paste connection details into chat
Once connected, you can immediately start querying. Type 'Show me the top 10 events by volume today' and the AI writes the SQL, executes it on ClickHouse, and returns results. No configuration, no schema mapping, no manual table registration.
The most powerful feature of Sourcetable's ClickHouse integration isn't querying ClickHouse alone — it's querying ClickHouse alongside other data sources in a single SQL statement. This eliminates the ETL pipelines that consume 60-70% of data engineering time.
Here's a real example: you want to calculate customer lifetime value (LTV) by acquisition channel. Your data is split across three sources:
In a traditional setup, you'd need to:
With Sourcetable, you type:
'Calculate customer lifetime value by acquisition channel, using events from ClickHouse, user metadata from Postgres, and channel adjustments from my uploaded CSV'
The AI writes a federated SQL query that:
This is 450x faster than the manual process. More importantly, it's reproducible — save the query as an AI Workflow and it runs automatically every day with fresh data.
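As a rough illustration of what such a three-source calculation involves, the sketch below combines per-user revenue (standing in for an aggregate over ClickHouse events), a user-to-channel mapping (standing in for Postgres user metadata), and a small CSV of channel adjustments. The LTV definition used here (average revenue per user in a channel, scaled by a per-channel multiplier), the CSV column names, and all sample values are assumptions for the example, not the query Sourcetable's AI would generate.

```python
import csv
import io
from collections import defaultdict


def ltv_by_channel(revenue_by_user, channel_by_user, adjustments_csv):
    """Average revenue per user in each channel, scaled by a per-channel multiplier."""
    reader = csv.DictReader(io.StringIO(adjustments_csv))
    multiplier = {row["channel"]: float(row["multiplier"]) for row in reader}

    revenue = defaultdict(float)
    user_count = defaultdict(int)
    for user_id, channel in channel_by_user.items():
        user_count[channel] += 1
        revenue[channel] += revenue_by_user.get(user_id, 0.0)  # users with no events count as 0

    return {
        channel: revenue[channel] / user_count[channel] * multiplier.get(channel, 1.0)
        for channel in user_count
    }


# Hypothetical inputs
revenue_by_user = {1: 100.0, 2: 50.0, 3: 30.0}                    # from ClickHouse events
channel_by_user = {1: "ads", 2: "ads", 3: "organic", 4: "organic"}  # from Postgres users
adjustments_csv = "channel,multiplier\nads,0.5\n"                  # uploaded CSV

ltv = ltv_by_channel(revenue_by_user, channel_by_user, adjustments_csv)
print(ltv)
```

Channels missing from the CSV fall back to a multiplier of 1.0, which is a design choice of this sketch rather than documented behavior.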
We've tested Sourcetable's ClickHouse integration against realistic analytical workloads. Here are actual query times on a 6-node ClickHouse cluster (c5.4xlarge instances) with 10 billion rows of event data:
| Query Type | Description | Rows Scanned | Execution Time |
|---|---|---|---|
| Simple aggregation | COUNT(*) with date filter | 500 million | 0.9 seconds |
| GROUP BY (low cardinality) | Daily totals by country (90 days × 195 countries) | 2 billion | 2.1 seconds |
| GROUP BY (high cardinality) | Unique users by session_id | 1 billion | 3.4 seconds |
| Complex aggregation | Conversion funnel with 5 steps | 3 billion | 5.2 seconds |
| Cross-source join | Events (ClickHouse) + users (Postgres) | 1 billion + 5 million | 4.7 seconds |
| Full table scan | Aggregate across all 10 billion rows | 10 billion | 8.3 seconds |
These times include the full round trip: AI query generation (0.3-0.5 seconds), SQL execution on ClickHouse, result streaming back to Sourcetable, and rendering in the spreadsheet. The ClickHouse execution itself is typically 60-70% of the total time — the rest is network transfer and rendering.
For comparison, the same queries on Snowflake (Medium warehouse, 4 credits/hour) take 2-3x longer and cost 8-10x more per query. Redshift performance is similar to Snowflake but with slightly lower cost.
If you're already on Snowflake or Redshift and want to migrate to ClickHouse + Sourcetable, here's the realistic path:
Phase 1: Parallel deployment (Months 1-2)
Phase 2: Gradual cutover (Months 2-4)
Phase 3: Decommission (Months 4-6)
The total migration takes 4-6 months for most companies. The cost savings start immediately — even running both systems in parallel for 2 months, you'll save 40-50% compared to Snowflake-only because most queries shift to ClickHouse quickly.