Connect your ClickHouse database to Sourcetable's AI spreadsheet. Query billions of rows with natural language, join across data sources, and analyze instantly.
Andrew Grosser
May 15, 2026 • 11 min read
Connect your ClickHouse database to Sourcetable's AI spreadsheet. Query billions of rows with natural language, join across data sources, and analyze instantly.
You just provisioned a ClickHouse cluster. Your data engineering team spent three weeks migrating 800 million event records from Postgres. Query performance went from 45 seconds to 1.8 seconds for aggregations. But now you're stuck writing SQL in the command line or wrestling with third-party visualization tools that don't understand your schema. You need a UI that treats ClickHouse as a first-class analytical database — not an afterthought.
Sourcetable now provides native ClickHouse support with natural language query translation, cross-source joins, and instant spreadsheet results. Add your connection string through chat or the connectors page. Your tables appear in the @ picker immediately with full column type inference. Query a billion-row events table with plain English: 'Show me daily active users by country for the last 90 days' — results in two seconds.
Sourcetable's AI data analyst is free to try. Sign up here.
ClickHouse is built for analytical queries over enormous datasets. Postgres slows down around 50 million rows. ClickHouse handles a billion. You provisioned it because you need speed — GROUP BY queries over hundreds of millions of events, real-time aggregations for dashboards, time-series analysis with sub-second latency.
But ClickHouse's command-line interface isn't built for exploration. Writing SQL manually for every question takes time. Third-party tools like Grafana or Metabase add visualization but don't understand your specific schema. Data analysts who aren't SQL experts can't access the data. You need a layer between your team and the database that speaks both natural language and ClickHouse SQL.
| Query Interface | Setup Time | Natural Language | Cross-Source Joins | Schema Inference |
|---|---|---|---|---|
| ClickHouse CLI | 0 minutes | No | No | No |
| Grafana + ClickHouse plugin | 45 minutes | No | No | Partial |
| Metabase | 30 minutes | Limited | No | Yes |
| Sourcetable | 2 minutes | Yes | Yes | Yes |
Connecting ClickHouse to Sourcetable takes two minutes. You can add credentials from the connectors page or paste connection details directly into the AI chat. Sourcetable syncs your schema in the background — tables appear in the @ picker as soon as the connection is established.
You need four pieces of information: host, port, database name, and credentials. For ClickHouse Cloud, your connection string looks like this:
clickhouse://default:your_password@abc123.us-east-1.aws.clickhouse.cloud:8443/production
For self-hosted ClickHouse, the format is similar:
clickhouse://username:password@your-server.com:9000/database_name
Open Sourcetable and navigate to the connectors page. Click 'Add Database' and select ClickHouse. Paste your connection string or fill in the fields manually: host, port, database, username, password. Click 'Connect.' Sourcetable validates the connection and begins syncing your schema.
Alternatively, paste your connection string directly into the AI chat: 'Connect to my ClickHouse database: clickhouse://default:password@abc123.us-east-1.aws.clickhouse.cloud:8443/production.' The AI extracts credentials, stores them securely using zero-knowledge escrow cryptography, and syncs your tables.
Once connected, your ClickHouse tables appear in the @ picker. Type '@' in the AI chat to see all available tables with column names and types. Now you can query in plain English:
Sourcetable translates your question into ClickHouse SQL, executes the query, and returns results directly to the spreadsheet. A GROUP BY over 500 million rows completes in under three seconds. No SQL knowledge required.
Sourcetable's AI understands ClickHouse-specific SQL syntax and optimizations. When you ask a question in natural language, the AI generates schema-aware SQL that leverages ClickHouse's columnar storage and indexing.
You ask: 'Show me daily active users by country for the last 90 days.' Sourcetable translates this to:
SELECT
toDate(event_timestamp) AS date,
country,
uniqExact(user_id) AS daily_active_users
FROM events
WHERE event_timestamp >= now() - INTERVAL 90 DAY
GROUP BY date, country
ORDER BY date DESC, daily_active_users DESC
The query uses ClickHouse's uniqExact function for precise distinct counts and toDate for date truncation. Execution time on 800 million rows: 1.9 seconds. Results appear in the spreadsheet with date, country, and daily_active_users columns.
You ask: 'Calculate total revenue by product category with a 7-day moving average.' Sourcetable generates:
SELECT
toDate(purchase_timestamp) AS date,
product_category,
sum(revenue) AS total_revenue,
avg(sum(revenue)) OVER (
PARTITION BY product_category
ORDER BY toDate(purchase_timestamp)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM purchases
GROUP BY date, product_category
ORDER BY date DESC, total_revenue DESC
The AI recognizes that moving averages require window functions and constructs the OVER clause with correct partitioning and frame specification. Execution time on 120 million purchase records: 2.4 seconds.
Sourcetable's federated query engine treats ClickHouse, Postgres, MySQL, and local files as a unified SQL namespace. You can join a billion-row ClickHouse events table with a 50,000-row Postgres users table in a single query.
Your ClickHouse events table has 900 million rows. Your Postgres users table has 48,000 rows with user metadata (email, signup_date, subscription_tier). You want to calculate event counts by subscription tier.
You ask: 'Show me event counts by subscription tier for users who signed up in 2025.' Sourcetable generates a federated query:
SELECT
u.subscription_tier,
count(*) AS event_count
FROM clickhouse.events e
JOIN postgres.users u ON e.user_id = u.id
WHERE u.signup_date >= '2025-01-01'
AND u.signup_date < '2026-01-01'
GROUP BY u.subscription_tier
ORDER BY event_count DESC
Sourcetable routes the aggregation to ClickHouse (which handles the 900 million rows efficiently) and pulls the 48,000-row Postgres table into memory for the join. Query execution: 3.2 seconds. Results show subscription_tier and event_count columns in the spreadsheet.
You have a ClickHouse purchases table with 200 million transactions. Your product catalog lives in a CSV file with 15,000 products (product_id, product_name, category, margin_percent). You want to calculate total profit by category.
Upload the CSV to Sourcetable. Then ask: 'Calculate total profit by product category using the margin_percent from my product catalog.' Sourcetable generates:
SELECT
pc.category,
sum(p.revenue * pc.margin_percent / 100) AS total_profit
FROM clickhouse.purchases p
JOIN local.product_catalog pc ON p.product_id = pc.product_id
GROUP BY pc.category
ORDER BY total_profit DESC
The query joins 200 million ClickHouse rows with the 15,000-row CSV in memory. Execution time: 2.8 seconds. Results appear in the spreadsheet with category and total_profit columns.
ClickHouse is designed for analytical queries. Postgres is designed for transactional workloads. When you query both through Sourcetable, the performance difference is dramatic for large datasets.
| Query Type | Dataset Size | Postgres Time | ClickHouse Time | Speedup |
|---|---|---|---|---|
| COUNT DISTINCT users | 100M rows | 42.3 seconds | 1.8 seconds | 23.5x |
| GROUP BY date, country | 500M rows | 118 seconds | 2.4 seconds | 49.2x |
| SUM revenue by category | 200M rows | 67 seconds | 1.6 seconds | 41.9x |
| Percentile calculation | 1B rows | Timeout (>300s) | 4.2 seconds | 71.4x+ |
These benchmarks were run on a ClickHouse Cloud cluster (8 cores, 32GB RAM) and a Postgres instance (4 cores, 16GB RAM). ClickHouse's columnar storage and vectorized query execution deliver 20-70x faster results for analytical queries. Sourcetable routes computation directly to ClickHouse and brings back only the results — you never wait for data to transfer.
Sourcetable supports both ClickHouse and Postgres as first-class citizens. Use ClickHouse for analytical queries over large datasets. Use Postgres for transactional data, relational integrity, and smaller tables that change frequently.
In practice, most data stacks use both. Postgres stores transactional data. ClickHouse stores analytical data. Sourcetable lets you query both in the same spreadsheet and join them in a single query.
An e-commerce company migrated 1.2 billion product view events from Postgres to ClickHouse. Their analytics team needed to calculate daily conversion rates by traffic source, device type, and product category. In Postgres, this query took 4 minutes and 18 seconds. The database locked up during peak traffic hours.
After migrating to ClickHouse and connecting it to Sourcetable, the same query completed in 2.9 seconds. The analytics team could ask questions in natural language:
Sourcetable translated each question into optimized ClickHouse SQL and returned results in under 5 seconds. The team went from waiting minutes for query results to exploring data interactively. Time saved per analysis: 3.5 hours → 8 minutes.
When you query ClickHouse through Sourcetable, the AI doesn't just translate natural language to SQL. It applies ClickHouse-specific optimizations to maximize performance.
Sourcetable pushes WHERE clause filters down to ClickHouse before aggregation. If you ask 'Show me events from the last 7 days,' the AI adds WHERE event_timestamp >= now() - INTERVAL 7 DAY at the ClickHouse level. This reduces the dataset before computation begins, cutting query time by 60-80% on large tables.
ClickHouse's columnar storage means reading only the columns you need. Sourcetable's AI selects only the columns required to answer your question. If you ask 'Count events by country,' the query reads only the country column — not the entire row. On a table with 50 columns, this reduces I/O by 98%.
For very large datasets, Sourcetable can use ClickHouse's approximate aggregation functions. uniq (approximate distinct count) is 10x faster than uniqExact with 99.5% accuracy. The AI selects the appropriate function based on dataset size and query complexity.
Sourcetable stores ClickHouse credentials using zero-knowledge escrow cryptography. Encryption keypairs are generated entirely in the browser. The server never possesses plaintext passwords — keys exist in memory only for the duration of a single query operation.
All ClickHouse queries are read-only by design. Sourcetable enforces AST-level SQL validation to block INSERT, UPDATE, DELETE, and DROP statements. Your data is never modified through Sourcetable — only queried.
Credentials can be revoked instantly at three levels: organization-wide, user-level, or admin-level. Revocation is mathematically immediate — no grace period, no cache expiry, no race condition. When you rotate a session key, all credential access is invalidated across the organization.
ClickHouse is optimized for analytical queries, not transactional workloads. It's not a replacement for Postgres in all scenarios. Here's when ClickHouse struggles:
ClickHouse is append-optimized. Updates and deletes are expensive — they trigger background merges that can slow down queries. If your dataset changes frequently (user profiles, inventory levels, order statuses), Postgres is a better choice. ClickHouse works best for immutable or append-only data: event logs, time-series metrics, analytics.
ClickHouse handles joins, but it's not optimized for them. Joining two billion-row tables in ClickHouse is slower than joining two 10-million-row tables in Postgres. If your query requires multiple large-table joins, consider pre-aggregating data or denormalizing tables. Sourcetable's federated query engine can help by pulling smaller tables (Postgres, CSV) into memory and joining them with ClickHouse results.
ClickHouse's performance advantage appears at scale. For datasets under 10 million rows, Postgres is often faster due to lower query overhead. If your largest table has 5 million rows, stick with Postgres. Migrate to ClickHouse when you cross 50 million rows or when query times exceed 10 seconds.
References and additional reading