Articles / Query ClickHouse with Natural Language AI in Sourcetable

Query ClickHouse with Natural Language AI in Sourcetable

Connect your ClickHouse database to Sourcetable's AI spreadsheet. Query billions of rows with natural language, join across data sources, and analyze instantly.

Andrew Grosser

Andrew Grosser

May 15, 2026 • 11 min read

Query ClickHouse with Natural Language AI in Sourcetable

Connect your ClickHouse database to Sourcetable's AI spreadsheet. Query billions of rows with natural language, join across data sources, and analyze instantly.

You just provisioned a ClickHouse cluster. Your data engineering team spent three weeks migrating 800 million event records from Postgres. Query performance went from 45 seconds to 1.8 seconds for aggregations. But now you're stuck writing SQL in the command line or wrestling with third-party visualization tools that don't understand your schema. You need a UI that treats ClickHouse as a first-class analytical database — not an afterthought.

Sourcetable now provides native ClickHouse support with natural language query translation, cross-source joins, and instant spreadsheet results. Add your connection string through chat or the connectors page. Your tables appear in the @ picker immediately with full column type inference. Query a billion-row events table with plain English: 'Show me daily active users by country for the last 90 days' — results in two seconds.

Sourcetable's AI data analyst is free to try. Sign up here.

Why ClickHouse Needs a Better Query Interface

ClickHouse is built for analytical queries over enormous datasets. Postgres slows down around 50 million rows. ClickHouse handles a billion. You provisioned it because you need speed — GROUP BY queries over hundreds of millions of events, real-time aggregations for dashboards, time-series analysis with sub-second latency.

But ClickHouse's command-line interface isn't built for exploration. Writing SQL manually for every question takes time. Third-party tools like Grafana or Metabase add visualization but don't understand your specific schema. Data analysts who aren't SQL experts can't access the data. You need a layer between your team and the database that speaks both natural language and ClickHouse SQL.

Query Interface Setup Time Natural Language Cross-Source Joins Schema Inference
ClickHouse CLI 0 minutes No No No
Grafana + ClickHouse plugin 45 minutes No No Partial
Metabase 30 minutes Limited No Yes
Sourcetable 2 minutes Yes Yes Yes

How to Connect ClickHouse to Sourcetable in 90 Seconds

Connecting ClickHouse to Sourcetable takes two minutes. You can add credentials from the connectors page or paste connection details directly into the AI chat. Sourcetable syncs your schema in the background — tables appear in the @ picker as soon as the connection is established.

Step 1: Get Your ClickHouse Connection String

You need four pieces of information: host, port, database name, and credentials. For ClickHouse Cloud, your connection string looks like this:

clickhouse://default:your_password@abc123.us-east-1.aws.clickhouse.cloud:8443/production

For self-hosted ClickHouse, the format is similar:

clickhouse://username:password@your-server.com:9000/database_name

Step 2: Add Credentials in Sourcetable

Open Sourcetable and navigate to the connectors page. Click 'Add Database' and select ClickHouse. Paste your connection string or fill in the fields manually: host, port, database, username, password. Click 'Connect.' Sourcetable validates the connection and begins syncing your schema.

Alternatively, paste your connection string directly into the AI chat: 'Connect to my ClickHouse database: clickhouse://default:password@abc123.us-east-1.aws.clickhouse.cloud:8443/production.' The AI extracts credentials, stores them securely using zero-knowledge escrow cryptography, and syncs your tables.

Step 3: Query Your Data with Natural Language

Once connected, your ClickHouse tables appear in the @ picker. Type '@' in the AI chat to see all available tables with column names and types. Now you can query in plain English:

  • 'Show me the top 10 countries by event count in the last 30 days'
  • 'Calculate daily active users grouped by platform for Q1 2026'
  • 'Find average session duration by user cohort'
  • 'Count distinct users per day where event_type = "purchase"'

Sourcetable translates your question into ClickHouse SQL, executes the query, and returns results directly to the spreadsheet. A GROUP BY over 500 million rows completes in under three seconds. No SQL knowledge required.

Natural Language to ClickHouse SQL Translation

Sourcetable's AI understands ClickHouse-specific SQL syntax and optimizations. When you ask a question in natural language, the AI generates schema-aware SQL that leverages ClickHouse's columnar storage and indexing.

Example: Daily Active Users by Country

You ask: 'Show me daily active users by country for the last 90 days.' Sourcetable translates this to:

SELECT
      toDate(event_timestamp) AS date,
      country,
      uniqExact(user_id) AS daily_active_users
    FROM events
    WHERE event_timestamp >= now() - INTERVAL 90 DAY
    GROUP BY date, country
    ORDER BY date DESC, daily_active_users DESC

The query uses ClickHouse's uniqExact function for precise distinct counts and toDate for date truncation. Execution time on 800 million rows: 1.9 seconds. Results appear in the spreadsheet with date, country, and daily_active_users columns.

Example: Revenue by Product Category with Moving Average

You ask: 'Calculate total revenue by product category with a 7-day moving average.' Sourcetable generates:

SELECT
      toDate(purchase_timestamp) AS date,
      product_category,
      sum(revenue) AS total_revenue,
      avg(sum(revenue)) OVER (
        PARTITION BY product_category
        ORDER BY toDate(purchase_timestamp)
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
      ) AS moving_avg_7d
    FROM purchases
    GROUP BY date, product_category
    ORDER BY date DESC, total_revenue DESC

The AI recognizes that moving averages require window functions and constructs the OVER clause with correct partitioning and frame specification. Execution time on 120 million purchase records: 2.4 seconds.

Cross-Source Joins: Combine ClickHouse with Postgres and Files

Sourcetable's federated query engine treats ClickHouse, Postgres, MySQL, and local files as a unified SQL namespace. You can join a billion-row ClickHouse events table with a 50,000-row Postgres users table in a single query.

Example: Join ClickHouse Events with Postgres User Metadata

Your ClickHouse events table has 900 million rows. Your Postgres users table has 48,000 rows with user metadata (email, signup_date, subscription_tier). You want to calculate event counts by subscription tier.

You ask: 'Show me event counts by subscription tier for users who signed up in 2025.' Sourcetable generates a federated query:

SELECT
      u.subscription_tier,
      count(*) AS event_count
    FROM clickhouse.events e
    JOIN postgres.users u ON e.user_id = u.id
    WHERE u.signup_date >= '2025-01-01'
      AND u.signup_date < '2026-01-01'
    GROUP BY u.subscription_tier
    ORDER BY event_count DESC

Sourcetable routes the aggregation to ClickHouse (which handles the 900 million rows efficiently) and pulls the 48,000-row Postgres table into memory for the join. Query execution: 3.2 seconds. Results show subscription_tier and event_count columns in the spreadsheet.

Example: Join ClickHouse with Local CSV for Product Metadata

You have a ClickHouse purchases table with 200 million transactions. Your product catalog lives in a CSV file with 15,000 products (product_id, product_name, category, margin_percent). You want to calculate total profit by category.

Upload the CSV to Sourcetable. Then ask: 'Calculate total profit by product category using the margin_percent from my product catalog.' Sourcetable generates:

SELECT
      pc.category,
      sum(p.revenue * pc.margin_percent / 100) AS total_profit
    FROM clickhouse.purchases p
    JOIN local.product_catalog pc ON p.product_id = pc.product_id
    GROUP BY pc.category
    ORDER BY total_profit DESC

The query joins 200 million ClickHouse rows with the 15,000-row CSV in memory. Execution time: 2.8 seconds. Results appear in the spreadsheet with category and total_profit columns.

Performance Comparison: ClickHouse vs Postgres in Sourcetable

ClickHouse is designed for analytical queries. Postgres is designed for transactional workloads. When you query both through Sourcetable, the performance difference is dramatic for large datasets.

Query Type Dataset Size Postgres Time ClickHouse Time Speedup
COUNT DISTINCT users 100M rows 42.3 seconds 1.8 seconds 23.5x
GROUP BY date, country 500M rows 118 seconds 2.4 seconds 49.2x
SUM revenue by category 200M rows 67 seconds 1.6 seconds 41.9x
Percentile calculation 1B rows Timeout (>300s) 4.2 seconds 71.4x+

These benchmarks were run on a ClickHouse Cloud cluster (8 cores, 32GB RAM) and a Postgres instance (4 cores, 16GB RAM). ClickHouse's columnar storage and vectorized query execution deliver 20-70x faster results for analytical queries. Sourcetable routes computation directly to ClickHouse and brings back only the results — you never wait for data to transfer.

When to Use ClickHouse vs Postgres in Your Data Stack

Sourcetable supports both ClickHouse and Postgres as first-class citizens. Use ClickHouse for analytical queries over large datasets. Use Postgres for transactional data, relational integrity, and smaller tables that change frequently.

Use ClickHouse When:

  • Dataset exceeds 50 million rows
  • Queries are primarily aggregations (GROUP BY, COUNT, SUM, AVG)
  • You need sub-second response times on billion-row tables
  • Data is append-only or updated in batches (event logs, time-series, analytics)
  • You're running OLAP workloads (business intelligence, reporting, dashboards)

Use Postgres When:

  • Dataset is under 50 million rows
  • Queries involve frequent updates, deletes, or complex transactions
  • You need strict ACID guarantees and foreign key constraints
  • Data changes frequently (user profiles, order management, inventory)
  • You're running OLTP workloads (application backends, transactional systems)

In practice, most data stacks use both. Postgres stores transactional data. ClickHouse stores analytical data. Sourcetable lets you query both in the same spreadsheet and join them in a single query.

Real-World Use Case: E-Commerce Analytics with ClickHouse

An e-commerce company migrated 1.2 billion product view events from Postgres to ClickHouse. Their analytics team needed to calculate daily conversion rates by traffic source, device type, and product category. In Postgres, this query took 4 minutes and 18 seconds. The database locked up during peak traffic hours.

After migrating to ClickHouse and connecting it to Sourcetable, the same query completed in 2.9 seconds. The analytics team could ask questions in natural language:

  • 'Show me conversion rate by traffic source for mobile users in the last 7 days'
  • 'Calculate average time to purchase by product category'
  • 'Find the top 20 products by view-to-purchase ratio'

Sourcetable translated each question into optimized ClickHouse SQL and returned results in under 5 seconds. The team went from waiting minutes for query results to exploring data interactively. Time saved per analysis: 3.5 hours → 8 minutes.

How Sourcetable Optimizes ClickHouse Queries Behind the Scenes

When you query ClickHouse through Sourcetable, the AI doesn't just translate natural language to SQL. It applies ClickHouse-specific optimizations to maximize performance.

Predicate Pushdown

Sourcetable pushes WHERE clause filters down to ClickHouse before aggregation. If you ask 'Show me events from the last 7 days,' the AI adds WHERE event_timestamp >= now() - INTERVAL 7 DAY at the ClickHouse level. This reduces the dataset before computation begins, cutting query time by 60-80% on large tables.

Column Pruning

ClickHouse's columnar storage means reading only the columns you need. Sourcetable's AI selects only the columns required to answer your question. If you ask 'Count events by country,' the query reads only the country column — not the entire row. On a table with 50 columns, this reduces I/O by 98%.

Approximate Aggregations

For very large datasets, Sourcetable can use ClickHouse's approximate aggregation functions. uniq (approximate distinct count) is 10x faster than uniqExact with 99.5% accuracy. The AI selects the appropriate function based on dataset size and query complexity.

Security and Credential Management for ClickHouse

Sourcetable stores ClickHouse credentials using zero-knowledge escrow cryptography. Encryption keypairs are generated entirely in the browser. The server never possesses plaintext passwords — keys exist in memory only for the duration of a single query operation.

All ClickHouse queries are read-only by design. Sourcetable enforces AST-level SQL validation to block INSERT, UPDATE, DELETE, and DROP statements. Your data is never modified through Sourcetable — only queried.

Credentials can be revoked instantly at three levels: organization-wide, user-level, or admin-level. Revocation is mathematically immediate — no grace period, no cache expiry, no race condition. When you rotate a session key, all credential access is invalidated across the organization.

Limitations and When ClickHouse Isn't the Right Choice

ClickHouse is optimized for analytical queries, not transactional workloads. It's not a replacement for Postgres in all scenarios. Here's when ClickHouse struggles:

Frequent Updates and Deletes

ClickHouse is append-optimized. Updates and deletes are expensive — they trigger background merges that can slow down queries. If your dataset changes frequently (user profiles, inventory levels, order statuses), Postgres is a better choice. ClickHouse works best for immutable or append-only data: event logs, time-series metrics, analytics.

Complex Joins

ClickHouse handles joins, but it's not optimized for them. Joining two billion-row tables in ClickHouse is slower than joining two 10-million-row tables in Postgres. If your query requires multiple large-table joins, consider pre-aggregating data or denormalizing tables. Sourcetable's federated query engine can help by pulling smaller tables (Postgres, CSV) into memory and joining them with ClickHouse results.

Small Datasets

ClickHouse's performance advantage appears at scale. For datasets under 10 million rows, Postgres is often faster due to lower query overhead. If your largest table has 5 million rows, stick with Postgres. Migrate to ClickHouse when you cross 50 million rows or when query times exceed 10 seconds.

How long does it take to connect ClickHouse to Sourcetable?
Connecting ClickHouse to Sourcetable takes 2 minutes. Add your connection string through the connectors page or paste it into the AI chat. Sourcetable validates the connection and syncs your schema in the background. Tables appear in the @ picker as soon as the sync completes.
Can I join ClickHouse data with Postgres or CSV files?
Yes. Sourcetable's federated query engine treats ClickHouse, Postgres, MySQL, and local files as a unified SQL namespace. You can join a billion-row ClickHouse table with a 50,000-row Postgres table in a single query. Sourcetable routes computation to the appropriate database and brings back only the results.
Does Sourcetable modify my ClickHouse data?
No. All ClickHouse queries in Sourcetable are read-only by design. Sourcetable enforces AST-level SQL validation to block INSERT, UPDATE, DELETE, and DROP statements. Your data is never modified — only queried.
How fast are ClickHouse queries in Sourcetable?
Query speed depends on dataset size and complexity. A GROUP BY over 500 million rows typically completes in 2-4 seconds. COUNT DISTINCT on 100 million rows takes 1.8 seconds. Percentile calculations on 1 billion rows finish in 4.2 seconds. Sourcetable routes computation directly to ClickHouse and brings back only the results.
Do I need to know SQL to query ClickHouse in Sourcetable?
No. Sourcetable translates natural language questions into ClickHouse SQL automatically. Ask 'Show me daily active users by country for the last 90 days' and the AI generates the appropriate query, executes it, and returns results to the spreadsheet. You can also write SQL directly if you prefer.
Can I use ClickHouse-specific functions in Sourcetable?
Yes. Sourcetable's AI understands ClickHouse-specific syntax including uniqExact, uniq, quantile, arrayJoin, toDate, and window functions. When you write SQL directly or ask questions in natural language, the AI generates queries that leverage ClickHouse's full feature set.
What happens if my ClickHouse query is too slow?
Sourcetable applies ClickHouse-specific optimizations including predicate pushdown, column pruning, and approximate aggregations. If a query exceeds 30 seconds, the AI suggests optimizations: adding filters, reducing the date range, or using approximate functions like uniq instead of uniqExact.
Is my ClickHouse password stored securely?
Yes. Sourcetable uses zero-knowledge escrow cryptography. Encryption keypairs are generated in the browser. The server never possesses plaintext passwords — keys exist in memory only for the duration of a single query. Credentials can be revoked instantly at organization, user, or admin levels.
Can I schedule ClickHouse queries to run automatically?
Yes. Sourcetable's AI Workflows turn any query into a reusable automation. Ask a question, save the interaction as a workflow, and schedule it to run daily, hourly, or on a custom schedule. Results update in the spreadsheet automatically.
When should I use ClickHouse instead of Postgres?
Use ClickHouse when your dataset exceeds 50 million rows, queries are primarily aggregations (GROUP BY, COUNT, SUM), and you need sub-second response times. Use Postgres for transactional data, frequent updates, and datasets under 50 million rows. Sourcetable supports both — you can query them in the same spreadsheet.
Sourcetable Logo
Connect Your ClickHouse Database

Experience the future of spreadsheets

Sources

References and additional reading

  1. ClickHouse Official Documentation - Query Performance (2026)
  2. ClickHouse Cloud - Connection Strings and Authentication (2026)
  3. Sourcetable Engineering Blog - Federated SQL Engine Architecture (2026)
  4. ClickHouse Benchmarks - OLAP Query Performance vs Postgres (2025)
Andrew Grosser

Andrew Grosser

Founder, CTO @ Sourcetable

Sourcetable is the Agent first spreadsheet that helps traders, scientists, analysts, and finance teams hypothesize, evaluate, validate, make trades and iterate on trading strategies without writing code.

Share this article

Drop CSV