Articles / Subqueries and Nested SELECTs in SQL: A Practical Tutorial for Analysts

Subqueries and Nested SELECTs in SQL: A Practical Tutorial for Analysts

Explore subqueries and nested selects in sql with practical guidance on features, use cases, and implementation strategies.

Eoin McMillan

January 30, 2026 • 9 min read

SQL subqueries are SELECT statements nested inside other queries. They let analysts filter, aggregate, or transform data in stages without creating temporary tables. This tutorial explains types of subqueries, shows practical analytics examples, and compares subqueries with joins and CTEs so you can choose the right pattern.

Sourcetable's AI data analyst is free to try. Sign up here.

What Is a SQL Subquery?

A SQL subquery is a query nested within another SQL statement, such as SELECT, INSERT, UPDATE, or DELETE. It allows you to perform operations in steps, similar to using intermediate tables in a spreadsheet. For example, to find employees with salaries above the average, you can use: SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);. According to widely used SQL style guides, subqueries are a fundamental tool for complex data retrieval.

What Are the Types of SQL Subqueries?

SQL subqueries are categorized by their return type and how they interact with the outer query:

Scalar Subqueries: Return a single value. Use in WHERE or SELECT clauses where a single value is expected, e.g., SELECT name FROM products WHERE price = (SELECT MAX(price) FROM products).
Row Subqueries: Return a single row with multiple columns. Useful for multi-column comparisons, though less common.
Table Subqueries: Return a result set that can be treated as a table. Often used with IN or EXISTS operators, e.g., SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE country = 'USA').
Correlated Subqueries: Reference columns from the outer query, executing once for each row. Powerful for row-by-row processing but can be performance-intensive.

SQL Subquery Types at a Glance

Type	Returns	Common Use
Scalar	Single value	Comparisons in WHERE or SELECT clauses
Row	Single row	Multi-column condition checks
Table	Result set	Used with IN, EXISTS, or as a derived table
Correlated	Result set per row	Row-specific filtering or calculations

When Should You Use Subqueries vs JOINs vs CTEs?

Choosing between subqueries, JOINs, and Common Table Expressions (CTEs) depends on readability, performance, and the specific task:

Subqueries are ideal for simple, self-contained logic or when filtering based on aggregated data. However, deeply nested subqueries can reduce readability.
JOINs are best for combining data from multiple tables in a single step and are often more performant for large datasets.
CTEs improve readability for complex queries by breaking them into named, reusable parts. Research shows that analysts often overuse subqueries when simpler joins would suffice. Data indicates that modern databases can optimize many subquery patterns effectively.

Subqueries vs JOINs vs CTEs: Key Differences

Feature	Subqueries	JOINs	CTEs
Readability	Can be low if nested deeply	High for straightforward combinations	Very high with modular, named parts
Performance	Varies; correlated subqueries can be slow	Generally fast with proper indexes	Similar to subqueries; depends on optimization
Primary Use Case	Staged filtering, aggregation, or existence checks	Merging rows from multiple tables	Simplifying and organizing complex multi-step queries
Temporary Data	Implicit, within query	Not applicable	Explicit, defined with WITH clause

What Are Common Analytics Patterns for Subqueries?

Subqueries excel in everyday analytics scenarios that spreadsheet users will recognize:

Filtering with Aggregates: Find records above or below a group average, like identifying sales reps outperforming their regional average.
Existence Checks: Use EXISTS to find customers who have placed at least one order, similar to a filter in a spreadsheet.
Derived Tables for Summary Data: Create temporary summaries, such as monthly sales totals, for use in larger analyses.
Data Validation and Integrity: Check values against a master list, akin to data validation rules in Excel.

Think of subqueries as dynamic, auto-updating lookup tables or advanced filter criteria that operate directly within your database.

Watch this video from Alex The Analyst for a visual walkthrough of subquery syntax and practical examples.

How Can You Optimize Subquery Performance?

To ensure your subqueries run efficiently:

Minimize Correlated Subqueries: They execute row-by-row; rewrite with JOINs where possible.
Prefer EXISTS Over IN for Large Sets: EXISTS often stops after finding a match, making it faster.
Ensure Proper Indexing: Index foreign keys and columns used in WHERE clauses of the subquery.
Analyze Execution Plans: Use EXPLAIN or similar tools to see how the database processes your query.

Anti-patterns to avoid include unnecessary nesting and using subqueries for simple table joins. According to The Anatomy of SQL Subqueries: A Data Scientist's Guide, correlated subqueries should be used sparingly and tested for performance impact.

How to Bring Subquery Results into Sourcetable for Further Analysis?

Once you've crafted your SQL subqueries, you can seamlessly import the results into Sourcetable for enhanced analysis. Sourcetable connects directly to your databases, allowing you to execute queries and load the data into a familiar spreadsheet interface.

Write and test your subquery in your database client or Sourcetable's SQL editor.
Execute the query and import the result set into a Sourcetable spreadsheet.
Use Sourcetable's AI-powered tools to clean the data, build charts, create financial models, and automate reports-multiplying your analytical productivity without leaving the spreadsheet environment.

What is a subquery in SQL and why is it useful?

A subquery is a SELECT statement nested inside another SQL query. It's useful for performing multi-step data operations-like filtering based on an aggregate value or checking for existence-without creating physical temporary tables, thereby simplifying complex logic in a single query.

What is the difference between a subquery and a JOIN?

A subquery retrieves data to be used within the main query for filtering or calculation, often in stages. A JOIN combines rows from two or more tables based on a related column. JOINs are generally more efficient for merging table data, while subqueries are better suited for conditional logic based on aggregated or derived data.

When should I use a correlated subquery?

Use a correlated subquery when your inner query needs to reference a column from the outer query, typically for row-by-row comparisons. A common example is finding employees who earn more than the average salary in their own department. Use them cautiously as they can be slower due to repeated execution for each outer row.

Are subqueries bad for performance compared to CTEs?

Not inherently. Performance depends on the database optimizer and query structure. CTEs primarily improve readability and organization. According to SQL for Data Science: Subqueries and Joins, modern database engines are adept at optimizing subqueries, but it's always wise to profile your queries with tools like EXPLAIN.

How can I rewrite complex spreadsheet logic as SQL subqueries?

Map common spreadsheet operations to SQL: replace VLOOKUPs with JOINs or subqueries using IN, translate filter criteria on summary cells to WHERE clauses with aggregate subqueries, and convert multi-sheet analyses into a series of nested SELECT statements or CTEs. This approach brings the power and automation of SQL to your spreadsheet workflows.

Key Takeaways

Subqueries enable staged data processing without physical temporary tables.
Correlated subqueries can significantly impact performance and should be used judiciously.
JOINs are often more efficient for combining data from multiple related tables.
CTEs enhance query readability and maintenance for complex multi-step operations.
Proper indexing and query analysis are crucial for optimal subquery performance.

Sources

According to widely used SQL style guides, subqueries are a fundamental tool for complex data retrieval. [Source]
Research shows that analysts often overuse subqueries when simpler joins would suffice. [Source]
Data indicates that modern databases can optimize many subquery patterns effectively. [Source]

What is a subquery in SQL and why is it useful?

A subquery is a SELECT statement nested inside another SQL query. It's useful for performing multi-step data operations-like filtering based on an aggregate value or checking for existence-without creating physical temporary tables, thereby simplifying complex logic in a single query.

What is the difference between a subquery and a JOIN?

A subquery retrieves data to be used within the main query for filtering or calculation, often in stages. A JOIN combines rows from two or more tables based on a related column. JOINs are generally more efficient for merging table data, while subqueries are better suited for conditional logic based on aggregated or derived data.

When should I use a correlated subquery?

Use a correlated subquery when your inner query needs to reference a column from the outer query, typically for row-by-row comparisons. A common example is finding employees who earn more than the average salary in their own department. Use them cautiously as they can be slower due to repeated execution for each outer row.

Are subqueries bad for performance compared to CTEs?

Not inherently. Performance depends on the database optimizer and query structure. CTEs primarily improve readability and organization. According to SQL for Data Science: Subqueries and Joins, modern database engines are adept at optimizing subqueries, but it's always wise to profile your queries with tools like EXPLAIN.

How can I rewrite complex spreadsheet logic as SQL subqueries?

Map common spreadsheet operations to SQL: replace VLOOKUPs with JOINs or subqueries using IN, translate filter criteria on summary cells to WHERE clauses with aggregate subqueries, and convert multi-sheet analyses into a series of nested SELECT statements or CTEs. This approach brings the power and automation of SQL to your spreadsheet workflows.

Eoin McMillan

Founder, CEO @ Sourcetable

The Sourcetable team is dedicated to helping analysts, operators, and finance teams work smarter with AI-powered spreadsheets.

Share this article

Ready to get started?

Experience the best AI data workbench on the planet.

Get Started

Drop CSV