Explore subqueries and nested selects in sql with practical guidance on features, use cases, and implementation strategies.
Eoin McMillan
January 30, 2026 • 9 min read
SQL subqueries are SELECT statements nested inside other queries. They let analysts filter, aggregate, or transform data in stages without creating temporary tables. This tutorial explains types of subqueries, shows practical analytics examples, and compares subqueries with joins and CTEs so you can choose the right pattern.
Sourcetable's AI data analyst is free to try. Sign up here.
A SQL subquery is a query nested within another SQL statement, such as SELECT, INSERT, UPDATE, or DELETE. It allows you to perform operations in steps, similar to using intermediate tables in a spreadsheet. For example, to find employees with salaries above the average, you can use: SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);. According to widely used SQL style guides, subqueries are a fundamental tool for complex data retrieval.
SQL subqueries are categorized by their return type and how they interact with the outer query:
Scalar Subqueries: Return a single value. Use in WHERE or SELECT clauses where a single value is expected, e.g., SELECT name FROM products WHERE price = (SELECT MAX(price) FROM products).
Row Subqueries: Return a single row with multiple columns. Useful for multi-column comparisons, though less common.
Table Subqueries: Return a result set that can be treated as a table. Often used with IN or EXISTS operators, e.g., SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE country = 'USA').
Correlated Subqueries: Reference columns from the outer query, executing once for each row. Powerful for row-by-row processing but can be performance-intensive.
SQL Subquery Types at a Glance
| Type | Returns | Common Use |
|---|---|---|
| Scalar | Single value | Comparisons in WHERE or SELECT clauses |
| Row | Single row | Multi-column condition checks |
| Table | Result set | Used with IN, EXISTS, or as a derived table |
| Correlated | Result set per row | Row-specific filtering or calculations |
Choosing between subqueries, JOINs, and Common Table Expressions (CTEs) depends on readability, performance, and the specific task:
Subqueries are ideal for simple, self-contained logic or when filtering based on aggregated data. However, deeply nested subqueries can reduce readability.
JOINs are best for combining data from multiple tables in a single step and are often more performant for large datasets.
CTEs improve readability for complex queries by breaking them into named, reusable parts. Research shows that analysts often overuse subqueries when simpler joins would suffice. Data indicates that modern databases can optimize many subquery patterns effectively.
Subqueries vs JOINs vs CTEs: Key Differences
| Feature | Subqueries | JOINs | CTEs |
|---|---|---|---|
| Readability | Can be low if nested deeply | High for straightforward combinations | Very high with modular, named parts |
| Performance | Varies; correlated subqueries can be slow | Generally fast with proper indexes | Similar to subqueries; depends on optimization |
| Primary Use Case | Staged filtering, aggregation, or existence checks | Merging rows from multiple tables | Simplifying and organizing complex multi-step queries |
| Temporary Data | Implicit, within query | Not applicable | Explicit, defined with WITH clause |
Subqueries excel in everyday analytics scenarios that spreadsheet users will recognize:
Filtering with Aggregates: Find records above or below a group average, like identifying sales reps outperforming their regional average.
Existence Checks: Use EXISTS to find customers who have placed at least one order, similar to a filter in a spreadsheet.
Derived Tables for Summary Data: Create temporary summaries, such as monthly sales totals, for use in larger analyses.
Data Validation and Integrity: Check values against a master list, akin to data validation rules in Excel.
Think of subqueries as dynamic, auto-updating lookup tables or advanced filter criteria that operate directly within your database.
Watch this video from Alex The Analyst for a visual walkthrough of subquery syntax and practical examples.
To ensure your subqueries run efficiently:
Minimize Correlated Subqueries: They execute row-by-row; rewrite with JOINs where possible.
Prefer EXISTS Over IN for Large Sets: EXISTS often stops after finding a match, making it faster.
Ensure Proper Indexing: Index foreign keys and columns used in WHERE clauses of the subquery.
Analyze Execution Plans: Use EXPLAIN or similar tools to see how the database processes your query.
Anti-patterns to avoid include unnecessary nesting and using subqueries for simple table joins. According to The Anatomy of SQL Subqueries: A Data Scientist's Guide, correlated subqueries should be used sparingly and tested for performance impact.
Once you've crafted your SQL subqueries, you can seamlessly import the results into Sourcetable for enhanced analysis. Sourcetable connects directly to your databases, allowing you to execute queries and load the data into a familiar spreadsheet interface.
Write and test your subquery in your database client or Sourcetable's SQL editor.
Execute the query and import the result set into a Sourcetable spreadsheet.
Use Sourcetable's AI-powered tools to clean the data, build charts, create financial models, and automate reports-multiplying your analytical productivity without leaving the spreadsheet environment.
A subquery is a SELECT statement nested inside another SQL query. It's useful for performing multi-step data operations-like filtering based on an aggregate value or checking for existence-without creating physical temporary tables, thereby simplifying complex logic in a single query.
A subquery retrieves data to be used within the main query for filtering or calculation, often in stages. A JOIN combines rows from two or more tables based on a related column. JOINs are generally more efficient for merging table data, while subqueries are better suited for conditional logic based on aggregated or derived data.
Use a correlated subquery when your inner query needs to reference a column from the outer query, typically for row-by-row comparisons. A common example is finding employees who earn more than the average salary in their own department. Use them cautiously as they can be slower due to repeated execution for each outer row.
Not inherently. Performance depends on the database optimizer and query structure. CTEs primarily improve readability and organization. According to SQL for Data Science: Subqueries and Joins, modern database engines are adept at optimizing subqueries, but it's always wise to profile your queries with tools like EXPLAIN.
Map common spreadsheet operations to SQL: replace VLOOKUPs with JOINs or subqueries using IN, translate filter criteria on summary cells to WHERE clauses with aggregate subqueries, and convert multi-sheet analyses into a series of nested SELECT statements or CTEs. This approach brings the power and automation of SQL to your spreadsheet workflows.
Subqueries enable staged data processing without physical temporary tables.
Correlated subqueries can significantly impact performance and should be used judiciously.
JOINs are often more efficient for combining data from multiple related tables.
CTEs enhance query readability and maintenance for complex multi-step operations.
Proper indexing and query analysis are crucial for optimal subquery performance.