Essential SQL Clauses and Order of Execution

Learn about the essential SQL clauses and the Order of Execution - One of the least known concepts that help you really grasp the idea behind the SQL queries.

May 21, 2024

SQL Order of Execution

I always tell my mentees that the fundamentals are the key to success. One of the most fundamental concepts in SQL is the Order of Execution. When writing SQL, we always write clauses in a specific order but guess what, SQL engine executes these clauses in a completely different way.

The Real Order

FROM

The starting point is selecting the table(s) from which data will be retrieved. This sets the scope for the query.

SELECT * FROM employees

-- Selects all data FROM the 'employees' table.

JOIN

If your query includes multiple tables, the next step is joining them. Joins combine rows from two or more tables based on a related column.

SELECT * 
  FROM employees 
  JOIN departments ON employees.dept_id = departments.id

-- Joins the 'employees' table with the 'departments' table where there is a matching 'dept_id'.

WHERE

The WHERE clause filters rows based on specified conditions. This step reduces the data set to only the relevant rows.

SELECT * 
  FROM employees 
 WHERE salary > 50000

-- Filters employees with salaries greater than 50,000.

GROUP BY

When you need to aggregate data, the GROUP BY clause groups rows sharing a property so aggregate functions can be applied.

SELECT dept_id
     , COUNT(*) 
  FROM employees 
 GROUP BY dept_id

-- This groups employees by 'dept_id' and counts the number of employees in each department.

HAVING

Similar to WHERE, but applied to groups. HAVING filters groups based on aggregate properties.

SELECT dept_id
     , COUNT(*)
  FROM employees 
 GROUP BY dept_id
HAVING COUNT(*) > 10

-- Filters departments with more than 10 employees.

SELECT

Now, you specify the columns to return in the result. This step projects the required data.

SELECT name
     , salary
  FROM employees

-- Selects the 'name' and 'salary' columns from the 'employees' table.

DISTINCT

The DISTINCT clause is used to remove duplicate rows from the result set. It is applied after the SELECT clause has retrieved the columns.

SELECT DISTINCT dept_id 
  FROM employees 

-- Retrieves unique department IDs from the 'employees' table.

QUALIFY

The QUALIFY clause is used to filter rows after window functions have been applied. This is particularly useful for advanced analytics and ensures that only the desired rows are included in the final result.

SELECT name
     , salary
     , ROW_NUMBER() OVER (PARTITION BY dept_id 
                          ORDER BY salary DESC) AS rank
  FROM employees 
QUALIFY rank = 1

-- Selects the highest-paid employee in each department.

ORDER BY

Sorting the result set is done using the ORDER BY clause. It orders the data based on specified columns.

SELECT name
     , salary
  FROM employees
 ORDER BY salary DESC 

-- Orders employees by salary in descending order.

LIMIT

Finally, the LIMIT clause restricts the number of rows returned.

SELECT name
     , salary
  FROM employees
 ORDER BY salary DESC
 LIMIT 10 

-- Returns the top 10 highest-paid employees.

Conclusion

Understanding the order of execution in SQL allows you to write correct and efficient queries. By applying filters, joins, and aggregations in the right order, you can minimize the amount of data processed at each stage, use indexes effectively, and ensure that your queries run as fast as possible.

Have questions or need further clarification? Leave a comment below or reach out directly.

✅ Thank you for reading my article on SA Space! I welcome any questions, comments, or suggestions you may have.

Keep Knowledge Flowing by following me for more content on Solutions Architecture, System Design, Data Engineering, Business Analysis, and more. Your engagement is appreciated. 🚀

🔔 You can also follow my work on LinkedIn | Substack | X