Essential SQL Clauses and Order of Execution
Learn about the essential SQL clauses and the Order of Execution - One of the least known concepts that help you really grasp the idea behind the SQL queries.
SQL Order of Execution
I always tell my mentees that the fundamentals are the key to success. One of the most fundamental concepts in SQL is the Order of Execution. When writing SQL, we always write clauses in a specific order but guess what, SQL engine executes these clauses in a completely different way.
The Real Order
FROM
The starting point is selecting the table(s) from which data will be retrieved. This sets the scope for the query.
SELECT * FROM employees
-- Selects all data FROM the 'employees' table.
JOIN
If your query includes multiple tables, the next step is joining them. Joins combine rows from two or more tables based on a related column.
SELECT *
FROM employees
JOIN departments ON employees.dept_id = departments.id
-- Joins the 'employees' table with the 'departments' table where there is a matching 'dept_id'.
WHERE
The WHERE clause filters rows based on specified conditions. This step reduces the data set to only the relevant rows.
SELECT *
FROM employees
WHERE salary > 50000
-- Filters employees with salaries greater than 50,000.
GROUP BY
When you need to aggregate data, the GROUP BY clause groups rows sharing a property so aggregate functions can be applied.
SELECT dept_id
, COUNT(*)
FROM employees
GROUP BY dept_id
-- This groups employees by 'dept_id' and counts the number of employees in each department.
HAVING
Similar to WHERE, but applied to groups. HAVING filters groups based on aggregate properties.
SELECT dept_id
, COUNT(*)
FROM employees
GROUP BY dept_id
HAVING COUNT(*) > 10
-- Filters departments with more than 10 employees.
SELECT
Now, you specify the columns to return in the result. This step projects the required data.
SELECT name
, salary
FROM employees
-- Selects the 'name' and 'salary' columns from the 'employees' table.
DISTINCT
The DISTINCT clause is used to remove duplicate rows from the result set. It is applied after the SELECT clause has retrieved the columns.
SELECT DISTINCT dept_id
FROM employees
-- Retrieves unique department IDs from the 'employees' table.
QUALIFY
The QUALIFY clause is used to filter rows after window functions have been applied. This is particularly useful for advanced analytics and ensures that only the desired rows are included in the final result.
SELECT name
, salary
, ROW_NUMBER() OVER (PARTITION BY dept_id
ORDER BY salary DESC) AS rank
FROM employees
QUALIFY rank = 1
-- Selects the highest-paid employee in each department.
ORDER BY
Sorting the result set is done using the ORDER BY clause. It orders the data based on specified columns.
SELECT name
, salary
FROM employees
ORDER BY salary DESC
-- Orders employees by salary in descending order.
LIMIT
Finally, the LIMIT clause restricts the number of rows returned.
SELECT name
, salary
FROM employees
ORDER BY salary DESC
LIMIT 10
-- Returns the top 10 highest-paid employees.
Conclusion
Understanding the order of execution in SQL allows you to write correct and efficient queries. By applying filters, joins, and aggregations in the right order, you can minimize the amount of data processed at each stage, use indexes effectively, and ensure that your queries run as fast as possible.
Have questions or need further clarification? Leave a comment below or reach out directly.
✅ Thank you for reading my article on SA Space! I welcome any questions, comments, or suggestions you may have.
Keep Knowledge Flowing by following me for more content on Solutions Architecture, System Design, Data Engineering, Business Analysis, and more. Your engagement is appreciated. 🚀