SQL Practical Example: Calculating Number of Consecutive Days
Real-world application of the Gaps and Islands problem in SQL.
Calculating the Number of Consecutive Days
Sometimes, data analysis gets a bit tricky, especially when you need to group consecutive events, like logins on consecutive days. Let's walk through a common problem and solution: identifying consecutive login days for each customer.
Problem Description
You have a table called
CUSTOMER_LOGINS
with two columns:
customer_id: Identifier for each customer.
login_date: The date when the customer logged in.
Calculate the number of consecutive login dates for each customer. The query should return:
customer_id
interval_start (start of consecutive login period)
interval_end (end of consecutive login period)
consecutive_days (total number of consecutive days)
Create Test Table
Let’s create a test table CUSTOMER_LOGINS
for our problem:
-- Create the CUSTOMER_LOGINS table
CREATE TABLE CUSTOMER_LOGINS (
customer_id INT,
login_date DATE
);
-- Insert some sample data
INSERT INTO CUSTOMER_LOGINS (customer_id, login_date)
VALUES
(1, '2022-01-09'), (1, '2022-01-10'),
(1, '2022-01-11'), (1, '2022-01-18'),
(1, '2022-01-19'), (1, '2022-01-20'),
(1, '2022-01-21'), (2, '2022-01-09'),
(2, '2022-10-10'), (2, '2022-10-11');
---
SELECT *
FROM CUSTOMER_LOGINS;
| customer_id | login_date |
+ ----------- + ------------------------ +
| 1 | 2022-01-09T00:00:00.000Z |
| 1 | 2022-01-10T00:00:00.000Z |
| 1 | 2022-01-11T00:00:00.000Z |
| 1 | 2022-01-18T00:00:00.000Z |
| 1 | 2022-01-19T00:00:00.000Z |
| 1 | 2022-01-20T00:00:00.000Z |
| 1 | 2022-01-21T00:00:00.000Z |
| 2 | 2022-01-09T00:00:00.000Z |
| 2 | 2022-10-10T00:00:00.000Z |
| 2 | 2022-10-11T00:00:00.000Z |
Solution
Step I: Rank Dates
At first sight, it's not easy to guess how should we approach this problem. But if we think about the intermediary output, we probably need a column to group consecutive dates. Let's rank the date column and see what it gives us.
We’ll use DENSE_RANK()
to rank login dates and group them by customer.
SELECT
customer_id,
login_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY login_date) AS rn
FROM CUSTOMER_LOGINS;
| customer_id | login_date | rn |
+ ----------- + ------------------------ + --- +
| 1 | 2022-01-09T00:00:00.000Z | 1 |
| 1 | 2022-01-10T00:00:00.000Z | 2 |
| 1 | 2022-01-11T00:00:00.000Z | 3 |
| 1 | 2022-01-18T00:00:00.000Z | 4 |
| 1 | 2022-01-19T00:00:00.000Z | 5 |
| 1 | 2022-01-20T00:00:00.000Z | 6 |
| 1 | 2022-01-21T00:00:00.000Z | 7 |
| 2 | 2022-01-09T00:00:00.000Z | 1 |
| 2 | 2022-10-10T00:00:00.000Z | 2 |
| 2 | 2022-10-11T00:00:00.000Z | 3 |
Step II: Subtract to Get Groups
Now, we’ll subtract the rank from the login date. If you think for a minute, you will notice that by subtracting rn from login_date we get a unique group for consecutive dates.
WITH ranked_logins AS (
SELECT
customer_id,
login_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY login_date) AS rn
FROM CUSTOMER_LOGINS
)
SELECT
customer_id,
login_date,
login_date - INTERVAL '1' DAY * rn AS grp_date
FROM ranked_logins;
| customer_id | login_date | grp_date |
| ----------- | ------------------------ | ------------------------ |
| 1 | 2022-01-09T00:00:00.000Z | 2022-01-08T00:00:00.000Z |
| 1 | 2022-01-10T00:00:00.000Z | 2022-01-08T00:00:00.000Z |
| 1 | 2022-01-11T00:00:00.000Z | 2022-01-08T00:00:00.000Z |
| 1 | 2022-01-18T00:00:00.000Z | 2022-01-14T00:00:00.000Z |
| 1 | 2022-01-19T00:00:00.000Z | 2022-01-14T00:00:00.000Z |
| 1 | 2022-01-20T00:00:00.000Z | 2022-01-14T00:00:00.000Z |
| 1 | 2022-01-21T00:00:00.000Z | 2022-01-14T00:00:00.000Z |
| 2 | 2022-01-09T00:00:00.000Z | 2022-01-08T00:00:00.000Z |
| 2 | 2022-10-10T00:00:00.000Z | 2022-10-08T00:00:00.000Z |
| 2 | 2022-10-11T00:00:00.000Z | 2022-10-08T00:00:00.000Z |
Step III: Find Start and End Dates
The next step is to find the minimum and maximum dates for each grp_date, it will give us the interval_start_date
and interval_end_date
.
WITH ranked_logins AS (
SELECT
customer_id,
login_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY login_date) AS rn
FROM CUSTOMER_LOGINS
),
date_groups AS (
SELECT
customer_id,
login_date,
login_date - INTERVAL '1' DAY * rn AS grp_date
FROM ranked_logins
)
SELECT
customer_id,
MIN(login_date) AS interval_start,
MAX(login_date) AS interval_end
FROM date_groups
GROUP BY customer_id, grp_date
ORDER BY customer_id, interval_start;
| customer_id | interval_start | interval_end |
| ----------- | ------------------------ | ------------------------ |
| 1 | 2022-01-09T00:00:00.000Z | 2022-01-11T00:00:00.000Z |
| 1 | 2022-01-18T00:00:00.000Z | 2022-01-21T00:00:00.000Z |
| 2 | 2022-01-09T00:00:00.000Z | 2022-01-09T00:00:00.000Z |
| 2 | 2022-10-10T00:00:00.000Z | 2022-10-11T00:00:00.000Z |
Step IV: Calculate Consecutive Days
The last step is trivial, we calculate the number of consecutive days by taking the difference between the interval_end_date
and interval_start_date
:
WITH ranked_logins AS (
SELECT
customer_id,
login_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY login_date) AS rn
FROM CUSTOMER_LOGINS
),
date_groups AS (
SELECT
customer_id,
login_date,
login_date - INTERVAL '1' DAY * rn AS grp_date
FROM ranked_logins
)
SELECT
customer_id,
MIN(login_date) AS interval_start,
MAX(login_date) AS interval_end,
1 + MAX(login_date) - MIN(login_date) AS consecutive_days
FROM date_groups
GROUP BY customer_id, grp_date
ORDER BY customer_id, interval_start;
| customer_id | interval_start | interval_end | consecutive_days |
| ----------- | ------------------------ | ------------------------ | ---------------- |
| 1 | 2022-01-09T00:00:00.000Z | 2022-01-11T00:00:00.000Z | 3 |
| 1 | 2022-01-18T00:00:00.000Z | 2022-01-21T00:00:00.000Z | 4 |
| 2 | 2022-01-09T00:00:00.000Z | 2022-01-09T00:00:00.000Z | 1 |
| 2 | 2022-10-10T00:00:00.000Z | 2022-10-11T00:00:00.000Z | 2 |
This approach shows how SQL can be used to handle Gaps and Islands problems. By understanding ranking and grouping strategies, you can solve similar problems for consecutive events or other time-based data.
I hope you liked the article. If you have questions or need further clarification, leave a comment below or reach out directly.
✅ Thank you for reading my article on SA Space! I welcome any questions, comments, or suggestions you may have.
Keep Knowledge Flowing by following me for more content on Solutions Architecture, System Design, Data Engineering, Business Analysis, and more. Your engagement is appreciated. 🚀
🔔 You can also follow my work on LinkedIn | Substack | My Resume