Joining with generate_series for missing dates with derived date column
-1
votes
1
answer
105
views
There are a few similar questions to this (e.g.https://dba.stackexchange.com/questions/72419/filling-in-missing-dates-in-record-set-from-generate-series) but the solution does not appear to work in my case... Essentially I am trying to generate zero entries for dates not present in a series but I suspect the issue is that I am having to extract the date value from a timestamp? I've used SQL for years but very new to postgres - impressed so far, though.. Have tried both a left and right join here but no joy...
Here is a little test case (are sql fiddles still encouraged?):
-- temp test table - works as expected
WITH incomplete_data(payment_date, payment_id) AS (
VALUES
('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
,('2024-09-06 12:26:57.509429+01', 'uuid02')
,('2024-09-07 07:26:57.509429+01', 'uuid03')
,('2024-09-08 10:26:57.509429+01', 'uuid05')
,('2024-09-08 12:26:57.509429+01', 'uuid08')
,('2024-09-08 14:26:57.509429+01', 'uuid11')
,('2024-09-10 09:26:57.509429+01', 'uuid23')
)
select * from incomplete_data;
-- generated dates - work as expected
select * FROM (
SELECT generate_series(timestamp '2024-01-01'
, timestamp '2024-01-01' + interval '1 year - 1 day'
, interval '1 day')::date
) d(day)
;
-- join - failing to do what I was hoping..
WITH incomplete_data(payment_date, payment_id) AS (
VALUES
('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
,('2024-09-06 12:26:57.509429+01', 'uuid02')
,('2024-09-07 07:26:57.509429+01', 'uuid03')
,('2024-09-08 10:26:57.509429+01', 'uuid05')
,('2024-09-08 12:26:57.509429+01', 'uuid08')
,('2024-09-08 14:26:57.509429+01', 'uuid11')
,('2024-09-10 09:26:57.509429+01', 'uuid23')
)
select count(payment_id), date_trunc('day',payment_date)::date as time
FROM (
SELECT generate_series(timestamp '2024-01-01'
, timestamp '2024-01-01' + interval '1 year - 1 day'
, interval '1 day')::date
) d(day)
right JOIN incomplete_data p ON date_trunc('day',payment_date) = d.day
where payment_date BETWEEN '2024-09-01T12:55:36.824Z' AND '2024-09-30T13:55:36.824Z'
GROUP BY date_trunc('day',payment_date)
ORDER BY date_trunc('day',payment_date);
count | time
-------+------------
2 | 2024-09-06
1 | 2024-09-07
3 | 2024-09-08
1 | 2024-09-10
(4 rows)
I was hoping to get a row for every day in the month with zeroes for unpopulated days. The background is that this is for populating a grafana query.
Can anyone suggest what I am doing wrong or am I failing to grasp a bigger issue here? My version is: PostgreSQL 15.9 (Debian 15.9-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
**UPDATE**
jjanes answer below helped clarify the sequence of joining and filtering for me - this is the required select:
WITH incomplete_data(payment_date, payment_id) AS (
VALUES
('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
,('2024-09-06 12:26:57.509429+01', 'uuid02')
,('2024-09-07 07:26:57.509429+01', 'uuid03')
,('2024-09-08 10:26:57.509429+01', 'uuid05')
,('2024-09-08 12:26:57.509429+01', 'uuid08')
,('2024-09-08 14:26:57.509429+01', 'uuid11')
,('2024-09-10 09:26:57.509429+01', 'uuid23')
)
select count(payment_id), d.day as time
FROM (
SELECT generate_series(timestamp '2024-01-01'
, timestamp '2024-01-01' + interval '1 year - 1 day'
, interval '1 day')::date
) d(day)
left JOIN incomplete_data p ON date_trunc('day',payment_date) = d.day
where d.day BETWEEN '2024-09-01T12:55:36.824Z' AND '2024-09-30T13:55:36.824Z'
GROUP BY d.day
ORDER BY d.day
;
Asked by cam
(1 rep)
Nov 25, 2024, 03:31 PM
Last activity: Nov 25, 2024, 06:12 PM
Last activity: Nov 25, 2024, 06:12 PM