Joining with generate_series for missing dates with derived date column

-1 votes

1 answer

105 views

There are a few similar questions to this (e.g.https://dba.stackexchange.com/questions/72419/filling-in-missing-dates-in-record-set-from-generate-series) but the solution does not appear to work in my case... Essentially I am trying to generate zero entries for dates not present in a series but I suspect the issue is that I am having to extract the date value from a timestamp? I've used SQL for years but very new to postgres - impressed so far, though.. Have tried both a left and right join here but no joy... Here is a little test case (are sql fiddles still encouraged?):

-- temp test table - works as expected
WITH incomplete_data(payment_date, payment_id) AS (
   VALUES 
     ('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
    ,('2024-09-06 12:26:57.509429+01', 'uuid02')
    ,('2024-09-07 07:26:57.509429+01', 'uuid03')
    ,('2024-09-08 10:26:57.509429+01', 'uuid05')
    ,('2024-09-08 12:26:57.509429+01', 'uuid08')
    ,('2024-09-08 14:26:57.509429+01', 'uuid11')
    ,('2024-09-10 09:26:57.509429+01', 'uuid23')
   )
select * from incomplete_data;

-- generated dates - work as expected
select * FROM  (
   SELECT generate_series(timestamp '2024-01-01'
                        , timestamp '2024-01-01' + interval '1 year - 1 day'
                        , interval  '1 day')::date
   ) d(day)
;
   
-- join - failing to do what I was hoping..
WITH incomplete_data(payment_date, payment_id) AS (
   VALUES 
     ('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
    ,('2024-09-06 12:26:57.509429+01', 'uuid02')
    ,('2024-09-07 07:26:57.509429+01', 'uuid03')
    ,('2024-09-08 10:26:57.509429+01', 'uuid05')
    ,('2024-09-08 12:26:57.509429+01', 'uuid08')
    ,('2024-09-08 14:26:57.509429+01', 'uuid11')
    ,('2024-09-10 09:26:57.509429+01', 'uuid23')
   )
select count(payment_id), date_trunc('day',payment_date)::date as time
FROM  (
   SELECT generate_series(timestamp '2024-01-01'
                        , timestamp '2024-01-01' + interval '1 year - 1 day'
                        , interval  '1 day')::date
   ) d(day)
right  JOIN incomplete_data p ON date_trunc('day',payment_date) = d.day
where payment_date BETWEEN '2024-09-01T12:55:36.824Z' AND '2024-09-30T13:55:36.824Z'
GROUP  BY date_trunc('day',payment_date)
ORDER  BY date_trunc('day',payment_date);

 count |    time
-------+------------
     2 | 2024-09-06
     1 | 2024-09-07
     3 | 2024-09-08
     1 | 2024-09-10
(4 rows)

I was hoping to get a row for every day in the month with zeroes for unpopulated days. The background is that this is for populating a grafana query. Can anyone suggest what I am doing wrong or am I failing to grasp a bigger issue here? My version is: PostgreSQL 15.9 (Debian 15.9-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit **UPDATE** jjanes answer below helped clarify the sequence of joining and filtering for me - this is the required select:

WITH incomplete_data(payment_date, payment_id) AS (
   VALUES 
     ('2024-09-06 11:26:57.509429+01'::timestamp with time zone, 'uuid01')
    ,('2024-09-06 12:26:57.509429+01', 'uuid02')
    ,('2024-09-07 07:26:57.509429+01', 'uuid03')
    ,('2024-09-08 10:26:57.509429+01', 'uuid05')
    ,('2024-09-08 12:26:57.509429+01', 'uuid08')
    ,('2024-09-08 14:26:57.509429+01', 'uuid11')
    ,('2024-09-10 09:26:57.509429+01', 'uuid23')
   )
select count(payment_id), d.day as time
FROM  (
   SELECT generate_series(timestamp '2024-01-01'
                        , timestamp '2024-01-01' + interval '1 year - 1 day'
                        , interval  '1 day')::date
   ) d(day)
left JOIN incomplete_data p ON date_trunc('day',payment_date) = d.day
where d.day BETWEEN '2024-09-01T12:55:36.824Z' AND '2024-09-30T13:55:36.824Z'
GROUP  BY d.day
ORDER  BY d.day
;

Asked by cam (1 rep)

Nov 25, 2024, 03:31 PM
Last activity: Nov 25, 2024, 06:12 PM

Joining with generate_series for missing dates with derived date column

Related Questions