Sample Header Ad - 728x90

Aggregate SCD Type 2 data "as of" each day

1 vote
1 answer
249 views
## Problem When working with SCD type 2 data it's easy to see the state of a table "as of" a given point in time by using your date columns, eg: valid_from and valid_to. For example:
select * from table
where '2023-11-01' between valid_from and valid_to
You can then group by or use window functions to aggregate. But what if I want to do that repeatedly, for each date in a range (eg: daily)? I do not need to aggregate *across* these dates, just within each one. ## Example Let's say I have a table that tracks the quantity of people for each reservation_id. Both the quantity of people and reservation_status can change over time, as tracked by the table. Each reservation is attached to an event_id. event_date is included to help limit scope if needed (see assumptions below). | reservation_id | reservation_status | quantity | event_id | event_date | valid_from | valid_to | | -------------- | ------------------ | -------- | -------- | ---------- | ---------- | -------- | | 1 | active | 4 | 100 | 2023-05-25 | 2023-01-01 | 2023-01-02 | | 2 | active | 2 | 200 | 2024-01-07 | 2023-01-01 | 2023-01-03 | | 3 | active | 7 | 100 | 2023-05-25 | 2023-01-02 | 9999-12-31 | | 4 | active | 1 | 200 | 2024-01-07 | 2023-01-02 | 9999-12-31 | | 1 | active | 5 | 100 | 2023-05-25 | 2023-01-03 | 9999-12-31 | | 5 | active | 8 | 100 | 2023-05-25 | 2023-01-03 | 9999-12-31 | | 2 | cancelled | 2 | 200 | 2024-01-07 | 2023-01-04 | 9999-12-31 | | 6 | active | 3 | 100 | 2023-05-25 | 2023-01-06 | 9999-12-31 | PostgreSQL since BigQuery is more difficult to test – db-fiddle / SQL:
-sql
CREATE TABLE Reservations (
  "reservation_id" INTEGER,
  "reservation_status" VARCHAR(9),
  "quantity" INTEGER,
  "event_id" INTEGER,
  "event_date" DATE,
  "valid_from" DATE,
  "valid_to" DATE
);

INSERT INTO Reservations
  ("reservation_id", "reservation_status", "quantity", "event_id", "event_date", "valid_from", "valid_to")
VALUES
  ('1', 'active', '4', '100', '2023-05-25', '2023-01-01', '2023-01-02'),
  ('2', 'active', '2', '200', '2024-01-07', '2023-01-01', '2023-01-03'),
  ('3', 'active', '7', '100', '2023-05-25', '2023-01-02', '9999-12-31'),
  ('4', 'active', '1', '200', '2024-01-07', '2023-01-02', '9999-12-31'),
  ('1', 'active', '5', '100', '2023-05-25', '2023-01-03', '9999-12-31'),
  ('5', 'active', '8', '100', '2023-05-25', '2023-01-03', '9999-12-31'),
  ('2', 'cancelled', '2', '200', '2024-01-07', '2023-01-04', '9999-12-31'),
  ('6', 'active', '3', '100', '2023-05-25', '2023-01-06', '9999-12-31');
While this is ultimately for BigQuery, answer will be accepted in any dialect as long as it's somewhat generic. ##### Assumptions - "As of" dates can be a list or range based off valid_from min/max - valid_to of 9999-12-31 is the most recent data - All reservations for a given event will be between event_date - INTERVAL '2 years' and event_date. This doesn't change anything for this example, but maybe is useful for scaling(?) ### Desired output I'd like to know the sum of quantity grouped by event_id and reservation_status *as of* each interval (day). | as_of_date | event_id | reservation_status | sum_quantity | | ---------- | -------- | ------------------ | ------------ | | 2023-01-01 | 100 | active | 4 | | 2023-01-01 | 200 | active | 2 | | 2023-01-02 | 100 | active | 11 | | 2023-01-02 | 200 | active | 2 | | 2023-01-03 | 100 | active | 20 | | 2023-01-03 | 200 | active | 3 | | 2023-01-04 | 100 | active | 20 | | 2023-01-04 | 200 | active | 1 | | 2023-01-04 | 200 | cancelled | 2 | | 2023-01-06 | 100 | active | 23 | *Rough estimate of row values. Will be different if using complete range of dates.* I essentially want to do the following:
-sql
/* Invalid SQL, just for conceptual purposes */

-- Given a list of dates, for each "date":
select
  event_id,
  reservation_status,
  sum(quantity)
from table
where {{date}} between valid_from and valid_to
group by
  event_id,
  reservation_status
I believe this can be done using procedural language, such as a for loop, but I feel like I am overthinking that, and just having trouble combining simpler concepts.
Asked by camtech (143 rep)
Nov 28, 2023, 05:58 AM
Last activity: Nov 29, 2023, 08:03 PM