Sample Header Ad - 728x90

T-SQL : Slicing time periods by related time periods

2 votes
1 answer
634 views
Consider the following table that gives the validity dates for which an Item is associated to something called a Demand_Unit, and a certain Market : | Item_ID | Start_Date | End_Date | Demand_Unit | Market | |---------|------------|------------|-------------|----------| | X | 2020-06-01 | 2020-09-30 | A | GREECE | | X | 2020-11-01 | 2021-01-01 | A | BELGIUM | | X | 2021-01-01 | 2023-12-31 | A | USA | | Y | 2021-01-01 | 2021-02-01 | B | PORTUGAL | | Y | 2021-02-01 | 2021-06-07 | B | FRANCE | | Y | 2021-06-07 | 2022-01-03 | B | ITALY | | Y | 2022-01-03 | 2023-12-31 | B | SPAIN | | Y | 2024-01-01 | 2025-12-31 | B | UKRAINE | | Z | 2021-05-01 | 2021-06-30 | C | USA | | W | 2021-02-01 | 2021-05-30 | D | SWEDEN | | W | 2023-12-31 | 2025-05-30 | D | DANEMARK | | U | 2021-01-04 | 2022-02-07 | E | TUNISIA | | U | 2022-02-07 | 2025-01-05 | (null) | MOROCCO | > Important note: > You can never have overlapping dates because basically when one Market closes another opens. But it is possible to have non-contiguous periods (like for Item X for example) Now consider the following table that gives the Classifications of a Demand_Unit throughout time: | Demand_Unit | Start_Date | End_Date | Classification | |-------------|------------|------------|----------------| | A | 2021-01-01 | 2021-04-05 | N+ | | A | 2021-04-05 | 2023-12-31 | K- | | B | 2021-03-08 | 2021-07-26 | N+ | | B | 2021-07-26 | 2022-11-30 | L- | | C | 2021-01-01 | 2023-12-31 | N- | | D | 2021-06-01 | 2023-12-31 | K+ | | E | 2021-08-02 | 2022-01-03 | N+ | | E | 2022-01-03 | 2025-01-06 | C- | > Important note: > You can only have contiguous periods here. This table is extracted from a Slowly changing dimension. The goal is to 'slice' the first table to associate to each Item its Classification, while taking into account the time periods. The result should look like this: | Item_ID | Demand_Unit | Market | Item_Classification_Dt_Begin | Item_Classification_Dt_End | Item_Classification | |---------|-------------|----------|------------------------------|----------------------------|---------------------| | U | E | TUNISIA | 2021-01-04 | 2021-08-02 | (null) | | U | E | TUNISIA | 2021-08-02 | 2022-01-03 | N+ | | U | E | TUNISIA | 2022-01-03 | 2022-02-07 | C- | | U | (null) | MOROCCO | 2022-02-07 | 2025-01-05 | (null) | | W | D | SWEDEN | 2021-02-01 | 2021-05-30 | (null) | | W | D | DANEMARK | 2023-12-31 | 2023-12-31 | K+ | | W | D | DANEMARK | 2023-12-31 | 2025-05-30 | (null) | | X | A | GREECE | 2020-06-01 | 2020-09-30 | (null) | | X | A | BELGIUM | 2020-11-01 | 2021-01-01 | (null) | | X | A | USA | 2021-01-01 | 2021-04-05 | N+ | | X | A | USA | 2021-04-05 | 2023-12-31 | K- | | Y | B | PORTUGAL | 2021-01-01 | 2021-02-01 | (null) | | Y | B | FRANCE | 2021-02-01 | 2021-03-08 | (null) | | Y | B | FRANCE | 2021-03-08 | 2021-06-07 | N+ | | Y | B | ITALY | 2021-06-07 | 2021-07-26 | N+ | | Y | B | ITALY | 2021-07-26 | 2022-01-03 | L- | | Y | B | SPAIN | 2022-01-03 | 2022-11-30 | L- | | Y | B | SPAIN | 2022-11-30 | 2023-12-31 | (null) | | Y | B | UKRAINE | 2024-01-01 | 2025-12-31 | (null) | | Z | C | USA | 2021-05-01 | 2021-06-30 | (null) | | Z | C | USA | 2021-05-01 | 2021-06-30 | N- | These are some examples with 'tricky' cases (Items X and Y) and a 'simple' case (Item Z) which has its dates completely included in the classifications period, + other cases to try and test possibilities I could think of... I'm working on a solution that uses LAG() LEAD() and a three-part UNION, but I'm afraid I'm forgetting something and the original table has hundreds of thousands of rows. I'm not trying to have someone do my work here, I would just like your opinions on how should this be properly done. Here's my solution so far, it looks like it works but I don't find it elegant and I'm sure the second and third UNION block can be simplified, but I just can't see straight anymore. ;WITH CLASSIFICATIONS_LAG_LEAD AS ( SELECT Demand_Unit ,Start_Date ,End_Date ,Classification ,LAG(End_Date) OVER (PARTITION BY Demand_Unit ORDER BY Start_Date) AS LAGD ,LEAD(Start_Date) OVER (PARTITION BY Demand_Unit ORDER BY Start_Date ) AS LEADD FROM Classifications ) -- This first block handles Item periods that are not covered by classifications periods -- and/or the NULL classif portion of Item periods that are partially covered by classifications periods SELECT T1.Item_ID ,T1.Demand_Unit ,T1.Market ,CASE WHEN T2.End_Date > T1.Start_Date AND T2.End_Date T1.Start_Date AND T2.Start_Date T2.End_Date ) ) UNION -- The following block handles Items periods that are (at least partially) -- contained within the Classification period, and checks which DT_BEGIN to use SELECT T1.Item_ID ,T1.Demand_Unit ,T1.Market ,CASE WHEN T1.Start_Date T2.Start_Date UNION -- The following block handles Items periods that are (at least partially) -- contained within the Classification period, and checks which DT_END to use SELECT T1.Item_ID ,T1.Demand_Unit ,T1.Market ,T1.Start_Date AS Item_Classification_Dt_Begin ,CASE WHEN T1.End_Date > T2.End_Date THEN T2.End_Date ELSE T1.End_Date END AS Item_Classification_Dt_End ,T2.Classification AS Item_Classification FROM Items T1 INNER JOIN Classifications T2 ON T1.Demand_Unit = T2.Demand_Unit AND T1.Start_Date = T2.Start_Date ORDER BY Item_ID, 4 Here is a SQLFiddle Many thanks in advance Edit : last minute fixes in my solution
Asked by Yass T (33 rep)
Oct 27, 2021, 04:06 PM
Last activity: Jul 23, 2023, 12:02 AM