T-SQL : Slicing time periods by related time periods
2
votes
1
answer
634
views
Consider the following table that gives the validity dates for which an
Item
is associated to something called a Demand_Unit
, and a certain Market
:
| Item_ID | Start_Date | End_Date | Demand_Unit | Market |
|---------|------------|------------|-------------|----------|
| X | 2020-06-01 | 2020-09-30 | A | GREECE |
| X | 2020-11-01 | 2021-01-01 | A | BELGIUM |
| X | 2021-01-01 | 2023-12-31 | A | USA |
| Y | 2021-01-01 | 2021-02-01 | B | PORTUGAL |
| Y | 2021-02-01 | 2021-06-07 | B | FRANCE |
| Y | 2021-06-07 | 2022-01-03 | B | ITALY |
| Y | 2022-01-03 | 2023-12-31 | B | SPAIN |
| Y | 2024-01-01 | 2025-12-31 | B | UKRAINE |
| Z | 2021-05-01 | 2021-06-30 | C | USA |
| W | 2021-02-01 | 2021-05-30 | D | SWEDEN |
| W | 2023-12-31 | 2025-05-30 | D | DANEMARK |
| U | 2021-01-04 | 2022-02-07 | E | TUNISIA |
| U | 2022-02-07 | 2025-01-05 | (null) | MOROCCO |
> Important note:
> You can never have overlapping dates because basically when one Market closes another opens. But it is possible to have non-contiguous periods (like for Item
X for example)
Now consider the following table that gives the Classification
s of a Demand_Unit
throughout time:
| Demand_Unit | Start_Date | End_Date | Classification |
|-------------|------------|------------|----------------|
| A | 2021-01-01 | 2021-04-05 | N+ |
| A | 2021-04-05 | 2023-12-31 | K- |
| B | 2021-03-08 | 2021-07-26 | N+ |
| B | 2021-07-26 | 2022-11-30 | L- |
| C | 2021-01-01 | 2023-12-31 | N- |
| D | 2021-06-01 | 2023-12-31 | K+ |
| E | 2021-08-02 | 2022-01-03 | N+ |
| E | 2022-01-03 | 2025-01-06 | C- |
> Important note:
> You can only have contiguous periods here. This table is extracted from a Slowly changing dimension.
The goal is to 'slice' the first table to associate to each Item
its Classification
, while taking into account the time periods. The result should look like this:
| Item_ID | Demand_Unit | Market | Item_Classification_Dt_Begin | Item_Classification_Dt_End | Item_Classification |
|---------|-------------|----------|------------------------------|----------------------------|---------------------|
| U | E | TUNISIA | 2021-01-04 | 2021-08-02 | (null) |
| U | E | TUNISIA | 2021-08-02 | 2022-01-03 | N+ |
| U | E | TUNISIA | 2022-01-03 | 2022-02-07 | C- |
| U | (null) | MOROCCO | 2022-02-07 | 2025-01-05 | (null) |
| W | D | SWEDEN | 2021-02-01 | 2021-05-30 | (null) |
| W | D | DANEMARK | 2023-12-31 | 2023-12-31 | K+ |
| W | D | DANEMARK | 2023-12-31 | 2025-05-30 | (null) |
| X | A | GREECE | 2020-06-01 | 2020-09-30 | (null) |
| X | A | BELGIUM | 2020-11-01 | 2021-01-01 | (null) |
| X | A | USA | 2021-01-01 | 2021-04-05 | N+ |
| X | A | USA | 2021-04-05 | 2023-12-31 | K- |
| Y | B | PORTUGAL | 2021-01-01 | 2021-02-01 | (null) |
| Y | B | FRANCE | 2021-02-01 | 2021-03-08 | (null) |
| Y | B | FRANCE | 2021-03-08 | 2021-06-07 | N+ |
| Y | B | ITALY | 2021-06-07 | 2021-07-26 | N+ |
| Y | B | ITALY | 2021-07-26 | 2022-01-03 | L- |
| Y | B | SPAIN | 2022-01-03 | 2022-11-30 | L- |
| Y | B | SPAIN | 2022-11-30 | 2023-12-31 | (null) |
| Y | B | UKRAINE | 2024-01-01 | 2025-12-31 | (null) |
| Z | C | USA | 2021-05-01 | 2021-06-30 | (null) |
| Z | C | USA | 2021-05-01 | 2021-06-30 | N- |
These are some examples with 'tricky' cases (Items X and Y) and a 'simple' case (Item Z) which has its dates completely included in the classifications period, + other cases to try and test possibilities I could think of...
I'm working on a solution that uses LAG()
LEAD()
and a three-part UNION
, but I'm afraid I'm forgetting something and the original table has hundreds of thousands of rows.
I'm not trying to have someone do my work here, I would just like your opinions on how should this be properly done.
Here's my solution so far, it looks like it works but I don't find it elegant and I'm sure the second and third UNION
block can be simplified, but I just can't see straight anymore.
;WITH CLASSIFICATIONS_LAG_LEAD AS
(
SELECT
Demand_Unit
,Start_Date
,End_Date
,Classification
,LAG(End_Date) OVER (PARTITION BY Demand_Unit ORDER BY Start_Date) AS LAGD
,LEAD(Start_Date) OVER (PARTITION BY Demand_Unit ORDER BY Start_Date ) AS LEADD
FROM Classifications
)
-- This first block handles Item periods that are not covered by classifications periods
-- and/or the NULL classif portion of Item periods that are partially covered by classifications periods
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,CASE
WHEN T2.End_Date > T1.Start_Date
AND T2.End_Date T1.Start_Date
AND T2.Start_Date T2.End_Date
)
)
UNION
-- The following block handles Items periods that are (at least partially)
-- contained within the Classification period, and checks which DT_BEGIN to use
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,CASE
WHEN T1.Start_Date T2.Start_Date
UNION
-- The following block handles Items periods that are (at least partially)
-- contained within the Classification period, and checks which DT_END to use
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,T1.Start_Date AS Item_Classification_Dt_Begin
,CASE
WHEN T1.End_Date > T2.End_Date THEN T2.End_Date
ELSE T1.End_Date
END AS Item_Classification_Dt_End
,T2.Classification AS Item_Classification
FROM Items T1
INNER JOIN Classifications T2
ON T1.Demand_Unit = T2.Demand_Unit
AND T1.Start_Date = T2.Start_Date
ORDER BY Item_ID, 4
Here is a SQLFiddle
Many thanks in advance
Edit : last minute fixes in my solution
Asked by Yass T
(33 rep)
Oct 27, 2021, 04:06 PM
Last activity: Jul 23, 2023, 12:02 AM
Last activity: Jul 23, 2023, 12:02 AM