Sample Header Ad - 728x90

Optimal join for joining facts with scd-type-2 dimension for aggregation/reporting

1 vote
1 answer
865 views
I have a fact table and an scd-type-2 dimension table. I want to produce sales report by region and year. I have working solution with a query that joins them for reporting purposes. When I run the query in spark/databricks, it gives me a little warning at the bottom: Use range join optimization: This query has a join condition that can benefit from range join optimization. To improve performance, consider adding a range join hint. and points to this link . **Question:** Is there a more optimal way to query when I'm joining using a between condition (instead of = condition)? fact table:
create table sales
(name             string
,sale_date        date
,sold_amt         long);

insert into sales values
('John','2022-02-02',100),
('John','2022-03-03',100),
('John','2023-02-02',200),
('John','2023-03-03',200),
('Rick','2022-02-02',300),
('Rick','2023-02-02',400);
dimension table (scd-type-2)
create table employee_scd2
(name        string
,region      string
,start_date  date,
,end_date    date,
,is_current  boolean); -- unused, kept for completeness

insert into employee_scd2 values
('John','NAM',  '2010-01-01', '2022-12-31', false),
-- John transferred from NAM to APAC starting 2023
('John','APAC', '2023-01-01', '9999-01-01', true),
('Rick','NAM',  '2020-01-01', '9999-12-31', true);
sales report by region and year
select e.region,
       year(s.sale_date) as sale_year,
       SUM(s.sold_amt)   as sale_amt
from       sales s
left join  employee_scd2 e
  on     e.name       = s.name
  and    s.sale_date between e.start_date and e.end_date
group by e.region, year(s.sale_date);
--- I've read following and some more: * https://dba.stackexchange.com/questions/184639/join-fact-table-to-scd-type-2-how-to-write-query-sql-server * https://www.startdataengineering.com/post/how-to-join-fact-scd2-tables/ * How to implement Slowly Changing Dimensions (SCD2) Type 2 in Spark
Asked by Kashyap (145 rep)
Mar 7, 2023, 04:49 PM
Last activity: Jun 7, 2023, 05:43 PM