Sample Header Ad - 728x90

How to eliminate duplicates made by recursive CTE

0 votes
1 answer
1665 views
I have following schema in PostgreSQL 12 (objects and their relations are versioned by date ranges): CREATE TABLE tmp_deps ( id bigint, code text, name text, start_date date, end_date date ); CREATE TABLE tmp_dep_parents ( department_id bigint, parent_id bigint, start_date date, end_date date ); INSERT INTO tmp_deps (id, code, name, start_date, end_date) VALUES (1, '11111111', 'Root Dep', '1970-01-01', '9999-12-31'), (2, '11111112', 'Sub Root Dep', '1970-01-01', '9999-12-31'), (3, '11111113', 'Dep 1', '1970-01-01', '2021-02-23'), (3, '11111113', 'Dep 1', '2021-02-24', '9999-12-31'), (4, '11111114', 'Dep 2', '1970-01-01', '2020-01-01'), (4, '11111114', 'Dep 2', '2020-01-02', '9999-12-31'), (5, '11111115', 'Sub Dep 1', '1970-01-01', '9999-12-31'), (6, '11111116', 'Sub Dep 2', '1970-01-01', '9999-12-31'); INSERT INTO tmp_dep_parents (department_id, parent_id, start_date, end_date) VALUES (2, 1, '1970-01-01', '9999-12-31'), (3, 2, '1970-01-01', '2005-12-31'), (3, 1, '2006-01-01', '2006-12-31'), (3, 2, '2007-01-01', '9999-12-31'), (4, 2, '1970-01-01', '2010-05-01'), (4, 3, '2010-05-02', '2010-05-03'), (4, 2, '2010-05-04', '9999-12-31'), (5, 3, '1970-01-01', '9999-12-31'), (6, 4, '1970-01-01', '9999-12-31'); > objects and their relations are versioned by date ranges It means that object with id=1 can be named as "Root Department" from 1 Jan 2000 to 31 Dec 2021, and can be named as "Super Department" from 1 Jan 2022 to 31 Dec 9999. Object with id=3 can belong to object with id=2 from 1 Jan 1970 to 15 May 2003 and can belong to object with id=1 from 16 May 2003 to 31 Dec 9999 --- The hierarchy of objects can be changed at any time. I am using recursive CTE to create MATERIALIZED VIEW and I noticed duplicates when I filter objects and their relations by date. And also I need an ability to view historical data too that is the reason why I can't just filter objects and their relations by date inside CTE. My CTE is: WITH RECURSIVE deps AS ( SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, 1::int AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep WHERE dep.code = '11111111' UNION SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, s.depth + 1 AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep INNER JOIN deps s ON dep.parent_id = s.id ), all_deps AS ( SELECT dep.*, dp.parent_id, dp.start_date AS dep_par_start_date, dp.end_date AS dep_par_end_date FROM tmp_deps dep LEFT JOIN tmp_dep_parents dp ON dp.department_id = dep.id ) SELECT * FROM deps The problem with duplicates occurs when I add the "depth" output in CTE. And as I understand it correctly, the duplicates are hidden when I remove the "depth" output from CTE, but I can still get wrong data. My CTE contains all versions of objects and their relations. And I don't understand how to get the CTE to follow the correct hierarchy path. --- I want to see the hierarchy of departments and the departments themselves as of 15 May 2020 or 9 June 2021 or any other date, that is why I put all data states (versions) to CTE. Materialized view is created from RCTE: CREATE MATERIALIZED VIEW departments_mat_view AS WITH RECURSIVE deps AS ( SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, 1::int AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep WHERE dep.code = '11111111' UNION SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, s.depth + 1 AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep INNER JOIN deps s ON dep.parent_id = s.id ), all_deps AS ( SELECT dep.*, dp.parent_id, dp.start_date AS dep_par_start_date, dp.end_date AS dep_par_end_date FROM tmp_deps dep LEFT JOIN tmp_dep_parents dp ON dp.department_id = dep.id ) SELECT * FROM deps And here is select from MATERIALIZED VIEW: SELECT * FROM departments_mat_view WHERE dep_start_date = '2021-06-09' AND (dep_par_start_date IS NULL OR dep_par_start_date = '2021-06-09') Here is output from CTE: "id" "dep_start_date" "dep_end_date" "name" "code" "depth" "parent_id" "dep_par_start_date" "dep_par_end_date" 1 "1970-01-01" "9999-12-31" "Root Dep" 11111111 1 NULL NULL NULL 2 "1970-01-01" "9999-12-31" "Sub Root Dep" 11111112 2 1 "1970-01-01" "9999-12-31" 3 "2021-02-24" "9999-12-31" "Dep 1" 11111113 3 2 "2007-01-01" "9999-12-31" 4 "2020-01-02" "9999-12-31" "Dep 2" 11111114 3 2 "2010-05-04" "9999-12-31" 5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 3 3 "1970-01-01" "9999-12-31" 5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 4 3 "1970-01-01" "9999-12-31" 6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 4 4 "1970-01-01" "9999-12-31" 6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 5 4 "1970-01-01" "9999-12-31" Desired output is: "id" "dep_start_date" "dep_end_date" "name" "code" "depth" "parent_id" "dep_par_start_date" "dep_par_end_date" 1 "1970-01-01" "9999-12-31" "Root Dep" 11111111 1 NULL NULL NULL 2 "1970-01-01" "9999-12-31" "Sub Root Dep" 11111112 2 1 "1970-01-01" "9999-12-31" 3 "2021-02-24" "9999-12-31" "Dep 1" 11111113 3 2 "2007-01-01" "9999-12-31" 4 "2020-01-02" "9999-12-31" "Dep 2" 11111114 3 2 "2010-05-04" "9999-12-31" 5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 4 3 "1970-01-01" "9999-12-31" 6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 4 4 "1970-01-01" "9999-12-31"
Asked by Dmitry K. (143 rep)
Jun 9, 2021, 08:16 PM
Last activity: Jun 11, 2021, 11:14 AM