How to eliminate duplicates made by recursive CTE
0
votes
1
answer
1665
views
I have following schema in PostgreSQL 12 (objects and their relations are versioned by date ranges):
CREATE TABLE tmp_deps (
id bigint,
code text,
name text,
start_date date,
end_date date
);
CREATE TABLE tmp_dep_parents (
department_id bigint,
parent_id bigint,
start_date date,
end_date date
);
INSERT INTO tmp_deps (id, code, name, start_date, end_date)
VALUES
(1, '11111111', 'Root Dep', '1970-01-01', '9999-12-31'),
(2, '11111112', 'Sub Root Dep', '1970-01-01', '9999-12-31'),
(3, '11111113', 'Dep 1', '1970-01-01', '2021-02-23'),
(3, '11111113', 'Dep 1', '2021-02-24', '9999-12-31'),
(4, '11111114', 'Dep 2', '1970-01-01', '2020-01-01'),
(4, '11111114', 'Dep 2', '2020-01-02', '9999-12-31'),
(5, '11111115', 'Sub Dep 1', '1970-01-01', '9999-12-31'),
(6, '11111116', 'Sub Dep 2', '1970-01-01', '9999-12-31');
INSERT INTO tmp_dep_parents (department_id, parent_id, start_date, end_date)
VALUES
(2, 1, '1970-01-01', '9999-12-31'),
(3, 2, '1970-01-01', '2005-12-31'),
(3, 1, '2006-01-01', '2006-12-31'),
(3, 2, '2007-01-01', '9999-12-31'),
(4, 2, '1970-01-01', '2010-05-01'),
(4, 3, '2010-05-02', '2010-05-03'),
(4, 2, '2010-05-04', '9999-12-31'),
(5, 3, '1970-01-01', '9999-12-31'),
(6, 4, '1970-01-01', '9999-12-31');
> objects and their relations are versioned by date ranges
It means that object with id=1 can be named as "Root Department" from 1 Jan 2000 to 31 Dec 2021, and can be named as "Super Department" from 1 Jan 2022 to 31 Dec 9999.
Object with id=3 can belong to object with id=2 from 1 Jan 1970 to 15 May 2003 and can belong to object with id=1 from 16 May 2003 to 31 Dec 9999
---
The hierarchy of objects can be changed at any time.
I am using recursive CTE to create MATERIALIZED VIEW and I noticed duplicates when I filter objects and their relations by date. And also I need an ability to view historical data too that is the reason why I can't just filter objects and their relations by date inside CTE.
My CTE is:
WITH RECURSIVE deps AS (
SELECT
dep.id,
dep.start_date as dep_start_date,
dep.end_date as dep_end_date,
dep.name,
dep.code,
1::int AS "depth",
dep.parent_id,
dep.dep_par_start_date,
dep.dep_par_end_date
FROM all_deps dep
WHERE dep.code = '11111111'
UNION
SELECT
dep.id,
dep.start_date as dep_start_date,
dep.end_date as dep_end_date,
dep.name,
dep.code,
s.depth + 1 AS "depth",
dep.parent_id,
dep.dep_par_start_date,
dep.dep_par_end_date
FROM all_deps dep
INNER JOIN deps s ON dep.parent_id = s.id
),
all_deps AS (
SELECT
dep.*,
dp.parent_id,
dp.start_date AS dep_par_start_date,
dp.end_date AS dep_par_end_date
FROM tmp_deps dep
LEFT JOIN tmp_dep_parents dp ON dp.department_id = dep.id
)
SELECT * FROM deps
The problem with duplicates occurs when I add the "depth" output in CTE.
And as I understand it correctly, the duplicates are hidden when I remove the "depth" output from CTE, but I can still get wrong data.
My CTE contains all versions of objects and their relations.
And I don't understand how to get the CTE to follow the correct hierarchy path.
---
I want to see the hierarchy of departments and the departments themselves as of 15 May 2020 or 9 June 2021 or any other date, that is why I put all data states (versions) to CTE.
Materialized view is created from RCTE:
CREATE MATERIALIZED VIEW departments_mat_view AS
WITH RECURSIVE deps AS (
SELECT
dep.id,
dep.start_date as dep_start_date,
dep.end_date as dep_end_date,
dep.name,
dep.code,
1::int AS "depth",
dep.parent_id,
dep.dep_par_start_date,
dep.dep_par_end_date
FROM all_deps dep
WHERE dep.code = '11111111'
UNION
SELECT
dep.id,
dep.start_date as dep_start_date,
dep.end_date as dep_end_date,
dep.name,
dep.code,
s.depth + 1 AS "depth",
dep.parent_id,
dep.dep_par_start_date,
dep.dep_par_end_date
FROM all_deps dep
INNER JOIN deps s ON dep.parent_id = s.id
),
all_deps AS (
SELECT
dep.*,
dp.parent_id,
dp.start_date AS dep_par_start_date,
dp.end_date AS dep_par_end_date
FROM tmp_deps dep
LEFT JOIN tmp_dep_parents dp ON dp.department_id = dep.id
)
SELECT * FROM deps
And here is select from MATERIALIZED VIEW:
SELECT *
FROM departments_mat_view
WHERE
dep_start_date = '2021-06-09'
AND (dep_par_start_date IS NULL OR dep_par_start_date = '2021-06-09')
Here is output from CTE:
"id" "dep_start_date" "dep_end_date" "name" "code" "depth" "parent_id" "dep_par_start_date" "dep_par_end_date"
1 "1970-01-01" "9999-12-31" "Root Dep" 11111111 1 NULL NULL NULL
2 "1970-01-01" "9999-12-31" "Sub Root Dep" 11111112 2 1 "1970-01-01" "9999-12-31"
3 "2021-02-24" "9999-12-31" "Dep 1" 11111113 3 2 "2007-01-01" "9999-12-31"
4 "2020-01-02" "9999-12-31" "Dep 2" 11111114 3 2 "2010-05-04" "9999-12-31"
5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 3 3 "1970-01-01" "9999-12-31"
5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 4 3 "1970-01-01" "9999-12-31"
6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 4 4 "1970-01-01" "9999-12-31"
6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 5 4 "1970-01-01" "9999-12-31"
Desired output is:
"id" "dep_start_date" "dep_end_date" "name" "code" "depth" "parent_id" "dep_par_start_date" "dep_par_end_date"
1 "1970-01-01" "9999-12-31" "Root Dep" 11111111 1 NULL NULL NULL
2 "1970-01-01" "9999-12-31" "Sub Root Dep" 11111112 2 1 "1970-01-01" "9999-12-31"
3 "2021-02-24" "9999-12-31" "Dep 1" 11111113 3 2 "2007-01-01" "9999-12-31"
4 "2020-01-02" "9999-12-31" "Dep 2" 11111114 3 2 "2010-05-04" "9999-12-31"
5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 4 3 "1970-01-01" "9999-12-31"
6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 4 4 "1970-01-01" "9999-12-31"
Asked by Dmitry K.
(143 rep)
Jun 9, 2021, 08:16 PM
Last activity: Jun 11, 2021, 11:14 AM
Last activity: Jun 11, 2021, 11:14 AM