Sample Header Ad - 728x90

Drop intermediate results of recursive query in Postgres

2 votes
0 answers
500 views
I'm trying to aggregate the contents of a column from a directed acyclic graph (DAG) stored in a Postgres table. Each dag row has an id, bytes, and may also have a parent referencing another row. I'm trying to write a function which identifies every "tip" in the graph starting from a particular parent id. The result needs to include each graph tip and its aggregated collected_bytes from the parent id to that tip. (The DAG can be very deep, so some collected_bytes arrays can have millions of elements.) The function below works, but **memory usage grows quadratically as the collected_bytes get longer**. The results CTE keeps a copy of every iteration of collected_bytes until the end of the query, then the ranked CTE is used to select only the deepest node for each tip. I think I'm approaching this incorrectly: **how can I do this more efficiently**? Is it possible to instruct Postgres to drop the intermediate results as the recursive query is running? (So we can also skip the ranked CTE?) Is there a more natural way to achieve the result below?
DROP TABLE IF EXISTS dag;
CREATE TABLE dag (
	id bigint PRIMARY KEY,
	parent bigint,
	bytes bytea NOT NULL,
	FOREIGN KEY (parent) REFERENCES dag(id)
);
INSERT INTO dag (id, parent, bytes) VALUES  (0, NULL, '\x0000'),
											(1, NULL, '\x0100'),
											(2, 0, '\x0200'),
											(3, 1, '\x0300'),
											(4, 2, '\x0400'),
											(5, 3, '\x0500'),
											(6, 4, '\x0600'),
											(7, 5, '\x0700'),
											(8, 4, '\x0800');

DROP FUNCTION IF EXISTS get_descendant;
CREATE FUNCTION get_descendant (input_id bigint)
	RETURNS TABLE(start_id bigint, end_id bigint, collected_bytes bytea[], depth bigint)
    LANGUAGE sql STABLE
    AS $$
 	WITH RECURSIVE results AS (
			SELECT id AS start_id, id AS end_id, ARRAY[bytes] AS collected_bytes, 0 AS depth
				FROM dag WHERE id = input_id
		UNION ALL
			SELECT start_id,
		 		   dag.id AS end_id,
		 		   collected_bytes || dag.bytes AS collected_bytes,
				   depth + 1 AS depth
				FROM results INNER JOIN dag
					ON results.end_id = dag.parent
				WHERE depth < 100000
	),
	ranked AS (
		SELECT *, rank() over (PARTITION BY start_id ORDER BY start_id, depth DESC) FROM results
	)
	SELECT start_id, end_id, collected_bytes, depth FROM ranked WHERE rank = 1;
$$;
Here's the result for 0, which has two valid tips, an id of 6 and 8. The collected_bytes field is the aggregation of bytes along each path:
postgres=# SELECT get_descendant.* FROM get_descendant(0::bigint);
 start_id | end_id |              collected_bytes              | depth 
----------+--------+-------------------------------------------+-------
        0 |      6 | {"\\x0000","\\x0200","\\x0400","\\x0600"} |     3
        0 |      8 | {"\\x0000","\\x0200","\\x0400","\\x0800"} |     3
(2 rows)
While here are the intermediate results before being ranked (and only the maximum depths selected):
postgres=# SELECT get_descendant.* FROM get_descendant(0);
 start_id | end_id |              collected_bytes              | depth 
----------+--------+-------------------------------------------+-------
        0 |      0 | {"\\x0000"}                               |     0
        0 |      2 | {"\\x0000","\\x0200"}                     |     1
        0 |      4 | {"\\x0000","\\x0200","\\x0400"}           |     2
        0 |      6 | {"\\x0000","\\x0200","\\x0400","\\x0600"} |     3
        0 |      8 | {"\\x0000","\\x0200","\\x0400","\\x0800"} |     3
(5 rows)
As you can see, this implementation is already wasting ~half of the memory in use. How can I make this more memory efficient? Thanks! **Edit: I realized the above solution doesn't work if the paths to each terminal leaf are different depths, so I also asked a more general version of this question here:** **https://dba.stackexchange.com/questions/289515/recursively-find-the-path-to-all-leaves-descending-from-a-given-parent-in-a-tree**
Asked by Jason Dreyzehner (155 rep)
Apr 8, 2021, 10:57 PM
Last activity: Apr 9, 2021, 05:16 AM