
Emulate Loose Index Scan for multiple columns with alternating sort direction

2 votes
1 answer
442 views
A while back I asked [this question](https://dba.stackexchange.com/questions/320064/use-skip-scan-index-to-efficiently-select-unique-permutations-of-columns-in-post) about efficiently selecting unique permutations of columns in Postgres. Now I have a follow-up: how can I do the same while sorting each column independently, with any combination of ASC/DESC across the columns? The table contains hundreds of millions of rows. The accepted answer to my previous question is orders of magnitude faster than traditional approaches, but because I can't order the columns in an ad-hoc way, I can't put the query to good use (I really need it to 'paginate' with LIMIT/OFFSET in small chunks). The author of the previous answer kindly suggested a workaround (replacing the row comparison with an explicit WHERE clause), which I tried, but it doesn't seem to work (or I misunderstand it). Given the following generic query:
WITH RECURSIVE cte AS (
   (
   SELECT col1, col2, col3, col4
   FROM   tbl
   ORDER  BY 1,2,3,4
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT t.col1, t.col2, t.col3, t.col4
      FROM   tbl t
      WHERE (t.col1, t.col2, t.col3, t.col4) > (c.col1, c.col2, c.col3, c.col4)
      ORDER  BY 1,2,3,4
      LIMIT  1
      ) l
   )
SELECT * FROM cte
Is there a way to order the columns in an ad-hoc way while maintaining the performance? For example: ORDER BY col1 DESC, col2 ASC, col3 ASC, col4 DESC.

Assume an index on each column, as well as a combined index across all 4 columns. Postgres version is 15.4. The table is read-only in the sense that existing data can't / won't be modified, but new rows will be added.

The following CREATE TABLE script replicates my problematic table (more or less):
CREATE TABLE tbl (
   id   SERIAL PRIMARY KEY,
   col1 integer NOT NULL,
   col2 integer NOT NULL,
   col3 integer NOT NULL,
   col4 integer NOT NULL
);

INSERT INTO tbl (col1, col2, col3, col4)
SELECT (random()*1000)::int AS col1,
       (random()*1000)::int AS col2,
       (random()*1000)::int AS col3,
       (random()*1000)::int AS col4
FROM   generate_series(1,10000000);

CREATE INDEX ON tbl (col1);
CREATE INDEX ON tbl (col2);
CREATE INDEX ON tbl (col3);
CREATE INDEX ON tbl (col4);
CREATE INDEX ON tbl (col1, col2, col3, col4);
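
For reference, the workaround mentioned above (replacing the row comparison with an explicit WHERE clause) can be sketched as follows for ORDER BY col1 DESC, col2 ASC, col3 ASC, col4 DESC. This is a minimal sketch, not a verified solution. The extra index with matching sort directions and the page size of 20 are assumptions that are not part of the setup above. Whether the planner can use the index efficiently for the OR'ed predicate would need to be checked with EXPLAIN (ANALYZE).

```sql
-- Assumption: an additional index whose sort directions match the ORDER BY.
CREATE INDEX ON tbl (col1 DESC, col2, col3, col4 DESC);

WITH RECURSIVE cte AS (
   (
   -- First row in the requested sort order.
   SELECT col1, col2, col3, col4
   FROM   tbl
   ORDER  BY col1 DESC, col2, col3, col4 DESC
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT t.col1, t.col2, t.col3, t.col4
      FROM   tbl t
      -- "Comes strictly after c" in the mixed-direction order:
      -- '<' for DESC columns, '>' for ASC columns, with all
      -- leading columns held equal.
      WHERE  t.col1 < c.col1
         OR (t.col1 = c.col1 AND t.col2 > c.col2)
         OR (t.col1 = c.col1 AND t.col2 = c.col2 AND t.col3 > c.col3)
         OR (t.col1 = c.col1 AND t.col2 = c.col2 AND t.col3 = c.col3 AND t.col4 < c.col4)
      ORDER  BY t.col1 DESC, t.col2, t.col3, t.col4 DESC
      LIMIT  1
      ) l
   )
SELECT * FROM cte
LIMIT  20;  -- arbitrary example page size
```

The predicate is the logical expansion of the row comparison: a row follows the current one if it differs in the first column where the two are not equal, in the direction that column is sorted. All four columns are NOT NULL here, so no NULL handling is included.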
Asked by hunter (217 rep)
May 2, 2024, 06:52 PM
Last activity: May 4, 2024, 03:49 AM