Choosing optimal analytical representation of votes, re: btree-normal v. array v. hstore
0
votes
0
answers
31
views
I'm tasked with scrapping a big website with a decade-long history and for the purpose of corpus & prompt engineering it into multiple datasets of various value; in some cases, this value is not immediately apparent. We don't really know the qualities of most fundamental distributions in advance, so this job would require a whole lot of aggregating, plotting, regressing, and sampling. Lots of preliminary work before we could ever get started on actually working the prompts.
Good news we have Postgres (Timescale!) and Grafana to work it out.
For reference: there are 743.000 forum posts and 35.5 million comments to go with them, the good two thirds of these comments have been rated, 23.1 million total. The normal post and & forum would have anywhere from 5–50 votes to them, all ids are integer, and all votes are +1/–1. I would have to scrap, and SQL COPY them in relatively quickly but most important of all is that I'm also able to have this data readily available part of some large window-aggregating, percentile-calculating materialisation for different statistics and that kind of stuff. There are primarily three types of queries we're going to be doing re: *votes* are concerned:
1. Selecting set of up/down votes for individual posts quickly;
2. Reverse-Joining on all up/down votes per individual user;
3. Ranking on voting net-positive, negative user clusters, re: PageRank algebra;
So the question is whether we want vote-normal, or post-normal representation.
CREATE TABLE ballot
(
post_id int,
comment_id int,
user_id int not null,
vote bool not null
);
CREATE INDEX ON ballot USING brin (post_id, comment_id);
CREATE INDEX ON ballot USING btree (user_id, post_id);
CREATE INDEX ON ballot USING btree (user_id, comment_id);
Correct me if I'm wrong but I think I can leverage BRIN for (1) just because I know ahead of time I'll be copying votes in a particular order, i.e. votes for one comment would go in one uninterrupted block. If this is not the case, thought I might do simply a primary key on (post_id, comment_id, user_id)
which would yield a three-way composite btree which is something that you would expect. I thought perhaps given the set of requirements and given that the ingest order is known ahead of time, a brin + two leaner btrees would work better. Maybe I would have to find out but first— I have to make a decision how it will be structured during ingest.
The other option here involves some denormalised representation and a reverse index.
When initially considering array v. hstore v. jsonb, my intuition suggested given the set-like nature of my data that hstore would work best, and my intuition was somewhat reassured when I stumbled upon a only tangentially relevant [benchmark](https://gist.github.com/semaperepelitsa/66527f35f5127ed8dbb95974e68139b7) showing nonmarginal gains for hstore in set-like operations but the truth is for my use case that is the ?
operator, it probably wouldn't matter all too much. As long as the denormalised form helps with locality and leads to a smaller index that I could always fit in memory... it doesn't matter what particular inverse index is going to do the job. I will do my example in hstore:
CREATE TABLE ballot
(
post_id int,
comment_id int,
-- hstore(user_id, null) or hstore(-user_id, null)
votes hstore not null
);
CREATE UNIQUE INDEX ON ballot USING btree (post_id);
CREATE UNIQUE INDEX ON ballot USING btree (comment_id);
CREATE INDEX ON ballot USING gin (votes);
The likes are encoded as positive and dislikes— as negative ids. The idea here is now that our schema is subject-normal we can rely on two normal btrees for random-access and have a reverse index do the heavy lifting using ?
operator which is performed using by hstore. The whole table is supposed to get anywhere from 5-50 times smaller, and hopefully that would make a difference!
I've also considered other variants, like using two separate hstore columns for pro's and con's, or to keep the user_id constant regardless of vote, and encode it as the value instead, so I did a simple query from a user table that has been ingested already to see how the planner would react:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
id,
login,
a.post_id as post_up,
a.comment_id as comment_up,
b.post_id as post_down,
b.comment_id as comment_down
FROM users u
LEFT JOIN ballot a ON a.pros ? id::text
LEFT JOIN ballot b ON b.cons ? id::text;
QUERY PLAN
Nested Loop Left Join (cost=0.07..1359142.33 rows=3247298 width=28) (actual time=0.033..102.884 rows=49494 loops=1)
" Output: u.id, u.login, a.post_id, a.comment_id, b.post_id, b.comment_id"
Buffers: shared hit=99373
-> Nested Loop Left Join (cost=0.00..802702.87 rows=400901 width=20) (actual time=0.019..17.434 rows=49494 loops=1)
" Output: u.id, u.login, a.post_id, a.comment_id"
Join Filter: (a.pros ? (u.id)::text)
Buffers: shared hit=385
-> Seq Scan on public.users u (cost=0.00..879.94 rows=49494 width=12) (actual time=0.012..5.382 rows=49494 loops=1)
" Output: u.id, u.login, u.profile"
Buffers: shared hit=385
-> Materialize (cost=0.00..22.15 rows=810 width=40) (actual time=0.000..0.000 rows=0 loops=49494)
" Output: a.post_id, a.comment_id, a.pros"
-> Seq Scan on public.ballot a (cost=0.00..18.10 rows=810 width=40) (actual time=0.002..0.002 rows=0 loops=1)
" Output: a.post_id, a.comment_id, a.pros"
-> Bitmap Heap Scan on public.ballot b (cost=0.07..1.31 rows=8 width=40) (actual time=0.001..0.001 rows=0 loops=49494)
" Output: b.post_id, b.comment_id, b.pros, b.cons"
Recheck Cond: (b.cons ? (u.id)::text)
Buffers: shared hit=98988
-> Bitmap Index Scan on ballot_down_idx (cost=0.00..0.07 rows=8 width=0) (actual time=0.001..0.001 rows=0 loops=49494)
Index Cond: (b.cons ? (u.id)::text)
Buffers: shared hit=98988
Query Identifier: 3422089729452766994
Planning:
Buffers: shared hit=8
Planning Time: 0.265 ms
Execution Time: 105.154 ms
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
id,
login,
a.post_id as post_up,
a.comment_id as comment_up,
b.post_id as post_down,
b.comment_id as comment_down
FROM users u
-- notice that both joins are now on the same column
LEFT JOIN ballot a ON a.pros ? id::text
LEFT JOIN ballot b ON b.pros ? ('-'||id::text);
QUERY PLAN
Nested Loop Left Join (cost=0.00..7397544.54 rows=3247298 width=28) (actual time=0.023..21.634 rows=49494 loops=1)
" Output: u.id, u.login, a.post_id, a.comment_id, b.post_id, b.comment_id"
Join Filter: (a.pros ? (u.id)::text)
Buffers: shared hit=385
-> Nested Loop Left Join (cost=0.00..902928.22 rows=400901 width=20) (actual time=0.019..13.729 rows=49494 loops=1)
" Output: u.id, u.login, b.post_id, b.comment_id"
Join Filter: (b.pros ? ('-'::text || (u.id)::text))
Buffers: shared hit=385
-> Seq Scan on public.users u (cost=0.00..879.94 rows=49494 width=12) (actual time=0.013..4.383 rows=49494 loops=1)
" Output: u.id, u.login, u.profile"
Buffers: shared hit=385
-> Materialize (cost=0.00..22.15 rows=810 width=40) (actual time=0.000..0.000 rows=0 loops=49494)
" Output: b.post_id, b.comment_id, b.pros"
-> Seq Scan on public.ballot b (cost=0.00..18.10 rows=810 width=40) (actual time=0.002..0.002 rows=0 loops=1)
" Output: b.post_id, b.comment_id, b.pros"
-> Materialize (cost=0.00..22.15 rows=810 width=40) (actual time=0.000..0.000 rows=0 loops=49494)
" Output: a.post_id, a.comment_id, a.pros"
-> Seq Scan on public.ballot a (cost=0.00..18.10 rows=810 width=40) (actual time=0.002..0.002 rows=0 loops=1)
" Output: a.post_id, a.comment_id, a.pros"
Query Identifier: 8470202287074307650
Planning:
Buffers: shared hit=7
Planning Time: 0.231 ms
Execution Time: 23.143 ms
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
id,
login,
a.post_id as post_up,
a.comment_id as comment_up,
b.post_id as post_down,
b.comment_id as comment_down
FROM users u
-- notice that both joins are now on the same column
LEFT JOIN ballot a ON a.pros @> hstore(id::text, '1')
LEFT JOIN ballot b ON b.pros @> hstore(id::text, '-1');
QUERY PLAN
Nested Loop Left Join (cost=0.00..8209369.07 rows=3247298 width=28) (actual time=0.024..21.786 rows=49494 loops=1)
" Output: u.id, u.login, a.post_id, a.comment_id, b.post_id, b.comment_id"
" Join Filter: (b.pros @> hstore((u.id)::text, '-1'::text))"
Buffers: shared hit=385
-> Nested Loop Left Join (cost=0.00..902928.22 rows=400901 width=20) (actual time=0.020..13.496 rows=49494 loops=1)
" Output: u.id, u.login, a.post_id, a.comment_id"
" Join Filter: (a.pros @> hstore((u.id)::text, '1'::text))"
Buffers: shared hit=385
-> Seq Scan on public.users u (cost=0.00..879.94 rows=49494 width=12) (actual time=0.014..4.348 rows=49494 loops=1)
" Output: u.id, u.login, u.profile"
Buffers: shared hit=385
-> Materialize (cost=0.00..22.15 rows=810 width=40) (actual time=0.000..0.000 rows=0 loops=49494)
" Output: a.post_id, a.comment_id, a.pros"
-> Seq Scan on public.ballot a (cost=0.00..18.10 rows=810 width=40) (actual time=0.002..0.002 rows=0 loops=1)
" Output: a.post_id, a.comment_id, a.pros"
-> Materialize (cost=0.00..22.15 rows=810 width=40) (actual time=0.000..0.000 rows=0 loops=49494)
" Output: b.post_id, b.comment_id, b.pros"
-> Seq Scan on public.ballot b (cost=0.00..18.10 rows=810 width=40) (actual time=0.002..0.002 rows=0 loops=1)
" Output: b.post_id, b.comment_id, b.pros"
Query Identifier: -9035379806324338655
Planning:
Buffers: shared hit=71
Planning Time: 0.389 ms
Execution Time: 23.297 ms
105ms / 23ms / 23ms
In all honesty, I expected no.1 to do much better and no.2 to do much worse! However, neither 2 nor 3 seem to be using the index! Would love to get some insight on this from experienced DBA's.
Best regards,
Ilya
Asked by Ilya
(1 rep)
Jan 5, 2023, 12:18 PM
Last activity: Jan 5, 2023, 03:57 PM
Last activity: Jan 5, 2023, 03:57 PM