Optimizing subqueries and common table expression queries
1 vote, 0 answers, 30 views
I have a database table where users send alert messages; each alert message belongs to a category.
I use PostgreSQL with Timescale.
I have two queries:
1. Given a user, get the latest alert per category.
2. Given a user, get the latest 10 alerts per category.
The queries are as follows:
Query 1:
SELECT *
FROM (
SELECT DISTINCT ON (category) *
FROM alert_messages
WHERE username = ''
AND (subcategory = ''
OR subcategory LIKE '%')
ORDER BY category, timestamp DESC
) sub
ORDER BY timestamp DESC
Query 2:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(
PARTITION BY category ORDER BY timestamp DESC) rn
FROM alert_messages
WHERE username = ''
AND subcategory != ''
AND subcategory != ''
)
SELECT *
FROM cte
WHERE rn <= 10
Here are the EXPLAIN (ANALYZE) outputs.
Query 1:
"Unique (cost=2493853.29..2581310.26 rows=64 width=48) (actual time=10606.587..10783.197 rows=16 loops=1)"
" -> Gather Merge (cost=2493853.29..2579577.23 rows=693211 width=48) (actual time=10606.586..10766.832 rows=560581 loops=1)"
" Workers Planned: 8"
" Workers Launched: 8"
" -> Sort (cost=2492853.14..2493069.77 rows=86650 width=48) (actual time=10510.154..10515.959 rows=62287 loops=9)"
" Sort Key: _hyper_1_209_chunk.category, _hyper_1_209_chunk.""timestamp"" DESC"
" Sort Method: external merge Disk: 15824kB"
" Worker 0: Sort Method: quicksort Memory: 1282kB"
" Worker 1: Sort Method: external merge Disk: 5264kB"
" Worker 2: Sort Method: external merge Disk: 6744kB"
" Worker 3: Sort Method: external merge Disk: 4344kB"
" Worker 4: Sort Method: quicksort Memory: 435kB"
" Worker 5: Sort Method: quicksort Memory: 497kB"
" Worker 6: Sort Method: quicksort Memory: 1587kB"
" Worker 7: Sort Method: external merge Disk: 2992kB"
" -> Parallel Append (cost=0.00..2484184.48 rows=86650 width=48) (actual time=7831.976..10441.152 rows=62287 loops=9)"
" -> Parallel Seq Scan on _hyper_1_209_chunk (cost=0.00..397507.98 rows=1522 width=40) (actual time=8285.449..10145.736 rows=9042 loops=1)"
" Filter: ((((username)::text = ''::text) AND ((subcategory)::text = ''::text)) OR (()::text ~~ '%diagnostic_updater'::text))"
" Rows Removed by Filter: 7566380"
" -> Parallel Seq Scan on _hyper_1_220_chunk (cost=0.00..384560.45 rows=4955 width=37) (actual time=1920.476..2716.668 rows=808 loops=4)"
" Filter: ((((username)::text = ''::text) AND ((subcategory)::text = ''::text)) OR (()::text ~~ '%diagnostic_updater'::text))"
" Rows Removed by Filter: 4076316"
" -> Parallel Seq Scan on _hyper_1_221_chunk (cost=0.00..321639.51 rows=2258 width=37) (actual time=935.219..1268.861 rows=480 loops=9)"
" Filter: ((((username)::text = ''::text) AND ((subcategory)::text = ''::text)) OR (()::text ~~ '%diagnostic_updater'::text))"
" Rows Removed by Filter: 1516415"
" [ .... Removed similar lines .... ]"
"Planning Time: 15.378 ms"
"JIT:"
" Functions: 7309"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 465.846 ms, Inlining 513.276 ms, Optimization 39806.196 ms, Emission 28927.894 ms, Total 69713.211 ms"
"Execution Time: 10814.961 ms"
Query 2:
"WindowAgg (cost=693791.34..987063.68 rows=2077557 width=76) (actual time=9684.123..10218.021 rows=226 loops=1)"
" Run Condition: (row_number() OVER (?) Gather Merge (cost=693791.34..950706.43 rows=2077557 width=68) (actual time=9684.089..10159.708 rows=1671306 loops=1)"
" Workers Planned: 8"
" Workers Launched: 8"
" -> Sort (cost=692791.19..693440.41 rows=259687 width=68) (actual time=8732.486..8751.575 rows=185701 loops=9)"
" Sort Key: _hyper_1_114_chunk.hardware_id, _hyper_1_114_chunk.""timestamp"" DESC"
" Sort Method: external merge Disk: 9592kB"
" Worker 0: Sort Method: external merge Disk: 20064kB"
" Worker 1: Sort Method: external merge Disk: 13016kB"
" Worker 2: Sort Method: quicksort Memory: 25kB"
" Worker 3: Sort Method: external merge Disk: 24840kB"
" Worker 4: Sort Method: quicksort Memory: 26kB"
" Worker 5: Sort Method: external merge Disk: 33080kB"
" Worker 6: Sort Method: external merge Disk: 26304kB"
" Worker 7: Sort Method: quicksort Memory: 26kB"
" -> Parallel Append (cost=209.98..663196.80 rows=259687 width=68) (actual time=8107.200..8568.288 rows=185701 loops=9)"
" -> Parallel Bitmap Heap Scan on _hyper_1_114_chunk (cost=631.98..11084.95 rows=22851 width=65) (actual time=7248.905..7296.457 rows=71547 loops=1)"
" Recheck Cond: ((username)::text = ''::text)"
" Filter: (((subcategory)::text ''::text) AND ((subcategory)::text ''::text))"
" Rows Removed by Filter: 1076"
" -> Bitmap Index Scan on _hyper_1_114_chunk_clerts_clerts_username_28789efc_like_1 (cost=0.00..614.27 rows=72623 width=0) (actual time=4.217..4.217 rows=72623 loops=1)"
" Index Cond: ((username)::text = ''::text)"
" -> Parallel Bitmap Heap Scan on _hyper_1_116_chunk (cost=516.44..9825.32 rows=24531 width=68) (actual time=4730.966..4745.737 rows=29615 loops=2)"
" Recheck Cond: ((username)::text = ''::text)"
" Filter: (((subcategory)::text ''::text) AND ((subcategory)::text ''::text))"
" -> Bitmap Index Scan on _hyper_1_116_chunk_clerts_clerts_username_28789efc_like_1 (cost=0.00..501.72 rows=59230 width=0) (actual time=4.505..4.505 rows=59230 loops=1)"
" Index Cond: ((username)::text = ''::text)"
" [ .... Cut similar lines .... ]"
" -> Parallel Index Scan using _hyper_1_220_chunk_clerts_clerts_username_28789efc_1 on _hyper_1_220_chunk (cost=0.43..1753.27 rows=15825 width=68) (actual time=0.928..98.999 rows=25380 loops=1)"
" Index Cond: ((username)::text = ''::text)"
" Filter: (((subcategory)::text ''::text) AND ((subcategory)::text ''::text))"
" Rows Removed by Filter: 265"
" -> Parallel Index Scan using _hyper_1_221_chunk_clerts_clerts_username_28789efc_1 on _hyper_1_221_chunk (cost=0.43..770.81 rows=5459 width=68) (actual time=1.014..82.271 rows=8872 loops=1)"
" Index Cond: ((username)::text = ''::text)"
" Filter: (((subcategory)::text ''::text) AND ((subcategory)::text ''::text))"
" Rows Removed by Filter: 914"
" [ .... Cut similar lines .... ]"
" -> Parallel Index Scan using _hyper_1_102_chunk_clerts_clerts_username_28789efc_like_1 on _hyper_1_102_chunk (cost=0.29..2.51 rows=1 width=68) (actual time=0.284..0.285 rows=0 loops=1)"
" Index Cond: ((username)::text = ''::text)"
" Filter: (((subcategory)::text ''::text) AND ((subcategory)::text ''::text))"
" -> Parallel Index Scan using _hyper_1_104_chunk_clerts_clerts_username_28789efc_like_1 on _hyper_1_104_chunk (cost=0.29..2.51 rows=1 width=65) (actual time=0.333..0.333 rows=0 loops=1)"
" Index Cond: ((username)::text = ''::text)"
" Filter: (((subcategory)::text ''::text) AND ((subcategory)::text ''::text))"
" [ .... Cut similar lines .... ]"
" -> Parallel Seq Scan on _hyper_1_206_chunk (cost=0.00..34850.68 rows=40556 width=68) (actual time=0.124..79.937 rows=2145 loops=6)"
" Filter: (((subcategory)::text ''::text) AND ((subcategory)::text ''::text) AND ((username)::text = ''::text))"
" Rows Removed by Filter: 104243"
" -> Parallel Seq Scan on _hyper_1_117_chunk (cost=0.00..34643.20 rows=31455 width=69) (actual time=0.955..152.430 rows=22981 loops=3)"
" Filter: (((subcategory)::text ''::text) AND ((subcategory)::text ''::text) AND ((username)::text = ''::text))"
" Rows Removed by Filter: 183738"
" [ .... Cut similar lines .... ]"
"Planning Time: 15.181 ms"
"JIT:"
" Functions: 10733"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 526.707 ms, Inlining 514.124 ms, Optimization 45236.398 ms, Emission 27189.984 ms, Total 73467.213 ms"
"Execution Time: 10248.488 ms"
Query 1 looks straightforward: the planner scans the chunks filtering on username and subcategory. I tried adding a multi-column index on username and subcategory, but it didn't significantly improve the execution time.
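The multi-column index I tried was roughly of this shape (the index name here is just illustrative):
CREATE INDEX alert_messages_username_subcategory_idx
    ON alert_messages (username, subcategory);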
Query 2 uses an existing index on username, but it still has to go through millions and millions of rows, which takes time.
I'm not sure how to optimize these. Is there a better way to write these queries, or an index that would speed them up?
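For illustration, one possible rewrite (just a sketch, not something I have benchmarked) would be a LATERAL join that pulls the top 10 rows per category; here the per-user category list is derived from the table itself, which may itself be expensive:
SELECT a.*
FROM (
    SELECT DISTINCT category
    FROM alert_messages
    WHERE username = ''
) c
CROSS JOIN LATERAL (
    SELECT *
    FROM alert_messages
    WHERE username = ''
      AND category = c.category
    ORDER BY timestamp DESC
    LIMIT 10
) a;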
Another complication is that there are users and message categories whose last message was 6+ months ago. This may be a flaw in the system, as users are supposed to send messages in all categories on a regular basis. If that is the case, it might be viable to partition the data so that the queries only look at the past month or two.
**EDIT**
The best way to tackle this issue is to partition the database.
Running queries on a large database (61 million rows and growing) will tend to be slow, especially as the queries get more complicated.
Of course, this will depend on the use case, but for mine it is better to partition the data and query only the last 3 months, which gives a fast response while still providing accurate information.
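As a minimal sketch (assuming a 3-month window is acceptable), the time-bounded version of Query 1 would look like this; with TimescaleDB, the timestamp predicate should also let the planner exclude chunks outside the window:
SELECT DISTINCT ON (category) *
FROM alert_messages
WHERE username = ''
  AND timestamp >= now() - INTERVAL '3 months'
ORDER BY category, timestamp DESC;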
Asked by Razgriz
(113 rep)
May 28, 2024, 12:06 PM
Last activity: May 30, 2024, 07:17 AM