What is the cause of this intermittent slow query when counting rows over a single indexed text column in PostgreSQL?
0
votes
4
answers
129
views
I have a table with 2,395,015 rows, where a
TEXT
column has one of three values and is never NULL
. I have intermittent query performance issues when counting the number of rows where the value matches most (>99%) of the rows. I want to fix this performance problem. These queries must return exact counts, so I can't use approximate counts.
corpus=# \d metadata
Table "public.metadata"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+----------------
id | text | | not null |
priority | integer | | not null | 10
media_type | text | | not null |
modified | timestamp without time zone | | not null | now()
processed | timestamp without time zone | | |
status | text | | not null | 'QUEUED'::text
note | text | | |
content | text | | |
resolved | text | | |
response_time | integer | | |
luid | integer | | not null |
jamo_date | timestamp without time zone | | |
audit_path | text | | |
Indexes:
"metadata_pkey" PRIMARY KEY, btree (id)
"metadata_id_idx" btree (id)
"metadata_luid_idx" btree (luid)
"metadata_modified_idx" btree (modified DESC)
"metadata_processed_idx" btree (processed DESC)
"metadata_status_idx" btree (status)
Check constraints:
"media_type_ck" CHECK (media_type = ANY (ARRAY['text/json'::text, 'text/yaml'::text]))
"status_ck" CHECK (status = ANY (ARRAY['QUEUED'::text, 'PROCESSED'::text, 'ERROR'::text]))
Foreign-key constraints:
"metadata_luid_fkey" FOREIGN KEY (luid) REFERENCES concept(luid) ON DELETE CASCADE
corpus=#
I have some simple queries that count the number of rows matching one of the three status codes (QUEUED
, PROCESSED
, ERROR
). There are 0 rows matching QUEUED
, 9,794 matching ERROR
, and 2,385,221 matching PROCESSED
. When I run an identical query against each of these status codes, I usually get a result set promptly:
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='QUEUED';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1947.17..1947.18 rows=1 width=8) (actual time=2.935..2.936 rows=1 loops=1)
Output: count(*)
-> Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..1915.97 rows=12480 width=0) (actual time=2.932..2.933 rows=0 loops=1)
Output: status
Index Cond: (metadata.status = 'QUEUED'::text)
Heap Fetches: 0
Planning Time: 0.734 ms
Execution Time: 2.988 ms
(8 rows)
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='ERROR';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1184.19..1184.20 rows=1 width=8) (actual time=1484.763..1484.764 rows=1 loops=1)
Output: count(*)
-> Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..1165.26 rows=7569 width=0) (actual time=4.235..1484.029 rows=9794 loops=1)
Output: status
Index Cond: (metadata.status = 'ERROR'::text)
Heap Fetches: 9584
Planning Time: 0.072 ms
Execution Time: 1484.786 ms
(8 rows)
corpus=#
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='PROCESSED';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=261398.83..261398.84 rows=1 width=8) (actual time=741.319..749.026 rows=1 loops=1)
Output: count(*)
-> Gather (cost=261398.62..261398.83 rows=2 width=8) (actual time=741.309..749.020 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=260398.62..260398.63 rows=1 width=8) (actual time=735.099..735.100 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=730.871..730.872 rows=1 loops=1
Worker 1: actual time=733.435..733.436 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..257903.37 rows=998100 width=0) (actual time=0.065..700.529 rows=795074 loops=3)
Output: status
Index Cond: (metadata.status = 'PROCESSED'::text)
Heap Fetches: 747048
Worker 0: actual time=0.060..702.980 rows=670975 loops=1
Worker 1: actual time=0.076..686.946 rows=1010099 loops=1
Planning Time: 0.085 ms
Execution Time: 749.068 ms
(18 rows)
corpus=#
But occasionally, count for the PROCESSED
rows takes an excessive amount of time (sometimes several minutes):
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata WHERE status='PROCESSED';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=261398.83..261398.84 rows=1 width=8) (actual time=30019.273..30019.336 rows=1 loops=1)
Output: count(*)
-> Gather (cost=261398.62..261398.83 rows=2 width=8) (actual time=30019.261..30019.326 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=260398.62..260398.63 rows=1 width=8) (actual time=29967.734..29967.735 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=29939.915..29939.916 rows=1 loops=1
Worker 1: actual time=29944.395..29944.395 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..257903.37 rows=998100 width=0) (actual time=75.385..29931.795 rows=795074 loops=3)
Output: status
Index Cond: (metadata.status = 'PROCESSED'::text)
Heap Fetches: 747151
Worker 0: actual time=128.857..29899.156 rows=916461 loops=1
Worker 1: actual time=28.609..29905.708 rows=854439 loops=1
Planning Time: 421.203 ms
Execution Time: 30019.440 ms
(18 rows)
corpus=#
While the above query is running slowly, I am able to query the same table for either of the other two codes, and those queries return within 1 second. I have looked for table locks (there are none). This happens even when there are no other queries or table inserts running.
- What are the possible causes for these intermittent slow queries?
- What additional debugging can I try to get some more information about these slow queries?
- Are there any relevant server settings?
- Is there a more efficient way to index / code these columns (e.g., should I use a CHAR(1)
), or even a SMALLINT
? If so, what index should be used for the column?
If I use a CHAR(1)
, is there any difference between the following constraints:
- ALTER TABLE jgi_metadata ADD CONSTRAINT status_code_ck CHECK (status_code = ANY (ARRAY['Q'::char(1), 'P'::char(1), 'E'::char(1)]));
- ALTER TABLE jgi_metadata ADD CONSTRAINT status_code_ck CHECK (status_code IN ('Q', 'P', 'E'));
- Could a partial index be used for this column, even though it is never NULL
?
- Should I split the PROCESSED
off into a boolean column, and then use the status
column only for the other codes and make it nullable with a partial index?
This is PostgreSQL 11 with default settings, running on Linux.
Other things I have tried:
- Increased the work_mem to 100MB (via postgresql.conf
). No change in performance.
- I tried creating a partial index on the status column.
---
**Update:** I have discovered that this performance issue has nothing to do with the status column, but rather the size of the table itself, as the following 2-minute query shows:
corpus=# EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM metadata;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=196398.52..196398.53 rows=1 width=8) (actual time=118527.897..118554.762 rows=1 loops=1)
Output: count(*)
-> Gather (cost=196398.30..196398.51 rows=2 width=8) (actual time=118522.165..118554.756 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=195398.30..195398.31 rows=1 width=8) (actual time=118491.043..118491.044 rows=1 loops=3)
Output: PARTIAL count(*)
Worker 0: actual time=118475.143..118475.144 rows=1 loops=1
Worker 1: actual time=118476.110..118476.111 rows=1 loops=1
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..192876.13rows=1008870 width=0) (actual time=71.797..118449.265 rows=809820 loops=3)
Output: status
Heap Fetches: 552630
Worker 0: actual time=75.877..118434.476 rows=761049 loops=1
Worker 1: actual time=104.872..118436.647 rows=745770 loops=1
Planning Time: 592.040 ms
Execution Time: 118554.839 ms
(17 rows)
corpus=#
This seems to be very similar to other questions now, so I am trying mitigation strategies from this answer :
- VACUUM ANALYZE metadata;
First COUNT(*)
after this took 5 seconds, subsequent counts took 190ms.
Other thoughts:
- Would it help if the status column were split off into its own table, with a foreign key into the metadata
table?
Note: I am becoming convinced that this question is a duplicate of several other questions here:
- https://dba.stackexchange.com/questions/280095/postgresql-extremely-slow-counts
- https://dba.stackexchange.com/questions/314371/count-queries-are-too-slow-even-with-an-index-only-scan
- https://dba.stackexchange.com/questions/235094/why-are-some-count-queries-so-slow
- https://dba.stackexchange.com/questions/317626/optimizing-select-count-result-in-postgresql
- https://stackoverflow.com/questions/58449716/postgres-why-does-select-count-take-so-long
- https://stackoverflow.com/questions/16916633/if-postgresql-count-is-always-slow-how-to-paginate-complex-queries
- https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql/7945274#7945274
This answer may hold the best solution for this problem:
- https://stackoverflow.com/a/7945274/2074605
As requested, here is a query plan analysis with buffers:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT COUNT(*) FROM metadata;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=80771.95..80771.96 rows=1 width=8) (actual time=26711.481..26716.494 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
-> Gather (cost=80771.73..80771.94 rows=2 width=8) (actual time=26711.203..26716.488 rows=3 loops=1)
Output: (PARTIAL count(*))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
-> Partial Aggregate (cost=79771.73..79771.74 rows=1 width=8) (actual time=26565.622..26565.623 rows=1 loops=3)
Output: PARTIAL count(*)
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
Worker 0: actual time=26530.890..26530.891 rows=1 loops=1
Buffers: shared hit=105264 read=6760 dirtied=145 written=5
Worker 1: actual time=26530.942..26530.942 rows=1 loops=1
Buffers: shared hit=84675 read=7529 dirtied=46 written=2
-> Parallel Index Only Scan using metadata_status_idx on public.metadata (cost=0.43..77241.05 rows=1012275 width=0) (actual time=42.254..26529.232 rows=809820 loops=3)
Output: status
Heap Fetches: 17185
Buffers: shared hit=293915 read=19595 dirtied=282 written=12
Worker 0: actual time=59.291..26494.376 rows=815113 loops=1
Buffers: shared hit=105264 read=6760 dirtied=145 written=5
Worker 1: actual time=31.165..26484.729 rows=1036972 loops=1
Buffers: shared hit=84675 read=7529 dirtied=46 written=2
Planning Time: 98.400 ms
Execution Time: 26716.529 ms
(25 rows)
Asked by Autumn Skye
(379 rep)
Aug 8, 2024, 01:50 PM
Last activity: Feb 27, 2025, 02:52 PM
Last activity: Feb 27, 2025, 02:52 PM