Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
182
views
Composite multicolumn index for geopoint range and numeric range query
I am building an app where the server needs to select rows based on some criteria/filters. One of them is the location of the user and the radius within which the user wants to see posts; other filters include a date range and a filter on the value of another column. This is going to be for an ad-hoc event discovery app.
I have read about PostGIS and its `geometry` and `geography` types, and I know there is a native `point` datatype. Based on this answer I understood that it is better to order index columns from equality to range, even though I feel like the geo point column should be first.
Suppose the following few rows of a simplified events table (disregard the validity of the position data):
id event_title event_position event_type is_public start_date
(varchar) (point lat/lon) (smallint) (boolean) (timestamptz)
-- --------------------------- --------------- --------- --------- ----
1 "John's Party" (122,35) 0 0 2020-07-05
2 "Revolution then Starbucks" (123,30) 1 1 2020-07-06
3 "Study for math exam" (120,36) 2 1 2020-07-07
4 "Party after exam" (120,36) 1 1 2020-07-08
5 "Hiking next to the city" (95,40) 3 1 2020-07-09
6 "Football match" (-42,31) 4 1 2020-07-10
Imagine the table contains several thousand records at least, obviously not only 6.
So in this table a user would be able to query public events within 100 km of (122,34) (suppose the first three rows fall into this area), of event types 0, 1 or 2, and falling between the dates 2020-07-05 and 2020-07-07. The user would get the rows with IDs 2 and 3.
This is the query I want to optimize with an appropriate index. My question is: how is it possible to create such an index? I thought about a GiST or GIN index, but I'm not sure how these could help. Thanks!
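One possible direction (a sketch only; it assumes the `btree_gist` extension, a hypothetical `events` table matching the columns above, and the native `point` type rather than PostGIS) is a single multicolumn GiST index, made partial on the boolean:

```sql
-- btree_gist lets scalar columns (smallint, timestamptz) sit
-- alongside the native point type in one GiST index
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- partial index covering only public events
CREATE INDEX events_filter_gist_idx ON events
    USING gist (event_type, start_date, event_position)
    WHERE is_public;

SELECT * FROM events
WHERE is_public
  AND event_type IN (0, 1, 2)
  AND start_date BETWEEN '2020-07-05' AND '2020-07-07'
  AND event_position <@ circle(point(122, 34), 0.9);  -- radius in degrees, not km
```

For real distance-in-kilometres filtering, a PostGIS `geography` column queried with `ST_DWithin` would be the more accurate tool.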
Najib
(231 rep)
Jul 4, 2020, 02:59 PM
• Last activity: Jul 5, 2025, 12:18 AM
0
votes
1
answers
184
views
How can I make an index covering a ts_vector expression and also an integer?
I am using the Ruby pg_search gem, where code like `Food.search("orange")` produces a query like this:
SELECT "foods".*
FROM "foods"
INNER JOIN
(SELECT "foods"."id" AS pg_search_id,
(ts_rank((to_tsvector('simple', coalesce("foods"."name"::text, ''))), (to_tsquery('simple', ''' ' || 'orange' || ' ''' || ':*')), 0)) AS rank
FROM "foods"
WHERE ((to_tsvector('simple', coalesce("foods"."name"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'orange' || ' ''' || ':*')))) AS pg_search_d4392ced9dff0647fed4ed ON "foods"."id" = pg_search_d4392ced9dff0647fed4ed.pg_search_id
ORDER BY pg_search_d4392ced9dff0647fed4ed.rank DESC,
"foods"."id" ASC;
I've created this index, which seems to work quite well:
CREATE INDEX concurrently foods_gin ON foods USING gin
(to_tsvector('simple', coalesce("foods"."name"::text, '')));
I have another column, `category`; there are 9 categories. For a given search as shown above for name, I want to show results from category 9 first, and then all the other categories (order doesn't matter).
Ideally the results could be paginated, so a perfect query/index combo would be great.
Is this possible to do, and/or will it be a truly gigantic index?
P.S. Suggestions for a better name for the question are welcome.
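One sketch of a combined index (assuming the `btree_gin` extension; whether it helps depends on the query actually filtering on `category` rather than only ordering by it):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- the expression must match the query's tsvector expression exactly
CREATE INDEX CONCURRENTLY foods_category_name_gin ON foods
    USING gin (category, to_tsvector('simple', coalesce("name"::text, '')));
```

Note that a GIN index cannot return rows in any useful order, so the "category 9 first" requirement would still need something like `ORDER BY (category = 9) DESC, rank DESC` applied to the matched rows.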
John Bachir
(867 rep)
May 5, 2021, 05:32 PM
• Last activity: Jun 26, 2025, 02:08 PM
0
votes
1
answers
501
views
Should I consider GIN index for several columns?
In our application we have a field that can be used to match several columns in our database. So if the user searches for `somestring` we will return all the rows `WHERE column1 = 'somestring' OR column2 = 'somestring'` etc. We are using Postgres.
I am wondering if it is viable to use a `GIN` index for this purpose. Normally I see `GIN` indexes used for long and variable-length data, however. So I am wondering whether there is any point using them to index short, fixed-length `tsvector`s, or whether I should just index each column separately with a `BTREE` and let the planner handle the search.
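For plain equality on a handful of columns, the simpler alternative is one B-tree per column; the planner can combine them with a BitmapOr for the OR query. A sketch (table name hypothetical, columns taken from the example above):

```sql
CREATE INDEX ON mytable (column1);
CREATE INDEX ON mytable (column2);

-- EXPLAIN typically shows a BitmapOr of two Bitmap Index Scans
EXPLAIN SELECT * FROM mytable
WHERE column1 = 'somestring' OR column2 = 'somestring';
```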
zefciu
(101 rep)
Jan 18, 2024, 10:49 AM
• Last activity: Jan 19, 2024, 02:21 AM
0
votes
1
answers
69
views
Missing GIN support function (4 or 6) for attribute 1 of index
I have a problem creating GIN indexes in postgres 14.9 and GIN 1.3
The following is the query:
CREATE TABLE assert.test_gin(col1 func.doc_sub_type[]);
CREATE INDEX ON assert.test_gin USING gin(col1);
And here is the result of running the above:
ERROR: missing GIN support function (4 or 6) for attribute 1 of index "test_gin_col1_idx"
SQL state: XX000
I checked the GIN support functions in the pg_catalog and public schemas. According to the documentation (https://www.postgresql.org/docs/current/gin-extensibility.html), functions 4 and 6 are the bool `consistent` and GinTernaryValue `triConsistent` functions respectively; both exist in my pg_catalog but not in the public schema.
Please help!
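If `func.doc_sub_type` is an enum or composite type, the installed GIN opclass for arrays of it may lack the required support functions, which is the symptom this error reports. One common workaround (a sketch, untested here) is an expression index over a `text[]` cast:

```sql
-- index the cast; queries must then use the same cast
CREATE INDEX ON assert.test_gin USING gin ((col1::text[]));

-- example containment query against the indexed expression
SELECT * FROM assert.test_gin WHERE col1::text[] @> ARRAY['some_value'];
```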
Ali Azimi
(1 rep)
Jan 15, 2024, 03:19 PM
• Last activity: Jan 15, 2024, 04:21 PM
1
votes
1
answers
263
views
Postgresql strange behavior with non-ascii characters with triagram index present
I see some strange behavior when using either `gin_trgm_ops` or `gist_trgm_ops` indexes. There seems to be quite a difference in plans when using, say, `ILIKE` or `~` and searching for ASCII phrases vs multi-byte char phrases. It's as if there is a higher cost when the operand is non-ASCII.
Is what I am seeing expected? What is the reason for it?
I tried it on PostgreSQL 12 and 13, latest of each.
Here is a scenario:
CREATE DATABASE postgres
WITH
OWNER = postgres
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.utf8'
LC_CTYPE = 'en_US.utf8'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
IS_TEMPLATE = False;
-- snip
CREATE TABLE test_table (
id uuid PRIMARY KEY,
label varchar
);
-- insert 1m rows
VACUUM ANALYZE test_table;
In the data set I have 10 labels containing 'acl' and 10 containing '定す'.
When using a GIN index
CREATE INDEX test_table_label_gin_idx
ON test_table USING gin
(label gin_trgm_ops);
I see the following.
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
Limit (cost=1000.00..16573.18 rows=100 width=52) (actual time=392.153..395.095 rows=10 loops=1)
-> Gather (cost=1000.00..16728.91 rows=101 width=52) (actual time=392.135..394.830 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test_table (cost=0.00..15718.81 rows=42 width=52) (actual time=382.922..388.082 rows=3 loops=3)
Filter: ((label)::text ~~* '%定す%'::text)
Rows Removed by Filter: 338417
Planning Time: 0.656 ms
Execution Time: 395.233 ms
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%acl%' LIMIT 100;
Limit (cost=28.78..400.51 rows=100 width=52) (actual time=0.072..0.406 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=28.78..404.23 rows=101 width=52) (actual time=0.053..0.197 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%acl%'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on test_table_label_gin_idx (cost=0.00..28.76 rows=101 width=0) (actual time=0.025..0.034 rows=10 loops=1)
Index Cond: ((label)::text ~~* '%acl%'::text)
Planning Time: 0.231 ms
Execution Time: 0.542 ms
With GiST
DROP INDEX test_table_label_gin_idx;
CREATE INDEX test_table_label_gist_idx
ON test_table USING gist
(label gist_trgm_ops);
I see
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
Limit (cost=13.19..384.92 rows=100 width=52) (actual time=303.772..1557.498 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=13.19..388.64 rows=101 width=52) (actual time=303.752..1557.286 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%定す%'::text)
Rows Removed by Index Recheck: 1015250
Heap Blocks: exact=10431
-> Bitmap Index Scan on test_table_label_gist_idx (cost=0.00..13.17 rows=101 width=0) (actual time=301.046..301.053 rows=1015260 loops=1)
Index Cond: ((label)::text ~~* '%定す%'::text)
Planning Time: 0.215 ms
Execution Time: 1557.643 ms
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%acl%' LIMIT 100;
Limit (cost=13.19..384.92 rows=100 width=52) (actual time=257.385..257.751 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=13.19..388.64 rows=101 width=52) (actual time=257.366..257.551 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%acl%'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on test_table_label_gist_idx (cost=0.00..13.17 rows=101 width=0) (actual time=257.319..257.328 rows=10 loops=1)
Index Cond: ((label)::text ~~* '%acl%'::text)
Planning Time: 0.377 ms
Execution Time: 257.948 ms
Just changing the characters of the operand changes the plan quite a lot.
**Edit**
SELECT show_trgm('定す');
"{0x145ed8,0x6628fa,0x6cb12d}"
SELECT encode('定す', 'escape')
\345\256\232\343\201\231
This issue seems similar: https://dba.stackexchange.com/questions/63193/postgresql-not-using-gin-trigram-index-when-performing-non-ascii-like-query
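A standard diagnostic here (not a fix) is to force the index path for the multi-byte phrase and compare the planner's cost estimate against the seq scan it chose:

```sql
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
RESET enable_seqscan;
```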
Pawel Zieminski
(125 rep)
Oct 6, 2023, 08:38 PM
• Last activity: Oct 7, 2023, 03:06 AM
0
votes
1
answers
1237
views
Queries using GIN index on f_unaccent() seem slow?
I have almost 20M rows in one table that contains products, including their names.
I wish to search fulltext and fast by the name so I created this index:
CREATE INDEX uprice_item_occurrence_unaccent_name_trgm_idx ON price_item_occurrence
    USING gin (f_unaccent(name) gin_trgm_ops);
I was hoping the following query would take less than (let's say) 500 ms:
select * from price_item_occurrence as oo
where f_unaccent(oo.name) % f_unaccent('iphone');
But it takes almost 2s:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) % f_unaccent('iphone');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on price_item_occurrence oo (cost=1956.42..63674.14 rows=16570 width=287) (actual time=247.918..1880.759 rows=94 loops=1)
Recheck Cond: (f_unaccent((name)::text) % 'iphone'::text)
Rows Removed by Index Recheck: 87838
Heap Blocks: exact=76663
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..1952.28 rows=16570 width=0) (actual time=195.418..195.418 rows=88962 loops=1)
Index Cond: (f_unaccent((name)::text) % 'iphone'::text)
Planning time: 0.444 ms
Execution time: 1880.833 ms
It's possible that the DB is busy, but I am not sure.
I tried to play with `select set_limit(0.9);` (increased) and it helps a bit but not much.
I am using Postgres 10, can change the Postgres config, and I am open to suggestions.
I tried it with `ILIKE` and it improved somewhat:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) ilike ('%' || f_unaccent('iphone') || '%');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on price_item_occurrence oo (cost=3135.08..416823.45 rows=166075 width=286) (actual time=50.258..670.085 rows=65917 loops=1)
Recheck Cond: (f_unaccent((name)::text) ~~* '%iphone%'::text)
Rows Removed by Index Recheck: 10
Heap Blocks: exact=59750
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..3093.56 rows=166075 width=0) (actual time=37.385..37.385 rows=67700 loops=1)
Index Cond: (f_unaccent((name)::text) ~~* '%iphone%'::text)
Planning time: 0.545 ms
Execution time: 675.776 ms
(8 rows)
About 2x faster.
I tried `LIMIT 10`:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) % f_unaccent('iphone') limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=373.27..410.51 rows=10 width=287) (actual time=268.718..589.131 rows=10 loops=1)
-> Bitmap Heap Scan on price_item_occurrence oo (cost=373.27..62493.45 rows=16680 width=287) (actual time=268.715..589.123 rows=10 loops=1)
Recheck Cond: (f_unaccent((name)::text) % 'iphone'::text)
Rows Removed by Index Recheck: 18917
Heap Blocks: exact=17100
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..369.10 rows=16680 width=0) (actual time=165.958..165.958 rows=69268 loops=1)
Index Cond: (f_unaccent((name)::text) % 'iphone'::text)
Planning time: 0.397 ms
Execution time: 589.187 ms
(9 rows)
This is also faster; maybe almost good enough.
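Since `%` fetches everything above the similarity threshold and then rechecks each heap row, raising the threshold shrinks the candidate set. A sketch (on Postgres 10 the GUC form below should work alongside the older `set_limit()` interface):

```sql
SET pg_trgm.similarity_threshold = 0.6;  -- default is 0.3

SELECT *
FROM price_item_occurrence AS oo
WHERE f_unaccent(oo.name) % f_unaccent('iphone')
LIMIT 10;
```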
Michal
(3 rep)
Apr 9, 2018, 07:19 PM
• Last activity: Aug 1, 2023, 05:00 PM
1
votes
1
answers
331
views
How to search for partial matches from a GIN index in a multi-word PostgreSQL query?
I have a GIN index in a Postgres database that I'm currently using to do full text search. So if a tsvector field contains the vectorized form of "mary had 4 little lambs" then I could search for "4 little lambs" with:
SELECT * FROM mytable WHERE text_index @@ to_tsquery('pg_catalog.english', '''4 little lambs'':*');
However, if I try to search for "4 lambs" with:
SELECT * FROM mytable WHERE text_index @@ to_tsquery('pg_catalog.english', '''4 lambs'':*');
this query returns no results.
Why isn't the index finding any partial matches? What's wrong with my syntax?
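A quoted phrase such as `'4 little lambs'` requires the lexemes to appear adjacent and in order, which is why dropping "little" finds nothing. To require both terms anywhere in the document, join the lexemes with `&` instead (with `:*` per term if prefix matching is wanted):

```sql
SELECT * FROM mytable
WHERE text_index @@ to_tsquery('pg_catalog.english', '4 & lamb:*');
```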
Cerin
(1425 rep)
Dec 6, 2022, 01:43 AM
• Last activity: Dec 6, 2022, 02:36 AM
2
votes
2
answers
2159
views
Does PostgreSQL btree_gin extension use btree or gin data structure?
In order to define a GIN index on an array and a scalar field (for example) you need to enable the `btree_gin` extension in PostgreSQL. Then you can define your indexes as a GIN index.
However I don't understand if, under the hood, PostgreSQL is using a btree structure or a GIN structure.
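A small sketch of what the extension changes in practice (hypothetical table): without `btree_gin`, the scalar column below could not appear in a GIN index at all.

```sql
CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE TABLE docs (status int, tags text[]);

-- status values become GIN keys in the same inverted structure
-- as the tag elements; no separate B-tree is built
CREATE INDEX docs_gin ON docs USING gin (status, tags);
```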
collimarco
(653 rep)
Jan 16, 2020, 12:08 PM
• Last activity: Oct 26, 2022, 08:09 PM
1
votes
0
answers
671
views
Optimize select query with OR and ILIKE
I have the following query for my PostgreSQL 13 database:
SELECT * FROM "user"
INNER JOIN "comment" on "comment"."user_id" = "user"."id"
WHERE "user"."biography" ILIKE '%SomeString%'
OR "comment"."text" ILIKE '%SomeString%';
I have a GIN index defined on `user` `biography` and another one on `comment` `text`:
CREATE INDEX user_bio_idx ON "user" USING gin ("biography" gin_trgm_ops);
CREATE INDEX comment_txt_idx ON "comment" USING gin ("text" gin_trgm_ops);
Unfortunately, running `EXPLAIN ANALYZE ...` on my select query reveals that the indexes are not being used and the query is taking 4 to 5 seconds.
How can I improve its performance? (I already tried `UNION` and even concatenating the two columns.)
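For reference, the usual index-friendly shape of the `UNION` rewrite keeps each `ILIKE` in its own leg, so each leg can use one trigram index (a sketch; running `EXPLAIN ANALYZE` on each leg separately shows whether the indexes are usable at all):

```sql
SELECT u.*, c.*
FROM "user" u
INNER JOIN "comment" c ON c."user_id" = u."id"
WHERE u."biography" ILIKE '%SomeString%'
UNION
SELECT u.*, c.*
FROM "user" u
INNER JOIN "comment" c ON c."user_id" = u."id"
WHERE c."text" ILIKE '%SomeString%';
```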
Kamal Karain
(11 rep)
Mar 16, 2022, 10:18 PM
• Last activity: Sep 29, 2022, 11:07 PM
0
votes
0
answers
458
views
What should I choose between jsonb array and jsonb map for faster querying?
I have a jsonb column which will consist of some keys, and the rows will be searchable using those keys. Considering that I will be applying a GIN index on the jsonb column, what should be the ideal structure for faster lookups? A jsonb array consisting of all the items, or a jsonb map where the items will be the keys?
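The operator used at query time matters as much as the shape, since the default `jsonb_ops` GIN opclass indexes both keys and values. A sketch of the two layouts (hypothetical table `items` with jsonb column `doc`):

```sql
-- map layout, e.g. {"k1": true, "k2": true}:
-- key existence is tested with ?
SELECT * FROM items WHERE doc ? 'k1';

-- array layout, e.g. ["k1", "k2"]:
-- membership is tested with containment @>
SELECT * FROM items WHERE doc @> '["k1"]';
```

If only containment queries are needed, the smaller `jsonb_path_ops` opclass supports `@>` but not `?`.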
user2375245
(1 rep)
Sep 8, 2022, 05:46 PM
7
votes
4
answers
5982
views
PostgreSQL GIN index not used when ts_query language is fetched from a column
I've got a table that stores some multilingual content:
CREATE TABLE search (
content text NOT NULL,
language regconfig NOT NULL,
fulltext tsvector
);
CREATE INDEX search_fulltext ON search USING GIN(fulltext);
INSERT INTO search (language, content) VALUES
('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen vliegen achterna'),
('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali op zijn kale koeli-kop.'),
('dutch', 'Moeder sneed zeven scheve sneden brood'),
('english', 'I saw Susie sitting in a shoe shine shop. Where she sits she shines, and where she shines she sits.'),
('english', 'How can a clam cram in a clean cream can?'),
('english', 'Can you can a can as a canner can can a can?');
UPDATE search SET fulltext = to_tsvector(language, content);
To make sure I always search in the correct language I use these queries:
SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
(1 row)
SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
(1 row)
Because hardcoding the language doesn't give the correct results:
SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
(0 rows)
SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
(0 rows)
The problem however is that PostgreSQL doesn't use the GIN index when using the first set of queries and instead does a sequential scan:
(Note: I've disabled sequential scans using `SET enable_seqscan = OFF;` for these examples because of the low number of rows.)
EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
---
Seq Scan on search (cost=0.00..17.35 rows=2 width=136) (actual time=0.040..0.044 rows=1 loops=1)
Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
Rows Removed by Filter: 5
Planning time: 0.039 ms
Execution time: 0.064 ms
(5 rows)
While it does when hardcoding a language:
EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'vlieg');
---
Bitmap Heap Scan on search (cost=12.63..23.66 rows=82 width=0) (actual time=0.044..0.044 rows=1 loops=1)
Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on search_fulltext (cost=0.00..12.61 rows=82 width=0) (actual time=0.037..0.037 rows=1 loops=1)
Index Cond: (fulltext @@ '''vlieg'''::tsquery)
Planning time: 0.128 ms
Execution time: 0.065 ms
(7 rows)
So my question is: is it at all possible to use a column in the `to_tsquery` call to select the correct language config and still have Postgres use the GIN index?
I'm using PostgreSQL 9.4.
*EDIT*:
Here's the execution plan from the **real** table:
Using a column for language config:
Seq Scan on search (cost=0.00..8727.25 rows=188 width=0) (actual time=0.725..352.307 rows=1689 loops=1)
Filter: (fulltext @@ to_tsquery(language_config, 'example'::text))
Rows Removed by Filter: 35928
Planning time: 0.053 ms
Execution time: 352.915 ms
When hardcoding the language:
Bitmap Heap Scan on search (cost=28.65..4088.92 rows=1633 width=0) (actual time=0.514..10.475 rows=1684 loops=1)
Recheck Cond: (fulltext @@ '''exampl'''::tsquery)
Heap Blocks: exact=1522
-> Bitmap Index Scan on search_fulltext (cost=0.00..28.24 rows=1633 width=0) (actual time=0.333..0.333 rows=1684 loops=1)
Index Cond: (fulltext @@ '''exampl'''::tsquery)
Planning time: 0.180 ms
Execution time: 10.564 ms
*EDIT #2*
Tried it with Postgres 9.5, same result
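Because the config comes from a column, `to_tsquery(language, ...)` cannot be folded to a constant tsquery at plan time, which rules out the index. One workaround (a sketch) is one leg per language with a hardcoded config, so each leg yields an indexable constant:

```sql
SELECT * FROM search
WHERE language = 'dutch' AND fulltext @@ to_tsquery('dutch', 'vlieg')
UNION ALL
SELECT * FROM search
WHERE language = 'english' AND fulltext @@ to_tsquery('english', 'vlieg');
```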
jaap3
(183 rep)
Sep 15, 2016, 11:42 AM
• Last activity: Jul 8, 2022, 07:19 PM
0
votes
2
answers
1329
views
PostgreSQL btree index for int with character varying (ILIKE) not working
Hello, I have about 50 million forums and each forum has over 30 million topics. This is my topic structure in PostgreSQL:
CREATE TABLE public.forum_topic
(
"forum" integer NOT NULL,
"user" integer NOT NULL,
"submit" integer NOT NULL,
"subject" character varying(64) NOT NULL,
"content" character varying(8192) NOT NULL,
"ip" integer NOT NULL,
PRIMARY KEY ("forum", "user", "submit")
);
This is my search query: find every topic with the keyword 'PHP' in its subject in forum 1000 (the forum ID):
SELECT * FROM "forum_topic" WHERE "forum"=1000 AND "subject" ILIKE '%PHP%' LIMIT 10
I know that I must create an index for this kind of search, and I did:
CREATE INDEX forum_topic_forum_subject_idx
ON public.forum_topic USING btree
(forum, subject varchar_pattern_ops);
But when I executed the select query above, there is no sign of the 'forum_topic_forum_subject_idx' index being used, and it takes 10 seconds to execute!
Seq Scan on forum_topic as forum_topic (rows=0 loops=5)
Filter: (((subject)::text ~~* '%PHP%'::text) AND (forum = 1000))
Rows Removed by Filter: 9998001
Do you have any idea for my situation? Do you have a better method for this kind of index?
I'm using PostgreSQL 13
**---- Update ----**
I created a GIN index:
create index idx_forum_topic on forum_topic using gin (forum,subject gin_trgm_ops);
But something weird happened! I executed 2 queries; one used the 'idx_forum_topic' index and one used a seq scan:
SELECT * FROM "forum_topic" WHERE "forum"=26854 AND "subject" ILIKE '%mmap2%' LIMIT 2;
result (1 second):
Bitmap Heap Scan on forum_topic as forum_topic (rows=1 loops=1)
Recheck Cond: ((forum = 26854) AND ((subject)::text ~~* '%mmap2%'::text))
Heap Blocks: exact=1
and
SELECT * FROM "forum_topic" WHERE "forum"=2 AND "subject" ILIKE '%mmap2%' LIMIT 2;
result (27 seconds)
Seq Scan on forum_topic as forum_topic (rows=1 loops=1)
Filter: (((subject)::text ~~* '%mmap2%'::text) AND (forum = 2))
Rows Removed by Filter: 49990003
What is the problem?
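With `LIMIT 2` the planner may bet that a sequential scan will hit two matching rows early, which goes badly when the forum has few or no matches. One way to test that theory on PostgreSQL 13 (a sketch) is to fence the scan behind a materialized CTE so the limit cannot influence the scan choice:

```sql
WITH hits AS MATERIALIZED (
    SELECT * FROM forum_topic
    WHERE forum = 2 AND subject ILIKE '%mmap2%'
)
SELECT * FROM hits LIMIT 2;
```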
HelloMachine
(101 rep)
Jun 23, 2021, 08:49 AM
• Last activity: Jun 23, 2021, 03:56 PM
0
votes
0
answers
502
views
Convert jsonb column into view for Grafana
I have partially dynamic data that I have to save into Postgres 13 and then visualize in Grafana.
Example of one document insert (partial):
{
"Timestamp": {
"$date": {
"$numberLong": "1623224386151"
}
},
"Strings": [],
"Coords": [
{
"key": "GreySegm",
"value": {
"X": 734,
"Y": 529,
"Angle": -0.17575337363989894
}
}
],
"DataMatrices": [],
"SegmentationRes": [
{
"key": "GreySegm",
"value": {
"AlgResultBase": {
"resultCode": 0,
"errorDescr": ""
},
"Regions": [
{
"Name": "reg0",
"Id": 1,
"Status": "NotSet",
"SearchRegionName": "GreySegm",
"SearchRegionId": 0,
"Position": {
"X": 733.8323364257813,
"Y": 529.5650634765625,
"Angle": -0.1757533699274063
}
}
]
}
}
],
"ShapeControl": []
}
Data is generally a time series data, so no updates and deletes, only inserts. It is about 10k inserts per day but of course grows over months.
Grafana doesn't support user-friendly DB navigation to build queries if the only column is jsonb.
So I have decided to create views and let Grafana use them. This works perfectly, but when the table grows to thousands of rows queries become extremely slow, since no indexes are there, I suspect.
Does this approach with views make sense, or am I abusing the DB design?
**My ideas**
1. To use materialized views and then apply indexes on them; this should help. My problem is with updates: how do I refresh the views relatively frequently and not over the whole table?
I want Grafana to be near real time with some lag, and definitely not an offline tool showing yesterday's data because I refresh the materialized view once a day at midnight by cron.
2. To create indexes on jsonb properties (GIN indexes) and write queries against jsonb directly in Grafana. Then no more user friendliness, only hardcore. And the indexes are much less intuitive to build.
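Idea 1 can be sketched roughly as follows (view name and extracted columns are hypothetical; `REFRESH ... CONCURRENTLY` avoids blocking readers but requires a unique index and still recomputes the whole view, so it covers "frequent refresh" only up to a point):

```sql
CREATE MATERIALIZED VIEW mv_measurements AS
SELECT id,
       to_timestamp(((doc #>> '{Timestamp,$date,$numberLong}')::bigint) / 1000.0) AS ts,
       (doc #>> '{Coords,0,value,X}')::numeric AS x,
       (doc #>> '{Coords,0,value,Y}')::numeric AS y
FROM raw_docs;

-- unique index required for CONCURRENTLY; ts index for Grafana's time filter
CREATE UNIQUE INDEX ON mv_measurements (id);
CREATE INDEX ON mv_measurements (ts);

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_measurements;
```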
ivans
(11 rep)
Jun 10, 2021, 12:02 PM
4
votes
1
answers
3267
views
Understanding composite BTREE + GIN_TRGM_OPS index prioritization & odd lower() behavior
hoping someone can try to help me decrypt some index behavior. I'm working on enabling some simple `contains` type lookups on various user-data columns (~varchar):
Sort (cost=52220.05..52777.38 rows=222932 width=4) (actual time=251.069..285.652 rows=222408 loops=1)
Output: id
Sort Key: people.id
Sort Method: external merge Disk: 3064kB
-> Bitmap Heap Scan on public.people (cost=3070.23..29368.23 rows=222932 width=4) (actual time=35.156..198.549 rows=222408 loops=1)
Output: id
Recheck Cond: (people.user_category = 2)
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 111278
Heap Blocks: exact=21277
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_email (cost=0.00..3014.50 rows=334733 width=0) (actual time=32.017..32.017 rows=333686 loops=1)
Index Cond: (people.user_category = 2)
Planning Time: 0.293 ms
Execution Time: 359.247 ms
Is the original b-tree totally redundant at this point? I expected it might still be picked by the planner if only those two columns were used if the b-tree was faster for those data types but that seems like it is not the case.
Next, I noticed that our existing queries depend on `lower()` and seemed to completely ignore the GIN indexes, or rather they seemed to use whichever index was created last, even if that column was not used in the query:
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (LOWER(last_name) LIKE LOWER('%a62%'))
Results (comparing last_name but using email index):
HashAggregate (cost=28997.16..29086.33 rows=8917 width=4) (actual time=175.204..175.554 rows=1677 loops=1)
Output: id
Group Key: people.id
-> Gather (cost=4016.73..28974.87 rows=8917 width=4) (actual time=39.947..181.936 rows=1677 loops=1)
Output: id
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on public.people (cost=3016.73..27083.17 rows=3715 width=4) (actual time=22.037..156.233 rows=559 loops=3)
Output: id
Recheck Cond: (people.user_category = 2)
Filter: (((people.user_type)::text <> 'Ogre'::text) AND (lower((people.last_name)::text) ~~ '%a62%'::text))
Rows Removed by Filter: 110670
Heap Blocks: exact=7011
Worker 0: actual time=13.573..147.844 rows=527 loops=1
Worker 1: actual time=13.138..147.867 rows=584 loops=1
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_email (cost=0.00..3014.50 rows=334733 width=0) (actual time=35.445..35.445 rows=333686 loops=1)
Index Cond: (people.user_category = 2)
Planning Time: 7.546 ms
Execution Time: 189.186 ms
Whereas switching to `ILIKE`:
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
Results are way faster, and use the expected index. What is it about the `lower()` call that seems to make the planner skip a beat?
Unique (cost=161.51..161.62 rows=22 width=4) (actual time=27.144..27.570 rows=1677 loops=1)
Output: id
-> Sort (cost=161.51..161.56 rows=22 width=4) (actual time=27.137..27.256 rows=1677 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 127kB
-> Bitmap Heap Scan on public.people (cost=32.34..161.02 rows=22 width=4) (actual time=16.470..26.798 rows=1677 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 766
Heap Blocks: exact=2291
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=16.058..16.058 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Planning Time: 10.577 ms
Execution Time: 27.746 ms
Next, adding another field into things...
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
AND (first_name ILIKE '%EAD%')
Is still pretty speedy overall
Unique (cost=161.11..161.11 rows=1 width=4) (actual time=10.854..10.860 rows=12 loops=1)
Output: id
-> Sort (cost=161.11..161.11 rows=1 width=4) (actual time=10.853..10.854 rows=12 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on public.people (cost=32.33..161.10 rows=1 width=4) (actual time=3.895..10.831 rows=12 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Filter: (((people.user_type)::text <> 'Ogre'::text) AND ((people.first_name)::text ~~* '%EAD%'::text))
Rows Removed by Filter: 2431
Heap Blocks: exact=2291
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=3.173..3.173 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Planning Time: 0.257 ms
Execution Time: 10.897 ms
Yet going back to that extra, non-tuple index created at the bottom and filtering on email seems to utilize another index as part of things?
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
AND (email ILIKE '%0F9%')
```
Has a different path:
```sql
Unique (cost=140.37..140.38 rows=1 width=4) (actual time=4.180..4.184 rows=7 loops=1)
Output: id
-> Sort (cost=140.37..140.38 rows=1 width=4) (actual time=4.180..4.180 rows=7 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on public.people (cost=136.34..140.36 rows=1 width=4) (actual time=4.145..4.174 rows=7 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text) AND ((people.email)::text ~~* '%0F9%'::text))
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 4
Heap Blocks: exact=11
-> BitmapAnd (cost=136.34..136.34 rows=1 width=0) (actual time=4.125..4.125 rows=0 loops=1)
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=3.089..3.089 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
-> Bitmap Index Scan on idx_people_gin_email (cost=0.00..103.76 rows=10101 width=0) (actual time=0.879..0.879 rows=7138 loops=1)
Index Cond: ((people.email)::text ~~* '%0F9%'::text)
Planning Time: 0.291 ms
Execution Time: 4.217 ms
```
The cost seems negligibly lower, but I'm wondering what this means for a fairly dynamic number of columns that could be filtered by. Would it be ideal to make a non-tuple index for all fields too?
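For illustration, covering a dynamic set of filterable text columns with one single-column trigram index each might look like this (a sketch; the index name is illustrative, and `pg_trgm` is assumed to be installed already, as the existing trigram indexes imply):

```sql
-- hypothetical single-column trigram index (name illustrative); the
-- planner can BitmapAnd its results with the existing composite index
-- and idx_people_gin_email per query, instead of needing one composite
-- index per column combination
CREATE INDEX idx_people_gin_first_name
ON people USING gin (first_name gin_trgm_ops);
```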
Sorry for the length; I've spun my wheels for a while trying to figure this all out, but any insight would be pretty awesome (and it seems like there aren't a ton of GIN indexes like this, though maybe I am missing something more fundamental overall).
Jeff B.
(141 rep)
Nov 28, 2019, 10:56 PM
• Last activity: Nov 29, 2019, 02:55 AM
0
votes
2
answers
2429
views
Gin indexing in postgres
I have four columns project_id, custom_event(boolean), page_view(boolean), event_url(text). I usually do like a regex operation in event_url column and my query will be all the four columns in that order I specified. When I try to create like this CREATE INDEX event_url_index ON event_data USING gin...
I have four columns: project_id, custom_event (boolean), page_view (boolean), and event_url (text). I usually do a regex-like operation on the event_url column, and my query filters on all four columns in the order I specified. When I try to create an index like this
CREATE INDEX event_url_index ON event_data
USING gin (project_id, custom_event, page_view, event_url gin_trgm_ops)
it's giving me an error like this
> data type boolean has no default operator class for access method "gin"
Can anyone help me solve this issue?
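One possible direction (a sketch, assuming the `btree_gin` extension is available; note that `boolean` support in `btree_gin` was only added in PostgreSQL 11):

```sql
-- btree_gin supplies GIN operator classes for scalar types (integer,
-- and on PostgreSQL 11+ also boolean), so the multicolumn definition
-- from the question can then work
CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX event_url_index ON event_data
USING gin (project_id, custom_event, page_view, event_url gin_trgm_ops);
```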
NAREN PUSHPARAJU
(21 rep)
Jun 6, 2019, 05:23 AM
• Last activity: Oct 2, 2019, 08:02 AM
4
votes
3
answers
4057
views
Any drawbacks of using GIN PostgreSQL index for an integer foreign key?
I have a large table (600 million rows) with a foreign key `other_id` of type `integer`. A single value of the foreign key is repeated about 100 times on average. I need to have an index on that FK column because the data is frequently selected by `other_id`. My tests show that the `gin` index type...
I have a large table (600 million rows) with a foreign key `other_id` of type `integer`. A single value of the foreign key is repeated about 100 times on average. I need to have an index on that FK column because the data is frequently selected by `other_id`.
My tests show that the `gin` index type is about 10 times smaller and about 3 times as performant as the default `btree` index (the performance was tested using `SELECT` queries).
The question is: are there any real-world drawbacks of using the `gin` index instead of the `btree` index? It looks like this index type isn't used much for a very common case like mine, i.e. an `integer` foreign key. But my tests show vast performance gains. Why, then, isn't `gin` recommended for such scenarios?
I had to execute `CREATE EXTENSION btree_gin` to be able to use the `gin` index for the `integer` column.
I know about `UPDATE` being possibly slow due to `FASTUPDATE` being enabled by default: https://dba.stackexchange.com/questions/202583/occasional-intermittent-slow-10-second-update-queries-on-postgresql-table-wi
I care only about the equality `=` operator being able to use the index (also `IN (...)` with a possibly large number of values, but I assume this is also equality).
Artur
(41 rep)
Aug 16, 2019, 10:13 AM
• Last activity: Aug 17, 2019, 01:37 PM
2
votes
2
answers
2748
views
What's the best way to use a gist index on tsrange to check if a time occurs after the range?
# table (simplified) Table "public.events" Column | Type | Modifiers ------------------+-----------------------------+-------------------------------------------------------- id | integer | not null default nextval('events_id_seq'::regclass) duration | integer | not null start_at | timestamp without...
# table (simplified)
Table "public.events"
Column | Type | Modifiers
------------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('events_id_seq'::regclass)
duration | integer | not null
start_at | timestamp without time zone |
Indexes:
"events_pkey" PRIMARY KEY, btree (id)
"my_idx" gist (tsrange(start_at, end_at(events.*), '[)'::text))
# function
CREATE FUNCTION end_at(rec events)
RETURNS timestamp without time zone
IMMUTABLE
LANGUAGE SQL
AS $$
SELECT $1.start_at + ($1.duration * ('00:00:01'::interval));
$$;
# what I am already doing successfully
The index is used for queries like this:
-- check if current time is within the start and end times
-- of event
where localtimestamp > tsrange(start_at, events.end_at, '[)')
I'm pretty sure this is the semantics I want, and `explain analyze` says it's using the index, but it's a bit ugly and I'm wondering if there's a better way to express this (and am also vaguely uncertain it's the semantics I want, as I am new to ranges). Alternatives:
* `where localtimestamp > upper(tsrange(start_at, events.end_at, '[)'))` + a btree index on `upper(tsrange(start_at, events.end_at, '[)'))`. This will work well, but requires keeping another index around.
* `where localtimestamp > events.end_at` + a btree index on `events.end_at`. Same situation as above.
Is there a more elegant (or correct) way to achieve the first bullet point above?
Any other ideas for how to go about this?
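One candidate formulation (a sketch): compare the indexed range against a singleton range at the current time using the range operator `<<` ("strictly left of"), which GiST can use and whose left side matches the existing index expression:

```sql
-- "the event's range lies entirely before now", i.e. the current time
-- occurs after the range; the left-hand expression matches my_idx
where tsrange(start_at, end_at(events.*), '[)')
      << tsrange(localtimestamp, localtimestamp, '[]')
```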
John Bachir
(867 rep)
Jan 11, 2018, 07:00 PM
• Last activity: Apr 17, 2018, 09:46 PM
3
votes
1
answers
1918
views
Occasional/intermittent, slow (10+-second) UPDATE queries on PostgreSQL table with GIN index
## The Setup ## I am running PostgreSQL 9.4.15 on an SSD-based, quad-core Virtual Private Server (VPS) with Debian Linux (8). The relevant table has approximately 2-million records. Records are frequently being inserted and even more frequently (constantly -- every few seconds at least) updated. As...
## The Setup ##
I am running PostgreSQL 9.4.15 on an SSD-based, quad-core Virtual Private Server (VPS) with Debian Linux (8). The relevant table has approximately 2-million records.
Records are frequently being inserted and even more frequently (constantly -- every few seconds at least) updated. As far as I can tell, I have all appropriate indexes in place for these operations to execute snappily, however, and the vast majority of the time they do execute instantly (in milliseconds).
## The Problem ##
Every hour or so, however, one of the `UPDATE` queries takes an excessive amount of time -- like 10 seconds or more. And when this happens, it's usually a "batch" of queries that gets "blocked", all terminating at roughly the same time. It's as if one of the queries, or some other background operation (e.g., a vacuum), is blocking them all.
## Schema ##
The table, `items`, has many columns, but I think the following are the only ones possibly relevant to the problem:
* `id INTEGER NOT NULL` (primary key)
* `search_vector TSVECTOR`
* `last_checkup_at TIMESTAMP WITHOUT TIME ZONE`
And these are the relevant indexes:
* `items_pkey` PRIMARY KEY, btree (id)
* `items_search_vector_idx` gin (search_vector)
* `items_last_checkup_at_idx` btree (last_checkup_at)
## Likely Culprits ##
Finally, after rigging together a little script to dump the contents of `pg_stat_activity` (the list of all active Postgres connections/queries) whenever a "connection leak" warning is emitted to my log file, I've narrowed down the possible culprit queries/columns (assuming the problem isn't external, like with a misbehaving VPS). These are, roughly, the kinds of queries that seem to appear again and again:
* `UPDATE items SET last_checkup_at = $1 WHERE items.id = 123245`
* `UPDATE items SET search_vector = [..] WHERE items.id = 78901`
Those are slightly paraphrased, but I truly doubt anything relevant is missing. Occasionally other queries (on other tables) appear as well, but those usually look like they were just "unlucky" to get caught in the mix.
Now, even though the first query (setting `last_checkup_at`) tends to appear most of the time, the query that sets `search_vector` seems to appear *every* time. (And in addition, there are probably many more instances of the first query being issued in general, making it more likely to just be there on happenstance.)
(I think I'm winnowing in on a solution here, but even if I have it in the bag I wanted to document the incident here for others... I've been mystified by this problem for months, before getting a chance to deep-dive.)
Chris W.
(201 rep)
Mar 28, 2018, 09:17 PM
• Last activity: Apr 5, 2018, 07:30 PM
3
votes
1
answers
1365
views
How to create some GIN index concurrently in Postgresql
I have a large table: `CREATE TABLE hh(h int8[] not null, file int8 not null)`, and GIN-index over `h` field (`CREATE INDEX ON hh USING gin(h)`). Index was created is about 8 minutes, so I decide to split this big table into two or more tables, and create indexes over all this tables. I assumed that...
I have a large table: `CREATE TABLE hh(h int8[] not null, file int8 not null)`, with a GIN index over the `h` field (`CREATE INDEX ON hh USING gin(h)`). The index takes about 8 minutes to create, so I decided to split this big table into two or more tables and create indexes over all of them.
I assumed that these indexes would be created in parallel and that I would get about a 2x speedup (slightly less, because the indexes and tables are on one disk). That is, for two tables I expected 4-5 minutes (instead of 10 minutes for one table).
But that did not happen! Both indexes took 8-9 minutes to create, not 4-5.
I tried parallel creation of B-tree indexes over my other big tables, and yes, PostgreSQL can create such indexes in parallel. So I assumed that would work for GIN too.
I have enough RAM for the indexes: 32GB. The original table is 1GB and its index 3GB; each split table is 0.5GB with a 1.3GB index. `maintenance_work_mem = 16GB`, `work_mem = 16GB`.
I also tried putting the second table and index in another tablespace (located on a different physical disk), but the speedup was very small (7.5 minutes vs 8 minutes). Why?
So, how can I create GIN indexes in parallel? Is this possible?
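For concreteness, the parallel attempt amounts to running something like this simultaneously from two separate sessions (table names illustrative):

```sql
-- session 1
CREATE INDEX ON hh_part1 USING gin (h);

-- session 2, started at the same time from another connection
CREATE INDEX ON hh_part2 USING gin (h);
```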
Dmitry Krylov
(31 rep)
Mar 6, 2018, 01:42 PM
• Last activity: Mar 9, 2018, 10:54 PM
1
votes
1
answers
2780
views
How to create pg_trgm compound indexes with date columns
SELECT col1, max(date1) as max_date FROM table WHERE col1 ILIKE 'name' GROUP BY col1 `TYPES: `Here col1 is `varchar` and date1 is `timestamp with time zone` data type. So created extension `CREATE EXTENSION pg_trgm` Then tried the following indexes and got the errors: 1: Issue: ERROR: operator class...
SELECT col1, max(date1) as max_date
FROM table
WHERE col1 ILIKE 'name'
GROUP BY col1
TYPES: Here col1 is `varchar` and date1 is `timestamp with time zone`. So I created the extension: `CREATE EXTENSION pg_trgm`.
Then I tried the following indexes and got the errors:
1: Issue: ERROR: operator class "gin_trgm_ops" does not accept data type timestamp with time zone
CREATE INDEX CONCURRENTLY trgm_table_col1_date_index
ON table
USING gin (provider_id, date1 gin_trgm_ops);
2: Issue: ERROR: operator class "text_pattern_ops" does not exist for access method "gin"
CREATE INDEX CONCURRENTLY trgm_table_col1_date_index
ON table
USING gin (provider_id, date1 text_pattern_ops);
How can I create an index for the above query for faster execution? Any help will be really appreciated!
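One option (a sketch): since `date1` appears only in the aggregate, it may not need to be in the trigram index at all; a plain trigram GIN index on the text column alone can serve the `ILIKE` filter:

```sql
-- "table" is the placeholder name from the question, quoted because
-- TABLE is a reserved word
CREATE INDEX CONCURRENTLY trgm_table_col1_index
ON "table" USING gin (col1 gin_trgm_ops);
```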
Atihska
(129 rep)
Jan 23, 2018, 07:30 PM
• Last activity: Jan 23, 2018, 09:51 PM
Showing page 1 of 20 total questions