Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
182
views
Composite multicolumn index for geopoint range and numeric range query
I am building an app where the server needs to select rows based on some criteria/filters. One of them is the location of the user and the radius within which the user wants to see posts; other filters include a date range and a filter on the value of another column. This is going to be for an ad-hoc event discovery app.
I have read about PostGIS and its `geometry` and `geography` types, and I know there is a native `point` datatype. Based on this answer I understood that it is better to order index columns from equality to range, even though I feel like the geo point column should be first.
Suppose the following few rows of a simplified events table (disregard the validity of the position data):
id event_title event_position event_type is_public start_date
(varchar) (point lat/lon) (smallint) (boolean) (timestamptz)
-- --------------------------- --------------- --------- --------- ----
1 "John's Party" (122,35) 0 0 2020-07-05
2 "Revolution then Starbucks" (123,30) 1 1 2020-07-06
3 "Study for math exam" (120,36) 2 1 2020-07-07
4 "Party after exam" (120,36) 1 1 2020-07-08
5 "Hiking next to the city" (95,40) 3 1 2020-07-09
6 "Football match" (-42,31) 4 1 2020-07-10
Imagine the table contains several thousand records at least, obviously not only 6.
So in this table a user would be able to query public events within 100 km of (122,34) (suppose the first three rows fall into this area), of event types 0, 1 or 2, and falling between the dates 2020-07-05 and 2020-07-07. The user would get the rows with IDs 2 and 3.
This is the query I want to optimize with an appropriate index. My question is: how is it possible to create such an index? I thought about a GiST or GIN index, but I'm not sure how these could help. Thanks!
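One possible direction (a sketch only; it assumes the `btree_gist` extension, a hypothetical `events` table matching the columns above, and the native `point` type rather than PostGIS) is a single multicolumn GiST index, made partial on the boolean:

```sql
-- btree_gist lets scalar columns (smallint, timestamptz) sit
-- alongside the native point type in one GiST index
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- partial index covering only public events
CREATE INDEX events_filter_gist_idx ON events
    USING gist (event_type, start_date, event_position)
    WHERE is_public;

SELECT * FROM events
WHERE is_public
  AND event_type IN (0, 1, 2)
  AND start_date BETWEEN '2020-07-05' AND '2020-07-07'
  AND event_position <@ circle(point(122, 34), 0.9);  -- radius in degrees, not km
```

For real distance-in-kilometres filtering, a PostGIS `geography` column queried with `ST_DWithin` would be the more accurate tool.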
Najib
(231 rep)
Jul 4, 2020, 02:59 PM
• Last activity: Jul 5, 2025, 12:18 AM
0
votes
1
answers
184
views
How can I make an index covering a ts_vector expression and also an integer?
I am using the Ruby pg_search gem, where code like `Food.search("orange")` produces a query like this:
SELECT "foods".*
FROM "foods"
INNER JOIN
(SELECT "foods"."id" AS pg_search_id,
(ts_rank((to_tsvector('simple', coalesce("foods"."name"::text, ''))), (to_tsquery('simple', ''' ' || 'orange' || ' ''' || ':*')), 0)) AS rank
FROM "foods"
WHERE ((to_tsvector('simple', coalesce("foods"."name"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'orange' || ' ''' || ':*')))) AS pg_search_d4392ced9dff0647fed4ed ON "foods"."id" = pg_search_d4392ced9dff0647fed4ed.pg_search_id
ORDER BY pg_search_d4392ced9dff0647fed4ed.rank DESC,
"foods"."id" ASC;
I've created this index, which seems to work quite well:
CREATE INDEX concurrently foods_gin ON foods USING gin
(to_tsvector('simple', coalesce("foods"."name"::text, '')));
I have another column, `category`; there are 9 categories. For a given search as shown above for name, I want to show results from category 9 first, and then all the other categories (order doesn't matter).
Ideally the results could be paginated, so a perfect query/index combo would be great.
Is this possible to do, and/or will it be a truly gigantic index?
P.S. Suggestions for a better name for the question are welcome.
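One sketch of a combined index (assuming the `btree_gin` extension; whether it helps depends on the query actually filtering on `category` rather than only ordering by it):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- the expression must match the query's tsvector expression exactly
CREATE INDEX CONCURRENTLY foods_category_name_gin ON foods
    USING gin (category, to_tsvector('simple', coalesce("name"::text, '')));
```

Note that a GIN index cannot return rows in any useful order, so the "category 9 first" requirement would still need something like `ORDER BY (category = 9) DESC, rank DESC` applied to the matched rows.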
John Bachir
(867 rep)
May 5, 2021, 05:32 PM
• Last activity: Jun 26, 2025, 02:08 PM
0
votes
1
answers
501
views
Should I consider GIN index for several columns?
In our application we have a field that can be used to match several columns in our database. So if the user searches for `somestring` we will return all the rows `WHERE column1 = 'somestring' OR column2 = 'somestring'` etc. We are using Postgres.
I am wondering if it is viable to use a `GIN` index for this purpose. Normally I see `GIN` indexes used for long and variable-length data, however. So I am wondering whether there is any point using them to index short, fixed-length `tsvector`s, or whether I should just index each column separately with a `BTREE` and let the planner handle the search.
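For plain equality on a handful of columns, the simpler alternative is one B-tree per column; the planner can combine them with a BitmapOr for the OR query. A sketch (table name hypothetical, columns taken from the example above):

```sql
CREATE INDEX ON mytable (column1);
CREATE INDEX ON mytable (column2);

-- EXPLAIN typically shows a BitmapOr of two Bitmap Index Scans
EXPLAIN SELECT * FROM mytable
WHERE column1 = 'somestring' OR column2 = 'somestring';
```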
zefciu
(101 rep)
Jan 18, 2024, 10:49 AM
• Last activity: Jan 19, 2024, 02:21 AM
0
votes
1
answers
69
views
Missing GIN support function (4 or 6) for attribute 1 of index
I have a problem creating GIN indexes in postgres 14.9 and GIN 1.3
The following is the query:
CREATE TABLE assert.test_gin(col1 func.doc_sub_type[]);
CREATE INDEX ON assert.test_gin USING gin(col1);
And here is the result of running the above:
ERROR: missing GIN support function (4 or 6) for attribute 1 of index "test_gin_col1_idx"
SQL state: XX000
I checked the GIN support functions in the pg_catalog and public schemas. According to the documentation (https://www.postgresql.org/docs/current/gin-extensibility.html), functions 4 and 6 are the bool `consistent` and GinTernaryValue `triConsistent` functions respectively; both exist in my pg_catalog but not in the public schema.
Please help!
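If `func.doc_sub_type` is an enum or composite type, the installed GIN opclass for arrays of it may lack the required support functions, which is the symptom this error reports. One common workaround (a sketch, untested here) is an expression index over a `text[]` cast:

```sql
-- index the cast; queries must then use the same cast
CREATE INDEX ON assert.test_gin USING gin ((col1::text[]));

-- example containment query against the indexed expression
SELECT * FROM assert.test_gin WHERE col1::text[] @> ARRAY['some_value'];
```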
Ali Azimi
(1 rep)
Jan 15, 2024, 03:19 PM
• Last activity: Jan 15, 2024, 04:21 PM
1
votes
1
answers
263
views
Postgresql strange behavior with non-ascii characters with triagram index present
I see some strange behavior when using either `gin_trgm_ops` or `gist_trgm_ops` indexes. There seems to be quite a difference in plans when using, say, `ILIKE` or `~` and searching for ASCII phrases vs multi-byte char phrases. It's as if there is a higher cost when the operand is non-ASCII.
Is what I am seeing expected? What is the reason for it?
I tried it on PostgreSQL 12 and 13, latest of each.
Here is a scenario:
CREATE DATABASE postgres
WITH
OWNER = postgres
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.utf8'
LC_CTYPE = 'en_US.utf8'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
IS_TEMPLATE = False;
-- snip
CREATE TABLE test_table (
id uuid PRIMARY KEY,
label varchar
);
-- insert 1m rows
VACUUM ANALYZE test_table;
In the data set I have 10 labels containing 'acl' and 10 containing '定す'.
When using a GIN index
CREATE INDEX test_table_label_gin_idx
ON test_table USING gin
(label gin_trgm_ops);
I see the following.
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
Limit (cost=1000.00..16573.18 rows=100 width=52) (actual time=392.153..395.095 rows=10 loops=1)
-> Gather (cost=1000.00..16728.91 rows=101 width=52) (actual time=392.135..394.830 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test_table (cost=0.00..15718.81 rows=42 width=52) (actual time=382.922..388.082 rows=3 loops=3)
Filter: ((label)::text ~~* '%定す%'::text)
Rows Removed by Filter: 338417
Planning Time: 0.656 ms
Execution Time: 395.233 ms
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%acl%' LIMIT 100;
Limit (cost=28.78..400.51 rows=100 width=52) (actual time=0.072..0.406 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=28.78..404.23 rows=101 width=52) (actual time=0.053..0.197 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%acl%'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on test_table_label_gin_idx (cost=0.00..28.76 rows=101 width=0) (actual time=0.025..0.034 rows=10 loops=1)
Index Cond: ((label)::text ~~* '%acl%'::text)
Planning Time: 0.231 ms
Execution Time: 0.542 ms
With GiST
DROP INDEX test_table_label_gin_idx;
CREATE INDEX test_table_label_gist_idx
ON test_table USING gist
(label gist_trgm_ops);
I see
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
Limit (cost=13.19..384.92 rows=100 width=52) (actual time=303.772..1557.498 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=13.19..388.64 rows=101 width=52) (actual time=303.752..1557.286 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%定す%'::text)
Rows Removed by Index Recheck: 1015250
Heap Blocks: exact=10431
-> Bitmap Index Scan on test_table_label_gist_idx (cost=0.00..13.17 rows=101 width=0) (actual time=301.046..301.053 rows=1015260 loops=1)
Index Cond: ((label)::text ~~* '%定す%'::text)
Planning Time: 0.215 ms
Execution Time: 1557.643 ms
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%acl%' LIMIT 100;
Limit (cost=13.19..384.92 rows=100 width=52) (actual time=257.385..257.751 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=13.19..388.64 rows=101 width=52) (actual time=257.366..257.551 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%acl%'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on test_table_label_gist_idx (cost=0.00..13.17 rows=101 width=0) (actual time=257.319..257.328 rows=10 loops=1)
Index Cond: ((label)::text ~~* '%acl%'::text)
Planning Time: 0.377 ms
Execution Time: 257.948 ms
Just changing the characters of the operand changes the plan quite a lot.
**Edit**
SELECT show_trgm('定す');
"{0x145ed8,0x6628fa,0x6cb12d}"
SELECT encode('定す', 'escape')
\345\256\232\343\201\231
This issue seems similar: https://dba.stackexchange.com/questions/63193/postgresql-not-using-gin-trigram-index-when-performing-non-ascii-like-query
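A standard diagnostic here (not a fix) is to force the index path for the multi-byte phrase and compare the planner's cost estimate against the seq scan it chose:

```sql
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
RESET enable_seqscan;
```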
Pawel Zieminski
(125 rep)
Oct 6, 2023, 08:38 PM
• Last activity: Oct 7, 2023, 03:06 AM
0
votes
1
answers
1237
views
Queries using GIN index on f_unaccent() seem slow?
I have almost 20M rows in one table that contains products, including their names.
I wish to search fulltext and fast by the name so I created this index:
CREATE INDEX uprice_item_occurrence_unaccent_name_trgm_idx ON price_item_occurrence
    USING gin (f_unaccent(name) gin_trgm_ops);
I was hoping the following query would take less than (let's say) 500 ms:
select * from price_item_occurrence as oo
where f_unaccent(oo.name) % f_unaccent('iphone');
But it takes almost 2s:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) % f_unaccent('iphone');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on price_item_occurrence oo (cost=1956.42..63674.14 rows=16570 width=287) (actual time=247.918..1880.759 rows=94 loops=1)
Recheck Cond: (f_unaccent((name)::text) % 'iphone'::text)
Rows Removed by Index Recheck: 87838
Heap Blocks: exact=76663
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..1952.28 rows=16570 width=0) (actual time=195.418..195.418 rows=88962 loops=1)
Index Cond: (f_unaccent((name)::text) % 'iphone'::text)
Planning time: 0.444 ms
Execution time: 1880.833 ms
It's possible that the DB is busy, but I am not sure.
I tried to play with `select set_limit(0.9);` (increased) and it helps a bit but not much.
I am using Postgres 10, can change the Postgres config, and I am open to suggestions.
I tried it with `ILIKE` and it improved somewhat:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) ilike ('%' || f_unaccent('iphone') || '%');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on price_item_occurrence oo (cost=3135.08..416823.45 rows=166075 width=286) (actual time=50.258..670.085 rows=65917 loops=1)
Recheck Cond: (f_unaccent((name)::text) ~~* '%iphone%'::text)
Rows Removed by Index Recheck: 10
Heap Blocks: exact=59750
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..3093.56 rows=166075 width=0) (actual time=37.385..37.385 rows=67700 loops=1)
Index Cond: (f_unaccent((name)::text) ~~* '%iphone%'::text)
Planning time: 0.545 ms
Execution time: 675.776 ms
(8 rows)
About 2x faster.
I tried `LIMIT 10`:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) % f_unaccent('iphone') limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=373.27..410.51 rows=10 width=287) (actual time=268.718..589.131 rows=10 loops=1)
-> Bitmap Heap Scan on price_item_occurrence oo (cost=373.27..62493.45 rows=16680 width=287) (actual time=268.715..589.123 rows=10 loops=1)
Recheck Cond: (f_unaccent((name)::text) % 'iphone'::text)
Rows Removed by Index Recheck: 18917
Heap Blocks: exact=17100
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..369.10 rows=16680 width=0) (actual time=165.958..165.958 rows=69268 loops=1)
Index Cond: (f_unaccent((name)::text) % 'iphone'::text)
Planning time: 0.397 ms
Execution time: 589.187 ms
(9 rows)
This is also faster; maybe almost good enough.
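Since `%` fetches everything above the similarity threshold and then rechecks each heap row, raising the threshold shrinks the candidate set. A sketch (on Postgres 10 the GUC form below should work alongside the older `set_limit()` interface):

```sql
SET pg_trgm.similarity_threshold = 0.6;  -- default is 0.3

SELECT *
FROM price_item_occurrence AS oo
WHERE f_unaccent(oo.name) % f_unaccent('iphone')
LIMIT 10;
```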
Michal
(3 rep)
Apr 9, 2018, 07:19 PM
• Last activity: Aug 1, 2023, 05:00 PM
1
votes
1
answers
331
views
How to search for partial matches from a GIN index in a multi-word PostgreSQL query?
I have a GIN index in a Postgres database that I'm currently using to do full text search. So if a tsvector field contains the vectorized form of "mary had 4 little lambs" then I could search for "4 little lambs" with:
SELECT * FROM mytable WHERE text_index @@ to_tsquery('pg_catalog.english', '''4 little lambs'':*');
However, if I try to search for "4 lambs" with:
SELECT * FROM mytable WHERE text_index @@ to_tsquery('pg_catalog.english', '''4 lambs'':*');
this query returns no results.
Why isn't the index finding any partial matches? What's wrong with my syntax?
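A quoted phrase such as `'4 little lambs'` requires the lexemes to appear adjacent and in order, which is why dropping "little" finds nothing. To require both terms anywhere in the document, join the lexemes with `&` instead (with `:*` per term if prefix matching is wanted):

```sql
SELECT * FROM mytable
WHERE text_index @@ to_tsquery('pg_catalog.english', '4 & lamb:*');
```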
Cerin
(1425 rep)
Dec 6, 2022, 01:43 AM
• Last activity: Dec 6, 2022, 02:36 AM
2
votes
2
answers
2159
views
Does PostgreSQL btree_gin extension use btree or gin data structure?
In order to define a GIN index on an array and a scalar field (for example) you need to enable the `btree_gin` extension in PostgreSQL. Then you can define your indexes as a GIN index.
However I don't understand if, under the hood, PostgreSQL is using a btree structure or a GIN structure.
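A small sketch of what the extension changes in practice (hypothetical table): without `btree_gin`, the scalar column below could not appear in a GIN index at all.

```sql
CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE TABLE docs (status int, tags text[]);

-- status values become GIN keys in the same inverted structure
-- as the tag elements; no separate B-tree is built
CREATE INDEX docs_gin ON docs USING gin (status, tags);
```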
collimarco
(653 rep)
Jan 16, 2020, 12:08 PM
• Last activity: Oct 26, 2022, 08:09 PM
1
votes
0
answers
671
views
Optimize select query with OR and ILIKE
I have the following query for my PostgreSQL 13 database:
SELECT * FROM "user"
INNER JOIN "comment" on "comment"."user_id" = "user"."id"
WHERE "user"."biography" ILIKE '%SomeString%'
OR "comment"."text" ILIKE '%SomeString%';
I have a GIN index defined on `user` `biography` and another one on `comment` `text`:
CREATE INDEX user_bio_idx ON "user" USING gin ("biography" gin_trgm_ops);
CREATE INDEX comment_txt_idx ON "comment" USING gin ("text" gin_trgm_ops);
Unfortunately, running `EXPLAIN ANALYZE ...` on my select query reveals that the indexes are not being used and the query is taking 4 to 5 seconds.
How can I improve its performance? (I already tried `UNION` and even concatenating the two columns.)
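For reference, the usual index-friendly shape of the `UNION` rewrite keeps each `ILIKE` in its own leg, so each leg can use one trigram index (a sketch; running `EXPLAIN ANALYZE` on each leg separately shows whether the indexes are usable at all):

```sql
SELECT u.*, c.*
FROM "user" u
INNER JOIN "comment" c ON c."user_id" = u."id"
WHERE u."biography" ILIKE '%SomeString%'
UNION
SELECT u.*, c.*
FROM "user" u
INNER JOIN "comment" c ON c."user_id" = u."id"
WHERE c."text" ILIKE '%SomeString%';
```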
Kamal Karain
(11 rep)
Mar 16, 2022, 10:18 PM
• Last activity: Sep 29, 2022, 11:07 PM
0
votes
0
answers
458
views
What should I choose between jsonb array and jsonb map for faster querying?
I have a jsonb column which will consist of some keys, and the rows will be searchable using those keys. Considering that I will be applying a GIN index on the jsonb column, what should be the ideal structure for faster lookups? A jsonb array consisting of all the items, or a jsonb map where the items will be the keys?
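The operator used at query time matters as much as the shape, since the default `jsonb_ops` GIN opclass indexes both keys and values. A sketch of the two layouts (hypothetical table `items` with jsonb column `doc`):

```sql
-- map layout, e.g. {"k1": true, "k2": true}:
-- key existence is tested with ?
SELECT * FROM items WHERE doc ? 'k1';

-- array layout, e.g. ["k1", "k2"]:
-- membership is tested with containment @>
SELECT * FROM items WHERE doc @> '["k1"]';
```

If only containment queries are needed, the smaller `jsonb_path_ops` opclass supports `@>` but not `?`.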
user2375245
(1 rep)
Sep 8, 2022, 05:46 PM
7
votes
4
answers
5982
views
PostgreSQL GIN index not used when ts_query language is fetched from a column
I've got a table that stores some multilingual content:
CREATE TABLE search (
content text NOT NULL,
language regconfig NOT NULL,
fulltext tsvector
);
CREATE INDEX search_fulltext ON search USING GIN(fulltext);
INSERT INTO search (language, content) VALUES
('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen vliegen achterna'),
('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali op zijn kale koeli-kop.'),
('dutch', 'Moeder sneed zeven scheve sneden brood'),
('english', 'I saw Susie sitting in a shoe shine shop. Where she sits she shines, and where she shines she sits.'),
('english', 'How can a clam cram in a clean cream can?'),
('english', 'Can you can a can as a canner can can a can?');
UPDATE search SET fulltext = to_tsvector(language, content);
To make sure I always search in the correct language I use these queries:
SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
(1 row)
SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
(1 row)
Because hardcoding the language doesn't give the correct results:
SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
(0 rows)
SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
(0 rows)
The problem however is that PostgreSQL doesn't use the GIN index when using the first set of queries and instead does a sequential scan:
(Note: I've disabled sequential scans using `SET enable_seqscan = OFF;` for these examples because of the low number of rows.)
EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
---
Seq Scan on search (cost=0.00..17.35 rows=2 width=136) (actual time=0.040..0.044 rows=1 loops=1)
Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
Rows Removed by Filter: 5
Planning time: 0.039 ms
Execution time: 0.064 ms
(5 rows)
While it does when hardcoding a language:
EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'vlieg');
---
Bitmap Heap Scan on search (cost=12.63..23.66 rows=82 width=0) (actual time=0.044..0.044 rows=1 loops=1)
Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on search_fulltext (cost=0.00..12.61 rows=82 width=0) (actual time=0.037..0.037 rows=1 loops=1)
Index Cond: (fulltext @@ '''vlieg'''::tsquery)
Planning time: 0.128 ms
Execution time: 0.065 ms
(7 rows)
So my question is: is it at all possible to use a column in the `to_tsquery` call to select the correct language config and still have Postgres use the GIN index?
I'm using PostgreSQL 9.4.
*EDIT*:
Here's the execution plan from the **real** table:
Using a column for language config:
Seq Scan on search (cost=0.00..8727.25 rows=188 width=0) (actual time=0.725..352.307 rows=1689 loops=1)
Filter: (fulltext @@ to_tsquery(language_config, 'example'::text))
Rows Removed by Filter: 35928
Planning time: 0.053 ms
Execution time: 352.915 ms
When hardcoding the language:
Bitmap Heap Scan on search (cost=28.65..4088.92 rows=1633 width=0) (actual time=0.514..10.475 rows=1684 loops=1)
Recheck Cond: (fulltext @@ '''exampl'''::tsquery)
Heap Blocks: exact=1522
-> Bitmap Index Scan on search_fulltext (cost=0.00..28.24 rows=1633 width=0) (actual time=0.333..0.333 rows=1684 loops=1)
Index Cond: (fulltext @@ '''exampl'''::tsquery)
Planning time: 0.180 ms
Execution time: 10.564 ms
*EDIT #2*
Tried it with Postgres 9.5, same result
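Because the config comes from a column, `to_tsquery(language, ...)` cannot be folded to a constant tsquery at plan time, which rules out the index. One workaround (a sketch) is one leg per language with a hardcoded config, so each leg yields an indexable constant:

```sql
SELECT * FROM search
WHERE language = 'dutch' AND fulltext @@ to_tsquery('dutch', 'vlieg')
UNION ALL
SELECT * FROM search
WHERE language = 'english' AND fulltext @@ to_tsquery('english', 'vlieg');
```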
jaap3
(183 rep)
Sep 15, 2016, 11:42 AM
• Last activity: Jul 8, 2022, 07:19 PM
0
votes
2
answers
1329
views
PostgreSQL btree index for int with character varying (ILIKE) not working
Hello, I have about 50 million forums and each forum has over 30 million topics. This is my topic structure in PostgreSQL:
CREATE TABLE public.forum_topic
(
"forum" integer NOT NULL,
"user" integer NOT NULL,
"submit" integer NOT NULL,
"subject" character varying(64) NOT NULL,
"content" character varying(8192) NOT NULL,
"ip" integer NOT NULL,
PRIMARY KEY ("forum", "user", "submit")
);
This is my search query: find every topic with the keyword 'PHP' in its subject in forum 1000 (the forum ID):
SELECT * FROM "forum_topic" WHERE "forum"=1000 AND "subject" ILIKE '%PHP%' LIMIT 10
I know that I must create an index for this kind of search, and I did:
CREATE INDEX forum_topic_forum_subject_idx
ON public.forum_topic USING btree
(forum, subject varchar_pattern_ops);
But when I executed the select query above, there is no sign of the 'forum_topic_forum_subject_idx' index being used, and it takes 10 seconds to execute!
Seq Scan on forum_topic as forum_topic (rows=0 loops=5)
Filter: (((subject)::text ~~* '%PHP%'::text) AND (forum = 1000))
Rows Removed by Filter: 9998001
Do you have any idea for my situation? Do you have a better method for this kind of index?
I'm using PostgreSQL 13
**---- Update ----**
I created a GIN index:
create index idx_forum_topic on forum_topic using gin (forum,subject gin_trgm_ops);
But something weird happened! I executed 2 queries; one used the 'idx_forum_topic' index and one used a seq scan:
SELECT * FROM "forum_topic" WHERE "forum"=26854 AND "subject" ILIKE '%mmap2%' LIMIT 2;
result (1 second):
Bitmap Heap Scan on forum_topic as forum_topic (rows=1 loops=1)
Recheck Cond: ((forum = 26854) AND ((subject)::text ~~* '%mmap2%'::text))
Heap Blocks: exact=1
and
SELECT * FROM "forum_topic" WHERE "forum"=2 AND "subject" ILIKE '%mmap2%' LIMIT 2;
result (27 seconds)
Seq Scan on forum_topic as forum_topic (rows=1 loops=1)
Filter: (((subject)::text ~~* '%mmap2%'::text) AND (forum = 2))
Rows Removed by Filter: 49990003
What is the problem?
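With `LIMIT 2` the planner may bet that a sequential scan will hit two matching rows early, which goes badly when the forum has few or no matches. One way to test that theory on PostgreSQL 13 (a sketch) is to fence the scan behind a materialized CTE so the limit cannot influence the scan choice:

```sql
WITH hits AS MATERIALIZED (
    SELECT * FROM forum_topic
    WHERE forum = 2 AND subject ILIKE '%mmap2%'
)
SELECT * FROM hits LIMIT 2;
```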
HelloMachine
(101 rep)
Jun 23, 2021, 08:49 AM
• Last activity: Jun 23, 2021, 03:56 PM
0
votes
0
answers
502
views
Convert jsonb column into view for Grafana
I have partially dynamic data that I have to save into Postgres 13 and then visualize in Grafana.
Example of one document insert (partial):
{
"Timestamp": {
"$date": {
"$numberLong": "1623224386151"
}
},
"Strings": [],
"Coords": [
{
"key": "GreySegm",
"value": {
"X": 734,
"Y": 529,
"Angle": -0.17575337363989894
}
}
],
"DataMatrices": [],
"SegmentationRes": [
{
"key": "GreySegm",
"value": {
"AlgResultBase": {
"resultCode": 0,
"errorDescr": ""
},
"Regions": [
{
"Name": "reg0",
"Id": 1,
"Status": "NotSet",
"SearchRegionName": "GreySegm",
"SearchRegionId": 0,
"Position": {
"X": 733.8323364257813,
"Y": 529.5650634765625,
"Angle": -0.1757533699274063
}
}
]
}
}
],
"ShapeControl": []
}
Data is generally a time series data, so no updates and deletes, only inserts. It is about 10k inserts per day but of course grows over months.
Grafana doesn't support user-friendly DB navigation to build queries if the only column is jsonb.
So I have decided to create views and let Grafana use them. This works perfectly, but when the table grows to thousands of rows queries become extremely slow, since no indexes are there, I suspect.
Does this approach with views make sense, or am I abusing the DB design?
**My ideas**
1. To use materialized views and then apply indexes on them; this should help. My problem is with updates: how do I refresh the views relatively frequently and not over the whole table?
I want Grafana to be near real time with some lag, and definitely not an offline tool showing yesterday's data because I refresh the materialized view once a day at midnight by cron.
2. To create indexes on jsonb properties (GIN indexes) and write queries against jsonb directly in Grafana. Then no more user friendliness, only hardcore. And the indexes are much less intuitive to build.
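Idea 1 can be sketched roughly as follows (view name and extracted columns are hypothetical; `REFRESH ... CONCURRENTLY` avoids blocking readers but requires a unique index and still recomputes the whole view, so it covers "frequent refresh" only up to a point):

```sql
CREATE MATERIALIZED VIEW mv_measurements AS
SELECT id,
       to_timestamp(((doc #>> '{Timestamp,$date,$numberLong}')::bigint) / 1000.0) AS ts,
       (doc #>> '{Coords,0,value,X}')::numeric AS x,
       (doc #>> '{Coords,0,value,Y}')::numeric AS y
FROM raw_docs;

-- unique index required for CONCURRENTLY; ts index for Grafana's time filter
CREATE UNIQUE INDEX ON mv_measurements (id);
CREATE INDEX ON mv_measurements (ts);

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_measurements;
```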
ivans
(11 rep)
Jun 10, 2021, 12:02 PM
4
votes
1
answers
3267
views
Understanding composite BTREE + GIN_TRGM_OPS index prioritization & odd lower() behavior
hoping someone can try to help me decrypt some index behavior. I'm working on enabling some simple `contains` type lookups on various user-data columns (~varchar):
Sort (cost=52220.05..52777.38 rows=222932 width=4) (actual time=251.069..285.652 rows=222408 loops=1)
Output: id
Sort Key: people.id
Sort Method: external merge Disk: 3064kB
-> Bitmap Heap Scan on public.people (cost=3070.23..29368.23 rows=222932 width=4) (actual time=35.156..198.549 rows=222408 loops=1)
Output: id
Recheck Cond: (people.user_category = 2)
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 111278
Heap Blocks: exact=21277
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_email (cost=0.00..3014.50 rows=334733 width=0) (actual time=32.017..32.017 rows=333686 loops=1)
Index Cond: (people.user_category = 2)
Planning Time: 0.293 ms
Execution Time: 359.247 ms
Is the original b-tree totally redundant at this point? I expected it might still be picked by the planner if only those two columns were used if the b-tree was faster for those data types but that seems like it is not the case.
Next, I noticed that our existing queries depend on `lower()` and seemed to completely ignore the GIN indexes, or rather they seemed to use whichever index was created last, even if that column was not used in the query:
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (LOWER(last_name) LIKE LOWER('%a62%'))
Results (comparing last_name but using email index):
HashAggregate (cost=28997.16..29086.33 rows=8917 width=4) (actual time=175.204..175.554 rows=1677 loops=1)
Output: id
Group Key: people.id
-> Gather (cost=4016.73..28974.87 rows=8917 width=4) (actual time=39.947..181.936 rows=1677 loops=1)
Output: id
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on public.people (cost=3016.73..27083.17 rows=3715 width=4) (actual time=22.037..156.233 rows=559 loops=3)
Output: id
Recheck Cond: (people.user_category = 2)
Filter: (((people.user_type)::text <> 'Ogre'::text) AND (lower((people.last_name)::text) ~~ '%a62%'::text))
Rows Removed by Filter: 110670
Heap Blocks: exact=7011
Worker 0: actual time=13.573..147.844 rows=527 loops=1
Worker 1: actual time=13.138..147.867 rows=584 loops=1
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_email (cost=0.00..3014.50 rows=334733 width=0) (actual time=35.445..35.445 rows=333686 loops=1)
Index Cond: (people.user_category = 2)
Planning Time: 7.546 ms
Execution Time: 189.186 ms
Whereas switching to `ILIKE`:
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
Results are way faster, and use the expected index. What is it about the `lower()` call that seems to make the planner skip a beat?
Unique (cost=161.51..161.62 rows=22 width=4) (actual time=27.144..27.570 rows=1677 loops=1)
Output: id
-> Sort (cost=161.51..161.56 rows=22 width=4) (actual time=27.137..27.256 rows=1677 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 127kB
-> Bitmap Heap Scan on public.people (cost=32.34..161.02 rows=22 width=4) (actual time=16.470..26.798 rows=1677 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 766
Heap Blocks: exact=2291
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=16.058..16.058 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Planning Time: 10.577 ms
Execution Time: 27.746 ms
Next, adding another field into things...
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
AND (first_name ILIKE '%EAD%')
Is still pretty speedy overall
Unique (cost=161.11..161.11 rows=1 width=4) (actual time=10.854..10.860 rows=12 loops=1)
Output: id
-> Sort (cost=161.11..161.11 rows=1 width=4) (actual time=10.853..10.854 rows=12 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on public.people (cost=32.33..161.10 rows=1 width=4) (actual time=3.895..10.831 rows=12 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Filter: (((people.user_type)::text <> 'Ogre'::text) AND ((people.first_name)::text ~~* '%EAD%'::text))
Rows Removed by Filter: 2431
Heap Blocks: exact=2291
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=3.173..3.173 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Planning Time: 0.257 ms
Execution Time: 10.897 ms
Yet going back to that extra, non-tuple index created at the bottom and filtering on email seems to utilize another index as part of things?
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
AND (email ILIKE '%0F9%')
```
Has a different path:
```sql
Unique (cost=140.37..140.38 rows=1 width=4) (actual time=4.180..4.184 rows=7 loops=1)
Output: id
-> Sort (cost=140.37..140.38 rows=1 width=4) (actual time=4.180..4.180 rows=7 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on public.people (cost=136.34..140.36 rows=1 width=4) (actual time=4.145..4.174 rows=7 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text) AND ((people.email)::text ~~* '%0F9%'::text))
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 4
Heap Blocks: exact=11
-> BitmapAnd (cost=136.34..136.34 rows=1 width=0) (actual time=4.125..4.125 rows=0 loops=1)
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=3.089..3.089 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
-> Bitmap Index Scan on idx_people_gin_email (cost=0.00..103.76 rows=10101 width=0) (actual time=0.879..0.879 rows=7138 loops=1)
Index Cond: ((people.email)::text ~~* '%0F9%'::text)
Planning Time: 0.291 ms
Execution Time: 4.217 ms
```
The cost seems negligibly lower, but I'm wondering what this means for a fairly dynamic number of columns that could be filtered by. Would it be ideal to make a non-tuple index for all fields too?
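For illustration, covering a dynamic set of filterable text columns with one single-column trigram index each might look like this (a sketch; the index name is illustrative, and `pg_trgm` is assumed to be installed already, as the existing trigram indexes imply):

```sql
-- hypothetical single-column trigram index (name illustrative); the
-- planner can BitmapAnd its results with the existing composite index
-- and idx_people_gin_email per query, instead of needing one composite
-- index per column combination
CREATE INDEX idx_people_gin_first_name
ON people USING gin (first_name gin_trgm_ops);
```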
Sorry for the length; I've spun my wheels for a while trying to figure this all out, but any insight would be pretty awesome (and it seems like there aren't a ton of GIN indexes like this, though maybe I am missing something more fundamental overall).
Jeff B.
(141 rep)
Nov 28, 2019, 10:56 PM
• Last activity: Nov 29, 2019, 02:55 AM
0
votes
2
answers
2429
views
Gin indexing in postgres
I have four columns project_id, custom_event(boolean), page_view(boolean), event_url(text). I usually do like a regex operation in event_url column and my query will be all the four columns in that order I specified. When I try to create like this CREATE INDEX event_url_index ON event_data USING gin...
I have four columns: project_id, custom_event (boolean), page_view (boolean), and event_url (text). I usually do a regex-like operation on the event_url column, and my query filters on all four columns in the order I specified. When I try to create an index like this
CREATE INDEX event_url_index ON event_data
USING gin (project_id, custom_event, page_view, event_url gin_trgm_ops)
it's giving me an error like this
> data type boolean has no default operator class for access method "gin"
Can anyone help me solve this issue?
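One possible direction (a sketch, assuming the `btree_gin` extension is available; note that `boolean` support in `btree_gin` was only added in PostgreSQL 11):

```sql
-- btree_gin supplies GIN operator classes for scalar types (integer,
-- and on PostgreSQL 11+ also boolean), so the multicolumn definition
-- from the question can then work
CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX event_url_index ON event_data
USING gin (project_id, custom_event, page_view, event_url gin_trgm_ops);
```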
NAREN PUSHPARAJU
(21 rep)
Jun 6, 2019, 05:23 AM
• Last activity: Oct 2, 2019, 08:02 AM
4
votes
3
answers
4057
views
Any drawbacks of using GIN PostgreSQL index for an integer foreign key?
I have a large table (600 million rows) with a foreign key `other_id` of type `integer`. A single value of the foreign key is repeated about 100 times on average. I need to have an index on that FK column because the data is frequently selected by `other_id`. My tests show that the `gin` index type...
I have a large table (600 million rows) with a foreign key `other_id` of type `integer`. A single value of the foreign key is repeated about 100 times on average. I need to have an index on that FK column because the data is frequently selected by `other_id`.
My tests show that the `gin` index type is about 10 times smaller and about 3 times as performant as the default `btree` index (the performance was tested using `SELECT` queries).
The question is: are there any real-world drawbacks of using the `gin` index instead of the `btree` index? It looks like this index type isn't used much for a very common case like mine, i.e. an `integer` foreign key. But my tests show vast performance gains. Why, then, isn't `gin` recommended for such scenarios?
I had to execute `CREATE EXTENSION btree_gin` to be able to use the `gin` index for the `integer` column.
I know about `UPDATE` being possibly slow due to `FASTUPDATE` being enabled by default: https://dba.stackexchange.com/questions/202583/occasional-intermittent-slow-10-second-update-queries-on-postgresql-table-wi
I care only about the equality `=` operator being able to use the index (also `IN (...)` with a possibly large number of values, but I assume this is also equality).
Artur
(41 rep)
Aug 16, 2019, 10:13 AM
• Last activity: Aug 17, 2019, 01:37 PM
2
votes
2
answers
2748
views
What's the best way to use a gist index on tsrange to check if a time occurs after the range?
# table (simplified) Table "public.events" Column | Type | Modifiers ------------------+-----------------------------+-------------------------------------------------------- id | integer | not null default nextval('events_id_seq'::regclass) duration | integer | not null start_at | timestamp without...
# table (simplified)
Table "public.events"
Column | Type | Modifiers
------------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('events_id_seq'::regclass)
duration | integer | not null
start_at | timestamp without time zone |
Indexes:
"events_pkey" PRIMARY KEY, btree (id)
"my_idx" gist (tsrange(start_at, end_at(events.*), '[)'::text))
# function
CREATE FUNCTION end_at(rec events)
RETURNS timestamp without time zone
IMMUTABLE
LANGUAGE SQL
AS $$
SELECT $1.start_at + ($1.duration * ('00:00:01'::interval));
$$;
# what I am already doing successfully
The index is used for queries like this:
-- check if current time is within the start and end times
-- of event
where localtimestamp > tsrange(start_at, events.end_at, '[)')
I'm pretty sure this is the semantics I want, and `explain analyze` says it's using the index, but it's a bit ugly and I'm wondering if there's a better way to express this (and am also vaguely uncertain it's the semantics I want, as I am new to ranges). Alternatives:
* `where localtimestamp > upper(tsrange(start_at, events.end_at, '[)'))` + a btree index on `upper(tsrange(start_at, events.end_at, '[)'))`. This will work well, but requires keeping another index around.
* `where localtimestamp > events.end_at` + a btree index on `events.end_at`. Same situation as above.
Is there a more elegant (or correct) way to achieve the first bullet point above?
Any other ideas for how to go about this?
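One candidate formulation (a sketch): compare the indexed range against a singleton range at the current time using the range operator `<<` ("strictly left of"), which GiST can use and whose left side matches the existing index expression:

```sql
-- "the event's range lies entirely before now", i.e. the current time
-- occurs after the range; the left-hand expression matches my_idx
where tsrange(start_at, end_at(events.*), '[)')
      << tsrange(localtimestamp, localtimestamp, '[]')
```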
John Bachir
(867 rep)
Jan 11, 2018, 07:00 PM
• Last activity: Apr 17, 2018, 09:46 PM
3
votes
1
answers
1918
views
Occasional/intermittent, slow (10+-second) UPDATE queries on PostgreSQL table with GIN index
## The Setup ## I am running PostgreSQL 9.4.15 on an SSD-based, quad-core Virtual Private Server (VPS) with Debian Linux (8). The relevant table has approximately 2-million records. Records are frequently being inserted and even more frequently (constantly -- every few seconds at least) updated. As...
## The Setup ##
I am running PostgreSQL 9.4.15 on an SSD-based, quad-core Virtual Private Server (VPS) with Debian Linux (8). The relevant table has approximately 2-million records.
Records are frequently being inserted and even more frequently (constantly -- every few seconds at least) updated. As far as I can tell, I have all appropriate indexes in place for these operations to execute snappily, however, and the vast majority of the time they do execute instantly (in milliseconds).
## The Problem ##
Every hour or so, however, one of the `UPDATE` queries takes an excessive amount of time -- like 10 seconds or more. And when this happens, it's usually a "batch" of queries that gets "blocked", all terminating at roughly the same time. It's as if one of the queries, or some other background operation (e.g., a vacuum), is blocking them all.
## Schema ##
The table, `items`, has many columns, but I think the following are the only ones possibly relevant to the problem:
* `id INTEGER NOT NULL` (primary key)
* `search_vector TSVECTOR`
* `last_checkup_at TIMESTAMP WITHOUT TIME ZONE`
And these are the relevant indexes:
* `items_pkey` PRIMARY KEY, btree (id)
* `items_search_vector_idx` gin (search_vector)
* `items_last_checkup_at_idx` btree (last_checkup_at)
## Likely Culprits ##
Finally, after rigging together a little script to dump the contents of `pg_stat_activity` (the list of all active Postgres connections/queries) whenever a "connection leak" warning is emitted to my log file, I've narrowed down the possible culprit queries/columns (assuming the problem isn't external, like with a misbehaving VPS). These are, roughly, the kinds of queries that seem to appear again and again:
* `UPDATE items SET last_checkup_at = $1 WHERE items.id = 123245`
* `UPDATE items SET search_vector = [..] WHERE items.id = 78901`
Those are slightly paraphrased, but I truly doubt anything relevant is missing. Occasionally other queries (on other tables) appear as well, but those usually look like they were just "unlucky" to get caught in the mix.
Now, even though the first query (setting `last_checkup_at`) tends to appear most of the time, the query that sets `search_vector` seems to appear *every* time. (And in addition, there are probably many more instances of the first query being issued in general, making it more likely to just be there on happenstance.)
(I think I'm winnowing in on a solution here, but even if I have it in the bag I wanted to document the incident here for others... I've been mystified by this problem for months, before getting a chance to deep-dive.)
Chris W.
(201 rep)
Mar 28, 2018, 09:17 PM
• Last activity: Apr 5, 2018, 07:30 PM
3
votes
1
answers
1365
views
How to create some GIN index concurrently in Postgresql
I have a large table: `CREATE TABLE hh(h int8[] not null, file int8 not null)`, and GIN-index over `h` field (`CREATE INDEX ON hh USING gin(h)`). Index was created is about 8 minutes, so I decide to split this big table into two or more tables, and create indexes over all this tables. I assumed that...
I have a large table: `CREATE TABLE hh(h int8[] not null, file int8 not null)`, with a GIN index over the `h` field (`CREATE INDEX ON hh USING gin(h)`). The index takes about 8 minutes to create, so I decided to split this big table into two or more tables and create indexes over all of them.
I assumed that these indexes would be created in parallel and that I would get about a 2x speedup (slightly less, because the indexes and tables are on one disk). That is, for two tables I expected 4-5 minutes (instead of 10 minutes for one table).
But that did not happen! Both indexes took 8-9 minutes to create, not 4-5.
I tried parallel creation of B-tree indexes over my other big tables, and yes, PostgreSQL can create such indexes in parallel. So I assumed that would work for GIN too.
I have enough RAM for the indexes: 32GB. The original table is 1GB and its index 3GB; each split table is 0.5GB with a 1.3GB index. `maintenance_work_mem = 16GB`, `work_mem = 16GB`.
I also tried putting the second table and index in another tablespace (located on a different physical disk), but the speedup was very small (7.5 minutes vs 8 minutes). Why?
So, how can I create GIN indexes in parallel? Is this possible?
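For concreteness, the parallel attempt amounts to running something like this simultaneously from two separate sessions (table names illustrative):

```sql
-- session 1
CREATE INDEX ON hh_part1 USING gin (h);

-- session 2, started at the same time from another connection
CREATE INDEX ON hh_part2 USING gin (h);
```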
Dmitry Krylov
(31 rep)
Mar 6, 2018, 01:42 PM
• Last activity: Mar 9, 2018, 10:54 PM
1
votes
1
answers
2780
views
How to create pg_trgm compound indexes with date columns
SELECT col1, max(date1) as max_date FROM table WHERE col1 ILIKE 'name' GROUP BY col1 `TYPES: `Here col1 is `varchar` and date1 is `timestamp with time zone` data type. So created extension `CREATE EXTENSION pg_trgm` Then tried the following indexes and got the errors: 1: Issue: ERROR: operator class...
SELECT col1, max(date1) as max_date
FROM table
WHERE col1 ILIKE 'name'
GROUP BY col1
TYPES: Here col1 is `varchar` and date1 is `timestamp with time zone`. So I created the extension: `CREATE EXTENSION pg_trgm`.
Then I tried the following indexes and got the errors:
1: Issue: ERROR: operator class "gin_trgm_ops" does not accept data type timestamp with time zone
CREATE INDEX CONCURRENTLY trgm_table_col1_date_index
ON table
USING gin (provider_id, date1 gin_trgm_ops);
2: Issue: ERROR: operator class "text_pattern_ops" does not exist for access method "gin"
CREATE INDEX CONCURRENTLY trgm_table_col1_date_index
ON table
USING gin (provider_id, date1 text_pattern_ops);
How can I create an index for the above query for faster execution? Any help will be really appreciated!
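One option (a sketch): since `date1` appears only in the aggregate, it may not need to be in the trigram index at all; a plain trigram GIN index on the text column alone can serve the `ILIKE` filter:

```sql
-- "table" is the placeholder name from the question, quoted because
-- TABLE is a reserved word
CREATE INDEX CONCURRENTLY trgm_table_col1_index
ON "table" USING gin (col1 gin_trgm_ops);
```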
Atihska
(129 rep)
Jan 23, 2018, 07:30 PM
• Last activity: Jan 23, 2018, 09:51 PM
Showing page 1 of 20 total questions