Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
182
views
Composite multicolumn index for geopoint range and numeric range query
I am building an app where the server needs to select rows based on some criteria/filters. One of them is the location of the user and the radius at which the user wants to see posts, plus other filters such as a date range and a filter on the value of another column. This is going to be for an ad-hoc event discovery app.
I have read about PostGIS and its geometry and geography types, and I know there is a native point datatype. Based on this answer I understood that it is better to order index columns from equality to range, even though I feel like the geo point column should come first.
Suppose the following few rows of a simplified events table (disregard the validity of the position data):
id event_title event_position event_type is_public start_date
(varchar) (point lat/lon) (smallint) (boolean) (timestamptz)
-- --------------------------- --------------- --------- --------- ----
1 "John's Party" (122,35) 0 0 2020-07-05
2 "Revolution then Starbucks" (123,30) 1 1 2020-07-06
3 "Study for math exam" (120,36) 2 1 2020-07-07
4 "Party after exam" (120,36) 1 1 2020-07-08
5 "Hiking next to the city" (95,40) 3 1 2020-07-09
6 "Football match" (-42,31) 4 1 2020-07-10
Imagine the table contains at least several thousand records, obviously not only 6.
So in this table a user would be able to query public events within 100 km of (122,34) (suppose the first three rows fall into this area), of event types 0, 1 or 2, and falling between the dates 2020-07-05 and 2020-07-07. The user would get the rows with ID 2 and 3.
This is the query I want to optimize with an appropriate index. My question is: how is it possible to create such an index? I thought about a GiST or GIN index, but I am not sure how these could help. Thanks!
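For what it's worth, a minimal sketch of one possible index shape (my addition, not part of the question; it assumes event_position is stored as a PostGIS geography(Point,4326) column rather than the native point shown above, and that btree_gist is available):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- GiST operator classes for scalar types

-- Equality-ish columns first, range-like columns after, per the ordering rule
-- cited in the question; the partial predicate folds in the is_public filter.
CREATE INDEX events_search_idx ON events
    USING gist (event_type, start_date, event_position)
    WHERE is_public;

SELECT id
FROM events
WHERE is_public
  AND event_type IN (0, 1, 2)
  AND start_date BETWEEN '2020-07-05' AND '2020-07-07'
  AND ST_DWithin(event_position,
                 'SRID=4326;POINT(122 34)'::geography,
                 100000);  -- 100 km, in meters
```

This is only a sketch; whether the planner actually uses all three columns depends on selectivity, so EXPLAIN ANALYZE against realistic data is the real test.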
Najib
(231 rep)
Jul 4, 2020, 02:59 PM
• Last activity: Jul 5, 2025, 12:18 AM
2
votes
0
answers
44
views
How can I speed up GIST index query for most similar n items?
I have the following tables and indices:
CREATE TABLE IF NOT EXISTS users
(
id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('users_sequence') PRIMARY KEY,
first_name VARCHAR(512) DEFAULT NULL NULL,
last_name VARCHAR(512) DEFAULT NULL NULL,
full_name VARCHAR(1024) GENERATED ALWAYS AS
(CASE
WHEN first_name IS NULL AND
last_name IS NULL THEN NULL
ELSE
(TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))) END) STORED,
deleted_at TIMESTAMP DEFAULT NULL NULL,
-- Some ~20 columns
);
CREATE TABLE IF NOT EXISTS user_aliases
(
id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('user_aliases_sequence') PRIMARY KEY,
entry_id NUMERIC(20, 0) NOT NULL,
first_name VARCHAR(512) DEFAULT NULL NULL,
last_name VARCHAR(512) DEFAULT NULL NULL,
full_name VARCHAR(1024) GENERATED ALWAYS AS
(CASE
WHEN first_name IS NULL AND
last_name IS NULL THEN NULL
ELSE
(TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))) END) STORED,
deleted_at TIMESTAMP DEFAULT NULL NULL,
CONSTRAINT fk_user_aliases_entry_id FOREIGN KEY (entry_id) REFERENCES users (id) ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE INDEX users_full_name_idx ON users USING GIST (full_name gist_trgm_ops) WHERE deleted_at IS NULL AND full_name IS NOT NULL;
CREATE INDEX user_aliases_full_name_idx ON user_aliases USING GIST (full_name gist_trgm_ops) WHERE deleted_at IS NULL AND full_name IS NOT NULL;
and my query:
SELECT id
FROM ((SELECT e.id,
e.full_name
FROM users e
WHERE e.full_name % 'some name here'
AND e.deleted_at IS NULL
LIMIT 10)
UNION
(SELECT a.entry_id AS id,
a.full_name
FROM user_aliases a
WHERE a.full_name % 'some name here'
AND a.deleted_at IS NULL
LIMIT 10)) filter_table
LIMIT 10
and this takes around ~300ms on average on a table with ~500K records, with the following execution plan:
Limit  (cost=88.71..88.91 rows=10 width=536) (actual time=322.001..322.009 rows=2 loops=1)
  Output: e.id, e.full_name
  Buffers: shared hit=44390
  ->  HashAggregate  (cost=88.71..88.91 rows=20 width=536) (actual time=321.998..322.004 rows=2 loops=1)
        Output: e.id, e.full_name
        Group Key: e.id, e.full_name
        Batches: 1  Memory Usage: 24kB
        Buffers: shared hit=44390
        ->  Append  (cost=0.41..88.61 rows=20 width=536) (actual time=158.361..321.966 rows=2 loops=1)
              Buffers: shared hit=44390
              ->  Limit  (cost=0.41..43.92 rows=10 width=27) (actual time=158.358..206.304 rows=2 loops=1)
                    Output: e.id, e.full_name
                    Buffers: shared hit=30910
                    ->  Index Scan using users_full_name_idx on master.users e  (cost=0.41..209.25 rows=48 width=27) (actual time=158.355..206.290 rows=2 loops=1)
                          Output: e.id, e.full_name
                          Index Cond: ((e.full_name)::text % 'some name here'::text)
                          Buffers: shared hit=30910
              ->  Limit  (cost=0.41..44.39 rows=10 width=37) (actual time=115.641..115.643 rows=0 loops=1)
                    Output: a.entry_id, a.full_name
                    Buffers: shared hit=13480
                    ->  Index Scan using user_aliases_full_name_idx on master.user_aliases a  (cost=0.41..92.77 rows=21 width=37) (actual time=115.633..115.633 rows=0 loops=1)
                          Output: a.entry_id, a.full_name
                          Index Cond: ((a.full_name)::text % 'some name here'::text)
                          Buffers: shared hit=13480
Settings: effective_cache_size = '16GB', search_path = 'master', work_mem = '32MB'
Planning Time: 1.210 ms
Execution Time: 322.081 ms
Now this is not the ideal query, since it LIMITs arbitrary records. When I update the query to:
SELECT id
FROM ((SELECT e.id,
e.full_name
FROM users e
WHERE e.full_name % 'some name here'
AND e.deleted_at IS NULL
ORDER BY e.full_name <-> 'some name here'
LIMIT 10)
UNION
(SELECT a.entry_id AS id,
a.full_name
FROM user_aliases a
WHERE a.full_name % 'some name here'
AND a.deleted_at IS NULL
ORDER BY a.full_name <-> 'some name here'
LIMIT 10)) filter_table
LIMIT 10
it now performs as it should, but the average execution time rises to ~700ms with the following plan:
Limit  (cost=88.81..89.01 rows=10 width=536) (actual time=324.831..324.839 rows=2 loops=1)
  Output: "*SELECT* 1".id, "*SELECT* 1".full_name
  Buffers: shared hit=44390
  ->  HashAggregate  (cost=88.81..89.01 rows=20 width=536) (actual time=324.827..324.834 rows=2 loops=1)
        Output: "*SELECT* 1".id, "*SELECT* 1".full_name
        Group Key: "*SELECT* 1".id, "*SELECT* 1".full_name
        Batches: 1  Memory Usage: 24kB
        Buffers: shared hit=44390
        ->  Append  (cost=0.41..88.71 rows=20 width=536) (actual time=232.209..324.809 rows=2 loops=1)
              Buffers: shared hit=44390
              ->  Subquery Scan on "*SELECT* 1"  (cost=0.41..44.07 rows=10 width=27) (actual time=232.207..232.219 rows=2 loops=1)
                    Output: "*SELECT* 1".id, "*SELECT* 1".full_name
                    Buffers: shared hit=30910
                    ->  Limit  (cost=0.41..43.97 rows=10 width=31) (actual time=232.204..232.214 rows=2 loops=1)
                          Output: e.id, e.full_name, (((e.full_name)::text <-> 'some name here'::text))
                          Buffers: shared hit=30910
                          ->  Index Scan using users_full_name_idx on master.users e  (cost=0.41..209.49 rows=48 width=31) (actual time=232.201..232.209 rows=2 loops=1)
                                Output: e.id, e.full_name, ((e.full_name)::text <-> 'some name here'::text)
                                Index Cond: ((e.full_name)::text % 'some name here'::text)
                                Order By: ((e.full_name)::text <-> 'some name here'::text)
                                Buffers: shared hit=30910
              ->  Subquery Scan on "*SELECT* 2"  (cost=0.41..44.54 rows=10 width=37) (actual time=92.579..92.581 rows=0 loops=1)
                    Output: "*SELECT* 2".id, "*SELECT* 2".full_name
                    Buffers: shared hit=13480
                    ->  Limit  (cost=0.41..44.44 rows=10 width=41) (actual time=92.577..92.577 rows=0 loops=1)
                          Output: a.entry_id, a.full_name, (((a.full_name)::text <-> 'some name here'::text))
                          Buffers: shared hit=13480
                          ->  Index Scan using user_aliases_full_name_idx on master.user_aliases a  (cost=0.41..92.88 rows=21 width=41) (actual time=92.572..92.573 rows=0 loops=1)
                                Output: a.entry_id, a.full_name, ((a.full_name)::text <-> 'some name here'::text)
                                Index Cond: ((a.full_name)::text % 'some name here'::text)
                                Order By: ((a.full_name)::text <-> 'some name here'::text)
                                Buffers: shared hit=13480
Settings: effective_cache_size = '16GB', search_path = 'master', work_mem = '32MB'
Planning Time: 1.115 ms
Execution Time: 324.906 ms
How do I get the most similar records, without generating the extra execution time overhead?
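As an aside (my addition, not part of the question): pg_trgm's GiST operator class also supports the <-> distance operator directly in ORDER BY, so one hedged option is to drop the % filter and let the index walk nearest-first:

```sql
-- Sketch against the schema above: pure KNN ordering on the trigram GiST index.
SELECT e.id
FROM users e
WHERE e.deleted_at IS NULL
  AND e.full_name IS NOT NULL   -- matches the partial index predicate
ORDER BY e.full_name <-> 'some name here'
LIMIT 10;
```

Note the semantics change: this returns the ten nearest rows regardless of the pg_trgm similarity threshold, so rows that % would have rejected can appear.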
Hasan Can Saral
(175 rep)
Jun 11, 2025, 10:41 AM
1
votes
1
answers
263
views
Postgresql strange behavior with non-ascii characters with trigram index present
I see some strange behavior when using either gin_trgm_ops or gist_trgm_ops indexes. There seems to be quite a difference in plans when using, say, ILIKE or ~ to search for ASCII phrases vs. multi-byte character phrases, as if there were a higher cost when the operand is non-ASCII.
Is what I am seeing expected? What is the reason for it?
I tried it on PostgreSQL 12 and 13, the latest of each.
Here is a scenario:
CREATE DATABASE postgres
WITH
OWNER = postgres
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.utf8'
LC_CTYPE = 'en_US.utf8'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
IS_TEMPLATE = False;
-- snip
CREATE TABLE test_table (
id uuid PRIMARY KEY,
label varchar
);
-- insert 1m rows
VACUUM ANALYZE test_table;
In the data set I have 10 labels containing 'acl' and 10 containing '定す'.
When using a GIN index
CREATE INDEX test_table_label_gin_idx
ON test_table USING gin
(label gin_trgm_ops);
I see the following.
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
Limit (cost=1000.00..16573.18 rows=100 width=52) (actual time=392.153..395.095 rows=10 loops=1)
-> Gather (cost=1000.00..16728.91 rows=101 width=52) (actual time=392.135..394.830 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test_table (cost=0.00..15718.81 rows=42 width=52) (actual time=382.922..388.082 rows=3 loops=3)
Filter: ((label)::text ~~* '%定す%'::text)
Rows Removed by Filter: 338417
Planning Time: 0.656 ms
Execution Time: 395.233 ms
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%acl%' LIMIT 100;
Limit (cost=28.78..400.51 rows=100 width=52) (actual time=0.072..0.406 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=28.78..404.23 rows=101 width=52) (actual time=0.053..0.197 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%acl%'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on test_table_label_gin_idx (cost=0.00..28.76 rows=101 width=0) (actual time=0.025..0.034 rows=10 loops=1)
Index Cond: ((label)::text ~~* '%acl%'::text)
Planning Time: 0.231 ms
Execution Time: 0.542 ms
With GIST
DROP INDEX test_table_label_gin_idx;
CREATE INDEX test_table_label_gist_idx
ON test_table USING gist
(label gist_trgm_ops);
I see
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
Limit (cost=13.19..384.92 rows=100 width=52) (actual time=303.772..1557.498 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=13.19..388.64 rows=101 width=52) (actual time=303.752..1557.286 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%定す%'::text)
Rows Removed by Index Recheck: 1015250
Heap Blocks: exact=10431
-> Bitmap Index Scan on test_table_label_gist_idx (cost=0.00..13.17 rows=101 width=0) (actual time=301.046..301.053 rows=1015260 loops=1)
Index Cond: ((label)::text ~~* '%定す%'::text)
Planning Time: 0.215 ms
Execution Time: 1557.643 ms
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%acl%' LIMIT 100;
Limit (cost=13.19..384.92 rows=100 width=52) (actual time=257.385..257.751 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=13.19..388.64 rows=101 width=52) (actual time=257.366..257.551 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%acl%'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on test_table_label_gist_idx (cost=0.00..13.17 rows=101 width=0) (actual time=257.319..257.328 rows=10 loops=1)
Index Cond: ((label)::text ~~* '%acl%'::text)
Planning Time: 0.377 ms
Execution Time: 257.948 ms
Just changing the characters of the operand changes the plan quite a lot.
**Edit**
SELECT show_trgm('定す');
"{0x145ed8,0x6628fa,0x6cb12d}"
SELECT encode('定す', 'escape')
\345\256\232\343\201\231
This issue seems similar to https://dba.stackexchange.com/questions/63193/postgresql-not-using-gin-trigram-index-when-performing-non-ascii-like-query
Pawel Zieminski
(125 rep)
Oct 6, 2023, 08:38 PM
• Last activity: Oct 7, 2023, 03:06 AM
0
votes
1
answers
1237
views
Queries using GIN index on f_unaccent() seem slow?
I have almost 20M rows in one table that contains products, including their names.
I wish to search fulltext and fast by the name so I created this index:
CREATE INDEX uprice_item_occurrence_unaccent_name_trgm_idx ON price_item_occurrence USING gin (f_unaccent(name) gin_trgm_ops);
I was hoping the following query would take less than (let's say) 500 ms:
select * from price_item_occurrence as oo
where f_unaccent(oo.name) % f_unaccent('iphone');
But it takes almost 2s:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) % f_unaccent('iphone');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on price_item_occurrence oo (cost=1956.42..63674.14 rows=16570 width=287) (actual time=247.918..1880.759 rows=94 loops=1)
Recheck Cond: (f_unaccent((name)::text) % 'iphone'::text)
Rows Removed by Index Recheck: 87838
Heap Blocks: exact=76663
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..1952.28 rows=16570 width=0) (actual time=195.418..195.418 rows=88962 loops=1)
Index Cond: (f_unaccent((name)::text) % 'iphone'::text)
Planning time: 0.444 ms
Execution time: 1880.833 ms
It's possible that the DB is busy, but I am not sure.
I tried to play with select set_limit(0.9); (increased it) and it helps a bit, but not much.
I am using Postgres 10, can change the Postgres config, and I am open to suggestions.
I tried it with ilike and it improved somewhat:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) ilike ('%' || f_unaccent('iphone') || '%');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on price_item_occurrence oo (cost=3135.08..416823.45 rows=166075 width=286) (actual time=50.258..670.085 rows=65917 loops=1)
Recheck Cond: (f_unaccent((name)::text) ~~* '%iphone%'::text)
Rows Removed by Index Recheck: 10
Heap Blocks: exact=59750
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..3093.56 rows=166075 width=0) (actual time=37.385..37.385 rows=67700 loops=1)
Index Cond: (f_unaccent((name)::text) ~~* '%iphone%'::text)
Planning time: 0.545 ms
Execution time: 675.776 ms
(8 rows)
About 2x faster.
I tried limit 10:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) % f_unaccent('iphone') limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=373.27..410.51 rows=10 width=287) (actual time=268.718..589.131 rows=10 loops=1)
-> Bitmap Heap Scan on price_item_occurrence oo (cost=373.27..62493.45 rows=16680 width=287) (actual time=268.715..589.123 rows=10 loops=1)
Recheck Cond: (f_unaccent((name)::text) % 'iphone'::text)
Rows Removed by Index Recheck: 18917
Heap Blocks: exact=17100
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..369.10 rows=16680 width=0) (actual time=165.958..165.958 rows=69268 loops=1)
Index Cond: (f_unaccent((name)::text) % 'iphone'::text)
Planning time: 0.397 ms
Execution time: 589.187 ms
(9 rows)
This is also faster, maybe almost good enough.
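A hedged aside (my addition): a GIN trigram index can only filter, it cannot rank. If a ranked top-N is the real goal, a GiST trigram index supports ordering by the <-> distance operator, something like:

```sql
CREATE INDEX price_item_occurrence_unaccent_name_gist_idx
    ON price_item_occurrence USING gist (f_unaccent(name) gist_trgm_ops);

-- Nearest-first walk of the index; no threshold scan of all candidates.
SELECT *
FROM price_item_occurrence
ORDER BY f_unaccent(name) <-> f_unaccent('iphone')
LIMIT 10;
```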
Michal
(3 rep)
Apr 9, 2018, 07:19 PM
• Last activity: Aug 1, 2023, 05:00 PM
0
votes
1
answers
100
views
Why does my GiST index not trigger a conflict?
I have a table with user subscriptions in my Postgres 13 DB. I added a GiST index to disallow overlapping time ranges for the same user:
CREATE INDEX user_sub_exclusion_index ON public.user_sub USING gist (user_id, tstzrange(sub_start, sub_end))
But today I found it did not work.
This is the table DDL:
~~~pgsql
-- DROP TABLE public.user_sub;
CREATE TABLE public.user_sub (
id int8 NOT NULL GENERATED ALWAYS AS IDENTITY,
app_id varchar NOT NULL,
product_id int4 NOT NULL,
iap_product_id int8 NOT NULL,
created_time int8 NOT NULL,
updated_time int8 NOT NULL,
user_id int8 NOT NULL,
sub_start_time int8 NOT NULL DEFAULT 0,
sub_end_time int8 NOT NULL DEFAULT 0,
enabled int2 NOT NULL DEFAULT 1,
order_id varchar NOT NULL,
sub_start timestamptz NOT NULL,
sub_end timestamptz NOT NULL,
CONSTRAINT user_sub_new_pk PRIMARY KEY (id),
CONSTRAINT user_sub_new_un UNIQUE (order_id)
);
CREATE INDEX user_sub_exclusion_index ON public.user_sub USING gist (user_id, tstzrange(sub_start, sub_end));
CREATE INDEX user_sub_tstzrange_user_id_product_id_excl ON public.user_sub USING gist (tstzrange(sub_start, sub_end, '[]'::text), user_id, product_id);
~~~
Here is a sample of the table data right now: it contains overlapping time ranges for the same user:
~~~pgsql
INSERT INTO public.user_sub (app_id, product_id, iap_product_id, created_time, updated_time, user_id, sub_start_time, sub_end_time, enabled, order_id, sub_start, sub_end)
VALUES
('vOghoo10L9', 9, 6, 1680251663942, 1680251663942, 77, 1680501039421, 1680587439421, 1, '627599858277646336', '2023-04-03 13:50:39.421', '2023-04-04 13:50:39.421')
, ('vOghoo10L9', 9, 6, 1680263287925, 1680263287925, 77, 1680587439422, 1680673839422, 1, '627697298195189760', '2023-04-04 13:50:39.422', '2023-04-05 13:50:39.422')
, ('vOghoo10L9', 9, 6, 1680263497256, 1680263497256, 77, 1680673839423, 1680760239423, 1, '627679779371601920', '2023-04-05 13:50:39.423', '2023-04-06 13:50:39.423')
, ('vOghoo10L9', 9, 6, 1680263539890, 1680263539890, 77, 1680760239424, 1680846639424, 1, '627680143827259392', '2023-04-06 13:50:39.424', '2023-04-07 13:50:39.424')
, ('vOghoo10L9', 9, 6, 1680273609032, 1680273609032, 77, 1680846639425, 1680933039425, 1, '627601223242579968', '2023-04-07 13:50:39.425', '2023-04-08 13:50:39.425')
, ('vOghoo10L9', 9, 6, 1680275903068, 1680275903068, 77, 1680933039426, 1681019439426, 1, '627610724383956992', '2023-04-08 13:50:39.426', '2023-04-09 13:50:39.426')
, ('vOghoo10L9', 9, 6, 1680276434561, 1680276434561, 77, 1681019439427, 1681105839427, 1, '627612898614681600', '2023-04-09 13:50:39.427', '2023-04-10 13:50:39.427')
, ('vOghoo10L9', 9, 6, 1680277832740, 1680277832740, 77, 1681105839428, 1681192239428, 1, '627618880539664384', '2023-04-10 13:50:39.428', '2023-04-11 13:50:39.428')
, ('vOghoo10L9', 9, 6, 1680282545888, 1680282545888, 77, 1681192239429, 1681278639429, 1, '627411682153152512', '2023-04-11 13:50:39.429', '2023-04-12 13:50:39.429')
, ('vOghoo10L9', 9, 6, 1680327772145, 1680327772145, 77, 1681278639430, 1681365039430, 1, '627601513341616128', '2023-04-12 13:50:39.430', '2023-04-13 13:50:39.430')
, ('vOghoo10L9', 9, 5, 1680761228691, 1680761228691, 79, 1680761228687, 1680847628687, 1, '629789245600776192', '2023-04-06 14:07:08.687', '2023-04-07 14:07:08.687')
, ('vOghoo10L9', 9, 5, 1680763488691, 1680763488691, 77, 1680763488689, 1680849888689, 1, '629798689575354368', '2023-04-06 14:44:48.689', '2023-04-07 14:44:48.689')
, ('vOghoo10L9', 9, 5, 1680763634694, 1680763634694, 77, 1680849888690, 1688712288690, 1, '629799331018653696', '2023-04-07 14:44:48.690', '2023-07-07 14:44:48.690')
, ('vOghoo10L9', 9, 6, 1680850885924, 1680850885924, 80, 1680850881709, 1680937281709, 1, '630165239553671168', '2023-04-07 15:01:21.709', '2023-04-08 15:01:21.709')
, ('vOghoo10L9', 9, 6, 1681461059268, 1681461059268, 81, 1681461059263, 1681547459263, 1, '632723434021126144', '2023-04-14 16:30:59.263', '2023-04-15 16:30:59.263')
, ('vOghoo10L9', 9, 6, 1681483885008, 1681483885008, 82, 1681483885006, 1681570285006, 1, '632820205569245184', '2023-04-14 22:51:25.006', '2023-04-15 22:51:25.006')
, ('iYDFo0PQQX', 11, 14, 1682762462986, 1682762462986, 86, 1682762462971, 1682848862971, 1, '638182956129267712', '2023-04-29 18:01:02.971', '2023-04-30 18:01:02.971')
, ('iYDFo0PQQX', 11, 14, 1682764832099, 1682764832099, 86, 1682764832090, 1682851232090, 1, '638192942803423232', '2023-04-29 18:40:32.090', '2023-04-30 18:40:32.090')
, ('iYDFo0PQQX', 11, 14, 1682765939081, 1682765939081, 86, 1682765939052, 1682852339052, 1, '638197106476421120', '2023-04-29 18:58:59.052', '2023-04-30 18:58:59.052')
, ('iYDFo0PQQX', 11, 14, 1682766222551, 1682766222551, 86, 1682766193447, 1682852593447, 1, '638198676681232384', '2023-04-29 19:03:13.447', '2023-04-30 19:03:13.447')
;
~~~
Am I missing something? How can I make this constraint work? That is, how do I prevent the same user from subscribing with overlapping time ranges?
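For reference, a minimal sketch (my addition, not from the question): a plain GiST index never rejects rows on its own; only an exclusion constraint does, and the user_id equality part needs the btree_gist extension:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- GiST equality support for scalar types

ALTER TABLE public.user_sub
    ADD CONSTRAINT user_sub_no_overlap
    EXCLUDE USING gist (user_id WITH =, tstzrange(sub_start, sub_end) WITH &&);
```

Adding the constraint will fail until the overlapping rows shown above have been cleaned up.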
Dolphin
(939 rep)
Apr 29, 2023, 02:29 PM
• Last activity: Apr 29, 2023, 09:23 PM
3
votes
3
answers
1009
views
Is a GIST index on a geometry point useful to speed up a spatial query?
I have multiple tables with point geometries and GiST indexes eating many gigabytes of space.
From what I understand about GiST indexes on polygons or polylines, they store the bounding box and run a first-pass calculation on the bounding box instead of the full geometry, resulting in faster processing.
But what about point geometry? Will it do anything more than ordering the points spatially?
Boodoo
(65 rep)
Feb 25, 2021, 01:51 PM
• Last activity: Mar 30, 2023, 04:13 PM
4
votes
2
answers
3239
views
ERROR: data type text[] has no default operator class for access method "gist"
Whenever I try to create a GIST index on text[], I get the above error.
CREATE TABLE bar(f)
AS VALUES
(ARRAY['foo','bar','baz']);
CREATE INDEX ON bar USING gist(f);
ERROR: data type text[] has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
I'm trying to [create an *Exclusion Constraint*](https://dba.stackexchange.com/a/185177/2639) on bar, so I need a GIST index.
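For contrast, a hedged sketch (my addition): the btree_gist extension supplies default GiST operator classes for ordinary scalar types such as text, which is why the equivalent statement succeeds on a scalar column; array types are not covered by it:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE bar_scalar(f text);
CREATE INDEX ON bar_scalar USING gist(f);  -- works: btree_gist provides the opclass
```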
Evan Carroll
(65502 rep)
Sep 6, 2017, 09:37 AM
• Last activity: Mar 2, 2023, 07:37 PM
1
votes
0
answers
151
views
Curious results from pg_stat_user_indexes
**PostgreSQL 12.8, PostGIS 3.1.4**
Running select * from pg_stat_user_indexes, there are some rows that have 0 for idx_scan and a >0 result for idx_tup_read.
The indexes that match this condition are all simple SP-GiST indexes, i.e.,
CREATE INDEX ... USING spgist(geometry)
The definition of idx_tup_read is "Number of index entries returned by scans on this index."
If idx_scan is 0, I would expect "scans on this index" to be 0 as well. So how can idx_tup_read have a result greater than 0 if idx_scan is 0?
Avocado
(245 rep)
Dec 15, 2022, 06:02 PM
• Last activity: Dec 16, 2022, 03:43 PM
0
votes
1
answers
762
views
Optimizing geolocation query in Postgres with earth_box and earth_distance
I'm trying to optimize my geolocation query for a table of ~862k address rows within a search radius, using earth_box. My first query is not too terrible:
explain analyze SELECT
id
FROM
location
WHERE
earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on location  (cost=46.97..3385.08 rows=863 width=16) (actual time=11.430..49.406 rows=29407 loops=1)
  Recheck Cond: ('(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube @> (ll_to_earth(latitude, longitude))::cube)
  Heap Blocks: exact=22479
  ->  Bitmap Index Scan on location_gist_lat_lon_idx  (cost=0.00..46.76 rows=863 width=0) (actual time=7.942..7.943 rows=29407 loops=1)
        Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube)
Adding an earth_distance check to turn the box into a true radius makes it much slower:
explain analyze SELECT
    id
FROM
    location
WHERE
    earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
    AND earth_distance(ll_to_earth(40.65130101, -73.83367812),
                       ll_to_earth(latitude, longitude)) < 25000;
Bitmap Heap Scan on location
  Recheck Cond: ('(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube @> (ll_to_earth(latitude, longitude))::cube)
  Filter: (sec_to_gc(cube_distance('(1347317.9013552233, -4647693.797459679, 4155081.692815027)'::cube, (ll_to_earth(latitude, longitude))::cube)) < '25000'::double precision)
  ->  Bitmap Index Scan on location_gist_lat_lon_idx  (cost=0.00..46.76 rows=863 width=0) (actual time=7.358..7.358 rows=29407 loops=1)
        Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube)
Planning Time: 0.901 ms
Execution Time: 539.113 ms
(9 rows)
My table schema (excluded some columns):
Table "provider.location"
Column | Type | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+-------------------
id | uuid | | not null |
created_at | timestamp with time zone | | not null | CURRENT_TIMESTAMP
updated_at | timestamp with time zone | | not null | CURRENT_TIMESTAMP
can_delete | boolean | | |
name | text | | |
address | text | | |
address_line_1 | text | | |
address_line_2 | text | | |
city | text | | |
state | text | | |
street | text | | |
zip | text | | |
confidence | int | | |
google_maps_link | text | | |
is_coe | boolean | | |
latitude | double precision | | |
longitude | double precision | | |
I also have a GiST index created for the lat/lon:
"location_gist_lat_lon_idx" gist (ll_to_earth(latitude, longitude))
I'm wondering what I'm missing that makes the execution time increase by 10x with the additional distance filter?
My postgres 13.5 instance has the following specs:
CPU: 4vCPU
Memory: 16GB
SSD: 250GB
This question is related to https://dba.stackexchange.com/questions/158349/how-can-i-speed-up-my-query-on-geo-location-processes , however after following the suggested answer it didn't seem to improve my performance.
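A hedged alternative sketch (my addition, assuming PostGIS is an option; the expression index below is hypothetical, not from the question): ST_DWithin on geography does the bounding-box test and the exact distance check in one index-assisted step.

```sql
-- Expression index; queries must repeat exactly the same expression to use it.
CREATE INDEX location_geog_idx ON location
    USING gist ((ST_MakePoint(longitude, latitude)::geography));

SELECT id
FROM location
WHERE ST_DWithin(ST_MakePoint(longitude, latitude)::geography,
                 ST_MakePoint(-73.83367812, 40.65130101)::geography,
                 25000);  -- radius in meters
```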
blin
(3 rep)
Jul 28, 2022, 07:04 PM
• Last activity: Jul 30, 2022, 12:39 AM
12
votes
1
answers
7331
views
2 B-tree indices OR 1 GiST index on tsrange -- which will perform better?
I have a table which is storing reservation data using the columns starts_at & ends_at. Whenever I'm querying the table to find overlapping reservations, I have the option of using one of the following queries:
SELECT * FROM reservations
WHERE starts_at <= '2014-01-03 00:00:00' AND ends_at >= '2014-01-01 00:00:00';
Or
SELECT * FROM reservations
WHERE tsrange(starts_at, ends_at) && tsrange('2014-01-01 00:00:00', '2014-01-03 00:00:00');
I have regular B-tree indices on the starts_at and ends_at columns, therefore the first query always uses them. However, unless I define a functional GiST index on the tsrange, the second query does a complete scan.
create index tsrange_idx on reservations using gist(tsrange(starts_at, ends_at));
My question is: as the table grows, which index is going to be faster? Probably the answer is obvious looking at the query execution plan, but I'm not well-versed in reading EXPLAIN ANALYZE output.
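A small hedged sketch (my addition): comparing the two approaches mostly comes down to running each with EXPLAIN ANALYZE and watching which index, if any, shows up in the plan:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM reservations
WHERE tsrange(starts_at, ends_at) && tsrange('2014-01-01 00:00:00', '2014-01-03 00:00:00');
```

The tsrange(...) expression has to match the expression in tsrange_idx exactly for that index to be considered at all.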
Saurabh Nanda
(333 rep)
Feb 18, 2014, 01:32 AM
• Last activity: Jun 24, 2022, 01:55 AM
0
votes
3
answers
548
views
slow index creation with gist and gist__int_ops
I am seeing performance issues with a frequently updated/inserted table, and one of my hypotheses is the GIN index I have on an int array.
The GIN index was added since we have an "X exists in array?" query in a high-profile endpoint.
I created a one-time replica of the production DB, removed the index, and replaced it with GiST and gist__int_ops to check its performance (since it should work better on a frequently updated/inserted table).
The GiST index creation never finishes; I let it run for hours with no result, while GIN index creation takes a few minutes.
I tried cleaning the data: I had arrays with 50 to 120 items, which I removed, and now all of my data is arrays with one element, and still index creation is slow.
DB: PostgreSQL 11.5 on RDS (db.m4.4xlarge)
Table size: 9373 MB
Any ideas what I can do here?
Yosi
(175 rep)
Dec 8, 2019, 03:42 PM
• Last activity: Jun 6, 2022, 03:47 PM
0
votes
1
answers
586
views
I have a GIST index and a regular index, one intended for filtering, one intended for sorting, but Postgres WILL NOT use both
I have a table with three columns: two can be used to filter the table, and the last can be used to sort it:
create table thing(
id serial primary key,
location geometry(Point, 4326) not null,
created_at timestamptz not null default (now() at time zone 'utc'),
priority float not null
);
I have been playing with the indexes to support the following query:
select id
from thing
where st_dwithin(location, 'SRID=4326;Point($0 $1)'::geometry, $2)
and created_at >= now() - interval '$3 days'
order by priority
offset $4
limit $5;
Ideally this query could combine indexes on location, created_at, and priority to get optimal performance. Typically I'd imagine an index using btree (priority, location, created_at)
would support such a query quite well, utilizing the index during the filter and delivering results pre-sorted. However, the st_dwithin part of the filter goes unassisted, since that is not a gist index.
If I make an index using gist (priority, location, created_at)
then the query plan utilizes the index during filtering but does not deliver results pre-sorted and sorts the results unassisted by the index.
At this point I thought we'd have to split up the index, using gist(location, created_at) and btree(priority). If $5 < 20 the query plan uses the btree index, and if $5 > 20 the query plan uses the gist index.
Does anyone have any idea why this might be, and what I could do to optimize the performance of both the filtering and the sorting?
Cooper Wolfe
(3 rep)
Apr 11, 2022, 10:24 PM
• Last activity: Apr 11, 2022, 11:28 PM
7
votes
2
answers
8471
views
How can I speed-up my query on geo-location processes
I have a table that contains 10,301,390 GPS records, cities, countries and IP address blocks. I have the user's current location as latitude and longitude. I created this query:
SELECT
    *, point(45.1013021, 46.3021011) <@> point(latitude, longitude)::point AS distance
FROM
    locs
WHERE
    (
        point(45.1013021, 46.3021011) <@> point(latitude, longitude)
    ) < 10 -- radius
ORDER BY
    distance LIMIT 1;
This query successfully gave me what I want, but it is slow. It took 2 to 3 seconds to get one record for a given latitude and longitude.
I tried a B-tree index on the latitude and longitude columns, and also tried GIST(point(latitude, longitude)); but querying is still slow.
How can I speed up this query?
# Update:
It seems the slowness is caused by the ORDER BY, but I want to get the shortest distance, so the question remains.
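A hedged sketch (my addition): GiST on point supports KNN ordering with the geometric <-> operator, which lets ORDER BY ... LIMIT 1 walk the index nearest-first instead of sorting every candidate row. Note that <-> measures planar distance in coordinate units (degrees here), not great-circle distance:

```sql
CREATE INDEX locs_point_idx ON locs USING gist (point(latitude, longitude));

SELECT *,
       point(45.1013021, 46.3021011) <-> point(latitude, longitude) AS distance
FROM locs
ORDER BY point(45.1013021, 46.3021011) <-> point(latitude, longitude)
LIMIT 1;
```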
xangr
(457 rep)
Dec 16, 2016, 10:40 AM
• Last activity: Apr 1, 2022, 04:58 PM
15
votes
1
answers
9272
views
Order by distance
If I have a query returning nearby cafes:
SELECT * FROM cafes c WHERE (
ST_DWithin(
ST_GeographyFromText(
'SRID=4326;POINT(' || c.longitude || ' ' || c.latitude || ')'
),
ST_GeographyFromText('SRID=4326;POINT(-76.000000 39.000000)'),
2000
)
)
How do I select distance, and order by distance as well?
Is there a more efficient way than this one:
SELECT id,
ST_Distance(ST_GeographyFromText('SRID=4326;POINT(-76.000000 39.000000)'),
ST_GeographyFromText(
'SRID=4326;POINT(' || c.longitude || ' ' || c.latitude || ')')
) as distance
FROM cafes c
WHERE (
ST_DWithin(
ST_GeographyFromText(
'SRID=4326;POINT(' || c.longitude || ' ' || c.latitude || ')'
),
ST_GeographyFromText('SRID=4326;POINT(-76.000000 39.000000)'),
2000
)
) order by distance
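A hedged variant (my addition, assuming the point were stored in a geography column, here called geog, instead of being assembled from longitude/latitude on the fly): PostGIS 2.2+ supports index-assisted nearest-first ordering on geography with the <-> operator, which returns meters.

```sql
CREATE INDEX cafes_geog_idx ON cafes USING gist (geog);

SELECT id,
       geog <-> 'SRID=4326;POINT(-76.000000 39.000000)'::geography AS distance
FROM cafes
WHERE ST_DWithin(geog, 'SRID=4326;POINT(-76.000000 39.000000)'::geography, 2000)
ORDER BY geog <-> 'SRID=4326;POINT(-76.000000 39.000000)'::geography;
```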
Gandalf StormCrow
(615 rep)
Mar 11, 2014, 02:12 AM
• Last activity: Nov 2, 2021, 11:44 PM
62
votes
2
answers
42798
views
PostgreSQL EXCLUDE USING error: Data type integer has no default operator class
In PostgreSQL 9.2.3 I am trying to create this simplified table:
CREATE TABLE test (
user_id INTEGER,
startend TSTZRANGE,
EXCLUDE USING gist (user_id WITH =, startend WITH &&)
);
But I get this error:
> ERROR: data type integer has no default operator class for access method "gist"
> HINT: You must specify an operator class for the index or define
> a default operator class for the data type.
The PostgreSQL docs use this example which does not work for me:
CREATE TABLE room_reservation (
room text,
during tsrange,
EXCLUDE USING gist (room WITH =, during WITH &&)
);
Same error message.
And this one, which does not work for me either:
CREATE TABLE zoo (
cage INTEGER,
animal TEXT,
EXCLUDE USING gist (cage WITH =, animal WITH <>)
);
Same error message.
I am able to create this without any problem:
CREATE TABLE test (
user_id INTEGER,
startend TSTZRANGE,
EXCLUDE USING gist (startend WITH &&)
);
and this:
CREATE TABLE test (
user_id INTEGER,
startend TSTZRANGE,
EXCLUDE USING btree (user_id WITH =)
);
I've spent quite a bit of time searching for hints about figuring out how to make this work, or figuring out why it won't work. Any ideas?
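For what it's worth, a minimal sketch (my addition): the usual fix is the btree_gist extension, which supplies default GiST operator classes for scalar types such as integer, after which the original definition builds:

```sql
CREATE EXTENSION btree_gist;

CREATE TABLE test (
    user_id  INTEGER,
    startend TSTZRANGE,
    EXCLUDE USING gist (user_id WITH =, startend WITH &&)
);
```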
Ian Timothy
(905 rep)
Mar 22, 2013, 03:56 PM
• Last activity: Sep 16, 2021, 11:37 PM
1
votes
0
answers
355
views
Why does a GIST index on a cube column in PostgreSQL actually make K-Nearest Neighbor (KNN) ORDER BY queries worse?
Adding a GIST index actually seems to make K-Nearest Neighbor (KNN) ORDER BY queries on cube columns *worse* in PostgreSQL. Why would that be, and what can be done about it?
Here's what I mean. In a PostgreSQL database I have a table whose DDL is create table sample (id serial primary key, title text, embedding cube), where the embedding column is an embedding vector of the title obtained with a Google language model. The cube data type is provided by the cube extension, which I have installed. Incidentally, these are titles of Wikipedia articles. In any case, there are 1 million records. I then perform a KNN query with the following query. This query defines distance using the Euclidean distance operator <->, though results are similar for the other two metrics. It does an ORDER BY and applies a LIMIT in order to find 10 Wikipedia articles with "similar" titles (the most similar being the target title itself). That all works fine.
select sample.title, sample.embedding <-> cube('(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)') distance from sample order by distance limit 10;
What's puzzling to me, however, is that if I put a GIST index on the embedding column, the query performance actually is *worse*. Adding the index, the query plan changes as expected, insofar as it uses the index. But... it gets slower!
This seems to run contrary to the documentation for cube, which states:
> In addition, a cube GiST index can be used to find nearest neighbors using the metric operators <->, <#>, and <=> in ORDER BY clauses
They even provide an example query, which is very similar to mine.
SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1
Here's the query plan and timing info *before* dropping the index.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.41..6.30 rows=10 width=29)
-> Index Scan using sample_embedding_idx on sample (cost=0.41..589360.33 rows=999996 width=29)
Order By: (embedding <-> '(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)'::cube)
(3 rows)
title | distance
----------------------+--------------------
david petrarca | 0.5866321762629475
david adamski | 0.5866321762629475
richard ansdell | 0.6239883862603475
linda darke | 0.6392124797481789
ilias tsiliggiris | 0.6996660649119893
watson, jim | 0.7059481479504834
sk radni%c4%8dki | 0.71718948226995
burnham, pa | 0.7384858030758069
arthur (europa-park) | 0.7468462897336924
ivan kecojevic | 0.7488206082281348
(10 rows)
Time: 1226.457 ms (00:01.226)
And, here's the query plan and timing info *after* dropping the index.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=74036.32..74037.48 rows=10 width=29)
-> Gather Merge (cost=74036.32..171264.94 rows=833330 width=29)
Workers Planned: 2
-> Sort (cost=73036.29..74077.96 rows=416665 width=29)
Sort Key: ((embedding <-> '(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)'::cube))
-> Parallel Seq Scan on sample (cost=0.00..64032.31 rows=416665 width=29)
(6 rows)
title | distance
----------------------+--------------------
david petrarca | 0.5866321762629475
david adamski | 0.5866321762629475
richard ansdell | 0.6239883862603475
linda darke | 0.6392124797481789
ilias tsiliggiris | 0.6996660649119893
watson, jim | 0.7059481479504834
sk radni%c4%8dki | 0.71718948226995
burnham, pa | 0.7384858030758069
arthur (europa-park) | 0.7468462897336924
ivan kecojevic | 0.7488206082281348
(10 rows)
Time: 381.419 ms
Notice:
- With Index: 1226.457 ms
- Without Index: 381.419 ms
This is very puzzling behavior! All of it is documented in a GitHub repo so that others can try it. I'll add documentation about how to generate the embedding vectors, but that *shouldn't* be needed, as in the Quick-Start I show that pre-computed embedding vectors can be downloaded from my Google Drive folder.
**Addendum**
It was asked in the comments below to provide the output of explain (analyze, buffers). Here that is, where:
1. I re-create the (covering) index
2. I run the query with explain (analyze, buffers)
3. I drop the index
4. I run the query with explain (analyze, buffers) again
pgbench=# create index on sample using gist (embedding) include (title);
CREATE INDEX
Time: 51966.315 ms (00:51.966)
pgbench=#
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.41..4.15 rows=10 width=29) (actual time=3215.956..3216.667 rows=10 loops=1)
Buffers: shared hit=1439 read=87004 written=7789
-> Index Only Scan using sample_embedding_title_idx on sample (cost=0.41..373768.39 rows=999999 width=29) (actual time=3215.932..3216.441 rows=10 loops=1)
Order By: (embedding <-> '(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)'::cube)
Heap Fetches: 0
Buffers: shared hit=1439 read=87004 written=7789
Planning:
Buffers: shared hit=14 read=6 dirtied=2
Planning Time: 0.432 ms
Execution Time: 3316.266 ms
(10 rows)
Time: 3318.333 ms (00:03.318)
pgbench=# drop index sample_embedding_title_idx;
DROP INDEX
Time: 182.324 ms
pgbench=#
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=74036.35..74037.52 rows=10 width=29) (actual time=6052.845..6057.210 rows=10 loops=1)
Buffers: shared hit=70 read=58830
-> Gather Merge (cost=74036.35..171265.21 rows=833332 width=29) (actual time=6052.825..6057.021 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=70 read=58830
-> Sort (cost=73036.33..74077.99 rows=416666 width=29) (actual time=6002.928..6003.019 rows=8 loops=3)
Sort Key: ((embedding <-> '(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)'::cube))
Sort Method: top-N heapsort Memory: 26kB
Buffers: shared hit=70 read=58830
Worker 0: Sort Method: top-N heapsort Memory: 26kB
Worker 1: Sort Method: top-N heapsort Memory: 26kB
-> Parallel Seq Scan on sample (cost=0.00..64032.33 rows=416666 width=29) (actual time=0.024..3090.103 rows=333333 loops=3)
Buffers: shared read=58824
Planning:
Buffers: shared hit=3 read=3 dirtied=1
Planning Time: 0.129 ms
Execution Time: 6057.388 ms
(18 rows)
Time: 6053.284 ms (00:06.053)
David Ventimiglia
(111 rep)
Apr 30, 2021, 10:57 PM
3
votes
0
answers
432
views
Ordering of columns in a GIST index
I am modelling an application where events for factories can be registered. There is usually a small number of factories (around 500), but each factory can register a large set of events (millions). Two events must never overlap for the same factory.
My idea was to introduce an exclusion constraint based on a GIST index:
CREATE TABLE event
(
identifier SERIAL NOT NULL PRIMARY KEY,
factory_identifier INTEGER NOT NULL REFERENCES factory,
period TSTZRANGE NOT NULL,
EXCLUDE USING gist
(
factory_identifier WITH =,
period WITH &&
)
);
In the documentation I read:
> A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.
I am not sure how to interpret this. Given that there are only a small amount of factories but a large number of events, should I define the exclusions constraint as following instead?
EXCLUDE USING gist
(
period WITH &&,
factory_identifier WITH =
)
Now, period is the first column, for which there are a lot of distinct values (almost all values are unique, actually), in contrast to factory_identifier, for which there are only a few distinct values.
Is this better now? In short, I have difficulty grasping the quoted documentation above.
**Additional**
In the slides of a presentation on exclusion constraints (page 16) and a blog post, both from the original author of exclusion constraints, I found the following example:
EXCLUDE USING GIST(room WITH =, during WITH &&)
Given the (reasonable) assumption that there are many more distinct reservation periods than distinct rooms, I wonder if this should have been (during WITH &&, room WITH =) instead, given the quote regarding GiST column ordering above.
This makes me believe that I am not really understanding the documentation quoted above.
ItIsJustMe
(33 rep)
Aug 22, 2020, 08:00 AM
• Last activity: Aug 22, 2020, 08:27 AM
6
votes
1
answers
2143
views
Why is a GiST index used for filtering on non-leading column?
I always learned and understood that an index can only be used when we have predicates for the leading (or all) columns. Now, to my surprise, I noticed that a GiST index is used in the following query. Why is that? Is this a special feature of GiST indexes?
CREATE TABLE t1 (
i INT,
j INT,
k INT
);
INSERT INTO t1
SELECT i, j, k
FROM GENERATE_SERIES(1, 100) AS i,
GENERATE_SERIES(1, 100) AS j,
GENERATE_SERIES(1, 100) AS k;
CREATE EXTENSION btree_gist;  -- needed for GiST operator classes on plain integer columns
CREATE INDEX ON t1 USING GiST(i, j, k);
EXPLAIN SELECT * FROM t1 WHERE k = 54;
QUERY PLAN
Bitmap Heap Scan on t1 (cost=199.03..5780.51 rows=5000 width=12)
Recheck Cond: (k = 54)
-> Bitmap Index Scan on t1_i_j_k_idx (cost=0.00..197.78 rows=5000 width=0)
Index Cond: (k = 54)
*dbfiddle [here](https://dbfiddle.uk/?rdbms=postgres_12&fiddle=e40988cd9745f7cd2949e844ceb294f0)*
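For contrast, a hedged aside (my addition): a plain B-tree on (i, j, k) is typically not chosen for a predicate on k alone, which is exactly what makes the GiST behavior above surprising:

```sql
CREATE INDEX t1_btree_idx ON t1 (i, j, k);

EXPLAIN SELECT * FROM t1 WHERE k = 54;  -- typically planned as a sequential scan
```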
Kejlo
(63 rep)
Jul 12, 2020, 09:05 PM
• Last activity: Jul 13, 2020, 11:51 AM
5
votes
1
answers
6005
views
Best index for jsonb in Postgres
We have a table containing around 500k rows. The table is supposed to grow to millions of records.
This is what the table looks like:
CREATE TABLE public.influencers
(
id integer NOT NULL DEFAULT nextval('influencers_id_seq'::regclass),
location jsonb,
gender text COLLATE pg_catalog."default",
birthdate timestamp without time zone,
ig jsonb,
contact_info jsonb,
created_at timestamp without time zone DEFAULT now(),
updated_at timestamp without time zone DEFAULT now(),
categories text[] COLLATE pg_catalog."default",
search_field text COLLATE pg_catalog."default",
search_vector tsvector,
ig_updated_at timestamp without time zone,
CONSTRAINT influencers_pkey PRIMARY KEY (id),
CONSTRAINT ig_id_must_exist CHECK (ig ? 'id'::text),
CONSTRAINT ig_username_must_exist CHECK (ig ? 'username'::text)
)
And these are some of the queries we need to perform efficiently:
SELECT "public"."influencers".*
FROM "public"."influencers"
WHERE (ig->'follower_count' IS NOT NULL)
ORDER BY (ig->'follower_count') DESC
LIMIT 9
OFFSET 0
SELECT *
FROM "public"."influencers"
WHERE (ig->'follower_count' >= '5000')
LIMIT 9
SELECT SUM(CAST(ig ->> 'follower_count' AS integer))
FROM "public"."influencers"
WHERE (ig->'follower_count' >= '5000')
AND (ig->'follower_count' <= ...)
AND (ig->'follower_count' IS NOT NULL)
ig -> 'follower_count' values are numeric.
I read that GIN indexes are mainly intended for searching through composite items (text), so I'm guessing the best index to use would be a B-tree. Would that be correct?
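A hedged sketch (my addition): for the range filters and ORDER BY shown, a B-tree expression index fits, casting the text value to integer rather than comparing jsonb values; every query must then repeat exactly the same expression:

```sql
CREATE INDEX influencers_follower_count_idx
    ON public.influencers (((ig ->> 'follower_count')::integer));

SELECT *
FROM public.influencers
WHERE (ig ->> 'follower_count')::integer >= 5000
ORDER BY (ig ->> 'follower_count')::integer DESC
LIMIT 9;
```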
borjagvo
(205 rep)
May 17, 2017, 10:01 AM
• Last activity: Oct 1, 2019, 04:08 PM
1
votes
1
answers
5252
views
How to increase index row maximum size in Postgresql?
I want to perform KNN on a 512-dimensional vector using the solution suggested here.
1. I couldn't create a table with a 512-d cube array. Solved by changing CUBE_MAX_DIM in the source code.
2. Couldn't create an index on a table with dimensions > 200. Set block_size=32. Now it works with dimensionality <= 510.
3. Got a new error after changing block_size and trying to create an index on the table with 512-d vectors: ERROR: index row requires 8208 bytes, maximum size is 8191
Is it possible to increase this limit?
Alibi Yeslambek
(15 rep)
Aug 6, 2019, 07:15 AM
• Last activity: Aug 6, 2019, 07:35 AM