Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
182
views
Composite multicolumn index for geopoint range and numeric range query
I am building an app where the server needs to select rows based on some criteria/filters. One of them is the location of the user and the radius at which the user wants to see posts, plus other filters such as a date range and a filter on the value of another column. This is going to be for an ad-hoc event discovery app.
I have read about PostGIS and its geometry and geography types, and I know there is a native point datatype. Based on this answer I understood that it is better to order index columns from equality to range, even though I feel like the geo point column should come first.
Suppose the following few rows of a simplified events table (disregard the validity of the position data):
id event_title event_position event_type is_public start_date
(varchar) (point lat/lon) (smallint) (boolean) (timestamptz)
-- --------------------------- --------------- --------- --------- ----
1 "John's Party" (122,35) 0 0 2020-07-05
2 "Revolution then Starbucks" (123,30) 1 1 2020-07-06
3 "Study for math exam" (120,36) 2 1 2020-07-07
4 "Party after exam" (120,36) 1 1 2020-07-08
5 "Hiking next to the city" (95,40) 3 1 2020-07-09
6 "Football match" (-42,31) 4 1 2020-07-10
Imagine the table contains at least several thousand records, obviously not only 6.
So in this table a user would be able to query public events within 100 km of (122,34) (suppose the first three rows fall into this area), of event types 0, 1 or 2, and falling between the dates 2020-07-05 and 2020-07-07. The user would get the rows with ID 2 and 3.
This is the query I want to optimize with an appropriate index. My question is: how is it possible to create such an index? I thought about a GiST or GIN index, but I am not sure how these could help. Thanks!
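For what it's worth, a minimal sketch of one possible index shape (my addition, not part of the question; it assumes event_position is stored as a PostGIS geography(Point,4326) column rather than the native point shown above, and that btree_gist is available):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- GiST operator classes for scalar types

-- Equality-ish columns first, range-like columns after, per the ordering rule
-- cited in the question; the partial predicate folds in the is_public filter.
CREATE INDEX events_search_idx ON events
    USING gist (event_type, start_date, event_position)
    WHERE is_public;

SELECT id
FROM events
WHERE is_public
  AND event_type IN (0, 1, 2)
  AND start_date BETWEEN '2020-07-05' AND '2020-07-07'
  AND ST_DWithin(event_position,
                 'SRID=4326;POINT(122 34)'::geography,
                 100000);  -- 100 km, in meters
```

This is only a sketch; whether the planner actually uses all three columns depends on selectivity, so EXPLAIN ANALYZE against realistic data is the real test.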
Najib
(231 rep)
Jul 4, 2020, 02:59 PM
• Last activity: Jul 5, 2025, 12:18 AM
2
votes
0
answers
44
views
How can I speed up GIST index query for most similar n items?
I have the following tables and indices:
CREATE TABLE IF NOT EXISTS users
(
id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('users_sequence') PRIMARY KEY,
first_name VARCHAR(512) DEFAULT NULL NULL,
last_name VARCHAR(512) DEFAULT NULL NULL,
full_name VARCHAR(1024) GENERATED ALWAYS AS
(CASE
WHEN first_name IS NULL AND
last_name IS NULL THEN NULL
ELSE
(TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))) END) STORED,
deleted_at TIMESTAMP DEFAULT NULL NULL,
-- Some ~20 columns
);
CREATE TABLE IF NOT EXISTS user_aliases
(
id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('user_aliases_sequence') PRIMARY KEY,
entry_id NUMERIC(20, 0) NOT NULL,
first_name VARCHAR(512) DEFAULT NULL NULL,
last_name VARCHAR(512) DEFAULT NULL NULL,
full_name VARCHAR(1024) GENERATED ALWAYS AS
(CASE
WHEN first_name IS NULL AND
last_name IS NULL THEN NULL
ELSE
(TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))) END) STORED,
deleted_at TIMESTAMP DEFAULT NULL NULL,
CONSTRAINT fk_user_aliases_entry_id FOREIGN KEY (entry_id) REFERENCES users (id) ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE INDEX users_full_name_idx ON users USING GIST (full_name gist_trgm_ops) WHERE deleted_at IS NULL AND full_name IS NOT NULL;
CREATE INDEX user_aliases_full_name_idx ON user_aliases USING GIST (full_name gist_trgm_ops) WHERE deleted_at IS NULL AND full_name IS NOT NULL;
and my query:
SELECT id
FROM ((SELECT e.id,
e.full_name
FROM users e
WHERE e.full_name % 'some name here'
AND e.deleted_at IS NULL
LIMIT 10)
UNION
(SELECT a.entry_id AS id,
a.full_name
FROM user_aliases a
WHERE a.full_name % 'some name here'
AND a.deleted_at IS NULL
LIMIT 10)) filter_table
LIMIT 10
and this takes around ~300ms on average on a table with ~500K records, with the following execution plan:
Limit  (cost=88.71..88.91 rows=10 width=536) (actual time=322.001..322.009 rows=2 loops=1)
  Output: e.id, e.full_name
  Buffers: shared hit=44390
  ->  HashAggregate  (cost=88.71..88.91 rows=20 width=536) (actual time=321.998..322.004 rows=2 loops=1)
        Output: e.id, e.full_name
        Group Key: e.id, e.full_name
        Batches: 1  Memory Usage: 24kB
        Buffers: shared hit=44390
        ->  Append  (cost=0.41..88.61 rows=20 width=536) (actual time=158.361..321.966 rows=2 loops=1)
              Buffers: shared hit=44390
              ->  Limit  (cost=0.41..43.92 rows=10 width=27) (actual time=158.358..206.304 rows=2 loops=1)
                    Output: e.id, e.full_name
                    Buffers: shared hit=30910
                    ->  Index Scan using users_full_name_idx on master.users e  (cost=0.41..209.25 rows=48 width=27) (actual time=158.355..206.290 rows=2 loops=1)
                          Output: e.id, e.full_name
                          Index Cond: ((e.full_name)::text % 'some name here'::text)
                          Buffers: shared hit=30910
              ->  Limit  (cost=0.41..44.39 rows=10 width=37) (actual time=115.641..115.643 rows=0 loops=1)
                    Output: a.entry_id, a.full_name
                    Buffers: shared hit=13480
                    ->  Index Scan using user_aliases_full_name_idx on master.user_aliases a  (cost=0.41..92.77 rows=21 width=37) (actual time=115.633..115.633 rows=0 loops=1)
                          Output: a.entry_id, a.full_name
                          Index Cond: ((a.full_name)::text % 'some name here'::text)
                          Buffers: shared hit=13480
Settings: effective_cache_size = '16GB', search_path = 'master', work_mem = '32MB'
Planning Time: 1.210 ms
Execution Time: 322.081 ms
Now this is not the ideal query, since it LIMITs arbitrary records. When I update the query to:
SELECT id
FROM ((SELECT e.id,
e.full_name
FROM users e
WHERE e.full_name % 'some name here'
AND e.deleted_at IS NULL
ORDER BY e.full_name <-> 'some name here'
LIMIT 10)
UNION
(SELECT a.entry_id AS id,
a.full_name
FROM user_aliases a
WHERE a.full_name % 'some name here'
AND a.deleted_at IS NULL
ORDER BY a.full_name <-> 'some name here'
LIMIT 10)) filter_table
LIMIT 10
it now performs as it should, but the average execution time rises to ~700ms with the following plan:
Limit  (cost=88.81..89.01 rows=10 width=536) (actual time=324.831..324.839 rows=2 loops=1)
  Output: "*SELECT* 1".id, "*SELECT* 1".full_name
  Buffers: shared hit=44390
  ->  HashAggregate  (cost=88.81..89.01 rows=20 width=536) (actual time=324.827..324.834 rows=2 loops=1)
        Output: "*SELECT* 1".id, "*SELECT* 1".full_name
        Group Key: "*SELECT* 1".id, "*SELECT* 1".full_name
        Batches: 1  Memory Usage: 24kB
        Buffers: shared hit=44390
        ->  Append  (cost=0.41..88.71 rows=20 width=536) (actual time=232.209..324.809 rows=2 loops=1)
              Buffers: shared hit=44390
              ->  Subquery Scan on "*SELECT* 1"  (cost=0.41..44.07 rows=10 width=27) (actual time=232.207..232.219 rows=2 loops=1)
                    Output: "*SELECT* 1".id, "*SELECT* 1".full_name
                    Buffers: shared hit=30910
                    ->  Limit  (cost=0.41..43.97 rows=10 width=31) (actual time=232.204..232.214 rows=2 loops=1)
                          Output: e.id, e.full_name, (((e.full_name)::text <-> 'some name here'::text))
                          Buffers: shared hit=30910
                          ->  Index Scan using users_full_name_idx on master.users e  (cost=0.41..209.49 rows=48 width=31) (actual time=232.201..232.209 rows=2 loops=1)
                                Output: e.id, e.full_name, ((e.full_name)::text <-> 'some name here'::text)
                                Index Cond: ((e.full_name)::text % 'some name here'::text)
                                Order By: ((e.full_name)::text <-> 'some name here'::text)
                                Buffers: shared hit=30910
              ->  Subquery Scan on "*SELECT* 2"  (cost=0.41..44.54 rows=10 width=37) (actual time=92.579..92.581 rows=0 loops=1)
                    Output: "*SELECT* 2".id, "*SELECT* 2".full_name
                    Buffers: shared hit=13480
                    ->  Limit  (cost=0.41..44.44 rows=10 width=41) (actual time=92.577..92.577 rows=0 loops=1)
                          Output: a.entry_id, a.full_name, (((a.full_name)::text <-> 'some name here'::text))
                          Buffers: shared hit=13480
                          ->  Index Scan using user_aliases_full_name_idx on master.user_aliases a  (cost=0.41..92.88 rows=21 width=41) (actual time=92.572..92.573 rows=0 loops=1)
                                Output: a.entry_id, a.full_name, ((a.full_name)::text <-> 'some name here'::text)
                                Index Cond: ((a.full_name)::text % 'some name here'::text)
                                Order By: ((a.full_name)::text <-> 'some name here'::text)
                                Buffers: shared hit=13480
Settings: effective_cache_size = '16GB', search_path = 'master', work_mem = '32MB'
Planning Time: 1.115 ms
Execution Time: 324.906 ms
How do I get the most similar records, without generating the extra execution time overhead?
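As an aside (my addition, not part of the question): pg_trgm's GiST operator class also supports the <-> distance operator directly in ORDER BY, so one hedged option is to drop the % filter and let the index walk nearest-first:

```sql
-- Sketch against the schema above: pure KNN ordering on the trigram GiST index.
SELECT e.id
FROM users e
WHERE e.deleted_at IS NULL
  AND e.full_name IS NOT NULL   -- matches the partial index predicate
ORDER BY e.full_name <-> 'some name here'
LIMIT 10;
```

Note the semantics change: this returns the ten nearest rows regardless of the pg_trgm similarity threshold, so rows that % would have rejected can appear.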
Hasan Can Saral
(175 rep)
Jun 11, 2025, 10:41 AM
1
votes
1
answers
263
views
Postgresql strange behavior with non-ascii characters with trigram index present
I see some strange behavior when using either gin_trgm_ops or gist_trgm_ops indexes. There seems to be quite a difference in plans when using, say, ILIKE or ~ to search for ASCII phrases vs. multi-byte character phrases, as if there were a higher cost when the operand is non-ASCII.
Is what I am seeing expected? What is the reason for it?
I tried it on PostgreSQL 12 and 13, the latest of each.
Here is a scenario:
CREATE DATABASE postgres
WITH
OWNER = postgres
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.utf8'
LC_CTYPE = 'en_US.utf8'
TABLESPACE = pg_default
CONNECTION LIMIT = -1
IS_TEMPLATE = False;
-- snip
CREATE TABLE test_table (
id uuid PRIMARY KEY,
label varchar
);
-- insert 1m rows
VACUUM ANALYZE test_table;
In the data set I have 10 labels containing 'acl' and 10 containing '定す'.
When using a GIN index
CREATE INDEX test_table_label_gin_idx
ON test_table USING gin
(label gin_trgm_ops);
I see the following.
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
Limit (cost=1000.00..16573.18 rows=100 width=52) (actual time=392.153..395.095 rows=10 loops=1)
-> Gather (cost=1000.00..16728.91 rows=101 width=52) (actual time=392.135..394.830 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test_table (cost=0.00..15718.81 rows=42 width=52) (actual time=382.922..388.082 rows=3 loops=3)
Filter: ((label)::text ~~* '%定す%'::text)
Rows Removed by Filter: 338417
Planning Time: 0.656 ms
Execution Time: 395.233 ms
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%acl%' LIMIT 100;
Limit (cost=28.78..400.51 rows=100 width=52) (actual time=0.072..0.406 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=28.78..404.23 rows=101 width=52) (actual time=0.053..0.197 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%acl%'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on test_table_label_gin_idx (cost=0.00..28.76 rows=101 width=0) (actual time=0.025..0.034 rows=10 loops=1)
Index Cond: ((label)::text ~~* '%acl%'::text)
Planning Time: 0.231 ms
Execution Time: 0.542 ms
With GIST
DROP INDEX test_table_label_gin_idx;
CREATE INDEX test_table_label_gist_idx
ON test_table USING gist
(label gist_trgm_ops);
I see
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%定す%' LIMIT 100;
Limit (cost=13.19..384.92 rows=100 width=52) (actual time=303.772..1557.498 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=13.19..388.64 rows=101 width=52) (actual time=303.752..1557.286 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%定す%'::text)
Rows Removed by Index Recheck: 1015250
Heap Blocks: exact=10431
-> Bitmap Index Scan on test_table_label_gist_idx (cost=0.00..13.17 rows=101 width=0) (actual time=301.046..301.053 rows=1015260 loops=1)
Index Cond: ((label)::text ~~* '%定す%'::text)
Planning Time: 0.215 ms
Execution Time: 1557.643 ms
EXPLAIN ANALYZE SELECT * FROM test_table WHERE label ILIKE '%acl%' LIMIT 100;
Limit (cost=13.19..384.92 rows=100 width=52) (actual time=257.385..257.751 rows=10 loops=1)
-> Bitmap Heap Scan on test_table (cost=13.19..388.64 rows=101 width=52) (actual time=257.366..257.551 rows=10 loops=1)
Recheck Cond: ((label)::text ~~* '%acl%'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on test_table_label_gist_idx (cost=0.00..13.17 rows=101 width=0) (actual time=257.319..257.328 rows=10 loops=1)
Index Cond: ((label)::text ~~* '%acl%'::text)
Planning Time: 0.377 ms
Execution Time: 257.948 ms
Just changing the characters of the operand changes the plan quite a lot.
**Edit**
SELECT show_trgm('定す');
"{0x145ed8,0x6628fa,0x6cb12d}"
SELECT encode('定す', 'escape')
\345\256\232\343\201\231
This issue seems similar to https://dba.stackexchange.com/questions/63193/postgresql-not-using-gin-trigram-index-when-performing-non-ascii-like-query
Pawel Zieminski
(125 rep)
Oct 6, 2023, 08:38 PM
• Last activity: Oct 7, 2023, 03:06 AM
0
votes
1
answers
1237
views
Queries using GIN index on f_unaccent() seem slow?
I have almost 20M rows in one table that contains products, including their names.
I wish to search fulltext and fast by the name so I created this index:
CREATE INDEX uprice_item_occurrence_unaccent_name_trgm_idx ON price_item_occurrence USING gin (f_unaccent(name) gin_trgm_ops);
I was hoping the following query would take less than (let's say) 500 ms:
select * from price_item_occurrence as oo
where f_unaccent(oo.name) % f_unaccent('iphone');
But it takes almost 2s:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) % f_unaccent('iphone');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on price_item_occurrence oo (cost=1956.42..63674.14 rows=16570 width=287) (actual time=247.918..1880.759 rows=94 loops=1)
Recheck Cond: (f_unaccent((name)::text) % 'iphone'::text)
Rows Removed by Index Recheck: 87838
Heap Blocks: exact=76663
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..1952.28 rows=16570 width=0) (actual time=195.418..195.418 rows=88962 loops=1)
Index Cond: (f_unaccent((name)::text) % 'iphone'::text)
Planning time: 0.444 ms
Execution time: 1880.833 ms
It's possible that the DB is busy, but I am not sure.
I tried to play with select set_limit(0.9); (increased it) and it helps a bit, but not much.
I am using Postgres 10, can change the Postgres config, and I am open to suggestions.
I tried it with ilike and it improved somewhat:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) ilike ('%' || f_unaccent('iphone') || '%');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on price_item_occurrence oo (cost=3135.08..416823.45 rows=166075 width=286) (actual time=50.258..670.085 rows=65917 loops=1)
Recheck Cond: (f_unaccent((name)::text) ~~* '%iphone%'::text)
Rows Removed by Index Recheck: 10
Heap Blocks: exact=59750
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..3093.56 rows=166075 width=0) (actual time=37.385..37.385 rows=67700 loops=1)
Index Cond: (f_unaccent((name)::text) ~~* '%iphone%'::text)
Planning time: 0.545 ms
Execution time: 675.776 ms
(8 rows)
About 2x faster.
I tried limit 10:
postgres=# explain analyze select * from price_item_occurrence as oo where f_unaccent(oo.name) % f_unaccent('iphone') limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=373.27..410.51 rows=10 width=287) (actual time=268.718..589.131 rows=10 loops=1)
-> Bitmap Heap Scan on price_item_occurrence oo (cost=373.27..62493.45 rows=16680 width=287) (actual time=268.715..589.123 rows=10 loops=1)
Recheck Cond: (f_unaccent((name)::text) % 'iphone'::text)
Rows Removed by Index Recheck: 18917
Heap Blocks: exact=17100
-> Bitmap Index Scan on uprice_item_occurrence_unaccent_name_trgm_idx (cost=0.00..369.10 rows=16680 width=0) (actual time=165.958..165.958 rows=69268 loops=1)
Index Cond: (f_unaccent((name)::text) % 'iphone'::text)
Planning time: 0.397 ms
Execution time: 589.187 ms
(9 rows)
This is also faster, maybe almost good enough.
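A hedged aside (my addition): a GIN trigram index can only filter, it cannot rank. If a ranked top-N is the real goal, a GiST trigram index supports ordering by the <-> distance operator, something like:

```sql
CREATE INDEX price_item_occurrence_unaccent_name_gist_idx
    ON price_item_occurrence USING gist (f_unaccent(name) gist_trgm_ops);

-- Nearest-first walk of the index; no threshold scan of all candidates.
SELECT *
FROM price_item_occurrence
ORDER BY f_unaccent(name) <-> f_unaccent('iphone')
LIMIT 10;
```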
Michal
(3 rep)
Apr 9, 2018, 07:19 PM
• Last activity: Aug 1, 2023, 05:00 PM
0
votes
1
answers
100
views
Why does my GiST index not trigger a conflict?
I have a table with user subscriptions in my Postgres 13 DB. I added a GiST index to disallow overlapping time ranges for the same user:
CREATE INDEX user_sub_exclusion_index ON public.user_sub USING gist (user_id, tstzrange(sub_start, sub_end))
But today I found it did not work.
This is the table DDL:
~~~pgsql
-- DROP TABLE public.user_sub;
CREATE TABLE public.user_sub (
id int8 NOT NULL GENERATED ALWAYS AS IDENTITY,
app_id varchar NOT NULL,
product_id int4 NOT NULL,
iap_product_id int8 NOT NULL,
created_time int8 NOT NULL,
updated_time int8 NOT NULL,
user_id int8 NOT NULL,
sub_start_time int8 NOT NULL DEFAULT 0,
sub_end_time int8 NOT NULL DEFAULT 0,
enabled int2 NOT NULL DEFAULT 1,
order_id varchar NOT NULL,
sub_start timestamptz NOT NULL,
sub_end timestamptz NOT NULL,
CONSTRAINT user_sub_new_pk PRIMARY KEY (id),
CONSTRAINT user_sub_new_un UNIQUE (order_id)
);
CREATE INDEX user_sub_exclusion_index ON public.user_sub USING gist (user_id, tstzrange(sub_start, sub_end));
CREATE INDEX user_sub_tstzrange_user_id_product_id_excl ON public.user_sub USING gist (tstzrange(sub_start, sub_end, '[]'::text), user_id, product_id);
~~~
Here is a sample of the table data right now: it contains overlapping time ranges for the same user:
~~~pgsql
INSERT INTO public.user_sub (app_id, product_id, iap_product_id, created_time, updated_time, user_id, sub_start_time, sub_end_time, enabled, order_id, sub_start, sub_end)
VALUES
('vOghoo10L9', 9, 6, 1680251663942, 1680251663942, 77, 1680501039421, 1680587439421, 1, '627599858277646336', '2023-04-03 13:50:39.421', '2023-04-04 13:50:39.421')
, ('vOghoo10L9', 9, 6, 1680263287925, 1680263287925, 77, 1680587439422, 1680673839422, 1, '627697298195189760', '2023-04-04 13:50:39.422', '2023-04-05 13:50:39.422')
, ('vOghoo10L9', 9, 6, 1680263497256, 1680263497256, 77, 1680673839423, 1680760239423, 1, '627679779371601920', '2023-04-05 13:50:39.423', '2023-04-06 13:50:39.423')
, ('vOghoo10L9', 9, 6, 1680263539890, 1680263539890, 77, 1680760239424, 1680846639424, 1, '627680143827259392', '2023-04-06 13:50:39.424', '2023-04-07 13:50:39.424')
, ('vOghoo10L9', 9, 6, 1680273609032, 1680273609032, 77, 1680846639425, 1680933039425, 1, '627601223242579968', '2023-04-07 13:50:39.425', '2023-04-08 13:50:39.425')
, ('vOghoo10L9', 9, 6, 1680275903068, 1680275903068, 77, 1680933039426, 1681019439426, 1, '627610724383956992', '2023-04-08 13:50:39.426', '2023-04-09 13:50:39.426')
, ('vOghoo10L9', 9, 6, 1680276434561, 1680276434561, 77, 1681019439427, 1681105839427, 1, '627612898614681600', '2023-04-09 13:50:39.427', '2023-04-10 13:50:39.427')
, ('vOghoo10L9', 9, 6, 1680277832740, 1680277832740, 77, 1681105839428, 1681192239428, 1, '627618880539664384', '2023-04-10 13:50:39.428', '2023-04-11 13:50:39.428')
, ('vOghoo10L9', 9, 6, 1680282545888, 1680282545888, 77, 1681192239429, 1681278639429, 1, '627411682153152512', '2023-04-11 13:50:39.429', '2023-04-12 13:50:39.429')
, ('vOghoo10L9', 9, 6, 1680327772145, 1680327772145, 77, 1681278639430, 1681365039430, 1, '627601513341616128', '2023-04-12 13:50:39.430', '2023-04-13 13:50:39.430')
, ('vOghoo10L9', 9, 5, 1680761228691, 1680761228691, 79, 1680761228687, 1680847628687, 1, '629789245600776192', '2023-04-06 14:07:08.687', '2023-04-07 14:07:08.687')
, ('vOghoo10L9', 9, 5, 1680763488691, 1680763488691, 77, 1680763488689, 1680849888689, 1, '629798689575354368', '2023-04-06 14:44:48.689', '2023-04-07 14:44:48.689')
, ('vOghoo10L9', 9, 5, 1680763634694, 1680763634694, 77, 1680849888690, 1688712288690, 1, '629799331018653696', '2023-04-07 14:44:48.690', '2023-07-07 14:44:48.690')
, ('vOghoo10L9', 9, 6, 1680850885924, 1680850885924, 80, 1680850881709, 1680937281709, 1, '630165239553671168', '2023-04-07 15:01:21.709', '2023-04-08 15:01:21.709')
, ('vOghoo10L9', 9, 6, 1681461059268, 1681461059268, 81, 1681461059263, 1681547459263, 1, '632723434021126144', '2023-04-14 16:30:59.263', '2023-04-15 16:30:59.263')
, ('vOghoo10L9', 9, 6, 1681483885008, 1681483885008, 82, 1681483885006, 1681570285006, 1, '632820205569245184', '2023-04-14 22:51:25.006', '2023-04-15 22:51:25.006')
, ('iYDFo0PQQX', 11, 14, 1682762462986, 1682762462986, 86, 1682762462971, 1682848862971, 1, '638182956129267712', '2023-04-29 18:01:02.971', '2023-04-30 18:01:02.971')
, ('iYDFo0PQQX', 11, 14, 1682764832099, 1682764832099, 86, 1682764832090, 1682851232090, 1, '638192942803423232', '2023-04-29 18:40:32.090', '2023-04-30 18:40:32.090')
, ('iYDFo0PQQX', 11, 14, 1682765939081, 1682765939081, 86, 1682765939052, 1682852339052, 1, '638197106476421120', '2023-04-29 18:58:59.052', '2023-04-30 18:58:59.052')
, ('iYDFo0PQQX', 11, 14, 1682766222551, 1682766222551, 86, 1682766193447, 1682852593447, 1, '638198676681232384', '2023-04-29 19:03:13.447', '2023-04-30 19:03:13.447')
;
~~~
Am I missing something? How can I make this constraint work? That is, how do I prevent the same user from subscribing with overlapping time ranges?
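For reference, a minimal sketch (my addition, not from the question): a plain GiST index never rejects rows on its own; only an exclusion constraint does, and the user_id equality part needs the btree_gist extension:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- GiST equality support for scalar types

ALTER TABLE public.user_sub
    ADD CONSTRAINT user_sub_no_overlap
    EXCLUDE USING gist (user_id WITH =, tstzrange(sub_start, sub_end) WITH &&);
```

Adding the constraint will fail until the overlapping rows shown above have been cleaned up.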
Dolphin
(939 rep)
Apr 29, 2023, 02:29 PM
• Last activity: Apr 29, 2023, 09:23 PM
3
votes
3
answers
1009
views
Is a GIST index on a geometry point useful to speed up a spatial query?
I have multiple tables with point geometries and GiST indexes eating many gigabytes of space.
From what I understand about GiST indexes on polygons or polylines, they store the bounding box and run a first-pass calculation on the bounding box instead of the full geometry, resulting in faster processing.
But what about point geometry? Will it do anything more than ordering the points spatially?
Boodoo
(65 rep)
Feb 25, 2021, 01:51 PM
• Last activity: Mar 30, 2023, 04:13 PM
4
votes
2
answers
3239
views
ERROR: data type text[] has no default operator class for access method "gist"
Whenever I try to create a GIST index on text[], I get the above error.
CREATE TABLE bar(f)
AS VALUES
(ARRAY['foo','bar','baz']);
CREATE INDEX ON bar USING gist(f);
ERROR: data type text[] has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
I'm trying to [create an *Exclusion Constraint*](https://dba.stackexchange.com/a/185177/2639) on bar, so I need a GIST index.
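For contrast, a hedged sketch (my addition): the btree_gist extension supplies default GiST operator classes for ordinary scalar types such as text, which is why the equivalent statement succeeds on a scalar column; array types are not covered by it:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE bar_scalar(f text);
CREATE INDEX ON bar_scalar USING gist(f);  -- works: btree_gist provides the opclass
```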
Evan Carroll
(65502 rep)
Sep 6, 2017, 09:37 AM
• Last activity: Mar 2, 2023, 07:37 PM
1
votes
0
answers
151
views
Curious results from pg_stat_user_indexes
**PostgreSQL 12.8, PostGIS 3.1.4**
Running select * from pg_stat_user_indexes, there are some rows that have 0 for idx_scan and a >0 result for idx_tup_read.
The indexes that match this condition are all simple SP-GiST indexes, i.e.,
CREATE INDEX ... USING spgist(geometry)
The definition of idx_tup_read is "Number of index entries returned by scans on this index."
If idx_scan is 0, I would expect "scans on this index" to be 0 as well. So how can idx_tup_read have a result greater than 0 if idx_scan is 0?
Avocado
(245 rep)
Dec 15, 2022, 06:02 PM
• Last activity: Dec 16, 2022, 03:43 PM
0
votes
1
answers
762
views
Optimizing geolocation query in Postgres with earth_box and earth_distance
I'm trying to optimize my geolocation query for a table of ~862k address rows within a search radius, using earth_box. My first query is not too terrible:
explain analyze SELECT
id
FROM
location
WHERE
earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on location  (cost=46.97..3385.08 rows=863 width=16) (actual time=11.430..49.406 rows=29407 loops=1)
  Recheck Cond: ('(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube @> (ll_to_earth(latitude, longitude))::cube)
  Heap Blocks: exact=22479
  ->  Bitmap Index Scan on location_gist_lat_lon_idx  (cost=0.00..46.76 rows=863 width=0) (actual time=7.942..7.943 rows=29407 loops=1)
        Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube)
Adding an earth_distance check to turn the box into a true radius makes it much slower:
explain analyze SELECT
    id
FROM
    location
WHERE
    earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
    AND earth_distance(ll_to_earth(40.65130101, -73.83367812),
                       ll_to_earth(latitude, longitude)) < 25000;
Bitmap Heap Scan on location
  Recheck Cond: ('(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube @> (ll_to_earth(latitude, longitude))::cube)
  Filter: (sec_to_gc(cube_distance('(1347317.9013552233, -4647693.797459679, 4155081.692815027)'::cube, (ll_to_earth(latitude, longitude))::cube)) < '25000'::double precision)
  ->  Bitmap Index Scan on location_gist_lat_lon_idx  (cost=0.00..46.76 rows=863 width=0) (actual time=7.358..7.358 rows=29407 loops=1)
        Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube)
Planning Time: 0.901 ms
Execution Time: 539.113 ms
(9 rows)
My table schema (excluded some columns):
Table "provider.location"
Column | Type | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+-------------------
id | uuid | | not null |
created_at | timestamp with time zone | | not null | CURRENT_TIMESTAMP
updated_at | timestamp with time zone | | not null | CURRENT_TIMESTAMP
can_delete | boolean | | |
name | text | | |
address | text | | |
address_line_1 | text | | |
address_line_2 | text | | |
city | text | | |
state | text | | |
street | text | | |
zip | text | | |
confidence | int | | |
google_maps_link | text | | |
is_coe | boolean | | |
latitude | double precision | | |
longitude | double precision | | |
I also have a GiST index created for the lat/lon:
"location_gist_lat_lon_idx" gist (ll_to_earth(latitude, longitude))
I'm wondering what I'm missing that makes the execution time increase by 10x with the additional distance filter?
My postgres 13.5 instance has the following specs:
CPU: 4vCPU
Memory: 16GB
SSD: 250GB
This question is related to https://dba.stackexchange.com/questions/158349/how-can-i-speed-up-my-query-on-geo-location-processes , however after following the suggested answer it didn't seem to improve my performance.
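A hedged alternative sketch (my addition, assuming PostGIS is an option; the expression index below is hypothetical, not from the question): ST_DWithin on geography does the bounding-box test and the exact distance check in one index-assisted step.

```sql
-- Expression index; queries must repeat exactly the same expression to use it.
CREATE INDEX location_geog_idx ON location
    USING gist ((ST_MakePoint(longitude, latitude)::geography));

SELECT id
FROM location
WHERE ST_DWithin(ST_MakePoint(longitude, latitude)::geography,
                 ST_MakePoint(-73.83367812, 40.65130101)::geography,
                 25000);  -- radius in meters
```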
blin
(3 rep)
Jul 28, 2022, 07:04 PM
• Last activity: Jul 30, 2022, 12:39 AM
12
votes
1
answers
7331
views
2 B-tree indices OR 1 GiST index on tsrange -- which will perform better?
I have a table which is storing reservation data using the columns starts_at & ends_at. Whenever I'm querying the table to find overlapping reservations, I have the option of using one of the following queries:
SELECT * FROM reservations
WHERE starts_at <= '2014-01-03 00:00:00' AND ends_at >= '2014-01-01 00:00:00';
Or
SELECT * FROM reservations
WHERE tsrange(starts_at, ends_at) && tsrange('2014-01-01 00:00:00', '2014-01-03 00:00:00');
I have regular B-tree indices on the starts_at and ends_at columns, therefore the first query always uses them. However, unless I define a functional GiST index on the tsrange, the second query does a complete scan.
create index tsrange_idx on reservations using gist(tsrange(starts_at, ends_at));
My question is: as the table grows, which index is going to be faster? Probably the answer is obvious looking at the query execution plan, but I'm not well-versed in reading EXPLAIN ANALYZE output.
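A small hedged sketch (my addition): comparing the two approaches mostly comes down to running each with EXPLAIN ANALYZE and watching which index, if any, shows up in the plan:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM reservations
WHERE tsrange(starts_at, ends_at) && tsrange('2014-01-01 00:00:00', '2014-01-03 00:00:00');
```

The tsrange(...) expression has to match the expression in tsrange_idx exactly for that index to be considered at all.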
Saurabh Nanda
(333 rep)
Feb 18, 2014, 01:32 AM
• Last activity: Jun 24, 2022, 01:55 AM
0
votes
3
answers
548
views
slow index creation with gist and gist__int_ops
I am seeing performance issues with a frequently updated/inserted table, and one of my hypotheses is the GIN index I have on an int array.
The GIN index was added since we have an "X exists in array?" query in a high-profile endpoint.
I created a one-time replica of the production DB, removed the index, and replaced it with GiST and gist__int_ops to check its performance (since it should work better on a frequently updated/inserted table).
The GiST index creation never finishes; I let it run for hours with no result, while GIN index creation takes a few minutes.
I tried cleaning the data: I had arrays with 50 to 120 items, which I removed, and now all of my data is arrays with one element, and still index creation is slow.
DB: PostgreSQL 11.5 on RDS (db.m4.4xlarge)
Table size: 9373 MB
Any ideas what I can do here?
Yosi
(175 rep)
Dec 8, 2019, 03:42 PM
• Last activity: Jun 6, 2022, 03:47 PM
0
votes
1
answers
586
views
I have a GIST index and a regular index, one intended for filtering, one intended for sorting, but Postgres WILL NOT use both
I have a table with three columns: two can be used to filter the table, and the last can be used to sort it:
create table thing(
id serial primary key,
location geometry(Point, 4326) not null,
created_at timestamptz not null default (now() at time zone 'utc'),
priority float not null
);
I have been playing with the indexes to support the following query:
select id
from thing
where st_dwithin(location, 'SRID=4326;Point($0 $1)'::geometry, $2)
and created_at >= now() - interval '$3 days'
order by priority
offset $4
limit $5;
Ideally this query could combine indexes on location, created_at, and priority to get optimal performance. Typically I'd imagine an index using btree (priority, location, created_at)
would support such a query quite well, utilizing the index during the filter and delivering results pre-sorted. However, the st_dwithin part of the filter goes unassisted, since that is not a gist index.
If I make an index using gist (priority, location, created_at)
then the query plan utilizes the index during filtering but does not deliver results pre-sorted and sorts the results unassisted by the index.
At this point I thought we'd have to split up the index, using gist(location, created_at) and btree(priority). If $5 < 20 the query plan uses the btree index, and if $5 > 20 the query plan uses the gist index.
Does anyone have any idea why this might be, and what I could do to optimize the performance of both the filtering and the sorting?
Cooper Wolfe
(3 rep)
Apr 11, 2022, 10:24 PM
• Last activity: Apr 11, 2022, 11:28 PM
7
votes
2
answers
8471
views
How can I speed-up my query on geo-location processes
I have a table that contains 10,301,390 GPS records, cities, countries and IP address blocks. I have the user's current location as latitude and longitude. I created this query:
SELECT
    *, point(45.1013021, 46.3021011) <@> point(latitude, longitude)::point AS distance
FROM
    locs
WHERE
    (
        point(45.1013021, 46.3021011) <@> point(latitude, longitude)
    ) < 10 -- radius
ORDER BY
    distance LIMIT 1;
This query successfully gave me what I want, but it is slow. It took 2 to 3 seconds to get one record for a given latitude and longitude.
I tried a B-tree index on the latitude and longitude columns, and also tried GIST(point(latitude, longitude)); but querying is still slow.
How can I speed up this query?
# Update:
It seems the slowness is caused by the ORDER BY, but I want to get the shortest distance, so the question remains.
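A hedged sketch (my addition): GiST on point supports KNN ordering with the geometric <-> operator, which lets ORDER BY ... LIMIT 1 walk the index nearest-first instead of sorting every candidate row. Note that <-> measures planar distance in coordinate units (degrees here), not great-circle distance:

```sql
CREATE INDEX locs_point_idx ON locs USING gist (point(latitude, longitude));

SELECT *,
       point(45.1013021, 46.3021011) <-> point(latitude, longitude) AS distance
FROM locs
ORDER BY point(45.1013021, 46.3021011) <-> point(latitude, longitude)
LIMIT 1;
```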
xangr
(457 rep)
Dec 16, 2016, 10:40 AM
• Last activity: Apr 1, 2022, 04:58 PM
15
votes
1
answers
9272
views
Order by distance
If I have a query returning nearby cafes:
SELECT * FROM cafes c WHERE (
ST_DWithin(
ST_GeographyFromText(
'SRID=4326;POINT(' || c.longitude || ' ' || c.latitude || ')'
),
ST_GeographyFromText('SRID=4326;POINT(-76.000000 39.000000)'),
2000
)
)
How do I select distance, and order by distance as well?
Is there a more efficient way than this one:
SELECT id,
ST_Distance(ST_GeographyFromText('SRID=4326;POINT(-76.000000 39.000000)'),
ST_GeographyFromText(
'SRID=4326;POINT(' || c.longitude || ' ' || c.latitude || ')')
) as distance
FROM cafes c
WHERE (
ST_DWithin(
ST_GeographyFromText(
'SRID=4326;POINT(' || c.longitude || ' ' || c.latitude || ')'
),
ST_GeographyFromText('SRID=4326;POINT(-76.000000 39.000000)'),
2000
)
) order by distance
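A hedged variant (my addition, assuming the point were stored in a geography column, here called geog, instead of being assembled from longitude/latitude on the fly): PostGIS 2.2+ supports index-assisted nearest-first ordering on geography with the <-> operator, which returns meters.

```sql
CREATE INDEX cafes_geog_idx ON cafes USING gist (geog);

SELECT id,
       geog <-> 'SRID=4326;POINT(-76.000000 39.000000)'::geography AS distance
FROM cafes
WHERE ST_DWithin(geog, 'SRID=4326;POINT(-76.000000 39.000000)'::geography, 2000)
ORDER BY geog <-> 'SRID=4326;POINT(-76.000000 39.000000)'::geography;
```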
Gandalf StormCrow
(615 rep)
Mar 11, 2014, 02:12 AM
• Last activity: Nov 2, 2021, 11:44 PM
62
votes
2
answers
42798
views
PostgreSQL EXCLUDE USING error: Data type integer has no default operator class
In PostgreSQL 9.2.3 I am trying to create this simplified table:
CREATE TABLE test (
user_id INTEGER,
startend TSTZRANGE,
EXCLUDE USING gist (user_id WITH =, startend WITH &&)
);
But I get this error:
> ERROR: data type integer has no default operator class for access method "gist"
> HINT: You must specify an operator class for the index or define
> a default operator class for the data type.
The PostgreSQL docs use this example which does not work for me:
CREATE TABLE room_reservation (
room text,
during tsrange,
EXCLUDE USING gist (room WITH =, during WITH &&)
);
Same error message.
And this one, which does not work for me either:
CREATE TABLE zoo (
cage INTEGER,
animal TEXT,
EXCLUDE USING gist (cage WITH =, animal WITH <>)
);
Same error message.
I am able to create this without any problem:
CREATE TABLE test (
user_id INTEGER,
startend TSTZRANGE,
EXCLUDE USING gist (startend WITH &&)
);
and this:
CREATE TABLE test (
user_id INTEGER,
startend TSTZRANGE,
EXCLUDE USING btree (user_id WITH =)
);
I've spent quite a bit of time searching for hints about figuring out how to make this work, or figuring out why it won't work. Any ideas?
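For what it's worth, a minimal sketch (my addition): the usual fix is the btree_gist extension, which supplies default GiST operator classes for scalar types such as integer, after which the original definition builds:

```sql
CREATE EXTENSION btree_gist;

CREATE TABLE test (
    user_id  INTEGER,
    startend TSTZRANGE,
    EXCLUDE USING gist (user_id WITH =, startend WITH &&)
);
```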
Ian Timothy
(905 rep)
Mar 22, 2013, 03:56 PM
• Last activity: Sep 16, 2021, 11:37 PM
1
votes
0
answers
355
views
Why does a GIST index on a cube column in PostgreSQL actually make K-Nearest Neighbor (KNN) ORDER BY queries worse?
Adding a GIST index actually seems to make K-Nearest Neighbor (KNN) ORDER BY queries on cube columns *worse* in PostgreSQL. Why would that be, and what can be done about it?
Here's what I mean. In a PostgreSQL database I have a table whose DDL is create table sample (id serial primary key, title text, embedding cube), where the embedding column is an embedding vector of the title obtained with a Google language model. The cube data type is provided by the cube extension, which I have installed. Incidentally, these are titles of Wikipedia articles. In any case, there are 1 million records. I then perform a KNN query with the following query. This query defines distance using the Euclidean distance operator <->, though results are similar for the other two metrics. It does an ORDER BY and applies a LIMIT in order to find 10 Wikipedia articles with "similar" titles (the most similar being the target title itself). That all works fine.
select sample.title, sample.embedding <-> cube('(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)') distance from sample order by distance limit 10;
What's puzzling to me, however, is that if I put a GIST index on the embedding column, the query performance actually is *worse*. Adding the index, the query plan changes as expected, insofar as it uses the index. But... it gets slower!
This seems to run contrary to the documentation for cube, which states:
> In addition, a cube GiST index can be used to find nearest neighbors using the metric operators <->, <#>, and <=> in ORDER BY clauses
They even provide an example query, which is very similar to mine.
SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1
Here's the query plan and timing info *before* dropping the index.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.41..6.30 rows=10 width=29)
-> Index Scan using sample_embedding_idx on sample (cost=0.41..589360.33 rows=999996 width=29)
Order By: (embedding <-> '(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)'::cube)
(3 rows)
title | distance
----------------------+--------------------
david petrarca | 0.5866321762629475
david adamski | 0.5866321762629475
richard ansdell | 0.6239883862603475
linda darke | 0.6392124797481789
ilias tsiliggiris | 0.6996660649119893
watson, jim | 0.7059481479504834
sk radni%c4%8dki | 0.71718948226995
burnham, pa | 0.7384858030758069
arthur (europa-park) | 0.7468462897336924
ivan kecojevic | 0.7488206082281348
(10 rows)
Time: 1226.457 ms (00:01.226)
And, here's the query plan and timing info *after* dropping the index.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=74036.32..74037.48 rows=10 width=29)
-> Gather Merge (cost=74036.32..171264.94 rows=833330 width=29)
Workers Planned: 2
-> Sort (cost=73036.29..74077.96 rows=416665 width=29)
Sort Key: ((embedding <-> '(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)'::cube))
-> Parallel Seq Scan on sample (cost=0.00..64032.31 rows=416665 width=29)
(6 rows)
title | distance
----------------------+--------------------
david petrarca | 0.5866321762629475
david adamski | 0.5866321762629475
richard ansdell | 0.6239883862603475
linda darke | 0.6392124797481789
ilias tsiliggiris | 0.6996660649119893
watson, jim | 0.7059481479504834
sk radni%c4%8dki | 0.71718948226995
burnham, pa | 0.7384858030758069
arthur (europa-park) | 0.7468462897336924
ivan kecojevic | 0.7488206082281348
(10 rows)
Time: 381.419 ms
Notice:
- With Index: 1226.457 ms
- Without Index: 381.419 ms
This is very puzzling behavior! All of it is documented in a GitHub repo so that others can try it. I'll add documentation about how to generate the embedding vectors, but that *shouldn't* be needed, as in the Quick-Start I show that pre-computed embedding vectors can be downloaded from my Google Drive folder.
**Addendum**
It was asked in the comments below to provide the output of explain (analyze, buffers). Here that is, where:
1. I re-create the (covering) index
2. I run the query with explain (analyze, buffers)
3. I drop the index
4. I run the query with explain (analyze, buffers) again
pgbench=# create index on sample using gist (embedding) include (title);
CREATE INDEX
Time: 51966.315 ms (00:51.966)
pgbench=#
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.41..4.15 rows=10 width=29) (actual time=3215.956..3216.667 rows=10 loops=1)
Buffers: shared hit=1439 read=87004 written=7789
-> Index Only Scan using sample_embedding_title_idx on sample (cost=0.41..373768.39 rows=999999 width=29) (actual time=3215.932..3216.441 rows=10 loops=1)
Order By: (embedding <-> '(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)'::cube)
Heap Fetches: 0
Buffers: shared hit=1439 read=87004 written=7789
Planning:
Buffers: shared hit=14 read=6 dirtied=2
Planning Time: 0.432 ms
Execution Time: 3316.266 ms
(10 rows)
Time: 3318.333 ms (00:03.318)
pgbench=# drop index sample_embedding_title_idx;
DROP INDEX
Time: 182.324 ms
pgbench=#
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=74036.35..74037.52 rows=10 width=29) (actual time=6052.845..6057.210 rows=10 loops=1)
Buffers: shared hit=70 read=58830
-> Gather Merge (cost=74036.35..171265.21 rows=833332 width=29) (actual time=6052.825..6057.021 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=70 read=58830
-> Sort (cost=73036.33..74077.99 rows=416666 width=29) (actual time=6002.928..6003.019 rows=8 loops=3)
Sort Key: ((embedding <-> '(0.18936706, -0.12455666, -0.31581765, 0.0192692, -0.07364611, 0.07851536, 0.0290586, -0.02582532, -0.03378124, -0.10564457, -0.03903799, 0.08668878, -0.15357816, -0.17793414, -0.01826405, 0.01969068, 0.11386908, 0.1555583, 0.09368557, 0.13697313, -0.05610929, -0.06536788, -0.12212707, 0.26356605, -0.06004387, -0.01966437, -0.1250324, -0.16645767, -0.13525756, 0.22482251, -0.1709727, 0.28966117, -0.07927769, -0.02498624, -0.10018375, -0.10923951, 0.04770213, 0.11573371, 0.04619929, 0.05216618, 0.19176421, 0.12948817, 0.08719034, -0.16109011, -0.02411379, -0.05638905, -0.37334979, 0.31225419, 0.0744801, 0.27044332)'::cube))
Sort Method: top-N heapsort Memory: 26kB
Buffers: shared hit=70 read=58830
Worker 0: Sort Method: top-N heapsort Memory: 26kB
Worker 1: Sort Method: top-N heapsort Memory: 26kB
-> Parallel Seq Scan on sample (cost=0.00..64032.33 rows=416666 width=29) (actual time=0.024..3090.103 rows=333333 loops=3)
Buffers: shared read=58824
Planning:
Buffers: shared hit=3 read=3 dirtied=1
Planning Time: 0.129 ms
Execution Time: 6057.388 ms
(18 rows)
Time: 6053.284 ms (00:06.053)
David Ventimiglia
(111 rep)
Apr 30, 2021, 10:57 PM
3
votes
0
answers
432
views
Ordering of columns in a GIST index
I am modelling an application where events for factories can be registered. There is usually a small number of factories (around 500), but each factory can register a large set of events (millions). Two events must never overlap for the same factory.
My idea was to introduce an exclusion constraint based on a GIST index:
CREATE TABLE event
(
identifier SERIAL NOT NULL PRIMARY KEY,
factory_identifier INTEGER NOT NULL REFERENCES factory,
period TSTZRANGE NOT NULL,
EXCLUDE USING gist
(
factory_identifier WITH =,
period WITH &&
)
);
In the documentation I read:
> A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.
I am not sure how to interpret this. Given that there are only a small amount of factories but a large number of events, should I define the exclusions constraint as following instead?
EXCLUDE USING gist
(
period WITH &&,
factory_identifier WITH =
)
Now, period is the first column, for which there are a lot of distinct values (almost all values are unique, actually), in contrast to factory_identifier, for which there are only a few distinct values.
Is this better now? In short, I have difficulty grasping the quoted documentation above.
**Additional**
In the slides of a presentation on exclusion constraints (page 16) and a blog post, both from the original author of exclusion constraints, I found the following example:
EXCLUDE USING GIST(room WITH =, during WITH &&)
Given the (reasonable) assumption that there are many more distinct reservation periods than distinct rooms, I wonder if this should have been (during WITH &&, room WITH =) instead, given the quote regarding GiST column ordering above.
This makes me believe that I am not really understanding the documentation quoted above.
ItIsJustMe
(33 rep)
Aug 22, 2020, 08:00 AM
• Last activity: Aug 22, 2020, 08:27 AM
6
votes
1
answers
2143
views
Why is a GiST index used for filtering on non-leading column?
I always learned and understood that an index can only be used when we have predicates for the leading (or all) columns. Now, to my surprise, I noticed that a GiST index is used in the following query. Why is that? Is this a special feature of GiST indexes?
CREATE TABLE t1 (
i INT,
j INT,
k INT
);
INSERT INTO t1
SELECT i, j, k
FROM GENERATE_SERIES(1, 100) AS i,
GENERATE_SERIES(1, 100) AS j,
GENERATE_SERIES(1, 100) AS k;
CREATE EXTENSION btree_gist;  -- needed for GiST operator classes on plain integer columns
CREATE INDEX ON t1 USING GiST(i, j, k);
EXPLAIN SELECT * FROM t1 WHERE k = 54;
QUERY PLAN
Bitmap Heap Scan on t1 (cost=199.03..5780.51 rows=5000 width=12)
Recheck Cond: (k = 54)
-> Bitmap Index Scan on t1_i_j_k_idx (cost=0.00..197.78 rows=5000 width=0)
Index Cond: (k = 54)
*dbfiddle [here](https://dbfiddle.uk/?rdbms=postgres_12&fiddle=e40988cd9745f7cd2949e844ceb294f0)*
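For contrast, a hedged aside (my addition): a plain B-tree on (i, j, k) is typically not chosen for a predicate on k alone, which is exactly what makes the GiST behavior above surprising:

```sql
CREATE INDEX t1_btree_idx ON t1 (i, j, k);

EXPLAIN SELECT * FROM t1 WHERE k = 54;  -- typically planned as a sequential scan
```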
Kejlo
(63 rep)
Jul 12, 2020, 09:05 PM
• Last activity: Jul 13, 2020, 11:51 AM
5
votes
1
answers
6005
views
Best index for jsonb in Postgres
We have a table containing around 500k rows. The table is supposed to grow to millions of records.
This is what the table looks like:
CREATE TABLE public.influencers
(
id integer NOT NULL DEFAULT nextval('influencers_id_seq'::regclass),
location jsonb,
gender text COLLATE pg_catalog."default",
birthdate timestamp without time zone,
ig jsonb,
contact_info jsonb,
created_at timestamp without time zone DEFAULT now(),
updated_at timestamp without time zone DEFAULT now(),
categories text[] COLLATE pg_catalog."default",
search_field text COLLATE pg_catalog."default",
search_vector tsvector,
ig_updated_at timestamp without time zone,
CONSTRAINT influencers_pkey PRIMARY KEY (id),
CONSTRAINT ig_id_must_exist CHECK (ig ? 'id'::text),
CONSTRAINT ig_username_must_exist CHECK (ig ? 'username'::text)
)
And these are some of the queries we need to perform efficiently:
SELECT "public"."influencers".*
FROM "public"."influencers"
WHERE (ig->'follower_count' IS NOT NULL)
ORDER BY (ig->'follower_count') DESC
LIMIT 9
OFFSET 0
SELECT *
FROM "public"."influencers"
WHERE (ig->'follower_count' >= '5000')
LIMIT 9
SELECT SUM(CAST(ig ->> 'follower_count' AS integer))
FROM "public"."influencers"
WHERE (ig->'follower_count' >= '5000')
AND (ig->'follower_count' <= ...)
AND (ig->'follower_count' IS NOT NULL)
ig -> 'follower_count' values are numeric.
I read that GIN indexes are mainly intended for searching through composite items (text), so I'm guessing the best index to use would be a B-tree. Would that be correct?
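A hedged sketch (my addition): for the range filters and ORDER BY shown, a B-tree expression index fits, casting the text value to integer rather than comparing jsonb values; every query must then repeat exactly the same expression:

```sql
CREATE INDEX influencers_follower_count_idx
    ON public.influencers (((ig ->> 'follower_count')::integer));

SELECT *
FROM public.influencers
WHERE (ig ->> 'follower_count')::integer >= 5000
ORDER BY (ig ->> 'follower_count')::integer DESC
LIMIT 9;
```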
borjagvo
(205 rep)
May 17, 2017, 10:01 AM
• Last activity: Oct 1, 2019, 04:08 PM
1
votes
1
answers
5252
views
How to increase index row maximum size in Postgresql?
I want to perform KNN on a 512-dimensional vector using the solution suggested here.
1. I couldn't create a table with a 512-d cube array. Solved by changing CUBE_MAX_DIM in the source code.
2. Couldn't create an index on a table with dimensions > 200. Set block_size=32. Now it works with dimensionality <= 510.
3. Got a new error after changing block_size and trying to create an index on the table with 512-d vectors: ERROR: index row requires 8208 bytes, maximum size is 8191
Is it possible to increase this limit?
Alibi Yeslambek
(15 rep)
Aug 6, 2019, 07:15 AM
• Last activity: Aug 6, 2019, 07:35 AM