
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

3 votes
2 answers
2332 views
Complex view becomes slow when adding ORDER BY in outer query with small LIMIT
I have a pretty large query in a view (let's call it `a_sql`), that is really fast unless I use `ORDER BY` in an outer `SELECT` with a small `LIMIT`:

    SELECT customs.id AS custom_id,
           customs.custom_name AS custom_name,
           customs.slug AS slug,
           customs.use_case AS custom_use_case,
           SUM(CASE WHEN designers.id = orders.user_id AND orders.bulk = 't' THEN order_rows.quantity ELSE 0 END) AS sale_bulk,
           SUM(CASE WHEN designers.id = orders.user_id AND orders.bulk = 'f' THEN order_rows.quantity ELSE 0 END) AS sale_not_bulk,
           SUM(CASE WHEN designers.id = orders.user_id THEN order_rows.quantity ELSE 0 END) AS sale_total,
           SUM(CASE WHEN designers.id <> orders.user_id AND orders.bulk = 't' THEN order_rows.quantity ELSE 0 END) AS buy_bulk,
           SUM(CASE WHEN designers.id <> orders.user_id AND orders.bulk = 'f' THEN order_rows.quantity ELSE 0 END) AS buy_not_bulk,
           SUM(CASE WHEN designers.id <> orders.user_id THEN order_rows.quantity ELSE 0 END) AS buy_total,
           SUM(CASE orders.bulk WHEN 't' THEN order_rows.quantity ELSE 0 END) AS total_bulk,
           SUM(CASE orders.bulk WHEN 'f' THEN order_rows.quantity ELSE 0 END) AS total_not_bulk,
           COALESCE(SUM(order_rows.quantity), 0) AS total,
           MIN(shoes.id) AS shoe_id,
           MIN(shoe_models.id) AS shoe_model_id,
           MIN(shoe_models.name) AS shoe_model_name,
           MIN(shoe_models.title) AS shoe_model_title,
           MIN(model_categories.id) AS model_category_id,
           MIN(model_categories.name) AS model_category_name,
           MIN(business_orders.id) AS business_order_id,
           MIN(business_orders.state) AS business_order_state,
           MIN(business_orders.published_at) AS business_order_published_at,
           MIN(designers.id) AS designer_id,
           MIN(designers.email) AS designer_email,
           MIN(designer_details.first_name) AS designer_first_name,
           MIN(designer_details.last_name) AS designer_last_name
    FROM business_orders                                            /* 10^6 rows */
    LEFT JOIN users designers
           ON designers.id = business_orders.user_id                /* 10^6 rows - business_orders has 0 or 1 users, users has n business_orders */
    LEFT JOIN user_details designer_details
           ON designers.id = designer_details.user_id               /* 10^6 rows - users has 0 or 1 user_details, user_details has 1 users */
    INNER JOIN customs
           ON business_orders.id = customs.business_order_id        /* 10^6 rows - business_orders has 1 customs, customs has 1 business_order */
    LEFT JOIN shoes
           ON shoes.product_id = customs.id
          AND shoes.product_type = 'Custom'                         /* 10^6 rows - customs has 1 shoes, shoes has 1 customs */
    LEFT JOIN shoe_models
           ON shoe_models.id = shoes.shoe_model_id                  /* 10^2 rows - shoes has 1 shoe_models, shoe_models has n shoes */
    LEFT JOIN model_categories
           ON shoe_models.model_category_id = model_categories.id   /* 10^1 rows - shoe_models has 1 model_categories, model_categories has n models */
    INNER JOIN sizes
           ON shoes.id = sizes.shoe_id                              /* 10^6 rows - sizes has 1 shoes, shoes has n sizes */
    LEFT JOIN order_rows
           ON order_rows.article_id = sizes.id
          AND order_rows.article_type::text = 'Size'::text          /* 10^5 rows - sizes has n order_rows, order_rows has 0 or 1 size */
    LEFT JOIN orders
           ON orders.id = order_rows.order_id                       /* 10^4 rows - order_rows has 1 orders, orders has n order_rows */
    WHERE orders.state IN ('funded', 'confirmed', 'paid', 'delivered', 'production', 'produced', 'ready_to_ship', 'shipped')
       OR orders.id IS NULL
    GROUP BY business_orders.id

Returns around 52,000 rows.
A query of the following type is executed in 12.728 ms: SELECT * FROM A_SQL LIMIT 10 The related EXPLAIN output: Limit (cost=3.51..145.53 rows=10 width=324) (actual time=1.545..12.468 rows=10 loops=1) Buffers: shared hit=1652 -> Subquery Scan on x (cost=3.51..737218.84 rows=51911 width=324) (actual time=1.543..12.462 rows=10 loops=1) Buffers: shared hit=1652 -> GroupAggregate (cost=3.51..736699.73 rows=51911 width=610) (actual time=1.542..12.455 rows=10 loops=1) Group Key: business_orders.id Buffers: shared hit=1652 -> Nested Loop Left Join (cost=3.51..716552.04 rows=270739 width=610) (actual time=0.090..4.073 rows=608 loops=1) Filter: (((orders.state)::text = ANY ('{funded,confirmed,paid,delivered,production,produced,ready_to_ship,shipped}'::text[])) OR (orders.id IS NULL)) Rows Removed by Filter: 5 Buffers: shared hit=1652 -> Nested Loop Left Join (cost=3.23..408595.00 rows=448022 width=609) (actual time=0.087..3.264 rows=613 loops=1) Buffers: shared hit=1547 -> Nested Loop (cost=2.94..264656.18 rows=448022 width=605) (actual time=0.082..1.227 rows=596 loops=1) Buffers: shared hit=269 -> Nested Loop Left Join (cost=2.52..130221.18 rows=52594 width=601) (actual time=0.073..0.578 rows=14 loops=1) Buffers: shared hit=197 -> Nested Loop Left Join (cost=2.23..104252.63 rows=51831 width=588) (actual time=0.066..0.478 rows=14 loops=1) Join Filter: (shoe_models.model_category_id = model_categories.id) Rows Removed by Join Filter: 79 Buffers: shared hit=155 -> Nested Loop Left Join (cost=2.23..101141.72 rows=51831 width=72) (actual time=0.055..0.413 rows=14 loops=1) Buffers: shared hit=154 -> Nested Loop (cost=2.09..92396.06 rows=51831 width=52) (actual time=0.051..0.348 rows=14 loops=1) Buffers: shared hit=126 -> Nested Loop Left Join (cost=1.80..65264.56 rows=51831 width=48) (actual time=0.033..0.209 rows=14 loops=1) Buffers: shared hit=84 -> Merge Join (cost=1.38..21836.97 rows=51831 width=26) (actual time=0.022..0.109 rows=14 loops=1) Merge Cond: (business_orders.id = customs.business_order_id) Buffers: shared hit=28 -> Index Scan using business_orders_pkey on business_orders (cost=0.29..3688.80 rows=51911 width=22) (actual time=0.012..0.036 rows=14 loops=1) Buffers: shared hit=14 -> Index Scan using index_customs_on_business_order_id on customs (cost=0.41..17371.39 rows=51831 width=8) (actual time=0.005..0.029 rows=14 loops=1) Buffers: shared hit=14 -> Index Scan using users_pkey on users designers (cost=0.41..0.83 rows=1 width=26) (actual time=0.006..0.006 rows=1 loops=14) Index Cond: (id = business_orders.user_id) Buffers: shared hit=56 -> Index Scan using index_shoes_on_product_id_and_product_type on shoes (cost=0.29..0.51 rows=1 width=12) (actual time=0.007..0.008 rows=1 loops=14) Index Cond: ((product_id = customs.id) AND ((product_type)::text = 'Custom'::text)) Buffers: shared hit=42 -> Index Scan using shoe_models_pkey on shoe_models (cost=0.14..0.16 rows=1 width=24) (actual time=0.003..0.003 rows=1 loops=14) Index Cond: (id = shoes.shoe_model_id) Buffers: shared hit=28 -> Materialize (cost=0.00..1.06 rows=4 width=520) (actual time=0.001..0.002 rows=7 loops=14) Buffers: shared hit=1 -> Seq Scan on model_categories (cost=0.00..1.04 rows=4 width=520) (actual time=0.004..0.005 rows=7 loops=1) Buffers: shared hit=1 -> Index Scan using index_user_details_on_user_id on user_details designer_details (cost=0.29..0.49 rows=1 width=17) (actual time=0.006..0.006 rows=1 loops=14) Index Cond: (designers.id = user_id) Buffers: shared hit=42 -> Index Scan using index_sizes_on_shoe_id on sizes 
(cost=0.42..2.00 rows=56 width=8) (actual time=0.006..0.030 rows=43 loops=14) Index Cond: (shoe_id = shoes.id) Buffers: shared hit=72 -> Index Scan using index_order_rows_on_article_id on order_rows (cost=0.29..0.31 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=596) Index Cond: (article_id = sizes.id) Filter: ((article_type)::text = 'Size'::text) Rows Removed by Filter: 2 Buffers: shared hit=1278 -> Index Scan using orders_pkey on orders (cost=0.29..0.67 rows=1 width=18) (actual time=0.000..0.000 rows=0 loops=613) Index Cond: (id = order_rows.order_id) Buffers: shared hit=105 Planning time: 5.013 ms Execution time: 12.728 ms A query of the following type, instead, is executed in 9090.141ms SELECT * FROM a_sql ORDER BY custom_id LIMIT 10 The related EXPLAIN output: Limit (cost=328570.62..328570.64 rows=10 width=324) (actual time=8987.928..8987.929 rows=10 loops=1) Buffers: shared hit=10412 read=12400, temp read=18319 written=18063 -> Sort (cost=328570.62..328700.40 rows=51911 width=324) (actual time=8987.926..8987.926 rows=10 loops=1) Sort Key: x.business_order_id Sort Method: top-N heapsort Memory: 27kB Buffers: shared hit=10412 read=12400, temp read=18319 written=18063 -> Subquery Scan on x (cost=306105.20..327448.84 rows=51911 width=324) (actual time=3074.397..8978.470 rows=8004 loops=1) Buffers: shared hit=10412 read=12400, temp read=18319 written=18063 -> GroupAggregate (cost=306105.20..326929.73 rows=51911 width=610) (actual time=3074.395..8975.492 rows=8004 loops=1) Group Key: business_orders.id Buffers: shared hit=10412 read=12400, temp read=18319 written=18063 -> Sort (cost=306105.20..306782.04 rows=270739 width=610) (actual time=3073.679..3411.919 rows=467218 loops=1) Sort Key: business_orders.id Sort Method: external merge Disk: 56936kB Buffers: shared hit=10412 read=12400, temp read=18319 written=18063 -> Hash Right Join (cost=98065.48..133611.68 rows=270739 width=610) (actual time=1559.328..2325.275 rows=467218 loops=1) Hash Cond: (order_rows.article_id = sizes.id) Filter: (((orders.state)::text = ANY ('{funded,confirmed,paid,delivered,production,produced,ready_to_ship,shipped}'::text[])) OR (orders.id IS NULL)) Rows Removed by Filter: 3712 Buffers: shared hit=10412 read=12400, temp read=9442 written=9186 -> Hash Left Join (cost=813.00..1497.05 rows=7367 width=26) (actual time=9.566..22.691 rows=7367 loops=1) Hash Cond: (order_rows.order_id = orders.id) Buffers: shared hit=888 -> Seq Scan on order_rows (cost=0.00..509.08 rows=7367 width=12) (actual time=0.029..5.732 rows=7367 loops=1) Filter: ((article_type)::text = 'Size'::text) Rows Removed by Filter: 11199 Buffers: shared hit=277 -> Hash (cost=700.78..700.78 rows=8978 width=18) (actual time=9.507..9.507 rows=8993 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 470kB Buffers: shared hit=611 -> Seq Scan on orders (cost=0.00..700.78 rows=8978 width=18) (actual time=0.009..7.142 rows=8993 loops=1) Buffers: shared hit=611 -> Hash (cost=57087.20..57087.20 rows=448022 width=605) (actual time=1547.263..1547.263 rows=469413 loops=1) Buckets: 1024 Batches: 128 Memory Usage: 567kB Buffers: shared hit=9524 read=12400, temp read=1037 written=8932 -> Hash Left Join (cost=30955.54..57087.20 rows=448022 width=605) (actual time=496.442..1160.554 rows=469413 loops=1) Hash Cond: (shoes.shoe_model_id = shoe_models.id) Buffers: shared hit=9524 read=12400, temp read=1037 written=1035 -> Hash Join (cost=30938.67..52547.10 rows=448022 width=69) (actual time=496.300..964.720 rows=469413 loops=1) Hash Cond: (sizes.shoe_id = shoes.id) 
Buffers: shared hit=9509 read=12400, temp read=1037 written=1035 -> Seq Scan on sizes (cost=0.00..8642.10 rows=441710 width=8) (actual time=0.009..119.758 rows=441934 loops=1) Buffers: shared hit=797 read=3428 -> Hash (cost=29664.25..29664.25 rows=52594 width=65) (actual time=496.056..496.056 rows=54329 loops=1) Buckets: 4096 Batches: 2 Memory Usage: 2679kB Buffers: shared hit=8712 read=8972, temp written=294 -> Hash Left Join (cost=15725.17..29664.25 rows=52594 width=65) (actual time=162.077..460.095 rows=54329 loops=1) Hash Cond: (designers.id = designer_details.user_id) Buffers: shared hit=8712 read=8972 -> Hash Join (cost=11607.65..22688.39 rows=51831 width=52) (actual time=124.442..362.315 rows=51846 loops=1) Hash Cond: (customs.id = shoes.product_id) Buffers: shared hit=6055 read=8972 -> Hash Left Join (cost=7908.32..17952.45 rows=51831 width=48) (actual time=83.756..251.381 rows=51846 loops=1) Hash Cond: (business_orders.user_id = designers.id) Buffers: shared hit=3652 read=8972 -> Hash Join (cost=1843.00..10720.93 rows=51831 width=26) (actual time=27.942..139.640 rows=51846 loops=1) Hash Cond: (customs.business_order_id = business_orders.id) Buffers: shared hit=3079 read=4919 -> Seq Scan on customs (cost=0.00..7841.31 rows=51831 width=8) (actual time=0.009..41.084 rows=51846 loops=1) Buffers: shared hit=2404 read=4919 -> Hash (cost=1194.11..1194.11 rows=51911 width=22) (actual time=27.888..27.888 rows=51849 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2513kB Buffers: shared hit=675 -> Seq Scan on business_orders (cost=0.00..1194.11 rows=51911 width=22) (actual time=0.007..15.422 rows=51849 loops=1) Buffers: shared hit=675 -> Hash (cost=5265.70..5265.70 rows=63970 width=26) (actual time=55.788..55.788 rows=63972 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 3679kB Buffers: shared hit=573 read=4053 -> Seq Scan on users designers (cost=0.00..5265.70 rows=63970 width=26) (actual time=0.003..35.227 rows=63972 loops=1) Buffers: shared hit=573 read=4053 -> Hash (cost=3051.16..3051.16 rows=51853 width=12) (actual time=40.654..40.654 rows=51846 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2154kB Buffers: shared hit=2403 -> Seq Scan on shoes (cost=0.00..3051.16 rows=51853 width=12) (actual time=0.009..28.311 rows=51846 loops=1) Filter: ((product_type)::text = 'Custom'::text) Buffers: shared hit=2403 -> Hash (cost=3306.12..3306.12 rows=64912 width=17) (actual time=37.610..37.610 rows=64670 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 2748kB Buffers: shared hit=2657 -> Seq Scan on user_details designer_details (cost=0.00..3306.12 rows=64912 width=17) (actual time=0.007..19.790 rows=64670 loops=1) Buffers: shared hit=2657 -> Hash (cost=16.19..16.19 rows=54 width=540) (actual time=0.121..0.121 rows=54 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 4kB Buffers: shared hit=15 -> Hash Left Join (cost=1.09..16.19 rows=54 width=540) (actual time=0.034..0.101 rows=54 loops=1) Hash Cond: (shoe_models.model_category_id = model_categories.id) Buffers: shared hit=15 -> Seq Scan on shoe_models (cost=0.00..14.54 rows=54 width=24) (actual time=0.006..0.028 rows=54 loops=1) Buffers: shared hit=14 -> Hash (cost=1.04..1.04 rows=4 width=520) (actual time=0.016..0.016 rows=7 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB Buffers: shared hit=1 -> Seq Scan on model_categories (cost=0.00..1.04 rows=4 width=520) (actual time=0.006..0.012 rows=7 loops=1) Buffers: shared hit=1 Planning time: 4.263 ms Execution time: 9090.141 ms Table definitions are the following. 
No integrity constraints are defined on the database (using ORM) CREATE TABLE business_orders ( id integer NOT NULL, user_id integer, published_at timestamp without time zone, CONSTRAINT business_orders_pkey PRIMARY KEY (id) ); CREATE INDEX index_business_orders_on_user_id ON business_orders USING btree (user_id); CREATE TABLE users ( id serial NOT NULL,, email character varying(255) NOT NULL DEFAULT ''::character varying, CONSTRAINT users_pkey PRIMARY KEY (id) ); CREATE UNIQUE INDEX index_users_on_email ON users USING btree (email COLLATE pg_catalog."default"); CREATE TABLE user_details ( id serial NOT NULL, user_id integer, first_name character varying(255), last_name character varying(255), CONSTRAINT user_details_pkey PRIMARY KEY (id) ); CREATE INDEX index_user_details_on_user_id ON user_details USING btree (user_id); CREATE TABLE customs ( id serial NOT NULL, shoes_assortment_id integer, business_order_id integer, CONSTRAINT customs_pkey PRIMARY KEY (id) ); CREATE INDEX index_customs_on_business_order_id ON customs USING btree (business_order_id); CREATE TABLE shoes ( id serial NOT NULL, product_id integer, product_type character varying(255), CONSTRAINT shoes_pkey PRIMARY KEY (id) ); CREATE INDEX index_shoes_on_product_id_and_product_type ON shoes USING btree (product_id, product_type COLLATE pg_catalog."default"); CREATE INDEX index_shoes_on_shoe_model_id ON shoes USING btree (shoe_model_id); CREATE TABLE shoe_models ( id serial NOT NULL, name character varying(255) NOT NULL, title character varying(255), model_category_id integer, CONSTRAINT shoe_models_pkey PRIMARY KEY (id) ); CREATE INDEX index_shoe_models_on_model_category_id ON shoe_models USING btree (model_category_id); CREATE UNIQUE INDEX index_shoe_models_on_name ON shoe_models USING btree (name COLLATE pg_catalog."default"); CREATE TABLE model_categories ( id serial NOT NULL, name character varying(255) NOT NULL, sort_order integer, created_at timestamp without time zone NOT NULL, updated_at timestamp without time zone NOT NULL, access_level integer, CONSTRAINT model_categories_pkey PRIMARY KEY (id) ); CREATE UNIQUE INDEX index_model_categories_on_name ON model_categories USING btree (name COLLATE pg_catalog."default"); CREATE TABLE sizes ( id serial NOT NULL, shoe_id integer, CONSTRAINT sizes_pkey PRIMARY KEY (id) ); CREATE INDEX index_sizes_on_shoe_id ON sizes USING btree (shoe_id); CREATE TABLE order_rows ( id serial NOT NULL, order_id integer, quantity integer, article_id integer, article_type character varying(255), article_name character varying(255), unit_taxed_cents integer, CONSTRAINT order_rows_pkey PRIMARY KEY (id) ); CREATE INDEX index_order_rows_on_article_id ON order_rows USING btree (article_id); CREATE INDEX index_order_rows_on_article_type ON order_rows USING btree (article_type COLLATE pg_catalog."default"); CREATE INDEX index_order_rows_on_order_id ON order_rows USING btree (order_id); CREATE INDEX index_order_rows_on_quantity ON order_rows USING btree (quantity); CREATE INDEX index_order_rows_on_unit_taxed_cents ON order_rows USING btree (unit_taxed_cents); CREATE TABLE orders ( id serial NOT NULL, user_id integer, state character varying(255), bulk boolean DEFAULT false, CONSTRAINT orders_pkey PRIMARY KEY (id) ); CREATE INDEX index_orders_on_user_id ON orders USING btree (user_id); Because the a_sql is a view, I can't insert the ORDER BY clause inside the view. I will need to call it as a black box. 
The use cases for this query are:

- with a limit of 10, ordered by custom_id;
- with a limit of 10, ordered by total;
- to filter all rows that have business_order.user_id = orders.id and business_orders.id = x (usually not more than 100 rows in the result).

The graphical explain in pgAdmin, even if I don't understand much of it, seems to be telling me that if I run the query with no ordering it uses indexes (and does "nested loop joins"), while with the ordering it doesn't (it uses "hash joins"). Are there any ways to increase performance?
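For reference, the second plan above spills its sort and hashes to disk ("Sort Method: external merge Disk: 56936kB", hashes split into batches), so one cheap experiment is to give the session more sort/hash memory and re-run the timing. This is only a sketch of the test, not a guaranteed fix, and 256MB is an arbitrary value to adjust to the server's RAM:

    SET work_mem = '256MB';   -- per-session experiment only
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM a_sql ORDER BY custom_id LIMIT 10;
    RESET work_mem;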
Marco Marcianesi (57 rep)
Jan 12, 2016, 11:58 AM • Last activity: Jul 19, 2025, 11:17 AM
0 votes
2 answers
217 views
perform atomic operations on elements inside a container
In my DB, Tshirts, Jeans and Shoes can be placed inside a Box, and they can have a color that may be null (schema diagram: each item table carries a box_id referencing Box and a nullable color). Elements of a box must have the same color. To ensure this I use a trigger:

    /* If I try to INSERT (OR UPDATE) an item in a box containing
       elements of a different color, it raises an exception. */
    BEGIN
        IF ( SELECT color FROM tshirt WHERE tshirt.box_id = NEW.box_id
             UNION
             SELECT color FROM jeans  WHERE jeans.box_id  = NEW.box_id
             UNION
             SELECT color FROM shoes  WHERE shoes.box_id  = NEW.box_id
           ) <> NEW.color THEN
            RAISE EXCEPTION 'Error..';
            RETURN NULL;
        END IF;
    END;

Suppose I want to change the color from 'Blue' to 'Red' for the items inside Box 1. This code will fail because of the trigger:

    UPDATE Tshirts SET color = 'Red' WHERE box_id = 1;
    UPDATE Jeans   SET color = 'Red' WHERE box_id = 1;
    UPDATE Shoes   SET color = 'Red' WHERE box_id = 1;

The only way I can change the color of the items in the box is:

1. Remove all items from the box, setting their box_id to null.
2. Change the color of all the elements.
3. Put all the elements back inside the box, setting their box_id again.

How can I change the color of items in a box more simply and atomically, through modeling or triggers?
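One direction worth exploring here (a sketch, assuming a trigger function check_box_color() that wraps the IF block above) is to declare the check as a deferrable constraint trigger, so all three UPDATEs can run inside one transaction and the color check only fires at COMMIT, when the box is consistent again:

    CREATE CONSTRAINT TRIGGER box_same_color
        AFTER INSERT OR UPDATE ON tshirt
        DEFERRABLE INITIALLY DEFERRED
        FOR EACH ROW EXECUTE PROCEDURE check_box_color();
    -- create the same trigger on jeans and shoes, then:

    BEGIN;
    UPDATE tshirt SET color = 'Red' WHERE box_id = 1;
    UPDATE jeans  SET color = 'Red' WHERE box_id = 1;
    UPDATE shoes  SET color = 'Red' WHERE box_id = 1;
    COMMIT;   -- the deferred triggers check the final state here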
Yata (33 rep)
Oct 16, 2015, 12:56 PM • Last activity: Jul 10, 2025, 01:28 PM
9 votes
4 answers
62485 views
Select rows for which at least one row per set meets a condition
I have the following table:

    create table test (
        company_id integer not null,
        client_id integer not null,
        client_status text,
        unique (company_id, client_id)
    );

    insert into test values
        (1, 1, 'y'),  -- company1
        (2, 2, null), -- company2
        (3, 3, 'n'),  -- company3
        (4, 4, 'y'),  -- company4
        (4, 5, 'n'),
        (5, 6, null), -- company5
        (5, 7, 'n');

Basically, there are 5 different companies, each one has one or more clients, and each client has a status: 'y' or 'n' (it might be null as well). What I have to do is select all pairs (company_id, client_id) for all companies for which there is at least one client whose status is not 'n' ('y' or null). So for the example data above, the output should be:

    company_id;client_id
    1;1
    2;2
    4;4
    4;5
    5;6
    5;7

I tried something with window functions but I can't figure out how to compare the number of ALL clients with the number of clients with STATUS = 'n':

    select company_id,
           count(*) over (partition by company_id) as all_clients_count
    from test
    -- where all_clients_count != ... ?

I figured out how to do this, but I am not sure if it's the right way:

    select sub.company_id, unnest(sub.client_ids)
    from (
        select company_id, array_agg(client_id) as client_ids
        from test
        group by company_id
        having count(*) != count((case when client_status = 'n' then 1 else null end))
    ) sub
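For comparison, the same result can also be obtained without arrays or window functions; a small sketch using EXISTS, which keeps every row of a company that has at least one client whose status is not 'n':

    select t.company_id, t.client_id
    from test t
    where exists (
        select 1
        from test x
        where x.company_id = t.company_id
          and x.client_status is distinct from 'n'   -- true for 'y' and for null
    )
    order by t.company_id, t.client_id;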
user606521 (1415 rep)
May 31, 2016, 09:36 AM • Last activity: Jul 7, 2025, 10:56 PM
1 vote
1 answer
238 views
How to get aggregate data from a dynamic number of related rows in adjacent table
EDIT: Unaware of the rule that prohibits cross-posting, I also asked this on Stack Overflow and chose an answer over there. Since there's another (fully working) answer in this thread, I won't delete it. For the solution I chose, see https://stackoverflow.com/questions/52024244/how-to-get-aggregate-data-from-a-dynamic-number-of-related-rows-in-adjacent-tabl

I have a table of matches played, roughly looking like this:

    player_id | match_id | result | opponent_rank
    ----------------------------------------------
    82        | 2847     | w      | 42
    82        | 3733     | w      | 185
    82        | 4348     | l      | 10
    82        | 5237     | w      | 732
    82        | 5363     | w      | 83
    82        | 7274     | w      | 6
    51        | 2347     | w      | 39
    51        | 3746     | w      | 394
    51        | 5037     | l      | 90
    ...       | ...      | ...    | ...

To get all the winning streaks (not just the top streak by any player), I use this query:

    SELECT player.tag, s.streak, match.date, s.player_id, s.match_id
    FROM (
        SELECT streaks.streak, streaks.player_id, streaks.match_id
        FROM (
            SELECT w1.player_id, max(w1.match_id) AS match_id, count(*) AS streak
            FROM (
                SELECT w2.player_id, w2.match_id, w2.win, w2.date, sum(w2.grp) OVER w AS grp
                FROM (
                    SELECT m.player_id, m.match_id, m.win, m.date,
                           (m.win = false AND LAG(m.win, 1, true) OVER w = true)::integer AS grp
                    FROM matches_m AS m
                    WHERE matches_m.opponent_position < '100'
                    WINDOW w AS (PARTITION BY m.player_id ORDER BY m.date, m.match_id)
                ) AS w2
                WINDOW w AS (PARTITION BY w2.player_id ORDER BY w2.date, w2.match_id)
            ) AS w1
            WHERE w1.win = true
            GROUP BY w1.player_id, w1.grp
            ORDER BY w1.player_id DESC, count(*) DESC
        ) AS streaks
        ORDER BY streaks.streak DESC
        LIMIT 100
    ) AS s
    LEFT JOIN player ON player.id = s.player_id
    LEFT JOIN match ON match.id = s.match_id

And the result looks like this (note that this is not a fixed table/view, as the query above can be extended by certain parameters such as nationality, date range, ranking of players, etc.):

    player_id | match_id | streak
    -------------------------------
    82        | 3733     | 2
    82        | 7274     | 3
    51        | 3746     | 2
    ...       | ...      | ...

What I want to add now is a bunch of aggregate data to provide details about the winning streaks. For starters, **I'd like to know the average rank of the opponents during each of those streaks**. Other data would be the duration of the streak in time, the first and last date, the name of the opponent who ended the streak (or whether it's still ongoing), and so on.

I've tried various things - CTEs, some elaborate joins, unions, or adding them in as lag functions in the existing code - but I'm completely stuck on how to solve this. As is obvious from the code, my SQL skills are very basic, so please excuse any mistakes or inefficient statements. I'm also new to DBA, so let me know if my question can be phrased better.

For complete context, I'm using Postgres 9.4 on Debian; the matches_m table is a materialized view with 550k rows (the query takes about 2.5 s right now). The data comes from http://aligulac.com/about/db/ , I just mirror it to create the aforementioned view.
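As a sketch of one direction (self-contained against the small sample shown at the top; matches is an assumed table name, and ordering by match_id stands in for the date ordering used in the real query): once each streak has a group number, any extra aggregate such as the average opponent rank can sit next to the existing count(*):

    WITH marked AS (
        SELECT player_id, match_id, result, opponent_rank,
               sum(CASE WHEN result = 'l' THEN 1 ELSE 0 END)
                   OVER (PARTITION BY player_id ORDER BY match_id) AS grp
        FROM matches
    )
    SELECT player_id,
           count(*)           AS streak,
           avg(opponent_rank) AS avg_opponent_rank,
           min(match_id)      AS first_match,
           max(match_id)      AS last_match
    FROM marked
    WHERE result = 'w'
    GROUP BY player_id, grp
    ORDER BY streak DESC;

The same avg()/min()/max() columns could be slotted into the w1 aggregation of the existing query, which already groups by player_id and grp, provided opponent_rank is carried up through the w2/w1 subqueries.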
hcac (11 rep)
Aug 25, 2018, 10:18 PM • Last activity: Jun 6, 2025, 07:12 AM
2 votes
1 answer
3540 views
Automatic recovery of the failed postgresql master node is not working with pgpool II
I am new to Postgresql and Pgpool II setup. I have configured the Postgresql HA/Load balancing using Pgpool II and Repmgr. I have followed the link to do the setup. The setup consist of 3 nodes and verison of Application and OS is as mentioned below: **OS version** => CentOS 6.8 (On all the 3 nodes) **Pgpool node** => 192.168.0.4 **Postgresql Nodes**: **node1** (Master in read-write mode) => 192.168.0.6 **node2** (Standby node in read only mode) => 192.168.0.7 **Pgpool II version** => pgpool-II version 3.5.0 (ekieboshi). **Postgresql Version** => PostgreSQL 9.4.8 **Repmgr Version** => repmgr 3.1.3 (PostgreSQL 9.4.8) **I have configured the Pgpool in Master-Slave mode using Streaming replication.** The setup is as shown in the below image: enter image description here When I bring down the Master node(192.168.0.6), the automatic failover happens successfully and the Slave node(192.168.0.7) becomes the new Master node. After failover, I have to recover the failed node(192.168.0.6) manually and sync it with the new Master node. Then register the node(192.168.0.6) as a new Standby node. I want to automate the Recovery process of the failed node and add it to the cluster back. The **pgpool.conf** file on the Pgpool node(192.168.0.4) contains parameter **recovery_1st_stage_command**. I have set the parameter **recovery_1st_stage_command = 'basebackup.sh'**. I have placed the script 'basebackup.sh' file on **both** the Postgresql nodes(192.168.0.6, 192.168.0.7) under the data directory **'/var/lib/pgsql/9.4/data'**. Also I have placed the script **'pgpool_remote_start'** on both the Postgresql nodes(192.168.0.6, 192.168.0.7) under the directory '/var/lib/pgsql/9.4/data'. Also created the pgpool extension **"pgpool_recovery and pgpool_adm"** on both the database node. After the failover is completed, the 'basebackup.sh' is not executed automatically. I have to run the command **'pcp_recovey_node'** manually on the **Pgpool node(192.168.0.4)** to recover the failed node(192.168.0.6). How can I automate the execution of **pcp_recovery_node** command on the Pgpool node with out any manual intervention. Scripts used by me as follows: basebackup.sh script ------------- #!/bin/bash # first stage recovery # $1 datadir # $2 desthost # $3 destdir #as I'm using repmgr it's not necessary for me to know datadir(master) $1 RECOVERY_NODE=$2 CLUSTER_PATH=$3 #repmgr needs to know the master's ip MASTERNODE=/sbin/ifconfig eth0 | grep inet | awk '{print $2}' | sed 's/addr://' cmd1=ssh postgres@$RECOVERY_NODE "repmgr -D $CLUSTER_PATH --force standby clone $MASTERNODE" echo $cmd1 ## pgpool_remote_start script #! /bin/sh if [ $# -ne 2 ] then echo "pgpool_remote_start remote_host remote_datadir" exit 1 fi DEST=$1 DESTDIR=$2 PGCTL=/usr/pgsql-9.4/bin/pg_ctl ssh -T $DEST $PGCTL -w -D $DESTDIR start 2>/dev/null 1>/dev/null /dev/null 1>/dev/null &1 | tee -a /tmp/pgpool_failover.log Help me with the procedure to automate the recovery of the failed node. Also let me know, for failover is it compulsory to use repmgr or we can do it without repmgr. Also specify any other method for failover without using Repmgr, its advantages and disadvantages over Repmgr.
yravi104 (21 rep)
Sep 8, 2016, 04:51 PM • Last activity: Jun 4, 2025, 03:08 AM
0 votes
1 answer
372 views
Postgres Installed in Linux (RHEL) and trying to access / connect using pgadmin (Windows)
I've installed and configured PostgreSQL on Linux successfully:

1. postgres user created
2. new db created successfully
3. I'm trying to connect using pgAdmin III (Windows)
4. The following details are entered in the pgAdmin window: Host: DEMO, Host: XXXX, Port: 5432, Maintenance DB: postgres, Username: postgres, Password: G0!mf17.
5. Clicked on OK

It tried to connect and displayed the message "connecting to database.... Failed." I'm not sure why it failed or what needs to be configured on the Linux system. I'm new to Linux, so if anyone is able to guide me through solving this issue I'd be grateful. Please let me know if you need more details.
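For reference, the two server-side settings that most commonly produce this exact symptom are the listen address and the client-authentication rules, plus a firewall blocking port 5432. A hedged sketch of what to check on the Linux box (the CIDR below is a placeholder for the Windows client's network):

    # postgresql.conf  (a server restart is needed after changing this)
    listen_addresses = '*'          # or the server's own IP address

    # pg_hba.conf  (a reload is enough after changing this)
    host    all    all    192.168.1.0/24    md5

After restarting, running `SHOW listen_addresses;` from psql on the server confirms what is actually in effect.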
Madhu Manohar (9 rep)
Oct 28, 2015, 12:24 PM • Last activity: May 21, 2025, 02:00 AM
0 votes
1 answer
254 views
Having two data folders in my PostgreSQL setup
I am using CentOS 6 and installed PostgreSQL 9.4 using the following commands:

    yum install postgresql postgresql-contrib postgresql-client pgadmin3
    yum install postgresql-server

After that I checked my config file and ran "SHOW data_directory" to verify the data folder path. It shows /var/lib/pgsql/9.4/data, but I also have another data folder at /var/lib/pgsql/data:

    data path 1 --> /var/lib/pgsql/9.4/data
    data path 2 --> /var/lib/pgsql/data

My question is: which is my real data folder? And if my actual data folder is the one configured in the config file, what is the other data folder used for?
Dharani Dharan (101 rep)
May 5, 2016, 11:23 AM • Last activity: May 18, 2025, 10:02 PM
1 vote
2 answers
828 views
Why isn't pg_restore --create working?
I backed up a database called `app_data`, on a 9.3 server, with this basic command:

    pg_dump -n public -F custom app_data > app_data.pg.dump

Then I try to restore it on another server (running 9.4) like this:

    pg_restore -C -d postgres app_data.pg.dump

But it puts all the tables in the postgres database. The man page says it will create and use a new database, app_data.

> -C --create
> Create the database before restoring into it. [...]
> When this option is used, the database named with -d is used only to issue the initial DROP DATABASE and CREATE DATABASE commands. All data is restored into the database name that appears in the archive.

That's not what it's doing. The name in the archive is app_data:

    bash-4.2$ pg_restore -l app_data.pg.dump
    ;
    ; Archive created at Tue Dec 15 04:16:52 2015
    ;     dbname: app_data
    ...

Am I doing something wrong?
Rob N (111 rep)
Dec 15, 2015, 04:47 AM • Last activity: May 8, 2025, 09:02 PM
27 votes
1 answer
33793 views
Index not used with = ANY() but used with IN
Table t has two indexes: create table t (a int, b int); create type int_pair as (a int, b int); create index t_row_idx on t (((a,b)::int_pair)); create index t_a_b_idx on t (a,b); insert into t (a,b) select i, i from generate_series(1, 100000) g(i) ; No index is used with the ANY operator: explain analyze select * from t where (a,b) = any(array[(1,1),(1,2)]) ; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..1693.00 rows=1000 width=8) (actual time=0.042..126.789 rows=1 loops=1) Filter: (ROW(a, b) = ANY (ARRAY[ROW(1, 1), ROW(1, 2)])) Rows Removed by Filter: 99999 Planning time: 0.122 ms Execution time: 126.836 ms But one of them is used with the IN operator: explain analyze select * from t where (a,b) in ((1,1),(1,2)) ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Index Only Scan using t_a_b_idx on t (cost=0.29..8.32 rows=1 width=8) (actual time=0.028..0.029 rows=1 loops=1) Index Cond: (a = 1) Filter: ((b = 1) OR (b = 2)) Heap Fetches: 1 Planning time: 0.161 ms Execution time: 0.066 ms It uses the record index if the record is cast to the correct type: explain analyze select * from t where (a,b)::int_pair = any(array[row(1,1),row(1,2)]) ; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Index Scan using t_row_idx on t (cost=0.42..12.87 rows=2 width=8) (actual time=0.106..0.126 rows=1 loops=1) Index Cond: (ROW(a, b)::int_pair = ANY (ARRAY[ROW(1, 1), ROW(1, 2)])) Planning time: 0.208 ms Execution time: 0.203 ms Why doesn't the planner use the non record index for the ANY operator as it uses it for the IN operator?
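Not an explanation of the planner's choice, but a possible workaround sketch worth comparing with EXPLAIN ANALYZE: feeding the pairs in as a derived table lets the planner drive an index scan on t_a_b_idx without the composite-type cast (assuming the two forms are interchangeable for your use case):

    explain analyze
    select t.*
    from t
    join (values (1,1), (1,2)) as v(a, b)
      on t.a = v.a and t.b = v.b;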
Clodoaldo (1145 rep)
Jan 6, 2016, 05:48 PM • Last activity: Apr 26, 2025, 03:43 PM
3 votes
2 answers
6202 views
How do you use psql client to connect to a postgresql ipv6 host?
    # postgresql.conf
    listen_addresses='::'

and

    # pg_hba.conf
    hostssl    webdb    webserver    ::0/0    cert

The PostgreSQL server is running on Docker with a pingable IPv6 address of "GlobalIPv6Address": "fe80::242:ac12:2" - so no firewalls obstructing. I am using the following command to connect:

    psql --command="select * from test;" -d webdb -h fe80::242:ac12:2 -p 5432 -U postgres
    psql: could not connect to server: Invalid argument
        Is the server running on host "fe80::242:ac12:2" and accepting
        TCP/IP connections on port 5432?

Why is the host not recognized? Is it not possible to use IPv6 with psql? Also, I did not find an ssl parameter option in psql.
Srikanth (161 rep)
Aug 27, 2016, 11:25 AM • Last activity: Apr 25, 2025, 11:51 AM
0 votes
1 answer
338 views
Stored procedure for adding cascade
I have a DB with 180 tables. I was doing logical deletion, and now I need to do some testing with physical deletion. Is there a way to alter all my tables so that their FKs are dropped and then re-added with `on delete cascade`?
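One possible route (a sketch; review the generated DDL before running it, and note it assumes the existing foreign keys carry no ON DELETE action of their own) is to have the catalog generate the ALTER statements for every FK:

    SELECT format(
             'ALTER TABLE %I.%I DROP CONSTRAINT %I; ALTER TABLE %I.%I ADD CONSTRAINT %I %s ON DELETE CASCADE;',
             n.nspname, c.relname, con.conname,
             n.nspname, c.relname, con.conname,
             pg_get_constraintdef(con.oid)
           ) AS ddl
    FROM pg_constraint con
    JOIN pg_class     c ON c.oid = con.conrelid
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE con.contype = 'f';

The output is plain text; once checked, it can be pasted back into psql and executed.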
user73639 (9 rep)
Aug 26, 2015, 03:29 PM • Last activity: Apr 20, 2025, 07:03 AM
0 votes
1 answer
632 views
How to make use of WITH in PostgreSQL to reduce query cost
I am trying to get twitter data of narendramodi using below command. SELECT b.t_id,a.profile_image,b.tweet_text,e.media_media_url,b.retweet_count,b.favorite_count as like_count,count(reply_to_status_id) as reply_count,f.imp_count,f.eng_count,f.eng_rate FROM twitter_users a LEFT JOIN twitter_tweets b on a.user_id=b.user_id LEFT JOIN replies c on b.t_id = c.t_id LEFT JOIN media e on b.t_id = e.t_id LEFT JOIN twitter_tweet_metric_aggregates f ON f.metric_timestamp=(select max(metric_timestamp) FROM twitter_tweet_metric_aggregates g WHERE g.t_id=f.t_id and g.t_id=b.t_id) WHERE a.twitter_screen_name= 'narendramodi' GROUP BY b.t_id,a.profile_image ,b.tweet_text,b.retweet_count,b.favorite_count, e.media_media_url,f.imp_count,f.eng_count,f.eng_rate); Query was working correctly But, in the above query I have used sub-select to get recent data of imp_counts of each tweet based on timestamp. Because of this sub-select Query_cost was huge and so it was taking more than 15min for query execution. I want to reduce that and should able to execute within 10seconds. For that reason I was trying to use WITH (CTE) expression WITH metric_counts AS ( SELECT max(metric_timestamp),f.t_id,f.imp_count,f.eng_count,f.eng_rate FROM twitter_tweet_metric_aggregates f LEFT JOIN tweets b on f.t_id=b.t_id ) SELECT b.t_id,a.profile_image,b.tweet_text,e.media_media_url,b.retweet_count ,b.favorite_count as like_count, count(reply_to_status_id) as reply_count,metric_counts.imp_count ,metric_counts.eng_count,metric_counts.eng_rate FROM twitter_users as a LEFT JOIN tweets as b on a.twitter_user_id=b.twitter_user_id LEFT JOIN replies c on b.t_id = c.t_id LEFT JOIN media e on b.t_id = e.t_id LEFT JOIN metric_counts on metric_counts.t_id = b.t_id WHERE lower(a.twitter_screen_name)=lower('narendramodi') GROUP BY b.t_id,a.profile_image,b.tweet_text,e.media_media_url, b.retweet_count,b.favorite_count, metric_counts.imp_count,metric_counts.eng_count, metric_counts.eng_rate; The above WITH expression was giving results of imp_counts also for each tweet but not giving latest record/value. Can anyone help me in achieving this. Here is the Query cost of WITH query HashAggregate (cost=1734856.13..1735618.48 rows=76235 width=673) So can anyone help me to reduce cost to even lesser but giving results within 15 sec. Query_cost HashAggregate (cost=4923196.15..4923958.50 rows=76235 width=673) (actual time=51871.524..51872.333 rows=1513 loops=1) Group Key: imp_counts.tweet_status_id, a.profile_image, b.tweet_text, b.tweet_created_at, d.sentiment, d.emotion, b.retweet_count, b.favorit e_count, e.media_media_url, imp_counts.tweet_impression_count, imp_counts.tweet_engagement_count, imp_counts.tweet_engagement_rate CTE imp_counts -> Seq Scan on twitter_tweet_metric_aggregates f (cost=0.00..3356389.12 rows=17676 width=47) (actual time=37516.805..41642.354 rows=6069 9 loops=1) Filter: (metric_timestamp = (SubPlan 2)) Rows Removed by Filter: 3475596 SubPlan 2 -> Result (cost=0.90..0.91 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=3536295) InitPlan 1 (returns $1) -> Limit (cost=0.56..0.90 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=3536295) -> Index Only Scan Backward using pk_twitter_tweet_metric_aggregates on twitter_tweet_metric_aggregates g (cost=0. 
56..55.57 rows=158 width=8) (actual time=0.010..0.010 rows=1 loops=3536295) Index Cond: ((tweet_status_id = (f.tweet_status_id)::text) AND (metric_timestamp IS NOT NULL)) Heap Fetches: 3536295 -> Nested Loop Left Join (cost=1202381.45..1564329.39 rows=76235 width=673) (actual time=50478.887..51854.010 rows=10672 loops=1) -> Nested Loop Left Join (cost=1202380.90..1472231.15 rows=76235 width=641) (actual time=50478.871..51781.010 rows=10649 loops=1) -> Hash Right Join (cost=1202380.34..1362479.87 rows=76235 width=626) (actual time=50478.841..51702.556 rows=10649 loops=1) Hash Cond: ((c.tweet_status_id)::text = (b.tweet_status_id)::text) -> Seq Scan on twitter_tweet_replies c (cost=0.00..150216.68 rows=2606068 width=38) (actual time=0.041..692.358 rows=260 6068 loops=1) -> Hash (cost=1201427.40..1201427.40 rows=76235 width=607) (actual time=50477.837..50477.837 rows=1499 loops=1) Buckets: 131072 Batches: 1 Memory Usage: 1497kB -> Merge Left Join (cost=1200957.10..1201427.40 rows=76235 width=607) (actual time=50446.646..50476.746 rows=1499 loops=1) Merge Cond: ((b.tweet_status_id)::text = (imp_counts.tweet_status_id)::text) -> Sort (cost=1199356.58..1199547.17 rows=76235 width=293) (actual time=8608.183..8608.530 rows=1499 loops=1 ) Sort Key: b.tweet_status_id Sort Method: quicksort Memory: 597kB -> Hash Right Join (cost=29591.36..1193174.62 rows=76235 width=293) (actual time=3.714..8604.233 rows= 1499 loops=1) Hash Cond: ((b.twitter_user_id)::text = (a.twitter_user_id)::text) -> Seq Scan on twitter_tweets b (cost=0.00..1095151.30 rows=18045230 width=230) (actual time=0.0 26..5037.377 rows=18044981 loops=1) -> Hash (cost=29214.36..29214.36 rows=30160 width=89) (actual time=0.025..0.025 rows=1 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 257kB -> Index Scan using twitter_screen_name_idx on twitter_users a (cost=0.56..29214.36 rows=3 0160 width=89) (actual time=0.020..0.021 rows=1 loops=1) Index Cond: (lower((twitter_screen_name)::text) = 'narendramodi'::text) -> Sort (cost=1600.52..1644.71 rows=17676 width=314) (actual time=41838.456..41847.253 rows=60655 loops=1) Sort Key: imp_counts.tweet_status_id Sort Method: quicksort Memory: 8540kB -> CTE Scan on imp_counts (cost=0.00..353.52 rows=17676 width=314) (actual time=37516.810..41684.563 r ows=60699 loops=1) -> Index Scan using tweet_sentiment_index on tweet_sentiment d (cost=0.56..1.43 rows=1 width=34) (actual time=0.006..0.007 row s=0 loops=10649) Index Cond: ((b.tweet_status_id)::text = (tweet_status_id)::text) -> Index Only Scan using twitter_tweet_media_pkey on twitter_tweet_media e (cost=0.55..1.20 rows=1 width=70) (actual time=0.006..0.0 06 rows=0 loops=10649) Index Cond: (tweet_status_id = (b.tweet_status_id)::text) Heap Fetches: 1074 Planning time: 2.913 ms Execution time: 51875.165 ms (43 rows)
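For reference, the per-row MAX(metric_timestamp) subselect is the part the plan executes millions of times (the SubPlan with loops=3536295). A common rewrite is DISTINCT ON, which keeps only the newest metrics row per tweet in a single pass; a sketch using the column names from the first query, intended to be joined on t_id in place of the correlated lookup:

    SELECT DISTINCT ON (t_id)
           t_id, imp_count, eng_count, eng_rate
    FROM twitter_tweet_metric_aggregates
    ORDER BY t_id, metric_timestamp DESC;   -- newest metrics row per tweet wins

An index on (t_id, metric_timestamp DESC), if one does not already exist, usually helps both this form and the original correlated subquery.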
bunny sunny (113 rep)
Mar 7, 2019, 04:28 AM • Last activity: Apr 11, 2025, 10:06 PM
1 vote
1 answer
1205 views
pgAdmin - "cannot open logfile!" (WINDOWS 10; Postgres 9.4)
Our Windows 10 clients have both PostgreSQL 9.4 and pgAdmin3 (v1.22) installed. I keep receiving the **FATAL: "Cannot open the logfile!"** error when invoking pgAdmin3 from either of the following EXE locations:

1. C:\Program Files(x86)\pgAdmin III\1.22\pgAdmin3.exe
2. C:\Program Files\PostgreSQL\9.4\bin\pgAdmin3.exe

The problem is that when pgAdmin3 starts, the pgAdmin splash screen will not dismiss and you cannot gain access to configure the postgresql.conf file. I'm assuming the log file it's looking for is pgAdmin.log, but if I navigate to
:\Users\\Documents\postgresql
I cannot find such a log file. I tried to create an empty pgAdmin.log file, with no luck gaining proper entry into pgAdmin3.
user9491577 (13 rep)
Oct 5, 2020, 05:25 PM • Last activity: Apr 9, 2025, 12:04 PM
2 votes
0 answers
23 views
Greenplum/PostgreSQL: Queries run individually, but won't run in plpgsql function
We are running Greenplum 6.28.1, which is built on PostgreSQL 9.4.26. We have a master node and 6 server nodes, all running Oracle Linux release 8.10. We have 5 queries that progressively build temp tables that are used to generate a final result set. Those queries, in the order they are run, are:
drop table if exists hourly_time_windows;
create temp table hourly_time_windows as (
	WITH
        time_windows as (
            select 
                (generate_series(0, 23, 1)::text || ' Hours'::text)::interval as window_start_interval,
                (generate_series(1, 24, 1)::text || ' Hours'::text)::interval as window_end_interval
            ),
        dates_and_users as (
			select distinct
					activity_date,
					career_account
			from network_logs_processed.hourly_internet_activity_building log
			where activity_date >= '2024-02-01'::date and activity_date  0
			THEN 1
			ELSE 0 
			END) as device_moved_buildings,
		sum(megabytes_transferred) as megabytes_transferred
	from 
		ip.hourly_activity_with_building_lag data
	left outer join 
		(  
            select schedule.*, banner.career_account from utility_features.student_class_schedule_detail schedule
            left outer join banner_oltp.banner_lookup banner 
                on schedule.person_uid = banner.person_uid::numeric
		) class
		on 
			data.activity_date = class.calendar_date
			and data.career_account = class.career_account
			and data.building = class.building 
			and (
					(data.session_start_in_hour between class.begin_time and class.end_time)
					OR
					(data.session_end_in_hour between class.begin_time and class.end_time)
					OR 
					(data.session_end_in_hour > class.end_time and data.session_start_in_hour  0 and
	            campus in ('PWL','CEC') and
	            schedule_type not in ('IND','RES','EX') and
	            substring(course_identification from '[a-zA-Z]+#"[0-9]#"%' for '#')::int <= 4 and   --undergrad courses only
	            sub_academic_period in ('1','F8','FHS','M1','M2','M3','M12','M23','S8','SHS') and
	            registration_status like 'R%'
            ),
        housed_students as (
    		select distinct academic_period, person_uid
		    from utility_features.resident_student_room_details
            ),
        full_student_list as (
    		select academic_period, person_uid::numeric from registered_students
    		UNION
    		select academic_period, person_uid::numeric from housed_students
            )
    select 
    	full_student_list.academic_period, 
    	full_student_list.person_uid, 
    	banner.career_account 
    from 
    	full_student_list
    left outer join banner_oltp.banner_lookup banner
    	on full_student_list.person_uid = banner.person_uid::numeric
) distributed by (career_account);
drop table if exists aggregated_hourly_data;
create temp table aggregated_hourly_data as (
	select
		hourly_time_windows.career_account,
		banner.puid,
		banner.person_uid,
		ac.academic_period,
		hourly_time_windows.activity_date,
		hourly_time_windows.window_start,
		hourly_time_windows.window_end,
		sum(not_in_residential) as not_in_residential,
		sum(in_class) as in_class,
		sum(device_moved_buildings) as device_moved_buildings,
		sum(megabytes_transferred) as megabytes_transferred
	from 
		hourly_time_windows 
	left outer join hourly_activity_with_movement_flag
	on 	
		hourly_time_windows.career_account = hourly_activity_with_movement_flag.career_account
		and hourly_time_windows.window_start = hourly_activity_with_movement_flag.window_start
	left outer join banner_oltp.banner_lookup banner 
	on 
		hourly_time_windows.career_account = banner.career_account
	left outer join utility.academic_calendar ac 
	on 
		hourly_time_windows.window_start between ac.last_term_end_date + INTERVAL '1 day' and ac.end_date
	inner join included_students
		on hourly_time_windows.career_account = included_students.career_account
			and ac.academic_period = included_students.academic_period
	group by 1,2,3,4,5,6,7
) distributed by (career_account);
Here is my problem: If I run each query directly and count the rows in the temp table after it is created, all five queries complete in a minute or less. (That's 60 seconds to run all 5, not 60 seconds for each of them.) I created a plpgsql function to run those five queries. The first four completed in about the same time it takes when I run them directly, and the final row counts for all four tables were exactly the same. But I've let the function run for 30+ minutes and the fifth query still never completes. I also tried creating a plpgsql function to run just the first four queries and then running the fifth query directly. Again, the function completes very quickly, and the temp tables it creates have the same row counts as when I run the queries directly, but the fifth query still does not complete. I know PostgreSQL optimizes things differently when run in a function rather than individually, but I really thought that running just the first four in a function and the fifth directly would give different results. I am kind of at my wit's end here. Has anyone run into anything like this before?
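One commonly suggested first step when a multi-step temp-table build behaves differently inside plpgsql is to ANALYZE each temp table right after building it (temp tables are never auto-analyzed), so that the later, bigger joins are planned against real row counts; whether that is actually the culprit here is impossible to say from the outside. A minimal, self-contained sketch of the shape (the table is illustrative, not the real query):

    CREATE OR REPLACE FUNCTION build_aggregates_sketch() RETURNS bigint AS $$
    DECLARE
        n bigint;
    BEGIN
        DROP TABLE IF EXISTS tmp_demo;
        CREATE TEMP TABLE tmp_demo AS
            SELECT g AS id FROM generate_series(1, 100000) g;
        ANALYZE tmp_demo;        -- give the planner real row counts before the next step
        SELECT count(*) INTO n FROM tmp_demo;
        RETURN n;
    END;
    $$ LANGUAGE plpgsql;

In the real function this would mean an ANALYZE after each of the five CREATE TEMP TABLE ... AS statements.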
lpscott (33 rep)
Mar 25, 2025, 02:59 PM
5 votes
2 answers
8683 views
How to deal with large offsets in select?
Table jtest with 200k rows, each row contains jsonb { id: "", key: } (` is integer 1-200k incremented per each row). There is also btree index on data->'key'`. create extension if not exists pgcrypto; create table jtest (data jsonb not null default '{}'); insert into jtest (data) select json_build_object('id', gen_random_uuid(), 'key', i)::jsonb FROM generate_series(1,200000) i; create index jtest_key on jtest ((data->'key')); First query (fast): EXPLAIN ANALYZE select j.data from jtest j order by j.data->'key' limit 20; -- "Limit (cost=0.42..1.43 rows=20 width=74) (actual time=0.023..0.044 rows=20 loops=1)" -- " -> Index Scan using jtest_key on jtest j (cost=0.42..10150.42 rows=200000 width=74) (actual time=0.023..0.039 rows=20 loops=1)" -- "Planning time: 0.082 ms" -- "Execution time: 0.066 ms" Second query with large offset (slow): EXPLAIN ANALYZE select j.data from jtest j order by j.data->'key' offset 100000 limit 20; -- "Limit (cost=5075.42..5076.44 rows=20 width=74) (actual time=112.861..112.896 rows=20 loops=1)" -- " -> Index Scan using jtest_key on jtest j (cost=0.42..10150.42 rows=200000 width=74) (actual time=0.027..106.378 rows=100020 loops=1)" -- "Planning time: 0.105 ms" -- "Execution time: 112.931 ms" In PG docs I found: > The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient. But they don't say anything about how to deal with large offsets. How I can improve this query? Also is this behaviour (inefficiency for large offsets) common to all RDMS or just PG (using 9.4)?
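The usual way around a large OFFSET is keyset ("seek") pagination: remember the last key of the previous page and filter on it, so the existing expression index can skip straight to the right place. A sketch, assuming the key values are unique and that jsonb's numeric ordering of the key is the order you want:

    -- first page
    SELECT j.data
    FROM jtest j
    ORDER BY j.data->'key'
    LIMIT 20;

    -- next page: pass the last key seen (here 100000) instead of OFFSET 100000
    SELECT j.data
    FROM jtest j
    WHERE j.data->'key' > '100000'::jsonb
    ORDER BY j.data->'key'
    LIMIT 20;

The trade-off is that pages can only be walked forward from a known key, not addressed at an arbitrary offset.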
user606521 (1415 rep)
Aug 23, 2015, 06:48 PM • Last activity: Mar 4, 2025, 06:09 AM
0 votes
1 answer
4944 views
Invalid input syntax for timestamp error with Postgres Copy command
I am trying to load data in Postgres through COPY command in a SQL file
COPY   (, ,  etc etc) FROM stdin;

1    test1   test2   to_timestamp(1592818988000)
But I am getting the error below:

> psql:/Users/sanupin/load.sql:1045: ERROR: invalid input syntax for type timestamp: "to_timestamp(1641278614000)"
> CONTEXT: COPY cstone_storage, line 1, column last_loaded: "to_timestamp(1641278614000)"

Any idea what could be the problem? I know I have to convert the milliseconds to seconds (divide by 1000) on the epoch number, but I'm not sure how else to proceed.
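For what it's worth, COPY never evaluates expressions - every field is parsed as a literal of the column's type - so the to_timestamp(...) text itself is what fails. A common pattern (a sketch; the staging and target_table names are made up, and the data rows are tab-separated) is to load the raw epoch into a bigint column and convert in a second step:

    CREATE TEMP TABLE staging (id int, col1 text, col2 text, last_loaded_ms bigint);

    COPY staging (id, col1, col2, last_loaded_ms) FROM stdin;
    1	test1	test2	1592818988000
    \.

    INSERT INTO target_table (id, col1, col2, last_loaded)
    SELECT id, col1, col2, to_timestamp(last_loaded_ms / 1000.0)
    FROM staging;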
user2405589 (123 rep)
Jan 10, 2022, 01:23 AM • Last activity: Feb 27, 2025, 11:08 PM
2 votes
1 answer
2047 views
PG_DUMP on Replica DB server Error
I have a master - slave configuration with 9.4 version, but there is no WAL streaming replication configured. Customer simply copy xlog files on network share and replica applies it. I need to run pg_dump from replica although I face with error. What do I do: 1. SELECT pg_xlog_replay_pause() 2. Run pg_dump on replica server 3. SELECT pg_xlog_replay_resume() Immediately I receive after pg_xlog_replay_pause() ERROR: recovery is not in progress And when the command of pg_dump starts I see: pg_dump: [archiver (db)] query failed: ERROR: cannot assign TransactionIds during recovery pg_dump: [archiver (db)] query was: SELECT pg_export_snapshot() Questions: 1. Does it mean I can't do pg_dump from replica? I have a feeling that in case of wal streaming replication it would be possible. Correct me if I'm wrong. 2. I can't find any information for SELECT pg_xlog_replay_pause() / SELECT pg_xlog_replay_resume() works only with wal streaming setup. Can someone tell if this true? Thanks in advance. Customer's setup is: They copy xlog to network share and replica recovery.conf consist of standby_mode = 'on' restore_command = 'if exist A:\\Logs\\From_Master_DB\\%f (copy A:\\Logs\\From_master_DB\\%f %p) else (exit /b 1)' archive_cleanup_command = '"C:\\Program Files\\PostgreSQL\\9.4\\bin\\pg_archivecleanup" A:\\Logs\\From_Master_DB %r && "C:\\Program Files\\PostgreSQL\\9.4\\bin\\pg_archivecleanup" D:\\PG-SQL\\data\\pg_xlog %r' recovery_target_timeline = 'latest'
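As a first diagnostic (a sketch, not a fix): "recovery is not in progress" normally means the session is not actually talking to a server that is in standby/recovery mode, so it is worth checking what the instance you connect to reports before pausing replay:

    SELECT pg_is_in_recovery();               -- must be true for pause/resume to work
    SELECT pg_last_xlog_replay_location();    -- 9.4 name for the last replayed WAL position
    SELECT pg_xlog_replay_pause();            -- only valid while the first query returns true

If pg_is_in_recovery() returns false, the instance has left recovery (or the connection is going to the master), which would explain both error messages.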
ntdrv (21 rep)
Jul 26, 2018, 10:11 AM • Last activity: Jan 19, 2025, 08:04 PM
19 votes
1 answer
41642 views
Postgres query to return JSON object keys as array
Is it possible to return a JSON object's keys as an array of values in PostgreSQL? In JavaScript, this would simply be `Object.keys(obj)`, which returns an array of strings.

For example, if I have a table like this:

    tbl_items
    ---------
    id   bigserial NOT NULL
    obj  json      NOT NULL

And if there's a row like this:

    id     obj
    -----  -------------------------
    123    '{"foo":1,"bar":2}'

How can I have a query return:

    id     keys
    -----  ------------------
    123    '{"foo","bar"}'
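For reference, json_object_keys() combined with array_agg() is one way to get exactly that shape (a sketch against the table above):

    SELECT i.id, array_agg(k) AS keys
    FROM tbl_items i, json_object_keys(i.obj) AS k
    GROUP BY i.id;

For a jsonb column the equivalent set-returning function is jsonb_object_keys().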
Yanick Rochon (1651 rep)
Jan 25, 2016, 02:51 PM • Last activity: Nov 26, 2024, 09:15 PM
9 votes
1 answer
2785 views
Do fixed-width rows improve PostgreSQL read performance?
I have a table articles: Table "articles" Column | Type | Modifiers | Storage | Stats target | Description ----------------+-----------------------------+----------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('articles_id_seq'::regclass) | plain | | user_id | integer | | plain | | title | character varying(255) | | extended | | author | character varying(255) | | extended | | body | text | default '--- [] +| extended | | | | '::text | | | created_at | timestamp without time zone | | plain | | updated_at | timestamp without time zone | | plain | | published_date | timestamp without time zone | | plain | | Indexes: "articles_pkey" PRIMARY KEY, btree (id) "index_articles_on_published_date" btree (published_date) "index_rents_on_user_id" btree (user_id) "index_articles_on_user_id_and_published_date" btree (user_id, published_date) We're on Postgres 9.4.4. The machine has 3.5 GB of memory and 150 GB of disk space on an SSD. Note: The 'published_date' is always rounded, by the application, to the nearest date. All hours/minutes/seconds are always 00. Legacy. Needs fixed. Etc. This table has hundreds of millions of articles. The table receives a great deal of read queries from (as many as 16) concurrent processes performing the following queries as quickly as our system will respond: * a count of the total number of articles SELECT COUNT(*) FROM articles; * a select of all articles published for a given user SELECT * FROM articles WHERE user_id = $1; * a select of the most recently published article for a given user SELECT * FROM articles WHERE user_id = $1 ORDER BY published_date DESC LIMIT 1; I am finding that, with a large number of workers, these queries are quite slow. (At peak load, the first takes minutes to complete; the other two are on the order of 10 seconds.) In particular, it seems that queries are being enqueued. ### The question *In the abstract, do tables with only fixed width values perform read queries better than those with varying widths? (Pretend disk space isn't an issue.)* In my case, I'm wondering if I would see a performance improvement if I were to extract the 'body' text field to a separate table and transform the character varying fields into fixed width character fields. I admit the question is a bit cargo cult-y. I simply don't know enough about the internals of the Postgres DB engine to construct an informed hypothesis. I do intend to perform real experiments with different schemas and configurations but I'd like to have a solid mental model of how Postgres actually works before I go much further. ### Related question *Where can I learn more about the internals of the Postgres DB engine?* I've Googled variations of the above question with little success. What are the correct terms to use for this search? Does this level of documentation exist only in source and the minds of Postgres DBAs? I also humbly invite the suggestion of good books on the topic.
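Slightly orthogonal to the fixed-width question, but since the full COUNT(*) is the query that takes minutes under load: when an approximate figure is acceptable, the planner's own estimate is essentially free (a sketch; its accuracy depends on how recently autovacuum/ANALYZE last ran on the table):

    SELECT reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE relname = 'articles';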
bobocopy (765 rep)
Sep 3, 2015, 08:47 PM • Last activity: Nov 21, 2024, 10:44 PM
6 votes
6 answers
72916 views
After fresh install getting `psql: command not found`
I downloaded and installed [PostgreSQL and PostGIS from KyngChaos](http://www.kyngchaos.com/software/postgres) for my OS X 10.10 install. When I run psql I get:

    bash: psql: command not found

According to https://dba.stackexchange.com/questions/3005/how-to-run-psql-on-mac-os-x it gets installed into /Library/PostgreSQL, but there is no file there.
four-eyes (187 rep)
Apr 11, 2015, 03:17 PM • Last activity: Nov 15, 2024, 12:11 PM