Sample Header Ad - 728x90

Postgres-XL: SQueue timeout while waiting for Consumers finishing

2 votes
1 answer
326 views
We are running a pg-xl cluster: 1 gtm 2 coords 2 datanodes (+2 slaves) Everything works ok except for some medium queries (10 joins + group by). These queries always take 10.xx seconds. There is a 10-second timeout somewhere in pg-xl. If I analyze the query I can see that it takes way less: Planning time: 5.718 ms Execution time: 57.537 ms On the datanode logs, after 10 second, I can see: WARNING: SQueue p_1_3831_11, timeout while waiting for Consumers finishing I tried to play with the shared_queues/shared_queue_size parameters but no luck. Does anyone of you know where this timeout can come from? Thanks in advance for your help. EDIT: I can reproduce with a simple LEFT JOIN (invoice table has 900k rows and quote table has 150k): EXPLAIN SELECT * FROM quote LEFT JOIN invoice ON invoice.doc_id = quote.doc_id AND invoice.account_id = 99171 AND invoice.state = 0 WHERE quote.quote_id = 5880 AND quote.account_id = 99171 AND quote.state = 0; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Remote Subquery Scan on all (data01,data02) (cost=0.28..16.33 rows=1 width=4910) -> Nested Loop Left Join (cost=0.28..16.33 rows=1 width=4910) Join Filter: (invoice.doc_id = quote.doc_id) -> Remote Subquery Scan on all (data01,data02) (cost=100.14..109.31 rows=1 width=1142) Distribute results by H: doc_id -> Index Scan using idx_81855_account_id_3 on quote (cost=0.14..8.16 rows=1 width=1142) Index Cond: ((account_id = 99171) AND (state = 0)) Filter: (quote_id = 5880) -> Materialize (cost=100.14..111.93 rows=1 width=3768) -> Remote Subquery Scan on all (data01,data02) (cost=100.14..111.93 rows=1 width=3768) Distribute results by H: doc_id -> Index Scan using idx_82126_account_id_6 on invoice (cost=0.14..8.16 rows=1 width=3768) Index Cond: ((account_id = 99171) AND (state = 0)) (13 rows) On data-01, the transactions are fine: LOG: duration: 0.088 ms statement: SET global_session TO coord01_23261;SET parentPGXCPid TO 23261; LOG: duration: 0.012 ms statement: BEGIN LOG: duration: 0.020 ms statement: SET LOCAL coordinator_lxid TO "24"; LOG: duration: 0.738 ms parse p_1_5add_3a: Remote Subplan LOG: duration: 0.063 ms statement: SET global_session TO coord01_23261;SET parentpgxcpid TO "23261";SET parentPGXCPid TO 13090; LOG: duration: 0.011 ms statement: BEGIN LOG: duration: 0.016 ms statement: SET LOCAL coordinator_lxid TO "45"; LOG: duration: 0.194 ms parse p_1_5add_38: Remote Subplan LOG: duration: 0.069 ms statement: SET global_session TO coord01_23261;SET parentpgxcpid TO "23261";SET parentPGXCPid TO 10474; LOG: duration: 1.667 ms bind p_1_5add_3a/p_1_5add_3a: Remote Subplan LOG: duration: 0.011 ms statement: BEGIN LOG: duration: 0.194 ms parse p_1_5add_39: Remote Subplan LOG: duration: 0.062 ms bind p_1_5add_38/p_1_5add_38: Remote Subplan LOG: duration: 0.053 ms execute p_1_5add_38/p_1_5add_38: Remote Subplan LOG: duration: 0.018 ms statement: SET LOCAL coordinator_lxid TO "45"; LOG: duration: 0.520 ms execute p_1_5add_3a/p_1_5add_3a: Remote Subplan LOG: duration: 0.176 ms parse p_1_5add_38: Remote Subplan LOG: duration: 0.157 ms parse p_1_5add_39: Remote Subplan LOG: duration: 0.029 ms bind p_1_5add_38/p_1_5add_38: Remote Subplan LOG: duration: 0.011 ms execute p_1_5add_38/p_1_5add_38: Remote Subplan LOG: duration: 0.045 ms bind p_1_5add_39/p_1_5add_39: Remote Subplan LOG: duration: 0.462 ms execute p_1_5add_39/p_1_5add_39: Remote Subplan LOG: duration: 0.023 ms statement: COMMIT TRANSACTION LOG: duration: 0.036 ms statement: RESET ALL;RESET SESSION AUTHORIZATION;RESET transaction_isolation; LOG: duration: 0.488 ms statement: COMMIT TRANSACTION WARNING: SQueue p_1_5add_39, timeout while waiting for Consumers finishing LOG: duration: 0.053 ms statement: COMMIT TRANSACTION LOG: duration: 0.044 ms statement: RESET ALL;RESET SESSION AUTHORIZATION;RESET transaction_isolation; LOG: duration: 0.046 ms statement: RESET ALL;RESET SESSION AUTHORIZATION;RESET transaction_isolation; But on data-02, we can see the 10 second timeout LOG: duration: 0.087 ms statement: SET global_session TO coord01_23261;SET parentPGXCPid TO 23261; LOG: duration: 0.013 ms statement: BEGIN LOG: duration: 0.020 ms statement: SET LOCAL coordinator_lxid TO "24"; LOG: duration: 0.060 ms statement: SET global_session TO coord01_23261;SET parentpgxcpid TO "23261";SET parentPGXCPid TO 13090; LOG: duration: 0.773 ms parse p_1_5add_3a: Remote Subplan LOG: duration: 0.010 ms statement: BEGIN LOG: duration: 0.016 ms statement: SET LOCAL coordinator_lxid TO "45"; LOG: duration: 0.057 ms statement: SET global_session TO coord01_23261;SET parentpgxcpid TO "23261";SET parentPGXCPid TO 10474; LOG: duration: 0.177 ms parse p_1_5add_38: Remote Subplan LOG: duration: 0.173 ms parse p_1_5add_39: Remote Subplan LOG: duration: 0.037 ms bind p_1_5add_38/p_1_5add_38: Remote Subplan LOG: duration: 0.035 ms execute p_1_5add_38/p_1_5add_38: Remote Subplan LOG: duration: 0.010 ms statement: BEGIN LOG: duration: 0.016 ms statement: SET LOCAL coordinator_lxid TO "45"; LOG: duration: 0.194 ms parse p_1_5add_38: Remote Subplan LOG: duration: 1.666 ms bind p_1_5add_3a/p_1_5add_3a: Remote Subplan LOG: duration: 0.165 ms parse p_1_5add_39: Remote Subplan LOG: duration: 0.026 ms bind p_1_5add_38/p_1_5add_38: Remote Subplan LOG: duration: 0.003 ms execute p_1_5add_38/p_1_5add_38: Remote Subplan LOG: duration: 0.038 ms bind p_1_5add_39/p_1_5add_39: Remote Subplan LOG: duration: 0.454 ms execute p_1_5add_39/p_1_5add_39: Remote Subplan LOG: duration: 1.506 ms execute p_1_5add_3a/p_1_5add_3a: Remote Subplan LOG: duration: 0.018 ms statement: COMMIT TRANSACTION LOG: duration: 0.032 ms statement: RESET ALL;RESET SESSION AUTHORIZATION;RESET transaction_isolation; WARNING: SQueue p_1_5add_39, timeout while waiting for Consumers finishing LOG: duration: 0.050 ms statement: COMMIT TRANSACTION LOG: duration: 0.048 ms statement: RESET ALL;RESET SESSION AUTHORIZATION;RESET transaction_isolation; LOG: duration: 10010.353 ms statement: COMMIT TRANSACTION LOG: duration: 0.047 ms statement: RESET ALL;RESET SESSION AUTHORIZATION;RESET transaction_isolation;
Asked by dbaq (121 rep)
Jun 21, 2017, 12:40 AM
Last activity: Apr 2, 2019, 03:06 PM