PostgreSQL Foreign Data Wrappers - Simultaneous queries won't finish
0
votes
1
answer
2774
views
We're using foreign data wrappers in a database which points to another server (which is a read-only replica). We run scheduled jobs using python ( more on this here: https://github.com/sqlalchemy/sqlalchemy/discussions/8348 ) and lately we're facing an issue with a specific query (select statement with cte) - this query runs every hour on 10+ workers (python processes) each with their own conditions.
When I run this same query on the original server it takes ~6s, using fdw it's around 2-3 minutes. Since we reached 10+ workers these queries are stuck in an "active" state, I can see them is session manager, and after 20 minutes or so I get the following error:
SSL SYSCALL error: EOF detected
.
(The max connections option is set to 200.)
After a few of the workers fail with this error, the last ones fail with the following:
ERROR:app.services.cikk.main:(psycopg2.errors.ConnectionFailure) SSL SYSCALL error: EOF detected
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
CONTEXT: remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
The postgres_fdw
doc says:
> The remote transaction uses SERIALIZABLE isolation level when the local transaction has SERIALIZABLE isolation level; otherwise it uses REPEATABLE READ isolation level.
> [...] That behavior would be expected anyway if the local transaction uses SERIALIZABLE or REPEATABLE READ isolation level, but it might be surprising for a READ COMMITTED local transaction.
This means that the server keeps read and write locks until the end of the transacion, and the read locks are released as soon as the select operation is performed - but it never finishes.
Maybe there's a deadlock (since 10+ queries try to use the same tables on the remote server)? If so how can I overcome this issue? Does this mean I can only make queries "synchronously" using fdw to make this work?
postgres version:
- PostgreSQL 12.10
- (Debian 12.10-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
server keepalive settings:
- tcp_keepalives_idle: 7200
- tcp_keepalives_interval: 75
- tcp_keepalives_count: 9
Thanks for the help in advance!
### UPDATE:
I think I figured it out.
- I had multiple ~3min queries running simultaneously (these queries used the same tables from a foreign server) and they wouldn't finish
- I started these manually to monnitor what's going on using pg_stat_activity
as @jjanes suggested
- What I saw is all of the queries were in an active
state, the wait_event_type
was IO
and the wait_event
was BufFileWrite
- Read into those a little bit to find out what's going on:
- wait_event_type - IO
: The type of event for which the backend is waiting.
- which is pretty self explanatory - and if the value is IO
it means that some IO operation is in progress
- Since the wait_event
was BufFileWrite
I looked into it what it means exactly: Buffered files in PostgreSQL are primarily temporary files. Temporary files are used for sorts and hashes. BufFileWrite is seen when these in memory buffers are filled and written to disk.
- So what could cause this? One site (link down below) says: Large waits on BufFileWrite can indicate that your work_mem setting is too small to capture most of your sorts and hashes in memory and are being dumped to disk.
and Ensure your sorts largely occur in memory rather than writing temp files by using explain analyze on your query ...
- I checked our work_mem
value with show work_mem;
which was 20971kB - I thought it should be enough so looked further
- The clue here for me was the explain analyze
part. I created the foreign server with use_remote_estimate: true
, which means When use_remote_estimate is true, postgres_fdw obtains row count and cost estimates from the remote server
- The solution was to set this property (use_remote_estimate
) to false
and now it seems to be working the way it should.
Useful links:
https://www.postgresql.org/docs/current/monitoring-stats.html
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/apg-waits.iobuffile.html
https://docs.dbmarlin.com/docs/kb/wait-events/postgresql/buffilewrite/
https://www.postgresql.org/docs/current/runtime-config-resource.html
https://www.postgresql.org/docs/current/postgres-fdw.html
Asked by K. Anye
(13 rep)
Aug 4, 2022, 02:11 PM
Last activity: Apr 19, 2025, 09:05 PM
Last activity: Apr 19, 2025, 09:05 PM