
Patroni failed to get list of machines from etcd

I am running a Patroni cluster (3.4) on Linux with an etcd cluster. Normally the cluster runs perfectly fine, but sometimes I get errors saying that the request to the etcd server failed (ReadTimeoutError, NewConnectionError, ConnectTimeoutError).

etcd nodes: 10.100.10.4, 10.100.11.3, 10.100.11.5
Patroni/PostgreSQL nodes: 10.100.10.10, 10.100.11.6

Log from 10.100.10.10:
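To check plain reachability independently of Patroni, a minimal sketch like the following could be run on the affected node. It only opens a TCP connection to each of the three etcd hosts listed above on port 2379 (the client port seen in the log) and reports whether the connection succeeded, was refused, or timed out. The `probe` helper and the 1-second timeout are illustrative choices, not anything Patroni itself uses.

```python
import socket
import time

# etcd hosts from the question; 2379 is the client port that appears in the log.
ETCD_HOSTS = ["10.100.10.4", "10.100.11.3", "10.100.11.5"]
ETCD_PORT = 2379


def probe(host, port, timeout=1.0):
    """Try one TCP connect and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "ok"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on the port.
        outcome = "refused"
    except socket.timeout:
        # No answer within the timeout (packet loss, filtering, overload).
        outcome = "timeout"
    except OSError as exc:
        outcome = "error: %s" % exc
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    for host in ETCD_HOSTS:
        outcome, elapsed = probe(host, ETCD_PORT)
        print("%s:%d %s %.3fs" % (host, ETCD_PORT, outcome, elapsed))
```

Running this in a loop while the errors occur would show whether "refused" and "timeout" also appear outside of Patroni, which would point at the network or etcd rather than at Patroni's settings.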
2024-04-21 04:45:42,868 DEBUG: Writing {"conn_url":"postgres://10.100.10.10:5432/postgres","api_url":"http://10.100.10.10:8008/patroni ","state":"running","role":"master","version":"3.3.0","xlog_location":1642347112,"timeline":5} to key /db/mycluster/members/postgresql0 ttl=30 dir=False append=False
2024-04-21 04:45:42,869 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:42,871 DEBUG: http://10.100.11.3:2379  "PUT /v2/keys/db/mycluster/members/postgresql0 HTTP/1.1" 200 790
2024-04-21 04:45:42,871 INFO: no action. I am (postgresql0), the leader with the lock
2024-04-21 04:45:46,136 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': }
2024-04-21 04:45:46,136 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:46,138 DEBUG: http://10.100.11.3:2379  "GET /v2/keys/db/mycluster/?recursive=true&quorum=false HTTP/1.1" 200 None
2024-04-21 04:45:46,139 DEBUG: API thread: 10.100.11.6 - - "GET /cluster HTTP/1.1" 200 - latency: 3.354 ms
2024-04-21 04:45:51,981 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': }
2024-04-21 04:45:51,983 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:51,987 DEBUG: http://10.100.11.3:2379  "GET /v2/keys/db/mycluster/?recursive=true&quorum=false HTTP/1.1" 200 None
2024-04-21 04:45:51,989 DEBUG: API thread: 10.100.11.6 - - "GET /cluster HTTP/1.1" 200 - latency: 16.522 ms
2024-04-21 04:45:52,859 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': }
2024-04-21 04:45:52,861 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,198 ERROR: Request to server http://10.100.11.3:2379  failed: MaxRetryError('HTTPConnectionPool(host=\'10.100.11.3\', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'10.100.11.3\', port=2379): Read timed out. (read timeout=3.332937417338447)"))')
2024-04-21 04:45:56,198 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:56,198 INFO: Retrying on http://10.100.10.4:2379 
2024-04-21 04:45:56,199 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,199 DEBUG: Starting new HTTP connection (1): 10.100.10.4:2379
2024-04-21 04:45:56,200 ERROR: Request to server http://10.100.10.4:2379  failed: MaxRetryError("HTTPConnectionPool(host='10.100.10.4', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))")
2024-04-21 04:45:56,200 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:56,200 INFO: Retrying on http://10.100.11.5:2379 
2024-04-21 04:45:56,200 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,200 DEBUG: Starting new HTTP connection (1): 10.100.11.5:2379
2024-04-21 04:45:57,870 ERROR: Request to server http://10.100.11.5:2379  failed: MaxRetryError("HTTPConnectionPool(host='10.100.11.5', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ConnectTimeoutError(, 'Connection to 10.100.11.5 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:45:57,870 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:57,871 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:57,871 DEBUG: Starting new HTTP connection (1): 10.100.11.5:2379
2024-04-21 04:45:59,540 ERROR: Failed to get list of machines from http://10.100.11.5:2379/v2 : MaxRetryError("HTTPConnectionPool(host='10.100.11.5', port=2379): Max retries exceeded with url: /v2/machines (Caused by ConnectTimeoutError(, 'Connection to 10.100.11.5 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:45:59,541 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:59,541 DEBUG: Starting new HTTP connection (1): 10.100.11.3:2379
2024-04-21 04:46:01,210 ERROR: Failed to get list of machines from http://10.100.11.3:2379/v2 : MaxRetryError("HTTPConnectionPool(host='10.100.11.3', port=2379): Max retries exceeded with url: /v2/machines (Caused by ConnectTimeoutError(, 'Connection to 10.100.11.3 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:46:01,211 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:46:01,211 DEBUG: Starting new HTTP connection (1): 10.100.10.4:2379
2024-04-21 04:46:01,212 ERROR: Failed to get list of machines from http://10.100.10.4:2379/v2 : MaxRetryError("HTTPConnectionPool(host='10.100.10.4', port=2379): Max retries exceeded with url: /v2/machines (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))")
2024-04-21 04:46:01,212 DEBUG: Failed to update list of etcd nodes: EtcdException('Could not get the list of servers, maybe you provided the wrong host(s) to connect to?')
2024-04-21 04:46:01,484 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:46:01,484 DEBUG: Starting new HTTP connection (1): 10.100.11.3:2379
2024-04-21 04:46:02,486 ERROR: Request to server http://10.100.11.3:2379  failed: MaxRetryError("HTTPConnectionPool(host='10.100.11.3', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ConnectTimeoutError(, 'Connection to 10.100.11.3 timed out. (connect timeout=1.0)'))")
2024-04-21 04:46:02,486 INFO: Reconnection allowed, looking for another server.
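Since the excerpt mixes three different failure types across three hosts, it may help to tabulate them. The sketch below is a rough helper, assuming the log format shown above; the regular expression is fitted to these lines only, and the log file path is whatever is passed on the command line. It counts ERROR lines per etcd host and per exception named after "Caused by".

```python
import re
import sys
from collections import Counter

# Captures the etcd host and the exception named after "Caused by" from
# ERROR lines shaped like the ones in the log above.
ERROR_RE = re.compile(
    r"^\S+ \S+ ERROR: .*?http://(?P<host>[\d.]+):\d+.*?Caused by (?P<cause>\w+)\("
)


def summarize(lines):
    """Count (host, exception) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("cause"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_errors.py /path/to/patroni.log
    with open(sys.argv[1]) as handle:
        for (host, cause), n in sorted(summarize(handle).items()):
            print(host, cause, n)
```

A host that only ever shows NewConnectionError (connection refused) is a different problem from one that shows timeouts, so the per-host split is the useful part.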
The firewall should not be a problem, but maybe timeouts? This error only appears on one node (10.100.10.10). If you need more information, please let me know! Thank you!
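On the timeout question, the values in the log (read timeout of about 3.33 s, connect timeout of 1.6666666666666667 s) are what one would get if a 10-second budget were split evenly across the three etcd hosts, with the connect timeout set to half of each host's share. This is an assumption read off the log, not Patroni's actual code, and `budget_s` is a hypothetical name for whatever setting controls that budget.

```python
def per_host_timeouts(budget_s, hosts):
    """Split a total budget evenly across hosts (assumed model, see above)."""
    read_timeout = budget_s / hosts
    # Assumption: the connect timeout is half of the per-host share.
    connect_timeout = read_timeout / 2
    return read_timeout, connect_timeout


read_t, connect_t = per_host_timeouts(10.0, 3)
print(round(read_t, 3), round(connect_t, 3))  # prints "3.333 1.667"
```

If this model holds, each etcd host gets only a few seconds before the client moves on, so a brief stall on all three hosts at once would be enough to produce the errors above.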
Asked by mymarcelsql (21 rep)
Apr 21, 2024, 05:23 AM
Last activity: Apr 22, 2024, 09:20 AM