
Connectivity issues in replication process

1 vote
1 answer
2188 views
I've been trying to add a new node to a replica set, without success, due to these errors:

> 2019-08-12T06:48:15.031+0200 I ASIO [NetworkInterfaceASIO-RS-0] Ending connection to host mongo03:27017 due to bad connection status; 1 connections to that host remain open
> 2019-08-12T06:48:15.031+0200 I REPL [replication-224] Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback: Operation timed out. Last fetched optime (with hash): { ts: Timestamp(1565585283, 1317), t: 9 }. Restarts remaining: 10
> 2019-08-12T06:48:15.032+0200 I REPL [replication-224] Scheduled new oplog query Fetcher source: mongo03:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565585283, 1317) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts: Timestamp(1565585283, 1317), t: 9 } } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 2262327 -- target:mongo03:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565585283, 1317) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts: Timestamp(1565585283, 1317), t: 9 } } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
> 2019-08-12T06:48:17.637+0200 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to mongo03:27017
> 2019-08-12T06:48:17.645+0200 I ASIO [NetworkInterfaceASIO-RS-0] Successfully connected to mongo03:27017, took 8ms (2 connections now open to mongo03:27017)
> 2019-08-12T06:49:36.961+0200 I - [repl writer worker 13] DB.CollectionName collection clone progress: 221390504/643329790 34% (documents copied)
> 2019-08-12T06:51:52.859+0200 I ASIO [NetworkInterfaceASIO-RS-0] Ending connection to host mongo03:27017 due to bad connection status; 1 connections to that host remain open
> 2019-08-12T06:51:52.859+0200 I REPL [replication-229] Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback: Operation timed out. Last fetched optime (with hash): { ts: Timestamp(1565585502, 204), t: 9 }. Restarts remaining: 10
> 2019-08-12T06:51:52.859+0200 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to mongo03:27017
> 2019-08-12T06:51:52.859+0200 I REPL [replication-229] Scheduled new oplog query Fetcher source: mongo03:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565585502, 204) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts: Timestamp(1565585502, 204), t: 9 } } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 2272962 -- target:mongo03:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565585502, 204) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts: Timestamp(1565585502, 204), t: 9 } } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms

I'm a bit lost about how to debug this. It's a 250 GB database, and the synchronization runs over a 100 Mb/s private network.
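For scale, here is a back-of-the-envelope check (my own arithmetic, not from the logs) of how long an ideal-case clone of that data set takes over that link. The takeaway is that the link stays saturated for hours, during which a 2-second `maxTimeMS` on the oplog `find` (visible in the queries above) is easy to trip:

```javascript
// Ideal-case initial-sync transfer time for a 250 GB data set over a
// 100 Mb/s link (the figures stated in the question).
const dataBytes = 250e9;        // 250 GB
const linkBitsPerSec = 100e6;   // 100 Mb/s
const seconds = (dataBytes * 8) / linkBitsPerSec;
const hours = seconds / 3600;
console.log(`${seconds} s ≈ ${hours.toFixed(1)} h`); // 20000 s ≈ 5.6 h
```

Real throughput will be lower than the line rate, so the actual clone runs even longer than ~5.6 hours, i.e. a long window in which any latency spike can exceed the fetcher's timeouts.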
There is an iptables firewall on all the servers, but traffic from the private network is allowed:

    -A INPUT -i eth1 -p tcp -m tcp --dport 27017 -m state --state NEW,ESTABLISHED -j ACCEPT

At some point the replication process fails:

> 2019-08-12T02:22:37.582+0200 I REPL [replication-143] Restarting oplog query due to error: ExceededTimeLimit: error in fetcher batch callback: Executor error during find command :: caused by :: errmsg: "operation exceeded time limit". Last fetched optime (with hash): { ts: Timestamp(1565569328, 710), t: 9 }. Restarts remaining: 1
> 2019-08-12T02:22:37.582+0200 I REPL [replication-143] Scheduled new oplog query Fetcher source: mongo03:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565569328, 710) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts: Timestamp(1565569328, 710), t: 9 } } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 1381614 -- target:mongo03:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565569328, 710) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts: Timestamp(1565569328, 710), t: 9 } } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
> 2019-08-12T02:22:39.612+0200 I REPL [replication-142] Error returned from oplog query (no more query restarts left): ExceededTimeLimit: error in fetcher batch callback: Executor error during find command :: caused by :: errmsg: "operation exceeded time limit"
> 2019-08-12T02:22:39.615+0200 I REPL [replication-142] Finished fetching oplog during initial sync: ExceededTimeLimit: error in fetcher batch callback: Executor error during find command :: caused by :: errmsg: "operation exceeded time limit". Last fetched optime and hash: { ts: Timestamp(1565569328, 710), t: 9 }
> 2019-08-12T02:22:43.034+0200 I REPL [replication-142] CollectionCloner ns:DB.CollectionName finished cloning with status: IllegalOperation: AsyncResultsMerger killed
> 2019-08-12T02:22:44.551+0200 W REPL [replication-142] collection clone for 'DB.CollectionName' failed due to IllegalOperation: While cloning collection 'DB.CollectionName' there was an error 'AsyncResultsMerger killed'
> 2019-08-12T02:22:44.551+0200 I REPL [replication-142] CollectionCloner::start called, on ns:DB.CollectionName2
> 2019-08-12T02:22:44.551+0200 W REPL [replication-142] database 'dbName' (9 of 9) clone failed due to ShutdownInProgress: collection cloner completed
> 2019-08-12T02:22:44.551+0200 I REPL [replication-142] Finished cloning data: ShutdownInProgress: collection cloner completed. Beginning oplog replay.
> 2019-08-12T02:22:44.551+0200 I REPL [replication-142] Initial sync attempt finishing up.

And then the initial sync has to start over again. I don't know whether I can configure a longer timeout to avoid these disconnections. Thanks in advance.
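On the MongoDB side, one knob that may help is the number of oplog-query restarts an initial sync tolerates before giving up, which surfaces in the logs as the "Restarts remaining" counter. This is a sketch under the assumption that your version (3.4+) supports the `oplogFetcherInitialSyncMaxFetcherRestarts` server parameter; verify the name against your version's documentation before relying on it:

```yaml
# mongod.conf on the syncing node — tolerate more oplog-query restarts
# during initial sync before the whole attempt is abandoned.
# (Parameter name assumed from MongoDB 3.4+ server parameters.)
setParameter:
  oplogFetcherInitialSyncMaxFetcherRestarts: 30
```

It can also be changed at runtime with the `setParameter` admin command on the syncing node. This only makes the sync more tolerant of transient drops; it does not fix whatever is killing the connections.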
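One thing worth ruling out before touching MongoDB itself: because the iptables rule uses stateful matching (`-m state`), an idle replica-set connection can silently die when its conntrack entry expires. MongoDB's production notes recommend shortening the TCP keepalive so long-lived member connections keep their firewall state fresh; a sketch of that system-level tuning (Linux, the commonly suggested value):

```shell
# Check the current keepalive (the Linux default is 7200 s = 2 h):
sysctl net.ipv4.tcp_keepalive_time

# MongoDB's production notes suggest ~120 s, so that firewall/conntrack
# state for idle replica-set connections is refreshed before it expires:
sysctl -w net.ipv4.tcp_keepalive_time=120

# To persist across reboots, add to /etc/sysctl.conf:
#   net.ipv4.tcp_keepalive_time = 120
```

This needs to be applied on every member of the replica set, and mongod must be restarted (or connections re-established) for existing sockets to pick it up.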
Asked by brb (41 rep)
Aug 12, 2019, 05:12 AM
Last activity: Aug 9, 2025, 03:03 AM