How to debug Linux TCP slowdowns/packet loss

I'm trying to track down why some particular network paths slow down to about 200 KByte/sec. I see this performance in various tests, including scp, rsync and iperf3:
$ iperf3 -c 157.130.91.64 -R
Connecting to host 157.130.91.64, port 5201
Reverse mode, remote host 157.130.91.64 is sending
[  5] local 172.16.1.177 port 47862 connected to 157.130.91.64 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   274 KBytes  2.25 Mbits/sec
[  5]   1.00-2.00   sec   199 KBytes  1.63 Mbits/sec
[  5]   2.00-3.00   sec   202 KBytes  1.66 Mbits/sec
[  5]   3.00-4.00   sec   198 KBytes  1.62 Mbits/sec
[  5]   4.00-5.00   sec   195 KBytes  1.60 Mbits/sec
[  5]   5.00-6.00   sec   184 KBytes  1.51 Mbits/sec
[  5]   6.00-7.00   sec   195 KBytes  1.60 Mbits/sec
[  5]   7.00-8.00   sec   209 KBytes  1.71 Mbits/sec
[  5]   8.00-9.00   sec   192 KBytes  1.58 Mbits/sec
[  5]   9.00-10.00  sec   187 KBytes  1.53 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.31 MBytes  1.94 Mbits/sec   65             sender
[  5]   0.00-10.00  sec  1.99 MBytes  1.67 Mbits/sec                  receiver

iperf Done.
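The sender summary above is enough for a rough loss estimate. The Python sketch below parses that line, turns the 65 retransmits into an approximate retransmission rate, and plugs it into the Mathis et al. approximation for loss-limited TCP throughput, BW ≈ (MSS/RTT) · C/√p with C = √(3/2). Assumptions, none of them measured in this post: every segment is a full 1448-byte MSS (typical for a 1500-byte MTU with TCP timestamps), and the RTT is a placeholder 50 ms. The model describes Reno-style congestion control rather than Linux's default CUBIC, so treat the result only as an order-of-magnitude check on whether a few percent loss can explain ~2 Mbit/s.

```python
import math
import re

# Matches an iperf3 sender summary line, e.g.
# "[  5]   0.00-10.00  sec  2.31 MBytes  1.94 Mbits/sec   65             sender"
SENDER_RE = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec\s+([\d.]+)\s+([KMG]?)Bytes\s+"
    r"[\d.]+\s+[KMG]?bits/sec\s+(\d+)\s+sender"
)

# iperf3 prints byte totals with 1024-based prefixes and bit rates with
# 1000-based ones (274 KBytes in 1 s is shown as ~2.25 Mbits/sec).
BYTE_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_sender_summary(text):
    """Return (bytes_sent, retransmits) from the iperf3 sender summary line."""
    m = SENDER_RE.search(text)
    if m is None:
        raise ValueError("no iperf3 sender summary line found")
    amount, prefix, retr = m.groups()
    return float(amount) * BYTE_UNITS[prefix], int(retr)


def retransmit_rate(bytes_sent, retransmits, mss=1448):
    """Approximate fraction of segments that were retransmitted.

    Assumes every segment carries a full MSS (1448 bytes is a hypothetical default).
    """
    return retransmits / (bytes_sent / mss)


def mathis_ceiling_bps(mss, rtt_s, loss, c=math.sqrt(1.5)):
    """Mathis et al. approximation (MSS / RTT) * C / sqrt(p), returned in bits/s."""
    return (mss * 8 / rtt_s) * c / math.sqrt(loss)


SAMPLE = """\
[  5]   0.00-10.00  sec  2.31 MBytes  1.94 Mbits/sec   65             sender
[  5]   0.00-10.00  sec  1.99 MBytes  1.67 Mbits/sec                  receiver
"""

if __name__ == "__main__":
    sent, retr = parse_sender_summary(SAMPLE)
    p = retransmit_rate(sent, retr)
    print(f"retransmitted: ~{p:.1%} of segments")
    # 50 ms is a placeholder RTT; substitute the path's measured RTT (e.g. from ping).
    rtt = 0.050
    print(f"Mathis ceiling at {rtt * 1000:.0f} ms RTT: "
          f"{mathis_ceiling_bps(1448, rtt, p) / 1e6:.2f} Mbit/s")
```

The useful part is the 1/√p term: at a fixed RTT, quadrupling the loss rate halves the achievable rate, so a loss rate that looks modest in a capture can still hold a long-haul transfer far below link speed.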
The host in question is at a third-party hosting provider, and I am downloading data from it to a co-located data center. I haven't been able to pin down the common element. On my side of the network there are two routers, then a VM host, then a virtual machine. The VM host and the innermost router are connected over a VxLAN (i.e. via ip link add vxlan100 type vxlan...), which I suspect is part of the problem. However, I can get 1 Gbit/s (measured with iperf3) directly over the VxLAN to various locations within the rack. I can provide an example, but it reads like the output above, only orders of magnitude faster.

The only clue I have at this point is that if I capture traffic while this 200 KByte/sec transfer is running, I see a higher incidence of TCP Retransmission, TCP Out-Of-Order, and TCP Dup ACK messages in Wireshark, and these do seem to correlate with the slowdown. Captures of traffic running at much higher speeds also contain some TCP retransmissions, but far fewer relative to the amount of traffic sent.

My question is: how do I debug this to find the cause of the missing packets? And are there any specific places I should be checking? It seems like some degree of packet loss is causing this slowdown, but I'm at a loss as to where to look for it. The packet loss does not show up at slower speeds, nor does it occur between machines within my own data center. There seems to be no single place where it predictably occurs, only that it definitely occurs between a VM in my data center and another machine in another data center. (This other machine also gets higher transfer rates to other places, such as AWS, so it's not the third-party machine; I've checked, and the problem only reproduces when sending to my network.)

Any ideas?
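One way to compare a slow path with a fast one numerically is to sample the kernel's own TCP counters on the sending host (here, the third-party machine) while a transfer runs, then repeat against a destination that reaches full speed. A minimal Python sketch, assuming a Linux sender: it parses /proc/net/snmp, whose Tcp section includes the OutSegs and RetransSegs counters, and reports retransmitted segments as a fraction of OutSegs over a sampling window. The helper names are hypothetical. The counters are host-wide, so any other TCP traffic on the host is mixed in, and the ratio should be read as approximate. The same parser also accepts /proc/net/netstat, which uses the same header-line/value-line layout.

```python
import time


def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {section: {field: int}}.

    Each section is a header line of field names followed by a value line
    with the same prefix, e.g. 'Tcp: ... OutSegs RetransSegs ...' then
    'Tcp: ... 123456 789 ...'.
    """
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        name, fields = header.split(":", 1)
        vname, nums = values.split(":", 1)
        if name != vname:
            raise ValueError(f"unpaired section: {name!r} vs {vname!r}")
        stats[name] = dict(zip(fields.split(), map(int, nums.split())))
    return stats


def read_counters(path="/proc/net/snmp"):
    """Read and parse a Linux /proc/net counter file."""
    with open(path) as f:
        return parse_snmp(f.read())


def tcp_retrans_ratio(before, after):
    """RetransSegs delta divided by OutSegs delta between two samples.

    Counters are host-wide, so other TCP traffic on the host is included.
    """
    out = after["Tcp"]["OutSegs"] - before["Tcp"]["OutSegs"]
    retr = after["Tcp"]["RetransSegs"] - before["Tcp"]["RetransSegs"]
    return retr / out if out > 0 else 0.0


def watch(interval=10.0, path="/proc/net/snmp"):
    """Sample the counters twice, `interval` seconds apart, and print the ratio."""
    before = read_counters(path)
    time.sleep(interval)
    after = read_counters(path)
    ratio = tcp_retrans_ratio(before, after)
    print(f"TCP retransmit ratio over {interval:g} s: {ratio:.2%}")
```

For example, call watch(10) on the sender during a slow iperf3 -R run, then again during a run to a destination that reaches full speed, and compare the two ratios.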
Asked by bgp (153 rep)
Jun 4, 2022, 04:31 AM
Last activity: Jun 6, 2022, 05:06 AM