Why didn't IPVS rewrite the source IP of the returning response packet?

I have a k8s node named edge1, which has two pods: a client pod named net-tool-edge1 and a service pod named nginx-edge1. There is also a service named nginx. For some reason, this node does not run kube-proxy; instead, an agent generates the IPVS rules for services. Today I found that net-tool-edge1 could not reach the nginx service: there was no response. After capturing traffic with tcpdump, I found that IPVS did not work as expected. Pod net-tool-edge1's IP is 10.234.67.29, pod nginx-edge1's IP is 10.234.67.28, and service nginx has clusterIP 10.234.39.157. The output of the ipvsadm listing:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.234.14.175:80 rr
  -> 10.234.67.28:80              Masq    1      0          0         
TCP  10.234.39.157:80 rr
  -> 10.234.67.28:80              Masq    1      0          0         
TCP  10.234.50.96:80 rr
  -> 10.22.48.15:80               Masq    1      0          0
The following is the tcpdump output captured inside net-tool-edge1:
02:45:46.016748 ARP, Request who-has 10.234.67.1 tell 10.234.67.29, length 28
02:45:46.016858 ARP, Request who-has 10.234.67.1 tell 10.234.67.29, length 28
02:45:46.016862 ARP, Reply 10.234.67.1 is-at 96:b9:e1:32:f0:fa, length 28
02:45:46.016864 IP 10.234.67.29.52704 > 169.254.25.10.53: 6768+ A? nginx.fabedge-e2e-test.svc.cluster.local. (58)
02:45:46.016953 IP 10.234.67.29.52704 > 169.254.25.10.53: 7300+ AAAA? nginx.fabedge-e2e-test.svc.cluster.local. (58)
02:45:46.025844 IP 169.254.25.10.53 > 10.234.67.29.52704: 6768*- 1/0/0 A 10.234.39.157 (114)
02:45:47.023403 IP 169.254.25.10.53 > 10.234.67.29.52704: 7300*- 0/1/0 (151)
02:45:47.023958 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
02:45:47.024040 ARP, Request who-has 10.234.67.28 tell 10.234.67.1, length 28
02:45:47.024149 ARP, Request who-has 10.234.67.29 tell 10.234.67.28, length 28
02:45:47.024153 ARP, Reply 10.234.67.29 is-at f2:3e:7d:a6:f5:1d, length 28
02:45:47.024162 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3004459791, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 688253 ecr 688253,nop,wscale 7], length 0
02:45:47.024180 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
02:45:48.026571 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
02:45:48.026687 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3020124674, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 689256 ecr 689256,nop,wscale 7], length 0
02:45:48.026702 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
02:45:51.026582 ARP, Request who-has 10.234.67.29 tell 10.234.67.1, length 28
02:45:51.026595 ARP, Reply 10.234.67.29 is-at f2:3e:7d:a6:f5:1d, length 28
02:45:52.034599 ARP, Request who-has 10.234.67.28 tell 10.234.67.29, length 28
02:45:52.034668 ARP, Reply 10.234.67.28 is-at 92:01:73:4f:50:2f, length 28
As we can see, inside pod net-tool-edge1 the request's destination IP is 10.234.39.157, but the response packet's source IP is 10.234.67.28 instead of 10.234.39.157, so net-tool-edge1 does not recognize the reply and sends a RST packet. Here is the tcpdump output from the edge1 node:
10:45:47.023961 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
10:45:47.023978 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
10:45:47.024063 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
10:45:47.024064 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
10:45:47.024160 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3004459791, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 688253 ecr 688253,nop,wscale 7], length 0
10:45:47.024161 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3004459791, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 688253 ecr 688253,nop,wscale 7], length 0
10:45:47.024185 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
10:45:47.024186 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
10:45:48.026585 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
10:45:48.026598 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
10:45:48.026636 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
10:45:48.026640 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
10:45:48.026684 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3020124674, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 689256 ecr 689256,nop,wscale 7], length 0
10:45:48.026686 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3020124674, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 689256 ecr 689256,nop,wscale 7], length 0
10:45:48.026703 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
10:45:48.026704 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
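The mismatch in these captures can also be flagged mechanically. The following is a minimal sketch, assuming Python 3 and tcpdump text output in the format shown in this post; the function name `find_unrewritten_replies` and the `CAPTURE` sample are hypothetical and only illustrate the check. It remembers where each client port sent its first SYN and reports any SYN-ACK that comes back to that port from a different address.

```python
import re

# Matches TCP lines in the tcpdump format shown in this post, for example:
# "02:45:47.023958 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq ..."
LINE_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]+)\]"
)


def find_unrewritten_replies(lines):
    """Return (syn_destination, synack_source) pairs that do not match.

    A pair is reported when a SYN-ACK arrives at a client port from an
    address other than the one that client port first sent its SYN to.
    """
    syn_targets = {}  # (client_ip, client_port) -> (dst_ip, dst_port) of first SYN
    mismatches = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # ARP, DNS and other non-TCP lines are ignored
        src = (m["src"], m["sport"])
        dst = (m["dst"], m["dport"])
        if m["flags"] == "S":
            # Keep only the first destination seen for this client port.
            syn_targets.setdefault(src, dst)
        elif m["flags"] == "S.":
            expected = syn_targets.get(dst)
            if expected is not None and expected != src:
                mismatches.append((expected, src))
    return mismatches


# Two lines copied from the pod capture in this post.
CAPTURE = [
    "02:45:47.023958 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, length 0",
    "02:45:47.024162 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3004459791, ack 1920875837, win 26960, length 0",
]

if __name__ == "__main__":
    for expected, actual in find_unrewritten_replies(CAPTURE):
        print("SYN sent to", ":".join(expected), "but SYN-ACK came from", ":".join(actual))
```

For the two sample lines it reports one pair: the SYN went to 10.234.39.157:80, while the SYN-ACK came from 10.234.67.28:80. The sketch only compares addresses; it says nothing about which kernel component failed to rewrite the reply.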
Here we can see that when net-tool-edge1 sends a request to 10.234.39.157, IPVS (or some other kernel module) changes the destination IP to 10.234.67.28, but when nginx-edge1 sends its response packet, IPVS does not change the source IP back to 10.234.39.157. I also have another node named edge2 with the same settings, and everything on edge2 works well. In fact, everything worked well on edge1 too before I restarted it. I googled a lot but didn't find anything helpful. Any help, tip, or document is welcome. Thanks in advance. Here are the iptables rules and ipset contents on edge1:
[root@edge1 ~]# iptables -S 
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-N FABEDGE-FORWARD
-A INPUT -d 169.254.25.10/32 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -d 169.254.25.10/32 -p tcp -m tcp --dport 53 -j ACCEPT
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j FABEDGE-FORWARD
-A OUTPUT -s 169.254.25.10/32 -p udp -m udp --sport 53 -j ACCEPT
-A OUTPUT -s 169.254.25.10/32 -p tcp -m tcp --sport 53 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A FABEDGE-FORWARD -s 10.234.67.0/24 -j ACCEPT
-A FABEDGE-FORWARD -d 10.234.67.0/24 -j ACCEPT
[root@edge1 ~]# iptables -t nat -S 
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-N FABEDGE-NAT-OUTGOING
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -j FABEDGE-NAT-OUTGOING
-A DOCKER -i docker0 -j RETURN
-A FABEDGE-NAT-OUTGOING -s 10.234.67.0/24 -m set --match-set FABEDGE-PEER-CIDR dst -j RETURN
-A FABEDGE-NAT-OUTGOING -s 10.234.67.0/24 -d 10.234.67.0/24 -j RETURN
-A FABEDGE-NAT-OUTGOING -s 10.234.67.0/24 -j MASQUERADE
[root@edge1 ~]# ipset list 
Name: FABEDGE-PEER-CIDR
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 952
References: 1
Number of entries: 9
Members:
10.234.66.0/24
10.234.0.0/18
10.22.48.28
10.22.48.16
10.22.48.34
10.234.64.0/24
10.234.68.0/24
10.234.65.0/24
10.22.48.17
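To rule the NAT rules in or out, the FABEDGE-NAT-OUTGOING chain above can be evaluated by hand for a given source and destination. The following is a rough sketch, assuming Python 3 and using only the rules and ipset members listed in this post; the function name `nat_outgoing_verdict` is hypothetical. It models rule order only, not the point in the netfilter path where IPVS rewrites the destination, so it should be read as a sanity check rather than a diagnosis.

```python
import ipaddress

# Members of the FABEDGE-PEER-CIDR ipset as listed above
# (bare addresses are treated as /32 networks).
PEER_CIDR = [
    ipaddress.ip_network(member)
    for member in (
        "10.234.66.0/24", "10.234.0.0/18", "10.22.48.28", "10.22.48.16",
        "10.22.48.34", "10.234.64.0/24", "10.234.68.0/24", "10.234.65.0/24",
        "10.22.48.17",
    )
]

# Source (and same-subnet destination) network used by all three rules.
POD_CIDR = ipaddress.ip_network("10.234.67.0/24")


def nat_outgoing_verdict(src, dst):
    """Walk the three FABEDGE-NAT-OUTGOING rules in order for one packet."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    if s not in POD_CIDR:
        return "no match"  # every rule in the chain requires -s 10.234.67.0/24
    if any(d in net for net in PEER_CIDR):
        return "RETURN (peer CIDR)"  # rule 1: --match-set FABEDGE-PEER-CIDR dst
    if d in POD_CIDR:
        return "RETURN (same pod CIDR)"  # rule 2: -d 10.234.67.0/24
    return "MASQUERADE"  # rule 3


if __name__ == "__main__":
    # Client pod to the nginx pod, then client pod to the service clusterIP.
    print(nat_outgoing_verdict("10.234.67.29", "10.234.67.28"))
    print(nat_outgoing_verdict("10.234.67.29", "10.234.39.157"))
```

Under this model, neither pod-to-pod traffic nor traffic addressed to the clusterIP is masqueraded by this chain: the first case returns at the second rule, and the clusterIP falls inside 10.234.0.0/18 in the ipset.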
Asked by Jianbo Yan (53 rep)
Jul 26, 2022, 03:20 AM
Last activity: Jul 26, 2022, 10:08 AM