
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
0 answers
89 views
Why is CPU and memory load uneven across my Kubernetes Pods?
I'm running a Kubernetes Deployment with four Pods. Each Pod is expected to share the load equally, but I've noticed that one Pod consistently uses significantly more CPU and memory than the others. I suspect this issue might be related to kube-proxy's iptables mode, but I'm not sure of the exact cause.
[root@iac ~]# kubectl top pod --sort-by=memory | grep api
api-5cd64b46cb-b9ffs                        183m         1220Mi   
api-5cd64b46cb-4m8h6                        172m         952Mi      
api-5cd64b46cb-hrnhw                        168m         939Mi           
api-5cd64b46cb-kbm7d                        215m         895Mi
I’ve checked the resource limits and requests for the pods, and they’re all configured identically.
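If kube-proxy's iptables mode is the suspect, one thing worth keeping in mind is that it balances per connection, not per request, so a handful of long-lived connections can pin most of the work to one Pod. A minimal diagnostic sketch, assuming the Service backing these Pods is named api (adjust names to your cluster):

# Check that all four Pods are registered as endpoints of the Service
kubectl get endpoints api -o wide

# On a node, inspect kube-proxy's per-endpoint splitting rules; iptables mode
# spreads new connections with random probabilities, not by load
iptables -t nat -S | grep statistic

Each KUBE-SVC chain carries -m statistic --mode random --probability rules, so the split is only even over many short-lived connections; it says nothing about CPU or memory cost per request.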
yang lan (1 rep)
Aug 18, 2024, 06:54 AM
7 votes
2 answers
6151 views
How do packets flow through the kernel?
When it comes to packet filtering/management I never actually know what is going on inside the kernel. There are so many different tools that act on the packets, either from userspace (modifying kernel-space subsystems) or directly in kernel space. Is there any place where each tool documents its interaction with the other tools, or where it acts? I feel like there should be a diagram somewhere specifying what is going on, for people who aren't technical enough to go and read the kernel code.

So here's my example: a packet is received on one of my network interfaces and I have:

- UFW
- iptables
- IPv4 subsystem (routing)
- IPVS
- eBPF

OK, so I know that UFW is a frontend for iptables, and iptables is a frontend for Netfilter. So now we're in kernel space and our tools are Netfilter, IPVS, IPv4 and eBPF. Again, the interactions between Netfilter and the IPv4 subsystem are easy to find, since these are very old (not in a bad way) subsystems, so a lack of docs would be very strange. This diagram gives an overview of the interaction: [diagram of Netfilter/IPv4 packet flow, not shown]

But what about IPVS and eBPF? What's the actual order in which kernel subsystems act upon the packets when these two are in the kernel? I always find amazing people who try to go into the guts and help others understand, for example [this description of the interaction between LVS and Netfilter](http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.filter_rules.html), but shouldn't this be documented in a more official fashion?

I'm not looking for an explanation here of how these submodules interact; I know I could find that myself by searching. My question is more general: why is there no official documentation that actually tries to explain what is going on inside these kernel subsystems? Is it documented somewhere that I just don't know of? Is there any reason not to try to explain these tools? I apologize if I'm not making any sense. I just started learning about these things.
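There is no single official end-to-end document, but the Netfilter part of the ordering can at least be observed empirically. A minimal sketch using the iptables TRACE target (port 80 is just an example; how you read the trace back depends on whether your distribution uses the iptables-nft or the legacy backend):

# Mark packets of interest for tracing as early as possible (raw/PREROUTING)
iptables -t raw -I PREROUTING -p tcp --dport 80 -j TRACE

# iptables-nft backend: watch per-chain trace events live
xtables-monitor --trace

# Legacy backend instead: trace events go to the kernel log via nf_log
# modprobe nf_log_ipv4; dmesg -w

# Remove the rule when done
iptables -t raw -D PREROUTING -p tcp --dport 80 -j TRACE

Each trace line names the table, chain and rule a packet traverses, which pins down the Netfilter side of the picture for a real packet; IPVS and eBPF programs hook in elsewhere, so they will not show up in this output.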
AFP_555 (311 rep)
Jan 26, 2022, 05:55 PM • Last activity: May 23, 2024, 03:53 PM
1 vote
1 answer
694 views
Why didn't my IPVS change the source IP when the response packet returned?
I have a k8s node named edge1 which has two pods: a client pod named net-tool-edge1 and a service pod named nginx-edge1. There is also a service named nginx. For some reason this node doesn't have kube-proxy; instead an agent generates IPVS rules for services. Today I found that net-tool-edge1 couldn't reach the nginx service: there was no response. After I used tcpdump to capture the traffic, I found that IPVS didn't work as expected. Pod net-tool-edge1's IP is 10.234.67.29, pod nginx-edge1's IP is 10.234.67.28, and service nginx has clusterIP 10.234.39.157. The output of ipvsadm list:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.234.14.175:80 rr
  -> 10.234.67.28:80              Masq    1      0          0         
TCP  10.234.39.157:80 rr
  -> 10.234.67.28:80              Masq    1      0          0         
TCP  10.234.50.96:80 rr
  -> 10.22.48.15:80               Masq    1      0          0
The following is the tcpdump output from net-tool-edge1:
02:45:46.016748 ARP, Request who-has 10.234.67.1 tell 10.234.67.29, length 28
02:45:46.016858 ARP, Request who-has 10.234.67.1 tell 10.234.67.29, length 28
02:45:46.016862 ARP, Reply 10.234.67.1 is-at 96:b9:e1:32:f0:fa, length 28
02:45:46.016864 IP 10.234.67.29.52704 > 169.254.25.10.53: 6768+ A? nginx.fabedge-e2e-test.svc.cluster.local. (58)
02:45:46.016953 IP 10.234.67.29.52704 > 169.254.25.10.53: 7300+ AAAA? nginx.fabedge-e2e-test.svc.cluster.local. (58)
02:45:46.025844 IP 169.254.25.10.53 > 10.234.67.29.52704: 6768*- 1/0/0 A 10.234.39.157 (114)
02:45:47.023403 IP 169.254.25.10.53 > 10.234.67.29.52704: 7300*- 0/1/0 (151)
02:45:47.023958 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
02:45:47.024040 ARP, Request who-has 10.234.67.28 tell 10.234.67.1, length 28
02:45:47.024149 ARP, Request who-has 10.234.67.29 tell 10.234.67.28, length 28
02:45:47.024153 ARP, Reply 10.234.67.29 is-at f2:3e:7d:a6:f5:1d, length 28
02:45:47.024162 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3004459791, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 688253 ecr 688253,nop,wscale 7], length 0
02:45:47.024180 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
02:45:48.026571 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
02:45:48.026687 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3020124674, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 689256 ecr 689256,nop,wscale 7], length 0
02:45:48.026702 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
02:45:51.026582 ARP, Request who-has 10.234.67.29 tell 10.234.67.1, length 28
02:45:51.026595 ARP, Reply 10.234.67.29 is-at f2:3e:7d:a6:f5:1d, length 28
02:45:52.034599 ARP, Request who-has 10.234.67.28 tell 10.234.67.29, length 28
02:45:52.034668 ARP, Reply 10.234.67.28 is-at 92:01:73:4f:50:2f, length 28
As we can see, in pod net-tool-edge1 the request's destination IP is 10.234.39.157, but the response packet's source IP is 10.234.67.28, so net-tool-edge1 sends a RST packet. Here is the tcpdump output of the edge1 node:
10:45:47.023961 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
10:45:47.023978 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
10:45:47.024063 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
10:45:47.024064 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 688253 ecr 0,nop,wscale 7], length 0
10:45:47.024160 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3004459791, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 688253 ecr 688253,nop,wscale 7], length 0
10:45:47.024161 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3004459791, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 688253 ecr 688253,nop,wscale 7], length 0
10:45:47.024185 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
10:45:47.024186 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
10:45:48.026585 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
10:45:48.026598 IP 10.234.67.29.57824 > 10.234.39.157.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
10:45:48.026636 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
10:45:48.026640 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [S], seq 1920875836, win 27200, options [mss 1360,sackOK,TS val 689256 ecr 0,nop,wscale 7], length 0
10:45:48.026684 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3020124674, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 689256 ecr 689256,nop,wscale 7], length 0
10:45:48.026686 IP 10.234.67.28.80 > 10.234.67.29.57824: Flags [S.], seq 3020124674, ack 1920875837, win 26960, options [mss 1360,sackOK,TS val 689256 ecr 689256,nop,wscale 7], length 0
10:45:48.026703 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
10:45:48.026704 IP 10.234.67.29.57824 > 10.234.67.28.80: Flags [R], seq 1920875837, win 0, length 0
Here we can see that when net-tool-edge1 sends a request to 10.234.39.157, IPVS (or some other kernel module) changes the destination IP to 10.234.67.28, but when nginx-edge1 sends its response packet, IPVS doesn't change the source IP back to 10.234.39.157. I also have another node named edge2 with the same settings, and everything on edge2 works well. In fact, before I restarted edge1, everything worked well there too. I googled a lot but didn't find anything helpful. Any help, tip or document is welcome, and thanks in advance. Here are the iptables rules, NAT table and ipset in use (a short diagnostic sketch follows these listings):
[root@edge1 ~]# iptables -S 
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-N FABEDGE-FORWARD
-A INPUT -d 169.254.25.10/32 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -d 169.254.25.10/32 -p tcp -m tcp --dport 53 -j ACCEPT
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j FABEDGE-FORWARD
-A OUTPUT -s 169.254.25.10/32 -p udp -m udp --sport 53 -j ACCEPT
-A OUTPUT -s 169.254.25.10/32 -p tcp -m tcp --sport 53 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A FABEDGE-FORWARD -s 10.234.67.0/24 -j ACCEPT
-A FABEDGE-FORWARD -d 10.234.67.0/24 -j ACCEPT
[root@edge1 ~]# iptables -t nat -S 
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-N FABEDGE-NAT-OUTGOING
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -j FABEDGE-NAT-OUTGOING
-A DOCKER -i docker0 -j RETURN
-A FABEDGE-NAT-OUTGOING -s 10.234.67.0/24 -m set --match-set FABEDGE-PEER-CIDR dst -j RETURN
-A FABEDGE-NAT-OUTGOING -s 10.234.67.0/24 -d 10.234.67.0/24 -j RETURN
-A FABEDGE-NAT-OUTGOING -s 10.234.67.0/24 -j MASQUERADE
[root@edge1 ~]# ipset list 
Name: FABEDGE-PEER-CIDR
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 952
References: 1
Number of entries: 9
Members:
10.234.66.0/24
10.234.0.0/18
10.22.48.28
10.22.48.16
10.22.48.34
10.234.64.0/24
10.234.68.0/24
10.234.65.0/24
10.22.48.17
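Not an answer, but a diagnostic sketch that may help narrow this down. It assumes the pods sit behind a Linux bridge on edge1; the conntrack filter IP is taken from the capture above:

lsmod | grep br_netfilter                   # is bridged traffic passed to the Netfilter hooks at all?
sysctl net.bridge.bridge-nf-call-iptables   # must be 1 for same-node bridged packets to hit those hooks
cat /proc/net/ip_vs_conn                    # does IPVS hold a connection entry for the attempt?
ipvsadm -Lnc                                # same information, per-connection view
sysctl net.ipv4.vs.conntrack                # is IPVS/conntrack integration enabled?
conntrack -L -d 10.234.67.28 | head         # does conntrack record the DNAT towards the real server?

In masquerading mode the reply is only rewritten back to the clusterIP if it passes through the director's Netfilter hooks again; if br_netfilter was loaded before the reboot but not after, bridged replies between two pods on the same node would skip those hooks, which would fit "it worked until I restarted edge1".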
Jianbo Yan (53 rep)
Jul 26, 2022, 03:20 AM • Last activity: Jul 26, 2022, 10:08 AM
1 vote
1 answer
1073 views
L4 balancing using ipvs: drop RST packets - failover
I have an L4 IPVS load balancer setup with L7 Envoy balancers behind it. Let's say one of my L4 balancers goes down; thanks to consistent hashing, the traffic now handled (via BGP) by another L4 balancer is still proxied to the same L7 node. This should work without any problems, and I would think it's a common setup. The problem is with long-running connections. When the new L4 node receives the traffic (just data: ACK/PUSH packets) and no SYN packet has been seen by that node, it just sends a RST packet to the client, which terminates the connection (the picture below illustrates this). This should not be happening, and my question is: is there a setting (a sysctl or something) that is the reason for this? I know I could perhaps drop RST packets using iptables, but that doesn't sound right.

[diagram: the new L4 node resets a long-running connection taken over after failover, not shown]
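One knob that is often relevant here, offered as a sketch rather than a confirmed fix: IPVS can be told to create connection state from mid-stream (non-SYN) packets, so a node that takes over a flow does not reset connections whose SYN it never saw. The interface and sync ID below are placeholders:

# Allow IPVS to create a connection entry from a non-SYN packet
sysctl -w net.ipv4.vs.sloppy_tcp=1

# Optionally replicate connection state between directors so the
# takeover node already knows about established flows
ipvsadm --start-daemon master --mcast-interface eth0 --syncid 1   # on the currently active director
ipvsadm --start-daemon backup --mcast-interface eth0 --syncid 1   # on the peer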
Diavel (61 rep)
Mar 11, 2020, 06:27 AM • Last activity: Mar 30, 2020, 04:46 PM
1 vote
1 answer
655 views
Capturing ipvsadm UDP traffic on the loopback interface
**Problem**

I am unable to capture ipvsadm traffic on the loopback interface.

# tcpdump -i lo -n udp port 51444 -vv
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes

**ifconfig**

# ifconfig
ens192: flags=4163  mtu 1500
        inet 10.0.10.136  netmask 255.255.255.0  broadcast 10.0.10.255
        ether 00:50:56:ce:e3:dc  txqueuelen 1000  (Ethernet)
        RX packets 63480807660  bytes 12508576879467 (11.3 TiB)
        RX errors 0  dropped 2  overruns 0  frame 0
        TX packets 19861938950  bytes 5978856268385 (5.4 TiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1  (Local Loopback)
        RX packets 351855  bytes 1689650676 (1.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 351855  bytes 1689650676 (1.5 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

**ipvsadm commands to set up round-robin load balancing to three SIEM nodes**

ipvsadm -A -u 127.0.0.1:51444 -s rr -o
ipvsadm -a -u 127.0.0.1:51444 -r 10.10.10.77:514 -m -w 1
ipvsadm -a -u 127.0.0.1:51444 -r 10.10.10.78:514 -m -w 1
ipvsadm -a -u 127.0.0.1:51444 -r 10.10.10.79:514 -m -w 1

# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  localhost:51444 rr ops
  -> 10.10.10.77:syslog           Masq    1      0          1
  -> 10.10.10.78:syslog           Masq    1      0          2
  -> 10.10.10.79:syslog           Masq    1      0          2

**Background**

I am successfully sending traffic to my SIEM via ipvsadm. I was able to do this on both interfaces, ens192 and lo. I had no issues using tcpdump to view the traffic being sent via ipvsadm on interface ens192. My problem is that when I use tcpdump on interface lo I do not see any ipvsadm traffic.
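A sketch of how the missing datagrams can still be observed. One plausible explanation for the empty capture is that IPVS rewrites locally generated packets at the LOCAL_OUT hook, before they are handed to a device, so the rewritten copy only ever appears on the egress interface; capturing everywhere at once and reading the IPVS counters sidesteps the question:

# Match both the virtual port and the real-server port, on every interface
tcpdump -i any -nn 'udp port 51444 or udp port 514'

# Confirm IPVS is actually seeing and rewriting the datagrams
ipvsadm -Lnc          # per-connection table: client, virtual and destination address
ipvsadm -Ln --stats   # packet/byte counters per virtual service and real server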
brakertech (1415 rep)
Jul 2, 2018, 09:09 PM • Last activity: Jul 10, 2018, 05:59 AM