Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
1 vote, 2 answers, 3321 views
How do I verify the parameters set using tc command?
I am in need of simulating a high-latency and low-bandwidth connection for a performance test of my application. I have gone through a number of pages describing the `tc` command, but I haven't been able to validate the numbers that I set. For example:
I took the following command values from:
https://www.excentis.com/blog/use-linux-traffic-control-impairment-node-test-environment-part-2
tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms
With that applied on (say) machine A, according to the description on the page, I am assuming my output rate should be 128 kBps (at least approximately). To test this, I started transferring a 2 GB file using scp from machine A to another machine "B" in the same LAN. Transfer rates without any added impairment reach up to 12 MBps in this network. But when the transfer started, the rate was at 2 MBps; then it kept stalling and falling until it started to swing and stall between 11 kBps and 24 kBps.
I used nmon to monitor network throughput on both sides during the transfer, but it never went above 24 kBps (except for a couple of values reading 54 and 62).
I have also tried increasing the rate and bucket size, but the behavior during scp is the same. I tried the following command to increase the bucket size and the rate:
tc qdisc add dev eth0 root tbf rate 1024kbps burst 1024kb latency 500
And scp still stalled and swung around the same rates (11-30 kBps).
Am I interpreting the term "rate" wrongly here? I have looked at the man page for tc and it appears that my interpretation is correct. Could anyone explain to me the best way to test the parameters I set (assuming I did set them correctly)?
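As a side note, the numbers implied by the first command can be sanity-checked with a little arithmetic before bringing scp into it. This is a sketch; it assumes tc's decimal units as documented in its man page (1mbit = 10^6 bit/s, 32kbit = 32000 bit), which gives 125 kB/s rather than 128:

```shell
# Rough arithmetic for "tbf rate 1mbit burst 32kbit latency 400ms".
rate_bps=1000000              # token refill rate
burst_bytes=$((32000 / 8))    # bucket size: 4000 bytes
latency_ms=400                # max time a packet may wait in the queue

echo "steady-state throughput: $((rate_bps / 8 / 1000)) kB/s"
echo "implied queue limit:     $(( (rate_bps / 8 * latency_ms / 1000 + burst_bytes) / 1000 )) kB"
```

If a plain iperf3 run through the same qdisc holds near that steady-state figure while scp still swings, the difference is down to scp's own flow control and encryption overhead rather than tbf; `tc -s qdisc show dev eth0` additionally shows sent bytes, drops and overlimits for the qdisc itself.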
james
(11 rep)
Nov 18, 2015, 03:39 PM
• Last activity: Jul 17, 2025, 09:09 PM
1 vote, 0 answers, 19 views
Why does netem delay not work when netem loss does
I am using `tc` to test the behaviour of a networked app under various network conditions. The setup is like this:
if [ -z "$(tc qdisc show dev ${MAIN_LINK} ingress)" ]
then
sudo tc qdisc add dev ${MAIN_LINK} handle ffff: ingress
fi
sudo tc filter del dev ${MAIN_LINK} ingress
sudo tc filter add dev ${MAIN_LINK} parent ffff: protocol ip u32 match ip dport 20780 0xffff match ip protocol 17 0xff action mirred egress redirect dev ${BRIDGE}
sudo tc qdisc add dev ${MAIN_LINK} root handle 1: prio
sudo tc filter add dev ${MAIN_LINK} parent 1: protocol ip prio 1 u32 flowid 1:1 match ip dport 20780 0xffff match ip protocol 17 0xff
If I add packet loss, using a loop to ramp it up bit by bit like this, then it works:
tc qdisc add dev "${MAIN_LINK}" parent 1:1 netem loss random ${LEVEL}%
tc qdisc add dev "${BRIDGE}" root handle 1: netem loss random ${LEVEL}%
If I add packet delay, again ramping it up bit by bit like this, then it has no effect that I can see at all:
tc qdisc add dev "${MAIN_LINK}" parent 1:1 netem delay ${DELAY} ${JITTER} distribution normal
tc qdisc add dev "${BRIDGE}" root netem delay ${DELAY} ${JITTER} distribution normal
Values of `DELAY` and `JITTER` went up to 1870 and 1530 (340 to 3400 ms delay) and there was no apparent effect at all.
How do I get packet delay to work? Why does packet loss work but packet delay does not?
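For what it's worth, one generic pitfall with ramp loops (an assumption about code not shown here, not a confirmed diagnosis): `tc qdisc add` fails once a qdisc already exists at that position, so later iterations of a loop may silently change nothing. A sketch using `replace`, which creates the qdisc on the first pass and updates it afterwards; the explicit `ms` units are also an assumption, since netem treats bare numbers ambiguously across versions:

```shell
# Create the netem qdisc on the first pass, update it on later ones.
tc qdisc replace dev "${MAIN_LINK}" parent 1:1 netem \
    delay "${DELAY}ms" "${JITTER}ms" distribution normal
tc qdisc replace dev "${BRIDGE}" root netem \
    delay "${DELAY}ms" "${JITTER}ms" distribution normal

# Verify the parameters actually in effect after each step:
tc qdisc show dev "${MAIN_LINK}"
```

Comparing the output of the final `show` against the intended ramp level tells you immediately whether the delay parameters are reaching the kernel at all.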
AlastairG
(213 rep)
Jul 8, 2025, 09:25 AM
2 votes, 1 answer, 2168 views
Using qdisc prio under htb class
I have 2 services, both operate over the same interface.
Service A's goal is to keep high bandwidth while sending massive amounts of data.
Service B's goal is low latency.
Service B packets should **always** take precedence over Service A's packets.
I need a TC structure to be able to:
- Rate limit both services A & B
- Give service B packets priority, with no added latency caused by service A packets
- Let each service utilize the whole line (or up to its limit) if the other service isn't transmitting.
I tried an htb structure where I have `class htb classid x` (which may be rate/ceil limited) with a `qdisc prio` (say, handle y:0) below it as a child (it auto-creates classes y:1, y:2 & y:3), and used filters by src IP to redirect packets to y:1 / y:2.
However, it doesn't seem to work.
Both class x and its children's traffic counters seem to be 0 (I used `tc -s class/qdisc/filter show dev dev` to check).
When watching the filters I can clearly see the "hits", so the data was supposed to get redirected correctly.
Here are the commands I execute:
tc qdisc add dev dev root handle 1: htb
tc class add dev dev parent 1:0 classid 1:1 htb rate 10gbit ceil 10gbit
# class x
tc class add dev dev parent 1:1 classid 1:2 htb rate 10gbit ceil 10gbit
# auto creates classes 21:1, 21:2 and 21:3
tc qdisc add dev dev parent 1:2 handle 21: prio
# example for service b filter (latency driven)
tc filter add dev dev parent 1:0 prio 2 u32 match ip src x.x.x.x/32 flowid 21:1
# example for service a filter
tc filter add dev dev parent 1:0 prio 2 u32 match ip src x.x.x.x/32 flowid 21:2
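For comparison, a hedged sketch of the two-stage classification usually shown for nested qdiscs: one filter at the htb root steering traffic into the htb leaf, and a second filter attached to the prio qdisc itself choosing the band. This is a sketch under the question's handles, not a verified fix; `eth0` and `192.0.2.1` are placeholders:

```shell
# Stage 1: at the htb root, classify the flow into htb leaf class 1:2.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
    match ip src 192.0.2.1/32 flowid 1:2

# Stage 2: on the prio qdisc (handle 21:), pick the high-priority band.
tc filter add dev eth0 parent 21: protocol ip prio 1 u32 \
    match ip src 192.0.2.1/32 flowid 21:1
```

With a single root filter whose flowid points into the child qdisc's classes, htb itself never receives a classification, which would be consistent with the zero counters observed despite filter hits.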
SagiLow
(287 rep)
Jul 18, 2016, 06:48 PM
• Last activity: Jun 24, 2025, 04:02 AM
2 votes, 1 answer, 60 views
How to mark 802.1Q ethernet frame with PCP bits according to encapsulated IP header IP Precedence bits
I would like the IP header IP Precedence bits to be copied into 802.1Q PCP bits for outgoing traffic sourced from the host in question. Specifically for iperf3 and ping utilities.
I have failed to set PCP bits for pings.
OS: Fedora release 38, "Server Edition", with NetworkManager.
eno2      ethernet  eno2
eno2.814  vlan      eno2.814
ip -d link show eno2
3: eno2: mtu 1600 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ac:16:2d:72:3f:fd brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 60 maxmtu 9000 addrgenmode none numtxqueues 5 numrxqueues 5 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 parentbus pci parentdev 0000:03:00.1
altname enp3s0f1
ip -d link show eno2.814
10: eno2.814@eno2: mtu 1600 qdisc pfifo state UP mode DEFAULT group default qlen 1000
link/ether ac:16:2d:72:3f:fd brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 0 maxmtu 65535
vlan protocol 802.1Q id 814
ingress-qos-map { 1:1 2:2 3:3 4:4 5:5 6:6 7:7 }
egress-qos-map { 1:1 2:2 3:3 4:4 5:5 6:6 7:7 } addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
cat /proc/net/vlan/eno2.814
eno2.814 VID: 814 REORDER_HDR: 0 dev->priv_flags: 81021
total frames received 294
total bytes received 21846
Broadcast/Multicast Rcvd 0
total frames transmitted 271
total bytes transmitted 23846
Device: eno2
INGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
Ping command to send 8 requests:
for pcp in 0x00 0x20 0x40 0x60 0x80 0xA0 0xC0 0xE0; do ping 192.168.22.3 -w2 -c1 -Q $pcp ; done
Sent packets are captured on the outgoing interface with "tshark -i eno2 -f 'icmp and dst host 192.168.22.3' -V".
Grepping for the L2 and L3 CoS fields in the headers shows the intended DSCP values, but '000' PCP "Priority" values:
000. .... .... .... = Priority: Best Effort (default) (0)
0000 00.. = Differentiated Services Codepoint: Default (0)
000. .... .... .... = Priority: Best Effort (default) (0)
0010 00.. = Differentiated Services Codepoint: Class Selector 1 (8)
000. .... .... .... = Priority: Best Effort (default) (0)
0100 00.. = Differentiated Services Codepoint: Class Selector 2 (16)
000. .... .... .... = Priority: Best Effort (default) (0)
0110 00.. = Differentiated Services Codepoint: Class Selector 3 (24)
000. .... .... .... = Priority: Best Effort (default) (0)
1000 00.. = Differentiated Services Codepoint: Class Selector 4 (32)
000. .... .... .... = Priority: Best Effort (default) (0)
1010 00.. = Differentiated Services Codepoint: Class Selector 5 (40)
000. .... .... .... = Priority: Best Effort (default) (0)
1100 00.. = Differentiated Services Codepoint: Class Selector 6 (48)
000. .... .... .... = Priority: Best Effort (default) (0)
1110 00.. = Differentiated Services Codepoint: Class Selector 7 (56)
What I've tried that hasn't helped:
Switching off reorder_hdr:
ip link set eno2.814 type vlan reorder_hdr off
Setting the vlan egress-qos-map to map kernel priority values (which IMHO should already be set equal to the IP precedence values of the ping utility) to PCP:
ip link set eno2.814 type vlan egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
Setting the outgoing interface qdisc. I created eno2.814 on eno2 with nmtui and no qdisc was set by default, so I thought that could be the problem and tried to set the queues and qdisc(s) manually:
ip link set eno2 numtxqueues 8 numrxqueues 8
tc qdisc add dev eno2.814 root handle 1: mq -- RTNETLINK answers: Operation not supported
tc qdisc add dev eno2.814 root handle 1: mqprio -- Error: Specified qdisc kind is unknown.
tc qdisc add dev eno2.814 root handle 1: multiq -- Error: Specified qdisc kind is unknown.
tc qdisc delete dev eno2.814 root
tc qdisc add dev eno2.814 root handle 1: pfifo_fast
sudo systemctl restart NetworkManager does not seem to help either.
What I don't get:
I assume that ping -Q sets the kernel SO_PRIORITY for a packet. Does it?
Can the difference between the vlan and parent qdiscs have any influence?
Why does "/proc/net/vlan/eno2.814" show the EGRESS priority mapping 0:0 while "ip -d link show eno2.814" egress-qos-map does not?
Do I need to get into the hardware queues presented to the kernel, or is a single hardware or default queue enough if I just want packet marking, not specific queue handling?
What is wrong with my config?
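On the first open question, a hypothesis to test rather than a confirmed answer: `ping -Q` sets the TOS byte, and the kernel then derives skb->priority (which is what the VLAN egress-qos-map keys on) from TOS via its own table, which as far as I understand ignores the precedence bits and looks only at the old ToS bits. If that holds, `-Q 0x20` through `-Q 0xE0` all yield priority 0, matching the all-zero PCP seen above, while ToS 0x10 ("low delay") maps to a nonzero priority. A sketch to test this:

```shell
# Hypothesis: precedence-only -Q values map to skb priority 0, but
# ToS 0x10 maps to the "interactive" priority (6). Map 6 to PCP 5
# (the 6:5 pair here is an arbitrary choice for the test):
sudo ip link set eno2.814 type vlan egress-qos-map 6:5
ping 192.168.22.3 -w2 -c1 -Q 0x10
# Then re-check the PCP field of this ping in the tshark capture.
```

If this ping alone comes out with a nonzero PCP, the issue is the TOS-to-priority mapping, not the VLAN configuration.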
off-on
(61 rep)
Jun 11, 2025, 11:54 AM
• Last activity: Jun 11, 2025, 08:39 PM
0 votes, 0 answers, 66 views
Bidirectional Traffic Forwarding Issue with tc filter
I'm working with a `tc` filter setup and I have the following configuration:
sudo tc qdisc add dev eth0 handle ffff: ingress
sudo tc filter add dev eth0 parent ffff: protocol ip prio 1 flower ip_proto icmp src_ip 10.0.0.5 action mirred egress redirect dev tun0
This is what I expect from the setup: I want to forward ICMP traffic from a specific source IP (10.0.0.5) arriving at eth0 to the tun0 interface. Similarly, I expect traffic on tun0 destined to eth0 to be forwarded correctly.
However, I'm experiencing an issue where traffic from eth0 to tun0 flows as expected, but traffic from tun0 that should be forwarded to eth0 is not working. tun0 receives packets that should be sent to eth0, but they don't get forwarded.
I have tested this configuration on other devices, and it works correctly in both directions, so I'm puzzled about why it fails here.
Could someone help me understand what might be happening? Also, how can I troubleshoot this issue more effectively to observe what exactly is going wrong in the packet forwarding process?
Thanks in advance for your insights!
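A mirred redirect only covers the direction its filter is attached to, so the return path normally needs its own ingress filter on tun0. This is a sketch under the same addressing assumptions as the commands above, not a confirmed diagnosis for this particular setup:

```shell
# Return path: ICMP arriving on tun0 toward 10.0.0.5 goes back out eth0.
sudo tc qdisc add dev tun0 handle ffff: ingress
sudo tc filter add dev tun0 parent ffff: protocol ip prio 1 flower \
    ip_proto icmp dst_ip 10.0.0.5 action mirred egress redirect dev eth0

# For troubleshooting, the per-filter counters show where packets stop:
sudo tc -s filter show dev tun0 ingress
sudo tc -s filter show dev eth0 ingress
```

Comparing the two counter sets while pinging narrows the failure down to either the match (no hits) or the redirect action (hits but no forwarded packets).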
Andy R
(1 rep)
Feb 3, 2025, 12:37 PM
4 votes, 1 answer, 7685 views
Is it possible to throttle upload bandwidth per `IP` basis using `tc`, `htb` and `iptables` ? (Download limitation not required)
#### Problem
I've searched the internet but couldn't find much about limiting upload. The solutions given, like this one, do not limit on a per-IP basis but the LAN as a whole.
+-----+
+--------+ | S |
| User A |---+ W |
+--------+ | I |
+--------+ | T | +--------+ +----------+
| User B |---+ C +-----| Router |--------| Internet |
+--------+ | H | +--------+ +----------+
.... ... / ...
+--------+ | H |
| User N |---+ U |
+--------+ | B |
+-----+
- UserA:172.16.10.2
- UserB:172.16.10.3
- RouterPrivate:172.16.0.1
- UserC:172.16.10.4
I want to limit only the upload of 172.16.10.3 & 172.16.10.4 using `tc`, `htb` and `iptables`.
#### What I've already tried
I altered the script as per my requirement.
IF_INET=external
# upload bandwidth limit for interface
BW_MAX=2000
# upload bandwidth limit for 172.16.16.11
BW_CLIENT=900
# first, clear previous settings
tc qdisc del dev ${IF_INET} root
# top-level htb queue discipline; send unclassified data into class 1:10
tc qdisc add dev ${IF_INET} root handle 1: htb default 10
# parent class (wrap everything in this class to allow bandwidth borrowing)
tc class add dev ${IF_INET} parent 1: classid 1:1 htb \
rate ${BW_MAX}kbit ceil ${BW_MAX}kbit
# two child classes
#
# the default child class
tc class add dev ${IF_INET} parent 1:1 \
classid 1:10 htb rate $((${BW_MAX} - ${BW_CLIENT}))kbit ceil ${BW_MAX}kbit
# the child class for traffic from 172.16.16.11
tc class add dev ${IF_INET} parent 1:1 \
classid 1:20 htb rate ${BW_CLIENT}kbit ceil ${BW_MAX}kbit
# classify traffic
tc filter add dev ${IF_INET} parent 1:0 protocol ip prio 1 u32 \
match ip src 172.16.16.11/32 flowid 1:20
but this will *not* work for limiting upload. So what's the solution?
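Since upload from the LAN is egress on the router's Internet-facing interface, the usual per-IP shape is one htb leaf class per host, each selected by a u32 source-address match. A sketch along the lines of the script above; the interface name and rates are placeholders, and this is not a tested configuration:

```shell
IF_INET=eth0   # router's Internet-facing interface (placeholder)

tc qdisc del dev ${IF_INET} root 2>/dev/null
tc qdisc add dev ${IF_INET} root handle 1: htb default 10
tc class add dev ${IF_INET} parent 1: classid 1:1 htb rate 2000kbit ceil 2000kbit

# Default class for everyone else.
tc class add dev ${IF_INET} parent 1:1 classid 1:10 htb rate 1100kbit ceil 2000kbit

# One leaf class per limited host.
tc class add dev ${IF_INET} parent 1:1 classid 1:20 htb rate 450kbit ceil 450kbit
tc class add dev ${IF_INET} parent 1:1 classid 1:30 htb rate 450kbit ceil 450kbit

tc filter add dev ${IF_INET} parent 1:0 protocol ip prio 1 u32 \
    match ip src 172.16.10.3/32 flowid 1:20
tc filter add dev ${IF_INET} parent 1:0 protocol ip prio 1 u32 \
    match ip src 172.16.10.4/32 flowid 1:30
```

The `ceil` on each leaf caps that host even when the line is idle; raising `ceil` toward the parent's rate instead lets an idle line be borrowed.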
Adi
(93 rep)
Jun 11, 2015, 02:25 PM
• Last activity: Jan 28, 2025, 07:06 AM
4 votes, 0 answers, 2511 views
How can I limit bandwidth per connection using tc?
I am new to Linux and the `tc` command, and I have been looking to limit bandwidth per connection using `tc`. I have a server application that handles requests from clients consisting of I/O operations, and I want each request to reach a maximum speed of 50MB/s if there is enough bandwidth (I make sure there are not so many parallel requests that the bandwidth would go lower than 50MB/s per request).
I used `tc` to limit bandwidth, but all connections split 50MB/s instead of each connection getting 50MB/s.
`tc` commands I tried:
tc qdisc add dev eth4 root netem rate 400mbit
or
sudo tc qdisc add dev eth4 root handle 1: htb default 30
sudo tc class add dev eth4 parent 1: classid 1:1 htb rate 100gbit burst 15k
sudo tc class add dev eth4 parent 1:1 classid 1:10 htb rate 400mbit burst 15k
sudo tc class add dev eth4 parent 1:1 classid 1:20 htb rate 400mbit burst 15k
sudo tc filter add dev eth4 protocol ip parent 1: prio 1 u32 match ip dst 0.0.0.0/0 flowid 1:10
sudo tc filter add dev eth4 protocol ip parent 1: prio 1 u32 match ip src 0.0.0.0/0 flowid 1:20
or, since I am in control of both the server and clients and I know that the server handles clients' requests on port 9000, I tried the following on the client machine:
sudo tc qdisc add dev eth4 root handle 1: prio priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sudo tc qdisc add dev eth4 parent 1:2 handle 20: netem rate 400mbit
sudo tc filter add dev eth4 parent 1:0 protocol ip u32 match ip dport 9000 0xffff flowid 1:2
No solution did what I want. I think I need to create a class for each port through which requests are being sent, but I do not know the ports beforehand, since they are automatically selected. Is there a way to create these classes "on the go", or is there another solution?
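One avenue that avoids per-port classes entirely (a sketch, assuming a kernel that ships sch_fq): the `fq` qdisc's `maxrate` option caps each flow (roughly, each connection) individually, instead of splitting one aggregate rate across all of them:

```shell
# 50 MB/s is 400 mbit; with fq, every flow gets its own 400 mbit cap
# rather than sharing one 400 mbit bucket.
sudo tc qdisc replace dev eth4 root fq maxrate 400mbit
```

Since flows are identified by the socket rather than by pre-declared ports, the ephemeral ports chosen at connect time need no configuration at all.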
Ben
(41 rep)
Apr 14, 2022, 12:10 PM
• Last activity: Jan 16, 2025, 07:31 AM
0 votes, 0 answers, 36 views
Tc-Netem not working with a bridge for simulating jitter
I'm using Ubuntu 24.04. I need to make a bridge that simulates jitter, using tc-netem, between two devices, but it is not working with what I'm doing right now.
I have a setup consisting of 3 units. One generates UDP packets (Device A) and sends them to the Ubuntu input interface (enx7cc2c6474599); the Ubuntu box should add jitter and reordering while forwarding through a bridge, called br0, composed of the interfaces enx7cc2c6474599 and enx7cc2c6331825.
Through this bridge it should send the jitter-affected traffic to Device B over the interface enx7cc2c6331825.
The bridge is created with this set of commands in order to work:

sudo ip link add name br0 type bridge
sudo ip link set enx7cc2c6474599 master br0
sudo ip link set enx7cc2c6331825 master br0
sudo ip link set enx7cc2c6474599 up
sudo ip link set enx7cc2c6331825 up
sudo ip link set br0 up
sudo sysctl -w net.ipv4.ip_forward=1
sudo sysctl net.ipv4.conf.all.forwarding
sudo sysctl net.ipv4.conf.default.forwarding
sudo sysctl -p
Then, for testing, I send traffic and I can see it perfectly on Device B. But when I do:
sudo tc qdisc add dev enx7cc2c6474599 root netem delay 10ms 8ms distribution normal
sudo tc qdisc add dev enx7cc2c6331825 root netem delay 10ms 8ms distribution normal
I see no effect: the packets still arrive in the correct order and with no jitter. I also tried using the "tc qdisc" command on br0 (the bridge) itself, with the same or worse results.
Also tried this one:
sudo tc qdisc add dev enx7cc2c6474599 root netem delay 10ms 40ms reorder 25%
sudo tc qdisc add dev enx7cc2c6331825 root netem delay 10ms 40ms reorder 25%
Here is a description of the interfaces:
> br0: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 5e:96:25:d5:26:df brd ff:ff:ff:ff:ff:ff
inet6 fe80::5c96:25ff:fed5:26df/64 scope link
valid_lft forever preferred_lft forever
> enx7cc2c6474599: mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
link/ether 7c:c2:c6:47:45:99 brd ff:ff:ff:ff:ff:ff
> enx7cc2c6331825: mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
link/ether 7c:c2:c6:33:18:25 brd ff:ff:ff:ff:ff:ff
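One detail worth checking (a hypothesis, not a confirmed diagnosis): the interface listings above still show `fq_codel` as the root qdisc, which suggests the netem qdiscs did not actually end up installed. `tc qdisc replace` is idempotent, and the statistics show whether traffic traverses netem at all; for a bridge, delaying only the egress leg toward Device B is sufficient:

```shell
sudo tc qdisc replace dev enx7cc2c6331825 root netem \
    delay 10ms 8ms distribution normal

# The qdisc should now be listed as root, and its packet counter
# should increase while traffic flows toward Device B:
tc -s qdisc show dev enx7cc2c6331825
```

If the counter stays at zero while Device B receives traffic, the packets are taking a path that bypasses this qdisc, which then becomes the thing to investigate.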
Carlos López Martínez
(101 rep)
Nov 20, 2024, 03:52 PM
1 vote, 1 answer, 64 views
QoS on Linux: tc doesn't see RTP traffic
I have a camera that creates RTSP traffic. I connected it to a Linux PC via Ethernet, configured the network and access. But when I tried to apply QoS rules, the tc statistics showed that too few bytes were sent.
After some research, I found that HTTP, SSH and RTSP (connection) traffic from the camera was displayed correctly in the statistics. However, tc seems to work differently with RTP traffic.
Video in VLC was playing, and nft and tcpdump showed the traffic. I tried using Debian 12, Ubuntu 24.04 and Manjaro; it still didn't work. Imitating RTP with FFmpeg also did not bring success. This seems really weird, and I don't know what could cause the problem or what else to try.
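One possibility (an assumption about this setup, not a verified diagnosis): RTP flows on UDP ports negotiated through the RTSP exchange, so a filter matching only the RTSP control port never sees the media stream. A sketch matching a UDP destination-port range instead; the device name, class ids and the 16384-16399 range are placeholders to be replaced with the camera's actual negotiated ports:

```shell
# Match UDP (IP protocol 17) destination ports 16384-16399 via a mask:
# 16384 = 0x4000, and mask 0xfff0 leaves the low 4 bits free.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
    match ip protocol 17 0xff \
    match ip dport 16384 0xfff0 \
    flowid 1:10
```

The negotiated ports are visible in the RTSP SETUP/Transport headers in a tcpdump of the control connection, which makes it possible to confirm whether the existing filters could ever have matched them.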
eXulW0lf
(21 rep)
Sep 29, 2024, 06:49 PM
• Last activity: Oct 10, 2024, 01:38 PM
0 votes, 0 answers, 73 views
Is Linux tc tool only useful for tcp traffic?
Is the `tc` tool only useful for TCP traffic, or can it also be used to control other protocols like UDP traffic?
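tc operates at the network-device layer, below the transport protocols, so it can queue, shape and drop UDP (or any other) traffic just as well as TCP; TCP merely reacts to shaping more visibly because of its congestion control. A minimal sketch singling out UDP with a u32 match (`eth0` and the rates are placeholders):

```shell
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:10 htb rate 1mbit
tc class add dev eth0 parent 1: classid 1:20 htb rate 100mbit
# IP protocol 17 = UDP; everything else falls into the default class.
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip protocol 17 0xff flowid 1:10
```

Note that a shaped UDP sender does not slow down by itself, so excess UDP packets are simply dropped at the qdisc rather than paced the way TCP is.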
Xiaoyong Guo
(101 rep)
Aug 29, 2024, 08:02 AM
• Last activity: Sep 2, 2024, 02:40 PM
0 votes, 1 answer, 296 views
Changing packet payload with tc
How can tc be used to match a particular payload of an ingress packet? E.g., if the first 32 bits of the payload of an IP/UDP packet are equal to some constant `$c`, the value `$c` should be changed to `$d`. This should work in particular for variable-length IP headers.
It appears that the `u32` filter should be able to perform the matching. Is the following attempt correct? I am not sure about the `nexthdr` part in particular.
tc filter add dev protocol ip parent ffff: u32 match $c 0xffffffff at nexthdr+8
Now `pedit` can be used to change the packet, but I don't see a way to write `$d` into the UDP payload of a packet with a variable-length IP header.
Any help is appreciated.
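For what it's worth, a sketch of the pedit half under a simplifying assumption: with the classic (non-`ex`) pedit, `munge offset` counts from the start of the IP header, so for a fixed 20-byte header the first 32 payload bits of UDP sit at offset 28 (20 IP + 8 UDP). This does not solve the variable-IHL case the question asks about, and the exact syntax is unverified; `eth0` is a placeholder:

```shell
# Assumes a fixed 20-byte IP header; breaks when IP options are present.
tc filter add dev eth0 parent ffff: protocol ip u32 \
    match ip protocol 17 0xff \
    match u32 $c 0xffffffff at 28 \
    action pedit munge offset 28 u32 set $d
```

Since neither the u32 fixed-offset match nor pedit's fixed offset follows the IHL field, handling options-bearing headers correctly is exactly where this approach runs out; an eBPF classifier/action can compute `ihl << 2` at runtime, which plain pedit cannot.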
qemvirt
(13 rep)
Oct 19, 2023, 10:19 PM
• Last activity: Apr 28, 2024, 02:30 PM
0 votes, 1 answer, 167 views
How to deterministically vary the delay in programs like netem?
I am trying to set up a network scenario in which there is a variable delay between two nodes. Netem allows setting up a fixed delay and adding jitter according to some probabilistic distribution. However, I would like to achieve a delay that varies deterministically according to a similar law.
rul_h
(1 rep)
Oct 9, 2022, 11:44 AM
• Last activity: Mar 27, 2024, 05:57 PM
1 vote, 1 answer, 358 views
Traffic shaping ineffective on tun device
I am developing a tunnel application that will provide a low-latency, variable-bandwidth link. This will be operating in a system that requires traffic prioritization. However, while traffic towards the tun device is clearly being queued by the kernel, whatever qdisc I apply to the device appears to have no additional effect, including the default pfifo_fast; i.e., what should be high-priority traffic is not being handled separately from normal traffic.
I have made a small test application to demonstrate the problem. It creates two tun devices and has two threads each with a loop passing packets from one interface to the other and back, respectively. Between receiving and sending the loop delays 1us for every byte, roughly emulating an 8Mbps bidirectional link:
void forward_traffic(int src_fd, int dest_fd) {
char buf[BUFSIZE];
ssize_t nbytes = 0;
while (nbytes >= 0) {
nbytes = read(src_fd, buf, sizeof(buf));
if (nbytes >= 0) {
usleep(nbytes);
nbytes = write(dest_fd, buf, nbytes);
}
}
perror("Read/write TUN device");
exit(EXIT_FAILURE);
}
With each tun interface placed in its own namespace, I can run iperf3 and get about 8Mbps of throughput. The default txqlen reported by ip link is 500 packets, and when I run iperf3 (-P 20) and a ping at the same time I see RTTs of about 670-770ms, roughly corresponding to 500 x 1500 bytes of queue. Indeed, changing txqlen changes the latency proportionally. So far so good.
With the default pfifo_fast qdisc I would expect a ping with the right ToS mark to skip that normal queue and give me low latency; e.g., ping -Q 0x10 should, I think, have a much lower RTT, but it doesn't (I have tried other ToS/DSCP values as well; they all have the same ~700ms RTT). Additionally, I have tried various other qdiscs with the same results; e.g., fq_codel doesn't have a significant effect on latency.
Regardless of the qdisc, `tc -s qdisc` always shows a backlog of 0, whether or not the link is congested. (But I do see dropped packets in `ip -s link show` under congestion.)
Am I fundamentally misunderstanding something here or there something else I need to do make the qdisc effective?
Complete source here
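One hypothesis consistent with the numbers above (RTT tracking txqlen, qdisc backlog always 0): the 500-packet queue may be building in the tun device's own transmit queue, after the qdisc has already dequeued, where no prioritization applies. Shrinking the driver queue pushes the standing queue back up into the qdisc; a sketch, not a confirmed fix:

```shell
# Keep the device's own queue tiny so congestion backs up into the qdisc.
ip link set dev tun0 txqueuelen 10
tc qdisc replace dev tun0 root pfifo_fast

# After re-running iperf3 + ping, a nonzero backlog here would confirm
# the qdisc is now the place where packets wait:
tc -s qdisc show dev tun0
```

If the backlog becomes nonzero and the -Q 0x10 ping's RTT drops, the prioritization was working all along; the queue was just in the wrong layer.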
sheddenizen
(111 rep)
Dec 2, 2023, 06:05 PM
• Last activity: Dec 27, 2023, 03:42 PM
0 votes, 2 answers, 419 views
Can netfilter act as a DHCP relay?
I'm wondering whether, instead of using a DHCP relay, netfilter (be that `tc` or `nftables`) can be used to route DHCP broadcast packets to a Docker container attached to a bridge.
The reasoning for this is that I'd like to move away from having to use a `macvlan` DHCP container, so it can appear as if one IP (i.e. the router IP) is handling all of the network operations. DHCP containers usually require `CAP_NET_ADMIN` (due to DHCP requiring promiscuous mode) and I understand that without a `macvlan` this would give control over the host's network stack (I also `userns-remap` my containers).
It would be great if it were possible to modify the DHCP packets and forward them on. A relay wouldn't work here as it would still require the same `macvlan` approach as the DHCP container already has.
Is this something that's possible? Thanks
Synthetic Ascension
(249 rep)
Aug 19, 2023, 09:20 AM
• Last activity: Nov 11, 2023, 01:00 PM
1 vote, 1 answer, 510 views
MAC address rewriting using tc
I am using `tc` to change the MAC address of incoming packets on a TAP interface (`tap0`) as follows, where `mac_org` is the MAC address of a guest in a QEMU virtual machine and `mac_new` is a different MAC address that `mac_org` should be replaced with.
tc qdisc add dev tap0 ingress handle ffff:
tc filter add dev tap0 protocol ip parent ffff: \
flower src_mac ${mac_org} \
action pedit ex munge eth src set ${mac_new} pipe \
action csum ip pipe \
action xt -j LOG
I also add an iptables rule to log UDP packets on the input hook.
iptables -A INPUT -p udp -j LOG
syslog shows that indeed the DHCP discover packet is changed accordingly. The `tc` log entry looks as follows:
IN=tap0 OUT= MAC=ff:ff:ff:ff:ff:ff:${mac_new}:08:00 SRC=0.0.0.0 DST=255.255.255.255 LEN=338 TOS=0x00 PREC=0xC0 TTL=64 ID=0 DF PROTO=UDP SPT=68 DPT=67 LEN=318
and the log entry of the netfilter input hook, which follows the `tc` ingress hook as the locally incoming packet is passed towards the socket, shows the same result, slightly differently formatted.
IN=tap0 OUT= MACSRC=${mac_new} MACDST=ff:ff:ff:ff:ff:ff MACPROTO=0800 SRC=0.0.0.0 DST=255.255.255.255 LEN=338 TOS=0x00 PREC=0xC0 TTL=64 ID=0 DF PROTO=UDP SPT=68 DPT=67 LEN=318
Before starting QEMU I run `dnsmasq` on `tap0`, which surprisingly shows the output:
DHCPDISCOVER(tap0) ${mac_org}
Running `strace -f -x -s 10000 -e trace=network dnsmasq ...` shows a `recvmsg` call that contains `${mac_org}` instead of `${mac_new}`:
recvmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, msg_namelen=16, msg_iov=[{iov_base="... ${mac_org} ..." ...
How can that happen? It almost appears as if the packet is altered after the netfilter input hook.
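A possible explanation worth checking (an assumption, not a confirmed diagnosis): dnsmasq reports the client hardware address from the chaddr field inside the BOOTP/DHCP payload, and the pedit rule above rewrites only the Ethernet source header, leaving chaddr untouched. The chaddr field sits at a fixed offset, so a capture can show which MAC lives where; the tshark field name below is from Wireshark's BOOTP dissector and may be `dhcp.hw.mac_addr` on newer versions:

```shell
# chaddr offset: 14 (Ethernet) + 20 (IP) + 8 (UDP) + 28 (BOOTP header
# fields op/htype/hlen/hops/xid/secs/flags/ciaddr/yiaddr/siaddr/giaddr)
# = byte 70 of the frame.
sudo tshark -i tap0 -f 'udp port 67' -c 1 -T fields \
    -e eth.src -e bootp.hw.mac_addr
```

If the capture shows `${mac_new}` in eth.src but `${mac_org}` in the payload field, the packet was never altered after the netfilter hook; dnsmasq simply reads a different copy of the address.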
qemvirt
(13 rep)
Oct 15, 2023, 10:18 PM
• Last activity: Oct 16, 2023, 12:06 AM
1 vote, 1 answer, 918 views
Redirect port using TC BPF
I want to use TC BPF to redirect incoming traffic from port 80 to port 8080.
Below is my own code, but I've also tried the example from [man 8 tc-bpf](https://man7.org/linux/man-pages/man8/tc-bpf.8.html) (search for `8080`) and I get the same result.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/types.h>
#include <stddef.h>
#include <asm/byteorder.h>
#include <bpf/bpf_helpers.h>
static inline void set_tcp_dport(struct __sk_buff *skb, int nh_off,
__u16 old_port, __u16 new_port)
{
bpf_l4_csum_replace(skb, nh_off + offsetof(struct tcphdr, check),
old_port, new_port, sizeof(new_port));
bpf_skb_store_bytes(skb, nh_off + offsetof(struct tcphdr, dest),
&new_port, sizeof(new_port), 0);
}
SEC("tc_my")
int tc_bpf_my(struct __sk_buff *skb)
{
struct iphdr ip;
struct tcphdr tcp;
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr), &ip, sizeof(struct iphdr))) {
bpf_printk("bpf_skb_load_bytes iph failed");
return TC_ACT_OK;
}
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr) + (ip.ihl << 2), &tcp, sizeof(struct tcphdr))) {
bpf_printk("bpf_skb_load_bytes tcph failed");
return TC_ACT_OK;
}
__u16 src_port = __constant_ntohs(tcp.source);
__u16 dst_port = __constant_ntohs(tcp.dest);
bpf_printk("%pI4:%u -> %pI4:%u", &ip.saddr, src_port, &ip.daddr, dst_port);
if (dst_port != 80)
return TC_ACT_OK;
set_tcp_dport(skb, ETH_HLEN + sizeof(struct iphdr), __constant_htons(80), __constant_htons(8080));
return TC_ACT_OK;
}
char LICENSE[] SEC("license") = "GPL";
On machine A, I am running:
clang -g -O2 -Wall -target bpf -c tc_my.c -o tc_my.o
tc qdisc add dev ens160 clsact
tc filter add dev ens160 ingress bpf da obj tc_my.o sec tc_my
nc -l 8080
On machine B:
nc $IP_A 80
On machine B, `nc` seems connected, but `ss` shows:
SYN-SENT 0 1 $IP_B:53442 $IP_A:80 users:(("nc",pid=30180,fd=3))
On machine A, the connection remains in SYN-RECV before being dropped.
I was expecting my program to behave as if I added this `iptables` rule:
iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-port 8080
Maybe my expectations are wrong, but I would like to understand why. How can I get my TC BPF redirect to work?
SOLUTION
-----------------
Following the explanation in my accepted answer, here is an example code which works for TCP, does ingress NAT 90->8080, and egress de-NAT 8080->90.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/types.h>
#include <stddef.h>
#include <asm/byteorder.h>
#include <bpf/bpf_helpers.h>
static inline void set_tcp_dport(struct __sk_buff *skb, int nh_off,
__u16 old_port, __u16 new_port)
{
bpf_l4_csum_replace(skb, nh_off + offsetof(struct tcphdr, check),
old_port, new_port, sizeof(new_port));
bpf_skb_store_bytes(skb, nh_off + offsetof(struct tcphdr, dest),
&new_port, sizeof(new_port), 0);
}
static inline void set_tcp_sport(struct __sk_buff *skb, int nh_off,
__u16 old_port, __u16 new_port)
{
bpf_l4_csum_replace(skb, nh_off + offsetof(struct tcphdr, check),
old_port, new_port, sizeof(new_port));
bpf_skb_store_bytes(skb, nh_off + offsetof(struct tcphdr, source),
&new_port, sizeof(new_port), 0);
}
SEC("tc_ingress")
int tc_ingress_(struct __sk_buff *skb)
{
struct iphdr ip;
struct tcphdr tcp;
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr), &ip, sizeof(struct iphdr)))
{
bpf_printk("bpf_skb_load_bytes iph failed");
return TC_ACT_OK;
}
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr) + (ip.ihl << 2), &tcp, sizeof(struct tcphdr))) {
bpf_printk("bpf_skb_load_bytes tcph failed");
return TC_ACT_OK;
}
__u16 src_port = __constant_ntohs(tcp.source);
__u16 dst_port = __constant_ntohs(tcp.dest);
bpf_printk("%pI4:%u -> %pI4:%u", &ip.saddr, src_port, &ip.daddr, dst_port);
if (dst_port != 90)
return TC_ACT_OK;
set_tcp_dport(skb, ETH_HLEN + sizeof(struct iphdr), __constant_htons(90), __constant_htons(8080));
return TC_ACT_OK;
}
SEC("tc_egress")
int tc_egress_(struct __sk_buff *skb)
{
struct iphdr ip;
struct tcphdr tcp;
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr), &ip, sizeof(struct iphdr)))
{
bpf_printk("bpf_skb_load_bytes iph failed");
return TC_ACT_OK;
}
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr) + (ip.ihl << 2), &tcp, sizeof(struct tcphdr)))
{
bpf_printk("bpf_skb_load_bytes tcph failed");
return TC_ACT_OK;
}
__u16 src_port = bpf_ntohs(tcp.source);
__u16 dst_port = bpf_ntohs(tcp.dest);
bpf_printk("%pI4:%u -> %pI4:%u", &ip.saddr, src_port, &ip.daddr, dst_port);
if (src_port != 8080)
return TC_ACT_OK;
set_tcp_sport(skb, ETH_HLEN + sizeof(struct iphdr), __constant_htons(8080), __constant_htons(90));
return TC_ACT_OK;
}
char LICENSE[] SEC("license") = "GPL";
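As background for the helpers above: bpf_l4_csum_replace does not recompute the whole TCP checksum; it applies the RFC 1624 incremental update for the one 16-bit field that changed. A small Python sketch of that update (the function names here are mine, not kernel API):

```python
# Incremental ones'-complement checksum update, as in RFC 1624:
# HC' = ~(~HC + ~m + m'), which is what bpf_l4_csum_replace does for a
# 16-bit field, instead of summing the whole segment again.
def csum16(data: bytes) -> int:
    """Full recompute: ones'-complement sum folded to 16 bits."""
    s = 0
    for i in range(0, len(data) - 1, 2):
        s += (data[i] << 8) | data[i + 1]
    if len(data) % 2:
        s += data[-1] << 8
    while s > 0xFFFF:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def csum_replace16(check: int, old: int, new: int) -> int:
    """Fix an existing checksum after one 16-bit field changed."""
    s = (~check & 0xFFFF) + (~old & 0xFFFF) + new
    while s > 0xFFFF:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

before = bytes.fromhex("1234005aabcd")   # field 0x005a (port 90)
after  = bytes.fromhex("12341f90abcd")   # field 0x1f90 (port 8080)
print(csum_replace16(csum16(before), 0x005A, 0x1F90) == csum16(after))  # True
```

The incremental form matters in BPF because the verifier-friendly helper only needs the old and new field values, not access to the whole segment.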
Here is how I built and loaded the different sections of my program:
clang -g -O2 -Wall -target bpf -c tc_my.c -o tc_my.o
tc filter add dev ens32 ingress bpf da obj /tc_my.o sec tc_ingress
tc filter add dev ens32 egress bpf da obj /tc_my.o sec tc_egress
greenro
(13 rep)
Sep 14, 2023, 01:04 PM
• Last activity: Sep 15, 2023, 09:25 AM
1
votes
0
answers
388
views
How to set bandwidth limit using linux tc
In my Linux router:
1. Interface eth1's total bandwidth is 1gbit.
2. I want to allocate 1140kbit to GroupA and 150kbit to GroupB.
3. Users 10.10.10.158, 10.10.21.5 and 10.10.21.6 belong to GroupB.
4. Each user gets no more than 128kbit of bandwidth.
5. The three users together get no more than 150kbit of total bandwidth.
Following is what I set:
sudo tc qdisc del dev eth1 root 2>/dev/null
sudo tc qdisc add dev eth1 root handle 1: htb default 2
sudo tc class add dev eth1 parent 1: classid 1:1 htb rate 1gbit ceil 1gbit
sudo tc class add dev eth1 parent 1:1 classid 1:2 htb rate 10kbps ceil 10kbps
sudo tc class add dev eth1 parent 1:1 classid 1:10 htb rate 1140kbit ceil 1140kbit
sudo tc class add dev eth1 parent 1:1 classid 1:20 htb rate 128kbit ceil 128kbit
sudo tc class add dev eth1 parent 1:20 classid 1:21 htb rate 128kbit ceil 128kbit
sudo tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dst 10.10.10.158/32 flowid 1:21
sudo tc class add dev eth1 parent 1:20 classid 1:22 htb rate 128kbit ceil 128kbit
sudo tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dst 10.10.21.5/32 flowid 1:22
sudo tc class add dev eth1 parent 1:20 classid 1:23 htb rate 128kbit ceil 128kbit
sudo tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dst 10.10.21.6/32 flowid 1:23
However, I found the three users' total bandwidth is about 376kbit.
What should I do to achieve my goal?
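For what it's worth, the arithmetic above already hints at what happened: HTB treats each class's rate as a guaranteed minimum, and the three leaves under 1:20 each have rate 128kbit, oversubscribing their parent (which is itself set to 128kbit, not the intended 150kbit). A sketch of that check (my reading of the HTB semantics, not a tc tool):

```python
# Sanity arithmetic for the HTB tree above. HTB guarantees every class
# its "rate"; borrowing up to "ceil" applies only beyond that floor.
# When leaf rates oversubscribe the parent, the leaves can jointly
# exceed the parent's rate/ceil.
parent_rate_kbit = 128             # class 1:20 as configured (150 intended)
leaf_rates_kbit = [128, 128, 128]  # classes 1:21, 1:22, 1:23

guaranteed_total = sum(leaf_rates_kbit)  # floor HTB will honour
oversubscribed = guaranteed_total > parent_rate_kbit
print(guaranteed_total, oversubscribed)  # 384 True
```

The observed ~376kbit is consistent with that 384kbit floor. Under this reading, the fix would be to give 1:20 rate/ceil 150kbit and each leaf a rate of about 50kbit with ceil 128kbit, so the leaf rates no longer exceed the parent's rate.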
ackema
(11 rep)
Aug 28, 2023, 06:19 AM
0
votes
0
answers
63
views
Can no longer ping containers after setting TBF qdisc on Docker0
I am trying to use the tc command to manipulate traffic on the docker0 interface.
I run the commands
tc qdisc del dev docker0 root
tc qdisc add dev docker0 root handle 1: tbf rate 100mbps burst 1600 limit 1
I believe this is what it does:
- tbf: Specifies the TBF qdisc to be used.
- rate 100mbps: Sets the maximum bandwidth rate to 100 Mbps for the docker0 interface.
- burst 1600: Sets the maximum amount of data that can be transmitted in a single burst to 1600 bytes.
- limit 1: Limits the token bucket size to 1 token, which limits the amount of data that can be sent at any given time to the burst size.
However, after setting this rule, I can no longer ping containers that are already running and attached to the default docker0 interface. I can also no longer build images that contain commands such as RUN apt-get update -y.
Why is this the case? Can this qdisc configuration not be used on its own?
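A toy model of the enqueue decision (a sketch of the documented semantics, not tc's code) suggests the culprit is the limit parameter: limit is the maximum backlog in bytes, so with limit 1 essentially every real packet is dropped on arrival, before rate or burst even come into play.

```python
# Toy TBF enqueue decision: "limit" bounds how many bytes may wait in
# the queue. A packet that would push the backlog past the limit is
# dropped immediately, regardless of available tokens.
def tbf_enqueue(pkt_len: int, backlog: int, limit: int) -> bool:
    """True if the packet fits in the queue, False if it is dropped."""
    return backlog + pkt_len <= limit

print(tbf_enqueue(98, 0, 1))      # 98-byte ICMP echo vs limit 1 -> False
print(tbf_enqueue(98, 0, 10000))  # a more typical limit -> True
```

Under this reading, raising limit to a few packets' worth of bytes (or using the latency parameter instead) should restore connectivity.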
akastack
(73 rep)
May 16, 2023, 01:32 AM
1
votes
0
answers
1447
views
tc filter - error talking to the kernel
I am trying to add a tc flower filter for the geneve protocol and I am getting this error:
% sudo tc filter add dev gnv0 protocol ip parent ffff: \
flower geneve_opts 0108:01:020000000000000000/FFFF:FF:FF0000000000000000,0108:02:020000000000000000/FFFF:FF:FF0000000000000000,0108:03:0100000000/FFFF:FF:FF00000000 \
action tunnel_key unset
RTNETLINK answers: No such file or directory
We have an error talking to the kernel
I am using Amazon Linux:
% uname -a
Linux ip-10-0-40-230.ec2.internal 4.14.311-233.529.amzn2.x86_64 #1 SMP Thu Mar 23 09:54:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
I want to terminate a GENEVE UDP tunnel coming from an AWS Gateway load balancer which is mirroring traffic. The idea is to decap the packet into its original form. Note that I have also tried vxlan_opts with no luck either, so the issue seems more specific to tc than to the filter, in my opinion.
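For reference, each geneve_opts element has the shape CLASS:TYPE:DATA in hex, optionally followed by /MASK in the same layout. A small parser sketch (my helper, not part of iproute2) to pull the strings above apart:

```python
# Parse one tc-flower geneve_opts element: "CLASS:TYPE:DATA[/MASK]".
# CLASS is a 16-bit option class, TYPE an 8-bit option type, DATA the
# raw option payload as hex.
def parse_geneve_opt(opt: str) -> dict:
    value = opt.split("/")[0]     # the mask (if any) has the same layout
    cls, typ, data = value.split(":")
    assert len(cls) == 4 and len(typ) == 2
    return {"class": int(cls, 16), "type": int(typ, 16),
            "data": bytes.fromhex(data)}

opt = parse_geneve_opt("0108:01:020000000000000000/FFFF:FF:FF0000000000000000")
print(opt["class"], opt["type"], len(opt["data"]))  # 264 1 9
```

Two hedged observations: the data parts here are 9 and 5 bytes, while Geneve (RFC 8926) defines option data in 4-byte multiples, which may or may not matter for flower; and the AL2023 error "Failed to load TC action module" suggests the act_tunnel_key kernel module (and possibly cls_flower) is simply not available, which would also explain the RTNETLINK "No such file or directory".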
------
I have loaded some kernel modules that were suggested online (not sure they are necessary or sufficient):
% lsmod | grep sch
sch_htb 24576 0
sch_netem 20480 0
sch_ingress 16384 1
Full example:
sudo yum install tc
sudo modprobe sch_netem
sudo modprobe sch_htb
sudo ip link add name gnv0 type geneve dstport 6081 external
sudo ip link set gnv0 up
sudo tc qdisc add dev gnv0 ingress
sudo tc filter add dev gnv0 protocol ip parent ffff: \
flower geneve_opts 0108:01:020000000000000000/FFFF:FF:FF0000000000000000,0108:02:020000000000000000/FFFF:FF:FF0000000000000000,0108:03:0100000000/FFFF:FF:FF00000000 \
action tunnel_key unset
I tried it on AL2023 and got a similar error:
sudo ip link add name gnv0 type geneve dstport 6081 external
sudo ip link set gnv0 up
sudo tc qdisc add dev gnv0 ingress
sudo tc filter add dev gnv0 protocol ip parent ffff: \
flower geneve_opts 0108:01:020000000000000000/FFFF:FF:FF0000000000000000,0108:02:020000000000000000/FFFF:FF:FF0000000000000000,0108:03:0100000000/FFFF:FF:FF00000000 \
action tunnel_key unset
Error: Failed to load TC action module.
We have an error talking to the kernel
Ollie
(199 rep)
May 15, 2023, 05:06 PM
1
votes
1
answers
729
views
How to police ingress (input) packets belonging to a cgroup with iptables and tc?
I am trying to limit the download (ingress) rate for a certain app within a cgroup.
I was able to limit the upload (egress) rate successfully by marking the app's OUTPUT packets in iptables and then setting a tc filter to handle the marked packets.
However, when I did the same steps for ingress it didn't work.
------------------
steps I followed to limit **upload**:
1. Mark OUTPUT packets by their cgroup
$ sudo iptables -I OUTPUT -t mangle -m cgroup --path '/user.slice/.../app-firefox-...scope'\
-j MARK --set-mark 11
2. filter by fw mark (11) on the root qdisc
$ tc qdisc add dev $IFACE root handle 1: htb default 1
$ tc filter add dev $IFACE parent 1: protocol ip prio 1 handle 11 fw \
action police rate 1000kbit burst 10k drop
This limited the upload rate for firefox to 1000kbit successfully.
--------------
steps I followed trying to limit **download**:
1. Mark INPUT packets by their cgroup
$ sudo iptables -I INPUT -t mangle -m cgroup --path '/user.slice/.../app-firefox-...scope'\
-j MARK --set-mark 22
2. filter by fw mark (22) on the ingress qdisc
$ tc qdisc add dev $IFACE ingress handle ffff:
$ tc filter add dev $IFACE parent ffff: protocol ip prio 1 handle 22 fw \
action police rate 1000kbit burst 10k drop
-------
I am able to block app's download successfully with iptables:
$ sudo iptables -I INPUT -t mangle -m cgroup --path '/user.slice/.../app-firefox-....scope' -j DROP
So it seems like iptables is marking the cgroup's input packets, but for some reason tc can't filter them. Or maybe the packets are consumed before the tc filter takes effect? If so, then what is the use of marking input packets?
If there is a way to block cgroup's input packets then there must be a way to limit them, right?
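One hedged reading of the failure: on receive, the tc ingress hook runs before netfilter's mangle INPUT chain, so a mark set in INPUT is applied after the ingress filter has already looked for it; the egress case works because mangle OUTPUT runs before the root qdisc. A toy ordering model of where a mark becomes visible:

```python
# Toy model of hook ordering (assumed traversal:
# RX: tc ingress -> netfilter PREROUTING/INPUT -> socket
# TX: socket -> netfilter OUTPUT/POSTROUTING -> tc egress root qdisc).
RX_ORDER = ["tc_ingress", "nf_mangle_input"]
TX_ORDER = ["nf_mangle_output", "tc_egress"]

def mark_visible(order, marker, observer):
    """A mark set at `marker` is visible to `observer` only if the
    marker hook runs earlier in the traversal order."""
    return order.index(marker) < order.index(observer)

print(mark_visible(TX_ORDER, "nf_mangle_output", "tc_egress"))  # True
print(mark_visible(RX_ORDER, "nf_mangle_input", "tc_ingress"))  # False
```

If this reading is right, one commonly suggested direction is tc's connmark action (see tc-connmark(8)): save the mark to the connection with CONNMARK on egress, then restore it at the ingress hook itself before the fw filter runs.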
user216385
(63 rep)
Apr 29, 2023, 05:32 AM
• Last activity: Apr 29, 2023, 11:47 PM