Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
1 vote, 2 answers, 3321 views
How do I verify the parameters set using tc command?
I am in need of simulating a high-latency and low-bandwidth connection for a performance test of my application. I have gone through a number of pages describing the `tc` command, but I haven't been able to validate the numbers that I set. For example:
I took the following command values from:
https://www.excentis.com/blog/use-linux-traffic-control-impairment-node-test-environment-part-2
tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms
With that applied on (say) machine A, according to the description on the page, I am assuming my output rate should be 128 kBps (at least approximately). To test this, I started transferring a 2 GB file using scp from machine A to another machine "B" in the same LAN. Transfer rates without any added impairment reach up to 12 MBps in this network. But when the transfer started, the rate was at 2 MBps; then it kept stalling and falling until it started to swing and stall between 11 kBps and 24 kBps.
I used nmon to monitor network throughput on both sides during the transfer, but it never went above 24 kBps (except for a couple of values reading 54 and 62).
I have also tried increasing the rate and bucket size, but the behavior during scp is the same. I tried the following command to increase the bucket size and the rate:
tc qdisc add dev eth0 root tbf rate 1024kbps burst 1024kb latency 500
And scp still stalled and swung around the same rates (11-30 kBps).
Am I interpreting the term "rate" wrongly here? I have looked at the man page for tc and it appears that my interpretation is correct. Could anyone explain to me the best way to test the parameters I set (assuming I did set them correctly)?
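As a side note, the numbers implied by the first command can be sanity-checked with a little arithmetic before bringing scp into it. This is a sketch; it assumes tc's decimal units as documented in its man page (1mbit = 10^6 bit/s, 32kbit = 32000 bit), which gives 125 kB/s rather than 128:

```shell
# Rough arithmetic for "tbf rate 1mbit burst 32kbit latency 400ms".
rate_bps=1000000              # token refill rate
burst_bytes=$((32000 / 8))    # bucket size: 4000 bytes
latency_ms=400                # max time a packet may wait in the queue

echo "steady-state throughput: $((rate_bps / 8 / 1000)) kB/s"
echo "implied queue limit:     $(( (rate_bps / 8 * latency_ms / 1000 + burst_bytes) / 1000 )) kB"
```

If a plain iperf3 run through the same qdisc holds near that steady-state figure while scp still swings, the difference is down to scp's own flow control and encryption overhead rather than tbf; `tc -s qdisc show dev eth0` additionally shows sent bytes, drops and overlimits for the qdisc itself.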
james
(11 rep)
Nov 18, 2015, 03:39 PM
• Last activity: Jul 17, 2025, 09:09 PM
1 vote, 0 answers, 19 views
Why does netem delay not work when netem loss does
I am using `tc` to test the behaviour of a networked app under various network conditions. The setup is like this:
if [ -z "$(tc qdisc show dev ${MAIN_LINK} ingress)" ]
then
sudo tc qdisc add dev ${MAIN_LINK} handle ffff: ingress
fi
sudo tc filter del dev ${MAIN_LINK} ingress
sudo tc filter add dev ${MAIN_LINK} parent ffff: protocol ip u32 match ip dport 20780 0xffff match ip protocol 17 0xff action mirred egress redirect dev ${BRIDGE}
sudo tc qdisc add dev ${MAIN_LINK} root handle 1: prio
sudo tc filter add dev ${MAIN_LINK} parent 1: protocol ip prio 1 u32 flowid 1:1 match ip dport 20780 0xffff match ip protocol 17 0xff
If I add packet loss, using a loop to ramp it up bit by bit like this, then it works:
tc qdisc add dev "${MAIN_LINK}" parent 1:1 netem loss random ${LEVEL}%
tc qdisc add dev "${BRIDGE}" root handle 1: netem loss random ${LEVEL}%
If I add packet delay, again ramping it up bit by bit like this, then it has no effect that I can see at all:
tc qdisc add dev "${MAIN_LINK}" parent 1:1 netem delay ${DELAY} ${JITTER} distribution normal
tc qdisc add dev "${BRIDGE}" root netem delay ${DELAY} ${JITTER} distribution normal
Values of `DELAY` and `JITTER` went up to 1870 and 1530 (340 to 3400 ms delay) and there was no apparent effect at all.
How do I get packet delay to work? Why does packet loss work but packet delay does not?
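For what it's worth, one generic pitfall with ramp loops (an assumption about code not shown here, not a confirmed diagnosis): `tc qdisc add` fails once a qdisc already exists at that position, so later iterations of a loop may silently change nothing. A sketch using `replace`, which creates the qdisc on the first pass and updates it afterwards; the explicit `ms` units are also an assumption, since netem treats bare numbers ambiguously across versions:

```shell
# Create the netem qdisc on the first pass, update it on later ones.
tc qdisc replace dev "${MAIN_LINK}" parent 1:1 netem \
    delay "${DELAY}ms" "${JITTER}ms" distribution normal
tc qdisc replace dev "${BRIDGE}" root netem \
    delay "${DELAY}ms" "${JITTER}ms" distribution normal

# Verify the parameters actually in effect after each step:
tc qdisc show dev "${MAIN_LINK}"
```

Comparing the output of the final `show` against the intended ramp level tells you immediately whether the delay parameters are reaching the kernel at all.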
AlastairG
(213 rep)
Jul 8, 2025, 09:25 AM
2 votes, 1 answer, 2168 views
Using qdisc prio under htb class
I have 2 services, both operate over the same interface.
Service A's goal is to keep high bandwidth while sending massive amounts of data.
Service B's goal is low latency.
Service B packets should **always** take precedence over Service A's packets.
I need a TC structure to be able to:
- Rate limit both services A & B
- Give service B packets priority, with no added latency caused by service A packets
- Let each service utilize the whole line (or up to its limit) if the other service isn't transmitting.
I tried an htb structure where I have `class htb classid x` (which may be rate/ceil limited) with a `qdisc prio` (say, handle y:0) below it as a child (it auto-creates classes y:1, y:2 & y:3), and used filters by src IP to redirect packets to y:1 / y:2.
However, it doesn't seem to work.
Both class x and its children's traffic counters seem to be 0 (I used `tc -s class/qdisc/filter show dev dev` to check).
When watching the filters I can clearly see the "hits", so the data was supposed to get redirected correctly.
Here are the commands I execute:
tc qdisc add dev dev root handle 1: htb
tc class add dev dev parent 1:0 classid 1:1 htb rate 10gbit ceil 10gbit
# class x
tc class add dev dev parent 1:1 classid 1:2 htb rate 10gbit ceil 10gbit
# auto creates classes 21:1, 21:2 and 21:3
tc qdisc add dev dev parent 1:2 handle 21: prio
# example for service b filter (latency driven)
tc filter add dev dev parent 1:0 prio 2 u32 match ip src x.x.x.x/32 flowid 21:1
# example for service a filter
tc filter add dev dev parent 1:0 prio 2 u32 match ip src x.x.x.x/32 flowid 21:2
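For comparison, a hedged sketch of the two-stage classification usually shown for nested qdiscs: one filter at the htb root steering traffic into the htb leaf, and a second filter attached to the prio qdisc itself choosing the band. This is a sketch under the question's handles, not a verified fix; `eth0` and `192.0.2.1` are placeholders:

```shell
# Stage 1: at the htb root, classify the flow into htb leaf class 1:2.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
    match ip src 192.0.2.1/32 flowid 1:2

# Stage 2: on the prio qdisc (handle 21:), pick the high-priority band.
tc filter add dev eth0 parent 21: protocol ip prio 1 u32 \
    match ip src 192.0.2.1/32 flowid 21:1
```

With a single root filter whose flowid points into the child qdisc's classes, htb itself never receives a classification, which would be consistent with the zero counters observed despite filter hits.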
SagiLow
(287 rep)
Jul 18, 2016, 06:48 PM
• Last activity: Jun 24, 2025, 04:02 AM
2 votes, 1 answer, 60 views
How to mark 802.1Q ethernet frame with PCP bits according to encapsulated IP header IP Precedence bits
I would like the IP header IP Precedence bits to be copied into 802.1Q PCP bits for outgoing traffic sourced from the host in question. Specifically for iperf3 and ping utilities.
I have failed to set PCP bits for pings.
OS: Fedora release 38, "Server Edition", with NetworkManager.
eno2      ethernet  eno2
eno2.814  vlan      eno2.814
ip -d link show eno2
3: eno2: mtu 1600 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ac:16:2d:72:3f:fd brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 60 maxmtu 9000 addrgenmode none numtxqueues 5 numrxqueues 5 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 parentbus pci parentdev 0000:03:00.1
altname enp3s0f1
ip -d link show eno2.814
10: eno2.814@eno2: mtu 1600 qdisc pfifo state UP mode DEFAULT group default qlen 1000
link/ether ac:16:2d:72:3f:fd brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 0 maxmtu 65535
vlan protocol 802.1Q id 814
ingress-qos-map { 1:1 2:2 3:3 4:4 5:5 6:6 7:7 }
egress-qos-map { 1:1 2:2 3:3 4:4 5:5 6:6 7:7 } addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
cat /proc/net/vlan/eno2.814
eno2.814 VID: 814 REORDER_HDR: 0 dev->priv_flags: 81021
total frames received 294
total bytes received 21846
Broadcast/Multicast Rcvd 0
total frames transmitted 271
total bytes transmitted 23846
Device: eno2
INGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
EGRESS priority mappings: 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
Ping command to send 8 requests:
for pcp in 0x00 0x20 0x40 0x60 0x80 0xA0 0xC0 0xE0; do ping 192.168.22.3 -w2 -c1 -Q $pcp ; done
Sent packets are captured on the outgoing interface with "tshark -i eno2 -f 'icmp and dst host 192.168.22.3' -V".
Grepping for the L2 and L3 CoS fields in the headers shows the intended DSCP values, but '000' PCP "Priority" values:
000. .... .... .... = Priority: Best Effort (default) (0)
0000 00.. = Differentiated Services Codepoint: Default (0)
000. .... .... .... = Priority: Best Effort (default) (0)
0010 00.. = Differentiated Services Codepoint: Class Selector 1 (8)
000. .... .... .... = Priority: Best Effort (default) (0)
0100 00.. = Differentiated Services Codepoint: Class Selector 2 (16)
000. .... .... .... = Priority: Best Effort (default) (0)
0110 00.. = Differentiated Services Codepoint: Class Selector 3 (24)
000. .... .... .... = Priority: Best Effort (default) (0)
1000 00.. = Differentiated Services Codepoint: Class Selector 4 (32)
000. .... .... .... = Priority: Best Effort (default) (0)
1010 00.. = Differentiated Services Codepoint: Class Selector 5 (40)
000. .... .... .... = Priority: Best Effort (default) (0)
1100 00.. = Differentiated Services Codepoint: Class Selector 6 (48)
000. .... .... .... = Priority: Best Effort (default) (0)
1110 00.. = Differentiated Services Codepoint: Class Selector 7 (56)
What I've tried that hasn't helped:
Switching off reorder_hdr:
ip link set eno2.814 type vlan reorder_hdr off
Setting the vlan egress-qos-map to map kernel priority values (which IMHO should already be set equal to the IP precedence values of the ping utility) to PCP:
ip link set eno2.814 type vlan egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
Setting the outgoing interface qdisc. I created eno2.814 on eno2 with nmtui and no qdisc was set by default, so I thought that could be the problem and tried to set the queues and qdisc(s) manually:
ip link set eno2 numtxqueues 8 numrxqueues 8
tc qdisc add dev eno2.814 root handle 1: mq -- RTNETLINK answers: Operation not supported
tc qdisc add dev eno2.814 root handle 1: mqprio -- Error: Specified qdisc kind is unknown.
tc qdisc add dev eno2.814 root handle 1: multiq -- Error: Specified qdisc kind is unknown.
tc qdisc delete dev eno2.814 root
tc qdisc add dev eno2.814 root handle 1: pfifo_fast
sudo systemctl restart NetworkManager does not seem to help either.
What I don't get:
I assume that ping -Q sets the kernel SO_PRIORITY for a packet. Does it?
Can the difference between the vlan and parent qdiscs have any influence?
Why does "/proc/net/vlan/eno2.814" show the EGRESS priority mapping 0:0 while "ip -d link show eno2.814" egress-qos-map does not?
Do I need to get into the hardware queues presented to the kernel, or is a single hardware or default queue enough if I just want packet marking, not specific queue handling?
What is wrong with my config?
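On the first open question, a hypothesis to test rather than a confirmed answer: `ping -Q` sets the TOS byte, and the kernel then derives skb->priority (which is what the VLAN egress-qos-map keys on) from TOS via its own table, which as far as I understand ignores the precedence bits and looks only at the old ToS bits. If that holds, `-Q 0x20` through `-Q 0xE0` all yield priority 0, matching the all-zero PCP seen above, while ToS 0x10 ("low delay") maps to a nonzero priority. A sketch to test this:

```shell
# Hypothesis: precedence-only -Q values map to skb priority 0, but
# ToS 0x10 maps to the "interactive" priority (6). Map 6 to PCP 5
# (the 6:5 pair here is an arbitrary choice for the test):
sudo ip link set eno2.814 type vlan egress-qos-map 6:5
ping 192.168.22.3 -w2 -c1 -Q 0x10
# Then re-check the PCP field of this ping in the tshark capture.
```

If this ping alone comes out with a nonzero PCP, the issue is the TOS-to-priority mapping, not the VLAN configuration.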
off-on
(61 rep)
Jun 11, 2025, 11:54 AM
• Last activity: Jun 11, 2025, 08:39 PM
0 votes, 0 answers, 66 views
Bidirectional Traffic Forwarding Issue with tc filter
I'm working with a `tc` filter setup and I have the following configuration:
sudo tc qdisc add dev eth0 handle ffff: ingress
sudo tc filter add dev eth0 parent ffff: protocol ip prio 1 flower ip_proto icmp src_ip 10.0.0.5 action mirred egress redirect dev tun0
This is what I expect from the setup: I want to forward ICMP traffic from a specific source IP (10.0.0.5) arriving at eth0 to the tun0 interface. Similarly, I expect traffic on tun0 destined to eth0 to be forwarded correctly.
However, I'm experiencing an issue where traffic from eth0 to tun0 flows as expected, but traffic from tun0 that should be forwarded to eth0 is not working. tun0 receives packets that should be sent to eth0, but they don't get forwarded.
I have tested this configuration on other devices, and it works correctly in both directions, so I'm puzzled about why it fails here.
Could someone help me understand what might be happening? Also, how can I troubleshoot this issue more effectively to observe what exactly is going wrong in the packet forwarding process?
Thanks in advance for your insights!
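A mirred redirect only covers the direction its filter is attached to, so the return path normally needs its own ingress filter on tun0. This is a sketch under the same addressing assumptions as the commands above, not a confirmed diagnosis for this particular setup:

```shell
# Return path: ICMP arriving on tun0 toward 10.0.0.5 goes back out eth0.
sudo tc qdisc add dev tun0 handle ffff: ingress
sudo tc filter add dev tun0 parent ffff: protocol ip prio 1 flower \
    ip_proto icmp dst_ip 10.0.0.5 action mirred egress redirect dev eth0

# For troubleshooting, the per-filter counters show where packets stop:
sudo tc -s filter show dev tun0 ingress
sudo tc -s filter show dev eth0 ingress
```

Comparing the two counter sets while pinging narrows the failure down to either the match (no hits) or the redirect action (hits but no forwarded packets).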
Andy R
(1 rep)
Feb 3, 2025, 12:37 PM
4 votes, 1 answer, 7685 views
Is it possible to throttle upload bandwidth per `IP` basis using `tc`, `htb` and `iptables` ? (Download limitation not required)
#### Problem
I've searched the internet but couldn't find much about limiting upload. The solutions given, like this one, do not limit on a per-IP basis but the LAN as a whole.
+-----+
+--------+ | S |
| User A |---+ W |
+--------+ | I |
+--------+ | T | +--------+ +----------+
| User B |---+ C +-----| Router |--------| Internet |
+--------+ | H | +--------+ +----------+
.... ... / ...
+--------+ | H |
| User N |---+ U |
+--------+ | B |
+-----+
- UserA:172.16.10.2
- UserB:172.16.10.3
- RouterPrivate:172.16.0.1
- UserC:172.16.10.4
I want to limit only the upload of 172.16.10.3 & 172.16.10.4 using `tc`, `htb` and `iptables`.
#### What I've already tried
I altered the script as per my requirement.
IF_INET=external
# upload bandwidth limit for interface
BW_MAX=2000
# upload bandwidth limit for 172.16.16.11
BW_CLIENT=900
# first, clear previous settings
tc qdisc del dev ${IF_INET} root
# top-level htb queue discipline; send unclassified data into class 1:10
tc qdisc add dev ${IF_INET} root handle 1: htb default 10
# parent class (wrap everything in this class to allow bandwidth borrowing)
tc class add dev ${IF_INET} parent 1: classid 1:1 htb \
rate ${BW_MAX}kbit ceil ${BW_MAX}kbit
# two child classes
#
# the default child class
tc class add dev ${IF_INET} parent 1:1 \
classid 1:10 htb rate $((${BW_MAX} - ${BW_CLIENT}))kbit ceil ${BW_MAX}kbit
# the child class for traffic from 172.16.16.11
tc class add dev ${IF_INET} parent 1:1 \
classid 1:20 htb rate ${BW_CLIENT}kbit ceil ${BW_MAX}kbit
# classify traffic
tc filter add dev ${IF_INET} parent 1:0 protocol ip prio 1 u32 \
match ip src 172.16.16.11/32 flowid 1:20
but this will *not* work for limiting upload. So what's the solution?
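Since upload from the LAN is egress on the router's Internet-facing interface, the usual per-IP shape is one htb leaf class per host, each selected by a u32 source-address match. A sketch along the lines of the script above; the interface name and rates are placeholders, and this is not a tested configuration:

```shell
IF_INET=eth0   # router's Internet-facing interface (placeholder)

tc qdisc del dev ${IF_INET} root 2>/dev/null
tc qdisc add dev ${IF_INET} root handle 1: htb default 10
tc class add dev ${IF_INET} parent 1: classid 1:1 htb rate 2000kbit ceil 2000kbit

# Default class for everyone else.
tc class add dev ${IF_INET} parent 1:1 classid 1:10 htb rate 1100kbit ceil 2000kbit

# One leaf class per limited host.
tc class add dev ${IF_INET} parent 1:1 classid 1:20 htb rate 450kbit ceil 450kbit
tc class add dev ${IF_INET} parent 1:1 classid 1:30 htb rate 450kbit ceil 450kbit

tc filter add dev ${IF_INET} parent 1:0 protocol ip prio 1 u32 \
    match ip src 172.16.10.3/32 flowid 1:20
tc filter add dev ${IF_INET} parent 1:0 protocol ip prio 1 u32 \
    match ip src 172.16.10.4/32 flowid 1:30
```

The `ceil` on each leaf caps that host even when the line is idle; raising `ceil` toward the parent's rate instead lets an idle line be borrowed.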
Adi
(93 rep)
Jun 11, 2015, 02:25 PM
• Last activity: Jan 28, 2025, 07:06 AM
4 votes, 0 answers, 2511 views
How can I limit bandwidth per connection using tc?
I am new to Linux and the `tc` command, and I have been looking to limit bandwidth per connection using `tc`. I have a server application that handles requests from clients consisting of I/O operations, and I want each request to reach a maximum speed of 50MB/s if there is enough bandwidth (I make sure there are not so many parallel requests that the bandwidth would go lower than 50MB/s per request).
I used `tc` to limit bandwidth, but all connections split 50MB/s instead of each connection getting 50MB/s.
`tc` commands I tried:
tc qdisc add dev eth4 root netem rate 400mbit
or
sudo tc qdisc add dev eth4 root handle 1: htb default 30
sudo tc class add dev eth4 parent 1: classid 1:1 htb rate 100gbit burst 15k
sudo tc class add dev eth4 parent 1:1 classid 1:10 htb rate 400mbit burst 15k
sudo tc class add dev eth4 parent 1:1 classid 1:20 htb rate 400mbit burst 15k
sudo tc filter add dev eth4 protocol ip parent 1: prio 1 u32 match ip dst 0.0.0.0/0 flowid 1:10
sudo tc filter add dev eth4 protocol ip parent 1: prio 1 u32 match ip src 0.0.0.0/0 flowid 1:20
or, since I am in control of both the server and clients and I know that the server handles clients' requests on port 9000, I tried the following on the client machine:
sudo tc qdisc add dev eth4 root handle 1: prio priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sudo tc qdisc add dev eth4 parent 1:2 handle 20: netem rate 400mbit
sudo tc filter add dev eth4 parent 1:0 protocol ip u32 match ip dport 9000 0xffff flowid 1:2
No solution did what I want. I think I need to create a class for each port through which requests are being sent, but I do not know the ports beforehand, since they are automatically selected. Is there a way to create these classes "on the go", or is there another solution?
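One avenue that avoids per-port classes entirely (a sketch, assuming a kernel that ships sch_fq): the `fq` qdisc's `maxrate` option caps each flow (roughly, each connection) individually, instead of splitting one aggregate rate across all of them:

```shell
# 50 MB/s is 400 mbit; with fq, every flow gets its own 400 mbit cap
# rather than sharing one 400 mbit bucket.
sudo tc qdisc replace dev eth4 root fq maxrate 400mbit
```

Since flows are identified by the socket rather than by pre-declared ports, the ephemeral ports chosen at connect time need no configuration at all.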
Ben
(41 rep)
Apr 14, 2022, 12:10 PM
• Last activity: Jan 16, 2025, 07:31 AM
0 votes, 0 answers, 36 views
Tc-Netem not working with a bridge for simulating jitter
I'm using Ubuntu 24.04. I need to make a bridge that simulates jitter, using tc-netem, between two devices, but it is not working with what I'm doing right now.
I have a setup consisting of 3 units. One generates UDP packets (Device A) and sends them to the Ubuntu input interface (enx7cc2c6474599); the Ubuntu box should add jitter and reordering while forwarding through a bridge, called br0, composed of the interfaces enx7cc2c6474599 and enx7cc2c6331825.
Through this bridge it should send the jitter-affected traffic to Device B over the interface enx7cc2c6331825.
The bridge is created with this set of commands in order to work:

sudo ip link add name br0 type bridge
sudo ip link set enx7cc2c6474599 master br0
sudo ip link set enx7cc2c6331825 master br0
sudo ip link set enx7cc2c6474599 up
sudo ip link set enx7cc2c6331825 up
sudo ip link set br0 up
sudo sysctl -w net.ipv4.ip_forward=1
sudo sysctl net.ipv4.conf.all.forwarding
sudo sysctl net.ipv4.conf.default.forwarding
sudo sysctl -p
Then, for testing, I send traffic and I can see it perfectly on Device B. But when I do:
sudo tc qdisc add dev enx7cc2c6474599 root netem delay 10ms 8ms distribution normal
sudo tc qdisc add dev enx7cc2c6331825 root netem delay 10ms 8ms distribution normal
I see no effect: the packets still arrive in the correct order and with no jitter. I also tried using the "tc qdisc" command on br0 (the bridge) itself, with the same or worse results.
Also tried this one:
sudo tc qdisc add dev enx7cc2c6474599 root netem delay 10ms 40ms reorder 25%
sudo tc qdisc add dev enx7cc2c6331825 root netem delay 10ms 40ms reorder 25%
Here is a description of the interfaces:
> br0: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 5e:96:25:d5:26:df brd ff:ff:ff:ff:ff:ff
inet6 fe80::5c96:25ff:fed5:26df/64 scope link
valid_lft forever preferred_lft forever
> enx7cc2c6474599: mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
link/ether 7c:c2:c6:47:45:99 brd ff:ff:ff:ff:ff:ff
> enx7cc2c6331825: mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
link/ether 7c:c2:c6:33:18:25 brd ff:ff:ff:ff:ff:ff
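One detail worth checking (a hypothesis, not a confirmed diagnosis): the interface listings above still show `fq_codel` as the root qdisc, which suggests the netem qdiscs did not actually end up installed. `tc qdisc replace` is idempotent, and the statistics show whether traffic traverses netem at all; for a bridge, delaying only the egress leg toward Device B is sufficient:

```shell
sudo tc qdisc replace dev enx7cc2c6331825 root netem \
    delay 10ms 8ms distribution normal

# The qdisc should now be listed as root, and its packet counter
# should increase while traffic flows toward Device B:
tc -s qdisc show dev enx7cc2c6331825
```

If the counter stays at zero while Device B receives traffic, the packets are taking a path that bypasses this qdisc, which then becomes the thing to investigate.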
Carlos López Martínez
(101 rep)
Nov 20, 2024, 03:52 PM
1 vote, 1 answer, 64 views
QoS on Linux: tc doesn't see RTP traffic
I have a camera that creates RTSP traffic. I connected it to a Linux PC via Ethernet, configured the network and access. But when I tried to apply QoS rules, the tc statistics showed that too few bytes were sent.
After some research, I found that HTTP, SSH and RTSP (connection) traffic from the camera was displayed correctly in the statistics. However, tc seems to work differently with RTP traffic.
Video in VLC was playing, and nft and tcpdump showed the traffic. I tried using Debian 12, Ubuntu 24.04 and Manjaro; it still didn't work. Imitating RTP with FFmpeg also did not bring success. This seems really weird, and I don't know what could cause the problem or what else to try.
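One possibility (an assumption about this setup, not a verified diagnosis): RTP flows on UDP ports negotiated through the RTSP exchange, so a filter matching only the RTSP control port never sees the media stream. A sketch matching a UDP destination-port range instead; the device name, class ids and the 16384-16399 range are placeholders to be replaced with the camera's actual negotiated ports:

```shell
# Match UDP (IP protocol 17) destination ports 16384-16399 via a mask:
# 16384 = 0x4000, and mask 0xfff0 leaves the low 4 bits free.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
    match ip protocol 17 0xff \
    match ip dport 16384 0xfff0 \
    flowid 1:10
```

The negotiated ports are visible in the RTSP SETUP/Transport headers in a tcpdump of the control connection, which makes it possible to confirm whether the existing filters could ever have matched them.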
eXulW0lf
(21 rep)
Sep 29, 2024, 06:49 PM
• Last activity: Oct 10, 2024, 01:38 PM
0 votes, 0 answers, 73 views
Is Linux tc tool only useful for tcp traffic?
Is the `tc` tool only useful for TCP traffic, or can it also be used to control other protocols like UDP traffic?
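tc operates at the network-device layer, below the transport protocols, so it can queue, shape and drop UDP (or any other) traffic just as well as TCP; TCP merely reacts to shaping more visibly because of its congestion control. A minimal sketch singling out UDP with a u32 match (`eth0` and the rates are placeholders):

```shell
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:10 htb rate 1mbit
tc class add dev eth0 parent 1: classid 1:20 htb rate 100mbit
# IP protocol 17 = UDP; everything else falls into the default class.
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip protocol 17 0xff flowid 1:10
```

Note that a shaped UDP sender does not slow down by itself, so excess UDP packets are simply dropped at the qdisc rather than paced the way TCP is.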
Xiaoyong Guo
(101 rep)
Aug 29, 2024, 08:02 AM
• Last activity: Sep 2, 2024, 02:40 PM
0 votes, 1 answer, 296 views
Changing packet payload with tc
How can tc be used to match a particular payload of an ingress packet? E.g., if the first 32 bits of the payload of an IP/UDP packet are equal to some constant `$c`, the value `$c` should be changed to `$d`. This should work in particular for variable-length IP headers.
It appears that the `u32` filter should be able to perform the matching. Is the following attempt correct? I am not sure about the `nexthdr` part in particular.
tc filter add dev protocol ip parent ffff: u32 match $c 0xffffffff at nexthdr+8
Now `pedit` can be used to change the packet, but I don't see a way to write `$d` into the UDP payload of a packet with a variable-length IP header.
Any help is appreciated.
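For what it's worth, a sketch of the pedit half under a simplifying assumption: with the classic (non-`ex`) pedit, `munge offset` counts from the start of the IP header, so for a fixed 20-byte header the first 32 payload bits of UDP sit at offset 28 (20 IP + 8 UDP). This does not solve the variable-IHL case the question asks about, and the exact syntax is unverified; `eth0` is a placeholder:

```shell
# Assumes a fixed 20-byte IP header; breaks when IP options are present.
tc filter add dev eth0 parent ffff: protocol ip u32 \
    match ip protocol 17 0xff \
    match u32 $c 0xffffffff at 28 \
    action pedit munge offset 28 u32 set $d
```

Since neither the u32 fixed-offset match nor pedit's fixed offset follows the IHL field, handling options-bearing headers correctly is exactly where this approach runs out; an eBPF classifier/action can compute `ihl << 2` at runtime, which plain pedit cannot.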
qemvirt
(13 rep)
Oct 19, 2023, 10:19 PM
• Last activity: Apr 28, 2024, 02:30 PM
0 votes, 1 answer, 167 views
How to deterministically vary the delay in programs like netem?
I am trying to set up a network scenario in which there is a variable delay between two nodes. Netem allows setting up a fixed delay and adding jitter according to some probabilistic distribution. However, I would like to achieve a delay that varies deterministically according to a similar law.
rul_h
(1 rep)
Oct 9, 2022, 11:44 AM
• Last activity: Mar 27, 2024, 05:57 PM
1 vote, 1 answer, 358 views
Traffic shaping ineffective on tun device
I am developing a tunnel application that will provide a low-latency, variable-bandwidth link. This will be operating in a system that requires traffic prioritization. However, while traffic towards the tun device is clearly being queued by the kernel, whatever qdisc I apply to the device appears to have no additional effect, including the default pfifo_fast; i.e., what should be high-priority traffic is not being handled separately from normal traffic.
I have made a small test application to demonstrate the problem. It creates two tun devices and has two threads each with a loop passing packets from one interface to the other and back, respectively. Between receiving and sending the loop delays 1us for every byte, roughly emulating an 8Mbps bidirectional link:
void forward_traffic(int src_fd, int dest_fd) {
char buf[BUFSIZE];
ssize_t nbytes = 0;
while (nbytes >= 0) {
nbytes = read(src_fd, buf, sizeof(buf));
if (nbytes >= 0) {
usleep(nbytes);
nbytes = write(dest_fd, buf, nbytes);
}
}
perror("Read/write TUN device");
exit(EXIT_FAILURE);
}
With each tun interface placed in its own namespace, I can run iperf3 and get about 8Mbps of throughput. The default txqlen reported by ip link is 500 packets, and when I run iperf3 (-P 20) and a ping at the same time I see RTTs of about 670-770ms, roughly corresponding to 500 x 1500 bytes of queue. Indeed, changing txqlen changes the latency proportionally. So far so good.
With the default pfifo_fast qdisc I would expect a ping with the right ToS mark to skip that normal queue and give me low latency; e.g., ping -Q 0x10 should, I think, have a much lower RTT, but it doesn't (I have tried other ToS/DSCP values as well; they all have the same ~700ms RTT). Additionally, I have tried various other qdiscs with the same results; e.g., fq_codel doesn't have a significant effect on latency.
Regardless of the qdisc, `tc -s qdisc` always shows a backlog of 0, whether or not the link is congested. (But I do see dropped packets in `ip -s link show` under congestion.)
Am I fundamentally misunderstanding something here or there something else I need to do make the qdisc effective?
Complete source here
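One hypothesis consistent with the numbers above (RTT tracking txqlen, qdisc backlog always 0): the 500-packet queue may be building in the tun device's own transmit queue, after the qdisc has already dequeued, where no prioritization applies. Shrinking the driver queue pushes the standing queue back up into the qdisc; a sketch, not a confirmed fix:

```shell
# Keep the device's own queue tiny so congestion backs up into the qdisc.
ip link set dev tun0 txqueuelen 10
tc qdisc replace dev tun0 root pfifo_fast

# After re-running iperf3 + ping, a nonzero backlog here would confirm
# the qdisc is now the place where packets wait:
tc -s qdisc show dev tun0
```

If the backlog becomes nonzero and the -Q 0x10 ping's RTT drops, the prioritization was working all along; the queue was just in the wrong layer.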
sheddenizen
(111 rep)
Dec 2, 2023, 06:05 PM
• Last activity: Dec 27, 2023, 03:42 PM
0 votes, 2 answers, 419 views
Can netfilter act as a DHCP relay?
I'm wondering whether, instead of using a DHCP relay, netfilter (be that `tc` or `nftables`) can be used to route DHCP broadcast packets to a Docker container attached to a bridge.
The reasoning for this is that I'd like to move away from having to use a `macvlan` DHCP container, so it can appear as if one IP (i.e. the router IP) is handling all of the network operations. DHCP containers usually require `CAP_NET_ADMIN` (due to DHCP requiring promiscuous mode) and I understand that without a `macvlan` this would give control over the host's network stack (I also `userns-remap` my containers).
It would be great if it were possible to modify the DHCP packets and forward them on. A relay wouldn't work here as it would still require the same `macvlan` approach as the DHCP container already has.
Is this something that's possible? Thanks
Synthetic Ascension
(249 rep)
Aug 19, 2023, 09:20 AM
• Last activity: Nov 11, 2023, 01:00 PM
1 vote, 1 answer, 510 views
MAC address rewriting using tc
I am using `tc` to change the MAC address of incoming packets on a TAP interface (`tap0`) as follows, where `mac_org` is the MAC address of a guest in a QEMU virtual machine and `mac_new` is a different MAC address that `mac_org` should be replaced with.
tc qdisc add dev tap0 ingress handle ffff:
tc filter add dev tap0 protocol ip parent ffff: \
flower src_mac ${mac_org} \
action pedit ex munge eth src set ${mac_new} pipe \
action csum ip pipe \
action xt -j LOG
I also add an iptables rule to log UDP packets on the input hook.
iptables -A INPUT -p udp -j LOG
syslog shows that indeed the DHCP discover packet is changed accordingly. The `tc` log entry looks as follows:
IN=tap0 OUT= MAC=ff:ff:ff:ff:ff:ff:${mac_new}:08:00 SRC=0.0.0.0 DST=255.255.255.255 LEN=338 TOS=0x00 PREC=0xC0 TTL=64 ID=0 DF PROTO=UDP SPT=68 DPT=67 LEN=318
and the log entry of the netfilter input hook, which follows the `tc` ingress hook as the locally incoming packet is passed towards the socket, shows the same result, slightly differently formatted.
IN=tap0 OUT= MACSRC=${mac_new} MACDST=ff:ff:ff:ff:ff:ff MACPROTO=0800 SRC=0.0.0.0 DST=255.255.255.255 LEN=338 TOS=0x00 PREC=0xC0 TTL=64 ID=0 DF PROTO=UDP SPT=68 DPT=67 LEN=318
Before starting QEMU I run `dnsmasq` on `tap0`, which surprisingly shows the output:
DHCPDISCOVER(tap0) ${mac_org}
Running `strace -f -x -s 10000 -e trace=network dnsmasq ...` shows a `recvmsg` call that contains `${mac_org}` instead of `${mac_new}`:
recvmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, msg_namelen=16, msg_iov=[{iov_base="... ${mac_org} ..." ...
How can that happen? It almost appears as if the packet is altered after the netfilter input hook.
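A possible explanation worth checking (an assumption, not a confirmed diagnosis): dnsmasq reports the client hardware address from the chaddr field inside the BOOTP/DHCP payload, and the pedit rule above rewrites only the Ethernet source header, leaving chaddr untouched. The chaddr field sits at a fixed offset, so a capture can show which MAC lives where; the tshark field name below is from Wireshark's BOOTP dissector and may be `dhcp.hw.mac_addr` on newer versions:

```shell
# chaddr offset: 14 (Ethernet) + 20 (IP) + 8 (UDP) + 28 (BOOTP header
# fields op/htype/hlen/hops/xid/secs/flags/ciaddr/yiaddr/siaddr/giaddr)
# = byte 70 of the frame.
sudo tshark -i tap0 -f 'udp port 67' -c 1 -T fields \
    -e eth.src -e bootp.hw.mac_addr
```

If the capture shows `${mac_new}` in eth.src but `${mac_org}` in the payload field, the packet was never altered after the netfilter hook; dnsmasq simply reads a different copy of the address.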
qemvirt
(13 rep)
Oct 15, 2023, 10:18 PM
• Last activity: Oct 16, 2023, 12:06 AM
1 vote, 1 answer, 918 views
Redirect port using TC BPF
I want to use TC BPF to redirect incoming traffic from port 80 to port 8080.
Below is my own code, but I've also tried the example from [man 8 tc-bpf](https://man7.org/linux/man-pages/man8/tc-bpf.8.html) (search for `8080`) and I get the same result.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/types.h>
#include <stddef.h>
#include <asm/byteorder.h>
#include <bpf/bpf_helpers.h>
static inline void set_tcp_dport(struct __sk_buff *skb, int nh_off,
__u16 old_port, __u16 new_port)
{
bpf_l4_csum_replace(skb, nh_off + offsetof(struct tcphdr, check),
old_port, new_port, sizeof(new_port));
bpf_skb_store_bytes(skb, nh_off + offsetof(struct tcphdr, dest),
&new_port, sizeof(new_port), 0);
}
SEC("tc_my")
int tc_bpf_my(struct __sk_buff *skb)
{
struct iphdr ip;
struct tcphdr tcp;
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr), &ip, sizeof(struct iphdr))) {
bpf_printk("bpf_skb_load_bytes iph failed");
return TC_ACT_OK;
}
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr) + (ip.ihl << 2), &tcp, sizeof(struct tcphdr))) {
bpf_printk("bpf_skb_load_bytes tcph failed");
return TC_ACT_OK;
}
__u16 src_port = __constant_ntohs(tcp.source);
__u16 dst_port = __constant_ntohs(tcp.dest);
bpf_printk("%pI4:%u -> %pI4:%u", &ip.saddr, src_port, &ip.daddr, dst_port);
if (dst_port != 80)
return TC_ACT_OK;
set_tcp_dport(skb, ETH_HLEN + sizeof(struct iphdr), __constant_htons(80), __constant_htons(8080));
return TC_ACT_OK;
}
char LICENSE[] SEC("license") = "GPL";
On machine A, I am running:
clang -g -O2 -Wall -target bpf -c tc_my.c -o tc_my.o
tc qdisc add dev ens160 clsact
tc filter add dev ens160 ingress bpf da obj tc_my.o sec tc_my
nc -l 8080
On machine B:
nc $IP_A 80
On machine B, `nc` seems connected, but `ss` shows:
SYN-SENT 0 1 $IP_B:53442 $IP_A:80 users:(("nc",pid=30180,fd=3))
On machine A, the connection remains in SYN-RECV before being dropped.
I was expecting my program to behave as if I added this `iptables` rule:
iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-port 8080
Maybe my expectations are wrong, but I would like to understand why. How can I get my TC BPF redirect to work?
SOLUTION
-----------------
Following the explanation in my accepted answer, here is an example code which works for TCP, does ingress NAT 90->8080, and egress de-NAT 8080->90.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/types.h>
#include <stddef.h>
#include <asm/byteorder.h>
#include <bpf/bpf_helpers.h>
static inline void set_tcp_dport(struct __sk_buff *skb, int nh_off,
__u16 old_port, __u16 new_port)
{
bpf_l4_csum_replace(skb, nh_off + offsetof(struct tcphdr, check),
old_port, new_port, sizeof(new_port));
bpf_skb_store_bytes(skb, nh_off + offsetof(struct tcphdr, dest),
&new_port, sizeof(new_port), 0);
}
static inline void set_tcp_sport(struct __sk_buff *skb, int nh_off,
__u16 old_port, __u16 new_port)
{
bpf_l4_csum_replace(skb, nh_off + offsetof(struct tcphdr, check),
old_port, new_port, sizeof(new_port));
bpf_skb_store_bytes(skb, nh_off + offsetof(struct tcphdr, source),
&new_port, sizeof(new_port), 0);
}
SEC("tc_ingress")
int tc_ingress_(struct __sk_buff *skb)
{
struct iphdr ip;
struct tcphdr tcp;
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr), &ip, sizeof(struct iphdr)))
{
bpf_printk("bpf_skb_load_bytes iph failed");
return TC_ACT_OK;
}
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr) + (ip.ihl << 2), &tcp, sizeof(struct tcphdr))) {
bpf_printk("bpf_skb_load_bytes tcph failed");
return TC_ACT_OK;
}
__u16 src_port = __constant_ntohs(tcp.source);
__u16 dst_port = __constant_ntohs(tcp.dest);
bpf_printk("%pI4:%u -> %pI4:%u", &ip.saddr, src_port, &ip.daddr, dst_port);
if (dst_port != 90)
return TC_ACT_OK;
set_tcp_dport(skb, ETH_HLEN + sizeof(struct iphdr), __constant_htons(90), __constant_htons(8080));
return TC_ACT_OK;
}
SEC("tc_egress")
int tc_egress_(struct __sk_buff *skb)
{
struct iphdr ip;
struct tcphdr tcp;
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr), &ip, sizeof(struct iphdr)))
{
bpf_printk("bpf_skb_load_bytes iph failed");
return TC_ACT_OK;
}
if (0 != bpf_skb_load_bytes(skb, sizeof(struct ethhdr) + (ip.ihl << 2), &tcp, sizeof(struct tcphdr)))
{
bpf_printk("bpf_skb_load_bytes tcph failed");
return TC_ACT_OK;
}
__u16 src_port = bpf_ntohs(tcp.source);
__u16 dst_port = bpf_ntohs(tcp.dest);
bpf_printk("%pI4:%u -> %pI4:%u", &ip.saddr, src_port, &ip.daddr, dst_port);
if (src_port != 8080)
return TC_ACT_OK;
set_tcp_sport(skb, ETH_HLEN + sizeof(struct iphdr), __constant_htons(8080), __constant_htons(90));
return TC_ACT_OK;
}
char LICENSE[] SEC("license") = "GPL";
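As background for the helpers above: bpf_l4_csum_replace does not recompute the whole TCP checksum; it applies the RFC 1624 incremental update for the one 16-bit field that changed. A small Python sketch of that update (the function names here are mine, not kernel API):

```python
# Incremental ones'-complement checksum update, as in RFC 1624:
# HC' = ~(~HC + ~m + m'), which is what bpf_l4_csum_replace does for a
# 16-bit field, instead of summing the whole segment again.
def csum16(data: bytes) -> int:
    """Full recompute: ones'-complement sum folded to 16 bits."""
    s = 0
    for i in range(0, len(data) - 1, 2):
        s += (data[i] << 8) | data[i + 1]
    if len(data) % 2:
        s += data[-1] << 8
    while s > 0xFFFF:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def csum_replace16(check: int, old: int, new: int) -> int:
    """Fix an existing checksum after one 16-bit field changed."""
    s = (~check & 0xFFFF) + (~old & 0xFFFF) + new
    while s > 0xFFFF:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

before = bytes.fromhex("1234005aabcd")   # field 0x005a (port 90)
after  = bytes.fromhex("12341f90abcd")   # field 0x1f90 (port 8080)
print(csum_replace16(csum16(before), 0x005A, 0x1F90) == csum16(after))  # True
```

The incremental form matters in BPF because the verifier-friendly helper only needs the old and new field values, not access to the whole segment.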
Here is how I built and loaded the different sections of my program:
clang -g -O2 -Wall -target bpf -c tc_my.c -o tc_my.o
tc filter add dev ens32 ingress bpf da obj /tc_my.o sec tc_ingress
tc filter add dev ens32 egress bpf da obj /tc_my.o sec tc_egress
greenro
(13 rep)
Sep 14, 2023, 01:04 PM
• Last activity: Sep 15, 2023, 09:25 AM
1
votes
0
answers
388
views
How to set bandwidth limit using linux tc
In my Linux router:
1. Interface eth1's total bandwidth is 1gbit.
2. I want to allocate 1140kbit to GroupA and 150kbit to GroupB.
3. Users 10.10.10.158, 10.10.21.5 and 10.10.21.6 belong to GroupB.
4. Each user gets no more than 128kbit of bandwidth.
5. The three users together get no more than 150kbit of total bandwidth.
Following is what I set:
sudo tc qdisc del dev eth1 root 2>/dev/null
sudo tc qdisc add dev eth1 root handle 1: htb default 2
sudo tc class add dev eth1 parent 1: classid 1:1 htb rate 1gbit ceil 1gbit
sudo tc class add dev eth1 parent 1:1 classid 1:2 htb rate 10kbps ceil 10kbps
sudo tc class add dev eth1 parent 1:1 classid 1:10 htb rate 1140kbit ceil 1140kbit
sudo tc class add dev eth1 parent 1:1 classid 1:20 htb rate 128kbit ceil 128kbit
sudo tc class add dev eth1 parent 1:20 classid 1:21 htb rate 128kbit ceil 128kbit
sudo tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dst 10.10.10.158/32 flowid 1:21
sudo tc class add dev eth1 parent 1:20 classid 1:22 htb rate 128kbit ceil 128kbit
sudo tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dst 10.10.21.5/32 flowid 1:22
sudo tc class add dev eth1 parent 1:20 classid 1:23 htb rate 128kbit ceil 128kbit
sudo tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dst 10.10.21.6/32 flowid 1:23
However, I found the three users' total bandwidth is about 376kbit.
What should I do to achieve my goal?
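For what it's worth, the arithmetic above already hints at what happened: HTB treats each class's rate as a guaranteed minimum, and the three leaves under 1:20 each have rate 128kbit, oversubscribing their parent (which is itself set to 128kbit, not the intended 150kbit). A sketch of that check (my reading of the HTB semantics, not a tc tool):

```python
# Sanity arithmetic for the HTB tree above. HTB guarantees every class
# its "rate"; borrowing up to "ceil" applies only beyond that floor.
# When leaf rates oversubscribe the parent, the leaves can jointly
# exceed the parent's rate/ceil.
parent_rate_kbit = 128             # class 1:20 as configured (150 intended)
leaf_rates_kbit = [128, 128, 128]  # classes 1:21, 1:22, 1:23

guaranteed_total = sum(leaf_rates_kbit)  # floor HTB will honour
oversubscribed = guaranteed_total > parent_rate_kbit
print(guaranteed_total, oversubscribed)  # 384 True
```

The observed ~376kbit is consistent with that 384kbit floor. Under this reading, the fix would be to give 1:20 rate/ceil 150kbit and each leaf a rate of about 50kbit with ceil 128kbit, so the leaf rates no longer exceed the parent's rate.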
ackema
(11 rep)
Aug 28, 2023, 06:19 AM
0
votes
0
answers
63
views
Can no longer ping containers after setting TBF qdisc on Docker0
I am trying to use the tc command to manipulate traffic on the docker0 interface.
I run the commands
tc qdisc del dev docker0 root
tc qdisc add dev docker0 root handle 1: tbf rate 100mbps burst 1600 limit 1
I believe this is what it does:
- tbf: Specifies the TBF qdisc to be used.
- rate 100mbps: Sets the maximum bandwidth rate to 100 Mbps for the docker0 interface.
- burst 1600: Sets the maximum amount of data that can be transmitted in a single burst to 1600 bytes.
- limit 1: Limits the token bucket size to 1 token, which limits the amount of data that can be sent at any given time to the burst size.
However, after setting this rule, I can no longer ping containers that are already running and attached to the default docker0 interface. I can also no longer build images that contain commands such as RUN apt-get update -y.
Why is this the case? Can this qdisc configuration not be used on its own?
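A toy model of the enqueue decision (a sketch of the documented semantics, not tc's code) suggests the culprit is the limit parameter: limit is the maximum backlog in bytes, so with limit 1 essentially every real packet is dropped on arrival, before rate or burst even come into play.

```python
# Toy TBF enqueue decision: "limit" bounds how many bytes may wait in
# the queue. A packet that would push the backlog past the limit is
# dropped immediately, regardless of available tokens.
def tbf_enqueue(pkt_len: int, backlog: int, limit: int) -> bool:
    """True if the packet fits in the queue, False if it is dropped."""
    return backlog + pkt_len <= limit

print(tbf_enqueue(98, 0, 1))      # 98-byte ICMP echo vs limit 1 -> False
print(tbf_enqueue(98, 0, 10000))  # a more typical limit -> True
```

Under this reading, raising limit to a few packets' worth of bytes (or using the latency parameter instead) should restore connectivity.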
akastack
(73 rep)
May 16, 2023, 01:32 AM
1
votes
0
answers
1447
views
tc filter - error talking to the kernel
I am trying to add a tc flower filter for the geneve protocol and I am getting this error:
% sudo tc filter add dev gnv0 protocol ip parent ffff: \
flower geneve_opts 0108:01:020000000000000000/FFFF:FF:FF0000000000000000,0108:02:020000000000000000/FFFF:FF:FF0000000000000000,0108:03:0100000000/FFFF:FF:FF00000000 \
action tunnel_key unset
RTNETLINK answers: No such file or directory
We have an error talking to the kernel
I am using Amazon Linux:
% uname -a
Linux ip-10-0-40-230.ec2.internal 4.14.311-233.529.amzn2.x86_64 #1 SMP Thu Mar 23 09:54:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
I want to terminate a GENEVE UDP tunnel coming from an AWS Gateway load balancer which is mirroring traffic. The idea is to decap the packet into its original form. Note that I have also tried vxlan_opts with no luck either, so the issue seems more specific to tc than to the filter, in my opinion.
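For reference, each geneve_opts element has the shape CLASS:TYPE:DATA in hex, optionally followed by /MASK in the same layout. A small parser sketch (my helper, not part of iproute2) to pull the strings above apart:

```python
# Parse one tc-flower geneve_opts element: "CLASS:TYPE:DATA[/MASK]".
# CLASS is a 16-bit option class, TYPE an 8-bit option type, DATA the
# raw option payload as hex.
def parse_geneve_opt(opt: str) -> dict:
    value = opt.split("/")[0]     # the mask (if any) has the same layout
    cls, typ, data = value.split(":")
    assert len(cls) == 4 and len(typ) == 2
    return {"class": int(cls, 16), "type": int(typ, 16),
            "data": bytes.fromhex(data)}

opt = parse_geneve_opt("0108:01:020000000000000000/FFFF:FF:FF0000000000000000")
print(opt["class"], opt["type"], len(opt["data"]))  # 264 1 9
```

Two hedged observations: the data parts here are 9 and 5 bytes, while Geneve (RFC 8926) defines option data in 4-byte multiples, which may or may not matter for flower; and the AL2023 error "Failed to load TC action module" suggests the act_tunnel_key kernel module (and possibly cls_flower) is simply not available, which would also explain the RTNETLINK "No such file or directory".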
------
I have loaded some kernel modules that were suggested online (not sure they are necessary or sufficient):
% lsmod | grep sch
sch_htb 24576 0
sch_netem 20480 0
sch_ingress 16384 1
Full example:
sudo yum install tc
sudo modprobe sch_netem
sudo modprobe sch_htb
sudo ip link add name gnv0 type geneve dstport 6081 external
sudo ip link set gnv0 up
sudo tc qdisc add dev gnv0 ingress
sudo tc filter add dev gnv0 protocol ip parent ffff: \
flower geneve_opts 0108:01:020000000000000000/FFFF:FF:FF0000000000000000,0108:02:020000000000000000/FFFF:FF:FF0000000000000000,0108:03:0100000000/FFFF:FF:FF00000000 \
action tunnel_key unset
I tried it on AL2023 and got a similar error:
sudo ip link add name gnv0 type geneve dstport 6081 external
sudo ip link set gnv0 up
sudo tc qdisc add dev gnv0 ingress
sudo tc filter add dev gnv0 protocol ip parent ffff: \
flower geneve_opts 0108:01:020000000000000000/FFFF:FF:FF0000000000000000,0108:02:020000000000000000/FFFF:FF:FF0000000000000000,0108:03:0100000000/FFFF:FF:FF00000000 \
action tunnel_key unset
Error: Failed to load TC action module.
We have an error talking to the kernel
Ollie
(199 rep)
May 15, 2023, 05:06 PM
1
votes
1
answers
729
views
How to police ingress (input) packets belonging to a cgroup with iptables and tc?
I am trying to limit the download (ingress) rate for a certain app within a cgroup.
I was able to limit the upload (egress) rate successfully by marking the app's OUTPUT packets in iptables and then setting a tc filter to handle the marked packets.
However, when I did the same steps for ingress it didn't work.
------------------
steps I followed to limit **upload**:
1. Mark OUTPUT packets by their cgroup
$ sudo iptables -I OUTPUT -t mangle -m cgroup --path '/user.slice/.../app-firefox-...scope'\
-j MARK --set-mark 11
2. filter by fw mark (11) on the root qdisc
$ tc qdisc add dev $IFACE root handle 1: htb default 1
$ tc filter add dev $IFACE parent 1: protocol ip prio 1 handle 11 fw \
action police rate 1000kbit burst 10k drop
This limited the upload rate for firefox to 1000kbit successfully.
--------------
steps I followed trying to limit **download**:
1. Mark INPUT packets by their cgroup
$ sudo iptables -I INPUT -t mangle -m cgroup --path '/user.slice/.../app-firefox-...scope'\
-j MARK --set-mark 22
2. filter by fw mark (22) on the ingress qdisc
$ tc qdisc add dev $IFACE ingress handle ffff:
$ tc filter add dev $IFACE parent ffff: protocol ip prio 1 handle 22 fw \
action police rate 1000kbit burst 10k drop
-------
I am able to block app's download successfully with iptables:
$ sudo iptables -I INPUT -t mangle -m cgroup --path '/user.slice/.../app-firefox-....scope' -j DROP
So it seems like iptables is marking the cgroup's input packets, but for some reason tc can't filter them. Or maybe the packets are consumed before the tc filter takes effect? If so, then what is the use of marking input packets?
If there is a way to block cgroup's input packets then there must be a way to limit them, right?
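One hedged reading of the failure: on receive, the tc ingress hook runs before netfilter's mangle INPUT chain, so a mark set in INPUT is applied after the ingress filter has already looked for it; the egress case works because mangle OUTPUT runs before the root qdisc. A toy ordering model of where a mark becomes visible:

```python
# Toy model of hook ordering (assumed traversal:
# RX: tc ingress -> netfilter PREROUTING/INPUT -> socket
# TX: socket -> netfilter OUTPUT/POSTROUTING -> tc egress root qdisc).
RX_ORDER = ["tc_ingress", "nf_mangle_input"]
TX_ORDER = ["nf_mangle_output", "tc_egress"]

def mark_visible(order, marker, observer):
    """A mark set at `marker` is visible to `observer` only if the
    marker hook runs earlier in the traversal order."""
    return order.index(marker) < order.index(observer)

print(mark_visible(TX_ORDER, "nf_mangle_output", "tc_egress"))  # True
print(mark_visible(RX_ORDER, "nf_mangle_input", "tc_ingress"))  # False
```

If this reading is right, one commonly suggested direction is tc's connmark action (see tc-connmark(8)): save the mark to the connection with CONNMARK on egress, then restore it at the ingress hook itself before the fw filter runs.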
user216385
(63 rep)
Apr 29, 2023, 05:32 AM
• Last activity: Apr 29, 2023, 11:47 PM