Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

14 votes

1 answers

5016 views

How can I make a device available inside a systemd-nspawn container with user namespacing?

users namespace container bind-mount systemd-nspawn

I would like to mount an encrypted image file using cryptsetup inside a systemd-nspawn container. However, I get this error message:

    [root@container ~]# echo $key | cryptsetup -d - open luks.img luks
    Cannot initialize device-mapper. Is dm_mod kernel module loaded?
    Cannot use device luks, name is invalid or still in use.

The dm_mod kernel module is loaded on the host system, although things look a bit weird inside the container:

    [root@host ~]# grep dm_mod /proc/modules
    dm_mod 159744 2 dm_crypt, Live 0xffffffffc12c6000

    [root@container ~]# grep dm_mod /proc/modules
    dm_mod 159744 2 dm_crypt, Live 0x0000000000000000

strace indicates that cryptsetup is unable to create /dev/mapper/control:

    [root@etrial ~]# echo $key | strace cryptsetup -d - open luks.img luks 2>&1 | grep mknod
    mknod("/dev/mapper/control", S_IFCHR|0600, makedev(0xa, 0xec)) = -1 EPERM (Operation not permitted)

I am not too sure why this is happening. I am starting the container with the systemd-nspawn@.service template unit, which seems like it should allow access to the device mapper:

    # nspawn can set up LUKS encrypted loopback files, in which case it needs
    # access to /dev/mapper/control and the block devices /dev/mapper/*.
    DeviceAllow=/dev/mapper/control rw
    DeviceAllow=block-device-mapper rw

Reading this comment on a related question about USB devices, I wondered whether the solution was to add a bind mount for /dev/mapper. However, cryptsetup gives me the same error message inside the container. When I strace it, it looks like there's still a permissions issue:

    # echo $key | strace cryptsetup open luks.img luks --key-file - 2>&1 | grep "/dev/mapper"
    stat("/dev/mapper/control", {st_mode=S_IFCHR|0600, st_rdev=makedev(0xa, 0xec), ...}) = 0
    openat(AT_FDCWD, "/dev/mapper/control", O_RDWR) = -1 EACCES (Permission denied)
    
    # ls -la /dev/mapper
    total 0
    drwxr-xr-x 2 nobody nobody      60 Dec 13 14:33 .
    drwxr-xr-x 8 root   root       460 Dec 15 14:54 ..
    crw------- 1 nobody nobody 10, 236 Dec 13 14:33 control

Apparently, this is happening because the template unit enables user namespacing, which I want anyway for security reasons. As explained in the documentation:

>In most cases, using --private-users=pick is the recommended option as it enhances container security massively and operates fully automatically in most cases ... [this] is the default if the systemd-nspawn@.service template unit file is used ...
>
>Note that when [the --bind option] is used in combination with --private-users, the resulting mount points will be owned by the nobody user. That's because the mount and its files and directories continue to be owned by the relevant host users and groups, which do not exist in the container, and thus show up under the wildcard UID 65534 (nobody). If such bind mounts are created, it is recommended to make them read-only, using --bind-ro=.

Presumably I won't be able to do anything with read-only permissions to /dev/mapper. So, is there any way I can get cryptsetup to work inside the container, so that my application can create and mount arbitrary encrypted volumes at runtime, without disabling user namespacing?

## Related questions

* "systemd-nspawn: file-system permissions for a bound folder" relates to files rather than devices, and the only answer just says that "-U is mostly incompatible with rw --bind."

* "systemd-nspawn: how to allow access to all devices" doesn't deal with user namespacing and there are no answers.

sjy (956 rep)

Dec 15, 2019, 02:53 AM • Last activity: Jul 31, 2025, 03:10 AM

6 votes

4 answers

23639 views

Podman errors on tar with potentially insufficient UIDs or GIDs available in user namespace

tar chmod namespace podman

When I run podman run I'm getting a particularly weird error,

    ❯ podman run -ti --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:latest
    ✔ docker.io/rancher/rancher:latest
    Trying to pull docker.io/rancher/rancher:latest...
    Getting image source signatures
    [... blob copying...]
    Writing manifest to image destination
    Storing signatures
    Error processing tar file(exit status 1): potentially insufficient UIDs or GIDs available in user namespace (requested 630384594:600260513 for /usr/bin/etcdctl): Check /etc/subuid and /etc/subgid: lchown /usr/bin/etcdctl: invalid argument
    Error: Error committing the finished image: error adding layer with blob "sha256:b4b03dbaa949daab471f94bcfd68cbe21c1147e8ec2acfe3f46f1520db48baeb": Error processing tar file(exit status 1): potentially insufficient UIDs or GIDs available in user namespace (requested 630384594:600260513 for /usr/bin/etcdctl): Check /etc/subuid and /etc/subgid: lchown /usr/bin/etcdctl: invalid argument

What does _"potentially insufficient UIDs or GIDs available in user namespace"_ mean and how can I remedy this problem?

Evan Carroll (34663 rep)

Feb 3, 2022, 07:43 PM • Last activity: Jul 3, 2025, 05:48 PM

2 votes

1 answers

86 views

How does a cgroup namespace work?

linux cgroups namespace hardening

I’m trying to understand how cgroup namespaces work, but I’m stuck on something that doesn’t make sense to me. My understanding is that a cgroup namespace should virtualize the cgroup hierarchy for a process, so that the process sees its current cgroup as / and doesn’t see the full host hierarchy. S...

    sudo unshare --cgroup

    cat /proc/self/cgroup
    0::/

    echo $$
    3183

Then, from another terminal on the host, I checked the cgroup for that process:

    cat /proc/3183/cgroup
    0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-ffe09412-f0d6-413e-b480-6d14f1290f84.scope

This matches what the man page says:

> Cgroup namespaces virtualize the view of a process's cgroups (see cgroups(7)) as seen via /proc/[pid]/cgroup and /proc/[pid]/mountinfo.
>
> Each cgroup namespace has its own set of cgroup root directories. These root directories are the base points for the relative locations displayed in the corresponding records in the /proc/[pid]/cgroup file.

However, when I create a new cgroup inside my cgroup namespace, it appears in the host’s hierarchy too:

    # Inside the namespace:
    mkdir /sys/fs/cgroup/test

    # On the host:
    ls /sys/fs/cgroup/
    ...
    test
    ...

So it seems that the entire host hierarchy is still visible and any new cgroup I make is visible system-wide. There’s no real isolation — from inside the namespace I can still see and modify all the host’s cgroups. I also tried combining it with a user namespace to avoid sudo but the result is the same:

    unshare --map-root-user
    unshare --cgroup
    ls /sys/fs/cgroup/

Again, I see the full host hierarchy. So my questions are:

- Am I misunderstanding how cgroup namespaces are supposed to work?
- Is the cgroup namespace not designed to isolate the entire hierarchy like mount or PID namespaces do?
- Is there a correct way to use them to limit what cgroups are visible or writable?

Any clarification would be really appreciated!
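
The man page excerpt quoted in this question says the namespace root is the base point for the paths shown in /proc/[pid]/cgroup. As a rough illustration of that relative-path idea only (not of what is visible under /sys/fs/cgroup), here is a minimal sketch. The helper name `cgroup_relative` and the shortened root path are assumptions made for the example, and paths outside the root are simply rejected.

```shell
#!/bin/sh
# Hypothetical helper: show how a cgroup path would be displayed relative to a
# cgroup namespace root. Only handles paths at or below the root.
cgroup_relative() {
    # $1 = namespace root as seen from the host
    # $2 = cgroup path as seen from the host
    root=${1%/}
    case "$2" in
        "$root")   echo / ;;
        "$root"/*) echo "${2#"$root"}" ;;
        *)         echo "outside namespace root" >&2; return 1 ;;
    esac
}

# Illustrative root, shortened from the host-side path in the question.
root=/user.slice/user-1000.slice/user@1000.service/app.slice
cgroup_relative "$root" "$root"        # prints /
cgroup_relative "$root" "$root/test"   # prints /test
```

This only models the string shown in /proc/[pid]/cgroup; it says nothing about which directories a mounted cgroup filesystem exposes.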

Liric Ramer (85 rep)

Jun 27, 2025, 10:22 AM • Last activity: Jun 29, 2025, 09:52 AM

1 votes

0 answers

33 views

Linux mount namespaces - umount event propagates unexpectedly

mount namespace

I was reading about mount namespaces and encountered something that seemed odd to me. I'm using **Ubuntu 22.04**.

I have a USB device connected to my machine:

    ubuntu@ubuntu-2204:/media/ubuntu$ cat /proc/self/mountinfo | grep -i media
        3034 29 8:17 / /media/ubuntu/30A8-7347 rw,nosuid,nodev,relatime shared:675 - vfat /dev/sdb1 rw,uid=1000,gid=1000,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,showexec,utf8,flush,errors=remount-ro

Now I start running 2 shells:

 - #1 - A "regular" shell.
 - #2 - A shell that's started using unshare -m /bin/bash. Running unshare without specifying --propagation unchanged will result in an implicit mount --make-rprivate / in the new mount namespace. Great.

Before I do anything the USB mount is observable from both shells. This is expected.

Now I run umount /media/ubuntu/30A8-7347 in shell #1 and to my surprise the mount is no longer visible from the second shell either. However when I run the command from shell #2 - the mount is still visible from #1 as expected.

My question is why does the umount command propagate to the mount namespace shell #2 resides in? Does it have something to do with the fact that / was SHARED before I created the second mount namespace? It doesn't make much sense to me if this is indeed the case.

As a note, when I create shell #2 prior to connecting the USB - the mount event of the USB connection **does not** propagate to the second mount namespace as expected.
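
To compare what each shell actually sees, it can help to print the propagation tags recorded in /proc/self/mountinfo (the optional fields before the `-` separator, such as `shared:675`). The following is a minimal sketch; the helper name `mount_propagation` is an assumption for illustration, and a mount with no tags is reported as private.

```shell
#!/bin/sh
# Hypothetical helper: read mountinfo-format lines on stdin and print each
# mount point with its propagation tags (the optional fields before the "-").
mount_propagation() {
    awk '{
        tags = ""
        for (i = 7; i <= NF && $i != "-"; i++)
            tags = tags (tags == "" ? "" : ",") $i
        print $5, (tags == "" ? "private" : tags)
    }'
}

# Sample line shortened from the mountinfo output in the question.
printf '%s\n' '3034 29 8:17 / /media/ubuntu/30A8-7347 rw,nosuid shared:675 - vfat /dev/sdb1 rw' |
    mount_propagation   # prints: /media/ubuntu/30A8-7347 shared:675
```

Running `mount_propagation < /proc/self/mountinfo | grep media` in shell #1 and in shell #2 would show whether the USB mount carries the same tags in both namespaces.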

EL_9 (111 rep)

Jun 28, 2025, 07:29 PM

2 votes

1 answers

56 views

How to enable internet access for a bridge inside a Linux network namespace?

iptables ip bridge namespace tap

I've created two Linux network namespaces (ns1 and ns2), and inside each, I have:

- A bridge (ns1-br0, ns2-br0)
- A TAP device (tap0, tap1) connected to the respective bridge
- Each TAP device gets an IP address like 10.0.0.2/24.

The problem is: I want devices like tap0 and tap1 to access the internet, but I'm confused about how to set up routing and NAT properly.

The host has internet access via eth0.

How do I:
- Connect the namespace's bridge to the outside world?
- Use NAT or MASQUERADE correctly so that TAP devices can access the internet?
- Assign default gateways?

Bhautik Chudasama (121 rep)

Jun 18, 2025, 04:03 PM • Last activity: Jun 18, 2025, 08:57 PM

0 votes

0 answers

34 views

Bridging containers to external VLAN

networking container namespace network-namespaces podman

I have a physical network with several VLANs. One of my computers (my main workstation) is connected to two different VLANs on this network, one tagged, the other not.

I have successfully set this computer up on both VLANs by making a VLAN clone interface, but I discovered that in order to actually receive packets on that interface I had to change the MAC. It seems that the Linux network stack (or maybe the acceleration on the card) looks at the MAC and if it matches, ignores the VLAN tag.

I now want to attach this interface to a bridge somehow and then also have containers attach to this same bridge. I know enough about how containers are constructed that I can do this by hand after whatever container system I'm using (podman in this case) sets the container up.

The reason I want this is that I'm working on an IPv6 broadcast/multicast protocol that will only work for a local LAN, and in order to test it, I want to have several copies of the servent running in different containers so they can communicate with each other.

I've tried this in the obvious way, but none of the packets that are explicitly destined for one of the containers ever makes it to them. I suspect this is because the card or the Linux network stack is just dropping them at the physical interface when their destination MAC doesn't match any of the MACs assigned to the interface.

What would be a good way to accomplish this? Should I ask this on Server Fault or Stack Overflow instead?

Omnifarious (1412 rep)

Jun 1, 2025, 03:51 AM

4 votes

1 answers

2247 views

systemd "Failed to set up mount namespacing" in Docker container

systemd docker openvpn namespace

I recently updated a Docker container that uses systemd internally from Debian stretch to Debian buster, and since then it's not working. According to systemctl status, it fails to set up the namespace:

    Dec 10 14:22:11 f6f3e33e6bf2 systemd[1]: Starting OpenVPN tunnel for apu__ssl_vpn_config...
    Dec 10 14:22:11 f6f3e33e6bf2 systemd[1]: openvpn-client@apu__ssl_vpn_config.service: Failed to set up mount namespacing: Permission denied
    Dec 10 14:22:11 f6f3e33e6bf2 systemd[1]: openvpn-client@apu__ssl_vpn_config.service: Failed at step NAMESPACE spawning /usr/sbin/openvpn: Permission denied
    Dec 10 14:22:11 f6f3e33e6bf2 systemd[1]: openvpn-client@apu__ssl_vpn_config.service: Main process exited, code=exited, status=226/NAMESPACE
    Dec 10 14:22:11 f6f3e33e6bf2 systemd[1]: openvpn-client@apu__ssl_vpn_config.service: Failed with result 'exit-code'.
    Dec 10 14:22:11 f6f3e33e6bf2 systemd[1]: Failed to start OpenVPN tunnel for apu__ssl_vpn_config.

Now I have had a similar issue with elasticsearch, which I fixed by adding a drop in unit config containing:

    [Service]
    PrivateTmp=false
    NoNewPrivileges=yes

Though sadly that doesn't fix the issue this time. I also found this exact issue occurring when using LXC (or LXD?), though I don't know how to fix this with Docker. Additionally, I start the container like this:

    docker run -dt \
        --tmpfs /run --tmpfs /tmp \
        --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
        --device=/dev/net/tun \
        --cap-add SYS_ADMIN \
        --cap-add NET_ADMIN \

Any pointers are much appreciated.

BrainStone (3784 rep)

Dec 10, 2019, 03:18 PM • Last activity: May 22, 2025, 08:07 PM

1 votes

1 answers

35 views

How can I bind-mount a file into an existing directory tree inside a fresh user/mount namespace?

linux mount namespace

I would like to use unshare to create a new unprivileged user/mount namespace, with the goal of making a specific file appear at a specific location inside the child namespace. For example, assume that I would like /home/user/path/to/file to appear as /opt/dir1/dir2/file. However, /opt already exists in the parent namespace and is not writable by the user that I'm starting with. This does not work:

    user $ unshare -Urm
    root # mount --bind /home/user/path/to/file /opt/dir1/dir2/file
    mount: /opt/dir1/dir2/file: mount point does not exist.
           dmesg(1) may have more information after failed mount system call.

I think the underlying issue here is that /opt/dir1/dir2 does not exist ahead of the attempt to make the bind mount. However, I'm not able to create that directory since /opt is not writable in the parent:

    root # mkdir -p /opt/dir1/dir2
    mkdir: cannot create directory ‘/opt/dir1’: Permission denied

Is there a way to make this work so that the changes to /opt are only visible inside my child namespace, allowing me to work around the permission issue?

Jason R (657 rep)

May 20, 2025, 05:26 PM • Last activity: May 20, 2025, 09:59 PM

4 votes

1 answers

7081 views

running a process in another namespace

namespace network-namespaces

I would like to run a new process (for example an xterm) in another network namespace. This could be done like this:

    sudo ip netns exec otherns sudo -u $USER xterm

This command looks a bit complicated and involves running a sudo which runs ip which runs sudo which runs the final xterm.

Is there a more direct way to run a process in a new namespace?

I was thinking of writing my own small (SUID or capability-enabled) binary which switches namespace, restores permissions and user, and runs the command, but shouldn't there already be some standard tool doing exactly that?

This would allow me to simply call something like:

    runns otherns xterm
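
For illustration only, the calling convention above can be approximated with a shell function around the command from the question. This is a sketch, not a standard tool: `runns` is the hypothetical name used in the question, and the function merely hides the sudo, ip, sudo chain rather than removing it.

```shell
#!/bin/sh
# Hypothetical wrapper named after the runns command imagined in the question.
# It only hides the sudo -> ip -> sudo chain; it does not remove it.
runns() {
    if [ "$#" -lt 2 ]; then
        echo "usage: runns NETNS COMMAND [ARGS...]" >&2
        return 2
    fi
    ns=$1
    shift
    # Enter the namespace as root, then drop back to the invoking user.
    sudo ip netns exec "$ns" sudo -u "${USER:-$(id -un)}" "$@"
}
```

With this defined in the current shell, `runns otherns xterm` would expand to the longer command shown earlier.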

michas (21862 rep)

Jun 2, 2015, 10:40 PM • Last activity: May 4, 2025, 07:08 AM

2 votes

1 answers

103 views

Mapping two users to host with user namespaces

filesystems process root namespace privileges

I'm trying to understand whether it's possible to map two users from a **user namespace** to two different users on the host. The goal is to replicate the same permissions I have on my host inside a rootfs (Ubuntu base, because I'm trying to build a container from scratch). For example:

- Everything under / should belong to root.
- /home/user should belong to the regular user.

To achieve this, I was thinking of using UID mapping in a user namespace, something like:

    UID in user namespace      ---> UID on host
          1000 (admin)         ->       0 (root)
          1001 (bob)           ->    1001 (bob)

Is this kind of mapping even possible? Here’s what I’ve already tried:

- Running echo -e "1000 0 1\n1001 1001 1" > /proc/[PID]/uid_map to define the mapping, but I get an error.
- Trying to manually modify /proc/[PID]/uid_map using newuidmap for each user.

However, I’ve never been able to map more than one user, and I can’t seem to map UID 0 (root) at all. I’ve read the man pages and followed the constraints mentioned there, but I’m still getting error messages. For example:

    # terminal 1

    unshare --user bash
    echo $$ # 11591

    # terminal 2 as user 'alex' (uid = 1000)

    newuidmap 11591 0 0 1
    # newuidmap: uid range [0-1) -> [0-1) not allowed

    newuidmap 11591 1001 1001 1
    # newuidmap: uid range [1001-1002) -> [1001-1002) not allowed

These commands fail, even when run with sudo. I also tried mapping to subuids that I’ve declared, but it still doesn’t work:

    cat /etc/subuid

    alex:100000:65536
    root:200000:65536
    self:300000:65536

    cat /etc/subgid

    alex:100000:65536
    root:200000:65536
    self:300000:65536
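
One way to reason about the "not allowed" messages is to check whether the requested host range falls inside the caller's entries in /etc/subuid. The sketch below does only that range check; the helper name `subuid_allows` is an assumption for illustration, and any other rules newuidmap may apply are not modelled.

```shell
#!/bin/sh
# Hypothetical helper: check whether the host ID range [first, first + count)
# lies inside one of a user's entries in a file using the name:start:count
# format of /etc/subuid. Other rules newuidmap may apply are not modelled.
subuid_allows() {
    # $1 = user, $2 = first host ID, $3 = count, $4 = subuid-format file
    awk -F: -v u="$1" -v lo="$2" -v n="$3" '
        $1 == u && lo >= $2 && lo + n <= $2 + $3 { ok = 1 }
        END { print (ok ? "yes" : "no") }
    ' "$4"
}

# Entries copied from the /etc/subuid listing above.
map=$(mktemp)
printf '%s\n' 'alex:100000:65536' 'root:200000:65536' 'self:300000:65536' > "$map"

subuid_allows alex 0 1 "$map"        # prints no
subuid_allows alex 1001 1 "$map"     # prints no
subuid_allows alex 100000 1 "$map"   # prints yes
rm -f "$map"
```

Under this check, host UIDs 0 and 1001 fall outside alex's range of 100000 to 165535, which is consistent with both failing commands.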

Liric Ramer (85 rep)

Apr 16, 2025, 01:59 PM • Last activity: Apr 27, 2025, 01:36 PM

0 votes

0 answers

29 views

Unexpected network namespace inode when accessing /var/run/netns/ from pod in host network namespace

networking container inode namespace kubernetes

I'm running a Kubernetes cluster with RKE2 v1.30.5+rke2r1 on Linux nixos 6.6.56 amd64, using Cilium CNI. Here's the setup: I have two pods (yaml manifests at the bottom): Pod A (xfrm-pod) is running in the default network namespace. Pod B (charon-pod) is running in the host network namespace (hostNe...

readlink /proc/$$/ns/net

This gives the expected value, e.g., net:. Then I mount /var/run/netns on pod B, e.g. to /netns, and run ls -li /netns. The inode for Pod A's network namespace is a strange value, like 53587. Permissions show this is the only file there is write access to (I can delete it).

However, when I run ls -li /var/run/netns directly on the host, the inode and file name are what I expect: the correct namespace symlink and inode number.

Why is the inode different inside the host-network pod? And why does it appear writable, unlike other netns files? Any idea why this happens, and how I can get consistent behavior inside host network pods?

Pod YAML manifests (fetched with kubectl get pod -o yaml since I create them in a controller in Go):

Pod A:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2025-04-24T14:57:55Z"
  name: xfrm-pod
  namespace: ims
  resourceVersion: "7200524"
  uid: dd08aa88-460f-4bdd-8019-82a433682825
spec:
  containers:
  - command:
    - bash
    - -c
    - while true; do sleep 1000; done
    image: ubuntu:latest
    imagePullPolicy: Always
    name: xfrm-container
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /netns
      name: netns-dir
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-cszxx
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: nixos
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    sysctls:
    - name: net.ipv4.ip_forward
      value: "1"
    - name: net.ipv4.conf.all.rp_filter
      value: "0"
    - name: net.ipv4.conf.default.rp_filter
      value: "0"
    - name: net.ipv4.conf.all.arp_filter
      value: "1"
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - hostPath:
      path: /var/run/netns/
      type: Directory
    name: netns-dir
  - name: kube-api-access-cszxx
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

Pod B:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2025-04-24T14:57:45Z"
  labels:
    ipserviced: "true"
  name: charon-pod
  namespace: ims
  resourceVersion: "7200483"
  uid: 1c5542ba-16c8-4105-9556-7519ea50edef
spec:
  containers:
  - image: someimagewithstrongswan
    imagePullPolicy: IfNotPresent
    name: charondaemon
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        - NET_BIND_SERVICE
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/
      name: charon-volume
    - mountPath: /etc/swanctl
      name: charon-conf
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jjkpm
      readOnly: true
  - image: someimagewithswanctl
    imagePullPolicy: Always
    name: restctl
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_ADMIN
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/
      name: charon-volume
    - mountPath: /etc/swanctl
      name: charon-conf
    - mountPath: /netns
      name: netns-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jjkpm
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostIPC: true
  hostNetwork: true
  hostPID: true
  initContainers:
  - command:
    - sh
    - -c
    - "echo 'someconfig'
      > /etc/swanctl/swanctl.conf"
    image: busybox:latest
    imagePullPolicy: Always
    name: create-conf
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/swanctl
      name: charon-conf
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jjkpm
      readOnly: true
  nodeName: nixos
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: charon-volume
  - emptyDir: {}
    name: charon-conf
  - hostPath:
      path: /var/run/netns/
      type: Directory
    name: netns-dir
  - name: kube-api-access-jjkpm
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

rrekaF (1 rep)

Apr 25, 2025, 07:07 AM

-1 votes

3 answers

184 views

is User Namespaces a security vulnerability and is it logical to disable in sysctl.conf?

security namespace administration

A security rule, "RHEL 8 must disable the use of user namespaces", states:

> Discussion: It is detrimental for operating systems to provide, or install by default, functionality exceeding requirements or mission objectives. These unnecessary capabilities or services are often overlooked and therefore may remain unsecured. They increase the risk to the platform by providing additional attack vectors.
>
> Fix Text: user.max_user_namespaces = 0 in a sysctl.conf followed by sysctl --system

- Does using user namespaces, with what I think is an unlimited (65535) value from a clean default install from rhel-8.10.iso, cause an *increased risk as an additional attack vector*?
- Is User Namespaces an *unnecessary capability*?
- Can the rationale behind user namespaces be stated here in layman's terms? Why is it a [good?] thing?
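
To see what a given system is currently configured with, the limit named in the Fix Text can be read back from /proc/sys/user/max_user_namespaces. A minimal sketch follows; the helper name `userns_limit_status` and its optional path argument are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether the user-namespace limit allows creating
# any user namespaces. The path argument exists only to make testing easy.
userns_limit_status() {
    file=${1:-/proc/sys/user/max_user_namespaces}
    if [ ! -r "$file" ]; then
        echo unknown
        return 0
    fi
    limit=$(cat "$file")
    if [ "$limit" -eq 0 ]; then
        echo "disabled (limit 0)"
    else
        echo "enabled (limit $limit)"
    fi
}

userns_limit_status
```

Running it before and after applying the rule's sysctl setting shows whether the change took effect.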
                                

ron (8647 rep)

Apr 16, 2025, 06:57 PM • Last activity: Apr 16, 2025, 11:34 PM

4 votes

1 answers

2322 views

I can ping across namespaces, but not connect with TCP

networking bridge namespace network-namespaces veth

I'm trying to set up two network namespaces to communicate with each other. I've set up two namespaces, ns0 and ns1, that each have a veth pair, where the non-namespaced side of the veth is linked to a bridge. I set it up like this:

    ip link add veth0 type veth peer name brveth0
    ip link set brveth0 up

    ip link add veth1 type veth peer name brveth1
    ip link set brveth1 up

    ip link add br10 type bridge
    ip link set br10 up

    ip addr add 192.168.1.11/24 brd + dev br10

    ip netns add ns0
    ip netns add ns1

    ip link set veth0 netns ns0
    ip link set veth1 netns ns1

    ip netns exec ns0    ip addr add 192.168.1.20/24 dev veth0
    ip netns exec ns0    ip link set veth0 up
    ip netns exec ns0    ip link set lo up

    ip netns exec ns1    ip addr add 192.168.1.21/24 dev veth1
    ip netns exec ns1    ip link set veth1 up
    ip netns exec ns1    ip link set lo up

    ip link set  brveth0 master br10
    ip link set  brveth1 master br10

As expected, I can ping the interface in ns0 from ns1.

    $ sudo ip netns exec ns1 ping -c 3  192.168.1.20
    PING 192.168.1.20 (192.168.1.20) 56(84) bytes of data.
    64 bytes from 192.168.1.20: icmp_seq=1 ttl=64 time=0.099 ms
    64 bytes from 192.168.1.20: icmp_seq=2 ttl=64 time=0.189 ms

But, I can't connect the two over TCP. For example, running a server in ns0 :

    $ sudo ip netns exec ns0 python3 -m http.server 8080
    Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/)  ...

I would expect to be able to curl it from ns1, but that yields an error:

    $ sudo ip netns exec ns1 curl 192.168.1.20:8080
    curl: (7) Failed to connect to 192.168.1.20 port 8080: No route to host

Why is this happening?

Lee Avital (203 rep)

Oct 11, 2019, 12:25 AM • Last activity: Apr 14, 2025, 07:03 AM

0 votes

0 answers

40 views

How to modify a mount namespace without having a working mount command in it?

linux mount container namespace

I have a process which is running in a container I want to debug. To debug it, I want to modify that mount namespace (most importantly: I want to mount my toolkit root into it).

How to do that from a root shell? I can execute anything with an nsenter, but of course I can not (and do not want to) execute a mount from the identified namespace. I would like to simply alter another namespace, a different one from the location of my own mount binary.

How to do that?

peterh (10448 rep)

Apr 7, 2025, 05:16 PM • Last activity: Apr 7, 2025, 07:19 PM

0 votes

1 answers

42 views

Relationship between CLONE_NEWUSER, `/bin/unshare` and `unshare(2)` as it relates to User Namespace

linux users namespace documentation unshare

I am trying to comprehend some man7.org documentation
about the User Namespace and the /bin/unshare command.

I started by reading this page:

https://man7.org/linux/man-pages/man7/user_namespaces.7.html 

On the page, there is a lot of mention of
how the CLONE_NEWUSER flag can affect privileges and capabilities. 
But it is unclear to me whether unshare -U /bin/bash
or unshare -U -r /bin/bash uses CLONE_NEWUSER in any way.

So I visited unshare(1) next to see whether it explains how the /bin/unshare command uses the CLONE_NEWUSER flag.
But there is no discussion of CLONE_NEWUSER on that page.

However, there is discussion of the CLONE_NEWUSER flag
in the documentation of the system call unshare(2).
But it is unclear to me how /bin/unshare is related to unshare(2), or whether they are related at all.

Can anyone explain the relationship between /bin/unshare -U /bin/bash and CLONE_NEWUSER and unshare(2)?
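
A small experiment can make the relationship visible. The sketch below assumes /proc is mounted and that unprivileged user namespaces are allowed on the system; current_userns is a made-up helper name. It compares the user namespace identifier of the current shell with the one seen by a program started through unshare -U.

```shell
# Print the identifier of the user namespace the calling process lives in.
current_userns() {
    readlink /proc/self/ns/user
}

echo "this shell:       $(current_userns)"

# unshare(1) with -U is expected to call unshare(2) with CLONE_NEWUSER and
# then exec the given program, so the identifier printed here should differ.
# System policy can refuse this, hence the fallback message.
echo "under unshare -U: $(unshare -U readlink /proc/self/ns/user 2>/dev/null || echo 'not permitted on this system')"

# To watch the system call itself (assumes strace is installed):
#   strace -e trace=unshare unshare -U true
```

If the two identifiers differ, the command-line tool has created a new user namespace, and the strace line should show the underlying unshare(2) call with the CLONE_NEWUSER flag.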

----
Note:

I am a front end HTML CSS developer
trying to learn all this for the first time. 
I welcome references to any reading material
to address gaps in knowledge about Linux basics.

                                

learningtech (631 rep)

Mar 21, 2025, 05:31 PM • Last activity: Mar 21, 2025, 10:58 PM

3 votes

1 answers

141 views

Why can't I connect a network namespace to the Internet?

networking iptables ip routing namespace


I've seen other answers on this site and read an article and watched a video on the topic, but I still can't connect my network namespace to the outside world.

## Setup

I created a namespace named "foo" and a pair of veth interfaces, and moved one into the namespace.

ip netns add foo
ip link add veth-foo type veth peer name veth-out
ip link set dev veth-foo netns foo

I assigned each interface an IP address and made sure they're both up.

ip -n foo addr add 192.168.15.1 dev veth-foo
ip addr add 192.168.15.2 dev veth-out
ip -n foo link set dev veth-foo up
ip link set dev veth-out up

# Just in case, I made sure the loopback interfaces, too, are up, though they still show "UNKNOWN".
ip link set dev lo up
ip -n foo link set dev lo up

I added entries to the routing tables of both the global and the "foo" namespaces, so they can talk to each other.

ip route add 192.168.15.1 via 192.168.15.2
ip -n foo route add default via 192.168.15.1

Now, I can reach "foo" from the global namespace and the global namespace from "foo".

$ traceroute -n 192.168.15.1
traceroute to 192.168.15.1 (192.168.15.1), 30 hops max, 60 byte packets
 1  192.168.15.1  0.257 ms  0.209 ms  0.194 ms

$ ip netns exec foo traceroute -n 192.168.15.2
traceroute to 192.168.15.2 (192.168.15.2), 30 hops max, 60 byte packets
 1  192.168.15.2  0.046 ms  0.009 ms  0.008 ms

I can also reach the ethernet interface that connects the VM to the outside world from inside "foo".

# I ran this after I finished setting up IP forwarding, packet forwarding,
# and IP masquerading, so I'm not sure if it would work at this stage.
$ ip netns exec foo traceroute -n 10.0.2.15
traceroute to 10.0.2.15 (10.0.2.15), 30 hops max, 60 byte packets
 1  10.0.2.15  0.065 ms  0.010 ms  0.008 ms

Finally, I set up IP forwarding, packet forwarding, and IP masquerading.

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -A FORWARD -o enp1s0 -i veth-out -j ACCEPT
iptables -A FORWARD -i enp1s0 -o veth-out -j ACCEPT
iptables -t nat -A POSTROUTING -s 192.168.15.1/24 -o enp1s0 -j MASQUERADE

As a result, my system looks like this:

$ sysctl -a | grep ip_forward
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0

$ iptables -t nat -L -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 102 packets, 6816 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 MASQUERADE  all  --  any    enp1s0  192.168.15.0/24      anywhere

$ ip addr
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: enp1s0:  mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:c3:cd:ac brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute enp1s0
       valid_lft 83383sec preferred_lft 83383sec
    inet6 fec0::11b8:4b3b:59ba:bae4/64 scope site dynamic noprefixroute 
       valid_lft 86026sec preferred_lft 14026sec
    inet6 fe80::f3fd:90f2:d15f:d570/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: veth-out@if4:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether da:a2:13:05:c4:f5 brd ff:ff:ff:ff:ff:ff link-netns foo
    inet 192.168.15.2/32 scope global veth-out
       valid_lft forever preferred_lft forever
    inet6 fe80::d8a2:13ff:fe05:c4f5/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

$ ip -n foo addr
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo 
       valid_lft forever preferred_lft forever
4: veth-foo@if3:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:84:e6:16:92:8e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.15.1/32 scope global veth-foo
       valid_lft forever preferred_lft forever
    inet6 fe80::7c84:e6ff:fe16:928e/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

$ ip route
default via 10.0.2.2 dev enp1s0 proto dhcp src 10.0.2.15 metric 100 
10.0.2.0/24 dev enp1s0 proto kernel scope link src 10.0.2.15 metric 100 
192.168.15.1 via 192.168.15.2 dev veth-out

$ ip -n foo route
default via 192.168.15.1 dev veth-foo

## Testing

At this point, I expect to be able to reach the outside world, but no.

$ ip netns exec foo traceroute -n 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  192.168.15.1  3067.680 ms !H  3067.655 ms !H  3067.650 ms !H

$ sudo ip netns exec foo ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 192.168.15.1 icmp_seq=1 Destination Host Unreachable
From 192.168.15.1 icmp_seq=2 Destination Host Unreachable
From 192.168.15.1 icmp_seq=3 Destination Host Unreachable

--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2077ms

Of course, the VM itself is connected to the Internet.

$ traceroute -n 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  10.0.2.2  0.719 ms  0.691 ms  0.676 ms
 2  192.168.100.1  1.913 ms  2.593 ms  5.264 ms
 3  31.146.255.37  18.493 ms  18.740 ms  19.041 ms
 4  188.123.128.85  19.384 ms  19.658 ms  19.925 ms
 5  188.123.128.96  20.275 ms  21.787 ms 188.123.128.84  21.773 ms
 6  192.178.69.213  47.953 ms  53.145 ms  53.127 ms
 7  192.178.69.212  53.116 ms  51.893 ms 188.123.128.33  51.293 ms
 8  192.178.107.87  48.513 ms 192.178.107.135  43.582 ms 192.178.107.203  43.391 ms
 9  72.14.237.137  43.195 ms 8.8.8.8  43.207 ms  43.200 ms

$ ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=255 time=40.2 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=255 time=37.9 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=255 time=38.0 ms

--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 37.859/38.698/40.205/1.067 ms

### Tcpdump

Here's the output of tcpdump -n -i veth-out icmp. It captured packets when I targeted 192.168.15.2 & 10.0.2.15, but got nothing when I targeted 8.8.8.8.

listening on veth-out, link-type EN10MB (Ethernet), snapshot length 262144 bytes

# This is the output when I ran traceroute -n 192.168.15.2 (the address
# of "veth-out") in another terminal window (from inside "foo", of course).
12:44:19.172007 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port traceroute unreachable, length 68
12:44:19.172029 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port mtrace unreachable, length 68
12:44:19.172046 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port 33436 unreachable, length 68
12:44:19.172063 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port 33437 unreachable, length 68
12:44:19.172102 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port 33438 unreachable, length 68
12:44:19.172119 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port 33439 unreachable, length 68

# And this is when I ran the same command but addressed 10.0.2.15 (the
# ethernet interface to the outside world).
12:44:35.305689 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port traceroute unreachable, length 68
12:44:35.305715 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port mtrace unreachable, length 68
12:44:35.305733 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port 33436 unreachable, length 68
12:44:35.305750 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port 33437 unreachable, length 68
12:44:35.305766 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port 33438 unreachable, length 68
12:44:35.305783 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port 33439 unreachable, length 68
^C
12 packets captured
12 packets received by filter
0 packets dropped by kernel

Neither tcpdump -n -i lo icmp nor tcpdump -n -i enp1s0 icmp captured any packets, regardless of the target of traceroute. Yes, even when "foo" successfully reached the "enp1s0" interface (addressed 10.0.2.15).

## System Information

This was done inside a VM (GNOME Boxes), on Fedora 41, kernel version 6.11.4-301.fc41.x86_64. My host machine is also running Fedora 41, though the kernel is at version 6.13.5-200.fc41.x86_64.

**Edit:** Just in case this was a Fedora problem, I tested it in a Mint VM, and the exact same thing happened.
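
A sketch of one adjustment that may be worth trying, not a confirmed diagnosis: in the dump above, the default route inside "foo" points at 192.168.15.1, which is the namespace's own address, and both veth addresses are /32. A common variant of this setup gives both ends a /24 in the same subnet and uses the host end (192.168.15.2) as the gateway. The run and setup_ns_route helpers and the DRY_RUN switch are made up for illustration, and the commands assume a fresh setup in place of the corresponding ones above.

```shell
# Run a command, or only print it when DRY_RUN is set (hypothetical helper).
run() { ${DRY_RUN:+echo} "$@"; }

setup_ns_route() {
    # Put both veth ends into the same /24 so each side gets an on-link route.
    run ip -n foo addr add 192.168.15.1/24 dev veth-foo
    run ip addr add 192.168.15.2/24 dev veth-out
    run ip -n foo link set dev veth-foo up
    run ip link set dev veth-out up
    # Inside "foo", send everything else to the host end of the veth pair.
    run ip -n foo route add default via 192.168.15.2
}

# Example (needs root):
#   setup_ns_route
```

With /24 addresses on both ends, the explicit host route to 192.168.15.1 should no longer be necessary, because the kernel adds a connected route for the subnet.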

verified_tinker (133 rep)

Mar 8, 2025, 06:22 AM • Last activity: Mar 9, 2025, 06:12 AM

0 votes

0 answers

15 views

How to use rsyslog with more than one hostname via Linux Namespaces?

hostname namespace

I am trying to learn about UTS namespaces. I want to write some log entries from both a parent namespace and a child UTS namespace. This is for demonstration purposes, so it doesn't matter whether rsyslog writes to the same log file or to separate log files for each namespace. The only thing that matters is that the child namespace writes logs with a different hostname from the parent namespace. But I can't seem to preserve the two different hostnames.

Here is my latest attempt. I have two SSH terminal windows open, both logged in as the same user.

**Go to Terminal 1**

me@localhost: sudo unshare --uts /bin/bash
root@localhost: hostname api1
root@localhost: hostname
api1
root@localhost: rsyslogd -n -i /var/run/rsyslogd-child.pid -f /etc/rsyslog.conf &
root@localhost: logger foochild1

**Go to Terminal 2**

me@localhost: logger fooparent1

Now in my /var/log/syslog, both entries were recorded with the hostname from the child namespace:

Mar  8 22:47:12 api1 root: foochild1
Mar  8 22:47:33 api1 me: fooparent1

Can someone suggest to me a way to write log files from different namespaces while also preserving the hostname of the respective namespaces?

learningtech (631 rep)

Mar 8, 2025, 11:00 PM

7 votes

3 answers

3892 views

firejail: only let a program access localhost

networking namespace network-namespaces sandbox firejail


I have this local network service and this client program needing to access it. I am running them both as an unprivileged user.

I am looking for a way to sandbox the client using firejail, so that it cannot access the network, except for localhost (or even better, except for that service).
The first thing I tried was of course

    firejail --net=lo program
But it didn’t work.

    Error: cannot attach to lo device

I think I could work around it by creating a pair of virtual network interfaces, for example veth0 and veth1,
moving veth1 to a new network namespace in which I’d run the service,
and using firejail to restrict the client to veth0.

Is there a way to actually automate this setup in a firejail profile, so that all of these interfaces are created and veth1 is moved when I type

    firejail server
(without having to run anything as root)?

Or is there a simpler way to solve this problem? (I cannot run both the client and the service in the same namespace, because the service needs to access the network.)
                                

tbrugere (1084 rep)

Oct 27, 2018, 03:56 PM • Last activity: Feb 10, 2025, 05:50 PM

0 votes

1 answers

109 views

How do I change the default namespace used by kubectl?

namespace kubernetes


When using kubectl, for various operations a namespace is required. Typically it uses default as the default namespace, and a different namespace can be set using -n. But in my work, all resources relevant to me are in a team-namespace, so I never use the default namespace for anything and always have to pass -n.

How can I set my team namespace as the default so that I don't need to use the -n option?

I'm not interested in installing anything extra.
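
For illustration, kubectl can store a namespace in the current context of the kubeconfig, which needs nothing extra installed. A minimal sketch follows; the function name set_default_namespace, the DRY_RUN switch, and the namespace name team-namespace are placeholders.

```shell
# Store a namespace in the current kubeconfig context so that later kubectl
# calls use it without -n. set_default_namespace and DRY_RUN are hypothetical.
set_default_namespace() {
    # With DRY_RUN set, the command is only printed, not executed.
    ${DRY_RUN:+echo} kubectl config set-context --current "--namespace=$1"
}

# Example:
#   set_default_namespace team-namespace
# Check the result afterwards:
#   kubectl config view --minify | grep namespace:
```

The setting is kept per context, so switching to another context brings back whatever namespace that context has configured.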

---

Note: this is part of a series of posts meant to import relevant answers from other Stack Exchange sites here to make them easier to find, and also make it easier to handle duplicates as cross-site duplicates aren't a thing.

muru (77471 rep)

Dec 22, 2024, 02:03 PM • Last activity: Jan 31, 2025, 03:33 AM

1 votes

1 answers

99 views

`nsenter` `--root`: symlink vs. regular dir path

namespace


I am noticing a weird behavior of nsenter for which I am looking for an explanation. When I enter the namespaces of another process created with unshare, I observe different behavior depending on whether I specify the root directory as a regular path or via the /proc/PID/root symlink. Here is the sample setup.

1. Prepare target process

    sudo unshare --mount --mount-proc --pid --fork --root /tmp/jail bash

/tmp/jail has a Linux distribution inside (I prepared it via docker export using the ubuntu image):

    $ docker run ubuntu
     (grab the container ID)
    $ docker export 8c67e1fb5443 > ubuntu.tar
    $ mkdir /tmp/jail && cd /tmp/jail && tar -xf ~/ubuntu.tar

2. From another terminal try entering the namespaces of that process

    sudo nsenter --target=28716 --root=/tmp/jail --all bash

3. Try the ps command

Here I observe the error:

    root@ubuntu:/proc# ps
    Error, do this: mount -t proc proc /proc
    
    root@ubuntu:/proc# mount
    mount: failed to read mtab: No such file or directory

However, if I specify --root=/proc/28716/root for nsenter, it suddenly starts working.

    $ sudo readlink /proc/28716/root
    /tmp/jail

    $ sudo nsenter --target=28716 --root=/proc/28716/root --all bash

    root@ubuntu:.# ps aux
    USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root           1  0.0  0.0   4624  3712 ?        S+   21:04   0:00 bash
    root          31  0.0  0.0   4624  3712 ?        S    21:10   0:00 bash
    root          39  0.0  0.0   7060  2944 ?        R+   21:10   0:00 ps aux

    root@ubuntu:.# mount
    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)

What are the reasons for this behavior? Why does using the symlink vs. the regular path make a difference, given that they point to the same directory?

    $ sudo nsenter --target=28716 --root=/proc/28716/root --all bash  # WORKS GOOD
    $ sudo nsenter --target=28716 --root=/tmp/jail --all bash         # DOES NOT WORK GOOD

(where /proc/28716/root is a symlink to /tmp/jail)

Neither strace nor the source code of nsenter seems to suggest an explanation for these differences.

6.8.0-49-generic
Ubuntu 24.04.1 LTS

Eugene D. Gubenkov (113 rep)

Dec 7, 2024, 09:18 PM • Last activity: Dec 9, 2024, 09:11 PM
