Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
14 votes, 1 answer, 5016 views
How can I make a device available inside a systemd-nspawn container with user namespacing?
I would like to mount an encrypted image file using `cryptsetup` inside a `systemd-nspawn` container. However, I get this error message:
[root@container ~]# echo $key | cryptsetup -d - open luks.img luks
Cannot initialize device-mapper. Is dm_mod kernel module loaded?
Cannot use device luks, name is invalid or still in use.
The `dm_mod` kernel module is loaded on the host system, although things look a bit weird inside the container:
[root@host ~]# grep dm_mod /proc/modules
dm_mod 159744 2 dm_crypt, Live 0xffffffffc12c6000
[root@container ~]# grep dm_mod /proc/modules
dm_mod 159744 2 dm_crypt, Live 0x0000000000000000
`strace` indicates that `cryptsetup` is unable to create `/dev/mapper/control`:
[root@etrial ~]# echo $key | strace cryptsetup -d - open luks.img luks 2>&1 | grep mknod
mknod("/dev/mapper/control", S_IFCHR|0600, makedev(0xa, 0xec)) = -1 EPERM (Operation not permitted)
I am not too sure why this is happening. I am starting the container with the `systemd-nspawn@.service` template unit, which seems like it should allow access to the device mapper:
# nspawn can set up LUKS encrypted loopback files, in which case it needs
# access to /dev/mapper/control and the block devices /dev/mapper/*.
DeviceAllow=/dev/mapper/control rw
DeviceAllow=block-device-mapper rw
Reading this comment on a related question about USB devices, I wondered whether the solution was to add a bind mount for `/dev/mapper`. However, `cryptsetup` gives me the same error message inside the container. When I `strace` it, it looks like there's still a permissions issue:
# echo $key | strace cryptsetup open luks.img luks --key-file - 2>&1 | grep "/dev/mapper"
stat("/dev/mapper/control", {st_mode=S_IFCHR|0600, st_rdev=makedev(0xa, 0xec), ...}) = 0
openat(AT_FDCWD, "/dev/mapper/control", O_RDWR) = -1 EACCES (Permission denied)
# ls -la /dev/mapper
total 0
drwxr-xr-x 2 nobody nobody 60 Dec 13 14:33 .
drwxr-xr-x 8 root root 460 Dec 15 14:54 ..
crw------- 1 nobody nobody 10, 236 Dec 13 14:33 control
Apparently, this is happening because the template unit enables user namespacing, which I want anyway for security reasons. As explained in the documentation:
> In most cases, using `--private-users=pick` is the recommended option as it enhances container security massively and operates fully automatically in most cases ... [this] is the default if the `systemd-nspawn@.service` template unit file is used ...
>
> Note that when [the `--bind` option] is used in combination with `--private-users`, the resulting mount points will be owned by the `nobody` user. That's because the mount and its files and directories continue to be owned by the relevant host users and groups, which do not exist in the container, and thus show up under the wildcard UID 65534 (`nobody`). If such bind mounts are created, it is recommended to make them read-only, using `--bind-ro=`.
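The `nobody nobody` ownership of `/dev/mapper/control` in the `ls -la` output above follows from how IDs are translated through the container's `uid_map`, whose lines read "first ID inside, first ID outside, count"; an ID covered by no line is displayed as the overflow UID. As a sketch (the helper name `host_uid_in_ns` and the sample map in the usage note are hypothetical, and the overflow UID is assumed to be the kernel default of 65534), host UID 0, which owns the host's device node, lies outside the range picked by `--private-users=pick` and so shows up as `nobody`:

```shell
#!/bin/sh
# Hypothetical helper: show how a host UID appears inside a user namespace.
# MAP uses the uid_map format "first-ID-inside first-ID-outside count".
# IDs not covered by any line are displayed as the overflow UID, assumed
# here to be the kernel default of 65534 ("nobody").
host_uid_in_ns() {
    local host_uid="$1" map="${2:-/proc/self/uid_map}"
    local inside outside count
    while read -r inside outside count; do
        # Is host_uid inside the half-open range [outside, outside + count)?
        if [ "$host_uid" -ge "$outside" ] &&
           [ "$host_uid" -lt $((outside + count)) ]; then
            echo $((inside + host_uid - outside))
            return 0
        fi
    done < "$map"
    echo 65534
}
```

With a hypothetical map line `0 524288 65536`, `host_uid_in_ns 0` prints `65534` and `host_uid_in_ns 525288` prints `1000`. Run inside the container without the second argument, it reads the container's own `/proc/self/uid_map`.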
Presumably I won't be able to do anything with read-only permissions to `/dev/mapper`. So, is there any way I can get `cryptsetup` to work inside the container, so that my application can create and mount arbitrary encrypted volumes at runtime, without disabling user namespacing?
## Related questions
* systemd-nspawn: file-system permissions for a bound folder relates to files rather than devices, and the only answer just says that "`-U` is mostly incompatible with rw `--bind`."
* systemd-nspawn: how to allow access to all devices doesn't deal with user namespacing and there are no answers.
sjy (956 rep)
Dec 15, 2019, 02:53 AM
• Last activity: Jul 31, 2025, 03:10 AM
6 votes, 4 answers, 23639 views
Podman errors on tar with potentially insufficient UIDs or GIDs available in user namespace
When I run `podman run`, I'm getting a particularly weird error:
❯ podman run -ti --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:latest
✔ docker.io/rancher/rancher:latest
Trying to pull docker.io/rancher/rancher:latest...
Getting image source signatures
[... blob copying...]
Writing manifest to image destination
Storing signatures
Error processing tar file(exit status 1): potentially insufficient UIDs or GIDs available in user namespace (requested 630384594:600260513 for /usr/bin/etcdctl): Check /etc/subuid and /etc/subgid: lchown /usr/bin/etcdctl: invalid argument
Error: Error committing the finished image: error adding layer with blob "sha256:b4b03dbaa949daab471f94bcfd68cbe21c1147e8ec2acfe3f46f1520db48baeb": Error processing tar file(exit status 1): potentially insufficient UIDs or GIDs available in user namespace (requested 630384594:600260513 for /usr/bin/etcdctl): Check /etc/subuid and /etc/subgid: lchown /usr/bin/etcdctl: invalid argument
What does _"potentially insufficient UIDs or GIDs available in user namespace"_ mean and how can I remedy this problem?
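The error says that an owner recorded in the image layer (`630384594:600260513` for `/usr/bin/etcdctl`) could not be represented inside the rootless user namespace, so `lchown` failed with `EINVAL`. As a rough sketch, assuming Podman's default rootless layout (container UID 0 is the user's own UID, container UIDs 1..N map onto the user's subordinate IDs) and the common single 65536-entry range in `/etc/subuid`, the check below shows why such a large UID cannot fit. The function name `can_map_uid` is hypothetical; the same reasoning applies to GIDs and `/etc/subgid`.

```shell
#!/bin/sh
# Hypothetical helper: can a rootless container owned by USER hold files
# owned by container UID N?  Assumes Podman's default rootless layout:
# container UID 0 is the user's own UID, and container UIDs 1..total map
# onto the subordinate IDs listed for USER in the subuid file.
can_map_uid() {
    local user="$1" uid="$2" file="${3:-/etc/subuid}"
    local total=0 name start count
    while IFS=: read -r name start count; do
        # Sum every range owned by USER (entries keyed by numeric UID are
        # not handled in this sketch).
        if [ "$name" = "$user" ]; then
            total=$((total + count))
        fi
    done < "$file"
    # Valid container UIDs are 0..total.
    [ "$uid" -le "$total" ]
}
```

With `alex:100000:65536`, container UIDs 0 through 65536 pass, while 630384594 does not, which is the situation the error message describes.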
Evan Carroll (34663 rep)
Feb 3, 2022, 07:43 PM
• Last activity: Jul 3, 2025, 05:48 PM
2 votes, 1 answer, 86 views
How does a cgroup namespace work?
I’m trying to understand how cgroup namespaces work, but I’m stuck on something that doesn’t make sense to me.
My understanding is that a cgroup namespace should virtualize the cgroup hierarchy for a process, so that the process sees its current cgroup as / and doesn’t see the full host hierarchy.
So I tried to create a cgroup namespace like this:
sudo unshare --cgroup
cat /proc/self/cgroup
0::/
echo $$
3183
Then, from another terminal on the host, I checked the cgroup for that process:
cat /proc/3183/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-ffe09412-f0d6-413e-b480-6d14f1290f84.scope
This matches what the man page says:
> Cgroup namespaces virtualize the view of a process's cgroups (see cgroups(7)) as seen via /proc/[pid]/cgroup and /proc/[pid]/mountinfo.
> Each cgroup namespace has its own set of cgroup root directories.
> These root directories are the base points for the relative locations displayed in the corresponding records in the /proc/[pid]/cgroup file.
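The man page text quoted above describes a display rule: paths in `/proc/[pid]/cgroup` are shown relative to the namespace's cgroup root, and cgroups outside that root get `..` components (as in the example in cgroup_namespaces(7)). A sketch of that rewriting follows; the function name `ns_cgroup_path` is hypothetical. It models only how paths are displayed, not which directories are visible or writable under `/sys/fs/cgroup`, since that depends on which cgroup2 mount the process sees.

```shell
#!/bin/sh
# Hypothetical helper: how a cgroup path (as seen from the initial cgroup
# namespace) is displayed in /proc/PID/cgroup by a process whose cgroup
# namespace root is ROOT. Paths outside ROOT get ".." components.
ns_cgroup_path() {
    awk -v root="$1" -v target="$2" '
        # Split a path into its non-empty components; return their count.
        function parts(path, out,    tmp, n, i, k) {
            n = split(path, tmp, "/")
            k = 0
            for (i = 1; i <= n; i++)
                if (tmp[i] != "") out[++k] = tmp[i]
            return k
        }
        BEGIN {
            nr = parts(root, r)
            nt = parts(target, t)
            # Skip the common prefix; i ends at the first differing component.
            i = 1
            while (i <= nr && i <= nt && r[i] == t[i]) i++
            res = ""
            for (j = i; j <= nr; j++) res = res "/.."     # climb out of ROOT
            for (j = i; j <= nt; j++) res = res "/" t[j]  # descend to TARGET
            print (res == "" ? "/" : res)
        }'
}
```

For example, `ns_cgroup_path /sub /sub2` prints `/../sub2`, and a process sitting exactly at the namespace root is shown as `/`, matching the `0::/` seen inside the namespace in the question.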
However, when I create a new cgroup inside my cgroup namespace, it appears in the host’s hierarchy too:
# Inside the namespace:
mkdir /sys/fs/cgroup/test
# On the host:
ls /sys/fs/cgroup/
...
test
...
So it seems that the entire host hierarchy is still visible and any new cgroup I make is visible system-wide. There’s no real isolation — from inside the namespace I can still see and modify all the host’s cgroups.
I also tried combining it with a user namespace to avoid `sudo`, but the result is the same:
unshare --map-root-user
unshare --cgroup
ls /sys/fs/cgroup/
Again, I see the full host hierarchy.
So my questions are:
- Am I misunderstanding how cgroup namespaces are supposed to work?
- Is the cgroup namespace not designed to isolate the entire hierarchy like mount or PID namespaces do?
- Is there a correct way to use them to limit what cgroups are visible or writable?
Any clarification would be really appreciated!
Liric Ramer (85 rep)
Jun 27, 2025, 10:22 AM
• Last activity: Jun 29, 2025, 09:52 AM
1 vote, 0 answers, 33 views
Linux mount namespaces - umount event propagates unexpectedly
I was reading about mount namespaces and encountered something that seemed odd to me. I'm using **Ubuntu 22.04**.
I have a USB device connected to my machine:
ubuntu@ubuntu-2204:/media/ubuntu$ cat /proc/self/mountinfo | grep -i media
3034 29 8:17 / /media/ubuntu/30A8-7347 rw,nosuid,nodev,relatime shared:675 - vfat /dev/sdb1 rw,uid=1000,gid=1000,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,showexec,utf8,flush,errors=remount-ro
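The `shared:675` tag in the mountinfo line above records the mount's propagation state. The optional fields after the sixth column, up to the lone `-` separator, list tags such as `shared:N` or `master:N`, and a mount with no such tags is private. A small sketch (the function name `propagation_of` is hypothetical) that reports those tags for a mount point, so each shell's view can be compared:

```shell
#!/bin/sh
# Hypothetical helper: print the propagation tags of a mount point, as
# recorded in mountinfo, or "private" if it has none.
# Mount points containing spaces appear escaped (e.g. \040) in mountinfo,
# so pass them in that escaped form.
propagation_of() {
    local mnt="$1" file="${2:-/proc/self/mountinfo}"
    MNT="$mnt" awk '
        # Field 5 is the mount point; optional fields start at field 7
        # and run up to the lone "-" separator.
        $5 == ENVIRON["MNT"] {
            tags = ""
            for (i = 7; i <= NF && $i != "-"; i++)
                tags = tags (tags == "" ? "" : " ") $i
            found = 1
        }
        # If several mounts are stacked on the same point, the last one wins.
        END { if (found) print (tags == "" ? "private" : tags) }
    ' "$file"
}
```

Run in shell #1 with the line shown, `propagation_of /media/ubuntu/30A8-7347` would print `shared:675`; running it in shell #2 shows what that namespace's copy of the mount records.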
Now I start two shells:
- #1 - A "regular" shell.
- #2 - A shell that's started using `unshare -m /bin/bash`. Running `unshare` without specifying `--propagation unchanged` will result in an implicit `mount --make-rprivate /` in the new mount namespace. Great.
Before I do anything the USB mount is observable from both shells. This is expected.
Now I run `umount /media/ubuntu/30A8-7347` in shell #1 and, to my surprise, the mount is no longer visible from the second shell either. However, when I run the command from shell #2, the mount is still visible from #1, as expected.
My question is: why does the `umount` command propagate to the mount namespace shell #2 resides in? Does it have something to do with the fact that `/` was `SHARED` before I created the second mount namespace? It doesn't make much sense to me if this is indeed the case.
As a note, when I create shell #2 prior to connecting the USB - the mount event of the USB connection **does not** propagate to the second mount namespace as expected.
EL_9 (111 rep)
Jun 28, 2025, 07:29 PM
2 votes, 1 answer, 56 views
How to enable internet access for a bridge inside a Linux network namespace?
I've created two Linux network namespaces (ns1 and ns2), and inside each, I have:
- A bridge (ns1-br0, ns2-br0)
- A TAP device (tap0, tap1) connected to the respective bridge
- Each TAP device gets an IP address like 10.0.0.2/24.
The problem is: I want devices like tap0 and tap1 to access the internet, but I'm confused about how to set up routing and NAT properly.
The host has internet access via eth0.
How do I:
- Connect the namespace's bridge to the outside world?
- Use NAT or MASQUERADE correctly so that TAP devices can access the internet?
- Assign default gateways?
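One common pattern, sketched here and not verified against the bridge/TAP layout in the question, is a veth pair from the host into each namespace, an address on the host end used as the namespace's default gateway, IP forwarding on the host, and a MASQUERADE rule on the uplink. The helper below only prints the commands so they can be reviewed before being run as root. Its name, the interface names, and the `10.200.1.0/24` transit subnet are hypothetical, and it assumes iptables rather than native nftables rules.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one possible command sequence giving
# network namespace NS outbound access through host interface UPLINK.
# Assumptions: PREFIX.0/24 is an otherwise unused transit subnet, the host
# end gets PREFIX.1, the namespace end PREFIX.2, and iptables is in use.
# Interface names are limited to 15 characters, so keep NS to 8 or fewer.
print_netns_nat_plan() {
    local ns="$1" uplink="$2" prefix="$3"
    local host_if="veth-${ns}-h" ns_if="veth-${ns}-n"
    cat <<EOF
ip link add ${host_if} type veth peer name ${ns_if}
ip link set ${ns_if} netns ${ns}
ip addr add ${prefix}.1/24 dev ${host_if}
ip link set ${host_if} up
ip -n ${ns} addr add ${prefix}.2/24 dev ${ns_if}
ip -n ${ns} link set ${ns_if} up
ip -n ${ns} link set lo up
ip -n ${ns} route add default via ${prefix}.1
sysctl -w net.ipv4.ip_forward=1
iptables -A FORWARD -i ${host_if} -o ${uplink} -j ACCEPT
iptables -A FORWARD -i ${uplink} -o ${host_if} -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -t nat -A POSTROUTING -s ${prefix}.0/24 -o ${uplink} -j MASQUERADE
EOF
}
```

For example, `print_netns_nat_plan ns1 eth0 10.200.1` lists the steps for `ns1`; after review they could be piped to `sudo sh`. Each namespace would need its own transit subnet. Adapting this to the bridge layout (for example, attaching the namespace end to `ns1-br0` and giving the TAP devices a gateway on the bridge) is not shown.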

Bhautik Chudasama (121 rep)
Jun 18, 2025, 04:03 PM
• Last activity: Jun 18, 2025, 08:57 PM
0 votes, 0 answers, 34 views
Bridging containers to external VLAN
I have a physical network with several VLANs. One of my computers (my main workstation) is connected to two different VLANs on this network, one tagged, the other not.
I have successfully set this computer up on both VLANs by making a VLAN clone interface, but I discovered that in order to actually receive packets on that interface I had to change the MAC. It seems that the Linux network stack (or maybe the acceleration on the card) looks at the MAC and if it matches, ignores the VLAN tag.
I now want to attach this interface to a bridge somehow and then also have containers attach to this same bridge. I know enough about how containers are constructed that I can do this by hand after whatever container system I'm using (`podman` in this case) sets the container up.
The reason I want this is that I'm working on an IPv6 broadcast/multicast protocol that will only work for a local LAN, and in order to test it, I want to have several copies of the servent running in different containers so they can communicate with each other.
I've tried this in the obvious way, but none of the packets that are explicitly destined for one of the containers ever makes it to them. I suspect this is because the card or the Linux network stack is just dropping them at the physical interface when their destination MAC doesn't match any of the MACs assigned to the interface.
What would be a good way to accomplish this? Should I ask this on Server Fault or Stack Overflow instead?
Omnifarious (1412 rep)
Jun 1, 2025, 03:51 AM
4 votes, 1 answer, 2247 views
systemd "Failed to set up mount namespacing" in Docker container
I recently updated a Docker container that uses systemd internally from Debian stretch to Debian buster, and since then it's not working.
According to `systemctl status`, it fails to set up the namespace:
Dec 10 14:22:11 f6f3e33e6bf2 systemd: Starting OpenVPN tunnel for apu__ssl_vpn_config...
Dec 10 14:22:11 f6f3e33e6bf2 systemd: openvpn-client@apu__ssl_vpn_config.service: Failed to set up mount namespacing: Permission denied
Dec 10 14:22:11 f6f3e33e6bf2 systemd: openvpn-client@apu__ssl_vpn_config.service: Failed at step NAMESPACE spawning /usr/sbin/openvpn: Permission denied
Dec 10 14:22:11 f6f3e33e6bf2 systemd: openvpn-client@apu__ssl_vpn_config.service: Main process exited, code=exited, status=226/NAMESPACE
Dec 10 14:22:11 f6f3e33e6bf2 systemd: openvpn-client@apu__ssl_vpn_config.service: Failed with result 'exit-code'.
Dec 10 14:22:11 f6f3e33e6bf2 systemd: Failed to start OpenVPN tunnel for apu__ssl_vpn_config.
I had a similar issue with Elasticsearch before, which I fixed by adding a drop-in unit config containing:
[Service]
PrivateTmp=false
NoNewPrivileges=yes
Though sadly that doesn't fix the issue this time.
I also found this exact issue occurring when using LXC (or LXD?), though I don't know how to fix it with Docker.
Additionally, I start the container like this:
docker run -dt \
--tmpfs /run --tmpfs /tmp \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--device=/dev/net/tun \
--cap-add SYS_ADMIN \
--cap-add NET_ADMIN \
Any pointers are much appreciated.
BrainStone (3784 rep)
Dec 10, 2019, 03:18 PM
• Last activity: May 22, 2025, 08:07 PM
1 vote, 1 answer, 35 views
How can I bind-mount a file into an existing directory tree inside a fresh user/mount namespace?
I would like to use `unshare` to create a new unprivileged user/mount namespace, with the goal of making a specific file appear at a specific location inside the child namespace.
For example, assume that I would like `/home/user/path/to/file` to appear as `/opt/dir1/dir2/file`. However, `/opt` already exists in the parent namespace and is not writable by the user that I'm starting with. This does not work:
user $ unshare -Urm
root # mount --bind /home/user/path/to/file /opt/dir1/dir2/file
mount: /opt/dir1/dir2/file: mount point does not exist.
dmesg(1) may have more information after failed mount system call.
I think the underlying issue here is that `/opt/dir1/dir2` does not exist ahead of the attempt to make the bind mount. However, I'm not able to create that directory since `/opt` is not writable in the parent:
root # mkdir -p /opt/dir1/dir2
mkdir: cannot create directory ‘/opt/dir1’: Permission denied
Is there a way to make this work so that the changes to `/opt` are only visible inside my child namespace, allowing me to work around the permission issue?
Jason R (657 rep)
May 20, 2025, 05:26 PM
• Last activity: May 20, 2025, 09:59 PM
4 votes, 1 answer, 7081 views
running a process in another namespace
I would like to run a new process (for example an xterm) in another network namespace. This could be done like this:
sudo ip netns exec otherns sudo -u $USER xterm
This command looks a bit complicated and involves running `sudo`, which runs `ip`, which runs `sudo`, which runs the final `xterm`.
Is there a more direct way to run a process in a new namespace?
I was thinking of writing my own small (SUID or capability-enabled) binary which switches namespace, restores permissions and user, and runs the command, but shouldn't there already be some standard tool doing exactly that?
This would allow me to simply call something like:
runns otherns xterm
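util-linux `nsenter` can enter a namespace through a file such as the ones `ip netns add` pins under `/var/run/netns`, and its `--setuid`/`--setgid` options switch to a given user and group before running the command, so a single `sudo` may be enough. A sketch of the `runns` wrapper from the question along those lines (the function body and the `RUNNS_DRY_RUN` variable are hypothetical):

```shell
#!/bin/sh
# Hypothetical "runns NAME COMMAND..." wrapper: enter the network namespace
# that "ip netns add NAME" pinned under /var/run/netns, then switch back to
# the invoking user before running COMMAND. Assumes util-linux nsenter.
runns() {
    local ns="$1"
    shift
    # id -u / id -g run before sudo, so they give the invoking user's IDs.
    set -- sudo nsenter "--net=/var/run/netns/${ns}" \
        "--setuid=$(id -u)" "--setgid=$(id -g)" -- "$@"
    if [ -n "${RUNNS_DRY_RUN:-}" ]; then
        printf '%s\n' "$*"   # show the command instead of running it
    else
        "$@"
    fi
}
```

Usage would be `runns otherns xterm`. `nsenter --setgid` drops supplementary groups, and `sudo` may reset environment variables such as `DISPLAY` or `XAUTHORITY` depending on the sudoers configuration, so an X client might still need extra care.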
michas (21862 rep)
Jun 2, 2015, 10:40 PM
• Last activity: May 4, 2025, 07:08 AM
2 votes, 1 answer, 103 views
Mapping two users to host with user namespaces
I'm trying to understand whether it's possible to map two users from a **user namespace** to two different users on the host.
The goal is to replicate the same permissions I have on my host inside a `rootfs` (Ubuntu base, because I'm trying to build a container from scratch).
For example:
- Everything under `/` should belong to `root`.
- `/home/user` should belong to the regular `user`.
To achieve this, I was thinking of using UID mapping in a user namespace, something like:
UID in user namespace ---> UID on host
1000 (admin) -> 0 (root)
1001 (bob) -> 1001 (bob)
Is this kind of mapping even possible?
Here’s what I’ve already tried:
- Running `echo -e "1000 0 1\n1001 1001 1" > /proc/[PID]/uid_map` to define the mapping, but I get an error.
- Trying to manually modify `/proc/[PID]/uid_map` using `newuidmap` for each user.
However, I’ve never been able to map more than one user, and I can’t seem to map UID 0 (root) at all.
I’ve read the man pages and followed the constraints mentioned there, but I’m still getting error messages.
For example:
# terminal 1
unshare --user bash
echo $$ # 11591
# terminal 2 as user 'alex' (uid = 1000)
newuidmap 11591 0 0 1
# newuidmap: uid range [0-1) -> [0-1) not allowed
newuidmap 11591 1001 1001 1
# newuidmap: uid range [1001-1002) -> [1001-1002) not allowed
These commands fail, even when run with `sudo`.
I also tried mapping to subuids that I’ve declared, but it still doesn’t work:
cat /etc/subuid
alex:100000:65536
root:200000:65536
self:300000:65536
cat /etc/subgid
alex:100000:65536
root:200000:65536
self:300000:65536
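The error messages are consistent with the range check that shadow's `newuidmap` applies to the host-side IDs: roughly, a range is allowed if it lies within the calling user's entries in `/etc/subuid`, or if it is a single ID equal to the caller's own UID. The function below (hypothetical name) is a simplified model of that check only. It assumes name-keyed entries, ignores ranges spanning adjacent entries, and leaves out the other checks the real tool and the kernel make.

```shell
#!/bin/sh
# Hypothetical, simplified model of newuidmap's per-range check: may USER
# (whose own UID is OWN_UID) map COUNT host IDs starting at LOWER?
newuidmap_range_ok() {
    local user="$1" own_uid="$2" lower="$3" count="$4"
    local file="${5:-/etc/subuid}"
    local name start n
    [ "$count" -gt 0 ] || return 1
    # Rule 1: the whole host range lies inside one of USER's subuid entries.
    while IFS=: read -r name start n; do
        if [ "$name" = "$user" ] && [ "$lower" -ge "$start" ] &&
           [ $((lower + count)) -le $((start + n)) ]; then
            return 0
        fi
    done < "$file"
    # Rule 2: a single-ID mapping of the caller's own UID.
    [ "$count" -eq 1 ] && [ "$lower" -eq "$own_uid" ]
}
```

With the `alex:100000:65536` entry above and alex's UID 1000, host UID 0 and host UID 1001 fail both rules, matching the two errors shown, while a request such as `newuidmap 11591 0 1000 1 1 100000 65536` (own UID as namespace root, then the subordinate range) would pass this modeled check.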
Liric Ramer (85 rep)
Apr 16, 2025, 01:59 PM
• Last activity: Apr 27, 2025, 01:36 PM
0 votes, 0 answers, 29 views
Unexpected network namespace inode when accessing /var/run/netns/ from pod in host network namespace
I'm running a Kubernetes cluster with RKE2 v1.30.5+rke2r1 on Linux nixos 6.6.56 amd64, using Cilium CNI.
Here's the setup:
I have two pods (yaml manifests at the bottom):
Pod A (xfrm-pod) is running in the default network namespace.
Pod B (charon-pod) is running in the host network namespace (hostNetwork: true).
On Pod A, I check the inode of its network namespace using:
readlink /proc/$$/ns/net
This gives the expected value, e.g., net:
.
Then I mount `/var/run/netns` on Pod B (e.g. to `/netns`) and run `ls -li /netns`: the inode for Pod A's network namespace is a strange value, like 53587.
Permissions show this is the only file I have write access to (I can delete it).
However, when I run `ls -li /var/run/netns` directly on the host, the inode and file name are what I expect: the correct namespace symlink and inode number.
Why is the inode different inside the host-network pod? And why does it appear writable, unlike other netns files?
Any idea why this happens, and how I can get consistent behavior inside host network pods?
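`readlink /proc/PID/ns/net` reports a namespace as `net:[INODE]`, where INODE is the namespace's inode on nsfs. Assuming the files under `/var/run/netns` were made by `ip netns add`, which creates a placeholder file and bind-mounts the namespace onto it, the two views can be compared by inode. The helper names below are hypothetical, and `stat -c` assumes GNU coreutils:

```shell
#!/bin/sh
# Extract INODE from a namespace link target such as "net:[4026532008]".
ns_inode_from_link() {
    printf '%s\n' "$1" | sed 's/.*\[\([0-9]*\)].*/\1/'
}

# Inode number of whatever FILE resolves to (GNU stat assumed).
# For /proc/PID/ns/net, or for an active "ip netns" bind mount, this is
# expected to be the namespace inode; for a placeholder file with no bind
# mount on it, it would be the file's own inode on its filesystem.
file_inode() {
    stat -L -c %i "$1"
}

# Hypothetical usage, with PID and NAME filled in:
#   [ "$(ns_inode_from_link "$(readlink /proc/PID/ns/net)")" = \
#     "$(file_inode /netns/NAME)" ] && echo "same network namespace"
```

One possible reading of the small, writable 53587 entry, not a confirmed diagnosis, is that the pod sees the placeholder file without the bind mount on top of it.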
Pod YAML manifests (fetched with `kubectl get pod -o yaml`, since I create them in a controller in Go):
Pod A:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2025-04-24T14:57:55Z"
  name: xfrm-pod
  namespace: ims
  resourceVersion: "7200524"
  uid: dd08aa88-460f-4bdd-8019-82a433682825
spec:
  containers:
  - command:
    - bash
    - -c
    - while true; do sleep 1000; done
    image: ubuntu:latest
    imagePullPolicy: Always
    name: xfrm-container
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /netns
      name: netns-dir
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-cszxx
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: nixos
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    sysctls:
    - name: net.ipv4.ip_forward
      value: "1"
    - name: net.ipv4.conf.all.rp_filter
      value: "0"
    - name: net.ipv4.conf.default.rp_filter
      value: "0"
    - name: net.ipv4.conf.all.arp_filter
      value: "1"
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - hostPath:
      path: /var/run/netns/
      type: Directory
    name: netns-dir
  - name: kube-api-access-cszxx
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
Pod B:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2025-04-24T14:57:45Z"
  labels:
    ipserviced: "true"
  name: charon-pod
  namespace: ims
  resourceVersion: "7200483"
  uid: 1c5542ba-16c8-4105-9556-7519ea50edef
spec:
  containers:
  - image: someimagewithstrongswan
    imagePullPolicy: IfNotPresent
    name: charondaemon
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        - NET_BIND_SERVICE
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/
      name: charon-volume
    - mountPath: /etc/swanctl
      name: charon-conf
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jjkpm
      readOnly: true
  - image: someimagewithswanctl
    imagePullPolicy: Always
    name: restctl
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_ADMIN
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/
      name: charon-volume
    - mountPath: /etc/swanctl
      name: charon-conf
    - mountPath: /netns
      name: netns-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jjkpm
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostIPC: true
  hostNetwork: true
  hostPID: true
  initContainers:
  - command:
    - sh
    - -c
    - "echo 'someconfig'
      > /etc/swanctl/swanctl.conf"
    image: busybox:latest
    imagePullPolicy: Always
    name: create-conf
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/swanctl
      name: charon-conf
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-jjkpm
      readOnly: true
  nodeName: nixos
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: charon-volume
  - emptyDir: {}
    name: charon-conf
  - hostPath:
      path: /var/run/netns/
      type: Directory
    name: netns-dir
  - name: kube-api-access-jjkpm
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
rrekaF (1 rep)
Apr 25, 2025, 07:07 AM
-1 votes, 3 answers, 184 views
is User Namespaces a security vulnerability and is it logical to disable in sysctl.conf?
A security rule, `RHEL 8 must disable the use of user namespaces.`, states:
> Discussion: It is detrimental for operating systems to provide, or install by default, functionality exceeding requirements or mission objectives. These unnecessary capabilities or services are often overlooked and therefore may remain unsecured. They increase the risk to the platform by providing additional attack vectors.
>
> Fix Text: `user.max_user_namespaces = 0` in a sysctl.conf followed by `sysctl --system`
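The Fix Text works through the `user.max_user_namespaces` sysctl, which corresponds to the file `/proc/sys/user/max_user_namespaces` (dots in a sysctl key become slashes under `/proc/sys`). A small sketch for checking whether the rule is in effect on a given machine; the function name `userns_limit_status` is hypothetical, and the file path can be overridden for testing:

```shell
#!/bin/sh
# Report whether creating new user namespaces is disabled by the sysctl
# user.max_user_namespaces.
userns_limit_status() {
    local file="${1:-/proc/sys/user/max_user_namespaces}" limit=""
    # Tolerate a missing trailing newline; fail if nothing could be read.
    read -r limit < "$file" || [ -n "$limit" ] || return 1
    if [ "$limit" -eq 0 ]; then
        echo "disabled (user.max_user_namespaces = 0)"
    else
        echo "enabled (user.max_user_namespaces = $limit)"
    fi
}
```

After applying the Fix Text and `sysctl --system`, `userns_limit_status` with no argument would be expected to report `disabled`.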
- Does using user namespaces, for what I think is an unlimited (65535) value from a clean default install from rhel-8.10.iso, cause an *increased risk as an additional attack vector*?
- Are user namespaces an *unnecessary capability*?
- Can the rationale behind user namespaces be stated here in layman's terms? Why is it a [good?] thing?
ron (8647 rep)
Apr 16, 2025, 06:57 PM
• Last activity: Apr 16, 2025, 11:34 PM
4 votes, 1 answer, 2322 views
I can ping across namespaces, but not connect with TCP
I'm trying to set up two network namespaces to communicate with each other. I've set up two namespaces, `ns0` and `ns1`, that each have a veth pair, where the non-namespaced side of the veth is linked to a bridge.
I set it up like this:
ip link add veth0 type veth peer name brveth0
ip link set brveth0 up
ip link add veth1 type veth peer name brveth1
ip link set brveth1 up
ip link add br10 type bridge
ip link set br10 up
ip addr add 192.168.1.11/24 brd + dev br10
ip netns add ns0
ip netns add ns1
ip link set veth0 netns ns0
ip link set veth1 netns ns1
ip netns exec ns0 ip addr add 192.168.1.20/24 dev veth0
ip netns exec ns0 ip link set veth0 up
ip netns exec ns0 ip link set lo up
ip netns exec ns1 ip addr add 192.168.1.21/24 dev veth1
ip netns exec ns1 ip link set veth1 up
ip netns exec ns1 ip link set lo up
ip link set brveth0 master br10
ip link set brveth1 master br10
As expected, I can ping the interface in `ns0` from `ns1`.
$ sudo ip netns exec ns1 ping -c 3 192.168.1.20
PING 192.168.1.20 (192.168.1.20) 56(84) bytes of data.
64 bytes from 192.168.1.20: icmp_seq=1 ttl=64 time=0.099 ms
64 bytes from 192.168.1.20: icmp_seq=2 ttl=64 time=0.189 ms
But, I can't connect the two over TCP.
For example, running a server in `ns0`:
$ sudo ip netns exec ns0 python3 -m http.server 8080
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
I would expect to be able to curl it from `ns1`, but that yields an error:
$ sudo ip netns exec ns1 curl 192.168.1.20:8080
curl: (7) Failed to connect to 192.168.1.20 port 8080: No route to host
Why is this happening?
Lee Avital (203 rep)
Oct 11, 2019, 12:25 AM
• Last activity: Apr 14, 2025, 07:03 AM
0 votes, 0 answers, 40 views
How to modify a mount namespace without having a working mount command in it?
I have a process which is running in a container I want to debug. To debug it, I want to modify that mount namespace (most importantly: I want to mount my toolkit root into it).
How can I do that from a root shell? I can execute anything with `nsenter`, but of course I cannot (and do not want to) execute a `mount` from the identified namespace. I would like to simply alter another namespace, one different from the namespace where my own `mount` binary lives.
How to do that?
peterh (10448 rep)
Apr 7, 2025, 05:16 PM
• Last activity: Apr 7, 2025, 07:19 PM
0 votes, 1 answer, 42 views
Relationship between CLONE_NEWUSER, `/bin/unshare` and `unshare(2)` as it relates to User Namespace
I am trying to comprehend some man7.org documentation about the User Namespace and the `/bin/unshare` command.
I started by reading this page:
https://man7.org/linux/man-pages/man7/user_namespaces.7.html
On the page, there is a lot of mention of how the `CLONE_NEWUSER` flag can affect privileges and capabilities. But it is unclear to me whether `unshare -U /bin/bash` or `unshare -U -r /bin/bash` uses `CLONE_NEWUSER` in any way.
So I visited `unshare(1)` next to see if there is any explanation of the `CLONE_NEWUSER` flag usage in the `/bin/unshare` command. But there is no discussion about `CLONE_NEWUSER` on this page. However, there is discussion about the `CLONE_NEWUSER` flag on the system call `unshare(2)`.
But it is unclear to me how `/bin/unshare` is related to `unshare(2)`, or if they are even related at all.
Can anyone explain the relationship between `/bin/unshare -U /bin/bash`, `CLONE_NEWUSER`, and `unshare(2)`?
----
Note:
I am a front end HTML CSS developer
trying to learn all this for the first time.
I welcome references to any reading material
to address gaps in knowledge about Linux basics.
learningtech (631 rep)
Mar 21, 2025, 05:31 PM
• Last activity: Mar 21, 2025, 10:58 PM
3 votes, 1 answer, 141 views
Why can't I connect a network namespace to the Internet?
I've seen other answers on this site and read an article and watched a video on the topic, but I still can't connect my network namespace to the outside world.
## Setup
I created a namespace named "foo" and a pair of `veth` interfaces, and moved one of them into the namespace.
ip netns add foo
ip link add veth-foo type veth peer name veth-out
ip link set dev veth-foo netns foo
I assigned each interface an IP address and made sure they're both up.
ip -n foo addr add 192.168.15.1 dev veth-foo
ip addr add 192.168.15.2 dev veth-out
ip -n foo link set dev veth-foo up
ip link set dev veth-out up
# Just in case, I made sure the loopback interfaces, too, are up, though they still show "UNKNOWN".
ip link set dev lo up
ip -n foo link set dev lo up
I added entries to the routing tables of both the global and the "foo" namespaces, so they can talk to each other.
ip route add 192.168.15.1 via 192.168.15.2
ip -n foo route add default via 192.168.15.1
Now, I can reach "foo" from the global namespace and the global namespace from "foo".
$ traceroute -n 192.168.15.1
traceroute to 192.168.15.1 (192.168.15.1), 30 hops max, 60 byte packets
1 192.168.15.1 0.257 ms 0.209 ms 0.194 ms
$ ip netns exec foo traceroute -n 192.168.15.2
traceroute to 192.168.15.2 (192.168.15.2), 30 hops max, 60 byte packets
1 192.168.15.2 0.046 ms 0.009 ms 0.008 ms
I can also reach the ethernet interface that connects the VM to the outside world from inside "foo".
# I ran this after I finished setting up IP forwarding, packet forwarding,
# and IP masquerading, so I'm not sure if it would work at this stage.
$ ip netns exec foo traceroute -n 10.0.2.15
traceroute to 10.0.2.15 (10.0.2.15), 30 hops max, 60 byte packets
1 10.0.2.15 0.065 ms 0.010 ms 0.008 ms
Finally, I set up IP forwarding, packet forwarding, and IP masquerading.
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -A FORWARD -o enp1s0 -i veth-out -j ACCEPT
iptables -A FORWARD -i enp1s0 -o veth-out -j ACCEPT
iptables -t nat -A POSTROUTING -s 192.168.15.1/24 -o enp1s0 -j MASQUERADE
As a result, my system looks like this:
$ sysctl -a | grep ip_forward
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0
$ iptables -t nat -L -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 102 packets, 6816 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- any enp1s0 192.168.15.0/24 anywhere
$ ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp1s0: mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:c3:cd:ac brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute enp1s0
valid_lft 83383sec preferred_lft 83383sec
inet6 fec0::11b8:4b3b:59ba:bae4/64 scope site dynamic noprefixroute
valid_lft 86026sec preferred_lft 14026sec
inet6 fe80::f3fd:90f2:d15f:d570/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: veth-out@if4: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether da:a2:13:05:c4:f5 brd ff:ff:ff:ff:ff:ff link-netns foo
inet 192.168.15.2/32 scope global veth-out
valid_lft forever preferred_lft forever
inet6 fe80::d8a2:13ff:fe05:c4f5/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ ip -n foo addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host proto kernel_lo
valid_lft forever preferred_lft forever
4: veth-foo@if3: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 7e:84:e6:16:92:8e brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.15.1/32 scope global veth-foo
valid_lft forever preferred_lft forever
inet6 fe80::7c84:e6ff:fe16:928e/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ ip route
default via 10.0.2.2 dev enp1s0 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev enp1s0 proto kernel scope link src 10.0.2.15 metric 100
192.168.15.1 via 192.168.15.2 dev veth-out
$ ip -n foo route
default via 192.168.15.1 dev veth-foo
## Testing
At this point, I expect to be able to reach the outside world, but I can't.
$ ip netns exec foo traceroute -n 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 192.168.15.1 3067.680 ms !H 3067.655 ms !H 3067.650 ms !H
$ sudo ip netns exec foo ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 192.168.15.1 icmp_seq=1 Destination Host Unreachable
From 192.168.15.1 icmp_seq=2 Destination Host Unreachable
From 192.168.15.1 icmp_seq=3 Destination Host Unreachable
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2077ms
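To narrow down where packets from "foo" stop, one could ask the kernel which route and next hop it would pick for an outside address, and then look at the namespace's neighbour (ARP) table. A diagnostic sketch, assuming it is run as root and using the names from this question (`foo`, `8.8.8.8`) as defaults; the function name `ns_route_report` is hypothetical:

```shell
#!/bin/sh
# Hypothetical diagnostic helper (run as root): report how network
# namespace $1 (default "foo") would route traffic to $2 (default 8.8.8.8).
ns_route_report() {
    ns=${1:-foo}
    dest=${2:-8.8.8.8}
    echo "== route chosen inside $ns for $dest =="
    # Shows the output device, the gateway ("via", if any) and the source address.
    ip -n "$ns" route get "$dest"
    echo "== neighbour (ARP) entries inside $ns =="
    # A next hop left in INCOMPLETE or FAILED state never answered ARP.
    ip -n "$ns" neigh show
}
```

Comparing the `via` address from the first section with the addresses assigned inside "foo" shows which next hop the namespace is actually trying to reach. A "Destination Host Unreachable" reported from the namespace's own address is typically what a failed neighbour lookup for that next hop looks like.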
Of course, the VM itself is connected to the Internet.
$ traceroute -n 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 10.0.2.2 0.719 ms 0.691 ms 0.676 ms
2 192.168.100.1 1.913 ms 2.593 ms 5.264 ms
3 31.146.255.37 18.493 ms 18.740 ms 19.041 ms
4 188.123.128.85 19.384 ms 19.658 ms 19.925 ms
5 188.123.128.96 20.275 ms 21.787 ms 188.123.128.84 21.773 ms
6 192.178.69.213 47.953 ms 53.145 ms 53.127 ms
7 192.178.69.212 53.116 ms 51.893 ms 188.123.128.33 51.293 ms
8 192.178.107.87 48.513 ms 192.178.107.135 43.582 ms 192.178.107.203 43.391 ms
9 72.14.237.137 43.195 ms 8.8.8.8 43.207 ms 43.200 ms
$ ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=255 time=40.2 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=255 time=37.9 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=255 time=38.0 ms
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 37.859/38.698/40.205/1.067 ms
### Tcpdump
Here's the output of `tcpdump -n -i veth-out icmp`. It captured packets when I targeted `192.168.15.2` and `10.0.2.15`, but captured nothing when I targeted `8.8.8.8`.
listening on veth-out, link-type EN10MB (Ethernet), snapshot length 262144 bytes
# This is the output when I ran traceroute -n 192.168.15.2 (the address
# of "veth-out") in another terminal window (from inside "foo", of course).
12:44:19.172007 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port traceroute unreachable, length 68
12:44:19.172029 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port mtrace unreachable, length 68
12:44:19.172046 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port 33436 unreachable, length 68
12:44:19.172063 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port 33437 unreachable, length 68
12:44:19.172102 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port 33438 unreachable, length 68
12:44:19.172119 IP 192.168.15.2 > 192.168.15.1: ICMP 192.168.15.2 udp port 33439 unreachable, length 68
# And this is when I ran the same command but addressed 10.0.2.15 (the
# ethernet interface to the outside world).
12:44:35.305689 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port traceroute unreachable, length 68
12:44:35.305715 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port mtrace unreachable, length 68
12:44:35.305733 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port 33436 unreachable, length 68
12:44:35.305750 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port 33437 unreachable, length 68
12:44:35.305766 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port 33438 unreachable, length 68
12:44:35.305783 IP 10.0.2.15 > 192.168.15.1: ICMP 10.0.2.15 udp port 33439 unreachable, length 68
^C
12 packets captured
12 packets received by filter
0 packets dropped by kernel
Neither `tcpdump -n -i lo icmp` nor `tcpdump -n -i enp1s0 icmp` captured any packets, regardless of the `traceroute` target; yes, not even when "foo" successfully reached the "enp1s0" interface (address `10.0.2.15`).
## System Information
This was done inside a VM (GNOME Boxes), on Fedora 41, kernel version 6.11.4-301.fc41.x86_64.
My host machine is also running Fedora 41, though the kernel is at version 6.13.5-200.fc41.x86_64.
**Edit:** Just in case this was a Fedora problem, I tested it in a Mint VM, and the exact same thing happened.
verified_tinker
(133 rep)
Mar 8, 2025, 06:22 AM
• Last activity: Mar 9, 2025, 06:12 AM
0
votes
0
answers
15
views
How to use rsyslog with more than one hostname via Linux namespaces?
I am trying to learn about UTS namespaces. I want to write some log entries from both a parent namespace and a child UTS namespace. This is for demonstration purposes, so it doesn't matter whether `rsyslog` writes to the same log file or to a separate log file for each namespace. The only thing that matters is that the child namespace writes logs with a different hostname from the parent namespace. But I can't seem to preserve the two different hostnames.
Here is my latest attempt:
I have two SSH terminal windows open, both logged in as the same user.
**Go to Terminal 1**
me@localhost: sudo unshare --uts /bin/bash
root@localhost: hostname api1
root@localhost: hostname
api1
root@localhost: rsyslogd -n -i /var/run/rsyslogd-child.pid -f /etc/rsyslog.conf &
root@localhost: logger foochild1
**Go to Terminal 2**
me@localhost: logger fooparent1
Now, in my `/var/log/syslog`, both entries were recorded with the hostname of the child namespace:
Mar 8 22:47:12 api1 root: foochild1
Mar 8 22:47:33 api1 me: fooparent1
Can someone suggest a way to write log entries from different namespaces while preserving the hostname of each namespace?
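To confirm which UTS namespace each relevant process is actually in (the two shells and every running `rsyslogd`), one can compare their `/proc/PID/ns/uts` links, whose targets look like `uts:[4026531838]`. A minimal sketch; the function name `show_uts` is hypothetical, and reading the links of processes owned by other users generally needs root:

```shell
#!/bin/sh
# Hypothetical helper: for each PID given, print the PID and the UTS
# namespace it belongs to. PIDs printing the same "uts:[...]" value share
# one hostname.
show_uts() {
    for pid in "$@"; do
        printf '%s %s\n' "$pid" "$(readlink "/proc/$pid/ns/uts")"
    done
}
```

For example, `show_uts $$ $(pgrep -x rsyslogd)` run in each terminal prints one line per `rsyslogd` instance along with its namespace, and `sudo nsenter --target PID --uts hostname` prints the hostname seen inside that PID's UTS namespace.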
learningtech
(631 rep)
Mar 8, 2025, 11:00 PM
7
votes
3
answers
3892
views
firejail: only let a program access localhost
I have a local network service and a client program that needs to access it. I am running both as an unprivileged user.
I am looking for a way to sandbox the client using firejail so that it cannot access the network, except for localhost (or, even better, except for that service).
The first thing I tried was, of course:
firejail --net=lo program
But it didn’t work.
Error: cannot attach to lo device
I think I could work around this by creating a pair of virtual network interfaces (for example, veth0 and veth1), moving veth1 into a new network namespace in which I'd run the service, and using firejail to restrict the client to veth0.
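For reference, the manual version of that workaround might look roughly like the following. This is only a sketch of the `ip` commands, run as root; the namespace name `svc` and the `10.200.0.0/24` addresses are hypothetical placeholders, and it does not show how firejail would then be pointed at `veth0`:

```shell
#!/bin/sh
# Hypothetical sketch (needs root): create a veth pair, move one end into a
# new network namespace for the service, and give both ends addresses.
setup_service_ns() {
    ns=${1:-svc}                                  # hypothetical namespace name
    ip netns add "$ns"
    ip link add veth0 type veth peer name veth1
    ip link set veth1 netns "$ns"
    ip addr add 10.200.0.1/24 dev veth0           # client side (placeholder address)
    ip -n "$ns" addr add 10.200.0.2/24 dev veth1  # service side (placeholder address)
    ip link set veth0 up
    ip -n "$ns" link set veth1 up
    ip -n "$ns" link set lo up
}
```

The service would then be started inside the namespace with `ip netns exec svc ...`. Each of these steps needs `CAP_NET_ADMIN`, so without further setup this has to be run as root.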
Is there a way to automate this setup in a firejail profile, so that all of these interfaces are created and veth1 is moved when I type `firejail server` (without having to run anything as root)?
Or is there a simpler way to solve this problem? (I cannot run the client and the service in the same namespace, because the service needs to access the network.)
tbrugere
(1084 rep)
Oct 27, 2018, 03:56 PM
• Last activity: Feb 10, 2025, 05:50 PM
0
votes
1
answers
109
views
How do I change the default namespace used by kubectl?
When using `kubectl`, a namespace is required for various operations. Typically it uses `default` as the default namespace, and a different namespace can be set using `-n`. But in my work, all resources relevant to me are in a team-namespace, so I never use the `default` namespace for anything and always have to use `-n`.
How can I set my team namespace as the default so that I don't need to use the `-n` option?
I'm not interested in installing anything extra.
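A sketch using only `kubectl` itself, assuming a kubeconfig with a current context already selected: `kubectl config set-context --current --namespace=...` records a namespace on the current context, and later commands run without `-n` use it. The function names and the `team-namespace` argument are placeholders:

```shell
#!/bin/sh
# Hypothetical helpers around standard "kubectl config" subcommands.

# Store namespace $1 on the current kubeconfig context.
set_default_ns() {
    kubectl config set-context --current --namespace="$1"
}

# Print the namespace stored on the current context; empty output means
# none is set, in which case kubectl falls back to "default".
current_ns() {
    kubectl config view --minify -o jsonpath='{..namespace}'
}
```

For example, `set_default_ns team-namespace` followed by `current_ns` should print `team-namespace`. The setting is written to the kubeconfig file (typically `~/.kube/config`), so it persists across shells.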
---
Note: this is part of a series of posts meant to import relevant answers from other Stack Exchange sites here to make them easier to find, and also make it easier to handle duplicates as cross-site duplicates aren't a thing.
muru
(77471 rep)
Dec 22, 2024, 02:03 PM
• Last activity: Jan 31, 2025, 03:33 AM
1
votes
1
answers
99
views
`nsenter` `--root`: symlink vs. regular dir path
I am noticing weird behavior from `nsenter` and am looking for an explanation.
When I enter the namespaces of another process created with `unshare`, I observe different behavior depending on whether I specify the root directory as a regular path or as the `/proc/PID/root` symlink.
Here is the sample setup.
1. Prepare the target process:
sudo unshare --mount --mount-proc --pid --fork --root /tmp/jail bash
`/tmp/jail` contains a Linux distribution (I prepared it via `docker export` using the `ubuntu` image):
$ docker run ubuntu
(grab the container ID)
$ docker export 8c67e1fb5443 > ubuntu.tar
$ mkdir /tmp/jail && cd /tmp/jail && tar -xf ~/ubuntu.tar
2. From another terminal, try entering the namespaces of that process:
sudo nsenter --target=28716 --root=/tmp/jail --all bash
3. Try the `ps` command
Here I observe the error:
root@ubuntu:/proc# ps
Error, do this: mount -t proc proc /proc
root@ubuntu:/proc# mount
mount: failed to read mtab: No such file or directory
However, if I specify `--root=/proc/28716/root` for `nsenter`, it suddenly starts working:
$ sudo readlink /proc/28716/root
/tmp/jail
$ sudo nsenter --target=28716 --root=/proc/28716/root --all bash
root@ubuntu:.# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 4624 3712 ? S+ 21:04 0:00 bash
root 31 0.0 0.0 4624 3712 ? S 21:10 0:00 bash
root 39 0.0 0.0 7060 2944 ? R+ 21:10 0:00 ps aux
root@ubuntu:.# mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
What is the reason for this behavior? Why does a symlink vs. a regular path make a difference, given that they point to the same directory?
$ sudo nsenter --target=28716 --root=/proc/28716/root --all bash # WORKS
$ sudo nsenter --target=28716 --root=/tmp/jail --all bash # DOES NOT WORK
(where /proc/28716/root is symlink to /tmp/jail)
Neither `strace` nor the source code of `nsenter` seems to explain these differences.
System: Ubuntu 24.04.1 LTS, kernel 6.8.0-49-generic.
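proc(5) notes that `/proc/PID/root` is not merely a symbolic link: it gives the same view of the filesystem, including namespaces and per-process mounts, as the process itself. One way to check whether the two `--root` arguments therefore lead to different views, even though `readlink` prints the same string, is to list the `proc` subdirectory through each path from the host. A diagnostic sketch, assuming it runs as root on the host; the function name `compare_roots` is hypothetical:

```shell
#!/bin/sh
# Hypothetical diagnostic: list DIR/proc via the plain path and via the
# target's /proc/PID/root link, to see whether both show the same mounts.
compare_roots() {
    pid=$1
    dir=$2
    echo "== $dir/proc =="
    ls "$dir/proc" | head -n 5
    echo "== /proc/$pid/root/proc =="
    ls "/proc/$pid/root/proc" | head -n 5
}
```

For example, `compare_roots 28716 /tmp/jail`. If the first listing is empty while the second shows process directories, the plain path (resolved in the host's mount namespace) does not see the `proc` mount that `unshare --mount-proc` created in the target's mount namespace, whereas the `/proc/PID/root` link does. `findmnt --task 28716` can also list the mounts as seen from that process.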
Eugene D. Gubenkov
(113 rep)
Dec 7, 2024, 09:18 PM
• Last activity: Dec 9, 2024, 09:11 PM