Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
7
votes
1
answers
2132
views
Mounting Overlayfs in a user namespace
Is it possible to mount an Overlayfs filesystem as an unprivileged user in a user namespace in Linux kernels >4.3.3? It seems that the fix to this vulnerability has blocked this functionality entirely.
When I create a new user namespace with `clone()`, passing the `CLONE_NEWNS` flag, and attempt to invoke mount with an overlayfs filesystem, I'm given permission denied. I can mount any other filesystem though.
Is there a way to work around this/am I missing something?
Josh Hebert
(171 rep)
Jun 6, 2016, 05:49 PM
• Last activity: Jul 28, 2025, 03:08 PM
12
votes
6
answers
13592
views
How can I list all connections to my host, including those to LXC guests?
I tried both `netstat` and `lsof`, but it appears it's not possible to see the connections to my LXC guests.
Is there a way to achieve this ... for **all** guests at once?
----------
Essentially what throws me off here is the fact that I can see the processes of the guests as long as I run as superuser. I can also see the `veth` interfaces that get dynamically created per guest. Why can I not see connections on processes that are otherwise visible?
0xC0000022L
(16938 rep)
May 16, 2015, 12:09 AM
• Last activity: Oct 15, 2024, 01:54 PM
12
votes
1
answers
30958
views
How to enable user_namespaces in the kernel? (For unprivileged `unshare`.)
My Linux kernel must have been configured with [user_namespaces](http://man7.org/linux/man-pages/man7/user_namespaces.7.html) when built, but their use is restricted after boot and has to be explicitly enabled. Which sysctl should I use?
(If this was turned on, it would allow running an isolation command like `unshare --user --map-root-user --mount-proc --pid --fork`, and then performing [chroot without being root](https://unix.stackexchange.com/q/72696/4319), a much anticipated feature of Linux.)
imz -- Ivan Zakharyaschev
(15862 rep)
Aug 13, 2016, 04:37 PM
• Last activity: Mar 27, 2023, 11:03 AM
4
votes
1
answers
1447
views
How do I enable unprivileged_userns_clone selectively for one executable or user?
How do I enable `CLONE_NEWUSER` in a more fine-grained fashion compared to just `kernel.unprivileged_userns_clone`?
I want to keep kernel API attack surface manageable by keeping new and complicated things like non-root `CAP_SYS_ADMIN` or BPF disabled, but also selectively allow it for some specific programs.
For example, `chrome-sandbox` wants either `CLONE_NEWUSER` or suid-root for proper operation, but I don't want all the programs to be able to use such complicated tricks, only a handful of approved ones.
Vi.
(5985 rep)
Nov 27, 2022, 01:40 AM
• Last activity: Nov 27, 2022, 09:56 PM
1
votes
2
answers
1386
views
User namespaces: how to mount a folder only for a given program
I'd like to fake a FHS system on a non-FHS system (NixOS) without root access. To that end, I need to mount some folders at the root (like mounting `/tmp/mylib` to `/lib`) using user namespaces (I don't see any other solution).
Unfortunately, I can't find how to make it work: I tried to follow this tutorial, but when I copy the code it fails (I can't even start a bash):
$ gcc userns_child_exec.c -lcap -o userns_child_exec
$ id
uid=1000(myname) gid=100(users) groups=100(users),1(wheel),17(audio),20(lp),57(networkmanager),59(scanner),131(docker),998(vboxusers),999(adbusers)
$ ./userns_child_exec -U -M '0 1000 1' -G '0 100 1' bash
write /proc/535313/gid_map: Operation not permitted
bash: initialize_job_control: no job control in background: Bad file descriptor
[nix-shell:~/Documents/Logiciels/Nix_bidouille/2022_04_26_-_nix_fake_FHS_user_namespace/demo]$
[root@bestos:~/Documents/Logiciels/Nix_bidouille/2022_04_26_-_nix_fake_FHS_user_namespace/demo]#
exit
(Note that the bash prompt is displayed, but I can't type anything; it quits immediately.)
Any idea how to make it work?
Code:
/* userns_child_exec.c
Copyright 2013, Michael Kerrisk
Licensed under GNU General Public License v2 or later
Create a child process that executes a shell command in new
namespace(s); allow UID and GID mappings to be specified when
creating a user namespace.
*/
#define _GNU_SOURCE
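#include <sched.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <signal.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <errno.h>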
/* A simple error-handling function: print an error message based
on the value in 'errno' and terminate the calling process */
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)
struct child_args {
char **argv; /* Command to be executed by child, with arguments */
int pipe_fd[2]; /* Pipe used to synchronize parent and child */
};
static int verbose;
static void
usage(char *pname)
{
fprintf(stderr, "Usage: %s [options] cmd [arg...]\n\n", pname);
fprintf(stderr, "Create a child process that executes a shell command "
"in a new user namespace,\n"
"and possibly also other new namespace(s).\n\n");
fprintf(stderr, "Options can be:\n\n");
#define fpe(str) fprintf(stderr, " %s", str);
fpe("-i New IPC namespace\n");
fpe("-m New mount namespace\n");
fpe("-n New network namespace\n");
fpe("-p New PID namespace\n");
fpe("-u New UTS namespace\n");
fpe("-U New user namespace\n");
fpe("-M uid_map Specify UID map for user namespace\n");
fpe("-G gid_map Specify GID map for user namespace\n");
fpe(" If -M or -G is specified, -U is required\n");
fpe("-v Display verbose messages\n");
fpe("\n");
fpe("Map strings for -M and -G consist of records of the form:\n");
fpe("\n");
fpe(" ID-inside-ns ID-outside-ns len\n");
fpe("\n");
fpe("A map string can contain multiple records, separated by commas;\n");
fpe("the commas are replaced by newlines before writing to map files.\n");
exit(EXIT_FAILURE);
}
/* Update the mapping file 'map_file', with the value provided in
'mapping', a string that defines a UID or GID mapping. A UID or
GID mapping consists of one or more newline-delimited records
of the form:
ID_inside-ns ID-outside-ns length
Requiring the user to supply a string that contains newlines is
of course inconvenient for command-line use. Thus, we permit the
use of commas to delimit records in this string, and replace them
with newlines before writing the string to the file. */
static void
update_map(char *mapping, char *map_file)
{
int fd, j;
size_t map_len; /* Length of 'mapping' */
/* Replace commas in mapping string with newlines */
map_len = strlen(mapping);
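for (j = 0; j < map_len; j++)
if (mapping[j] == ',')
mapping[j] = '\n';
fd = open(map_file, O_RDWR);
if (fd == -1) {
fprintf(stderr, "open %s: %s\n", map_file, strerror(errno));
exit(EXIT_FAILURE);
}
if (write(fd, mapping, map_len) != map_len) {
fprintf(stderr, "write %s: %s\n", map_file, strerror(errno));
exit(EXIT_FAILURE);
}
close(fd);
}
static int /* Start function for cloned child */
childFunc(void *arg)
{
struct child_args *args = (struct child_args *) arg;
char ch;
/* Wait until the parent has updated the UID and GID mappings. See
the comment in main(). We wait for end of file on a pipe that will
be closed by the parent process once it has updated the mappings. */
close(args->pipe_fd[1]); /* Close our descriptor for the write end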
of the pipe so that we see EOF when
parent closes its descriptor */
if (read(args->pipe_fd[0], &ch, 1) != 0) {
fprintf(stderr, "Failure in child: read from pipe returned != 0\n");
exit(EXIT_FAILURE);
}
/* Execute a shell command */
execvp(args->argv[0], args->argv);
errExit("execvp");
}
#define STACK_SIZE (1024 * 1024)
static char child_stack[STACK_SIZE]; /* Space for child's stack */
int
main(int argc, char *argv[])
{
int flags, opt;
pid_t child_pid;
struct child_args args;
char *uid_map, *gid_map;
char map_path[PATH_MAX];
/* Parse command-line options. The initial '+' character in
the final getopt() argument prevents GNU-style permutation
of command-line options. That's useful, since sometimes
the 'command' to be executed by this program itself
has command-line options. We don't want getopt() to treat
those as options to this program. */
flags = 0;
verbose = 0;
gid_map = NULL;
uid_map = NULL;
while ((opt = getopt(argc, argv, "+imnpuUM:G:v")) != -1) {
switch (opt) {
case 'i': flags |= CLONE_NEWIPC; break;
case 'm': flags |= CLONE_NEWNS; break;
case 'n': flags |= CLONE_NEWNET; break;
case 'p': flags |= CLONE_NEWPID; break;
case 'u': flags |= CLONE_NEWUTS; break;
case 'v': verbose = 1; break;
case 'M': uid_map = optarg; break;
case 'G': gid_map = optarg; break;
case 'U': flags |= CLONE_NEWUSER; break;
default: usage(argv[0]);
}
}
/* -M or -G without -U is nonsensical */
if ((uid_map != NULL || gid_map != NULL) &&
!(flags & CLONE_NEWUSER))
usage(argv[0]);
args.argv = &argv[optind];
/* We use a pipe to synchronize the parent and child, in order to
ensure that the parent sets the UID and GID maps before the child
calls execve(). This ensures that the child maintains its
capabilities during the execve() in the common case where we
want to map the child's effective user ID to 0 in the new user
namespace. Without this synchronization, the child would lose
its capabilities if it performed an execve() with nonzero
user IDs (see the capabilities(7) man page for details of the
transformation of a process's capabilities during execve()). */
if (pipe(args.pipe_fd) == -1)
errExit("pipe");
/* Create the child in new namespace(s) */
child_pid = clone(childFunc, child_stack + STACK_SIZE,
flags | SIGCHLD, &args);
if (child_pid == -1)
errExit("clone");
/* Parent falls through to here */
if (verbose)
printf("%s: PID of child created by clone() is %ld\n",
argv[0], (long) child_pid);
/* Update the UID and GID maps in the child */
if (uid_map != NULL) {
snprintf(map_path, PATH_MAX, "/proc/%ld/uid_map",
(long) child_pid);
update_map(uid_map, map_path);
}
if (gid_map != NULL) {
snprintf(map_path, PATH_MAX, "/proc/%ld/gid_map",
(long) child_pid);
update_map(gid_map, map_path);
}
/* Close the write end of the pipe, to signal to the child that we
have updated the UID and GID maps */
close(args.pipe_fd[1] );
if (waitpid(child_pid, NULL, 0) == -1) /* Wait for child */
errExit("waitpid");
if (verbose)
printf("%s: terminating\n", argv[0]);
exit(EXIT_SUCCESS);
}
**EDIT**
Actually, it's quite weird: the error appears when writing the group, but it did work for the uid:
[leo@bestos:~]$ cat /proc/582197/gid_map
[leo@bestos:~]$ cat /proc/582197/uid_map
0 1000 1
[leo@bestos:~]$ ll /proc/582197/gid_map
-rw-r--r-- 1 leo users 0 mai 18 09:09 /proc/582197/gid_map
[leo@bestos:~]$ ll /proc/582197/uid_map
-rw-r--r-- 1 leo users 0 mai 18 09:09 /proc/582197/uid_map
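One rule that may be relevant to the uid_map/gid_map asymmetry above, offered as a hedged sketch rather than a diagnosis: user_namespaces(7) says that since Linux 3.19 a process without CAP_SETGID in the parent namespace can write `gid_map` only after `deny` has been written to `/proc/PID/setgroups`. The helper below only builds the ordered list of writes and does not touch `/proc`. The function name `planned_map_writes` is made up for illustration.

```python
def planned_map_writes(pid, uid_map, gid_map):
    """Return (path, data) pairs in the order an unprivileged parent would write them."""
    base = f"/proc/{pid}"
    return [
        (f"{base}/uid_map", uid_map + "\n"),
        # Assumption taken from user_namespaces(7): setgroups must be set
        # to "deny" before an unprivileged write to gid_map is accepted.
        (f"{base}/setgroups", "deny"),
        (f"{base}/gid_map", gid_map + "\n"),
    ]


if __name__ == "__main__":
    # PID and map values are taken from the session shown in the question.
    for path, data in planned_map_writes(535313, "0 1000 1", "0 100 1"):
        print(path, repr(data))
```

Whether this is the actual cause on the NixOS machine in question is not confirmed by anything in the post.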
tobiasBora
(4621 rep)
May 18, 2022, 06:19 AM
• Last activity: May 20, 2022, 07:30 PM
21
votes
2
answers
15588
views
What is an unprivileged LXC container?
What does it mean if a Linux container (LXC container) is called "unprivileged"?
0xC0000022L
(16938 rep)
Jan 2, 2015, 10:32 AM
• Last activity: Jan 8, 2021, 01:32 PM
7
votes
3
answers
2959
views
Migrate an unprivileged LXC container between users
I have an Ubuntu 14.04 server installation which acts as an LXC host.
It has two users: user1 and user2.
user1 owns an unprivileged LXC container, which uses a directory (inside /home/user1/.local/...) as backing store.
How do I make a full copy of the container for user2?
I can't just copy the files because they are mapped with owners ranging from 100000 to 100000+something, which are bound to user1.
Also, which I believe is basically the same question, how can I safely make a backup of my user1's LXC container to restore it later on another machine and/or user?
agdev84
(91 rep)
Sep 19, 2014, 07:41 PM
• Last activity: May 10, 2020, 09:15 AM
7
votes
2
answers
7281
views
How to influence the assignment of subordinate UIDs/GIDs when creating user accounts?
To my knowledge the subordinate UIDs and GIDs are assigned to accounts in such a manner that they form a contiguous range.
The range starts at 100000 by default and probably stretches to the theoretical maximum value for a UID/GID (even though I haven't found a way to query this from the shell, `/etc/login.defs` only lists the values allowed for the tools).
Now, it'd be a lot more convenient for me as a human if the ranges would start at a multiple of 100000, i.e. `n*100000` with `n` being a positive integer (`n>0`), instead of `100000+n*65536`. This way I'd be able to see immediately which file is owned by which host user.
Is there a way to influence the assignment of subordinate UIDs/GIDs in some way in modern enough `shadow-utils` to achieve a more human-readable assignment?
If not, is it alright to simply overwrite the files `/etc/subuid` and `/etc/subgid` with conforming data to get what I want?
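To make the desired layout concrete, here is a minimal sketch of the arithmetic only. It assumes the `name:start:count` line format of subuid(5). The function names and the account names `alice` and `bob` are hypothetical, and the sketch says nothing about whether overwriting the files by hand is safe.

```python
def round_subid_lines(users, step=100000, count=65536):
    """Build subuid(5)-style lines whose ranges start at n*step (n >= 1)."""
    if count > step:
        raise ValueError("count must not exceed step, or ranges would overlap")
    return [f"{name}:{(n + 1) * step}:{count}" for n, name in enumerate(users)]


def range_index(host_id, step=100000):
    """Return n for a host UID/GID that falls inside the n-th range."""
    return host_id // step


if __name__ == "__main__":
    for line in round_subid_lines(["alice", "bob"]):  # hypothetical accounts
        print(line)
    print(range_index(201000))
```

With this layout the leading digits of a host-side owner identify the range directly, which is the readability gain the question is after.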
0xC0000022L
(16938 rep)
Dec 30, 2014, 01:25 PM
• Last activity: Aug 22, 2019, 10:09 AM
14
votes
1
answers
7810
views
Is there a tool(!) to list assigned subuid and subgid values for users?
`usermod -v` (`--add-sub-uids`) and `usermod -w` (`--add-sub-gids`) can be used to manipulate the subuid and subgid ranges for a user account, but there appears to be no tool that can merely list them. Is there one?
At least on my Ubuntu 14.04 box `getent` doesn't seem to be prepared to handle that information from `/etc/subuid` and `/etc/subgid`.
Currently I am using a little shell script, using `awk` for the purpose.
----------
Here's an excerpt from `usermod(8)`:
-v, --add-sub-uids FIRST-LAST
Add a range of subordinate uids to the users account.
[...]
-V, --del-sub-uids FIRST-LAST
Remove a range of subordinate uids from the users account.
[...]
-w, --add-sub-gids FIRST-LAST
Add a range of subordinate gids to the users account.
[...]
-W, --del-sub-gids FIRST-LAST
Remove a range of subordinate gids from the users account.
[...]
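For comparison with the awk approach mentioned above, a listing helper can be sketched in a few lines. This is not an existing tool; it assumes the `name:start:count` format described in subuid(5), and `parse_subid` is a hypothetical name. The sample string stands in for the file contents.

```python
def parse_subid(text):
    """Parse subuid(5)/subgid(5) content into (name, first, last) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        first = int(start)
        entries.append((name, first, first + int(count) - 1))
    return entries


SAMPLE = "john:100000:65536\n"  # sample data; a real run would read /etc/subuid

if __name__ == "__main__":
    for name, first, last in parse_subid(SAMPLE):
        print(f"{name}: {first}-{last}")
```

To list real assignments, the text of `/etc/subuid` or `/etc/subgid` would be passed in instead of `SAMPLE`.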
0xC0000022L
(16938 rep)
May 11, 2014, 02:30 AM
• Last activity: Apr 30, 2019, 12:41 AM
1
votes
0
answers
526
views
Why would creating a user namespace with size 1 work but size >1 fail
I am experimenting with unprivileged linux containers and I am writing a Go program that creates a minimalist container. The program forks itself and creates namespaces in the process. However for some reason if I set the user namespace size to greater than 1, it fails when running as a regular user.
cmd := exec.Command("/proc/self/exe", "run-container")
cmd.SysProcAttr = &syscall.SysProcAttr{
Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
Unshareflags: syscall.CLONE_NEWNS,
UidMappings: []syscall.SysProcIDMap{
{
ContainerID: 0,
HostID: os.Getuid(),
Size: 1, // set this to 2 or more and it fails
},
},
GidMappings: []syscall.SysProcIDMap{
{
ContainerID: 0,
HostID: os.Getgid(),
Size: 1,
},
},
}
// other flags: CLONE_NEWNET, CLONE_NEWIPC, CLONE_NEWCGROUP, CLONE_NEWUSER,
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
err := cmd.Run()
if err != nil {
fmt.Println("ERROR: parent cmd.Run", err)
os.Exit(1)
}
The code above (along with all the other stuff like pivot_root etc.. ) works fine. But the moment I set Size to 2, it bombs:
ERROR: parent cmd.Run fork/exec /proc/self/exe: operation not permitted
This seems to be a capabilities issue because when I run as root it works.
Here is my `/etc/subuid`:
lxd:1000:1
root:1000:1
lxd:100000:65536
root:100000:65536
developer:165536:65536
mounter:231072:65536
Update:
---------
I figured out that you need `CAP_SETUID` to map more than just the current euid to another (see user namespaces man page).
But even after `sudo setcap cap_setuid=eip /my/binary` it fails. The error message has changed to:
ERROR: parent cmd.Run fork/exec /proc/self/exe: permission denied
If I run `strace` it fails with `EPERM` when trying to write to `/proc/xx/uid_map`.
openat(AT_FDCWD, "/proc/25233/uid_map", O_RDWR) = 5
write(5, "0 1000 100\n\0", 12) = -1 EPERM (Operation not permitted)
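The restriction described in the update can be expressed as a small check. This sketch encodes only one rule from user_namespaces(7): a writer without CAP_SETUID in the parent namespace may write a single record that maps its own effective UID. The kernel applies further conditions that are not modelled here, and `unprivileged_uid_map_ok` is a hypothetical name.

```python
def unprivileged_uid_map_ok(map_text, euid):
    """True if map_text is one record that maps exactly the writer's euid."""
    records = [line.split() for line in map_text.splitlines() if line.strip()]
    if len(records) != 1 or len(records[0]) != 3:
        return False
    _inside, outside, count = (int(field) for field in records[0])
    return outside == euid and count == 1


if __name__ == "__main__":
    print(unprivileged_uid_map_ok("0 1000 1\n", 1000))
    print(unprivileged_uid_map_ok("0 1000 100\n", 1000))  # the record seen in strace
```

Under that rule alone, the `0 1000 100` record from the strace output would be rejected for an unprivileged writer, while a size of 1 passes.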
teleclimber
(111 rep)
Feb 16, 2019, 02:20 AM
• Last activity: Feb 16, 2019, 10:31 PM
2
votes
2
answers
962
views
Who would win, RLIMIT_NPROC or user namespaces?
Depending on configuration, unprivileged (non-root) processes can create a user namespace. `RLIMIT_NPROC` limits the number of processes *per user*.
If I enter a user namespace, can I create processes with different UIDs, and hence exceed my real `RLIMIT_NPROC`?
sourcejedi
(53222 rep)
Feb 12, 2019, 07:01 PM
• Last activity: Feb 13, 2019, 11:04 PM
3
votes
1
answers
1122
views
Ping not working in a new C container
I've been working on writing my own Linux container from scratch in C. I've borrowed code from several places and put up a basic version with **namespaces** & **cgroups**.
Basically, I **clone** a new process with all the **CLONE_NEW*** flags to create new namespaces for the **clone'ed** process.
I also set up UID mapping by inserting **0 0 1000** into the **uid_map** and **gid_map** files. I want to ensure that the *root* inside the container is mapped to the *root* outside.
For the filesystem, I am using a base image of **stretch** created with **debootstrap**.
Now, I am trying to set up the network connectivity from inside the container. I used this script to setup the interface inside the container. This script creates a new network-namespace of its own. I edited it slightly to mount the net-namespace of the created process onto the newly created net-namespace via the script.
mount --bind /proc/$PID/ns/net /var/run/netns/demo
I can just get into the new network namespace as follows:
ip netns exec ${NS} /bin/bash --rcfile \"")
and successfully ping outside.
But from the bash shell when I get inside the clone'ed process by default I am unable to PING. I get the error:
ping: socket: Operation not permitted
I've tried setting up capabilities: **cap_net_raw** and **cap_net_admin**
I would like some guidance.
Shabirmean
(135 rep)
Jan 21, 2019, 03:23 PM
• Last activity: Jan 21, 2019, 07:38 PM
14
votes
1
answers
1989
views
Why can't I bind-mount "/" inside a user namespace?
Why doesn't this work?
$ unshare -rm mount --bind / /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /, missing codepage or helper program, or other error.
These work ok:
$ unshare -rm mount --bind /tmp /mnt
$ unshare -rm mount --bind /root /mnt
$
---
$ uname -r # Linux kernel version
4.17.3-200.fc28.x86_64
sourcejedi
(53222 rep)
Jul 18, 2018, 10:22 PM
• Last activity: Jul 18, 2018, 10:31 PM
5
votes
4
answers
6949
views
Building unprivileged (userns) LXC container from scratch, by migrating a privileged container to be unprivileged
How can I build a privileged LXC (1.0.3) container (that part I know) and then migrate it successfully to be run unprivileged? That is, I'd like to `debootstrap` it myself or adjust the `lxc-ubuntu` template (commonly under `/usr/share/lxc/templates`) in order for this to work.
Here's why I am asking this question. If you look at the `lxc-ubuntu` template, you'll notice:
# Detect use under userns (unsupported)
for arg in "$@"; do
[ "$arg" = "--" ] && break
if [ "$arg" = "--mapped-uid" -o "$arg" = "--mapped-gid" ]; then
echo "This template can't be used for unprivileged containers." 1>&2
echo "You may want to try the \"download\" template instead." 1>&2
exit 1
fi
done
Following the use of `LXC_MAPPED_GID` and `LXC_MAPPED_UID` in the referenced `lxc-download` template, though, there seems to be nothing particularly special. In fact all it does is to adjust the file ownership (`chgrp` + `chown`). But it's possible that the extended attributes in the `download` template are fine-tuned already to accomplish whatever "magic" is needed.
In the comments to this blog post by Stéphane Graber, Stéphane tells a commenter that
> There’s no easy way to do that unfortunately, you’d need to update
> your container config to match that from an unprivileged container,
> move the container’s directory over to the unprivileged user you want
> it to run as, then use Serge’s uidshift program to change the
> ownership of all files.
... and to:
* have a look at https://jenkins.linuxcontainers.org/ for the packages built for the `download` template
* check out `uidmapshift` from here
* This program appears to roughly do `lxc-usernsexec -m b:0:1000:1 -m b:1:190000:1 -- /bin/chown 1:1 $file` as explained in `lxc-usernsexec(1)`
**So my question is: how can I take an ordinary (privileged) LXC container that I have built myself (having `root` and all) and migrate it to become an unprivileged container?** Even if you can't provide a script or so, it would be great to know which points to consider and how they affect the ability to run the unprivileged LXC container. I can come up with a script on my own and pledge to post it as an answer to this question if a solution can be found :)
*Note:* Although I am using Ubuntu 14.04, this is a *generic* question.
0xC0000022L
(16938 rep)
May 2, 2014, 12:46 PM
• Last activity: Jan 31, 2018, 05:56 PM
3
votes
1
answers
1356
views
What makes firefox inside a container launch a new firefox window outside on the host with the UID of the host user? Isn't it weird for an LXC?
Can someone please explain this weird behaviour to me:
I have an unprivileged LXC container with firefox inside.
**If firefox is running on the host outside of the container**, `/usr/bin/firefox` inside the container launches a new firefox window **outside** on the host with the UID of the host user.
**If firefox is NOT running outside of the container**, `/usr/bin/firefox` inside the container launches firefox with the (SUB)UID of the container user like it should be.
The reverse is also true:
If firefox is running inside the container (but not on the host), and firefox is started on the host, the firefox which is started has the UID of the container user.
?!?! How is that ?!?!
EDIT: Confirmed that the same issue emerges when using a default unprivileged Ubuntu container with default configuration file.
EDIT: asked the same question in the arch forums https://bbs.archlinux.org/viewtopic.php?pid=1622174#p1622174
config file:
lxc.devttydir = lxc
lxc.pts = 1024
lxc.tty = 4
lxc.cap.drop = mac_admin mac_override sys_time sys_module
lxc.pivotdir = lxc_putold
lxc.hook.clone = /usr/share/lxc/hooks/clonehostname
lxc.cgroup.devices.deny = a
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
lxc.cgroup.devices.allow = c 1:7 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:2 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 10:229 rwm
lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none bind,optional 0 0
lxc.seccomp = /usr/share/lxc/config/common.seccomp
lxc.hook.mount = /usr/share/lxcfs/lxc.mount.hook
lxc.hook.post-stop = /usr/share/lxcfs/lxc.reboot.hook
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none bind,optional 0 0
lxc.mount.entry = /sys/kernel/security sys/kernel/security none bind,optional 0 0
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none bind,optional 0 0
lxc.mount.entry = mqueue dev/mqueue mqueue rw,relatime,create=dir,optional 0 0
lxc.cgroup.devices.allow = c 254:0 rm
lxc.cgroup.devices.allow = c 10:200 rwm
lxc.cgroup.devices.allow = c 10:228 rwm
lxc.cgroup.devices.allow = c 10:232 rwm
lxc.cgroup.devices.deny =
lxc.cgroup.devices.allow =
lxc.devttydir =
lxc.mount.entry = /dev/console dev/console none bind,create=file 0 0
lxc.mount.entry = /dev/full dev/full none bind,create=file 0 0
lxc.mount.entry = /dev/null dev/null none bind,create=file 0 0
lxc.mount.entry = /dev/random dev/random none bind,create=file 0 0
lxc.mount.entry = /dev/tty dev/tty none bind,create=file 0 0
lxc.mount.entry = /dev/urandom dev/urandom none bind,create=file 0 0
lxc.mount.entry = /dev/zero dev/zero none bind,create=file 0 0
lxc.mount.entry = /sys/firmware/efi/efivars sys/firmware/efi/efivars none bind,optional 0 0
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none bind,optional 0 0
lxc.arch = x86_64
lxc.cgroup.devices.allow = c 226:* rwm
lxc.mount.entry = tmpfs tmp tmpfs defaults
lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none ro,bind,create=dir 0 0
The container is started like this:
lxc-start -n c1 -F -f /path/to/above/conf -s 'lxc.id_map = u 0 100000 65536' -s 'lxc.id_map = g 0 100000 65536' -s 'lxc.rootfs = /path/to/rootfs' -s 'lxc.init_cmd = /usr/bin/bash'
EDIT: Distribution Arch Linux
$ uname -r
4.6.0-rc4-customGIT+
# lxc-checkconfig
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
Multiple /dev/pts instances: enabled
--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled
--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
Bridges: enabled
Advanced netfilter: enabled
CONFIG_NF_NAT_IPV4: enabled
CONFIG_NF_NAT_IPV6: enabled
CONFIG_IP_NF_TARGET_MASQUERADE: enabled
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled
FUSE (for use with lxcfs): enabled
--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities: enabled
MCH
(509 rep)
Apr 22, 2016, 01:35 AM
• Last activity: Aug 16, 2016, 09:28 AM
2
votes
0
answers
653
views
I have trouble analysing the cause of this (firefox) segfault
When executed within a minimal (unprivileged) LXC container, **firefox segfaults** (other graphical applications work fine).
**I am unable to find the exact cause of this segfault** (which is most likely due to insufficient permissions or missing resources).
# strace /usr/bin/firefox
...
open("/usr/lib/libfreebl3.so", O_RDONLY|O_CLOEXEC) = 26
read(26,"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0007\0\0\0\0\0\0"..., 832) = 832
fstat(26, {st_mode=S_IFREG|0755, st_size=544424, ...}) = 0
mmap(NULL, 2619144, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 26, 0) = 0x7f269bf1e000
mprotect(0x7f269bf97000, 2097152, PROT_NONE) = 0
mmap(0x7f269c197000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 26, 0x79000) = 0x7f269c197000
mmap(0x7f269c19a000, 14088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f269c19a000
close(26) = 0
mprotect(0x7f269c197000, 8192, PROT_READ) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x20} ---
unlink("/home/root/.mozilla/firefox/xqa348dr.default/lock") = 0
close(6) = 0
rt_sigaction(SIGSEGV, {SIG_DFL, [], SA_RESTORER, 0x7f26bafb5e80}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0
tgkill(228, 228, SIGSEGV) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=228, si_uid=0} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
**Background:** Firefox is executed in a minimal unprivileged LXC container (no init, not a whole distribution, just firefox and its dependencies) --- therefore I assume that this issue is related to possibly insufficient permissions or nonexistent resources Firefox needs to access. Inside of this container, trivial graphical programs like 'xclock' and even hardware accelerated programs like 'glxgears' work. It could be that the issue of firefox not working is related to dbus (I do not know if it is set up correctly --- all I did is `cp /etc/machine-id /container/etc/`).
**UPDATE:** I was able to solve the problem. The container was missing a dependency of firefox (at this point I cannot say which one because I took the half-assed way of mounting all package contents into the container's rootfs).
**UPDATE2:** I am still interested in how to find out the exact cause of above segfault.
MCH
(509 rep)
Apr 21, 2016, 11:27 PM
• Last activity: Apr 22, 2016, 12:27 PM
7
votes
1
answers
2975
views
Subordinate GIDs/UIDs with LXC and userns for unprivileged user?
When using userns (via LXC in my case), you assign a range of subordinate GIDs and UIDs to an unprivileged user. For reference, see `subuid(5)`, `subgid(5)`, `newuidmap(1)`, `newgidmap(1)`, `user_namespaces(7)`.
That range can then be used and will, via userns, be mapped to the system account.
Let's assume we have a (host) system account `john` with a UID (and GID) of 1000. The assigned range of GIDs and UIDs is 100000..165535 (65536 IDs).
So an entry exists in `/etc/subgid` and `/etc/subuid` respectively:
john:100000:65536
Files that are owned by the "inside" `john` within the unprivileged container will now be owned by 101000 on the host, and those owned by the "inside" `root` will be owned by 100000.
Normally these ranges are not assigned to any name on the host.
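The arithmetic behind those host IDs can be sketched in a few lines of Python. This is a minimal illustration that assumes inside ID 0 maps to the first ID of the subordinate range, as in the example above; `parse_subid_entry` and `host_id` are hypothetical helper names, not part of any tool.

```python
def parse_subid_entry(entry: str) -> tuple[str, int, int]:
    """Split a subuid(5)/subgid(5) line of the form 'name:start:count'."""
    name, start, count = entry.strip().split(":")
    return name, int(start), int(count)


def host_id(inside_id: int, start: int, count: int) -> int:
    """Map an ID inside the namespace to the corresponding host ID.

    Assumes inside ID 0 maps to `start`, as in the example above.
    """
    if not 0 <= inside_id < count:
        raise ValueError(f"ID {inside_id} is outside the mapped range")
    return start + inside_id


name, start, count = parse_subid_entry("john:100000:65536")
print(host_id(0, start, count))      # host ID of the "inside" root
print(host_id(1000, start, count))   # host ID of the "inside" john
print(start + count - 1)             # last host ID in the range
```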
### Questions:
1. Is it alright to create a user for those respective UIDs/GIDs on the host in order to have more meaningful output from `ls` and friends?
2. Is there a way to make those files/folders accessible to the host user who "owns" the userns, i.e. `john` in our case? And if so, is the only sensible method to create a group shared between the valid users inside the subordinate range and the userns "owner" and set the permissions accordingly? Well, or ACLs, obviously.
0xC0000022L
(16938 rep)
Dec 21, 2014, 11:30 PM
• Last activity: Mar 24, 2016, 03:37 PM
2
votes
0
answers
476
views
How to map a UID to another UID (!= 0) inside a user namespace?
How can one map UID 1000 to another regular UID (not root/0) inside a user namespace?
EDIT: For anyone interested, lists an option `lxc.init_uid` which seems to be related to this but is not recognized by my version of LXC. If anyone manages to get this to work please let me know.
MCH
(509 rep)
Feb 18, 2016, 05:31 PM
• Last activity: Feb 24, 2016, 04:59 PM
3
votes
1
answers
315
views
Why can't a UID 0 process hardlink to SUID files in a user namespace?
Consider the following transcript of a user-namespaced shell running with root privileges (UID 0 within the namespace, unprivileged outside):
# cat /proc/$$/status | grep CapEff
CapEff: 0000003cfdfeffff
# ls -al
total 8
drwxrwxrwx 2 root root 4096 Sep 16 22:09 .
drwxr-xr-x 21 root root 4096 Sep 16 22:08 ..
-rwSr--r-- 1 nobody nobody 0 Sep 16 22:09 file
# ln file link
ln: failed to create hard link 'link' => 'file': Operation not permitted
# su nobody -s /bin/bash -c "ln file link"
# ls -al
total 8
drwxrwxrwx 2 root root 4096 Sep 16 22:11 .
drwxr-xr-x 21 root root 4096 Sep 16 22:08 ..
-rwSr--r-- 2 nobody nobody 0 Sep 16 22:09 file
-rwSr--r-- 2 nobody nobody 0 Sep 16 22:09 link
Apparently the process has the CAP_FOWNER capability (0x8) and thus should be able to hardlink to arbitrary files. However, it fails to link the SUID'd test file owned by `nobody`. There is nothing preventing the process from switching to `nobody` and then linking the file, thus the parent namespace does not seem to be the issue.
**Why can't the namespaced UID 0 process hardlink `link` to `file` without switching its UID?**
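The claim that the `CapEff` value from the transcript includes CAP_FOWNER can be checked with a short Python sketch. It assumes CAP_FOWNER is capability number 3 (so its bit is `1 << 3 = 0x8`); `has_capability` is a hypothetical helper name.

```python
CAP_FOWNER = 3  # capability number, giving the bit 1 << 3 == 0x8


def has_capability(cap_mask_hex: str, cap_number: int) -> bool:
    """Check whether bit `cap_number` is set in a CapEff-style hex mask."""
    mask = int(cap_mask_hex, 16)
    return bool(mask & (1 << cap_number))


cap_eff = "0000003cfdfeffff"  # CapEff value from the transcript above
print(hex(1 << CAP_FOWNER))
print(has_capability(cap_eff, CAP_FOWNER))
```

The mask only shows which bits are set. It says nothing about the user namespace in which the kernel evaluates them, and capabilities held in a user namespace apply relative to that namespace.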
dst
(141 rep)
Sep 16, 2015, 08:17 PM
• Last activity: Nov 15, 2015, 01:12 AM
8
votes
1
answers
7909
views
userns container fails to start, how to track down the reason?
When creating a userns (unprivileged) LXC container on Ubuntu 14.04 with the following command line:
lxc-create -n test1 -t download -- -d $(lsb_release -si|tr 'A-Z' 'a-z') -r $(lsb_release -sc) -a $(dpkg --print-architecture)
and (without touching the created configuration file) then attempting to start it with:
lxc-start -n test1 -l DEBUG
it fails. The log file shows me:
lxc-start 1420149317.700 INFO lxc_start_ui - using rcfile /home/user/.local/share/lxc/test1/config
lxc-start 1420149317.700 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
lxc-start 1420149317.701 INFO lxc_confile - read uid map: type u nsid 0 hostid 100000 range 65536
lxc-start 1420149317.701 INFO lxc_confile - read uid map: type g nsid 0 hostid 100000 range 65536
lxc-start 1420149317.701 WARN lxc_log - lxc_log_init called with log already initialized
lxc-start 1420149317.701 INFO lxc_lsm - LSM security driver AppArmor
lxc-start 1420149317.701 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
lxc-start 1420149317.702 DEBUG lxc_conf - allocated pty '/dev/pts/2' (5/6)
lxc-start 1420149317.702 DEBUG lxc_conf - allocated pty '/dev/pts/7' (7/8)
lxc-start 1420149317.702 DEBUG lxc_conf - allocated pty '/dev/pts/8' (9/10)
lxc-start 1420149317.702 DEBUG lxc_conf - allocated pty '/dev/pts/10' (11/12)
lxc-start 1420149317.702 INFO lxc_conf - tty's configured
lxc-start 1420149317.702 DEBUG lxc_start - sigchild handler set
lxc-start 1420149317.702 DEBUG lxc_console - opening /dev/tty for console peer
lxc-start 1420149317.702 DEBUG lxc_console - using '/dev/tty' as console
lxc-start 1420149317.702 DEBUG lxc_console - 14946 got SIGWINCH fd 17
lxc-start 1420149317.702 DEBUG lxc_console - set winsz dstfd:14 cols:118 rows:61
lxc-start 1420149317.905 INFO lxc_start - 'test1' is initialized
lxc-start 1420149317.906 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp
lxc-start 1420149317.906 INFO lxc_start - Cloning a new user namespace
lxc-start 1420149317.906 INFO lxc_cgroup - cgroup driver cgmanager initing for test1
lxc-start 1420149317.907 ERROR lxc_cgmanager - call to cgmanager_create_sync failed: invalid request
lxc-start 1420149317.907 ERROR lxc_cgmanager - Failed to create hugetlb:test1
lxc-start 1420149317.907 ERROR lxc_cgmanager - Error creating cgroup hugetlb:test1
lxc-start 1420149317.907 INFO lxc_cgmanager - cgroup removal attempt: hugetlb:test1 did not exist
lxc-start 1420149317.908 INFO lxc_cgmanager - cgroup removal attempt: perf_event:test1 did not exist
lxc-start 1420149317.908 INFO lxc_cgmanager - cgroup removal attempt: blkio:test1 did not exist
lxc-start 1420149317.908 INFO lxc_cgmanager - cgroup removal attempt: freezer:test1 did not exist
lxc-start 1420149317.909 INFO lxc_cgmanager - cgroup removal attempt: devices:test1 did not exist
lxc-start 1420149317.909 INFO lxc_cgmanager - cgroup removal attempt: memory:test1 did not exist
lxc-start 1420149317.909 INFO lxc_cgmanager - cgroup removal attempt: cpuacct:test1 did not exist
lxc-start 1420149317.909 INFO lxc_cgmanager - cgroup removal attempt: cpu:test1 did not exist
lxc-start 1420149317.910 INFO lxc_cgmanager - cgroup removal attempt: cpuset:test1 did not exist
lxc-start 1420149317.910 INFO lxc_cgmanager - cgroup removal attempt: name=systemd:test1 did not exist
lxc-start 1420149317.910 ERROR lxc_start - failed creating cgroups
lxc-start 1420149317.910 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
lxc-start 1420149317.910 ERROR lxc_start - failed to spawn 'test1'
lxc-start 1420149317.910 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
lxc-start 1420149317.910 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
lxc-start 1420149317.910 ERROR lxc_start_ui - The container failed to start.
lxc-start 1420149317.910 ERROR lxc_start_ui - Additional information can be obtained by setting the --logfile and --logpriority options.
Now I see two errors here, the latter probably being a result of the former, which is:
> lxc_start - failed creating cgroups
However, I see `/sys/fs/cgroup` mounted:
$ mount|grep cgr
none on /sys/fs/cgroup type tmpfs (rw)
and `cgmanager` is installed:
$ dpkg -l|awk '$1 ~ /^ii$/ && /cgmanager/ {print $2 " " $3 " " $4}'
cgmanager 0.24-0ubuntu7 amd64
libcgmanager0:amd64 0.24-0ubuntu7 amd64
Note: My host still defaults to `upstart`.
In case there's any doubt, the kernel supports `cgroups`:
$ grep CGROUP /boot/config-$(uname -r)
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_SCHED=y
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_NET_CLS_CGROUP=m
CONFIG_NETPRIO_CGROUP=m
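Since later errors in a log like the one above are often consequences of earlier ones, a small filter that reports the first ERROR line may help narrow things down. This is a minimal Python sketch that assumes the line layout seen in that log (program, timestamp, level, module, `-`, message); `LOG_LINE` and `first_error` are hypothetical names, and the sample lines are copied from the log.

```python
import re

# Layout assumed from the log above: program, timestamp, level, module, "-", message
LOG_LINE = re.compile(r"^(\S+)\s+(\d+\.\d+)\s+([A-Z]+)\s+(\S+)\s+-\s+(.*)$")


def first_error(log_text: str):
    """Return (module, message) of the first ERROR line, or None."""
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) == "ERROR":
            return match.group(4), match.group(5)
    return None


log = """\
lxc-start 1420149317.906 INFO lxc_cgroup - cgroup driver cgmanager initing for test1
lxc-start 1420149317.907 ERROR lxc_cgmanager - call to cgmanager_create_sync failed: invalid request
lxc-start 1420149317.907 ERROR lxc_cgmanager - Failed to create hugetlb:test1
lxc-start 1420149317.910 ERROR lxc_start - failed creating cgroups
"""
print(first_error(log))
```

For this log the earliest error comes from `lxc_cgmanager`, before the `failed creating cgroups` line from `lxc_start`.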
0xC0000022L
(16938 rep)
Jan 1, 2015, 10:11 PM
• Last activity: Jun 15, 2015, 08:58 AM