
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

7 votes
1 answer
2132 views
Mounting Overlayfs in a user namespace
Is it possible to mount an Overlayfs filesystem as an unprivileged user in a user namespace in Linux kernels >4.3.3? It seems that the fix to this vulnerability has blocked this functionality entirely. When I create a new user namespace with clone(), passing the CLONE_NEWNS flag, and attempt to invoke mount with an overlayfs filesystem, I get "permission denied". I can mount any other filesystem, though. Is there a way to work around this, or am I missing something?
Josh Hebert (171 rep)
Jun 6, 2016, 05:49 PM • Last activity: Jul 28, 2025, 03:08 PM
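A minimal sketch of the experiment described above, using unshare(2) instead of clone() for brevity. The helper names `overlay_opts()` and `try_overlay_in_userns()` are hypothetical. It enters new user and mount namespaces, maps the caller to root, and attempts the overlay mount, returning the errno of the first step that fails. As far as I know, mainline kernels only accept overlay mounts from inside a user namespace starting with Linux 5.11, so on the kernels the question targets an EPERM from the final `mount()` is the expected result.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <unistd.h>

/* Build "lowerdir=...,upperdir=...,workdir=..."; -1 if it does not fit. */
int overlay_opts(char *buf, size_t len, const char *lower,
                 const char *upper, const char *work)
{
    int n = snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s",
                     lower, upper, work);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Write a string to a /proc file, preserving errno across close(). */
static int write_file(const char *path, const char *s)
{
    size_t len = strlen(s);
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    ssize_t w = write(fd, s, len);
    int saved = errno;
    close(fd);
    errno = saved;
    return w == (ssize_t) len ? 0 : -1;
}

/* Enter new user + mount namespaces as mapped root, then attempt the
   overlay mount. Returns 0 on success, otherwise the errno of the first
   failing step (EPERM from mount() is what kernels without unprivileged
   overlayfs support are expected to return). */
int try_overlay_in_userns(const char *target, const char *opts)
{
    char map[64];
    unsigned uid = (unsigned) getuid(), gid = (unsigned) getgid();

    if (unshare(CLONE_NEWUSER | CLONE_NEWNS) == -1)
        return errno;
    snprintf(map, sizeof map, "0 %u 1\n", uid);
    if (write_file("/proc/self/uid_map", map) == -1)
        return errno;
    /* Required before gid_map on Linux >= 3.19; absent on older kernels. */
    if (write_file("/proc/self/setgroups", "deny") == -1 && errno != ENOENT)
        return errno;
    snprintf(map, sizeof map, "0 %u 1\n", gid);
    if (write_file("/proc/self/gid_map", map) == -1)
        return errno;
    if (mount("overlay", target, "overlay", 0, opts) == -1)
        return errno;
    return 0;
}
```

The lower, upper and work directories are assumed to exist and to be writable by the calling user. Upper and work must also be on the same filesystem.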
12 votes
6 answers
13592 views
How can I list all connections to my host, including those to LXC guests?
I tried both netstat and lsof, but it appears it's not possible to see the connections to my LXC guests. Is there a way to achieve this ... for **all** guests at once?

Essentially, what throws me off here is the fact that I can see the processes of the guests as long as I run as superuser. I can also see the veth interfaces that get dynamically created per guest. Why can I not see connections on processes that are otherwise visible?
0xC0000022L (16938 rep)
May 16, 2015, 12:09 AM • Last activity: Oct 15, 2024, 01:54 PM
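netstat reads socket tables such as /proc/net/tcp, and those files describe the network namespace of the process reading them. Each guest lives in its own network namespace, so its sockets never appear there, even though its processes are visible. The sketch below assumes you pick one PID per guest (for example each container's init). Reading /proc/<pid>/net/tcp then shows the TCP sockets of that PID's network namespace. Only IPv4 TCP is handled, and the function names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>

/* Split an address field of /proc/<pid>/net/tcp such as "0100007F:0016".
   The IP is the network-order address printed as a native integer, so it
   can be stored straight back into s_addr on the same host; the port is
   plain hex. */
int parse_tcp_addr(const char *field, unsigned *raw_ip, unsigned *port)
{
    return sscanf(field, "%8X:%4X", raw_ip, port) == 2 ? 0 : -1;
}

/* Print the IPv4 TCP sockets of the network namespace that `pid` is in. */
int print_tcp_for_pid(int pid)
{
    char path[64], line[512], local[64], remote[64];
    char lbuf[INET_ADDRSTRLEN], rbuf[INET_ADDRSTRLEN];
    unsigned lip, lport, rip, rport;

    snprintf(path, sizeof path, "/proc/%d/net/tcp", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {   /* skip the column header */
        fclose(f);
        return -1;
    }
    while (fgets(line, sizeof line, f) != NULL) {
        /* fields: "sl local_address rem_address st ..." */
        if (sscanf(line, "%*s %63s %63s", local, remote) != 2 ||
            parse_tcp_addr(local, &lip, &lport) != 0 ||
            parse_tcp_addr(remote, &rip, &rport) != 0)
            continue;
        struct in_addr la = { .s_addr = lip }, ra = { .s_addr = rip };
        inet_ntop(AF_INET, &la, lbuf, sizeof lbuf);
        inet_ntop(AF_INET, &ra, rbuf, sizeof rbuf);
        printf("pid %d: %s:%u -> %s:%u\n", pid, lbuf, lport, rbuf, rport);
    }
    fclose(f);
    return 0;
}
```

Running this as root over one PID per guest, plus the host's own PID, gives the "all guests at once" view. tcp6 and udp would need the same treatment.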
12 votes
1 answer
30958 views
How to enable user_namespaces in the kernel? (For unprivileged `unshare`.)
My Linux kernel must have been configured with [user_namespaces](http://man7.org/linux/man-pages/man7/user_namespaces.7.html) when built, but their use is restricted after boot and has to be explicitly enabled. Which sysctl should I use? (If this were turned on, it would allow running an isolation command like unshare --user --map-root-user --mount-proc --pid --fork, and then performing [chroot without being root](https://unix.stackexchange.com/q/72696/4319), a much anticipated feature of Linux.)
imz -- Ivan Zakharyaschev (15862 rep)
Aug 13, 2016, 04:37 PM • Last activity: Mar 27, 2023, 11:03 AM
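Which knob applies depends on the kernel. Debian- and Ubuntu-derived kernels carry a patch that adds kernel.unprivileged_userns_clone, where 0 blocks unprivileged CLONE_NEWUSER. Mainline has had user.max_user_namespaces since 4.9, where 0 disables the creation of user namespaces altogether. Below is a small sketch that reports both; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a /proc/sys file; -1 if the file is absent
   (the kernel lacks that knob) or does not start with a number. */
int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Report the two knobs that commonly gate user namespaces. */
void report_userns_knobs(void)
{
    static const char *const knobs[] = {
        "/proc/sys/kernel/unprivileged_userns_clone", /* Debian/Ubuntu patch */
        "/proc/sys/user/max_user_namespaces",         /* mainline, Linux 4.9+ */
    };
    for (size_t i = 0; i < sizeof knobs / sizeof knobs[0]; i++) {
        long v;
        if (read_sysctl_long(knobs[i], &v) == 0)
            printf("%s = %ld\n", knobs[i], v);
        else
            printf("%s: not present\n", knobs[i]);
    }
}
```

Whichever file exists is the one to change as root, e.g. with sysctl -w kernel.unprivileged_userns_clone=1, or by setting user.max_user_namespaces to a nonzero value.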
4 votes
1 answer
1447 views
How do I enable unprivileged_userns_clone selectively for one executable or user?
How do I enable CLONE_NEWUSER in a more fine-grained fashion than just kernel.unprivileged_userns_clone? I want to keep the kernel API attack surface manageable by keeping new and complicated things like non-root CAP_SYS_ADMIN or BPF disabled, but also selectively allow them for some specific programs. For example, chrome-sandbox wants either CLONE_NEWUSER or suid-root for proper operation, but I don't want all programs to be able to use such complicated tricks, only a handful of approved ones.
Vi. (5985 rep)
Nov 27, 2022, 01:40 AM • Last activity: Nov 27, 2022, 09:56 PM
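Whatever mechanism ends up gating CLONE_NEWUSER, it helps to check the effective result per executable and user after changing the policy. Below is a minimal probe; `can_create_userns()` is a hypothetical helper, and it only reports the current state. It does not implement the selective policy itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* 1 if the calling process (this executable, this user, its current
   credentials) may create a user namespace, 0 if the kernel refuses,
   -1 if the probe itself failed. unshare() runs in a short-lived child
   so the caller's own namespaces are left untouched. */
int can_create_userns(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : 1);
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```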
1 vote
2 answers
1386 views
User namespaces: how to mount a folder only for a given program
I'd like to fake an FHS system on a non-FHS system (NixOS) without root access. To that end, I need to mount some folders at the root (like mounting /tmp/mylib to /lib) using user namespaces (I don't see any other solution). Unfortunately, I can't make it work: I tried to follow this tutorial, but when I copy the code it fails (I can't even start bash):
$ gcc userns_child_exec.c -lcap -o userns_child_exec
$ id
uid=1000(myname) gid=100(users) groups=100(users),1(wheel),17(audio),20(lp),57(networkmanager),59(scanner),131(docker),998(vboxusers),999(adbusers)

$ ./userns_child_exec -U -M '0 1000 1' -G '0 100 1' bash
write /proc/535313/gid_map: Operation not permitted
bash: initialize_job_control: no job control in background: Bad file descriptor

[nix-shell:~/Documents/Logiciels/Nix_bidouille/2022_04_26_-_nix_fake_FHS_user_namespace/demo]$ 
[root@bestos:~/Documents/Logiciels/Nix_bidouille/2022_04_26_-_nix_fake_FHS_user_namespace/demo]# 
exit
(Note that the bash prompt is displayed, but then I can't type anything; it exits immediately.) Any idea how to make it work? Code:
/* userns_child_exec.c

   Copyright 2013, Michael Kerrisk
   Licensed under GNU General Public License v2 or later

   Create a child process that executes a shell command in new
   namespace(s); allow UID and GID mappings to be specified when
   creating a user namespace.
*/
#define _GNU_SOURCE
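#include <sched.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <signal.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <errno.h>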

/* A simple error-handling function: print an error message based
   on the value in 'errno' and terminate the calling process */

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

struct child_args {
    char **argv;        /* Command to be executed by child, with arguments */
    int    pipe_fd[2];  /* Pipe used to synchronize parent and child */
};

static int verbose;

static void
usage(char *pname)
{
    fprintf(stderr, "Usage: %s [options] cmd [arg...]\n\n", pname);
    fprintf(stderr, "Create a child process that executes a shell command "
            "in a new user namespace,\n"
            "and possibly also other new namespace(s).\n\n");
    fprintf(stderr, "Options can be:\n\n");
#define fpe(str) fprintf(stderr, "    %s", str);
    fpe("-i          New IPC namespace\n");
    fpe("-m          New mount namespace\n");
    fpe("-n          New network namespace\n");
    fpe("-p          New PID namespace\n");
    fpe("-u          New UTS namespace\n");
    fpe("-U          New user namespace\n");
    fpe("-M uid_map  Specify UID map for user namespace\n");
    fpe("-G gid_map  Specify GID map for user namespace\n");
    fpe("            If -M or -G is specified, -U is required\n");
    fpe("-v          Display verbose messages\n");
    fpe("\n");
    fpe("Map strings for -M and -G consist of records of the form:\n");
    fpe("\n");
    fpe("    ID-inside-ns   ID-outside-ns   len\n");
    fpe("\n");
    fpe("A map string can contain multiple records, separated by commas;\n");
    fpe("the commas are replaced by newlines before writing to map files.\n");

    exit(EXIT_FAILURE);
}

/* Update the mapping file 'map_file', with the value provided in
   'mapping', a string that defines a UID or GID mapping. A UID or
   GID mapping consists of one or more newline-delimited records
   of the form:

       ID_inside-ns    ID-outside-ns   length

   Requiring the user to supply a string that contains newlines is
   of course inconvenient for command-line use. Thus, we permit the
   use of commas to delimit records in this string, and replace them
   with newlines before writing the string to the file. */

static void
update_map(char *mapping, char *map_file)
{
    int fd, j;
    size_t map_len;     /* Length of 'mapping' */

    /* Replace commas in mapping string with newlines */

    map_len = strlen(mapping);
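    for (j = 0; j < map_len; j++)
        if (mapping[j] == ',')
            mapping[j] = '\n';

    fd = open(map_file, O_RDWR);
    if (fd == -1) {
        fprintf(stderr, "open %s: %s\n", map_file, strerror(errno));
        exit(EXIT_FAILURE);
    }

    if (write(fd, mapping, map_len) != map_len) {
        fprintf(stderr, "write %s: %s\n", map_file, strerror(errno));
        exit(EXIT_FAILURE);
    }

    close(fd);
}

static int              /* Start function for cloned child */
childFunc(void *arg)
{
    struct child_args *args = (struct child_args *) arg;
    char ch;

    /* Wait until the parent has updated the UID and GID mappings. See
       the comment in main(). We wait for end of file on a pipe that will
       be closed by the parent process once it has updated the mappings. */

    close(args->pipe_fd[1]);    /* Close our descriptor for the write end
                                   of the pipe so that we see EOF when
                                   parent closes its descriptor */
    if (read(args->pipe_fd[0], &ch, 1) != 0) {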
        fprintf(stderr, "Failure in child: read from pipe returned != 0\n");
        exit(EXIT_FAILURE);
    }

    /* Execute a shell command */

    execvp(args->argv[0], args->argv);
    errExit("execvp");
}

#define STACK_SIZE (1024 * 1024)

static char child_stack[STACK_SIZE];    /* Space for child's stack */

int
main(int argc, char *argv[])
{
    int flags, opt;
    pid_t child_pid;
    struct child_args args;
    char *uid_map, *gid_map;
    char map_path[PATH_MAX];

    /* Parse command-line options. The initial '+' character in
       the final getopt() argument prevents GNU-style permutation
       of command-line options. That's useful, since sometimes
       the 'command' to be executed by this program itself
       has command-line options. We don't want getopt() to treat
       those as options to this program. */

    flags = 0;
    verbose = 0;
    gid_map = NULL;
    uid_map = NULL;
    while ((opt = getopt(argc, argv, "+imnpuUM:G:v")) != -1) {
        switch (opt) {
        case 'i': flags |= CLONE_NEWIPC;        break;
        case 'm': flags |= CLONE_NEWNS;         break;
        case 'n': flags |= CLONE_NEWNET;        break;
        case 'p': flags |= CLONE_NEWPID;        break;
        case 'u': flags |= CLONE_NEWUTS;        break;
        case 'v': verbose = 1;                  break;
        case 'M': uid_map = optarg;             break;
        case 'G': gid_map = optarg;             break;
        case 'U': flags |= CLONE_NEWUSER;       break;
        default:  usage(argv[0]);
        }
    }

    /* -M or -G without -U is nonsensical */

    if ((uid_map != NULL || gid_map != NULL) &&
            !(flags & CLONE_NEWUSER))
        usage(argv[0]);

    args.argv = &argv[optind];

    /* We use a pipe to synchronize the parent and child, in order to
       ensure that the parent sets the UID and GID maps before the child
       calls execve(). This ensures that the child maintains its
       capabilities during the execve() in the common case where we
       want to map the child's effective user ID to 0 in the new user
       namespace. Without this synchronization, the child would lose
       its capabilities if it performed an execve() with nonzero
       user IDs (see the capabilities(7) man page for details of the
       transformation of a process's capabilities during execve()). */

    if (pipe(args.pipe_fd) == -1)
        errExit("pipe");

    /* Create the child in new namespace(s) */

    child_pid = clone(childFunc, child_stack + STACK_SIZE,
                      flags | SIGCHLD, &args);
    if (child_pid == -1)
        errExit("clone");

    /* Parent falls through to here */

    if (verbose)
        printf("%s: PID of child created by clone() is %ld\n",
                argv[0], (long) child_pid);

    /* Update the UID and GID maps in the child */

    if (uid_map != NULL) {
        snprintf(map_path, PATH_MAX, "/proc/%ld/uid_map",
                (long) child_pid);
        update_map(uid_map, map_path);
    }
    if (gid_map != NULL) {
        snprintf(map_path, PATH_MAX, "/proc/%ld/gid_map",
                (long) child_pid);
        update_map(gid_map, map_path);
    }

    /* Close the write end of the pipe, to signal to the child that we
       have updated the UID and GID maps */

    close(args.pipe_fd[1]);

    if (waitpid(child_pid, NULL, 0) == -1)      /* Wait for child */
        errExit("waitpid");

    if (verbose)
        printf("%s: terminating\n", argv[0]);

    exit(EXIT_SUCCESS);
}
**EDIT** Actually, it's quite weird: the error appears when writing the GID map, but it did work for the UID map:
[leo@bestos:~]$ cat /proc/582197/gid_map 

[leo@bestos:~]$ cat /proc/582197/uid_map 
         0       1000          1

[leo@bestos:~]$ ll /proc/582197/gid_map 
-rw-r--r-- 1 leo users 0 mai   18 09:09 /proc/582197/gid_map

[leo@bestos:~]$ ll /proc/582197/uid_map 
-rw-r--r-- 1 leo users 0 mai   18 09:09 /proc/582197/uid_map
tobiasBora (4621 rep)
May 18, 2022, 06:19 AM • Last activity: May 20, 2022, 07:30 PM
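The asymmetry matches a rule added in Linux 3.19. A process without CAP_SETGID in the parent user namespace may only write gid_map after setgroups(2) has been disabled by writing "deny" to /proc/PID/setgroups. The 2013 program above predates that rule, while the version of this example in the user_namespaces(7) man page writes "deny" before the gid_map. Below is a minimal sketch of the map-writing step with that addition. The helper names `format_map_line()`, `write_proc()` and `map_root_for_child()` are hypothetical and not part of the question's code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format one map record "ID-inside-ns ID-outside-ns length\n". */
int format_map_line(char *buf, size_t len, unsigned inside,
                    unsigned outside, unsigned count)
{
    int n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Write `data` to /proc/<pid>/<name>, reporting errors like update_map(). */
static int write_proc(pid_t pid, const char *name, const char *data)
{
    char path[64];
    size_t len = strlen(data);
    snprintf(path, sizeof path, "/proc/%ld/%s", (long) pid, name);
    int fd = open(path, O_WRONLY);
    if (fd == -1) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }
    ssize_t w = write(fd, data, len);
    if (w != (ssize_t) len)
        fprintf(stderr, "write %s: %s\n", path, strerror(errno));
    close(fd);
    return w == (ssize_t) len ? 0 : -1;
}

/* Map the caller's UID and GID to 0 in the child's user namespace.
   Order matters: "deny" must reach setgroups before gid_map is written. */
int map_root_for_child(pid_t child)
{
    char line[64];

    if (format_map_line(line, sizeof line, 0, (unsigned) getuid(), 1) != 0 ||
        write_proc(child, "uid_map", line) != 0)
        return -1;
    if (write_proc(child, "setgroups", "deny") != 0)
        return -1;
    if (format_map_line(line, sizeof line, 0, (unsigned) getgid(), 1) != 0 ||
        write_proc(child, "gid_map", line) != 0)
        return -1;
    return 0;
}
```

In the question's program, calling map_root_for_child(child_pid) in place of the two update_map() calls should be equivalent to -M '0 1000 1' -G '0 100 1' for that user.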
21 votes
2 answers
15588 views
What is an unprivileged LXC container?
What does it mean if a Linux container (LXC container) is called "unprivileged"?
0xC0000022L (16938 rep)
Jan 2, 2015, 10:32 AM • Last activity: Jan 8, 2021, 01:32 PM
7 votes
3 answers
2959 views
Migrate an unprivileged LXC container between users
I have an Ubuntu 14.04 server installation which acts as an LXC host. It has two users: user1 and user2. user1 owns an unprivileged LXC container, which uses a directory (inside /home/user1/.local/...) as backing store. How do I make a full copy of the container for user2? I can't just copy the files because they are mapped with owners ranging from 100000 to 100000+something, which are bound to user1. Also, which I believe is basically the same question, how can I safely make a backup of my user1's LXC container to restore it later on another machine and/or user?
agdev84 (91 rep)
Sep 19, 2014, 07:41 PM • Last activity: May 10, 2020, 09:15 AM
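A copy works once the owners are rewritten from user1's subordinate range to user2's. The offset within the range stays the same, so nothing changes as seen from inside the container. This is the same idea as the uidshift/uidmapshift tools mentioned in a later question here. The minimal sketch below assumes it runs as real root on the host over a copy of the rootfs, and that both ranges have the same length. `shift_tree()` and its parameters are hypothetical. It does not touch POSIX ACLs or security.capability xattrs. Because chown() clears setuid/setgid bits, the code puts the original mode back.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Parameters for the walk (nftw() callbacks take no user pointer). */
static unsigned long g_old_base, g_new_base, g_count;

/* Move an ID in [old_base, old_base + count) to the same offset in the
   new range; IDs outside the old range are returned unchanged. */
unsigned long shift_id(unsigned long id, unsigned long old_base,
                       unsigned long new_base, unsigned long count)
{
    if (id >= old_base && id - old_base < count)
        return new_base + (id - old_base);
    return id;
}

static int shift_entry(const char *path, const struct stat *st,
                       int type, struct FTW *ftw)
{
    (void) type;
    (void) ftw;
    uid_t u = (uid_t) shift_id(st->st_uid, g_old_base, g_new_base, g_count);
    gid_t g = (gid_t) shift_id(st->st_gid, g_old_base, g_new_base, g_count);
    if (lchown(path, u, g) == -1) {          /* never follows symlinks */
        perror(path);
        return 0;                            /* keep walking */
    }
    /* chown() clears setuid/setgid bits, so put the original mode back. */
    if (!S_ISLNK(st->st_mode) && (st->st_mode & (S_ISUID | S_ISGID)) &&
        chmod(path, st->st_mode & 07777) == -1)
        perror(path);
    return 0;
}

/* Shift every owner under `root`; must run as real root on the host. */
int shift_tree(const char *root, unsigned long old_base,
               unsigned long new_base, unsigned long count)
{
    g_old_base = old_base;
    g_new_base = new_base;
    g_count = count;
    return nftw(root, shift_entry, 64, FTW_PHYS | FTW_MOUNT);
}
```

For user1 to user2, old_base and new_base are the two users' starts in /etc/subuid. For a privileged-to-unprivileged migration, old_base would be 0.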
7 votes
2 answers
7281 views
How to influence the assignment of subordinate UIDs/GIDs when creating user accounts?
To my knowledge, the subordinate UIDs and GIDs are assigned to accounts in such a manner that they form a contiguous range. The range starts at 100000 by default and probably stretches to the theoretical maximum value for a UID/GID (even though I haven't found a way to query this from the shell; /etc/login.defs only lists the values allowed for the tools). Now, it'd be a lot more convenient for me as a human if the ranges started at a multiple of 100000, i.e. n*100000 with n being a positive integer (n>0), instead of 100000+n*65536. This way I'd be able to see immediately which file is owned by which host user. Is there a way to influence the assignment of subordinate UIDs/GIDs in modern enough shadow-utils to achieve a more human-readable assignment? If not, is it alright to simply overwrite the files /etc/subuid and /etc/subgid with conforming data to get what I want?
0xC0000022L (16938 rep)
Dec 30, 2014, 01:25 PM • Last activity: Aug 22, 2019, 10:09 AM
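Below is the arithmetic behind the two layouts, as a small sketch with hypothetical helper names. The default hands out consecutive blocks of 65536 starting at 100000, which matches entries like developer:165536:65536 and mounter:231072:65536 in the /etc/subuid shown in a later question. The n*100000 layout keeps every range starting on a round number, but only as long as each block holds at most 100000 IDs. Hand-written records just need the name:start:count form and must not overlap. shadow's login.defs also has SUB_UID_MIN, SUB_UID_MAX and SUB_UID_COUNT (plus the SUB_GID_* equivalents), which useradd uses for new allocations. Setting the count to 100000 should yield round starts, though I have not verified that for every shadow version.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Default-style layout: consecutive blocks of `count` IDs from 100000,
   i.e. 100000, 165536, 231072, ... for count = 65536 (n = 0, 1, 2, ...). */
unsigned long default_start(unsigned n, unsigned long count)
{
    return 100000UL + (unsigned long) n * count;
}

/* Human-readable layout: block n (n >= 1) starts at n * 100000.
   Only non-overlapping while each block holds at most 100000 IDs. */
unsigned long readable_start(unsigned n, unsigned long count)
{
    assert(count <= 100000UL);
    return (unsigned long) n * 100000UL;
}

/* One /etc/subuid or /etc/subgid record: "name:start:count\n". */
int format_subid_line(char *buf, size_t len, const char *name,
                      unsigned long start, unsigned long count)
{
    int n = snprintf(buf, len, "%s:%lu:%lu\n", name, start, count);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```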
14 votes
1 answer
7810 views
Is there a tool(!) to list assigned subuid and subgid values for users?
usermod -v (--add-sub-uids) and usermod -w (--add-sub-gids) can be used to manipulate the subuid and subgid ranges for a user account, but there appears to be no tool that can merely list them. Is there one? At least on my Ubuntu 14.04 box, getent doesn't seem to be prepared to handle that information from /etc/subuid and /etc/subgid. Currently I am using a little shell script, using awk for the purpose.

Here's an excerpt from usermod(8):
-v, --add-sub-uids FIRST-LAST
Add a range of subordinate uids to the users account. [...]
-V, --del-sub-uids FIRST-LAST
Remove a range of subordinate uids from the users account. [...]
-w, --add-sub-gids FIRST-LAST
Add a range of subordinate gids to the users account. [...]
-W, --del-sub-gids FIRST-LAST
Remove a range of subordinate gids from the users account. [...]
0xC0000022L (16938 rep)
May 11, 2014, 02:30 AM • Last activity: Apr 30, 2019, 12:41 AM
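The format is simple enough to read directly: each line of subuid(5)/subgid(5) is name:start:count. (As far as I know, newer shadow releases also ship a getsubids tool for this.) Below is a minimal C sketch of such a lister; the function names are hypothetical. The first field may also be a numeric UID, which this sketch only matches literally.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split one "name:start:count" record; returns 0 on success. */
int parse_subid_line(const char *line, char *name, size_t name_len,
                     unsigned long *start, unsigned long *count)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL || (size_t) (colon - line) >= name_len)
        return -1;
    memcpy(name, line, (size_t) (colon - line));
    name[colon - line] = '\0';
    return sscanf(colon + 1, "%lu:%lu", start, count) == 2 ? 0 : -1;
}

/* Print every range in `file` (e.g. /etc/subuid) that belongs to `user`. */
int list_subids(const char *file, const char *user)
{
    char line[256], name[128];
    unsigned long start, count;
    FILE *f = fopen(file, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_subid_line(line, name, sizeof name, &start, &count) == 0 &&
            count > 0 && strcmp(name, user) == 0)
            printf("%s %s: %lu-%lu (%lu IDs)\n",
                   file, user, start, start + count - 1, count);
    }
    fclose(f);
    return 0;
}
```

Calling list_subids("/etc/subuid", "john") followed by list_subids("/etc/subgid", "john") covers both files.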
1 vote
0 answers
526 views
Why would creating a user namespace with size 1 work but size >1 fail
I am experimenting with unprivileged linux containers and I am writing a Go program that creates a minimalist container. The program forks itself and creates namespaces in the process. However for some reason if I set the user namespace size to greater than 1, it fails when running as a regular user.
cmd := exec.Command("/proc/self/exe", "run-container")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags:   syscall.CLONE_NEWUSER | syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
		Unshareflags: syscall.CLONE_NEWNS,
		UidMappings: []syscall.SysProcIDMap{
			{
				ContainerID: 0,
				HostID:      os.Getuid(),
				Size:        1,   // set this to 2 or more and it fails
			},
		},
		GidMappings: []syscall.SysProcIDMap{
			{
				ContainerID: 0,
				HostID:      os.Getgid(),
				Size:        1,
			},
		},
	}
	// other flags: CLONE_NEWNET, CLONE_NEWIPC, CLONE_NEWCGROUP, CLONE_NEWUSER,
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	err := cmd.Run()
	if err != nil {
		fmt.Println("ERROR: parent cmd.Run", err)
		os.Exit(1)
	}
The code above (along with all the other stuff like pivot_root, etc.) works fine. But the moment I set Size to 2, it bombs:
ERROR: parent cmd.Run fork/exec /proc/self/exe: operation not permitted
This seems to be a capabilities issue because when I run as root it works. Here is my /etc/subuid:
lxd:1000:1
root:1000:1
lxd:100000:65536
root:100000:65536
developer:165536:65536
mounter:231072:65536
Update: I figured out that you need CAP_SETUID to map anything more than just the current euid (see the user namespaces man page). But even after sudo setcap cap_setuid=eip /my/binary, it fails. The error message has changed to:
ERROR: parent cmd.Run fork/exec /proc/self/exe: permission denied
If I run strace it fails with EPERM when trying to write to /proc/xx/uid_map.
openat(AT_FDCWD, "/proc/25233/uid_map", O_RDWR) = 5
write(5, "0 1000 100\n\0", 12)          = -1 EPERM (Operation not permitted)
teleclimber (111 rep)
Feb 16, 2019, 02:20 AM • Last activity: Feb 16, 2019, 10:31 PM
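The size-1 versus size-2 behaviour follows the rule in user_namespaces(7). A writer without CAP_SETUID in the parent user namespace may only write a single record that maps its own effective UID. Anything larger needs CAP_SETUID there, which is why helpers such as newuidmap are installed setuid root and check /etc/subuid before writing. The predicate below just restates that rule. `struct map_record` and the function are hypothetical illustrations, not part of Go's syscall package.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of one uid_map/gid_map record. */
struct map_record { unsigned long inside, outside, count; };

/* Rule from user_namespaces(7) for a writer WITHOUT CAP_SETUID (or
   CAP_SETGID for gid_map) in the parent user namespace: exactly one
   record, mapping only the writer's own effective ID. For gid_map,
   setgroups must additionally have been set to "deny". */
bool unprivileged_map_allowed(const struct map_record *m, size_t n,
                              unsigned long writer_euid)
{
    return n == 1 && m[0].count == 1 && m[0].outside == writer_euid;
}
```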
2 votes
2 answers
962 views
Who would win, RLIMIT_NPROC or user namespaces?
Depending on configuration, unprivileged (non-root) processes can create a user namespace. RLIMIT_NPROC limits the number of processes *per user*. If I enter a user namespace, can I create processes with different UIDs, and hence exceed my real RLIMIT_NPROC?
sourcejedi (53222 rep)
Feb 12, 2019, 07:01 PM • Last activity: Feb 13, 2019, 11:04 PM
3 votes
1 answer
1122 views
Ping not working in a new C container
I've been writing my own Linux container from scratch in C. I've borrowed code from several places and put together a basic version with **namespaces** & **cgroups**. Basically, I **clone** a new process with all the **CLONE_NEW*** flags to create new namespaces for the **cloned** process. I also set up UID mapping by writing **0 0 1000** into the **uid_map** and **gid_map** files. I want to ensure that *root* inside the container is mapped to *root* outside. For the filesystem, I am using a base image of **stretch** created with **debootstrap**. Now, I am trying to set up network connectivity from inside the container. I used this script to set up the interface inside the container. This script creates a network namespace of its own. I edited it slightly to bind-mount the network namespace of the created process onto the newly created net-namespace via the script.
mount --bind /proc/$PID/ns/net /var/run/netns/demo
I can just get into the new network namespace as follows:
ip netns exec ${NS} /bin/bash --rcfile  \"")
and successfully ping outside. But from the bash shell I get by default inside the cloned process, I am unable to ping. I get the error:
ping: socket: Operation not permitted
I've tried setting the capabilities **cap_net_raw** and **cap_net_admin**. I would like some guidance.
Shabirmean (135 rep)
Jan 21, 2019, 03:23 PM • Last activity: Jan 21, 2019, 07:38 PM
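ping needs one of two kinds of socket. A raw ICMP socket requires CAP_NET_RAW in the user namespace that owns the network namespace. An ICMP datagram socket is only allowed for groups listed in net.ipv4.ping_group_range, which is per network namespace and disabled ("1 0") by the kernel default. The minimal probe below (hypothetical function name) tells which of the two works from inside the cloned process. It narrows down which capability or sysctl to look at rather than fixing anything.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Which kind of ICMP socket can this process open in its current network
   namespace?  2 = raw socket (the CAP_NET_RAW path),
   1 = ICMP datagram socket (the ping_group_range path), 0 = neither. */
int probe_icmp_socket(void)
{
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
    if (fd >= 0) {
        close(fd);
        return 2;
    }
    fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return 0;
}
```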
14 votes
1 answer
1989 views
Why can't I bind-mount "/" inside a user namespace?
Why doesn't this work?
$ unshare -rm mount --bind / /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /, missing codepage or helper program, or other error.

These work ok:
$ unshare -rm mount --bind /tmp /mnt
$ unshare -rm mount --bind /root /mnt
$

$ uname -r # Linux kernel version
4.17.3-200.fc28.x86_64
sourcejedi (53222 rep)
Jul 18, 2018, 10:22 PM • Last activity: Jul 18, 2018, 10:31 PM
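One explanation worth testing is locked mounts. In a mount namespace owned by a less privileged user namespace, mounts inherited from the parent are locked together. A non-recursive bind of a tree that has locked child mounts is refused with EINVAL, which mount(8) reports as the "wrong fs type, bad option..." message. / has many such children, while /tmp or /root typically have none. The sketch below runs both variants in a throwaway child so the results can be compared. The helper names are hypothetical, and writing only the UID map is assumed to be sufficient for mount().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <unistd.h>

/* "mount --bind" is MS_BIND alone; "mount --rbind" adds MS_REC. */
unsigned long bind_flags(int recursive)
{
    return MS_BIND | (recursive ? MS_REC : 0UL);
}

/* In a throwaway child: new user + mount namespace with only a UID map
   (like unshare -rm, minus the GID map), then bind "/" onto `target`.
   Returns 0 if mount() succeeded, its errno otherwise, or -1 if the
   namespace setup itself failed. */
int try_bind_root(const char *target, int recursive)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        char map[32];
        unsigned uid = (unsigned) getuid();   /* read before unshare() */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS) == -1)
            _exit(255);
        int len = snprintf(map, sizeof map, "0 %u 1\n", uid);
        int fd = open("/proc/self/uid_map", O_WRONLY);
        if (fd == -1 || write(fd, map, (size_t) len) != len)
            _exit(255);
        close(fd);
        _exit(mount("/", target, NULL, bind_flags(recursive), NULL) == 0
                  ? 0 : errno);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

If that explanation holds, try_bind_root("/mnt", 0) should return EINVAL and try_bind_root("/mnt", 1) should return 0. The recursive call corresponds to unshare -rm mount --rbind / /mnt.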
5 votes
4 answers
6949 views
Building unprivileged (userns) LXC container from scratch, by migrating a privileged container to be unprivileged
How can I build a privileged LXC (1.0.3) container (that part I know) and then migrate it successfully to be run unprivileged? That is, I'd like to debootstrap it myself or adjust the lxc-ubuntu template (commonly under /usr/share/lxc/templates) in order for this to work.

Here's why I am asking this question. If you look at the lxc-ubuntu template, you'll notice:
# Detect use under userns (unsupported)
for arg in "$@"; do
    [ "$arg" = "--" ] && break
    if [ "$arg" = "--mapped-uid" -o "$arg" = "--mapped-gid" ]; then
        echo "This template can't be used for unprivileged containers." 1>&2
        echo "You may want to try the \"download\" template instead." 1>&2
        exit 1
    fi
done

Following the use of LXC_MAPPED_GID and LXC_MAPPED_UID in the referenced lxc-download template, though, there seems to be nothing particularly special. In fact, all it does is adjust the file ownership (chgrp + chown). But it's possible that the extended attributes in the download template are already fine-tuned to accomplish whatever "magic" is needed.

In the comments to this blog post by Stéphane Graber, Stéphane tells a commenter that

> There’s no easy way to do that unfortunately, you’d need to update your container config to match that from an unprivileged container, move the container’s directory over to the unprivileged user you want it to run as, then use Serge’s uidshift program to change the ownership of all files.

... and to:
* have a look at https://jenkins.linuxcontainers.org/ for the packages built for the download template
* check out uidmapshift from here
* This program appears to roughly do lxc-usernsexec -m b:0:1000:1 -m b:1:190000:1 -- /bin/chown 1:1 $file as explained in lxc-usernsexec(1)

But there are no further pointers.
**So my question is: how can I take an ordinary (privileged) LXC container that I have built myself (having root and all) and migrate it to become an unprivileged container?** Even if you can't provide a script or so, it would be great to know which points to consider and how they affect the ability to run the unprivileged LXC container. I can come up with a script on my own and pledge to post it as an answer to this question if a solution can be found :) *Note:* Although I am using Ubuntu 14.04, this is a *generic* question.
0xC0000022L (16938 rep)
May 2, 2014, 12:46 PM • Last activity: Jan 31, 2018, 05:56 PM
3 votes
1 answer
1356 views
What makes firefox inside a container launch a new firefox window outside on the host with the UID of the host user? Isn't it weird for an LXC?
Can someone please explain this weird behaviour to me: I have an unprivileged LXC container with firefox inside.

**If firefox is running on the host outside of the container**, /usr/bin/firefox inside the container launches a new firefox window **outside** on the host with the UID of the host user. **If firefox is NOT running outside of the container**, /usr/bin/firefox inside the container launches firefox with the (SUB)UID of the container user, as it should.

The reverse is also true: if firefox is running inside the container (but not on the host) and firefox is started on the host, the firefox which is started has the UID of the container user. ?!?! How is that ?!?!

EDIT: Confirmed that the same issue emerges when using a default unprivileged Ubuntu container with the default configuration file.

EDIT: Asked the same question in the Arch forums: https://bbs.archlinux.org/viewtopic.php?pid=1622174#p1622174

Config file:
lxc.devttydir = lxc
lxc.pts = 1024
lxc.tty = 4
lxc.cap.drop = mac_admin mac_override sys_time sys_module
lxc.pivotdir = lxc_putold
lxc.hook.clone = /usr/share/lxc/hooks/clonehostname
lxc.cgroup.devices.deny = a
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
lxc.cgroup.devices.allow = c 1:7 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:2 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 10:229 rwm
lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none bind,optional 0 0
lxc.seccomp = /usr/share/lxc/config/common.seccomp
lxc.hook.mount = /usr/share/lxcfs/lxc.mount.hook
lxc.hook.post-stop = /usr/share/lxcfs/lxc.reboot.hook
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none bind,optional 0 0
lxc.mount.entry = /sys/kernel/security sys/kernel/security none bind,optional 0 0
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none bind,optional 0 0
lxc.mount.entry = mqueue dev/mqueue mqueue rw,relatime,create=dir,optional 0 0
lxc.cgroup.devices.allow = c 254:0 rm
lxc.cgroup.devices.allow = c 10:200 rwm
lxc.cgroup.devices.allow = c 10:228 rwm
lxc.cgroup.devices.allow = c 10:232 rwm
lxc.cgroup.devices.deny =
lxc.cgroup.devices.allow =
lxc.devttydir =
lxc.mount.entry = /dev/console dev/console none bind,create=file 0 0
lxc.mount.entry = /dev/full dev/full none bind,create=file 0 0
lxc.mount.entry = /dev/null dev/null none bind,create=file 0 0
lxc.mount.entry = /dev/random dev/random none bind,create=file 0 0
lxc.mount.entry = /dev/tty dev/tty none bind,create=file 0 0
lxc.mount.entry = /dev/urandom dev/urandom none bind,create=file 0 0
lxc.mount.entry = /dev/zero dev/zero none bind,create=file 0 0
lxc.mount.entry = /sys/firmware/efi/efivars sys/firmware/efi/efivars none bind,optional 0 0
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none bind,optional 0 0
lxc.arch = x86_64
lxc.cgroup.devices.allow = c 226:* rwm
lxc.mount.entry = tmpfs tmp tmpfs defaults
lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none ro,bind,create=dir 0 0

The container is started like this:
lxc-start -n c1 -F -f /path/to/above/conf -s 'lxc.id_map = u 0 100000 65536' -s 'lxc.id_map = g 0 100000 65536' -s 'lxc.rootfs = /path/to/rootfs' -s 'lxc.init_cmd = /usr/bin/bash'

EDIT: Distribution: Arch Linux
$ uname -r
4.6.0-rc4-customGIT+
# lxc-checkconfig
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
Multiple /dev/pts instances: enabled

--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
Bridges: enabled
Advanced netfilter: enabled
CONFIG_NF_NAT_IPV4: enabled
CONFIG_NF_NAT_IPV6: enabled
CONFIG_IP_NF_TARGET_MASQUERADE: enabled
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled
FUSE (for use with lxcfs): enabled

--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities: enabled
MCH (509 rep)
Apr 22, 2016, 01:35 AM • Last activity: Aug 16, 2016, 09:28 AM
2 votes
0 answers
653 views
I have trouble analysing the cause of this (firefox) segfault
When executed within a minimal (unprivileged) LXC container, **firefox segfaults** (other graphical applications work fine). **I am unable to find the exact cause of this segfault** (which is most likely due to insufficient permissions or missing resources).
# strace /usr/bin/firefox
...
open("/usr/lib/libfreebl3.so", O_RDONLY|O_CLOEXEC) = 26
read(26,"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0007\0\0\0\0\0\0"..., 832) = 832
fstat(26, {st_mode=S_IFREG|0755, st_size=544424, ...}) = 0
mmap(NULL, 2619144, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 26, 0) = 0x7f269bf1e000
mprotect(0x7f269bf97000, 2097152, PROT_NONE) = 0
mmap(0x7f269c197000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 26, 0x79000) = 0x7f269c197000
mmap(0x7f269c19a000, 14088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f269c19a000
close(26) = 0
mprotect(0x7f269c197000, 8192, PROT_READ) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x20} ---
unlink("/home/root/.mozilla/firefox/xqa348dr.default/lock") = 0
close(6) = 0
rt_sigaction(SIGSEGV, {SIG_DFL, [], SA_RESTORER, 0x7f26bafb5e80}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0
tgkill(228, 228, SIGSEGV) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=228, si_uid=0} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

**Background:** Firefox is executed in a minimal unprivileged LXC container (no init, not a whole distribution, just firefox and its dependencies), therefore I assume that this issue is related to insufficient permissions or nonexistent resources that Firefox needs to access. Inside this container, trivial graphical programs like 'xclock' and even hardware-accelerated programs like 'glxgears' work. It could be that the issue of firefox not working is related to dbus (I do not know if it is set up correctly; all I did was cp /etc/machine-id /container/etc/).

**UPDATE:** I was able to solve the problem. The container was missing a dependency of firefox (at this point I cannot say which one, because I took the half-assed way of mounting all package contents into the container's rootfs).

**UPDATE2:** I am still interested in how to find out the exact cause of the above segfault.
MCH (509 rep)
Apr 21, 2016, 11:27 PM • Last activity: Apr 22, 2016, 12:27 PM
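Regarding UPDATE2: one way to narrow down a "missing dependency" crash like this one is to list the files the process tried to open but could not find. The Python sketch below is a hypothetical helper, not part of strace. It assumes a trace captured with something like strace -f -o trace.txt /usr/bin/firefox. It only looks at open()/openat() calls that returned ENOENT. Other syscalls such as stat() or access() are ignored. Many ENOENT results are normal, for example while the loader walks library search paths, so the output still needs human judgement.

```python
import re
from typing import Iterable, List

# Hypothetical helper pattern: strace lines for open()/openat() that failed with ENOENT, e.g.
#   open("/usr/lib/libfoo.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
#   openat(AT_FDCWD, "/etc/foo.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
_ENOENT_OPEN = re.compile(
    r'\b(?:open|openat)\((?:(?:AT_FDCWD|-?\d+),\s*)?"([^"]+)"[^)]*\)\s*=\s*-1\s+ENOENT\b'
)
# Shared objects such as libfoo.so or libfoo.so.1
_SHARED_LIB = re.compile(r"\.so(?:\.\d+)*$")


def missing_files(trace_lines: Iterable[str], only_shared_libs: bool = False) -> List[str]:
    """Return paths that open()/openat() could not find, in first-seen order."""
    seen = set()
    result = []
    for line in trace_lines:
        match = _ENOENT_OPEN.search(line)
        if match is None:
            continue  # successful call, other syscall, or other error
        path = match.group(1)
        if only_shared_libs and not _SHARED_LIB.search(path):
            continue
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result
```

For example, missing_files(open("trace.txt"), only_shared_libs=True) returns each missing .so path once. Each of those paths can then be compared against what the container's rootfs actually contains.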
7 votes
1 answers
2975 views
Subordinate GIDs/UIDs with LXC and userns for unprivileged user?
When using userns (via LXC in my case), you assign a range of subordinate GIDs and UIDs to an unprivileged user. For reference, see: subuid(5), subgid(5), newuidmap(1), newgidmap(1), user_namespaces(7). That range can then be used and will, via userns, be mapped to the system account.

Let's assume we have a (host) system account john with a UID (and GID) of 1000. The assigned range of GIDs and UIDs is 100000..165535 (65536 IDs). So an entry exists in /etc/subgid and /etc/subuid respectively:

    john:100000:65536

Files inside the unprivileged container that are owned by the "inside" john (UID 1000) will now be owned by 101000 on the host, and those owned by the "inside" root will be owned by 100000. Normally these ranges are not assigned to any name on the host.

### Questions:

1. Is it alright to create a user for those respective UIDs/GIDs on the host in order to get more meaningful output from ls and friends?
2. Is there a way to make those files/folders accessible to the host user who "owns" the userns, i.e. john in our case? And if so, is the only sensible method to create a group shared between those valid users inside the subordinate range and the userns "owner" and set the permissions accordingly? Well, or ACLs, obviously.
0xC0000022L (16938 rep)
Dec 21, 2014, 11:30 PM • Last activity: Mar 24, 2016, 03:37 PM
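The ID arithmetic in the john example can be sketched in a few lines of Python. This is a minimal illustration that assumes one contiguous subordinate range per user and that container ID 0 maps to the first ID of that range. parse_subid_line, container_to_host and host_to_container are hypothetical helpers, not part of shadow-utils or LXC.

```python
from typing import NamedTuple, Optional


class SubIdRange(NamedTuple):
    """One /etc/subuid or /etc/subgid entry: owner:first_id:count."""
    owner: str
    first: int
    count: int


def parse_subid_line(line: str) -> SubIdRange:
    owner, first, count = line.strip().split(":")
    return SubIdRange(owner, int(first), int(count))


def container_to_host(rng: SubIdRange, container_id: int) -> int:
    """Map an ID inside the container to the host (assumes ID 0 -> rng.first)."""
    if not 0 <= container_id < rng.count:
        raise ValueError(f"container ID {container_id} is outside the mapped range")
    return rng.first + container_id


def host_to_container(rng: SubIdRange, host_id: int) -> Optional[int]:
    """Inverse mapping; None if host_id is not in this subordinate range."""
    offset = host_id - rng.first
    return offset if 0 <= offset < rng.count else None
```

With a count of 65536, the highest mapped host ID is 100000 + 65536 - 1 = 165535. host_to_container is the lookup that a host-side name for those IDs (question 1) would have to mirror.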
2 votes
0 answers
476 views
How to map a UID to another UID (!= 0) inside a user namespace?
How can one map UID 1000 to another regular UID (not root/0) inside a user namespace? EDIT: For anyone interested, lists an option lxc.init_uid which seems to be related to this but is not recognized by my version of LXC. If anyone manages to get this to work please let me know.
MCH (509 rep)
Feb 18, 2016, 05:31 PM • Last activity: Feb 24, 2016, 04:59 PM
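Some background for the UID-mapping question, offered as a hedged sketch. According to user_namespaces(7), a mapping is written to /proc/PID/uid_map as lines of the form "ID-inside-ns ID-outside-ns length", and the inside ID does not have to be 0. A process without CAP_SETUID in the parent namespace may only write a single line, and that line must map its own effective UID. The Python helpers below (hypothetical names) only build and interpret such lines. They do not create a namespace.

```python
from typing import List, Optional, Tuple

# One uid_map/gid_map extent: (first ID inside the namespace, first ID outside, length)
Extent = Tuple[int, int, int]


def uid_map_line(inside: int, outside: int, length: int = 1) -> str:
    """Format a single extent in the layout /proc/PID/uid_map expects."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return f"{inside} {outside} {length}\n"


def parse_uid_map(text: str) -> List[Extent]:
    """Parse /proc/PID/uid_map contents (whitespace-separated triples)."""
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            inside, outside, length = (int(f) for f in fields)
            extents.append((inside, outside, length))
    return extents


def outside_to_inside(extents: List[Extent], outside_id: int) -> Optional[int]:
    """Return the namespace ID that outside_id appears as, or None if unmapped."""
    for inside, outside, length in extents:
        if outside <= outside_id < outside + length:
            return inside + (outside_id - outside)
    return None
```

For example, an unprivileged process with effective UID 1000 could call unshare(CLONE_NEWUSER) and then write uid_map_line(2000, 1000) to /proc/self/uid_map. Inside the namespace it would then appear as UID 2000 rather than 0. Whether a given LXC version exposes this through its configuration is a separate question.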
3 votes
1 answers
315 views
Why can't a UID 0 process hardlink to SUID files in a user namespace?
Consider the following transcript of a user-namespaced shell running with root privileges (UID 0 within the namespace, unprivileged outside):

    # cat /proc/$$/status | grep CapEff
    CapEff: 0000003cfdfeffff
    # ls -al
    total 8
    drwxrwxrwx 2 root root 4096 Sep 16 22:09 .
    drwxr-xr-x 21 root root 4096 Sep 16 22:08 ..
    -rwSr--r-- 1 nobody nobody 0 Sep 16 22:09 file
    # ln file link
    ln: failed to create hard link 'link' => 'file': Operation not permitted
    # su nobody -s /bin/bash -c "ln file link"
    # ls -al
    total 8
    drwxrwxrwx 2 root root 4096 Sep 16 22:11 .
    drwxr-xr-x 21 root root 4096 Sep 16 22:08 ..
    -rwSr--r-- 2 nobody nobody 0 Sep 16 22:09 file
    -rwSr--r-- 2 nobody nobody 0 Sep 16 22:09 link

Apparently the process has the CAP_FOWNER capability (mask 0x8) and thus should be able to hard link arbitrary files. However, it fails to link the SUID test file owned by nobody. Nothing prevents the process from switching to nobody and then linking the file, so the parent namespace does not seem to be the issue.

**Why can't the namespaced UID 0 process create the hard link 'link' to 'file' without switching its UID?**
dst (141 rep)
Sep 16, 2015, 08:17 PM • Last activity: Nov 15, 2015, 01:12 AM
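The claim about CapEff can be checked by decoding the mask bit by bit. Bit N corresponds to capability number N in linux/capability.h, where CAP_FOWNER is 3, so its mask is 1 << 3 = 0x8. The Python sketch below names only the first eight capabilities as an illustrative subset, not the full list. describe and the other function names are hypothetical.

```python
from typing import List

# Capability numbers from <linux/capability.h> (illustrative subset only)
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    2: "CAP_DAC_READ_SEARCH",
    3: "CAP_FOWNER",
    4: "CAP_FSETID",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
}
CAP_FOWNER = 3


def has_capability(cap_hex: str, cap_number: int) -> bool:
    """True if bit cap_number is set in a hex mask such as a CapEff value."""
    return (int(cap_hex, 16) >> cap_number) & 1 == 1


def set_capability_numbers(cap_hex: str) -> List[int]:
    """All capability numbers whose bits are set in the mask."""
    mask = int(cap_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def describe(cap_hex: str) -> List[str]:
    """Names for the set bits; unnamed ones fall back to their number."""
    return [CAP_NAMES.get(n, f"capability {n}") for n in set_capability_numbers(cap_hex)]
```

CapEff only reports capabilities relative to the process's own user namespace. Seeing the bit set therefore confirms what the shell holds there, nothing more.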
8 votes
1 answers
7909 views
userns container fails to start, how to track down the reason?
When creating a userns (unprivileged) LXC container on Ubuntu 14.04 with the following command line:

    lxc-create -n test1 -t download -- -d $(lsb_release -si|tr 'A-Z' 'a-z') -r $(lsb_release -sc) -a $(dpkg --print-architecture)

and (without touching the created configuration file) then attempting to start it with:

    lxc-start -n test1 -l DEBUG

it fails. The log file shows me:

    lxc-start 1420149317.700 INFO lxc_start_ui - using rcfile /home/user/.local/share/lxc/test1/config
    lxc-start 1420149317.700 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.701 INFO lxc_confile - read uid map: type u nsid 0 hostid 100000 range 65536
    lxc-start 1420149317.701 INFO lxc_confile - read uid map: type g nsid 0 hostid 100000 range 65536
    lxc-start 1420149317.701 WARN lxc_log - lxc_log_init called with log already initialized
    lxc-start 1420149317.701 INFO lxc_lsm - LSM security driver AppArmor
    lxc-start 1420149317.701 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.702 DEBUG lxc_conf - allocated pty '/dev/pts/2' (5/6)
    lxc-start 1420149317.702 DEBUG lxc_conf - allocated pty '/dev/pts/7' (7/8)
    lxc-start 1420149317.702 DEBUG lxc_conf - allocated pty '/dev/pts/8' (9/10)
    lxc-start 1420149317.702 DEBUG lxc_conf - allocated pty '/dev/pts/10' (11/12)
    lxc-start 1420149317.702 INFO lxc_conf - tty's configured
    lxc-start 1420149317.702 DEBUG lxc_start - sigchild handler set
    lxc-start 1420149317.702 DEBUG lxc_console - opening /dev/tty for console peer
    lxc-start 1420149317.702 DEBUG lxc_console - using '/dev/tty' as console
    lxc-start 1420149317.702 DEBUG lxc_console - 14946 got SIGWINCH fd 17
    lxc-start 1420149317.702 DEBUG lxc_console - set winsz dstfd:14 cols:118 rows:61
    lxc-start 1420149317.905 INFO lxc_start - 'test1' is initialized
    lxc-start 1420149317.906 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp
    lxc-start 1420149317.906 INFO lxc_start - Cloning a new user namespace
    lxc-start 1420149317.906 INFO lxc_cgroup - cgroup driver cgmanager initing for test1
    lxc-start 1420149317.907 ERROR lxc_cgmanager - call to cgmanager_create_sync failed: invalid request
    lxc-start 1420149317.907 ERROR lxc_cgmanager - Failed to create hugetlb:test1
    lxc-start 1420149317.907 ERROR lxc_cgmanager - Error creating cgroup hugetlb:test1
    lxc-start 1420149317.907 INFO lxc_cgmanager - cgroup removal attempt: hugetlb:test1 did not exist
    lxc-start 1420149317.908 INFO lxc_cgmanager - cgroup removal attempt: perf_event:test1 did not exist
    lxc-start 1420149317.908 INFO lxc_cgmanager - cgroup removal attempt: blkio:test1 did not exist
    lxc-start 1420149317.908 INFO lxc_cgmanager - cgroup removal attempt: freezer:test1 did not exist
    lxc-start 1420149317.909 INFO lxc_cgmanager - cgroup removal attempt: devices:test1 did not exist
    lxc-start 1420149317.909 INFO lxc_cgmanager - cgroup removal attempt: memory:test1 did not exist
    lxc-start 1420149317.909 INFO lxc_cgmanager - cgroup removal attempt: cpuacct:test1 did not exist
    lxc-start 1420149317.909 INFO lxc_cgmanager - cgroup removal attempt: cpu:test1 did not exist
    lxc-start 1420149317.910 INFO lxc_cgmanager - cgroup removal attempt: cpuset:test1 did not exist
    lxc-start 1420149317.910 INFO lxc_cgmanager - cgroup removal attempt: name=systemd:test1 did not exist
    lxc-start 1420149317.910 ERROR lxc_start - failed creating cgroups
    lxc-start 1420149317.910 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.910 ERROR lxc_start - failed to spawn 'test1'
    lxc-start 1420149317.910 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.910 INFO lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.910 ERROR lxc_start_ui - The container failed to start.
    lxc-start 1420149317.910 ERROR lxc_start_ui - Additional information can be obtained by setting the --logfile and --logpriority options.

Now I see two errors here, the latter probably being a result of the former, which is:

> lxc_start - failed creating cgroups

However, I see /sys/fs/cgroup mounted:

    $ mount|grep cgr
    none on /sys/fs/cgroup type tmpfs (rw)

and cgmanager is installed:

    $ dpkg -l|awk '$1 ~ /^ii$/ && /cgmanager/ {print $2 " " $3 " " $4}'
    cgmanager 0.24-0ubuntu7 amd64
    libcgmanager0:amd64 0.24-0ubuntu7 amd64

Note: My host still defaults to upstart.

In case there's any doubt, the kernel supports cgroups:

    $ grep CGROUP /boot/config-$(uname -r)
    CONFIG_CGROUPS=y
    # CONFIG_CGROUP_DEBUG is not set
    CONFIG_CGROUP_FREEZER=y
    CONFIG_CGROUP_DEVICE=y
    CONFIG_CGROUP_CPUACCT=y
    CONFIG_CGROUP_HUGETLB=y
    CONFIG_CGROUP_PERF=y
    CONFIG_CGROUP_SCHED=y
    CONFIG_BLK_CGROUP=y
    # CONFIG_DEBUG_BLK_CGROUP is not set
    CONFIG_NET_CLS_CGROUP=m
    CONFIG_NETPRIO_CGROUP=m
0xC0000022L (16938 rep)
Jan 1, 2015, 10:11 PM • Last activity: Jun 15, 2015, 08:58 AM
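When an lxc-start DEBUG log like the one above ends in a cascade of errors, the earliest ERROR entry is often the best place to start. In this log it points at lxc_cgmanager ("call to cgmanager_create_sync failed: invalid request") rather than at the later "failed to spawn". The hypothetical Python helper below pulls that entry out. It assumes the line layout shown in the log: program name, timestamp, level, module, "-", message. It is not an LXC tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: float
    level: str
    module: str
    message: str


# Assumed layout: "lxc-start 1420149317.907 ERROR lxc_cgmanager - message text"
_LINE = re.compile(r"^\S+\s+(\d+\.\d+)\s+([A-Z]+)\s+(\S+)\s+-\s+(.*)$")


def parse_log(lines: Iterable[str]) -> List[LogEntry]:
    """Parse lines matching the assumed layout; other lines are skipped."""
    entries = []
    for line in lines:
        m = _LINE.match(line.strip())
        if m:
            ts, level, module, message = m.groups()
            entries.append(LogEntry(float(ts), level, module, message))
    return entries


def first_error(lines: Iterable[str]) -> Optional[LogEntry]:
    """Earliest ERROR entry in file order, or None if there is none."""
    for entry in parse_log(lines):
        if entry.level == "ERROR":
            return entry
    return None
```

Keeping the whole LogEntry rather than just the message makes it easy to also print the module, which here narrows the search to the cgmanager driver.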