Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

7 votes

1 answer

2132 views

Mounting Overlayfs in a user namespace

Is it possible to mount an Overlayfs filesystem as an unprivileged user in a user namespace on Linux kernels newer than 4.3.3? It seems that the fix for this vulnerability has blocked this functionality entirely.

When I create a new user namespace with `clone()`, passing the `CLONE_NEWNS` flag, and attempt to mount an overlayfs filesystem, I get "permission denied". I can mount any other filesystem, though.

Is there a way to work around this, or am I missing something?

Josh Hebert (171 rep)

Jun 6, 2016, 05:49 PM • Last activity: Jul 28, 2025, 03:08 PM

12 votes

6 answers

13592 views

How can I list all connections to my host, including those to LXC guests?

lxc userns

I tried both `netstat` and `lsof`, but it appears it's not possible to see the connections to my LXC guests.

Is there a way to achieve this ... for **all** guests at once?

----------

Essentially what throws me off here is the fact that I can see the processes of the guests as long as I run as superuser. I can also see the veth interfaces that get dynamically created per guest. Why can I not see connections on processes that are otherwise visible?

0xC0000022L (16938 rep)

May 16, 2015, 12:09 AM • Last activity: Oct 15, 2024, 01:54 PM

12 votes

1 answer

30958 views

How to enable user_namespaces in the kernel? (For unprivileged `unshare`.)

security non-root-user sysctl userns altlinux

My Linux kernel must have been configured with [user_namespaces](http://man7.org/linux/man-pages/man7/user_namespaces.7.html) when built, but their use is restricted after boot and has to be explicitly enabled. Which sysctl should I use?

(If this were turned on, it would make it possible to run an isolation command like `unshare --user --map-root-user --mount-proc --pid --fork` and then perform [chroot without being root](https://unix.stackexchange.com/q/72696/4319), a much anticipated feature of Linux.)

imz -- Ivan Zakharyaschev (15862 rep)

Aug 13, 2016, 04:37 PM • Last activity: Mar 27, 2023, 11:03 AM

4 votes

1 answer

1447 views

How do I enable unprivileged_userns_clone selectively for one executable or user?

linux security group capabilities userns

How do I enable `CLONE_NEWUSER` in a more fine-grained fashion than just `kernel.unprivileged_userns_clone`?

I want to keep the kernel API attack surface manageable by keeping new and complicated things like non-root `CAP_SYS_ADMIN` or BPF disabled, but also selectively allow them for some specific programs.

For example, chrome-sandbox wants either `CLONE_NEWUSER` or suid-root for proper operation, but I don't want all programs to be able to use such complicated tricks, only a handful of approved ones.

Vi. (5985 rep)

Nov 27, 2022, 01:40 AM • Last activity: Nov 27, 2022, 09:56 PM

1 vote

2 answers

1386 views

User namespaces: how to mount a folder only for a given program

mount userns

I'd like to fake an FHS system on a non-FHS system (NixOS) without root access. To that end, I need to mount some folders at the root (for example, mounting `/tmp/mylib` on `/lib`) using user namespaces (I don't see any other solution). Unfortunately, I can't get it to work: I tried to follow this tutorial, but when I copy the code it fails (I can't even start a bash):

$ gcc userns_child_exec.c -lcap -o userns_child_exec
$ id
uid=1000(myname) gid=100(users) groups=100(users),1(wheel),17(audio),20(lp),57(networkmanager),59(scanner),131(docker),998(vboxusers),999(adbusers)

$ ./userns_child_exec -U -M '0 1000 1' -G '0 100 1' bash
write /proc/535313/gid_map: Operation not permitted
bash: initialize_job_control: no job control in background: Bad file descriptor

[nix-shell:~/Documents/Logiciels/Nix_bidouille/2022_04_26_-_nix_fake_FHS_user_namespace/demo]$ 
[root@bestos:~/Documents/Logiciels/Nix_bidouille/2022_04_26_-_nix_fake_FHS_user_namespace/demo]# 
exit

(Note that the bash prompt is displayed, but then I can't type anything; it quits immediately.) Any idea how to make it work? Code:

/* userns_child_exec.c

   Copyright 2013, Michael Kerrisk
   Licensed under GNU General Public License v2 or later

   Create a child process that executes a shell command in new
   namespace(s); allow UID and GID mappings to be specified when
   creating a user namespace.
*/
#define _GNU_SOURCE
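#include <sched.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <signal.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <errno.h>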

/* A simple error-handling function: print an error message based
   on the value in 'errno' and terminate the calling process */

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

struct child_args {
    char **argv;        /* Command to be executed by child, with arguments */
    int    pipe_fd[2];  /* Pipe used to synchronize parent and child */
};

static int verbose;

static void
usage(char *pname)
{
    fprintf(stderr, "Usage: %s [options] cmd [arg...]\n\n", pname);
    fprintf(stderr, "Create a child process that executes a shell command "
            "in a new user namespace,\n"
            "and possibly also other new namespace(s).\n\n");
    fprintf(stderr, "Options can be:\n\n");
#define fpe(str) fprintf(stderr, "    %s", str);
    fpe("-i          New IPC namespace\n");
    fpe("-m          New mount namespace\n");
    fpe("-n          New network namespace\n");
    fpe("-p          New PID namespace\n");
    fpe("-u          New UTS namespace\n");
    fpe("-U          New user namespace\n");
    fpe("-M uid_map  Specify UID map for user namespace\n");
    fpe("-G gid_map  Specify GID map for user namespace\n");
    fpe("            If -M or -G is specified, -U is required\n");
    fpe("-v          Display verbose messages\n");
    fpe("\n");
    fpe("Map strings for -M and -G consist of records of the form:\n");
    fpe("\n");
    fpe("    ID-inside-ns   ID-outside-ns   len\n");
    fpe("\n");
    fpe("A map string can contain multiple records, separated by commas;\n");
    fpe("the commas are replaced by newlines before writing to map files.\n");

    exit(EXIT_FAILURE);
}

/* Update the mapping file 'map_file', with the value provided in
   'mapping', a string that defines a UID or GID mapping. A UID or
   GID mapping consists of one or more newline-delimited records
   of the form:

       ID_inside-ns    ID-outside-ns   length

   Requiring the user to supply a string that contains newlines is
   of course inconvenient for command-line use. Thus, we permit the
   use of commas to delimit records in this string, and replace them
   with newlines before writing the string to the file. */

static void
update_map(char *mapping, char *map_file)
{
    int fd, j;
    size_t map_len;     /* Length of 'mapping' */

    /* Replace commas in mapping string with newlines */

    map_len = strlen(mapping);
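    for (j = 0; j < map_len; j++)
        if (mapping[j] == ',')
            mapping[j] = '\n';

    fd = open(map_file, O_RDWR);
    if (fd == -1) {
        fprintf(stderr, "open %s: %s\n", map_file, strerror(errno));
        exit(EXIT_FAILURE);
    }

    if (write(fd, mapping, map_len) != map_len) {
        fprintf(stderr, "write %s: %s\n", map_file, strerror(errno));
        exit(EXIT_FAILURE);
    }

    close(fd);
}

static int              /* Start function for cloned child */
childFunc(void *arg)
{
    struct child_args *args = (struct child_args *) arg;
    char ch;

    /* Wait until the parent has updated the UID and GID mappings. See
       the comment in main(). We wait for end of file on a pipe that will
       be closed by the parent process once it has updated the mappings. */

    close(args->pipe_fd[1]);    /* Close our descriptor for the write end
                                   of the pipe so that we see EOF when
                                   parent closes its descriptor */
    if (read(args->pipe_fd[0], &ch, 1) != 0) {
        fprintf(stderr, "Failure in child: read from pipe returned != 0\n");
        exit(EXIT_FAILURE);
    }

    /* Execute a shell command */

    execvp(args->argv[0], args->argv);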
    errExit("execvp");
}

#define STACK_SIZE (1024 * 1024)

static char child_stack[STACK_SIZE];    /* Space for child's stack */

int
main(int argc, char *argv[])
{
    int flags, opt;
    pid_t child_pid;
    struct child_args args;
    char *uid_map, *gid_map;
    char map_path[PATH_MAX];

    /* Parse command-line options. The initial '+' character in
       the final getopt() argument prevents GNU-style permutation
       of command-line options. That's useful, since sometimes
       the 'command' to be executed by this program itself
       has command-line options. We don't want getopt() to treat
       those as options to this program. */

    flags = 0;
    verbose = 0;
    gid_map = NULL;
    uid_map = NULL;
    while ((opt = getopt(argc, argv, "+imnpuUM:G:v")) != -1) {
        switch (opt) {
        case 'i': flags |= CLONE_NEWIPC;        break;
        case 'm': flags |= CLONE_NEWNS;         break;
        case 'n': flags |= CLONE_NEWNET;        break;
        case 'p': flags |= CLONE_NEWPID;        break;
        case 'u': flags |= CLONE_NEWUTS;        break;
        case 'v': verbose = 1;                  break;
        case 'M': uid_map = optarg;             break;
        case 'G': gid_map = optarg;             break;
        case 'U': flags |= CLONE_NEWUSER;       break;
        default:  usage(argv[0]);
        }
    }

    /* -M or -G without -U is nonsensical */

    if ((uid_map != NULL || gid_map != NULL) &&
            !(flags & CLONE_NEWUSER))
        usage(argv[0]);

    args.argv = &argv[optind];

    /* We use a pipe to synchronize the parent and child, in order to
       ensure that the parent sets the UID and GID maps before the child
       calls execve(). This ensures that the child maintains its
       capabilities during the execve() in the common case where we
       want to map the child's effective user ID to 0 in the new user
       namespace. Without this synchronization, the child would lose
       its capabilities if it performed an execve() with nonzero
       user IDs (see the capabilities(7) man page for details of the
       transformation of a process's capabilities during execve()). */

    if (pipe(args.pipe_fd) == -1)
        errExit("pipe");

    /* Create the child in new namespace(s) */

    child_pid = clone(childFunc, child_stack + STACK_SIZE,
                      flags | SIGCHLD, &args);
    if (child_pid == -1)
        errExit("clone");

    /* Parent falls through to here */

    if (verbose)
        printf("%s: PID of child created by clone() is %ld\n",
                argv[0], (long) child_pid);

    /* Update the UID and GID maps in the child */

    if (uid_map != NULL) {
        snprintf(map_path, PATH_MAX, "/proc/%ld/uid_map",
                (long) child_pid);
        update_map(uid_map, map_path);
    }
    if (gid_map != NULL) {
        snprintf(map_path, PATH_MAX, "/proc/%ld/gid_map",
                (long) child_pid);
        update_map(gid_map, map_path);
    }

    /* Close the write end of the pipe, to signal to the child that we
       have updated the UID and GID maps */

    close(args.pipe_fd[1] );

    if (waitpid(child_pid, NULL, 0) == -1)      /* Wait for child */
        errExit("waitpid");

    if (verbose)
        printf("%s: terminating\n", argv[0]);

    exit(EXIT_SUCCESS);
}

**EDIT** Actually, it's quite weird: the error appears when writing the GID map, but writing the UID map worked:

[leo@bestos:~]$ cat /proc/582197/gid_map 

[leo@bestos:~]$ cat /proc/582197/uid_map 
         0       1000          1

[leo@bestos:~]$ ll /proc/582197/gid_map 
-rw-r--r-- 1 leo users 0 mai   18 09:09 /proc/582197/gid_map

[leo@bestos:~]$ ll /proc/582197/uid_map 
-rw-r--r-- 1 leo users 0 mai   18 09:09 /proc/582197/uid_map
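
One possible explanation, offered as an assumption rather than something confirmed in the post: user_namespaces(7) documents that, since Linux 3.19, a process without `CAP_SETGID` in the parent user namespace can write `/proc/PID/gid_map` only after `deny` has been written to `/proc/PID/setgroups`. The program as pasted never writes that file. A minimal Python sketch of the write order follows; `write_id_maps` and its `proc_root` parameter are hypothetical names used only for illustration.

```python
import os


def write_id_maps(pid, uid_map, gid_map, proc_root="/proc"):
    """Write UID/GID maps for `pid`, disabling setgroups(2) first.

    Hypothetical helper: `proc_root` exists only so the function can be
    exercised against a scratch directory instead of the real /proc.
    """
    base = os.path.join(proc_root, str(pid))

    def write(name, text):
        # Each map file may be written only once, so send the text in one go.
        with open(os.path.join(base, name), "w") as f:
            f.write(text)

    write("uid_map", uid_map + "\n")
    # Must come before gid_map when the writer lacks CAP_SETGID
    # in the parent user namespace (see user_namespaces(7)).
    write("setgroups", "deny")
    write("gid_map", gid_map + "\n")
```

Calling `write_id_maps(child_pid, "0 1000 1", "0 100 1")` from the parent would mirror the `-M '0 1000 1' -G '0 100 1'` invocation shown above.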

tobiasBora (4621 rep)

May 18, 2022, 06:19 AM • Last activity: May 20, 2022, 07:30 PM

21 votes

2 answers

15588 views

What is an unprivileged LXC container?

lxc userns

What does it mean if a Linux container (LXC container) is called "unprivileged"?


0xC0000022L (16938 rep)

Jan 2, 2015, 10:32 AM • Last activity: Jan 8, 2021, 01:32 PM

7 votes

3 answers

2959 views

Migrate an unprivileged LXC container between users

linux permissions lxc userns

I have an Ubuntu 14.04 server installation which acts as an LXC host.
It has two users: user1 and user2.

user1 owns an unprivileged LXC container, which uses a directory (inside /home/user1/.local/...) as backing store.

How do I make a full copy of the container for user2?
I can't just copy the files because they are mapped with owners ranging from 100000 to 100000+something, which are bound to user1.

Also, and I believe this is basically the same question: how can I safely back up user1's LXC container so that I can restore it later on another machine and/or for another user?
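
The ownership problem described above comes down to arithmetic: a file owned by host ID `old_base + k` has to end up owned by `new_base + k`. Here is a minimal Python sketch, assuming user1's range starts at 100000 and user2's at 165536 (both values, like the name `shift_id`, are assumptions for illustration). Actually applying it would mean walking the rootfs as root and calling `os.lchown` on every entry.

```python
def shift_id(host_id, old_base, new_base, count=65536):
    """Translate a host UID/GID from one subordinate range to another.

    IDs outside [old_base, old_base + count) are returned unchanged,
    since they do not belong to the container's mapping.
    """
    offset = host_id - old_base
    if 0 <= offset < count:
        return new_base + offset
    return host_id


# Container root (100000) and container UID 1000 (101000) under user1
# become 165536 and 166536 under user2's assumed range.
print(shift_id(100000, 100000, 165536))  # 165536
print(shift_id(101000, 100000, 165536))  # 166536
```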

agdev84 (91 rep)

Sep 19, 2014, 07:41 PM • Last activity: May 10, 2020, 09:15 AM

7 votes

2 answers

7281 views

How to influence the assignment of subordinate UIDs/GIDs when creating user accounts?

linux ubuntu shadow userns

To my knowledge the subordinate UIDs and GIDs are assigned to accounts in such a manner that they form a contiguous range.

The range starts at 100000 by default and probably stretches to the theoretical maximum value for a UID/GID (even though I haven't found a way to query this from the shell; `/etc/login.defs` only lists the values allowed for the tools).

Now, it'd be a lot more convenient for me as a human if the ranges started at a multiple of 100000, i.e. n*100000 with n being a positive integer (n>0), instead of 100000+n*65536. This way I'd be able to see immediately which file is owned by which host user.

Is there a way to influence the assignment of subordinate UIDs/GIDs in modern enough shadow-utils to achieve a more human-readable assignment?

If not, is it all right to simply overwrite the files `/etc/subuid` and `/etc/subgid` with conforming data to get what I want?
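
To make the two numbering schemes concrete, here is a small Python sketch. The account names and helper names are hypothetical; the 65536 count and the 100000 starting value come from the question. It only formats lines in the `name:start:count` layout used by `/etc/subuid` and does not touch any file.

```python
def default_start(n, first=100000, count=65536):
    """Start of the n-th range (n = 0, 1, ...) under the contiguous scheme."""
    return first + n * count


def aligned_start(n, step=100000):
    """Start of the n-th range (n = 1, 2, ...) aligned to multiples of step."""
    return n * step


def subid_lines(names, count=65536):
    """Format name:start:count lines with aligned starts, one per account."""
    return [f"{name}:{aligned_start(i)}:{count}"
            for i, name in enumerate(names, start=1)]


print(default_start(1))               # 165536
print(subid_lines(["alice", "bob"]))  # ['alice:100000:65536', 'bob:200000:65536']
```

Because 65536 is smaller than 100000, the aligned ranges cannot overlap; the cost is an unused gap after each range.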

0xC0000022L (16938 rep)

Dec 30, 2014, 01:25 PM • Last activity: Aug 22, 2019, 10:09 AM

14 votes

1 answer

7810 views

Is there a tool(!) to list assigned subuid and subgid values for users?

linux ubuntu users userns

`usermod -v` (`--add-sub-uids`) and `usermod -w` (`--add-sub-gids`) can be used to manipulate the subuid and subgid ranges for a user account, but there appears to be no tool that can merely list them. Is there one?

At least on my Ubuntu 14.04 box `getent` doesn't seem to be prepared to handle that information from `/etc/subuid` and `/etc/subgid`.

Currently I am using a little shell script with awk for this purpose.
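
In the absence of a dedicated tool, the parsing itself is short. The Python sketch below is similar in spirit to the awk approach mentioned above, but the actual script is not shown in the question, so this is not a reconstruction of it. It assumes the usual `name:start:count` line format and skips lines that do not fit it; `parse_subid` is a hypothetical name.

```python
def parse_subid(text):
    """Parse /etc/subuid- or /etc/subgid-style text into (name, first, last) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        if len(fields) != 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # skip blank or malformed lines
        name, start, count = fields[0], int(fields[1]), int(fields[2])
        ranges.append((name, start, start + count - 1))
    return ranges


sample = "lxd:100000:65536\nroot:100000:65536\n"
for name, first, last in parse_subid(sample):
    print(f"{name}: {first}-{last}")
```

On a real system the input would come from something like `open("/etc/subuid").read()`; the output uses the same FIRST-LAST form that `usermod` expects.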

----------

Here's an excerpt from usermod(8):

    -v, --add-sub-uids FIRST-LAST
        Add a range of subordinate uids to the users account.
    [...]
    -V, --del-sub-uids FIRST-LAST
        Remove a range of subordinate uids from the users account.
    [...]
    -w, --add-sub-gids FIRST-LAST
        Add a range of subordinate gids to the users account.
    [...]
    -W, --del-sub-gids FIRST-LAST
        Remove a range of subordinate gids from the users account.
    [...]
                                

0xC0000022L (16938 rep)

May 11, 2014, 02:30 AM • Last activity: Apr 30, 2019, 12:41 AM

1 vote

0 answers

526 views

Why would creating a user namespace with size 1 work but size >1 fail

linux userns

I am experimenting with unprivileged linux containers and I am writing a Go program that creates a minimalist container. The program forks itself and creates namespaces in the process. However for some reason if I set the user namespace size to greater than 1, it fails when running as a regular user...

cmd := exec.Command("/proc/self/exe", "run-container")
cmd.SysProcAttr = &syscall.SysProcAttr{
	Cloneflags:   syscall.CLONE_NEWUSER | syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	Unshareflags: syscall.CLONE_NEWNS,
	UidMappings: []syscall.SysProcIDMap{
		{
			ContainerID: 0,
			HostID:      os.Getuid(),
			Size:        1, // set this to 2 or more and it fails
		},
	},
	GidMappings: []syscall.SysProcIDMap{
		{
			ContainerID: 0,
			HostID:      os.Getgid(),
			Size:        1,
		},
	},
}
// other flags: CLONE_NEWNET, CLONE_NEWIPC, CLONE_NEWCGROUP, CLONE_NEWUSER,
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr

err := cmd.Run()
if err != nil {
	fmt.Println("ERROR: parent cmd.Run", err)
	os.Exit(1)
}

The code above (along with all the other stuff like pivot_root etc.) works fine. But the moment I set Size to 2, it fails:

ERROR: parent cmd.Run fork/exec /proc/self/exe: operation not permitted

This seems to be a capabilities issue because when I run as root it works. Here is my /etc/subuid:

lxd:1000:1
root:1000:1
lxd:100000:65536
root:100000:65536
developer:165536:65536
mounter:231072:65536

**Update:** I figured out that you need `CAP_SETUID` to map more than just the current EUID to another ID (see the user namespaces man page). But even after `sudo setcap cap_setuid=eip /my/binary` it fails. The error message has changed to:

ERROR: parent cmd.Run fork/exec /proc/self/exe: permission denied

If I run it under strace, it fails with EPERM when trying to write to /proc/xx/uid_map:

openat(AT_FDCWD, "/proc/25233/uid_map", O_RDWR) = 5
write(5, "0 1000 100\n\0", 12)          = -1 EPERM (Operation not permitted)
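
For reference, user_namespaces(7) spells out the restricted case: a process without `CAP_SETUID` (or `CAP_SETGID` for GIDs) in the parent user namespace may write only a single line that maps its own effective ID. A Python sketch of that check follows; `allowed_unprivileged` is a hypothetical helper, not a kernel or library API, and it models only this one rule. Larger mappings are normally installed through the setuid helpers `newuidmap` and `newgidmap`, which consult `/etc/subuid` and `/etc/subgid`.

```python
def allowed_unprivileged(mappings, euid):
    """Return True if `mappings` fits the single-line rule for unprivileged writers.

    `mappings` is a list of (inside_id, outside_id, size) tuples, mirroring
    the ContainerID/HostID/Size fields in the Go snippet above.
    """
    if len(mappings) != 1:
        return False
    _inside, outside, size = mappings[0]
    return size == 1 and outside == euid


print(allowed_unprivileged([(0, 1000, 1)], euid=1000))  # True
print(allowed_unprivileged([(0, 1000, 2)], euid=1000))  # False
```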

teleclimber (111 rep)

Feb 16, 2019, 02:20 AM • Last activity: Feb 16, 2019, 10:31 PM

2 votes

2 answers

962 views

Who would win, RLIMIT_NPROC or user namespaces?

ulimit resources userns

Depending on configuration, unprivileged (non-root) processes can create a user namespace. `RLIMIT_NPROC` limits the number of processes *per user*. If I enter a user namespace, can I create processes with different UIDs, and hence exceed my real `RLIMIT_NPROC`?

sourcejedi (53222 rep)

Feb 12, 2019, 07:01 PM • Last activity: Feb 13, 2019, 11:04 PM

3 votes

1 answer

1122 views

Ping not working in a new C container

permissions ping network-namespaces capabilities userns

I've been working on writing my own Linux container from scratch in C. I've borrowed code from several places and put together a basic version with **namespaces** and **cgroups**.

Basically, I **clone** a new process with all the **CLONE_NEW*** flags to create new namespaces for the cloned process. I also set up UID mapping by inserting **0 0 1000** into the **uid_map** and **gid_map** files. I want to ensure that *root* inside the container is mapped to *root* outside. For the filesystem, I am using a base image of **stretch** created with **debootstrap**.

Now, I am trying to set up network connectivity from inside the container. I used this script to set up the interface inside the container. This script creates a new network namespace of its own. I edited it slightly to bind-mount the network namespace of the created process onto the network namespace created by the script:

mount --bind /proc/$PID/ns/net /var/run/netns/demo

I can just get into the new network namespace as follows:

ip netns exec ${NS} /bin/bash --rcfile  \"")

and successfully ping the outside. But from the bash shell I get by default when I enter the cloned process, I am unable to ping. I get this error:

ping: socket: Operation not permitted

I've tried setting the capabilities **cap_net_raw** and **cap_net_admin**. I would like some guidance.

Shabirmean (135 rep)

Jan 21, 2019, 03:23 PM • Last activity: Jan 21, 2019, 07:38 PM

14 votes

1 answer

1989 views

Why can't I bind-mount "/" inside a user namespace?

mount namespace bind-mount userns

Why doesn't this work?

    $ unshare -rm mount --bind / /mnt
    mount: /mnt: wrong fs type, bad option, bad superblock on /, missing codepage or helper program, or other error.

These work ok:

    $ unshare -rm mount --bind /tmp /mnt
    $ unshare -rm mount --bind /root /mnt
    $

---

    $ uname -r  # Linux kernel version
    4.17.3-200.fc28.x86_64
                                

sourcejedi (53222 rep)

Jul 18, 2018, 10:22 PM • Last activity: Jul 18, 2018, 10:31 PM

5 votes

4 answers

6949 views

Building unprivileged (userns) LXC container from scratch, by migrating a privileged container to be unprivileged

linux lxc userns

How can I build a privileged LXC (1.0.3) container (that part I know) and then migrate it successfully to be run unprivileged? That is, I'd like to `debootstrap` it myself or adjust the `lxc-ubuntu` template (commonly under `/usr/share/lxc/templates`) in order for this to work.

Here's why I am asking this question. If you look at the lxc-ubuntu template, you'll notice:



    # Detect use under userns (unsupported)
    for arg in "$@"; do
        [ "$arg" = "--" ] && break
        if [ "$arg" = "--mapped-uid" -o "$arg" = "--mapped-gid" ]; then
            echo "This template can't be used for unprivileged containers." 1>&2
            echo "You may want to try the \"download\" template instead." 1>&2
            exit 1
        fi
    done

Following the use of LXC_MAPPED_GID and LXC_MAPPED_UID in the referenced lxc-download template, though, there seems to be nothing particularly special. In fact all it does is to adjust the file ownership (chgrp + chown). But it's possible that the extended attributes in the download template are fine-tuned already to accomplish whatever "magic" is needed.

In the comments to this blog post by Stéphane Graber, Stéphane tells a commenter that

> There’s no easy way to do that unfortunately, you’d need to update
> your container config to match that from an unprivileged container,
> move the container’s directory over to the unprivileged user you want
> it to run as, then use Serge’s uidshift program to change the
> ownership of all files.

... and to:

* have a look at https://jenkins.linuxcontainers.org/ for the packages built for the download template
* check out uidmapshift from here
  * This program appears to do roughly what `lxc-usernsexec -m b:0:1000:1 -m b:1:190000:1 -- /bin/chown 1:1 $file` does, as explained in lxc-usernsexec(1)

But there are no further pointers.

**So my question is: how can I take an ordinary (privileged) LXC container that I have built myself (having root and all) and migrate it to become an unprivileged container?** Even if you can't provide a script or so, it would be great to know which points to consider and how they affect the ability to run the unprivileged LXC container. I can come up with a script on my own and pledge to post it as an answer to this question if a solution can be found :)

*Note:* Although I am using Ubuntu 14.04, this is a *generic* question.


                                

0xC0000022L (16938 rep)

May 2, 2014, 12:46 PM • Last activity: Jan 31, 2018, 05:56 PM

3 votes

1 answer

1356 views

What makes firefox inside a container launch a new firefox window outside on the host with the UID of the host user? Isn't it weird for an LXC?

firefox lxc userns

Can someone please explain this weird behaviour to me:

I have an unprivileged LXC container with firefox inside.

**If firefox is running on the host outside of the container**, `/usr/bin/firefox` inside the container launches a new firefox window **outside** on the host with the UID of the host user.

**If firefox is NOT running outside of the container**, `/usr/bin/firefox` inside the container launches firefox with the (sub)UID of the container user, as it should.

The reverse is also true: if firefox is running inside the container (but not on the host) and firefox is then started on the host, the newly started firefox has the UID of the container user.

How can that be?

EDIT: Confirmed that the same issue emerges when using a default unprivileged Ubuntu container with default configuration file.

EDIT: asked the same question in the arch forums https://bbs.archlinux.org/viewtopic.php?pid=1622174#p1622174 

config file:

    lxc.devttydir = lxc 
    lxc.pts = 1024
    lxc.tty = 4 
    lxc.cap.drop = mac_admin mac_override sys_time sys_module
    lxc.pivotdir = lxc_putold
    lxc.hook.clone = /usr/share/lxc/hooks/clonehostname
    lxc.cgroup.devices.deny = a 
    lxc.cgroup.devices.allow = c *:* m
    lxc.cgroup.devices.allow = b *:* m
    lxc.cgroup.devices.allow = c 1:3 rwm 
    lxc.cgroup.devices.allow = c 1:5 rwm 
    lxc.cgroup.devices.allow = c 1:7 rwm 
    lxc.cgroup.devices.allow = c 5:0 rwm 
    lxc.cgroup.devices.allow = c 5:1 rwm 
    lxc.cgroup.devices.allow = c 5:2 rwm 
    lxc.cgroup.devices.allow = c 1:8 rwm 
    lxc.cgroup.devices.allow = c 1:9 rwm 
    lxc.cgroup.devices.allow = c 136:* rwm 
    lxc.cgroup.devices.allow = c 10:229 rwm 
    lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed
    lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections   none bind,optional 0 0 
    lxc.seccomp = /usr/share/lxc/config/common.seccomp
    lxc.hook.mount = /usr/share/lxcfs/lxc.mount.hook
    lxc.hook.post-stop = /usr/share/lxcfs/lxc.reboot.hook
    lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none bind,optional 0 0 
    lxc.mount.entry = /sys/kernel/security sys/kernel/security none bind,optional 0 0 
    lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none bind,optional 0 0 
    lxc.mount.entry = mqueue dev/mqueue mqueue rw,relatime,create=dir,optional 0 0 
    lxc.cgroup.devices.allow = c 254:0 rm
    lxc.cgroup.devices.allow = c 10:200 rwm 
    lxc.cgroup.devices.allow = c 10:228 rwm 
    lxc.cgroup.devices.allow = c 10:232 rwm 
    lxc.cgroup.devices.deny =
    lxc.cgroup.devices.allow =
    lxc.devttydir =
    lxc.mount.entry = /dev/console dev/console none bind,create=file 0 0 
    lxc.mount.entry = /dev/full dev/full none bind,create=file 0 0 
    lxc.mount.entry = /dev/null dev/null none bind,create=file 0 0 
    lxc.mount.entry = /dev/random dev/random none bind,create=file 0 0 
    lxc.mount.entry = /dev/tty dev/tty none bind,create=file 0 0 
    lxc.mount.entry = /dev/urandom dev/urandom none bind,create=file 0 0 
    lxc.mount.entry = /dev/zero dev/zero none bind,create=file 0 0 
    lxc.mount.entry = /sys/firmware/efi/efivars sys/firmware/efi/efivars  none bind,optional 0 0 
    lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none bind,optional 0 0 
    lxc.arch = x86_64
    lxc.cgroup.devices.allow = c 226:* rwm 
    lxc.mount.entry = tmpfs tmp tmpfs defaults
    lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
    lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none ro,bind,create=dir 0 0


The container is started like this:

lxc-start -n c1 -F -f /path/to/above/conf -s 'lxc.id_map = u 0 100000 65536' -s 'lxc.id_map = g 0 100000 65536' -s 'lxc.rootfs = /path/to/rootfs' -s 'lxc.init_cmd = /usr/bin/bash'


EDIT: Distribution Arch Linux


    $ uname -r
    4.6.0-rc4-customGIT+

    # lxc-checkconfig
    --- Namespaces ---
    Namespaces: enabled
    Utsname namespace: enabled
    Ipc namespace: enabled
    Pid namespace: enabled
    User namespace: enabled
    Network namespace: enabled
    Multiple /dev/pts instances: enabled

    --- Control groups ---
    Cgroup: enabled
    Cgroup clone_children flag: enabled
    Cgroup device: enabled
    Cgroup sched: enabled
    Cgroup cpu account: enabled
    Cgroup memory controller: enabled
    Cgroup cpuset: enabled

    --- Misc ---
    Veth pair device: enabled
    Macvlan: enabled
    Vlan: enabled
    Bridges: enabled
    Advanced netfilter: enabled
    CONFIG_NF_NAT_IPV4: enabled
    CONFIG_NF_NAT_IPV6: enabled
    CONFIG_IP_NF_TARGET_MASQUERADE: enabled
    CONFIG_IP6_NF_TARGET_MASQUERADE: enabled
    CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled
    FUSE (for use with lxcfs): enabled

    --- Checkpoint/Restore ---
    checkpoint restore: enabled
    CONFIG_FHANDLE: enabled
    CONFIG_EVENTFD: enabled
    CONFIG_EPOLL: enabled
    CONFIG_UNIX_DIAG: enabled
    CONFIG_INET_DIAG: enabled
    CONFIG_PACKET_DIAG: enabled
    CONFIG_NETLINK_DIAG: enabled
    File capabilities: enabled
                                

MCH (509 rep)

Apr 22, 2016, 01:35 AM • Last activity: Aug 16, 2016, 09:28 AM

2 votes

0 answers

653 views

I have trouble analysing the cause of this (firefox) segfault

firefox lxc d-bus segmentation-fault userns

When executed within a minimal (unprivileged) LXC container, **firefox segfaults** (other graphical applications work fine).

**I am unable to find the exact cause of this segfault** (which is most likely due to insufficient permissions or missing resources).

   
    # strace /usr/bin/firefox
    ...
    open("/usr/lib/libfreebl3.so", O_RDONLY|O_CLOEXEC) = 26
    read(26,"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0007\0\0\0\0\0\0"..., 832) = 832
    fstat(26, {st_mode=S_IFREG|0755, st_size=544424, ...}) = 0
    mmap(NULL, 2619144, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 26, 0) = 0x7f269bf1e000
    mprotect(0x7f269bf97000, 2097152, PROT_NONE) = 0
    mmap(0x7f269c197000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 26, 0x79000) = 0x7f269c197000
    mmap(0x7f269c19a000, 14088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f269c19a000
    close(26)                               = 0
    mprotect(0x7f269c197000, 8192, PROT_READ) = 0
    --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x20} ---
    unlink("/home/root/.mozilla/firefox/xqa348dr.default/lock") = 0
    close(6)                                = 0
    rt_sigaction(SIGSEGV, {SIG_DFL, [], SA_RESTORER, 0x7f26bafb5e80}, NULL, 8) = 0
    rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0 
    tgkill(228, 228, SIGSEGV)               = 0
    --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=228, si_uid=0} ---
    +++ killed by SIGSEGV (core dumped) +++
    Segmentation fault (core dumped)


**Background:** Firefox is executed in a minimal unprivileged LXC container (no init, not a whole distribution, just firefox and its dependencies), so I assume that this issue is related to insufficient permissions or to resources that Firefox needs but that do not exist in the container. Inside this container, trivial graphical programs like 'xclock' and even hardware-accelerated programs like 'glxgears' work. The issue could be related to dbus: I do not know whether it is set up correctly, since all I did was cp /etc/machine-id /container/etc/.



**UPDATE:** I was able to solve the problem. The container was missing a dependency of firefox (at this point I cannot say which one, because I took the half-assed way of mounting all package contents into the container's rootfs).

**UPDATE2:** I am still interested in how to find out the exact cause of the above segfault.
                                

MCH (509 rep)

Apr 21, 2016, 11:27 PM • Last activity: Apr 22, 2016, 12:27 PM

7 votes

1 answers

2975 views

Subordinate GIDs/UIDs with LXC and userns for unprivileged user?

lxc shadow userns


When using userns (via LXC in my case), you assign a range of subordinate GIDs and UIDs to an unprivileged user. For resources, see: subuid(5), subgid(5), newuidmap(1), newgidmap(1), user_namespaces(7).

That range can then be used and will, via userns, be mapped to the system account.

Let's assume we have a (host) system account john with a UID (and GID) of 1000. The assigned range of GIDs and UIDs is 100000..165535 (65536 IDs starting at 100000).

So an entry exists in /etc/subgid and /etc/subuid respectively:

    john:100000:65536

Files that inside the unprivileged container are owned by the "inside" john will now be owned by 101000 on the host and those owned by the "inside" root will be owned by 100000.
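
The arithmetic behind this mapping can be sketched in a few lines of Python. This is only an illustration, assuming a single contiguous range as in the `john:100000:65536` entry; the function name `host_id` and its parameters are hypothetical and not part of LXC or the shadow tools.

```python
def host_id(inside_id, host_start=100000, count=65536, inside_start=0):
    """Translate an ID seen inside the user namespace to the host ID.

    Assumes one contiguous mapping (inside_start -> host_start, `count` IDs),
    mirroring the subuid/subgid entry john:100000:65536.
    """
    offset = inside_id - inside_start
    if not 0 <= offset < count:
        raise ValueError(f"ID {inside_id} is outside the mapped range")
    return host_start + offset


print(host_id(0))      # inside root  -> 100000
print(host_id(1000))   # inside john  -> 101000
print(host_id(65535))  # last mapped ID -> 165535
```

Because the entry grants 65536 IDs starting at 100000, the highest host ID in the range is 165535; anything beyond that is unmapped and raises an error in this sketch.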

Normally these ranges are not assigned to any name on the host.

### Questions:

1. Is it alright to create a user for those respective UIDs/GIDs on the host in order to have more meaningful output from ls and friends?
2. Is there a way to make those files/folders accessible to the host user who "owns" the userns, i.e. john in our case? And if so, is the only sensible method to create a group shared between those valid users inside the subordinate range and the userns "owner", and set the permissions accordingly? Well, or ACLs, obviously.

0xC0000022L (16938 rep)

Dec 21, 2014, 11:30 PM • Last activity: Mar 24, 2016, 03:37 PM

2 votes

0 answers

476 views

How to map a UID to another UID (!= 0) inside a user namespace?

lxc userns


How can one map UID 1000 to another regular UID (not root/0) inside a user namespace?

EDIT: For anyone interested,  lists an option lxc.init_uid which seems to be related to this but is not recognized by my version of LXC. If anyone manages to get this to work please let me know.

MCH (509 rep)

Feb 18, 2016, 05:31 PM • Last activity: Feb 24, 2016, 04:59 PM

3 votes

1 answers

315 views

Why can't a UID 0 process hardlink to SUID files in a user namespace?

linux hard-link capabilities userns


Consider the following transcript of a user-namespaced shell running with root privileges (UID 0 within the namespace, unprivileged outside):

    # cat /proc/$$/status | grep CapEff
    CapEff:	0000003cfdfeffff
    # ls -al
    total 8
    drwxrwxrwx  2 root   root   4096 Sep 16 22:09 .
    drwxr-xr-x 21 root   root   4096 Sep 16 22:08 ..
    -rwSr--r--  1 nobody nobody    0 Sep 16 22:09 file
    # ln file link
    ln: failed to create hard link 'link' => 'file': Operation not permitted
    # su nobody -s /bin/bash -c "ln file link"
    # ls -al
    total 8
    drwxrwxrwx  2 root   root   4096 Sep 16 22:11 .
    drwxr-xr-x 21 root   root   4096 Sep 16 22:08 ..
    -rwSr--r--  2 nobody nobody    0 Sep 16 22:09 file
    -rwSr--r--  2 nobody nobody    0 Sep 16 22:09 link

Apparently the process has the CAP_FOWNER capability (0x8) and thus should be able to hardlink to arbitrary files. However, it fails to link the SUID test file owned by nobody. Nothing prevents the process from switching to nobody and then linking the file, so the parent namespace does not seem to be the issue.
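
To double-check the capability claim, the `CapEff` mask can be decoded by testing individual bits. A minimal Python sketch, assuming the capability numbering from `linux/capability.h` (CAP_FOWNER is number 3, hence mask 0x8); the helper name `has_cap` is made up for illustration:

```python
CAP_FOWNER = 3  # capability number from linux/capability.h; 1 << 3 == 0x8


def has_cap(mask_hex, cap_number):
    """Return True if bit `cap_number` is set in a hex capability mask."""
    return bool((int(mask_hex, 16) >> cap_number) & 1)


cap_eff = "0000003cfdfeffff"  # CapEff value from the transcript above
print(has_cap(cap_eff, CAP_FOWNER))  # True: the 0x8 bit is set
```

This confirms only that the bit is set in the process's effective set; why the link still fails is the open question.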

**Why can't the namespaced UID 0 process hardlink link to file without switching its UID?**
                                

dst (141 rep)

Sep 16, 2015, 08:17 PM • Last activity: Nov 15, 2015, 01:12 AM

8 votes

1 answers

7909 views

userns container fails to start, how to track down the reason?

ubuntu lxc userns


When creating a userns (unprivileged) LXC container on Ubuntu 14.04 with the following command line:

    lxc-create -n test1 -t download -- -d $(lsb_release -si|tr 'A-Z' 'a-z') -r $(lsb_release -sc) -a $(dpkg --print-architecture)

and (without touching the created configuration file) then attempting to start it with:

    lxc-start -n test1 -l DEBUG

it fails. The log file shows me:

    lxc-start 1420149317.700 INFO     lxc_start_ui - using rcfile /home/user/.local/share/lxc/test1/config
    lxc-start 1420149317.700 INFO     lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.701 INFO     lxc_confile - read uid map: type u nsid 0 hostid 100000 range 65536
    lxc-start 1420149317.701 INFO     lxc_confile - read uid map: type g nsid 0 hostid 100000 range 65536
    lxc-start 1420149317.701 WARN     lxc_log - lxc_log_init called with log already initialized
    lxc-start 1420149317.701 INFO     lxc_lsm - LSM security driver AppArmor
    lxc-start 1420149317.701 INFO     lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.702 DEBUG    lxc_conf - allocated pty '/dev/pts/2' (5/6)
    lxc-start 1420149317.702 DEBUG    lxc_conf - allocated pty '/dev/pts/7' (7/8)
    lxc-start 1420149317.702 DEBUG    lxc_conf - allocated pty '/dev/pts/8' (9/10)
    lxc-start 1420149317.702 DEBUG    lxc_conf - allocated pty '/dev/pts/10' (11/12)
    lxc-start 1420149317.702 INFO     lxc_conf - tty's configured
    lxc-start 1420149317.702 DEBUG    lxc_start - sigchild handler set
    lxc-start 1420149317.702 DEBUG    lxc_console - opening /dev/tty for console peer
    lxc-start 1420149317.702 DEBUG    lxc_console - using '/dev/tty' as console
    lxc-start 1420149317.702 DEBUG    lxc_console - 14946 got SIGWINCH fd 17
    lxc-start 1420149317.702 DEBUG    lxc_console - set winsz dstfd:14 cols:118 rows:61
    lxc-start 1420149317.905 INFO     lxc_start - 'test1' is initialized
    lxc-start 1420149317.906 DEBUG    lxc_start - Not dropping cap_sys_boot or watching utmp
    lxc-start 1420149317.906 INFO     lxc_start - Cloning a new user namespace
    lxc-start 1420149317.906 INFO     lxc_cgroup - cgroup driver cgmanager initing for test1
    lxc-start 1420149317.907 ERROR    lxc_cgmanager - call to cgmanager_create_sync failed: invalid request
    lxc-start 1420149317.907 ERROR    lxc_cgmanager - Failed to create hugetlb:test1
    lxc-start 1420149317.907 ERROR    lxc_cgmanager - Error creating cgroup hugetlb:test1
    lxc-start 1420149317.907 INFO     lxc_cgmanager - cgroup removal attempt: hugetlb:test1 did not exist
    lxc-start 1420149317.908 INFO     lxc_cgmanager - cgroup removal attempt: perf_event:test1 did not exist
    lxc-start 1420149317.908 INFO     lxc_cgmanager - cgroup removal attempt: blkio:test1 did not exist
    lxc-start 1420149317.908 INFO     lxc_cgmanager - cgroup removal attempt: freezer:test1 did not exist
    lxc-start 1420149317.909 INFO     lxc_cgmanager - cgroup removal attempt: devices:test1 did not exist
    lxc-start 1420149317.909 INFO     lxc_cgmanager - cgroup removal attempt: memory:test1 did not exist
    lxc-start 1420149317.909 INFO     lxc_cgmanager - cgroup removal attempt: cpuacct:test1 did not exist
    lxc-start 1420149317.909 INFO     lxc_cgmanager - cgroup removal attempt: cpu:test1 did not exist
    lxc-start 1420149317.910 INFO     lxc_cgmanager - cgroup removal attempt: cpuset:test1 did not exist
    lxc-start 1420149317.910 INFO     lxc_cgmanager - cgroup removal attempt: name=systemd:test1 did not exist
    lxc-start 1420149317.910 ERROR    lxc_start - failed creating cgroups
    lxc-start 1420149317.910 INFO     lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.910 ERROR    lxc_start - failed to spawn 'test1'
    lxc-start 1420149317.910 INFO     lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.910 INFO     lxc_utils - XDG_RUNTIME_DIR isn't set in the environment.
    lxc-start 1420149317.910 ERROR    lxc_start_ui - The container failed to start.
    lxc-start 1420149317.910 ERROR    lxc_start_ui - Additional information can be obtained by setting the --logfile and --logpriority options.

Now I see two errors here, the latter probably being a result of the former, which is:

> lxc_start - failed creating cgroups
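
To see the errors in the order they occurred, the ERROR lines can be filtered out of the log. A rough Python sketch, assuming every line has the shape `lxc-start <timestamp> <LEVEL> <module> - <message>` as in the excerpt; the function name `errors_in` is hypothetical and the sample lines are copied from the log above:

```python
import re

# Assumed line shape, taken from the excerpt above:
#   lxc-start <timestamp> <LEVEL> <module> - <message>
LINE_RE = re.compile(r"^\s*lxc-start\s+(\S+)\s+(\w+)\s+(\S+) - (.*)$")


def errors_in(log_text):
    """Return (module, message) pairs for ERROR lines, in log order."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) == "ERROR":
            found.append((match.group(3), match.group(4)))
    return found


sample = """\
lxc-start 1420149317.906 INFO     lxc_cgroup - cgroup driver cgmanager initing for test1
lxc-start 1420149317.907 ERROR    lxc_cgmanager - call to cgmanager_create_sync failed: invalid request
lxc-start 1420149317.910 ERROR    lxc_start - failed creating cgroups
"""
for module, message in errors_in(sample):
    print(module, "-", message)
```

Run against the full log, the first ERROR is the `cgmanager_create_sync` call made while creating `hugetlb:test1`, which precedes `failed creating cgroups`.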

However, I see /sys/fs/cgroup mounted:

    $ mount|grep cgr
    none on /sys/fs/cgroup type tmpfs (rw)

and cgmanager is installed:

    $ dpkg -l|awk '$1 ~ /^ii$/ && /cgmanager/ {print $2 " " $3 " " $4}'
    cgmanager 0.24-0ubuntu7 amd64
    libcgmanager0:amd64 0.24-0ubuntu7 amd64

Note: My host still defaults to upstart.

In case there's any doubt, the kernel supports cgroups:

    $ grep CGROUP /boot/config-$(uname -r)
    CONFIG_CGROUPS=y
    # CONFIG_CGROUP_DEBUG is not set
    CONFIG_CGROUP_FREEZER=y
    CONFIG_CGROUP_DEVICE=y
    CONFIG_CGROUP_CPUACCT=y
    CONFIG_CGROUP_HUGETLB=y
    CONFIG_CGROUP_PERF=y
    CONFIG_CGROUP_SCHED=y
    CONFIG_BLK_CGROUP=y
    # CONFIG_DEBUG_BLK_CGROUP is not set
    CONFIG_NET_CLS_CGROUP=m
    CONFIG_NETPRIO_CGROUP=m

                                

0xC0000022L (16938 rep)

Jan 1, 2015, 10:11 PM • Last activity: Jun 15, 2015, 08:58 AM
