Commits · ef659aaf61b57aeda16cf1244a0dbf4cd5cc7740 · Chen Yisong / lxc

19 Jan, 2018 1 commit

authored Jan 02, 2018

- mapped_hostid_entry()
- idmap_add()

Closes #2033.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

ef659aaf

02 Jan, 2018 11 commits

mainloop: use epoll_create1(EPOLL_CLOEXEC) · 20c4a521
Christian Brauner authored Dec 26, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
20c4a521
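
As an illustration of the call named in the title above, here is a minimal sketch that creates an epoll instance with the close-on-exec flag set atomically and then checks the flag. The helper names `open_epoll_cloexec_sketch` and `has_cloexec` are hypothetical and are not taken from the LXC sources.

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper, not LXC code: create an epoll fd that is closed
 * automatically on exec(), without a separate fcntl(F_SETFD) step. */
static int open_epoll_cloexec_sketch(void)
{
	return epoll_create1(EPOLL_CLOEXEC);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is not, -1 on error. */
static int has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Setting the flag at creation time avoids a window in which another thread could fork() and exec() while the descriptor is still inheritable.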

console: do not allow non-pty devices on open() · 3a6b6e1d

authored Dec 26, 2017

We don't allow non-pty devices anyway so don't let open() create unneeded
files.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

3a6b6e1d

start: properly cleanup mainloop · afa93cd3
Christian Brauner authored Dec 26, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
afa93cd3

lxc_config: Add -h and --help flags handler · f811f7fd

authored Dec 30, 2017

As the other tools already do, show the usage message when -h or --help
is used.
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>

f811f7fd

mainloop: capture output of short-lived init procs · 452badd5

authored Dec 25, 2017

The handler for the signal fd will detect when the init process of a container
has exited and cause the mainloop to close. However, this can happen before the
console handlers - or any other events for that matter - are handled. So in the
case of init exiting we still need to allow for all buffered input to the
console to be handled before exiting. This allows us to capture output from
short-lived init processes.

This is conceptually equivalent to my implementation of ExecReaderToChannel()
https://github.com/lxc/lxd/blob/master/shared/util_linux.go#L527

Closes #1694.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

452badd5
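
The idea of handling all buffered console input before exiting can be sketched as follows. This is only an assumption-laden illustration using poll() with a zero timeout, not the mainloop's actual code; the function name `drain_fd_sketch` is hypothetical.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not LXC code: after the child has exited, read
 * whatever is still buffered on fd without blocking. Returns the number
 * of bytes copied into out. */
static size_t drain_fd_sketch(int fd, char *out, size_t size)
{
	size_t total = 0;

	while (total < size) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		ssize_t n;

		/* Timeout 0: only pick up data that is already there. */
		if (poll(&pfd, 1, 0) <= 0 || !(pfd.revents & POLLIN))
			break;

		n = read(fd, out + total, size - total);
		if (n <= 0)
			break;
		total += (size_t)n;
	}

	return total;
}
```

A caller would run this once after learning that init has exited, so output written just before the exit is not lost.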

mainloop: add mainloop macros · c56c715c

authored Dec 25, 2017

This makes it clearer which value a handler returns and why.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

c56c715c

start: handle setting death signal smarter · e8560f46
Christian Brauner authored Dec 22, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
e8560f46

start: fix death signal · 86952125

authored Dec 22, 2017

On set{g,u}id() the kernel does:

	/* dumpability changes */
	if (!uid_eq(old->euid, new->euid) ||
	    !gid_eq(old->egid, new->egid) ||
	    !uid_eq(old->fsuid, new->fsuid) ||
	    !gid_eq(old->fsgid, new->fsgid) ||
	    !cred_cap_issubset(old, new)) {
		if (task->mm)
			set_dumpable(task->mm, suid_dumpable);
		task->pdeath_signal = 0;
		smp_wmb();
	}

which means we need to re-enable the death signal after the set{g,u}id().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

86952125
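
The consequence described above, that the parent-death signal has to be set again once the ids have changed, can be sketched like this. The helper name `switch_ids_keep_pdeathsig` and its parameters are hypothetical; which signal LXC actually uses is not stated in the commit message, so the sketch takes it as an argument.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not LXC code: change gid and uid, then re-arm the
 * parent-death signal, because the kernel resets pdeath_signal to 0 when
 * the credentials change. Returns 0 on success, -1 on error. */
static int switch_ids_keep_pdeathsig(uid_t uid, gid_t gid, int sig)
{
	if (setgid(gid) < 0)
		return -1;

	if (setuid(uid) < 0)
		return -1;

	/* Must come after set{g,u}id(), otherwise it may be cleared again. */
	if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0, 0, 0) < 0)
		return -1;

	return 0;
}
```

The current setting can be read back with prctl(PR_GET_PDEATHSIG, &sig) to confirm that it survived.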

start: simplify cgroup namespace preservation · a2ffd257

authored Dec 22, 2017

Since we are now dumpable we can open /proc/<child-pid>/ns/cgroup so let's
avoid the overhead of sending around fds.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

a2ffd257

start: make us dumpable · 25753d59

authored Dec 22, 2017

When we call set{u,g}id() the kernel will make us undumpable. This is unnecessary
since we can guarantee that whatever is running inside the child process at
this point is fully trusted by the parent. Making us dumpable lets users
use debuggers on the child process before the exec as well and also allows us
to open /proc/<child-pid> files in lieu of the child.
Note that we only need to perform the prctl(PR_SET_DUMPABLE, ...) if our
effective uid on the host is not 0. If our effective uid on the host is 0 then
we will keep all capabilities in the child user namespace across set{g,u}id().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

25753d59
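
A minimal sketch of the prctl() call mentioned above. The helper name `restore_dumpable_sketch` is hypothetical. The commit message says the call is only needed when the effective uid on the host is not 0, and the sketch leaves that decision to the caller.

```c
#include <sys/prctl.h>

/* Hypothetical helper, not LXC code: mark the calling process dumpable
 * again after set{u,g}id(). Returns the resulting dumpable flag
 * (1 is expected) or -1 on error. */
static int restore_dumpable_sketch(void)
{
	if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) < 0)
		return -1;

	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```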

start: log closing cmd socket and STOPPED state · d9ef6641
Christian Brauner authored Dec 16, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
d9ef6641

01 Jan, 2018 11 commits

start: use lxc_raw_clone_cb() where possible · 66fe662e

authored Dec 15, 2017

This way we can rely on the kernel's copy-on-write support similar to fork().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

66fe662e

namespace: add lxc_raw_clone_cb() · 14c678f1

authored Dec 15, 2017

This is a copy-on-write (no stack passed) variant of lxc_clone().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

14c678f1
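
One way such a callback variant without a separate stack could look is sketched below. This is an assumption based on the description above, not the LXC source: the names `raw_clone_cb_sketch` and `exit_with_value` are hypothetical, SIGCHLD is added so the parent can waitpid(), and the argument order of the raw syscall is the one used on x86_64 (some architectures differ).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch, not LXC code: clone without passing a stack, so the
 * child runs on a copy-on-write copy of the parent's stack, then run fn in
 * the child and exit with its return value. Returns the child pid in the
 * parent, -1 on error. Flags such as CLONE_VM must not be passed. */
static pid_t raw_clone_cb_sketch(int (*fn)(void *), void *arg,
				 unsigned long flags)
{
	pid_t pid = (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL);

	if (pid < 0)
		return -1;

	if (pid == 0)
		_exit(fn(arg));

	return pid;
}

/* Example callback: exit with the integer pointed to by arg. */
static int exit_with_value(void *arg)
{
	return *(int *)arg;
}
```

Because the child sees a copy of the parent's memory, `arg` can point to a local variable of the caller, much like after fork().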

namespace: comment lxc_{raw_}clone() · 3d3691a3
Christian Brauner authored Dec 15, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
3d3691a3

tree-wide: s/getpid()/lxc_raw_getpid()/g · 0659cfa4

authored Dec 16, 2017

This is to avoid bad surprises caused by older glibc's pid cache (up to 2.25)
when using clone().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

0659cfa4

namespace: add lxc_raw_getpid() · d74dfbb0

authored Dec 16, 2017

Because of older glibc's pid cache (up to 2.25) whenever clone() is called the
child must retrieve its own pid via lxc_raw_getpid().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

d74dfbb0
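
A minimal sketch of a getpid() that bypasses any pid cached by the C library by issuing the syscall directly. The name `raw_getpid_sketch` is hypothetical; the actual lxc_raw_getpid() may differ in detail.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch, not LXC code: ask the kernel for our pid directly
 * instead of going through the glibc getpid() wrapper. */
static inline pid_t raw_getpid_sketch(void)
{
	return (pid_t)syscall(SYS_getpid);
}
```

In a process that was not created through a raw clone() both calls agree; the difference only matters in a child where an older glibc may still return the parent's cached pid.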

tests: expand lxc_raw_clone() tests · 7831a82c

authored Dec 15, 2017

- test CLONE_VFORK
- test CLONE_FILES
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

7831a82c

attach: handle /proc with hidepid={1,2} property · 94aff6a4

authored Dec 21, 2017

Receive the fd for the LSM security module before we set{g,u}id(). The reason is that
on set{g,u}id() a) the kernel will make us undumpable and b) we will change our
effective uid. This means our effective uid will be different from the
effective uid of the process that created us, which means that this process no
longer has capabilities in our namespace, including CAP_SYS_PTRACE. This means
we will not be able to read any /proc/<pid> files for the process anymore when
/proc is mounted with hidepid={1,2}. So let's get the lsm label fd before the
set{g,u}id().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

94aff6a4

attach: use lxc_raw_clone() · 00139de8

authored Dec 20, 2017

This lets us simplify the whole file a lot and makes things way clearer. It
also lets us avoid the infamous pid cache.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

00139de8

attach: simplify significantly · ad1ab969
Christian Brauner authored Dec 18, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
ad1ab969

cgfsng: Add new macro to print errors · a2f65700

authored Dec 19, 2017

At this point, macros such as DEBUG or ERROR do not take effect because
this code is called from cgroup_ops_init() (cgroup.c), which runs with
__attribute__((constructor)), before any log level is set from any tool
like lxc-start, so these messages are lost.

From now on, use the same LXC_DEBUG_CGFSNG environment variable to
control these messages.
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>

a2f65700
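
A macro of the kind described above, printing to stderr only when LXC_DEBUG_CGFSNG is set, could be sketched as follows. The names `cgfsng_debug_enabled` and `CGFSNG_DEBUG_SKETCH` are hypothetical, and the output format is an assumption; the real macro added by this commit may look different.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: 1 if the environment variable is set, 0 otherwise. */
static int cgfsng_debug_enabled(void)
{
	return getenv("LXC_DEBUG_CGFSNG") != NULL;
}

/* Hypothetical macro, not LXC code: it does not depend on the log level,
 * so it also works in constructors that run before logging is set up.
 * format must be a string literal; ##__VA_ARGS__ is a GNU extension. */
#define CGFSNG_DEBUG_SKETCH(format, ...)                                  \
	do {                                                              \
		if (cgfsng_debug_enabled())                               \
			fprintf(stderr, "cgfsng: %s: %d: " format,        \
				__func__, __LINE__, ##__VA_ARGS__);       \
	} while (0)
```

With this sketch, starting a tool with LXC_DEBUG_CGFSNG=1 in its environment would make the messages visible on stderr.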

[monitor] wrong statement of break · ddbb1dbc

authored Dec 18, 2017

If lxc_abstract_unix_connect() fails and returns -1, this code never goes to retry.
Signed-off-by: liuhao <liuhao27@huawei.com>

ddbb1dbc

18 Dec, 2017 1 commit

commands_utils: add missing mutex · 457df41b
Christian Brauner authored Dec 18, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
457df41b

17 Dec, 2017 16 commits

tests: s/lxc.init.cmd/lxc.init_cmd/g · 36cffe6e

authored Dec 17, 2017

lxc.init.cmd is the new key that stable-2.0 doesn't know about.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

36cffe6e

lxc_init: fix cgroup parsing · 3fe57496

authored Dec 14, 2017

coverity: #1426132
coverity: #1426133
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

3fe57496

utils: use lxc_raw_clone() in run_command() · 80d90c34
Christian Brauner authored Dec 14, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
80d90c34

namespace: add lxc_raw_clone() · a7ef3151

authored Dec 14, 2017

This is based on raw_clone in systemd but adapted to our needs. The main reason
is that we need an implementation of fork()/clone() that guarantees that
no pthread_atfork() handlers are run. As long as clone() in glibc doesn't
run pthread_atfork() handlers we are fine, but there's no guarantee that
this will remain the case in the future. So let's do the syscall directly - or as
directly as we can. An additional nice feature is that we get fork() behavior,
i.e. lxc_raw_clone() returns 0 in the child and the child pid in the parent.

Our implementation tries to make sure that we cover all cases according to
kernel sources. Note that we are not interested in any arguments that could be
passed after the stack.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

a7ef3151
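
The approach described above, calling the clone syscall directly so that no C library fork handlers run, might be sketched as follows. This is not the LXC implementation: the name `raw_clone_sketch` is hypothetical, the set of rejected flags follows systemd's raw_clone as far as I recall it, and the argument order (flags, then stack) is the one used on x86_64, while some architectures order or pass the arguments differently.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch, not LXC code: fork()-like clone via the raw
 * syscall. Returns 0 in the child, the child pid in the parent and -1 on
 * error. No stack is passed, so the child continues on a copy-on-write
 * copy of the parent's stack. */
static pid_t raw_clone_sketch(unsigned long flags)
{
	/* These flags need a separate stack or the arguments that come
	 * after the stack, neither of which this sketch supports. */
	if (flags & (CLONE_VM | CLONE_PARENT_SETTID | CLONE_CHILD_SETTID |
		     CLONE_CHILD_CLEARTID | CLONE_SETTLS)) {
		errno = EINVAL;
		return -1;
	}

	/* SIGCHLD is added so the parent can waitpid() for the child. */
	return (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL);
}
```

A caller uses it like fork(): branch on a return value of 0 for the child, and waitpid() on the returned pid in the parent.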

commands: fix race when open()/close() cmd socket · 59beaa6f

authored Dec 14, 2017

When we report STOPPED to a caller and then close the command socket it is
technically possible - and I've seen this happen on the test builders - that a
container start() right after a wait() will receive ECONNREFUSED because it
called open() before we close(). So for all new state clients simply close the
command socket. This will inform all state clients that the container is
STOPPED and also prevents a race between an open()/close() on the command socket
causing a new process to get ECONNREFUSED because we haven't yet closed the
command socket.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

59beaa6f

SHARE_NS options should be before OPT_USAGE · 76365631
Tycho Andersen authored Dec 14, 2017
```
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
```
76365631

init: don't kill(-1) if we aren't in a pid ns · 76c31763

authored Dec 08, 2017

...otherwise we'll kill everyone on the machine. Instead, let's explicitly
try to kill our children. Let's do a best effort against fork bombs by
disabling forking via the pids cgroup if it exists. This is best effort for
a number of reasons:

* the pids cgroup may not be available
* the container may have bind mounted /dev/null over pids.max, so the write
  doesn't do anything
Signed-off-by: Tycho Andersen <tycho@tycho.ws>

76c31763

start: fix cgroup namespace preservation · 662a9832

authored Dec 13, 2017

Prior to this patch we raced with a very short-lived init process. Essentially,
the init process could exit before we had time to record the cgroup namespace
causing the container to abort and report ABORTING to the caller when it
actually started just fine. Let's not do this.

(This uses syscall(SYS_getpid) in the child to retrieve the pid just in case
we're on an older glibc version and we end up in the namespace sharing branch
of the actual lxc_clone() call.)

Additionally this fixes the shortlived tests. They were faulty so far and
should have actually failed because of the cgroup namespace recording race but
the ret variable used to return from the function was not correctly
initialized. This fixes it.
Furthermore, the shortlived tests used the c->error_num variable to determine
success or failure but this is actually not correct when the container is
started daemonized.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

662a9832

tools: exit success when lxc-execute is daemonized · 8ba2c9bd

authored Dec 12, 2017

The error_num value doesn't tell us anything since the container hasn't exited.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

8ba2c9bd

start: do not unconditionally dup std{in,out,err} · e16da8e8

authored Dec 12, 2017

Starting with commit

    commit c5b93afb
    Author: Li Feng <lifeng68@huawei.com>
    Date:   Mon Jul 10 17:19:52 2017 +0800

        start: dup std{in,out,err} to pty slave

        In the case the container has a console with a valid slave pty file descriptor
        we duplicate std{in,out,err} to the slave file descriptor so console logging
        works correctly. When the container does not have a valid slave pty file
        descriptor for its console and is started daemonized we should dup to
        /dev/null.

        Closes #1646.
        Signed-off-by: Li Feng <lifeng68@huawei.com>
        Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

we made std{err,in,out} a duplicate of the slave file descriptor of the console
if it existed. This meant we also duplicated all of them when we executed
application containers in the foreground even if some std{err,in,out} file
descriptor did not refer to a {p,t}ty. This blocked use cases such as:

    echo foo | lxc-execute -n -- cat

which are very valid and common with application containers but less common
with system containers where we don't have to care about this. So my suggestion
is to unconditionally duplicate std{err,in,out} to the console file descriptor
if we are either running daemonized - this ensures that daemonized application
containers with a single bash shell keep on working - or when we are not
running an application container. In other cases we only duplicate those file
descriptors that actually refer to a {p,t}ty. This logic is similar to what we
do for lxc-attach already.

Refers to #1690.
Closes #2028.
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

e16da8e8
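
The rule described in this commit message can be sketched as below: duplicate every standard file descriptor when running daemonized or when not running an application container, and otherwise only those that refer to a terminal. The function names and parameters are hypothetical and the code is an illustration under those assumptions, not the code from start.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Policy as described in the commit message: duplicate unconditionally
 * when daemonized or when this is not an application container. */
static bool dup_all_std_fds(bool daemonized, bool app_container)
{
	return daemonized || !app_container;
}

/* Hypothetical helper, not LXC code: make each fd in fds[] a duplicate of
 * target_fd. Unless unconditional is set, fds that are not terminals are
 * left alone. Returns the number of fds redirected, or -1 on error. */
static int redirect_fds_sketch(int target_fd, const int *fds, size_t n,
			       bool unconditional)
{
	int redirected = 0;

	for (size_t i = 0; i < n; i++) {
		if (!unconditional && !isatty(fds[i]))
			continue;

		if (dup2(target_fd, fds[i]) < 0)
			return -1;

		redirected++;
	}

	return redirected;
}
```

In the scenario from the commit message, fds would hold STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO and target_fd would be the slave side of the console pty. With `echo foo | lxc-execute ...` stdin is a pipe, so in the foreground case it would be left untouched.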

coverity: #1425857 · 53ee6301

authored Dec 09, 2017

remove logically dead code
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

53ee6301

coverity: #1425858 · 0483c219

authored Dec 09, 2017

free allocated memory
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

0483c219

coverity: #1425859 · 656bb5fb

authored Dec 09, 2017

check return value of snprintf()
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

656bb5fb
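
Checking the return value of snprintf() usually means treating both an encoding error and truncation as failure. The following is a generic sketch of that idiom; the function name `format_path_sketch` and the path format are hypothetical and do not reflect the code touched by this commit.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not LXC code: build "dir/name" in buf.
 * Returns 0 on success, -1 if snprintf() failed or the result did not
 * fit into size bytes including the terminating NUL. */
static int format_path_sketch(char *buf, size_t size, const char *dir,
			      const char *name)
{
	int ret = snprintf(buf, size, "%s/%s", dir, name);

	if (ret < 0 || (size_t)ret >= size)
		return -1;

	return 0;
}
```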

coverity: #1425860 · 46fa2b43

authored Dec 09, 2017

remove logically dead code
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

46fa2b43

coverity: #1425862 · 63efabee

authored Dec 09, 2017

initialize handler
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

63efabee

coverity: #1425863 · 68c9b0a1

authored Dec 09, 2017

remove logically dead code
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

68c9b0a1