Commits · 8a545679076e2aabf205bd920b9e28d3cfb9ab6d · Chen Yisong / lxc

13 Jun, 2019 3 commits

lxc_clone: get rid of some indirection · 8a545679

authored May 09, 2019

We have a do_clone(), which just calls a void f(void *) that it gets
passed. We build up a struct consisting of two args that are just the
actual arg and actual function. Let's just have the syscall do this for us.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
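
The indirection described above can be pictured roughly as follows. This is only a sketch, not the lxc code: `struct clone_arg`, `trampoline`, `spawn_indirect`, `spawn_direct` and `run_child` are hypothetical names, and `run_child` merely stands in for clone(), which already accepts a function pointer together with its argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for clone(): like clone(), it receives the child
 * function and its argument and calls one with the other. */
static int run_child(int (*fn)(void *), void *arg)
{
	assert(fn != NULL);
	return fn(arg);
}

/* Before: a struct that bundles the real function and the real argument... */
struct clone_arg {
	int (*fn)(void *);
	void *arg;
};

/* ...and a trampoline whose only job is to unpack it again. */
static int trampoline(void *data)
{
	struct clone_arg *c = data;

	return c->fn(c->arg);
}

static int spawn_indirect(int (*fn)(void *), void *arg)
{
	struct clone_arg c = { .fn = fn, .arg = arg };

	return run_child(trampoline, &c);
}

/* After: hand the function and its argument over directly. */
static int spawn_direct(int (*fn)(void *), void *arg)
{
	return run_child(fn, arg);
}

/* Example child function used to compare both paths. */
static int child(void *arg)
{
	return *(int *)arg + 1;
}
```

Both paths end up calling `child` with the same argument; the direct one simply drops the extra struct and the extra call.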


doc: add a little note about shared ns + LSMs · 4f7e281f

authored May 09, 2019

We should add a little note about the race in the previous patch.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>


lxc_clone: pass non-stack allocated stack to clone · 68a1966d

authored May 09, 2019

There are two problems with this code:

1. The math is wrong. We allocate a char *foo[__LXC_STACK_SIZE]; which
means it's really sizeof(char *) * __LXC_STACK_SIZE, instead of just
__LXC_STACK_SIZE.

2. We can't actually allocate it on our stack. When we use CLONE_VM (which
we do in the shared ns case) that means that the new thread is just
running one page lower on the stack, but anything that allocates a page
on the stack may clobber data. This is a pretty short race window since
we just do the shared ns stuff and then do a clone without CLONE_VM.
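
The size mismatch in the first point can be checked with a few lines of C. This is a minimal sketch: `STACK_SIZE` is an illustrative value standing in for `__LXC_STACK_SIZE`, and the two helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SIZE 4096 /* illustrative; stands in for __LXC_STACK_SIZE */

/* An array of STACK_SIZE chars occupies exactly STACK_SIZE bytes. */
static_assert(sizeof(char[STACK_SIZE]) == STACK_SIZE,
	      "a char array has one byte per element");

/* Size of "char *foo[STACK_SIZE]": one pointer per element. */
static size_t pointer_array_bytes(void)
{
	return sizeof(char *[STACK_SIZE]);
}

/* Size of "char foo[STACK_SIZE]": one byte per element. */
static size_t byte_array_bytes(void)
{
	return sizeof(char[STACK_SIZE]);
}
```

On a platform with 8-byte pointers the first helper reports eight times the intended size.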

However, it does point out an interesting possible privilege escalation if
things aren't configured correctly: do_share_ns() sets up namespaces while
it shares the address space of the task that spawned it; once it enters the
pid ns of the thing it's sharing with, the thing it's sharing with can
ptrace it and write stuff into the host's address space. Since the function
that does the clone() is lxc_spawn(), it has a struct cgroup_ops* on the
stack, which itself has function pointers called later in the function, so
it's possible to allocate shellcode in the address space of the host and
run it fairly easily.

ASLR doesn't mitigate this since we know exactly the stack offsets; however
this patch has the kernel allocate a new stack, which will help. Of course,
the attacker could just check /proc/pid/maps to find the location of the
stack, but they'd still have to guess where to write stuff in.

The thing that does prevent this is the default configuration of apparmor.
Since the apparmor profile is set in the second clone, and apparmor
prevents ptracing things under a different profile, attackers confined by
apparmor can't do this. However, if users are using a custom configuration
with shared namespaces, care must be taken to avoid this race.

Shared namespaces aren't widely used now, so perhaps this isn't a problem,
but with the advent of crio-lxc for k8s, this functionality will be used
more.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
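
One way to give the child a stack that does not live on the caller's stack is sketched below, assuming mmap() as the allocator and glibc's clone() wrapper on Linux. The names `run_on_private_stack`, `child_fn` and `CHILD_STACK_SIZE` are hypothetical, and unlike the shared ns case this sketch does not pass CLONE_VM, so it only illustrates the allocation pattern.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for clone() */
#endif
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024) /* illustrative size */

/* Example child: its return value becomes the child's exit status. */
static int child_fn(void *arg)
{
	return *(int *)arg;
}

/* Clone a child onto an mmap()ed stack; return its exit status or -1. */
static int run_on_private_stack(int (*fn)(void *), void *arg)
{
	int status = -1;
	pid_t pid;
	char *stack;

	assert(fn != NULL);

	/* The mapping is separate from the caller's own stack. */
	stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	/* Stacks grow down on most architectures, so pass the top address. */
	pid = clone(fn, stack + CHILD_STACK_SIZE, SIGCHLD, arg);
	if (pid < 0) {
		munmap(stack, CHILD_STACK_SIZE);
		return -1;
	}

	if (waitpid(pid, &status, 0) != pid)
		status = -1;

	munmap(stack, CHILD_STACK_SIZE);

	if (status != -1 && WIFEXITED(status))
		return WEXITSTATUS(status);

	return -1;
}
```

The length passed to mmap() is a byte count, so the pointer-array size mistake from the first point cannot reappear here.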


21 May, 2019 1 commit

configure: remove additional comma · f9bbc96e
Christian Brauner authored May 21, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
18 May, 2019 36 commits

start: remove unused label · 1cbdf1ea
Christian Brauner authored May 18, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
lxccontainer: remove unused function · f9df3281
Christian Brauner authored May 18, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

lxccontainer: cleanup attach functions · 89f59fa2

authored May 17, 2019

Specifically, reflow function arguments and remove useless comments.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


attach: do not reload container · b748fa8f

authored May 16, 2019

Let lxc_attach() reuse the already initialized container.

Closes https://github.com/lxc/lxd/issues/5755.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


network: Fixes bug that stopped down hook from running for phys netdevs · d880b034
Thomas Parrott authored May 15, 2019
```
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
```

network: move phys netdevs back to monitor's net ns rather than pid 1's · c0c0d9ec

authored May 15, 2019

Updates lxc_restore_phys_nics_to_netns() to move phys netdevs back to the monitor's network namespace rather than the previously hardcoded PID 1 net ns.

This fixes cases where LXC is started inside a net ns different from PID 1's and, when the container is shut down, physical devices are moved back to a net ns other than the one the container was started from.
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>


configure: handle checks when cross-compiling · eabeaa39
Christian Brauner authored May 15, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

Error prone semicolon · 0b8deb65

authored May 13, 2019

Removed the error-prone semicolon in the SYSTRACE() macro.
Signed-off-by: Rachid Koucha <rachid.koucha@gmail.com>


Use %m instead of strerror() when available · 5d27c86a

authored May 13, 2019

Use %m under HAVE_M_FORMAT instead of strerror()
Signed-off-by: Rachid Koucha <rachid.koucha@gmail.com>


Config: check for %m availability · 7d1a06e5

authored May 13, 2019

glibc supports %m, which avoids calling strerror(). Using it saves some code space.
This check defines HAVE_M_FORMAT, to be used wherever possible (e.g. log.h).
Signed-off-by: Rachid Koucha <rachid.koucha@gmail.com>
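
A minimal sketch of how such a macro might be used is shown below. `format_errno` is a hypothetical helper, not an lxc function, and the sketch assumes that HAVE_M_FORMAT is supplied by the configure check when the C library's printf family understands %m.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format "<what>: <error text for errno>" into buf. */
static int format_errno(char *buf, size_t size, const char *what)
{
	assert(buf != NULL && size > 0);
#ifdef HAVE_M_FORMAT
	/* %m prints the text for the current errno; no argument is consumed. */
	return snprintf(buf, size, "%s: %m", what);
#else
	/* Portable fallback: look the message up explicitly. */
	return snprintf(buf, size, "%s: %s", what, strerror(errno));
#endif
}
```

Both branches are meant to produce the same text; the %m branch only saves the explicit strerror() call at each call site.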


initutils: Fix memleak on realloc failure · 22c8f39b
Rikard Falkeborn authored May 12, 2019
```
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
```

zfs: Fix return value on zfs_snapshot error · 3cd86139

authored May 12, 2019

Returning -1 in a function with return type bool is the same as
returning true. Change to return false to indicate error properly.

Detected with cppcheck.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
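
The conversion rule behind this fix can be shown in a few lines. This is a sketch with hypothetical function names, not the zfs code itself: any non-zero value, including -1, becomes true when converted to bool.

```c
#include <assert.h>
#include <stdbool.h>

/* -1 is non-zero, so converting it to bool yields true. */
static_assert((bool)-1 == true, "any non-zero value converts to true");

/* Buggy pattern: the error path reports success to the caller. */
static bool snapshot_buggy(int err)
{
	if (err)
		return -1; /* implicitly converted to true */

	return true;
}

/* Fixed pattern: the error path returns false. */
static bool snapshot_fixed(int err)
{
	if (err)
		return false;

	return true;
}
```

A caller testing `if (!snapshot_buggy(err))` would never see the failure, which is why the error path has to return false.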


lvm: Fix return value if lvm_create_clone fails · c5e6088f

authored May 12, 2019

Returning -1 in a function with return type bool is the same as
returning true. Change to return false to indicate error properly.

Detected with cppcheck.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>


criu: Remove unnecessary return after _exit() · b526996b

authored May 12, 2019

Since _exit() will terminate, the return statement is dead code. Also,
returning -1 from a function with bool as return type is confusing.

Detected with cppcheck.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>


criu: Use -v4 instead of -vvvvvv · d3accb17

authored May 10, 2019

CRIU has only 4 levels of verbosity (errors, warnings, info, debug).
Thus, using `-v4` is more appropriate.

https://criu.org/Logging
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>


Option --busybox-path instead of --bbpath · 09f55bc4
Rachid Koucha authored May 10, 2019
```
As suggested during the review.
Signed-off-by: Rachid Koucha <rachid.koucha@gmail.com>
```

lxccontainer: do not display if missing privileges · cd2ca8a1

authored May 10, 2019

lxc-ls without root privileges on privileged containers should not display
information. In lxc_container_new(), ongoing_create()'s result is not checked
for all possible returned values. Hence, an unprivileged user can send command
messages to the container's monitor. For example:

$ lxc-ls -P /.../tests -f
NAME     STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED
ctr      -     0         -      -    -    false
$ sudo lxc-ls -P /.../tests -f
NAME     STATE   AUTOSTART GROUPS IPV4      IPV6 UNPRIVILEGED
ctr      RUNNING 0         -      10.0.3.51 -    false

After this change:

$ lxc-ls -P /.../tests -f      <-------- No more display without root privileges
$ sudo lxc-ls -P /.../tests -f
NAME     STATE   AUTOSTART GROUPS IPV4      IPV6 UNPRIVILEGED
ctr      RUNNING 0         -      10.0.3.37 -    false
$
Signed-off-by: Rachid Koucha <rachid.koucha@gmail.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


New --bbpath option and unnecessary --rootfs checks · 46dde527

authored May 10, 2019

. Add the "--bbpath" option to pass an alternate busybox pathname instead of the one found from ${PATH}.
. Take this opportunity to add some formatting to the usage display.
. Since an attempt is made to pick rootfs from the config file and set it to ${path}/rootfs, it does not need to be mandatory.
Signed-off-by: Rachid Koucha <rachid.koucha@gmail.com>


coding style: update · 4e6bfc48
Christian Brauner authored May 10, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

Redirect error messages to stderr · dcf5c826

authored May 10, 2019

Some error messages were not redirected to stderr.
Moreover, do "exit 0" instead of "exit 1" when the "help" option is passed.
Signed-off-by: Rachid Koucha <rachid.koucha@gmail.com>


start: use CLONE_PIDFD · e77c83f6

authored May 09, 2019

Use CLONE_PIDFD when possible.

Note that the clone() syscall ignores unknown flags, which is usually a design
mistake. However, for us this bug is a feature since we can just pass the flag
along and see whether the kernel has given us a pidfd.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


network: Restores phys device MTU on container shutdown · 3ef7f2c0

authored May 09, 2019

The phys devices will now have their original MTUs recorded at start and restored at shutdown.

This protects the original phys device from keeping any container-level MTU customisation once it is restored to the host.
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>


namespace: support CLONE_PIDFD with lxc_clone() · 463334b7
Christian Brauner authored May 09, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
network: Adds mtu support for phys and macvlan types · ded425a6
Thomas Parrott authored May 09, 2019
```
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
```

clone: add infrastructure for CLONE_PIDFD · df5644f3

authored May 09, 2019

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eac7078a0fff1e72cf2b641721e3f55ec7e5e21e
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


raw_syscalls: simplify assembly · ceda5ac3

authored May 09, 2019

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>


utils: improve switch_to_ns() · 47576a3f
Christian Brauner authored Mar 12, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
Devices created in rootfs instead of rootfs/dev · c9ecca07
Rachid Koucha authored May 07, 2019
```
Added /dev in the mknod commands.
Signed-off-by: Rachid Koucha <rachid.koucha@gmail.com>
```

raw_syscalls: add initial support for pidfd_send_signal() · 4f464a77

authored May 06, 2019

Well, I added this syscall so we better use it. :)
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


compiler: add __returns_twice attribute · 84721447

authored May 04, 2019

The returns_twice attribute tells the compiler that a function may return more
than one time. The compiler will ensure that all registers are dead before
calling such a function and will emit a warning about the variables that may be
clobbered after the second return from the function. Examples of such functions
are setjmp and vfork. The longjmp-like counterpart of such function, if any,
might need to be marked with the noreturn attribute.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
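
A sketch of what such a macro and its use might look like follows, assuming a GCC- or Clang-compatible compiler. `checkpoint_save` is a hypothetical setjmp-like function declared only to show where the attribute goes, and `count_returns` uses setjmp() itself to show a call returning twice.

```c
#include <assert.h>
#include <setjmp.h>

#ifndef __returns_twice
#define __returns_twice __attribute__((returns_twice))
#endif

/* Hypothetical setjmp-like helper; declared but never defined or called. */
__returns_twice int checkpoint_save(void *ctx);

static jmp_buf env;

/* Count how often the single setjmp() call below returns. */
static int count_returns(void)
{
	/* volatile keeps the value well defined after longjmp(). */
	volatile int returns = 0;

	if (setjmp(env) == 0) {
		returns++;       /* first return, directly from setjmp() */
		longjmp(env, 1); /* jump back into the setjmp() call */
	}

	/* Second return from the same call, this time via longjmp(). */
	assert(returns == 1);
	return returns + 1;
}
```

The `volatile` qualifier is needed because a local modified between setjmp() and longjmp() otherwise has an indeterminate value after the second return, which is the kind of clobbering the warning is about.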


tree-wide: make socket SOCK_CLOEXEC · 45760f62
Christian Brauner authored May 03, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
namespaces: allow a pathname to a nsfd for namespace to share · 0dfb9453
Serge Hallyn authored May 01, 2019
```
Signed-off-by: Serge Hallyn <shallyn@cisco.com>
```
seccomp: notifier fixes · 7b0aa99b
Christian Brauner authored May 01, 2019
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
network: Fixes bug in macvlan mode selection · a533ec46
tomponline authored Apr 29, 2019
```
Signed-off-by: tomponline <thomas.parrott@canonical.com>
```
tests: Updates .gitignore to ignore test build artefacts · 1350fc84
tomponline authored Apr 29, 2019
```
Signed-off-by: tomponline <thomas.parrott@canonical.com>
```
network: Fixes vlan hook script · 0fef58cf
tomponline authored Apr 29, 2019
```
Signed-off-by: tomponline <thomas.parrott@canonical.com>
```