Commits · 312953a70bc45ff7f23d8a6f8d3bf18f4bf500ca · Chen Yisong / lxc

22 Nov, 2016 15 commits

cgroup: improve isolcpus handling · 312953a7

authored Nov 21, 2016

- add more logging
- only write to cpuset.cpus if we really have to
- simplify cleanup on error and success
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

namespace: always attach to user namespace first · 1d4a1733

authored Nov 20, 2016

Move the user namespace to the first position in the array so that we always
attach to it first when iterating over the struct and using setns() to switch
namespaces. This especially affects lxc_attach(): Suppose you cloned a new user
namespace and mount namespace as an unprivileged user on the host and want to
setns() to the mount namespace. This requires you to attach to the user
namespace first; otherwise the kernel will fail this check:

    if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
        !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
        !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
    	return -EPERM;

in

    linux/fs/namespace.c:mntns_install().
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
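
The ordering argument above can be sketched as follows. This is a minimal illustration and not liblxc's code: the `sketch_ns` enum, the `sketch_ns_names` table and `sketch_attach_order()` are hypothetical stand-ins for the real `ns_info[LXC_NS_MAX]` array. The only point it makes is that placing the user namespace at index 0 means any loop over the table reaches it before the mount namespace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical namespace table (an assumption, not liblxc's ns_info). */
enum sketch_ns {
	SKETCH_NS_USER = 0, /* index 0, so every iteration reaches it first */
	SKETCH_NS_MNT,
	SKETCH_NS_PID,
	SKETCH_NS_NET,
	SKETCH_NS_MAX
};

const char *const sketch_ns_names[SKETCH_NS_MAX] = {
	[SKETCH_NS_USER] = "user",
	[SKETCH_NS_MNT] = "mnt",
	[SKETCH_NS_PID] = "pid",
	[SKETCH_NS_NET] = "net",
};

/* Fill `order` with the names of the requested namespaces in the order a
 * simple loop over the table would attach to them. Bit i of `wanted` selects
 * namespace i. Returns the number of entries written. */
size_t sketch_attach_order(unsigned int wanted,
			   const char *order[SKETCH_NS_MAX])
{
	size_t n = 0;

	assert(order != NULL);
	for (int i = 0; i < SKETCH_NS_MAX; i++) {
		if (wanted & (1u << i))
			order[n++] = sketch_ns_names[i];
	}
	return n;
}
```

In a real attach path each entry would be followed by a setns() call on the matching file descriptor; that part is left out here.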

attach: use ns_info[LXC_NS_MAX] struct · b3677ba8

authored Nov 20, 2016

Using custom structs in attach.c risks getting out of sync with the commonly
used ns_info[LXC_NS_MAX] struct and thus attaching to wrong namespaces. Switch
to using ns_info[LXC_NS_MAX].
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

attach, utils: bugfixes · f23504af

authored Nov 19, 2016

- simply check /proc/self/ns
- improve SYSERROR() report
- use #define to prevent gcc & clang from using a VLA
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
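
A short sketch of the VLA point, under the assumption that the buffer in question holds a path below /proc/self/ns. The macro `SKETCH_NS_PATH_LEN` and the function `sketch_ns_supported()` are hypothetical names, not taken from the commit. In C, an array sized by a `const size_t` variable is still a variable-length array, while a macro expands to an integer constant expression and yields a plain fixed-size array.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* A macro is an integer constant expression, so `path` below is a normal
 * fixed-size array. Writing `const size_t len = 64; char path[len];` would
 * declare a VLA instead. */
#define SKETCH_NS_PATH_LEN 64

/* Returns 1 if /proc/self/ns/<ns> exists, 0 if it does not, and -1 if the
 * resulting path would not fit into the buffer. */
int sketch_ns_supported(const char *ns)
{
	char path[SKETCH_NS_PATH_LEN];
	int ret;

	assert(ns != NULL);
	ret = snprintf(path, sizeof(path), "/proc/self/ns/%s", ns);
	if (ret < 0 || (size_t)ret >= sizeof(path))
		return -1;

	return access(path, F_OK) == 0;
}
```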

start, namespace: move ns_info to namespace.{c,h} · a2f2695a

authored Oct 31, 2016

It's much more appropriate there and makes start.{c,h} cleaner and leaner.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

start, error: improve log + non-functional changes · c6677625

authored Oct 29, 2016

Improve log and comments in a bunch of places to make it easier for us on bug
reports.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

start, utils: improve preserve_ns() · 1ba79bfe

authored Oct 29, 2016

- Allocating an error message that the caller must free seems pointless. We can
  just print the error message in preserve_ns() itself. This also allows us to
  avoid using the GNU extension asprintf().
- Improve lxc_preserve_ns(): By passing in NULL or "" as the second argument
  the function can now also be used to check whether namespaces are supported
  by the kernel.
- Use lxc_preserve_ns() in preserve_ns().
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
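
The behaviour described in the list above might look roughly like this. It is a sketch under stated assumptions, not the actual lxc_preserve_ns(): the names `sketch_ns_path()` and `sketch_preserve_ns()`, the buffer size, and the exact open flags are all hypothetical. With NULL or "" as the namespace name, the function opens the /proc/<pid>/ns directory itself, which is enough to tell whether the kernel exposes namespaces at all.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Build "/proc/<pid>/ns" (ns is NULL or "") or "/proc/<pid>/ns/<ns>".
 * Returns 0 on success, -1 if the path does not fit into buf. */
int sketch_ns_path(char *buf, size_t len, int pid, const char *ns)
{
	int ret;

	assert(buf != NULL && len > 0);
	if (ns == NULL || strcmp(ns, "") == 0)
		ret = snprintf(buf, len, "/proc/%d/ns", pid);
	else
		ret = snprintf(buf, len, "/proc/%d/ns/%s", pid, ns);

	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}

/* Returns a file descriptor referring to the namespace (the caller closes
 * it) or -1 on error. Production code would likely also pass O_CLOEXEC. */
int sketch_preserve_ns(int pid, const char *ns)
{
	char path[64];

	if (sketch_ns_path(path, sizeof(path), pid, ns) < 0)
		return -1;

	return open(path, O_RDONLY);
}
```

The error is reported through the return value only, which mirrors the commit's point that the caller should not have to free an allocated error string.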

conf, start: be smarter when deleting networks · 74ba4120

authored Oct 28, 2016

- So far we blindly called lxc_delete_network() to make sure that we deleted
  all network interfaces. This resulted in pointless netlink calls, especially
  when a container had multiple networks defined. Let's be smarter and have
  lxc_delete_network() return a boolean that indicates whether *all* configured
  networks have been deleted. If so, don't needlessly try to delete them again
  in start.c. This also reduces the number of confusing error messages a user
  might see.

- When we receive -ENODEV from one of our lxc_netdev_delete_*() functions,
  let's assume that either the network device already got deleted or that it
  got moved to a different network namespace. Inform the user about this but do
  not report an error in this case.

- When we have explicitly deleted the host side of a veth pair, let's
  immediately free(priv.veth_attr.pair) and NULL it, or
  memset(priv.veth_attr.pair, ...) the corresponding member, so we don't
  needlessly try to destroy them again when we have to call
  lxc_delete_network() again in start.c.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
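
The three points above can be sketched together. Everything here is hypothetical and simplified: `struct sketch_netdev`, `sketch_delete_fn`, `sketch_delete_networks()` and the fake delete helper are illustration only and do not match liblxc's `struct lxc_netdev` or its netlink helpers. The sketch shows a boolean "all deleted" result, -ENODEV treated as "already gone or moved", and a cleared field so that a second call does not retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified network entry. ifindex 0 means there is nothing
 * left to delete for this entry. */
struct sketch_netdev {
	int ifindex;
};

/* Hypothetical delete helper: returns 0 on success, -errno on failure. */
typedef int (*sketch_delete_fn)(int ifindex);

/* Try to delete every configured device. Returns true only if no device
 * remains, so the caller can skip a second cleanup pass. */
bool sketch_delete_networks(struct sketch_netdev *devs, size_t n,
			    sketch_delete_fn del)
{
	bool all_deleted = true;

	assert(devs != NULL && del != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (devs[i].ifindex == 0)
			continue; /* handled by an earlier call */

		ret = del(devs[i].ifindex);
		if (ret == 0 || ret == -ENODEV)
			devs[i].ifindex = 0; /* gone or moved: do not retry */
		else
			all_deleted = false;
	}

	return all_deleted;
}

/* Fake helper for trying the sketch out: ifindex 2 is "already gone",
 * ifindex 3 fails with a real error, everything else succeeds. */
int sketch_fake_delete(int ifindex)
{
	if (ifindex == 2)
		return -ENODEV;
	if (ifindex == 3)
		return -EPERM;
	return 0;
}
```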

conf: explicitly remove veth device from host · c7dc0721

Christian Brauner authored Oct 27, 2016

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

conf, start: improve log output · ed8e8611

Christian Brauner authored Oct 27, 2016

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

lxc_user_nic: use lxc_preserve_ns() · 671c3c49

Christian Brauner authored Oct 28, 2016

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

attach: use lxc_preserve_ns() · 377d0119

Christian Brauner authored Oct 28, 2016

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

conf: use lxc_preserve_ns() · e437b4ba

Christian Brauner authored Oct 27, 2016

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

start: add netnsfd to lxc_handler · e9f7729e

Christian Brauner authored Oct 27, 2016

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

utils: add lxc_preserve_ns() · c1ff672f

authored Oct 27, 2016

This allows retrieving a file descriptor referring to a namespace.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

17 Nov, 2016 25 commits

cgroups: prevent segfault in cgfsng · d3795ab5

authored Nov 16, 2016

When we set LXC_DEBUG_CGFSNG=1 we print out info about detected cgroup
hierarchies. When there's no named cgroup mounted we need to make sure that we
don't try to index an unallocated pointer.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

lxc-checkpoint: automatically detect if --external or --veth-pair · 85031ca0

authored Nov 15, 2016

With release 2.8, criu deprecated the --veth-pair command-line option in
favor of --external:

f2037e6 veth: Make --external support --veth-pair

git tag --contains f2037e6d3445fc400
v2.8

With this commit, lxc-checkpoint will automatically switch between
the new and old command-line option depending on the detected
criu version.

For criu versions older than 2.8, something like this will be used:

  --veth-pair eth0=vethYOK6RW@lxcbr0

and starting with criu version 2.8 it will look like this:

  --external veth[eth0]:vethCRPEYL@lxcbr0
Signed-off-by: Adrian Reber <areber@redhat.com>
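
The version switch can be sketched like this. The two argument formats are the ones quoted in the commit message; the function names, the way the version is passed in as major and minor numbers, and the empty-name check are assumptions for illustration and not how lxc-checkpoint actually detects the criu version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the given criu version is 2.8 or newer. */
bool sketch_criu_has_external(int major, int minor)
{
	return major > 2 || (major == 2 && minor >= 8);
}

/* Pick the option name and format its value for one veth pair.
 * Returns 0 on success, -1 on an empty interface name or truncation. */
int sketch_veth_arg(int major, int minor, const char *ifname,
		    const char *veth, const char *bridge,
		    const char **opt, char *val, size_t len)
{
	int ret;

	assert(ifname != NULL && veth != NULL && bridge != NULL);
	assert(opt != NULL && val != NULL);
	if (strlen(ifname) == 0)
		return -1;

	if (sketch_criu_has_external(major, minor)) {
		*opt = "--external";
		ret = snprintf(val, len, "veth[%s]:%s@%s", ifname, veth, bridge);
	} else {
		*opt = "--veth-pair";
		ret = snprintf(val, len, "%s=%s@%s", ifname, veth, bridge);
	}

	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```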

cgroups: use %zu format specifier to print size_t · bf5174e0

Christian Brauner authored Nov 15, 2016

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

debian: Don't depend on libui-dialog-perl · 8da006e4

authored Nov 14, 2016

This package doesn't exist in stretch anymore, and it's unclear why we
were depending on a library to begin with (as opposed to having it
brought by whatever needs it).
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

conf: do not use %m format specifier · 134bceb3

authored Nov 13, 2016

This is a GNU extension and some libcs might be missing it.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
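
A portable alternative to `%m` is to format the errno text explicitly with strerror(). The helper below is a hypothetical sketch (the name `sketch_format_error()` is not from the commit); it takes the error number as a parameter so the caller can save errno before any other call has a chance to overwrite it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write "<what>: <error text>" into buf without relying on the GNU "%m"
 * conversion. Returns 0 on success, -1 if the message was truncated. */
int sketch_format_error(char *buf, size_t len, const char *what, int err)
{
	int ret;

	assert(buf != NULL && what != NULL);
	ret = snprintf(buf, len, "%s: %s", what, strerror(err));

	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

A caller would typically do `int saved = errno;` right after the failing call and pass `saved` in.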

install bash completion where pkg-config tells us to · 50066905

Evgeni Golov authored Nov 12, 2016

Signed-off-by: Evgeni Golov <evgeni@debian.org>

add lxc.egg-info to gitignore · 991c1b95

Evgeni Golov authored Nov 12, 2016

Signed-off-by: Evgeni Golov <evgeni@debian.org>

also stop lxc-net in runlevels 0 and 6 · d2b51fd1

Evgeni Golov authored Nov 12, 2016

there is no reason to not do this :)
Signed-off-by: Evgeni Golov <evgeni@debian.org>

cgroups: skip v2 hierarchy entry · dafe5349

Christian Brauner authored Nov 11, 2016

Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

templates: add squashfs support to lxc-ubuntu-cloud.in · 26312a76

authored Nov 10, 2016

Add squashfs format file support for lxc-ubuntu-cloud.in
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>

Update Ubuntu release name: add zesty and remove wily · 3a5495cf

authored Nov 09, 2016

Add zesty to KNOWN_RELEASES
Remove EOL wily from KNOWN_RELEASES
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>

cgroups: remove isolated cpus from cpuset.cpus · b50cf4ac

authored Nov 06, 2016

In case the system was booted with

    isolcpus=n_i-n_j,n_k,n_m

we cannot simply copy the cpuset.cpus file from our parent cgroup. For example,
in the root cgroup cpuset.cpus will contain all of the cpus including the
isolated cpus. Copying the values of the root cgroup into a child cgroup will
lead to a wrong view in /proc/self/status: For the root cgroup
/sys/fs/cgroup/cpuset, /proc/self/status will correctly show

    Cpus_allowed_list:      0-1,3

even though cpuset.cpus will show

    0-3

However, initializing a subcgroup in the cpuset controller by copying the
cpuset.cpus setting from the root cgroup will cause /proc/self/status to
incorrectly show

    Cpus_allowed_list:      0-3

Hence, we need to make sure to remove the isolated cpus from cpuset.cpus. Seth
has argued that this is not a kernel bug but by design. So let us be the smart
guys and fix this in liblxc.

The solution is straightforward: To avoid having to work with raw cpulist
strings we create cpumasks based on uint32_t bit arrays.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
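
The cpumask idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the `SKETCH_MAX_CPUS` limit is arbitrary, all function names are hypothetical, and converting the resulting mask back into a cpulist string for cpuset.cpus is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_CPUS 128 /* arbitrary limit chosen for the sketch */
#define SKETCH_WORDS (SKETCH_MAX_CPUS / 32)

void sketch_set_bit(uint32_t *mask, unsigned int cpu)
{
	mask[cpu / 32] |= UINT32_C(1) << (cpu % 32);
}

bool sketch_test_bit(const uint32_t *mask, unsigned int cpu)
{
	return (mask[cpu / 32] >> (cpu % 32)) & 1;
}

/* Parse a cpulist such as "0-1,3" into a bit array. A trailing newline, as
 * found in sysfs and cgroup files, is accepted. Returns 0 or -1. */
int sketch_parse_cpulist(const char *list, uint32_t mask[SKETCH_WORDS])
{
	const char *p = list;

	assert(list != NULL && mask != NULL);
	memset(mask, 0, SKETCH_WORDS * sizeof(uint32_t));
	while (*p != '\0' && *p != '\n') {
		char *end;
		unsigned long lo, hi;

		lo = strtoul(p, &end, 10);
		if (end == p)
			return -1;
		hi = lo;
		if (*end == '-') { /* a range "lo-hi" */
			p = end + 1;
			hi = strtoul(p, &end, 10);
			if (end == p)
				return -1;
		}
		if (hi < lo || hi >= SKETCH_MAX_CPUS)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			sketch_set_bit(mask, (unsigned int)cpu);
		if (*end == ',')
			end++;
		else if (*end != '\0' && *end != '\n')
			return -1;
		p = end;
	}
	return 0;
}

/* Clear every cpu that is set in `isolated` from `cpus`. */
void sketch_remove_isolated(uint32_t cpus[SKETCH_WORDS],
			    const uint32_t isolated[SKETCH_WORDS])
{
	for (size_t i = 0; i < SKETCH_WORDS; i++)
		cpus[i] &= ~isolated[i];
}
```

With the values from the commit message, parsing "0-3" and removing an isolated cpu 2 leaves cpus 0, 1 and 3 set, which corresponds to the cpulist 0-1,3.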

utils: add lxc_append_string() · 798ee9ba

authored Nov 06, 2016

lxc_append_string() appends strings without separator. This is mostly useful
for reading in whole files line-by-line.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

create symlink for /var/run · 57af0c7a

authored Nov 08, 2016

This patch creates a /var/run symlink that points to /run.

This will fix various issues present when /var/run is persistent.
Signed-off-by: Marc Gariepy <gariepy.marc@gmail.com>

start: CLONE_NEWCGROUP after we have setup cgroups · 20c16a76

authored Nov 03, 2016

If we do it earlier we end up with a wrong view of /proc/self/cgroup. For
example, assume we unshare(CLONE_NEWCGROUP) first, and then create the cgroup
for the container, say /sys/fs/cgroup/cpuset/lxc/c, then /proc/self/cgroup
would show us:

     8:cpuset:/lxc/c

whereas it should actually show

     8:cpuset:/
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
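
The two views can be modelled with a small pure function. It rests on one assumption about kernel behaviour: a cgroup namespace is rooted at the cgroup the task was in when it called unshare(CLONE_NEWCGROUP), and /proc/self/cgroup shows paths relative to that root. The function name is hypothetical, and the sketch only handles cgroups at or below the namespace root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compute how `cgroup` is displayed inside a cgroup namespace rooted at
 * `ns_root`. Returns 0 on success, -1 if `cgroup` is not at or below
 * `ns_root` or if the result does not fit into `out`. */
int sketch_cgroup_view(const char *ns_root, const char *cgroup,
		       char *out, size_t len)
{
	size_t rlen;
	int ret;

	assert(ns_root != NULL && cgroup != NULL && out != NULL);
	rlen = strlen(ns_root);
	if (strcmp(ns_root, "/") == 0)
		ret = snprintf(out, len, "%s", cgroup);
	else if (strncmp(cgroup, ns_root, rlen) != 0)
		return -1;
	else if (cgroup[rlen] == '\0')
		ret = snprintf(out, len, "/"); /* task sits at the ns root */
	else if (cgroup[rlen] == '/')
		ret = snprintf(out, len, "%s", cgroup + rlen);
	else
		return -1; /* e.g. root "/lxc/c" vs. cgroup "/lxc/cc" */

	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

Unsharing before the container cgroup exists corresponds to `ns_root = "/"`, which yields "/lxc/c"; unsharing after the task has been moved corresponds to `ns_root = "/lxc/c"`, which yields "/".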

c/r: check state before doing a checkpoint/restore · 5048abad

authored Nov 03, 2016

This would already fail, but with a not-as-good error message. Let's make
the error better.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>

c/r: fix off-by-one error · 87a06d9d

authored Nov 02, 2016

When we read sizeof(buf) bytes here, we'd write off the end of the array,
which is bad :)
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
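
The usual shape of this fix is to read one byte less than the buffer holds, so the terminating NUL always has room. The helper below is a hypothetical sketch of that pattern, not the code touched by the commit.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Read at most len - 1 bytes from fd and NUL-terminate the result.
 * Returns the number of bytes read, or -1 on error. */
ssize_t sketch_read_string(int fd, char *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL && len > 0);
	n = read(fd, buf, len - 1); /* not len: keep room for '\0' */
	if (n < 0)
		return -1;

	buf[n] = '\0'; /* n <= len - 1, so this index is in bounds */
	return n;
}
```

Had the call asked for `len` bytes, a full read would make `buf[n]` refer to one element past the end of the array.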

c/r: remove extra \ns from logs · d0a4b88c

authored Nov 02, 2016

The macros put a \n in for us, so let's not put another one in.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>

c/r: save criu's stdout during dump too · 31348e68

authored Nov 01, 2016

This also allows us to commonize some bits of the dup2 code.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>

conf: merge network namespace move & rename on shutdown · 27866a41

authored Aug 17, 2016

On shutdown we move physical network interfaces back to the
host namespace and rename them afterwards as well as in the
later lxc_network_delete() step. However, if the device had
a name which already exists in the host namespace then the
moving fails and so do the subsequent rename attempts. When
the namespace ceases to exist the devices finally end up
in the host namespace named 'dev<ID>' by the kernel.

In order to avoid this, we do the moving and renaming in a
single step (lxc_netdev_move_by_*()'s move & rename happen
in a single netlink transaction).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>

log: bump LXC_LOG_BUFFER_SIZE to 4096 · bc2250ff

authored Oct 31, 2016

We need to log longer lines due to CRIU arguments.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>

c/r: explicitly emit bind mounts as criu arguments · 1fe2570e

authored Oct 31, 2016

We switched to --ext-mount-map auto because of "system" (liblxc) added
mounts like the cgmanager socket that weren't in the config file. This had
the added advantage that we could drop all the mount processing code,
because we no longer needed an --ext-mount-map argument.

The problem here is that mounts can move between hosts. While
--ext-mount-map auto does its best to detect this situation, it explicitly
disallows moves that change the path name. In LXD, we bind mount
/var/lib/lxd/shmounts/$container to /dev/.lxd-mounts for each container,
and so when a container is renamed in a migration, the name changes.
--ext-mount-map auto won't detect this, and so the migration fails.

We *could* implement mount rewriting in CRIU, but my experience with cgroup
and apparmor rewriting is that this is painful and error prone. Instead, it
is much easier to go back to explicitly listing --ext-mount-map arguments
from the config file, and allow the source of the bind to change. We leave
--ext-mount-map auto to catch any straggling (or future) system added
mounts.

I believe this should fix Launchpad Bug 1580765
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>

tools: use correct exit code for lxc-stop · 037f33c4

authored Oct 30, 2016

When the container is not running, our manpage promises to exit with 2.
Let's make it so.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>

cgfs: explicitly check for NULL · 45aec6a1

authored Oct 30, 2016

Somehow this implementation of a cgroupfs backend decided to use the hierarchy
numbers it detects in /proc/cgroups and /proc/self/cgroup as indices for
the hierarchy struct. Controller numbering usually starts at 1 but may start at
0 if:

    a) the controller is not mounted on a cgroups v1 hierarchy;
    b) the controller is bound to the cgroups v2 single unified hierarchy; or
    c) the controller is disabled

To avoid having to rework our fallback backend significantly, we should
explicitly check for each controller if hierarchy[i] != NULL.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
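
The explicit check might look like the following. `struct sketch_hierarchy` and `sketch_count_hierarchies()` are hypothetical and far simpler than the real cgfs structures; the sketch only shows a sparse array indexed by hierarchy number in which any slot, index 0 included, may be NULL and has to be tested before it is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal hierarchy record. */
struct sketch_hierarchy {
	const char *name;
};

/* Count the usable entries of a sparse array indexed by hierarchy number. */
size_t sketch_count_hierarchies(struct sketch_hierarchy *const *hierarchy,
				size_t n)
{
	size_t count = 0;

	assert(hierarchy != NULL);
	for (size_t i = 0; i < n; i++) {
		if (hierarchy[i] == NULL)
			continue; /* unused slot: never dereference it */
		if (hierarchy[i]->name != NULL)
			count++;
	}

	return count;
}
```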

cgfs: skip empty entries under /proc/self/cgroup · 613fe8e9

authored Oct 30, 2016

If cgroupv2 is enabled, either alone or together with legacy hierarchies,
/proc/self/cgroup can contain entries of the form:

        0::/

These entries need to be skipped.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
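
One way to recognise such entries is to check whether the controller field between the first two colons is empty. The function below is a hypothetical sketch of that test and not the parser used in cgfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A /proc/self/cgroup line has the form "<id>:<controllers>:<path>".
 * Returns true if the controller field is empty, as in "0::/", and false
 * otherwise, including for lines without two colons. */
bool sketch_is_empty_controller_entry(const char *line)
{
	const char *first, *second;

	assert(line != NULL);
	first = strchr(line, ':');
	if (first == NULL)
		return false;

	second = strchr(first + 1, ':');
	if (second == NULL)
		return false;

	return second == first + 1;
}
```

A caller iterating over the file would simply `continue` when this returns true.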
