Commits · d75c14e262cce3d5d200fc5a0c9e502d6301fa91 · Chen Yisong / lxc

03 Sep, 2017 1 commit
- utils: add lxc_nic_exists() · d75c14e2
  Christian Brauner authored Sep 03, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
02 Sep, 2017 2 commits

lxc-user-nic: keep lines from other {users,links} · 32311345

authored Sep 02, 2017

Assume the db contained the following entries:

    chb veth lxcbr0 veth1
    chb veth lxcbr0 veth2
    chb veth lxdbr0 veth3
    chb veth lxdbr0 veth2
    didi veth lxcbr0 veth4

And you request

    cull_entries("chb", "veth", "lxdbr0", "veth3");

lxc-user-nic would wipe any entries that did not match, irrespective of whether
the corresponding network devices still existed. Let's fix that.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
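
A minimal sketch of the behaviour described above, written in Python rather than the C of lxc-user-nic. The signature, the `nic_exists` callback, and the rule that only the requested device and stale devices of the same user, type, and link are dropped are assumptions made for illustration, not the helper's actual code.

```python
from typing import Callable, List


def cull_entries(db_lines: List[str], user: str, net_type: str, link: str,
                 dev: str, nic_exists: Callable[[str], bool]) -> List[str]:
    """Return the db lines that should survive a cull request."""
    kept = []
    for line in db_lines:
        fields = line.split()
        if len(fields) != 4:
            # Assumption: leave lines we cannot parse untouched.
            kept.append(line)
            continue
        l_user, l_type, l_link, l_dev = fields
        if (l_user, l_type, l_link) != (user, net_type, link):
            # Lines from other users or other links are never touched.
            kept.append(line)
            continue
        if l_dev == dev or not nic_exists(l_dev):
            # Drop the requested device and devices that no longer exist.
            continue
        kept.append(line)
    return kept
```

With the five entries listed above and a `nic_exists` callback that always returns `True`, only the `chb veth lxdbr0 veth3` line is dropped under these assumptions.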

lxc-user-nic: fix adding database entries · a92028b2

authored Sep 02, 2017

The code before inserted \0-bytes after every new line which made the db
basically unusable.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
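
To illustrate why stray \0-bytes make a line-based db unusable, here is a small Python sketch. The entry layout (`user type link device`) follows the example entries shown elsewhere on this page; the function names are hypothetical and this is not the helper's code.

```python
from typing import List, Tuple


def format_entry(user: str, net_type: str, link: str, dev: str) -> bytes:
    """Serialize one db entry as a single newline-terminated line."""
    # No trailing NUL: only the visible characters and the newline are written.
    return f"{user} {net_type} {link} {dev}\n".encode("ascii")


def parse_db(raw: bytes) -> List[Tuple[str, ...]]:
    """Split the raw db contents into whitespace-separated fields per line."""
    text = raw.decode("ascii")
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


good = format_entry("chb", "veth", "lxcbr0", "veth1") + \
    format_entry("chb", "veth", "lxcbr0", "veth2")
# Simulate the bug: a NUL byte written after every newline.
bad = good.replace(b"\n", b"\n\0")
```

Parsing `bad` yields a second entry whose user field starts with a NUL character, so it no longer compares equal to `chb`.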

01 Sep, 2017 7 commits

network: remove netpipe · 7ab1ba02

authored Sep 01, 2017

We use data_sock for all things we need to send around between parent and child
now. It doesn't make sense to have so many different pipes and sockets if one
will do just fine.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

network: use correct network device name · 8843fde4
Christian Brauner authored Sep 01, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

network: stop recording saved physical net devices · b809f232

authored Sep 01, 2017

liblxc will now correctly log any network device names and ifindices in their
respective network namespaces. So there's no need to record physical network
devices any more. This spares us heap allocations and memory we need to have
lying around until the container is shut down.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

network: retrieve correct names and ifindices · 790255cf

authored Sep 01, 2017

On privileged network creation we only retrieved the names and ifindices of
network devices in the host's network namespace. This meant that the monitor
process was acting on possibly incorrect information. With this commit we have
the child send back the correct device names and ifindices in the container's
network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
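
As a rough illustration of a child reporting a device name and ifindex to its parent over a socketpair, consider the Python sketch below. The wire format (a fixed 16-byte name field followed by a native int) and the function names are assumptions for this example; liblxc's actual message layout is not shown on this page.

```python
import socket
import struct
from typing import Tuple

IFNAMSIZ = 16  # Linux limit for interface names, including the trailing NUL
RECORD = struct.Struct(f"{IFNAMSIZ}si")  # hypothetical layout: name, ifindex


def send_netdev(sock: socket.socket, name: str, ifindex: int) -> None:
    """Send one (name, ifindex) record; the name must fit in IFNAMSIZ - 1 bytes."""
    encoded = name.encode("ascii")
    if len(encoded) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    sock.sendall(RECORD.pack(encoded, ifindex))


def recv_netdev(sock: socket.socket) -> Tuple[str, int]:
    """Receive exactly one record and strip the NUL padding from the name."""
    data = b""
    while len(data) < RECORD.size:
        chunk = sock.recv(RECORD.size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the socket early")
        data += chunk
    raw_name, ifindex = RECORD.unpack(data)
    return raw_name.split(b"\0", 1)[0].decode("ascii"), ifindex
```

Usage: create the two ends with `socket.socketpair()`, call `send_netdev()` on the child's end and `recv_netdev()` on the parent's end.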

start: non-functional changes · c6012571

authored Sep 01, 2017

This renames the socketpair() variable "ttysock" to "data_sock" since we will
use it to send arbitrary data around, not just ttys anymore.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

network: non-functional changes · 535e8859
Christian Brauner authored Sep 01, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

network: use static memory for net device names · de4855a8

authored Sep 01, 2017

Network device names are always shorter than IFNAMSIZ. So let's spare the
useless heap allocations and use static memory.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

31 Aug, 2017 11 commits

lxc-user-nic: initialize vars to silence gcc-7 · 99573f4a
Christian Brauner authored Aug 31, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

lxc-user-nic: free memory and check for error · 8424b4e1

authored Aug 31, 2017

- check for error on ifindex retrieval
- free allocated memory
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

start: non-functional changes · d0b915aa
Christian Brauner authored Aug 31, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

network: retrieve the host's veth device ifindex · 8da62485

authored Aug 31, 2017

- Retrieve the host's veth device ifindex in the host's network namespace.
- Add a note why we retrieve the container's veth device ifindex in the host's
  network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

Merge pull request #1772 from brauner/2017-08-31/ensure_lxc_user_nic_tests_privilege_over_netns · 94a182af
Serge Hallyn authored Aug 31, 2017
```
lxc-user-nic: test privilege over netns on delete
```

network: rework network creation · 74c6e2b0

authored Aug 31, 2017

- On unprivileged veth network creation have lxc-user-nic send the names of the
  veth devices and their respective ifindices. The advantage of retrieving this
  information from lxc-user-nic is that we spare ourselves sending around more
  stuff via the netpipe in start.c. Also, lxc-user-nic operates in both
  namespaces (the container's namespace and the host's namespace) via setns()
  and so is guaranteed to retrieve the correct ifindex via if_nametoindex(),
  which is a network-namespace-aware ioctl() call. While I'm pretty sure the
  ifindices for veth devices are identical across network namespaces I'm wary
  of relying on this. We need the ifindices to guarantee safe deletion of
  unprivileged network devices via lxc-user-nic later on since we use them to
  identify the network devices in their corresponding network namespaces.
- Move the network device logging from the child to the parent. The child does
  not have all of the information about the network devices available, only the
  few bits it actually needs to know. The monitor process is the only process
  that needs all this information.
- The network creation code for privileged and unprivileged networks was
  previously mangled into one single function but at the same time some of the
  privileged code had additional functions that were called in other places in
  start.c. Let's divide and conquer and split out the privileged and
  unprivileged network creation into completely separate functions. This makes
  what's happening way more clear. This will also have no performance impact
  since either you are privileged and only execute the privileged network
  creation functions or you are unprivileged and only execute the unprivileged
  network creation functions.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
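
The name-to-ifindex lookup mentioned in the first bullet can be tried out from Python, whose standard library wraps if_nametoindex(). This is only a sketch: the wrapper name `ifindex_or_none` is hypothetical, and the lookup is resolved in whatever network namespace the calling process currently lives in, which is why the commit has lxc-user-nic enter the relevant namespace first.

```python
import socket
from typing import Optional


def ifindex_or_none(name: str) -> Optional[int]:
    """Return the ifindex of `name` in the caller's network namespace, or None."""
    try:
        # socket.if_nametoindex() raises OSError if no such interface exists.
        return socket.if_nametoindex(name)
    except OSError:
        return None
```

Calling `ifindex_or_none()` with the name of a device that does not exist in the current namespace returns `None` instead of raising.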

network: log ifindex for host side veth device · d952b351
Christian Brauner authored Aug 31, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

network: document all fields in struct lxc_netdev · 085bb443

authored Aug 31, 2017

This is menial work but I'll thank myself later... a lot.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

network: add ifindex field for host veth device · 4239e9c3

authored Aug 31, 2017

We should not just record the ifindex for the container's veth device but also
for the host's veth device. This is useful when {configuring,deconfiguring}
veth devices and becomes crucial when calling our lxc-user-nic setuid helper
where we rely on the ifindex to make decisions about whether we are licensed to
perform certain operations on the veth device in question.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

network: log veth_attr.pair and veth_attr.veth1 · 8ce727fc

authored Aug 31, 2017

If the user specified the lxc.net.[i].veth.pair attribute to request that the
host side of a veth pair be given a specific name, let's log it at the trace
level. Otherwise, if the user did not specify lxc.net.[i].veth.pair,
veth_attr.veth1 will contain the name of the host side veth device.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

lxc-user-nic: test privilege over netns on delete · 1bd8d726

authored Aug 31, 2017

When lxc-user-nic is called with the "delete" subcommand we need to make sure
that we are actually privileged over the network namespace for which we are
supposed to delete devices on the host. To this end we require that the path to
the affected network namespace is passed. We then setns() to the network namespace
and drop privilege to the caller's real user id. Then we try to delete the
loopback interface which is not possible. If we are privileged over the network
namespace this operation will fail with ENOTSUP. If we are not privileged over
the network namespace we will get EPERM.

This is the first part of the commit. As of now nothing guarantees that the
caller does not just give us a random path to a network namespace it is
privileged over.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
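
The errno-based decision described above can be sketched in Python as follows. The actual attempt to delete the loopback interface (after setns() and dropping privilege) is abstracted into a caller-supplied `try_delete_loopback` callable, which is a hypothetical stand-in; only the interpretation of the resulting error follows the commit message.

```python
import errno
from typing import Callable


def is_privileged_over_netns(try_delete_loopback: Callable[[], None]) -> bool:
    """Interpret the failure of a loopback deletion attempt.

    ENOTSUP means the kernel let us get as far as the (impossible) operation,
    so we are privileged over the namespace. EPERM means we are not.
    """
    try:
        try_delete_loopback()
    except OSError as exc:
        if exc.errno == errno.ENOTSUP:
            return True
        if exc.errno == errno.EPERM:
            return False
        raise  # Any other error is unexpected and left to the caller.
    # Deleting the loopback device should never succeed.
    raise RuntimeError("deleting the loopback device unexpectedly succeeded")
```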

30 Aug, 2017 3 commits

Merge pull request #1769 from brauner/2017-08-30/improve_empty_cgroup_deletion · 70a49815
Stéphane Graber authored Aug 30, 2017
```
Revert "cgfsng: try to delete parent cgroups"
```

confile: remove unnecessary cleanup code · cf7faeb3

authored Aug 30, 2017

set_config_string_item() already free()s before setting the new value.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

Revert "cgfsng: try to delete parent cgroups" · 308a6c94

authored Aug 30, 2017

This reverts commit 92c590ae.

Problem:

    Commit 92c590ae introduced the following
    behavior:

    > cgfsng: try to delete parent cgroups
    >
    > Say we have
    >
    >     lxc.uts.name = c1
    >     lxc.cgroup.dir = lxd/a/b/c
    >
    > the path for the container's cgroup would be
    >
    >     lxd/a/b/c/c1
    >
    > When the container is shutdown we should not just try to delete "c1" we
    > should also try to delete "c", "b", "a", and "lxd". This is to ensure
    > that we don't leave empty cgroups around thereby increasing the chance
    > that we run into trouble with cgroup limits. The algorithm for this isn't
    > too costly since we can simply stop walking upwards at the first rmdir()
    > failure.

    The algorithm employs recursive_destroy() which opens each directory
    specified in lxc.cgroup.dir and tries to delete each directory within that
    directory. For example, assume "/sys/fs/cgroup/memory/lxd/a/b/c" only
    contains the cgroup "c1" for container "c1". Assume that "c1" calls
    recursive_destroy() to clean up its cgroups. It will first delete "c1" and
    anything underneath it. This is perfectly fine since anything underneath
    that cgroup is under its control. The new algorithm will then tell it to
    "recurse upwards". So recursive_destroy() will try to delete
    "/sys/fs/cgroup/memory/lxd/a/b/c" next. Now assume that a second container
    "c2" has "lxc.cgroup.dir = lxd/a/b/c" set in its config file and calls
    cgroup_create(). This will create the *empty* cgroup
    "/sys/fs/cgroup/memory/lxd/a/b/c/c2". Now assume that after having created
    "c2" container "c1"'s call to recursive_destroy() reaches
    "/sys/fs/cgroup/memory/lxd/a/b/c/c2" before it is populated. Then the
    cgroup "c2" will be removed. Now "c2" calls cgroup_enter() to enter its
    created cgroup. This will fail since c1 deleted the cgroup "c2". (As a
    sidenote: This is in the set of the few race conditions that are actually
    easy to describe.)

Possible Solution:

    Instead of calling recursive_destroy() on all cgroups specified in
    lxc.cgroup.dir we only call recursive_destroy() on the container's own
    cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c1". When we start to recurse
    upwards we only call unlinkat(AT_FDCWD, path, AT_REMOVEDIR). This should
    avoid the race described above. My argument is as follows. Assume that the
    container c1 has created the cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c1"
    for itself. Now c1 calls cgroup_destroy(). First, recursive_destroy() will
    be called on the cgroup "c1" which will delete any empty cgroup directories
    underneath "c1" and finally "c1" itself. This is fine since everything
    under "c1" is container c1's sole property. Now container c1 will call
    unlinkat() on "/sys/fs/cgroup/memory/lxd/a/b/c":
    - Assume that in the meantime container c2 has created the cgroup
      "/sys/fs/cgroup/memory/lxd/a/b/c/c2". Then c1's unlinkat() will fail.
      This will stop c1 from recursing upwards. So c2's cgroup_enter() call
      will find all its cgroups intact and well. unlinkat() will come with the
      appropriate in-kernel locking which will stop it from racing with
      mkdir().
    - There's still a subtle race left. c2 might be calling an implementation
      of mkdir -p to try and create e.g. the cgroup
      "/sys/fs/cgroup/memory/lxd/a/b". Let's assume "b" exists then c2 will
      receive EEXIST on "b" and move on to create "c". Let's further assume c1
      has already deleted "c". c1 will now be able to delete
      "/sys/fs/cgroup/memory/lxd/a/b/" and c2's call to create "c" will fail.

The latter subtle race makes me rethink this approach. For now we'll just leave
empty cgroups behind since I don't want to start locking stuff.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
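
The approach discussed under "Possible Solution" (which this commit does not adopt, because of the remaining mkdir -p race) can be sketched in Python on an ordinary directory tree. The function name and the `root`/`rel_path` parameters are hypothetical; `os.rmdir()` plays the role of unlinkat() with AT_REMOVEDIR, and the bottom-up walk stands in for recursive_destroy().

```python
import os
from typing import List


def destroy_cgroup(root: str, rel_path: str) -> List[str]:
    """Remove the container's own cgroup, then try to prune empty parents.

    Returns the relative paths of the parent directories that were removed.
    """
    own = os.path.join(root, rel_path)
    # Everything below the container's own cgroup belongs to it, so it is
    # removed recursively (children first, then the directory itself).
    for dirpath, _dirnames, _filenames in os.walk(own, topdown=False):
        os.rmdir(dirpath)

    removed = []
    parent = os.path.dirname(rel_path)
    while parent:
        try:
            # Non-recursive removal: fails if another container has created
            # something underneath in the meantime.
            os.rmdir(os.path.join(root, parent))
        except OSError:
            break  # Stop walking upwards at the first failure.
        removed.append(parent)
        parent = os.path.dirname(parent)
    return removed
```

If a sibling such as `lxd/a/b/c/c2` exists when `lxd/a/b/c/c1` is destroyed, the first `os.rmdir()` on `lxd/a/b/c` fails and no parent is touched.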

29 Aug, 2017 3 commits
- Merge pull request #1761 from brauner/2017-08-10/further_lxc_2.1_preparations · 2e02bbdb
  Serge Hallyn authored Aug 29, 2017
```
further lxc 2.1 preparations
```
- Merge pull request #1767 from xnox/upstart-ssh · 5965257b
  Christian Brauner authored Aug 29, 2017
```
templates/ubuntu: conditionally move upstart ssh job, as it is now op…
```
- templates/ubuntu: conditionally move upstart ssh job, as it is now optional. · 4a1bd8d6
  Dimitri John Ledkov authored Aug 29, 2017
```
Mimic the code from the debian template.
Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com>
```
28 Aug, 2017 7 commits

network: non-functional changes · 811ef482

authored Aug 28, 2017

This moves all of the network handling code into network.{c,h}. This makes what
is going on much clearer. Also it's easier to find relevant code if it is all
in one place.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

conf: increase lxc-user-nic buffer · 89092815

authored Aug 27, 2017

This will allow us to log more detailed failures.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

lxc-user-nic: check db before trying to delete · 8b8e00a2
Christian Brauner authored Aug 27, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
lxc-user-nic: non-functional changes · af256970
Christian Brauner authored Aug 27, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
network: delete ovs for unprivileged networks · a055595c
Christian Brauner authored Aug 27, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
Merge pull request #1763 from brauner/2017-08-28/lxc_2.1_upgrade_script · b7ab9e86
Stéphane Graber authored Aug 28, 2017
```
lxc-update-config: handle legacy networks
```

lxc-update-config: handle legacy networks · 37694da4

authored Aug 28, 2017

Older versions of liblxc allowed networks to be specified like this:

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxdbr0
lxc.network.name = eth0

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxdbr0
lxc.network.name = eth1

Each occurrence of "lxc.network.type" indicated the definition of a new
network. This syntax is not allowed in newer liblxc instances. Instead, each
network must carry an index. So in new liblxc these two networks would be
translated to:

lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = lxdbr0
lxc.net.0.name = eth0

lxc.net.1.type = veth
lxc.net.1.flags = up
lxc.net.1.link = lxdbr0
lxc.net.1.name = eth1

The update script did not handle this case correctly. It should now.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
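
The translation rule described above (every "lxc.network.type" starts a new index) can be sketched in Python. This is not the update script's implementation; the function name is hypothetical, and keys that appear before any "lxc.network.type" line are deliberately left unchanged rather than guessed at.

```python
from typing import Iterable, List


def update_legacy_networks(lines: Iterable[str]) -> List[str]:
    """Rewrite legacy lxc.network.* keys to indexed lxc.net.N.* keys."""
    prefix = "lxc.network."
    index = -1  # no network seen yet
    converted = []
    for line in lines:
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.startswith(prefix):
            converted.append(line)  # not a legacy network key
            continue
        suffix = key[len(prefix):]
        if suffix == "type":
            index += 1  # each "type" key opens a new network definition
        if index < 0:
            converted.append(line)  # assumption: do not guess an index
            continue
        converted.append(f"lxc.net.{index}.{suffix} = {value.strip()}")
    return converted
```

Applied to the eight legacy lines above, this produces the `lxc.net.0.*` and `lxc.net.1.*` keys shown in the translated example.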

27 Aug, 2017 6 commits

network: log ifindex · 7a582518
Christian Brauner authored Aug 27, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

network: send ifindex for unpriv networks · 0cffb676

authored Aug 27, 2017

We use the ifindex as an indicator that liblxc created the network so let's
record it for the unprivileged case as well.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

lxc-user-nic: rework renaming net devices · c92dfebd

authored Aug 27, 2017

This should make things a little less convoluted.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

conf: adapt to lxc-user-nic usage · 25aead3f

authored Aug 26, 2017

- lxc-user-nic gains the subcommands {create,delete}
- dup2() STDERR_FILENO as well so that we can show helpful messages in our logs
  on failure
- initialize output buffer so that we don't print garbage
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
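
The idea behind the second and third bullets, capturing a helper's stderr together with its stdout and never reading an uninitialized buffer, looks roughly like this in Python. The function name is hypothetical, and `DEMO_CHILD` is a stand-in child process used only for illustration; it is not an lxc-user-nic invocation.

```python
import subprocess
import sys
from typing import List, Tuple

# Stand-in child that fails and reports the reason on stderr.
DEMO_CHILD = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('boom\\n'); sys.exit(1)",
]


def run_helper(argv: List[str]) -> Tuple[int, str]:
    """Run a helper and return (exit status, combined stdout and stderr)."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the analogue of dup2()ing STDERR_FILENO
        text=True,
        check=False,
    )
    # proc.stdout is an empty string, never garbage, if the child wrote nothing.
    return proc.returncode, proc.stdout
```

A caller can then log the captured text whenever the exit status is non-zero.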

tests: adapt lxc-user-nic tests to new syntax · f703d990
Christian Brauner authored Aug 27, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
lxc-user-nic: add new {create,delete} subcommands · 900e5f94
Christian Brauner authored Aug 26, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```