- 31 Aug, 2017 6 commits
-
-
Christian Brauner authored
- On unprivileged veth network creation, have lxc-user-nic send the names of the veth devices and their respective ifindices. The advantage of retrieving this information from lxc-user-nic is that we spare ourselves sending around more stuff via the netpipe in start.c. Also, lxc-user-nic operates in both namespaces (the container's namespace and the host's namespace) via setns and so is guaranteed to retrieve the correct ifindex via if_nametoindex(), which is a network-namespace-aware ioctl() call. While I'm pretty sure the ifindices for veth devices are identical across network namespaces, I'm wary of relying on this. We need the ifindices to guarantee safe deletion of unprivileged network devices via lxc-user-nic later on, since we use them to identify the network devices in their corresponding network namespaces.
- Move the network device logging from the child to the parent. The child does not have all of the information about the network devices available, only the few bits it actually needs to know. The monitor process is the only process that needs all this information.
- The network creation code for privileged and unprivileged networks was previously mangled into one single function, but at the same time some of the privileged code had additional functions that were called in other places in start.c. Let's divide and conquer and split out the privileged and unprivileged network creation into completely separate functions. This makes what's happening much clearer. It will also have no performance impact, since either you are privileged and only execute the privileged network creation functions, or you are unprivileged and only execute the unprivileged network creation functions.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
This is menial work but I'll thank myself later... a lot.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
We should record the ifindex not just for the container's veth device but also for the host's veth device. This is useful when {configuring,deconfiguring} veth devices and becomes crucial when calling our lxc-user-nic setuid helper, where we rely on the ifindex to decide whether we are licensed to perform certain operations on the veth device in question.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
If the user specified the lxc.net.[i].veth.pair attribute to request that the host side of a veth pair be given a specific name, let's log it at the trace level. Otherwise, if the user did not specify lxc.net.[i].veth.pair, veth_attr.veth1 will contain the name of the host side veth device.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
When lxc-user-nic is called with the "delete" subcommand we need to make sure that we are actually privileged over the network namespace for which we are supposed to delete devices on the host. To this end we require that the path to the affected network namespace is passed. We then setns() to the network namespace and drop privilege to the caller's real user id. Then we try to delete the loopback interface, which is not possible. If we are privileged over the network namespace, this operation will fail with ENOTSUP. If we are not privileged over the network namespace, we will get EPERM.
This is the first part of the commit. As of now, nothing guarantees that the caller does not just give us a random path to a network namespace it is privileged over.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
- 30 Aug, 2017 3 commits
-
-
Stéphane Graber authored
Revert "cgfsng: try to delete parent cgroups"
-
Christian Brauner authored
set_config_string_item() already free()s before setting the new value.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
This reverts commit 92c590ae.

Problem:
Commit 92c590ae introduced the following behavior:

> cgfsng: try to delete parent cgroups
>
> Say we have
>
>     lxc.uts.name = c1
>     lxc.cgroup.dir = lxd/a/b/c
>
> the path for the container's cgroup would be
>
>     lxd/a/b/c/c1
>
> When the container is shut down we should not just try to delete "c1", we
> should also try to delete "c", "b", "a", and "lxd". This is to ensure
> that we don't leave empty cgroups around, thereby increasing the chance
> that we run into trouble with cgroup limits. The algorithm for this isn't
> too costly since we can simply stop walking upwards at the first rmdir()
> failure.

The algorithm employs recursive_destroy(), which opens each directory specified in lxc.cgroup.dir and tries to delete each directory within that directory. For example, assume "/sys/fs/cgroup/memory/lxd/a/b/c" only contains the cgroup "c1" for container "c1". Assume that "c1" calls recursive_destroy() to clean up its cgroups. It will first delete "c1" and anything underneath it. This is perfectly fine since anything underneath that cgroup is under its control. The new algorithm will then tell it to "recurse upwards". So recursive_destroy() will try to delete "/sys/fs/cgroup/memory/lxd/a/b/c" next.
Now assume that a second container "c2" has "lxc.cgroup.dir = lxd/a/b/c" set in its config file and calls cgroup_create(). This will create the *empty* cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c2". Now assume that, after "c2" has been created, container "c1"'s call to recursive_destroy() reaches "/sys/fs/cgroup/memory/lxd/a/b/c/c2" before it is populated. Then the cgroup "c2" will be removed. Now "c2" calls cgroup_enter() to enter its created cgroup. This will fail since c1 deleted the cgroup "c2".
(As a side note: this is one of the few race conditions that are actually easy to describe.)

Possible Solution:
Instead of calling recursive_destroy() on all cgroups specified in lxc.cgroup.dir, we only call recursive_destroy() on the container's own cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c1". When we start to recurse upwards we only call unlinkat(AT_FDCWD, path, AT_REMOVEDIR). This should avoid the race described above. My argument is as follows. Assume that the container c1 has created the cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c1" for itself. Now c1 calls cgroup_destroy(). First, recursive_destroy() will be called on the cgroup "c1", which will delete any empty cgroup directories underneath "c1" and finally "c1" itself. This is fine since everything under "c1" is container c1's sole property. Now container c1 will call unlinkat() on "/sys/fs/cgroup/memory/lxd/a/b/c":
- Assume that in the meantime container c2 has created the cgroup "/sys/fs/cgroup/memory/lxd/a/b/c/c2". Then c1's unlinkat() will fail. This will stop c1 from recursing upwards. So c2's cgroup_enter() call will find all its cgroups intact and well. unlinkat() comes with the appropriate in-kernel locking, which will stop it from racing with mkdir().
- There's still a subtle race left. c2 might be calling an implementation of mkdir -p to try and create e.g. the cgroup "/sys/fs/cgroup/memory/lxd/a/b". Let's assume "b" exists; then c2 will receive EEXIST on "b" and move on to create "c". Let's further assume c1 has already deleted "c". c1 will now be able to delete "/sys/fs/cgroup/memory/lxd/a/b/" and c2's call to create "c" will fail.

The latter subtle race makes me rethink this approach. For now we'll just leave empty cgroups behind since I don't want to start locking stuff.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
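The remaining mkdir -p race from the revert message can be replayed deterministically. The following is a minimal Python sketch, not liblxc code: it uses ordinary directories in a temporary folder instead of a real cgroup hierarchy, and the helpers `mkdir_p_steps` and `replay_race` are hypothetical names introduced only for illustration.

```python
import os
import tempfile


def mkdir_p_steps(root, relative_path):
    """Create one path component per step, like a naive mkdir -p.

    Yielding after every component lets the caller interleave another
    actor's operations between two steps.
    """
    current = root
    for part in relative_path.split("/"):
        current = os.path.join(current, part)
        try:
            os.mkdir(current)
        except FileExistsError:
            pass  # EEXIST: the component is already there, move on
        yield current


def replay_race(root):
    """Interleave c2's mkdir -p with c1's upward cleanup."""
    # Starting state: c1 has already removed its own cgroup "c1",
    # so "c" is empty.
    os.makedirs(os.path.join(root, "lxd/a/b/c"))

    c2 = mkdir_p_steps(root, "lxd/a/b/c/c2")
    for _ in range(3):
        next(c2)  # c2 sees "lxd", "a" and "b" already exist

    # c1 now walks upwards with plain, non-recursive removals.
    os.rmdir(os.path.join(root, "lxd/a/b/c"))
    os.rmdir(os.path.join(root, "lxd/a/b"))

    try:
        next(c2)  # c2 tries to create "c", but its parent "b" is gone
    except FileNotFoundError:
        return "c2 failed: parent removed"
    return "c2 succeeded"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(replay_race(tmp))
```

The sketch only fixes one particular interleaving; on a live system the ordering depends on scheduling, which is why avoiding it would need locking.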
-
- 29 Aug, 2017 3 commits
-
-
Serge Hallyn authored
further lxc 2.1 preparations
-
Christian Brauner authored
templates/ubuntu: conditionally move upstart ssh job, as it is now op…
-
Dimitri John Ledkov authored
Mimic the code from the debian template.
Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com>
-
- 28 Aug, 2017 7 commits
-
-
Christian Brauner authored
This moves all of the network handling code into network.{c,h}. This makes what is going on much clearer. It is also easier to find relevant code if it is all in one place.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
This will allow us to log more detailed failures.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Stéphane Graber authored
lxc-update-config: handle legacy networks
-
Christian Brauner authored
Older instances of liblxc allowed networks to be specified like this:

    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = lxdbr0
    lxc.network.name = eth0

    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = lxdbr0
    lxc.network.name = eth1

Each occurrence of "lxc.network.type" indicated the definition of a new network. This syntax is not allowed in newer liblxc instances. Instead, each network must carry an index. So in new liblxc these two networks would be translated to:

    lxc.net.0.type = veth
    lxc.net.0.flags = up
    lxc.net.0.link = lxdbr0
    lxc.net.0.name = eth0

    lxc.net.1.type = veth
    lxc.net.1.flags = up
    lxc.net.1.link = lxdbr0
    lxc.net.1.name = eth1

The update script did not handle this case correctly. It should now.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
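The translation rule described above (every "lxc.network.type" starts a new index) can be sketched in a few lines of Python. This is an illustration of the rule only, not the code of lxc-update-config; the function name `index_legacy_networks` is hypothetical, and attaching keys that appear before the first "type" line to network 0 is an assumption made for the sketch.

```python
import re

# Matches "lxc.network.<key> = <value>", keeping indentation and the
# original spacing around "=".
LEGACY_KEY = re.compile(r"^(\s*)lxc\.network\.([A-Za-z0-9_.]+)(\s*=.*)$")


def index_legacy_networks(lines):
    """Rewrite legacy lxc.network.* keys to indexed lxc.net.<i>.* keys."""
    index = -1
    converted = []
    for line in lines:
        match = LEGACY_KEY.match(line)
        if not match:
            converted.append(line)  # unrelated keys pass through untouched
            continue
        indent, key, rest = match.groups()
        if key == "type":
            index += 1  # each "type" key opens a new network definition
        # Assumption: keys seen before any "type" line go to network 0.
        converted.append(f"{indent}lxc.net.{max(index, 0)}.{key}{rest}")
    return converted


if __name__ == "__main__":
    legacy = [
        "lxc.network.type = veth",
        "lxc.network.link = lxdbr0",
        "lxc.network.type = veth",
        "lxc.network.link = lxdbr0",
    ]
    print("\n".join(index_legacy_networks(legacy)))
```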
-
- 27 Aug, 2017 13 commits
-
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
We use the ifindex as an indicator that liblxc created the network, so let's record it for the unprivileged case as well.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
This should make things a little less convoluted.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
- lxc-user-nic gains the subcommands {create,delete}
- dup2() STDERR_FILENO as well so that we can show helpful messages in our logs on failure
- initialize output buffer so that we don't print garbage
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
get_new_nicname() calls lxc_mkifname(), which allocates memory and returns it to the caller. The way get_new_nicname() and get_nic_if_avail() were implemented, they hid that fact by returning a boolean. That doesn't make sense. Let's rather have them return a pointer to the allocated nic name, which the caller needs to free.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Say we have

    lxc.uts.name = c1
    lxc.cgroup.dir = lxd/a/b/c

the path for the container's cgroup would be

    lxd/a/b/c/c1

When the container is shut down we should not just try to delete "c1", we should also try to delete "c", "b", "a", and "lxd". This is to ensure that we don't leave empty cgroups around, thereby increasing the chance that we run into trouble with cgroup limits. The algorithm for this isn't too costly since we can simply stop walking upwards at the first rmdir() failure.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
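The "stop walking upwards at the first rmdir() failure" idea can be sketched as follows. This is a Python illustration on ordinary directories, not the cgfsng implementation; `prune_cgroup_path` is a hypothetical helper, and the sibling directory "other" is made up to show where the walk stops.

```python
import os
import tempfile


def prune_cgroup_path(root, relative_path):
    """Remove the deepest directory, then each parent, until rmdir() fails.

    Returns the relative paths that were actually removed.
    """
    removed = []
    parts = relative_path.strip("/").split("/")
    while parts:
        try:
            os.rmdir(os.path.join(root, *parts))
        except OSError:
            break  # first failure (e.g. directory not empty): stop walking
        removed.append("/".join(parts))
        parts.pop()  # move one level up
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "lxd/a/b/c/c1"))
        os.makedirs(os.path.join(tmp, "lxd/a/other"))  # keeps "a" non-empty
        print(prune_cgroup_path(tmp, "lxd/a/b/c/c1"))
```

Because rmdir() refuses to remove a non-empty directory, a parent that still holds another entry ends the walk without any extra bookkeeping.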
-
Christian Brauner authored
Say we have

    lxc.uts.name = c1
    lxc.cgroup.dir = lxd

the actual path should be

    lxd/c1

Right now it would just be

    lxd

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
- 25 Aug, 2017 8 commits
-
-
Stéphane Graber authored
further lxc 2.1 preparations
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
This will obviously not work.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
I'm ashamed at how awful my previous code was.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
So far, when creating veth devices attached to openvswitch bridges we used to fork() off a thread on container startup. This thread was kept around until the container shut down. I have no good explanation for why we did it that way, but it's certainly not necessary. Instead, let's fork() off the thread on container shutdown to delete the veth.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
-