Commits · 2ed797da7e36d25ea3377494d8a7ec024140445f · Chen Yisong / lxc

05 Nov, 2017 1 commit
- Merge pull request #1884 from brauner/2017-10-28/move_tools_to_api_only · 2ed797da
  Serge Hallyn authored Nov 04, 2017
```
confile: add lxc.namespace.<namespace-key> + add user namespace sharing + rework start logic
```
03 Nov, 2017 9 commits

conf: reap child in all cases · 686dd5d1
Christian Brauner authored Nov 01, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
network: reap child in all cases · 6b9f82a9
Christian Brauner authored Nov 01, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

start: rework ns sharing + add userns sharing · fa3a5b22

authored Oct 29, 2017

- Implement inheriting user namespaces.
  - When inheriting user namespaces make sure to not try and map ids again. The
    kernel will not allow you to do this.
- Change clone() logic:
  1. If we inherit no namespaces simply call lxc_clone().
  2. If we inherit any namespaces call lxc_fork_attach_clone(). Here's why:
     - Causes one syscall (fork()) instead of two syscalls (setns() to
       inherited namespace and setns() back to parent namespace) to be
       performed.
     - Allows us to get rid of a bunch of variables and helper functions/code.
     - Sharing a user namespace requires us to setns() to the inherited user
       namespace but the kernel does not allow reattaching to a parent user
       namespace. So the old logic made user namespace inheritance impossible.
       By using the lxc_fork_attach_clone() model we can simply setns() to the
       inherited user namespace in the fork()ed child and be done with it.
       The only thing we need to do is to specify CLONE_PARENT when calling
       clone() in lxc_fork_attach_clone() so that we can wait on the child.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
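
The fork, attach, clone model described above can be sketched roughly as follows. This is a minimal illustration and not liblxc's lxc_fork_attach_clone(): the names fork_attach_clone_sketch() and exit_with_code(), the pipe used to hand the pid back, and the fixed stack size are all assumptions made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_STACK_SIZE (64 * 1024)

/* Trivial payload for the example: exit with the code passed in arg. */
static int exit_with_code(void *arg)
{
	return *(int *)arg;
}

/*
 * Hypothetical helper illustrating fork() -> setns() -> clone(CLONE_PARENT).
 * ns_fd is an open namespace fd to join, or -1 to join nothing.
 * Returns the pid of the new process, which is a child of the caller.
 */
static pid_t fork_attach_clone_sketch(int (*fn)(void *), void *arg, int flags,
				      int ns_fd)
{
	int pipefd[2], status;
	pid_t helper, child = -1;

	assert(fn != NULL);

	if (pipe(pipefd) < 0)
		return -1;

	helper = fork();
	if (helper < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	if (helper == 0) {
		char *stack;
		pid_t pid;

		close(pipefd[0]);

		/* Only the short-lived helper changes namespaces. */
		if (ns_fd >= 0 && setns(ns_fd, 0) < 0)
			_exit(EXIT_FAILURE);

		stack = malloc(SKETCH_STACK_SIZE);
		if (!stack)
			_exit(EXIT_FAILURE);

		/* CLONE_PARENT: the new process becomes a child of our parent. */
		pid = clone(fn, stack + SKETCH_STACK_SIZE,
			    flags | CLONE_PARENT | SIGCHLD, arg);
		if (pid < 0)
			_exit(EXIT_FAILURE);

		if (write(pipefd[1], &pid, sizeof(pid)) != (ssize_t)sizeof(pid))
			_exit(EXIT_FAILURE);

		_exit(EXIT_SUCCESS);
	}

	close(pipefd[1]);
	if (read(pipefd[0], &child, sizeof(child)) != (ssize_t)sizeof(child))
		child = -1;
	close(pipefd[0]);

	/* Reap the helper so it does not linger as a zombie. */
	if (waitpid(helper, &status, 0) != helper)
		return -1;

	return child;
}
```

Passing -1 as ns_fd skips the setns() step, so the sketch can be tried without privileges. The caller never leaves its own namespaces, and it can waitpid() on the returned pid because CLONE_PARENT made that process its direct child.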


monitor: do not log useless warnings · 2469f9b6

authored Oct 29, 2017

lxc-monitord is deprecated so this is expected to fail.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


start: close data socket in parent · a9e1109e

authored Oct 29, 2017

Brings the number of open fds in the monitor process for a standard container
without ttys down to 17.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


confile: add lxc.namespace.<namespace-key> · 28d9e29e

authored Oct 28, 2017

This commit also gets rid of ~10 unnecessary file descriptors that were kept
open. Before we kept open:

- A set of file descriptors that refer to the monitor's namespaces. These were
  only used to reattach to the monitor's namespace in lxc_spawn() and were
  never used anywhere else. So close them and don't keep them around.
- A list of inherited file descriptors.
- A list of file descriptors referring to the container's namespaces to pass
  to lxc.hook.stop. This list duplicated inherited file descriptors.

Let's simply use a single list in the handler that has all file descriptors we
need and get rid of all other ones. As an illustration, consider starting a container:

1. Without this patch and looking at the fds that the monitor keeps open (26):

chb@conventiont|~
> ls -al /proc/27219/fd
total 0
dr-x------ 2 root root  0 Oct 29 14:30 .
dr-xr-xr-x 9 root root  0 Oct 29 14:30 ..
lrwx------ 1 root root 64 Oct 29 14:30 0 -> /dev/null
lrwx------ 1 root root 64 Oct 29 14:30 1 -> /dev/null
lrwx------ 1 root root 64 Oct 29 14:30 10 -> anon_inode:[signalfd]
lrwx------ 1 root root 64 Oct 29 14:30 11 -> /dev/ptmx
lrwx------ 1 root root 64 Oct 29 14:30 12 -> /dev/pts/10
lr-x------ 1 root root 64 Oct 29 14:30 13 -> net:[4026532553]
lrwx------ 1 root root 64 Oct 29 14:30 15 -> socket:[7909181]
lrwx------ 1 root root 64 Oct 29 14:30 16 -> socket:[7909182]
lr-x------ 1 root root 64 Oct 29 14:30 17 -> uts:[4026531838]
lr-x------ 1 root root 64 Oct 29 14:30 18 -> ipc:[4026531839]
lr-x------ 1 root root 64 Oct 29 14:30 19 -> net:[4026532009]
lrwx------ 1 root root 64 Oct 29 14:30 2 -> /dev/null
lr-x------ 1 root root 64 Oct 29 14:30 20 -> mnt:[4026532611]
lr-x------ 1 root root 64 Oct 29 14:30 21 -> pid:[4026532612]
lr-x------ 1 root root 64 Oct 29 14:30 22 -> uts:[4026532548]
lr-x------ 1 root root 64 Oct 29 14:30 23 -> ipc:[4026532549]
lr-x------ 1 root root 64 Oct 29 14:30 24 -> net:[4026532553]
l-wx------ 1 root root 64 Oct 29 14:30 3 -> /var/log/lxc/a1.log
lr-x------ 1 root root 64 Oct 29 14:30 4 -> uts:[4026532548]
lr-x------ 1 root root 64 Oct 29 14:30 5 -> ipc:[4026532549]
lr-x------ 1 root root 64 Oct 29 14:30 6 -> net:[4026532553]
lrwx------ 1 root root 64 Oct 29 14:30 7 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Oct 29 14:30 9 -> socket:[7911594]

2. With this patch and looking at the fds that the monitor keeps open (19):

chb@conventiont|~
> ls -al /proc/28465/fd
total 0
dr-x------ 2 root root  0 Oct 29 14:31 .
dr-xr-xr-x 9 root root  0 Oct 29 14:31 ..
lrwx------ 1 root root 64 Oct 29 14:31 0 -> /dev/null
lrwx------ 1 root root 64 Oct 29 14:31 1 -> /dev/null
lr-x------ 1 root root 64 Oct 29 14:31 10 -> net:[4026532820]
lrwx------ 1 root root 64 Oct 29 14:31 12 -> socket:[7912349]
lrwx------ 1 root root 64 Oct 29 14:31 13 -> socket:[7912350]
lr-x------ 1 root root 64 Oct 29 14:31 14 -> mnt:[4026532611]
lr-x------ 1 root root 64 Oct 29 14:31 15 -> pid:[4026532813]
lr-x------ 1 root root 64 Oct 29 14:31 16 -> uts:[4026532612]
lr-x------ 1 root root 64 Oct 29 14:31 17 -> ipc:[4026532613]
lr-x------ 1 root root 64 Oct 29 14:31 18 -> net:[4026532820]
lrwx------ 1 root root 64 Oct 29 14:31 2 -> /dev/null
l-wx------ 1 root root 64 Oct 29 14:31 3 -> /var/log/lxc/a1.log
lrwx------ 1 root root 64 Oct 29 14:31 4 -> anon_inode:[signalfd]
lrwx------ 1 root root 64 Oct 29 14:31 5 -> /dev/ptmx
lrwx------ 1 root root 64 Oct 29 14:31 6 -> /dev/pts/10
lrwx------ 1 root root 64 Oct 29 14:31 7 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Oct 29 14:31 9 -> socket:[7913041]

Relates to #1881.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
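
The listings above were produced with ls. The same count can be taken programmatically with a sketch along these lines. count_open_fds() is a hypothetical helper written for this example, not part of liblxc, and it assumes a Linux /proc filesystem.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: number of entries in /proc/<pid>/fd, or -1 on error. */
static int count_open_fds(pid_t pid)
{
	char path[64];
	struct dirent *entry;
	DIR *dir;
	int count = 0;

	assert(pid > 0);

	snprintf(path, sizeof(path), "/proc/%ld/fd", (long)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((entry = readdir(dir)) != NULL) {
		/* Skip "." and ".."; every other entry is one open fd. */
		if (entry->d_name[0] == '.')
			continue;
		count++;
	}

	closedir(dir);
	return count;
}
```

When a process counts its own descriptors this way, the result includes the descriptor that opendir() itself holds while the directory is being read.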


handler: make name argument const · f0ecc19d

authored Oct 28, 2017

There's no obvious need to strdup() the name of the container in the handler.
We can simply make this a pointer to the memory allocated in
lxc_container_new().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


start: close non-needed file descriptors · 6e5fc7a5
Christian Brauner authored Oct 29, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

lxc-start: remove unnecessary checks · 4e4832ee

authored Oct 28, 2017

The console struct is internal and liblxc takes care of creating paths.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


02 Nov, 2017 2 commits

Merge pull request #1896 from ffontaine/master · 190f9aee
Christian Brauner authored Nov 02, 2017
```
Fix compilation on toolchain without prlimit
```

Fix compilation on toolchain without prlimit · f48b5fd8

authored Nov 02, 2017

Some non-bionic toolchains, such as uclibc, do not support
prlimit or prlimit64. In this case, return an error.
Moreover, if prlimit64 is available, use the lxc implementation of prlimit.
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
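
The "return an error" fallback could look roughly like the sketch below. It is only an illustration: prlimit_compat() is a hypothetical wrapper, HAVE_PRLIMIT is assumed to be a configure-time macro, and the choice of ENOSYS is an assumption. The prlimit64 path mentioned in the message is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper. HAVE_PRLIMIT is an assumed build-system macro that
 * is defined only when the toolchain's libc provides prlimit().
 */
static int prlimit_compat(pid_t pid, int resource,
			  const struct rlimit *new_limit,
			  struct rlimit *old_limit)
{
	assert(pid >= 0);

#ifdef HAVE_PRLIMIT
	return prlimit(pid, resource, new_limit, old_limit);
#else
	/* No prlimit() in this toolchain: fail cleanly instead of not building. */
	(void)resource;
	(void)new_limit;
	(void)old_limit;
	errno = ENOSYS;
	return -1;
#endif
}
```

Callers then see an ordinary runtime error on toolchains without prlimit(), rather than a build failure.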


30 Oct, 2017 5 commits
- Merge pull request #1883 from brauner/2017-10-29/fix_namespace_inheritance_on_attach · 2ba9ef6c
  Serge Hallyn authored Oct 30, 2017
```
attach: correctly handle namespace inheritance
```
- Merge pull request #1875 from brauner/2017-10-27/tools_allow_undefined_containers · 82df9e1e
  Stéphane Graber authored Oct 30, 2017
```
tools: allow lxc-attach to undefined containers
```
- Merge pull request #1888 from brauner/2017-10-30/enable_cgfsng_cgroup_mounting · af949cc1
  Serge Hallyn authored Oct 30, 2017
```
cgroups: enable container without CAP_SYS_ADMIN
```
- cgroups: enable container without CAP_SYS_ADMIN · b635e92d
  Christian Brauner authored Oct 30, 2017
```
In case cgroup namespaces are supported but we do not have CAP_SYS_ADMIN we
need to mount cgroups for the container. This patch enables both privileged and
unprivileged containers without CAP_SYS_ADMIN.

Closes #1737.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
- cgfsng: fix cgroup2 detection · cdfe90a4
  Christian Brauner authored Oct 30, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
29 Oct, 2017 1 commit

attach: correctly handle namespace inheritance · 299d1198

authored Oct 29, 2017

When attaching to a container's namespaces we did not handle the case where we
inherited namespaces correctly. In essence, liblxc on start records the
namespaces the container was created with in the handler. But it only records
the clone flags that were passed to clone() and doesn't record the namespaces
we e.g. inherited from other containers. This means that attach only ever
attached to the namespaces named by those clone flags. But this is only
correct if all other namespaces not recorded in the handler refer to the
namespaces of the caller. However,
this need not be the case if the container has inherited namespaces from
another container. To handle this case we need to check whether caller and
container are in the same namespace. If they are, we know that things are all
good. If they aren't then we need to attach to these namespaces as well.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
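
One common way to check whether two processes share a namespace is to compare the device and inode numbers behind their /proc/<pid>/ns/<name> links. The sketch below illustrates that idea. in_same_namespace() is a hypothetical helper for this example and is not claimed to be how liblxc implements the check.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if both pids are in the same namespace of
 * the given type ("net", "uts", "ipc", ...), 0 if not, and -1 on error.
 */
static int in_same_namespace(pid_t a, pid_t b, const char *ns)
{
	char path_a[64], path_b[64];
	struct stat st_a, st_b;
	int len;

	assert(ns != NULL);

	len = snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/%s", (long)a, ns);
	if (len < 0 || (size_t)len >= sizeof(path_a))
		return -1;

	len = snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/%s", (long)b, ns);
	if (len < 0 || (size_t)len >= sizeof(path_b))
		return -1;

	/* stat() follows the link to the namespace object itself. */
	if (stat(path_a, &st_a) < 0 || stat(path_b, &st_b) < 0)
		return -1;

	return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}
```

A result of 0 for a given namespace type would correspond to the case described above, where attach also has to setns() into that namespace.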


28 Oct, 2017 3 commits

Merge pull request #1880 from terceiro/lxc-debian · cf13d107
Christian Brauner authored Oct 28, 2017
```
lxc-debian improvements
```

lxc-debian: don't hardcode valid releases · dba285d5

authored Oct 28, 2017

This avoids the dance of updating the list of valid releases every time
Debian makes a new release.

It also fixes the following bug: even though lxc-debian will default to
creating containers of the latest stable by querying the archive, it
won't allow you to explicitly request `stable` because the current list
of valid releases doesn't include it.

Last, but not least, avoid hitting the mirror in the case the desired
release is one of the ones we know will always be there, i.e. stable,
testing, sid, and unstable.
Signed-off-by: Antonio Terceiro <terceiro@debian.org>


lxc-debian: don't write C.* locales to /etc/locale.gen · c99055ea

authored Oct 27, 2017

Doing that confuses locale generation. lxc-ubuntu does the same check.
Signed-off-by: Antonio Terceiro <terceiro@debian.org>


27 Oct, 2017 7 commits
- Merge pull request #1879 from jordemort/lxc-execute-config-define-load · b25c853b
  Christian Brauner authored Oct 28, 2017
```
Call lxc_config_define_load from lxc_execute again
```
- Merge pull request #1874 from adrian5/patch-1 · b4819bd8
  Stéphane Graber authored Oct 27, 2017
```
Fix typo in lxc-net script
```
- Add missing lxc_container_put · baebdaf9
  Jordan Webb authored Oct 27, 2017
```
Signed-off-by: Jordan Webb <jordemort@github.com>
```
- Fix typo in lxc-net script · 09a4c380
  adrian5 authored Oct 27, 2017
```
Signed-off-by: adrian5 <adrian5@users.noreply.github.com>
```
- Call lxc_config_define_load from lxc_execute again · 47d556f5
  Jordan Webb authored Oct 27, 2017
```
Signed-off-by: Jordan Webb <jordemort@github.com>
```
- tools: allow lxc-attach to undefined containers · 5e5129d7
  Christian Brauner authored Oct 27, 2017
```
For example the following sequence is expected to work:

lxc-start -n containerName -f /path/to/conf \
-s 'lxc.id_map = u 0 100000 65536' \
-s 'lxc.id_map = g 0 100000 65536' \
-s 'lxc.rootfs = /path/to/rootfs' \
-s 'lxc.init_cmd = /path/to/initcmd'

lxc-attach -n containerName

Closes #984.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
- Merge pull request #1873 from terceiro/debian-rolling · 546d4469
  Christian Brauner authored Oct 27, 2017
```
lxc-debian: allow creating `testing` and `unstable`
```
26 Oct, 2017 1 commit

lxc-debian: allow creating `testing` and `unstable` · 61fa1329

authored Oct 26, 2017

Being able to create `testing` containers, regardless of the name
of the next stable release, is useful in several contexts, including but not
limited to testing purposes. For example, one won't need to explicitly switch to
`bullseye` once `buster` is released to be able to continue tracking
`testing`. While we are at it, let's also enable `unstable`, which is
exactly the same as `sid`; there is no reason not to allow it.
Signed-off-by: Antonio Terceiro <terceiro@debian.org>


21 Oct, 2017 11 commits

Merge pull request #1864 from brauner/2017-10-18/ringbuffer · f3d91bf0
Serge Hallyn authored Oct 21, 2017
```
ringbuffer: implement efficient and performant ringbuffer
```
namespace: use lxc_getpagesize() · a2028b8f
Christian Brauner authored Oct 21, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
console: add ringbuffer · 732375f5
Christian Brauner authored Oct 18, 2017
```
Closes #1857.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
conf: lxc_setup() -> lxc_setup_child() · 7f135597
Christian Brauner authored Oct 18, 2017
```
Closes #1857.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
confile: add lxc.console.logsize · a04220de
Christian Brauner authored Oct 18, 2017
```
Closes #1857.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
confile_utils: add lxc_get_conf_uint64() · 2ea479c9
Christian Brauner authored Oct 18, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
utils: add lxc_find_next_power2() · 6222c3f4
Christian Brauner authored Oct 18, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
utils: parse_byte_size_string() · e3db0162
Christian Brauner authored Oct 18, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```
utils: add lxc_safe_long_long() · b037bc67
Christian Brauner authored Oct 18, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```

ringbuf: implement simple and efficient ringbuffer · f3d05ee6

authored Oct 18, 2017

liblxc will use a ringbuffer implementation that employs mmap()ed memory.
Specifically, the ringbuffer will create an anonymous memory mapping twice the
requested size for the ringbuffer. Afterwards, an in-memory file the requested
size for the ringbuffer will be created. This in-memory file will then be
memory mapped twice into the previously established anonymous memory mapping
thereby effectively splitting the anonymous memory mapping in two halves of
equal size. This will allow the ringbuffer to get rid of any complex boundary
and wrap-around calculation logic. Since the underlying physical memory is the
same in both halves of the memory mapping only a single memcpy() call for both
reads and writes from and to the ringbuffer is needed.
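
The double-mapping technique described above can be sketched as follows. This is a minimal illustration under stated assumptions and not liblxc's ringbuffer code: the struct and function names are hypothetical, memfd_create() is assumed to be available for the in-memory file, and the policy of overwriting the oldest data when full is a choice made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical ringbuffer state for this sketch. */
struct ringbuf_sketch {
	char *addr;  /* start of the 2 * size mapping */
	size_t size; /* capacity in bytes, a multiple of the page size */
	uint64_t r;  /* total bytes consumed so far */
	uint64_t w;  /* total bytes produced so far */
};

static int ringbuf_sketch_create(struct ringbuf_sketch *rb, size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	char *base;
	int fd;

	assert(rb != NULL);

	if (page <= 0 || size == 0 || size > SIZE_MAX / 2 ||
	    size % (size_t)page != 0)
		return -1;

	/* Step 1: reserve 2 * size bytes of contiguous address space. */
	base = mmap(NULL, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Step 2: create an in-memory file of exactly size bytes. */
	fd = memfd_create("ringbuf-sketch", MFD_CLOEXEC);
	if (fd < 0 || ftruncate(fd, (off_t)size) < 0)
		goto on_error;

	/* Step 3: map the same file into both halves of the reservation. */
	if (mmap(base, size, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
	    mmap(base + size, size, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED)
		goto on_error;

	close(fd); /* the mappings keep the file alive */
	rb->addr = base;
	rb->size = size;
	rb->r = rb->w = 0;
	return 0;

on_error:
	if (fd >= 0)
		close(fd);
	munmap(base, 2 * size);
	return -1;
}

/* Append len bytes (len <= size), overwriting the oldest data when full. */
static int ringbuf_sketch_write(struct ringbuf_sketch *rb, const void *data,
				size_t len)
{
	if (len > rb->size)
		return -1;

	if (rb->w - rb->r + len > rb->size)
		rb->r = rb->w + len - rb->size;

	/* A write crossing the end of the first half lands in the second
	 * half, which aliases the start, so one memcpy() is enough. */
	memcpy(rb->addr + rb->w % rb->size, data, len);
	rb->w += len;
	return 0;
}

/* Copy out up to len unread bytes; returns the number of bytes copied. */
static size_t ringbuf_sketch_read(struct ringbuf_sketch *rb, void *out,
				  size_t len)
{
	uint64_t used = rb->w - rb->r;

	if (len > used)
		len = (size_t)used;

	memcpy(out, rb->addr + rb->r % rb->size, len);
	rb->r += len;
	return len;
}
```

Because both halves are backed by the same pages, a byte written at offset size is immediately visible at offset 0, which is what removes the wrap-around branch from the read and write paths.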

Design Notes:
- Since we're using MAP_FIXED memory mappings to map the same in-memory file
  twice into the anonymous memory mapping the kernel requires us to always
  operate on properly aligned pages. To guarantee proper page alignment the size
  of the ringbuffer must always be a multiple of the kernel's page size. This
  also implies that the ringbuffer must be at least one page in size. This
  additional requirement is reasonably unproblematic: any ringbuffer smaller
  than a single page is very likely useless since the standard page size on
  Linux is 4096 bytes.
- Because liblxc is not able to predict the output a user is going to produce
  (e.g. users could cat binary files onto the console) and because the
  ringbuffer is located in a hotpath and needs to be as performant as possible
  liblxc will not parse the buffer.
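
The page-size constraint from the first design note can be illustrated with a small rounding helper. round_up_to_page() is a hypothetical name used for this sketch, and the page size is passed in explicitly as an assumption to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest multiple of page_size that is >= requested,
 * and never less than one page. Returns 0 if the result would overflow.
 */
static size_t round_up_to_page(size_t requested, size_t page_size)
{
	size_t pages;

	assert(page_size > 0);

	if (requested == 0)
		return page_size;

	/* Number of pages needed, rounding any partial page up. */
	pages = requested / page_size + (requested % page_size != 0);
	if (pages > SIZE_MAX / page_size)
		return 0;

	return pages * page_size;
}
```

With a 4096-byte page, a request for 1 byte yields 4096 and a request for 4097 bytes yields 8192.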

Use Case:
The ringbuffer is needed by liblxc in order to safely log the output of write
intensive callers that produce unpredictable output or unpredictable amounts of
output. The console output created by a booting system and by the user is one of
those cases. If a container were allowed to log the console's output to a file, it
would be possible for a malicious user to fill up the host filesystem by
producing random output on the container's console if quota support is either
not enabled or not available for the underlying filesystem. Using a ringbuffer
is a reliable and secure way to ensure a fixed-size log.

Closes #1857.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


utils: add lxc_getpagesize() · e4636123
Christian Brauner authored Oct 21, 2017
```
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
```