Commits · 9c4693b853c5a9ab2156544ee3334a082cdba420 · Chen Yisong / lxc

14 Aug, 2013 8 commits

lxc-attach: Completely rework lxc-attach and move to API function · 9c4693b8

authored May 08, 2013

 - Move attach functionality to a completely new API function for
   attaching to containers. The API function accepts the name of the
   container, the lxcpath, a structure indicating options for attaching
   and returns the pid of the attached process. The calling thread may
   then use waitpid() or similar to wait for the attached process to
   finish. lxc-attach itself is just a simple wrapper around the new
   API function.

 - Use CLONE_PARENT when creating the attached process from the
   intermediate process. This allows the intermediate process to exit
   immediately after attach and the original thread may supervise the
   attached process directly.

 - Since the intermediate process exits quickly, its only job is to
   send the original process the pid of the attached process (as seen
   from outside the pidns) and exit. This allows us to simplify the
   synchronisation logic by quite a bit.

 - Use O_CLOEXEC / SOCK_CLOEXEC on (hopefully) all FDs opened in the
   main thread by the attach logic so that other threads of the same
   program may safely fork+exec off. Also, use shutdown() on the
   synchronisation socket, so that if another thread forks off without
   exec'ing, the synchronisation will not fail. (Not tested whether
   this solves this issue.)

 - Instead of directly specifying a program to execute on the API
   level, one specifies a callback function and a payload. This allows
   code using the API to execute a custom function directly inside the
   container without having to execute a program. Two default callbacks
   are provided directly, one to execute an arbitrary program, another
   to execute a shell. The lxc-attach utility will always use either
   one of these default callbacks.

 - More fine-grained control of the attached process on the API level
   (not implemented in lxc-attach utility yet, some may not be sensible):
     * Specify which file descriptors should be stdin/stdout/stderr of
       the newly created process. If fds other than 0/1/2 are
       specified, they will be dup'd in the attached process (and the
       originals closed). This allows e.g. threaded applications to
       specify pipes for communication with the attached process
       without having to modify its own stdin/stdout/stderr before
       running lxc-attach.
     * Specify user and group id for the newly attached process.
     * Specify initial working directory for the newly attached
       process.
     * Fine-grained control on whether to do any, all or none of the
       following: move attached process into the container's init's
       cgroup, drop capabilities of the process, set the process's
       personality, load the proper apparmor profile and (for partial
       attaches to any but not mount-namespaces) whether to unshare the
       mount namespace and remount /sys and /proc. If additional
       features (SELinux policy, SMACK policy, ...) are implemented,
       flags for those may also be provided.
Signed-off-by: Christian Seiler <christian@iwakd.de>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
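
The callback-plus-payload idea and the stdout redirection described above can be sketched roughly as follows. This is an illustration in Python, not the LXC API: `run_callback_in_child`, `echo_callback`, and `demo` are hypothetical names, no namespace is entered, and only stdout is redirected.

```python
import os


def run_callback_in_child(callback, payload, stdout_fd=1):
    """Fork, wire stdout_fd onto fd 1 in the child, then run callback(payload).

    Returns the child's pid; the caller is expected to waitpid() on it.
    """
    pid = os.fork()
    if pid == 0:
        status = 1  # reported if anything below raises
        try:
            if stdout_fd != 1:
                # Duplicate the requested fd onto stdout and close the original.
                os.dup2(stdout_fd, 1)
                os.close(stdout_fd)
            status = int(callback(payload) or 0)
        finally:
            # Never return into the caller's code from the child.
            os._exit(status & 0xFF)
    return pid


def echo_callback(payload):
    """Hypothetical callback: write the payload to stdout and report success."""
    os.write(1, payload)
    return 0


def demo():
    """Run echo_callback in a child with a pipe as stdout; return (output, exit code)."""
    read_end, write_end = os.pipe()
    pid = run_callback_in_child(echo_callback, b"hello from child\n", stdout_fd=write_end)
    os.close(write_end)  # the parent keeps only the read end
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.WEXITSTATUS(status)
```

The caller never touches its own stdin/stdout/stderr here, which is the property the commit highlights for threaded applications.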

9c4693b8

Fix return type of read/write utility functions. · 650468bb

authored May 21, 2013

Signed-off-by: Christian Seiler <christian@iwakd.de>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

650468bb

lxc-stop: exit with 1 or 2, not -1 or -2. · b93aac46
Serge Hallyn authored Aug 14, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
b93aac46
lxc_destroy: print an error if the container is not defined. · 01e6b714
Serge Hallyn authored Aug 14, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
01e6b714

cgroups: rework to handle nested containers with multiple and partial mounts · b98f7d6e

authored Aug 09, 2013

Currently, if you create a container and use the mountcgroup hook,
you get the /lxc/c1/c1.real cgroup mounted to /.  If you then try
to start containers inside that container, lxc can get confused.
This patch addresses that, by accepting that the cgroup as found
in /proc/self/cgroup can be partially hidden by bind mounts.

In this patch:

Add optional 'lxc.cgroup.use' to /etc/lxc/lxc.conf to specify which
mounted cgroup filesystems lxc should use.  So far only the cgroup
creation respects this.

Keep separate cgroup information for each cgroup mountpoint.  So if
the caller is in devices cgroup /a but cpuset cgroup /b that should
now be ok.

Change how we decide whether to ignore failure to set devices cgroup
settings.  Actually look to see if our current cgroup already has the
settings.  If not, add them.

Finally, the real reason for this patch: in a nested container,
/proc/self/cgroup says nothing about where under /sys/fs/cgroup you
might find yourself.  Handle this by searching for our pid in tasks
files, and keep that info in the cgroup handler.

Also remove all strdupa from cgroup.c (not android-friendly).
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
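
The "search for our pid in tasks files" step can be sketched as below. This is a Python illustration under assumptions, not the code in cgroup.c: `find_cgroup_for_pid` is a hypothetical helper, and it assumes a cgroup-v1 style layout where each cgroup directory contains a `tasks` file with one pid per line.

```python
import os


def find_cgroup_for_pid(mount_root, pid):
    """Return the directory under mount_root whose 'tasks' file lists pid, else None."""
    wanted = str(pid)
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        if "tasks" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "tasks")) as handle:
                # Compare whole lines so that pid 42 does not match 4242.
                if any(line.strip() == wanted for line in handle):
                    return dirpath
        except OSError:
            # Unreadable tasks file: skip it and keep searching.
            continue
    return None
```

Calling it with a mounted hierarchy root and `os.getpid()` would return the directory of the cgroup the caller is in, if it is visible under that mount.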

b98f7d6e

lxc-user-nic: specify config and db files in autoconf · 070a4b8e
Serge Hallyn authored Aug 09, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
070a4b8e

add lxc-user-nic · 20ab58c7

authored Aug 09, 2013

It is meant to be run setuid-root to allow unprivileged users to
tunnel veths from a host bridge to their containers.  The program
looks at /etc/lxc/lxc-usernet which has entries of the form

	user type bridge number

The type currently must be veth.  Whenever lxc-user-nic creates a
nic for a user, it records it in /var/lib/lxc/nics (better location
is needed).  That way when a container dies lxc-user-nic can cull
the dead nic from the list.

The -DISTEST allows lxc-user-nic to be compiled so that it uses
files under /tmp and doesn't actually create the nic, so that
unprivileged users can compile and test the code.  lxc-test-usernic
is a script which runs a few tests using lxc-usernic-test, which
is a version of lxc-user-nic compiled with -DISTEST.

The next step, after issues with this code are raised and addressed,
is to have lxc-start, when running unprivileged, call out to
lxc-user-nic (will have to exec so that setuid-root is honored).
On top of my previous unprivileged-creation patchset, that should
allow unprivileged users to create and start useful containers.

Also update .gitignore.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
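
A rough sketch of reading entries of the form `user type bridge number`, in Python rather than the C used by lxc-user-nic. `parse_usernet` and `UsernetEntry` are hypothetical names; skipping blank lines and `#` comments is an assumption, and the commit does not say what `number` means, so it is only parsed as an integer.

```python
from collections import namedtuple

UsernetEntry = namedtuple("UsernetEntry", "user type bridge number")


def parse_usernet(text):
    """Parse 'user type bridge number' lines into UsernetEntry tuples."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: comments are allowed
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        user, nictype, bridge, number = fields
        if nictype != "veth":
            # The commit states that the type must currently be veth.
            raise ValueError(f"line {lineno}: unsupported type {nictype!r}")
        entries.append(UsernetEntry(user, nictype, bridge, int(number)))
    return entries
```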

20ab58c7

hooks/Makefile.am: add ubuntu-cloud-prep · 3fb18be9
Serge Hallyn authored Aug 14, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
3fb18be9

13 Aug, 2013 2 commits
- lxc.conf.sgml.in: note the arguments and environment variables passed to hooks · baece282
  Serge Hallyn authored Aug 13, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
  baece282
- mountcgroups: use the right configuration file! · 8bb17b77
  Serge Hallyn authored Aug 13, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
  8bb17b77
12 Aug, 2013 2 commits

ubuntu-cloud-prep: cleanup, fix bug with userdata · 79159a86

authored Aug 10, 2013

--userdata was broken, completely missing an implementation.
This adds that implementation back in, makes 'debug' logic
correct, and then also improves the doc at the top.
Signed-off-by: Scott Moser <smoser@ubuntu.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

79159a86

lxc-destroy: Fix regular expression for getting rootfs · 034a0159

authored Aug 12, 2013

The `lxc-destroy` script was using a simple `grep` for extracting
`lxc.rootfs` from the lxc config. This regex also matches commented lines
and breaks at least removing btrfs subvolumes if the string `lxc.rootfs`
is mentioned in a comment. Furthermore, due to the unescaped dot in the
regex it would also match other wrong strings like `lxc rootfs`.

This patch modifies the regular expression to correctly match the beginning
of the line plus potential whitespace characters and the string
`lxc.rootfs`.
Signed-off-by: Franz Pletz <fpletz@fnordicwalking.de>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
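
The difference between the loose and the anchored match can be illustrated as follows. This is a Python sketch, not the shell code in `lxc-destroy`: the exact pattern used by the patch is not shown in the commit message, so `STRICT` (including the `=` and the captured value) is an assumption, and `naive_matches` and `rootfs_from_config` are hypothetical helpers.

```python
import re

# Loose pattern: unanchored, and the unescaped dot matches any character.
NAIVE = re.compile(r"lxc.rootfs")

# Anchored pattern: start of line, optional whitespace, literal dot.
STRICT = re.compile(r"^\s*lxc\.rootfs\s*=\s*(.+?)\s*$")


def naive_matches(text):
    """Return every line the loose pattern would pick up."""
    return [line for line in text.splitlines() if NAIVE.search(line)]


def rootfs_from_config(text):
    """Return the value of the first real lxc.rootfs line, or None."""
    for line in text.splitlines():
        match = STRICT.match(line)
        if match:
            return match.group(1)
    return None
```

With a config that mentions `lxc.rootfs` in a comment, the loose pattern returns the comment line as well, while the anchored one returns only the configured path.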

034a0159

09 Aug, 2013 4 commits

ubuntu-cloud-prep: fix bad declare of VERBOSITY · 54e339f9

authored Aug 09, 2013

Signed-off-by: Scott Moser <smoser@ubuntu.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

54e339f9

add a clone hook for ubuntu-cloud images · 65d8ae9c

authored Aug 08, 2013

This makes it possible to specify '--userdata' arguments to 'create' or
to 'clone'. As a result, the following starts instances with different
user-data very quickly.

$ sudo lxc-create -t ubuntu-cloud -n precise -- \
   -r precise --arch amd64

$ sudo lxc-clone -B overlayfs -o precise -s -n ephem1 \
   --userdata="my.userdata1"
$ sudo lxc-clone -B overlayfs -o precise -s -n ephem2 \
   --userdata="my.userdata2"

Also present here is
 * an improvement to the static list of Ubuntu releases. It uses
   ubuntu-distro-info if available and falls back to a static list on failure.
 * moving of the replacement variables to the top of the create template. This
   is just to make it more obvious what is being replaced and put them in a
   single location.
Signed-off-by: Scott Moser <smoser@ubuntu.com>

65d8ae9c

Cleanup Makefile.am · 1c8e4ee0

authored Aug 09, 2013

Remove some dead code and fix indentation, no functional change.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

1c8e4ee0

Replace mktemp() by a new mkifname() · 4a0ba80d

authored Aug 09, 2013

Using mktemp() leads to build time warnings and isn't actually
appropriate for what we want to do as it's checking for the existence of
a file and not a network interface.

Replace those calls by an equivalent mkifname() function which uses the
same template as mktemp but instead checks for existing network
interfaces.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>

4a0ba80d

07 Aug, 2013 3 commits

Logging: don't confuse command line and config file specified values · b40a606e

authored Aug 06, 2013

Currently if loglevel/logfile are specified on command line in a
program using LXC api, and that program does any
container->save_config(), then the new config will be saved with the
loglevel/logfile specified on command line.  This is wrong, especially
in the case of

cat > lxc.conf << EOF
lxc.logfile=a
EOF

lxc-create -t cirros -n c1 -o b

which will result in a container config with lxc.logfile=b.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
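
One way to picture the fix is to keep the two sources of the value apart, so that saving the config never sees the command-line override. The class below is a hypothetical Python model of that idea, not the actual change to log.c or the LXC API; `ContainerConfig` and its attribute names are invented for the example.

```python
class ContainerConfig:
    """Toy model that tracks config-file and command-line logfile separately."""

    def __init__(self, config_logfile=None, cmdline_logfile=None):
        self.config_logfile = config_logfile    # value read from the config file
        self.cmdline_logfile = cmdline_logfile  # value given with -o, if any

    def effective_logfile(self):
        """The command line wins while the program is running."""
        return self.cmdline_logfile or self.config_logfile

    def save_config(self):
        """Only the config-file value is written back out."""
        lines = []
        if self.config_logfile is not None:
            lines.append(f"lxc.logfile = {self.config_logfile}")
        return "\n".join(lines)
```

In the scenario from the commit message, the running program would log to `b` while the saved config still says `a`.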

b40a606e

lxc-clone: don't s/oldname/newname in the config file and hooks · 96532523

authored Aug 05, 2013

1. container hooks should use lxcpath and lxcname from the environment.
2. the utsname now gets separately updated
3. the rootfs path gets updated by the bdev backend.
4. the fstab mount targets should be relative
5. the fstab source directories could be separately updated if needed.

This leaves one definite bug: the lxc.logfile does not get updated.
This made me wonder why it was in the configuration file to begin with.
Digging deeper, I realized that whatever '-o outfile' you give
lxc-create gets set in log.c and gets used by the lxc_container object
we create at write_config().  So if you say
	lxc-create -t cirros -n c1 -o /tmp/out1
then /var/lib/lxc/c1/config will have lxc.logfile=/tmp/out1 - which is
clearly wrong.  Therefore I leave fixing that for later.

I'm looking for candidates for $p/$n expansion.  Note we can't expand
these at config_utsname() etc, because then lxc-clone would see the
expanded variable.  So we want to read $p/$n verbatim at config_*(),
and expand them only when they are used.  lxc.logfile is an obvious
good use case.  lxc.utsname can do it too, in case you want container
c1 to be called "c1-whatever".  I'm not sure that's worth it though.
Are there any others, or is that it?
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

96532523

ubuntu-cloud: remove debugging echo · d273b8ab
Serge Hallyn authored Aug 07, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
d273b8ab

26 Jul, 2013 1 commit

cgroups: fix the recently broken setting of clone_children · c9cbb9e5

authored Jul 26, 2013

Several places think that the current cgroup will be NULL rather
than "/" when we're in the root cgroup.  Fix that.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

c9cbb9e5

23 Jul, 2013 3 commits

cgroup_enter: catch write errors · 2c495ae3

authored Jul 22, 2013

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

2c495ae3

define lxc-usernsexec · d155b47d

authored Jul 22, 2013

It uses the newuidmap and newgidmap program to start a shell in
a mapped user namespace.  While newuidmap and newgidmap are
setuid-root, lxc-usernsexec is not.

If new{ug}idmap are not available, then this program is not
built or installed.  Otherwise, it will be used to support creating,
starting, destroying, etc containers by unprivileged users using
their authorized subuids and subgids.

Example:
	usernsexec -m u:0:100000:1 -- /bin/bash

will, if the user is authorized to use subuid 100000, start a
bash shell in a user namespace where 100000 on the host is
mapped to root in the namespace, and the shell is running as
(privileged) root.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
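
The mapping argument in the example can be read as `type:namespace-id:host-id:range`. The Python sketch below parses such a spec and renders it in the `inside outside length` form used by the kernel's `uid_map` and `gid_map` files. `parse_idmap` and `map_file_line` are hypothetical helpers, and accepting `g` for group mappings is an assumption; only the `u` form appears in the commit message.

```python
def parse_idmap(spec):
    """Split 'u:0:100000:1' into (kind, ns_id, host_id, count)."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected type:nsid:hostid:range, got {spec!r}")
    kind, ns_id, host_id, count = parts
    if kind not in ("u", "g"):  # 'g' is an assumption, see the lead-in
        raise ValueError(f"unknown mapping type {kind!r}")
    return kind, int(ns_id), int(host_id), int(count)


def map_file_line(spec):
    """Render the mapping as an 'inside outside length' line."""
    _kind, ns_id, host_id, count = parse_idmap(spec)
    return f"{ns_id} {host_id} {count}"
```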

d155b47d

lxclock: use XDG_RUNTIME_DIR for lock if appropriate (v2) · 469b5787

authored Jul 22, 2013

If we are euid==0 or XDG_RUNTIME_DIR is not set, then use
/run/lock/lxc/$lxcpath/$lxcname as before.  Otherwise,
use $XDG_RUNTIME_DIR/lock/lxc/$lxcpath/$lxcname.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Stéphane Graber <stephane.graber@canonical.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
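
The selection rule reads naturally as a small function. This is a Python sketch rather than the code in lxclock: `lock_path` and its parameters are hypothetical, an empty `XDG_RUNTIME_DIR` is treated as unset (an assumption), and the leading slash of `lxcpath` is dropped so the pieces join cleanly.

```python
import os


def lock_path(lxcpath, lxcname, euid=None, environ=None):
    """Pick the lock location for a container according to the rule above."""
    euid = os.geteuid() if euid is None else euid
    environ = os.environ if environ is None else environ
    runtime_dir = environ.get("XDG_RUNTIME_DIR")
    # Root, or no runtime dir: keep using /run as before.
    base = "/run" if euid == 0 or not runtime_dir else runtime_dir
    return os.path.join(base, "lock", "lxc", lxcpath.lstrip("/"), lxcname)
```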

469b5787

22 Jul, 2013 5 commits

A few changes for unprivileged lxc-start · b60ed720

authored May 10, 2013

When doing reboot test, must add CLONE_NEWUSER to clone flags, else
we can't clone(CLONE_NEWPID).

If we don't have caps at lxc-start, don't refuse to start.  Drop the
lxc_caps_check() function altogether as it is unused now.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

b60ed720

send current cgroup to lxc_cgroup_create() · b113383b

authored Jul 18, 2013

This is needed if we're going to have unprivileged users
create containers inside cgroups which they own.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

b113383b

ubuntu-cloud: changes to support unprivileged use · 1aad9e44

authored Jul 15, 2013

don't try to lock if using a specified tarball

The lock/subsys/lxc-ubuntu-cloud lock is to protect the tarballs
managed under /var/cache/lxc/cloud-$release.  Don't lock if we've
been handed a tarball.

fake device creation

Unprivileged users can't create devices, so bind mount null, tty, urandom
and console from the host.

Changelog:
	Jul 22: as Stéphane points out, remove a left-over debug line
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

1aad9e44

lxc-create: support unpriv users · 460bcbd8

authored May 08, 2013

Just make sure we are root if we are asked to deal with something other
than a directory, and make sure we have permission to create the
container in the given lxcpath.

The templates will need much more work.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

460bcbd8

templates: require running as root · 5be56973

authored May 08, 2013

Up to now lxc-create ensured that you were running as root.  Now the
templates which require root need to do it for themselves.  Templates
which do mknod definitely require root.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

5be56973

18 Jul, 2013 1 commit
- teach lxc-cirros about the --rootfs argument · 4165b2c6
  Serge Hallyn authored Jul 18, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
  4165b2c6
17 Jul, 2013 1 commit

ubuntu templates: add some kernel filesystems to container fstab · 6f259716

authored Jul 17, 2013

The debugfs, fusectl, and securityfs may not be mounted inside a
non-init userns.  But mountall hangs waiting for them to be
mounted.  So just pre-mount them using $lxcpath/$name/fstab as
bind mounts, which will prevent mountall from trying to mount
them.

If the kernel doesn't provide them, then the bind mount failure
will be ignored, and mountall in the container will proceed
without the mount since it is 'optional'.  But without these
bind mounts, starting a container inside a user namespace
hangs.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

6f259716

16 Jul, 2013 4 commits

clone: only update <rootfs>/etc/hostname if it exists · 8058be39

authored Jul 16, 2013

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

8058be39

Make get_ips timeout poll configurable · 819554fe

authored Jul 12, 2013

This commit increases the default timeout used by lxc-start-ephemeral
from 5 to 10, and adds support for an LXC_IP_TIMEOUT override.

Patchset 2:
  - Previous patch used a command line arg.
Signed-off-by: John McFarlane <john@rockfloat.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
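
An environment override with a default of 10 might look like the sketch below. `ip_timeout` is a hypothetical helper, not the code in lxc-start-ephemeral; falling back to the default for missing, non-numeric, or non-positive values is an assumption.

```python
import os

DEFAULT_IP_TIMEOUT = 10  # the new default named in the commit message


def ip_timeout(environ=None):
    """Return the timeout from LXC_IP_TIMEOUT, or the default if unusable."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LXC_IP_TIMEOUT")
    if raw is None:
        return DEFAULT_IP_TIMEOUT
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_IP_TIMEOUT
    return value if value > 0 else DEFAULT_IP_TIMEOUT
```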

819554fe

lxccontainer: don't define certain variables if !HAVE_GNUTLS · 52026772
Serge Hallyn authored Jul 16, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
52026772

userns: clear and save id_map (v2) · 27c27d73

authored Jul 15, 2013

Otherwise (a) there is a memory leak when using user namespaces and
clearing a config, and (b) saving a container configuration file doesn't
maintain the userns mapping.  For instance, if container c1 has
lxc.id_map configuration entries, then

python3
import lxc
c=lxc.Container("c1")
c.save_config("/tmp/config1")

should show 'lxc.id_map =' entries in /tmp/config1.

Changelog for v2:
   1. fix incorrect saving of group types (s/'c'/'g')
   2. fix typo -> idmap->type should be idmap->idtype
Reported-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Tested-by: Dwight Engen <dwight.engen@oracle.com>

27c27d73

15 Jul, 2013 1 commit

lxc_create: prepend pretty header to config file (v2) · 3ce74686

authored Jul 12, 2013

Define a sha1sum_file() function in utils.c.  Use that in lxcapi_create
to write out the sha1sum of the template being used.  If libgnutls is
not found, then the template sha1sum simply won't be printed into the
container config.

This patch also trivially fixes some cases where SYSERROR is used after
a fclose (masking errno) and missing consts in mkdir_p.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
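
For readers unfamiliar with the operation, hashing a file with SHA-1 can be sketched as below. This uses Python's `hashlib` instead of libgnutls and only borrows the name `sha1sum_file` from the commit message; the chunked read and the `chunk_size` parameter are choices made for the example.

```python
import hashlib


def sha1sum_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at path, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string is the kind of value that could be written into a header comment of the container config.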

3ce74686

12 Jul, 2013 4 commits

ubuntu-cloud template: accept --rootfs argument · 868a70af
Serge Hallyn authored Jul 12, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
868a70af
remove old lxc-create script. · 6a2e602b
Serge Hallyn authored Jul 12, 2013
```
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
```
6a2e602b

create: add a quiet flag · dc23c1c8

authored Jul 12, 2013

If set, then fds 0,1,2 will be redirected while the creation
template is executed.

Note, as Dwight has pointed out, if fd 0 is redirected, then if
templates ask for input there will be a problem.  We could simply
not redirect fd 0, or we could require that templates work without
interaction.  I'm assuming here that we want to do the latter, but
I'm open to changing that.
Reported-by: "S.Çağlar Onur" <caglar@10ur.org>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

dc23c1c8

lxc_clone.c: Allow size suffixes for -L parameter · ae13ae08

authored Jul 11, 2013

lxc-clone ignores size suffixes (K, M, G) when using -L parameter. The
following is a quick patch to allow, for example, lxc-clone -L 10G.
Signed-off-by: Norberto Bensa <nbensa@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
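
Suffix handling of this kind can be sketched as follows. This is a Python illustration, not the patch to lxc_clone.c: `parse_size` is a hypothetical helper, and treating K, M, and G as powers of 1024 and an unsuffixed number as a plain count are assumptions.

```python
def parse_size(text):
    """Convert strings such as '10G' or '512' into an integer size."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}  # assumed binary units
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].upper()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)
```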

ae13ae08

11 Jul, 2013 1 commit

Accommodate stricter devices cgroup rules · 283678ed

authored Jul 05, 2013

3.10 kernel comes with proper hierarchical enforcement of devices
cgroup.  To keep that code somewhat sane, certain things are not
allowed.  Switching from default-allow to default-deny and vice versa
is not allowed when there are child cgroups.  (This *could* be
simplified in the kernel by checking that all child cgroups are
unpopulated, but that has not yet been done and may be rejected)

The mountcgroup hook causes lxc-start to break with 3.10 kernels, because
you cannot write 'a' to devices.deny once you have a child cgroup.  With
this patch, (a) lxcpath is passed to hooks, (b) the cgroup mount hook sets
the container's devices cgroup, and (c) setup_cgroup() during lxc startup
ignores failures to write to devices subsystem if we are already in a
child of the container's new cgroup.

((a) is not really related to this bug, but is definitely needed.
The followup work of making the other hooks use the passed-in lxcpath
is still to be done)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

283678ed