1. 14 Aug, 2013 8 commits
    • lxc-attach: Completely rework lxc-attach and move to API function · 9c4693b8
      Christian Seiler authored
       - Move attach functionality to a completely new API function for
         attaching to containers. The API function accepts the name of the
         container, the lxcpath, and a structure of attach options, and
         returns the pid of the attached process. The calling thread may
         then use waitpid() or similar to wait for the attached process to
         finish. lxc-attach itself is just a simple wrapper around the new
         API function.
      
       - Use CLONE_PARENT when creating the attached process from the
         intermediate process. This allows the intermediate process to exit
         immediately after attach and the original thread may supervise the
         attached process directly.
      
       - Since the intermediate process exits quickly, its only job is to
         send the original process the pid of the attached process (as seen
         from outside the pidns) and exit. This allows us to simplify the
         synchronisation logic by quite a bit.
      
       - Use O_CLOEXEC / SOCK_CLOEXEC on (hopefully) all FDs opened in the
         main thread by the attach logic so that other threads of the same
         program may safely fork+exec off. Also, use shutdown() on the
         synchronisation socket, so that if another thread forks off without
         exec'ing, the synchronisation will not fail. (Not tested whether
         this solves this issue.)
      
       - Instead of directly specifying a program to execute on the API
         level, one specifies a callback function and a payload. This allows
         code using the API to execute a custom function directly inside the
         container without having to execute a program. Two default callbacks
         are provided directly, one to execute an arbitrary program, another
         to execute a shell. The lxc-attach utility always uses one of
         these default callbacks.
      
       - More fine-grained control of the attached process on the API level
         (not implemented in lxc-attach utility yet, some may not be sensible):
           * Specify which file descriptors should be stdin/stdout/stderr of
             the newly created process. If fds other than 0/1/2 are
             specified, they will be dup'd in the attached process (and the
             originals closed). This allows e.g. threaded applications to
             specify pipes for communication with the attached process
             without having to modify their own stdin/stdout/stderr before
             running lxc-attach.
           * Specify user and group id for the newly attached process.
           * Specify initial working directory for the newly attached
             process.
           * Fine-grained control on whether to do any, all or none of the
             following: move attached process into the container's init's
             cgroup, drop capabilities of the process, set the process's
             personality, load the proper apparmor profile and (for partial
             attaches that skip the mount namespace) whether to unshare the
             mount namespace and remount /sys and /proc. If additional
             features (SELinux policy, SMACK policy, ...) are implemented,
             flags for those may also be provided.
      Signed-off-by: Christian Seiler <christian@iwakd.de>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
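
The callback-plus-payload design and the stdin/stdout/stderr redirection described in this commit can be pictured with a small sketch. It is written in Python for brevity (LXC itself is C), `run_attached` is a hypothetical name, and entering the container's namespaces is deliberately left out. It shows only the shape of the interface: fork, wire up the standard streams, run the callback, and hand the pid back to the caller.

```python
import os

def run_attached(callback, payload, stdin_fd=0, stdout_fd=1, stderr_fd=2):
    """Run callback(payload) in a forked child and return the child's pid.

    Hypothetical illustration only: no namespaces are entered here.
    The caller is expected to waitpid() on the returned pid.
    """
    pid = os.fork()
    if pid != 0:
        return pid  # parent: report the pid of the new process

    # Child: default to failure unless the callback returns normally.
    status = 1
    try:
        wanted = {0: stdin_fd, 1: stdout_fd, 2: stderr_fd}
        # Duplicate any non-standard fds onto 0/1/2 ...
        for target, fd in wanted.items():
            if fd != target:
                os.dup2(fd, target)
        # ... then close the originals (each one only once).
        for fd in set(wanted.values()):
            if fd > 2:
                os.close(fd)
        status = int(callback(payload) or 0) & 0xFF
    finally:
        os._exit(status)  # never return into the parent's code path
```

A threaded caller could pass the write end of a pipe as `stdout_fd` and read the child's output from the other end without touching its own standard streams.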
    • lxc-stop: exit with 1 or 2, not -1 or -2. · b93aac46
      Serge Hallyn authored
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
    • cgroups: rework to handle nested containers with multiple and partial mounts · b98f7d6e
      Serge Hallyn authored
      Currently, if you create a container and use the mountcgroup hook,
      you get the /lxc/c1/c1.real cgroup mounted to /.  If you then try
      to start containers inside that container, lxc can get confused.
      This patch addresses that, by accepting that the cgroup as found
      in /proc/self/cgroup can be partially hidden by bind mounts.
      
      In this patch:
      
      Add optional 'lxc.cgroup.use' to /etc/lxc/lxc.conf to specify which
      mounted cgroup filesystems lxc should use.  So far only the cgroup
      creation respects this.
      
      Keep separate cgroup information for each cgroup mountpoint.  So if
      the caller is in devices cgroup /a but cpuset cgroup /b that should
      now be ok.
      
      Change how we decide whether to ignore failure to set devices cgroup
      settings.  Actually look to see if our current cgroup already has the
      settings.  If not, add them.
      
      Finally, the real reason for this patch: in a nested container,
      /proc/self/cgroup says nothing about where under /sys/fs/cgroup you
      might find yourself.  Handle this by searching for our pid in tasks
      files, and keep that info in the cgroup handler.
      
      Also remove all strdupa from cgroup.c (not android-friendly).
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
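
The "search for our pid in tasks files" idea from this commit can be sketched as follows. This is a Python illustration, not the C code in cgroup.c; `find_cgroup_for_pid` is a hypothetical helper, and it assumes a cgroup-v1-style mount where every cgroup directory contains a `tasks` file listing one pid per line.

```python
import os

def find_cgroup_for_pid(mount_root, pid):
    """Return the cgroup path (relative to mount_root) whose tasks file
    lists pid, or None if no tasks file mentions it.

    The root cgroup itself is reported as ".".
    """
    wanted = str(pid)
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        if "tasks" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "tasks")) as fh:
                # Compare whole lines so that pid 4 does not match 42.
                if any(line.strip() == wanted for line in fh):
                    return os.path.relpath(dirpath, mount_root)
        except OSError:
            continue  # unreadable tasks file: keep searching
    return None
```

Calling `find_cgroup_for_pid("/sys/fs/cgroup/devices", os.getpid())` would be the natural use inside a nested container, where /proc/self/cgroup does not describe the visible hierarchy.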
    • add lxc-user-nic · 20ab58c7
      Serge Hallyn authored
      It is meant to be run setuid-root to allow unprivileged users to
      tunnel veths from a host bridge to their containers.  The program
      looks at /etc/lxc/lxc-usernet which has entries of the form
      
      	user type bridge number
      
      The type currently must be veth.  Whenever lxc-user-nic creates a
      nic for a user, it records it in /var/lib/lxc/nics (better location
      is needed).  That way when a container dies lxc-user-nic can cull
      the dead nic from the list.
      
      The -DISTEST flag allows lxc-user-nic to be compiled so that it uses
      files under /tmp and doesn't actually create the nic, so that
      unprivileged users can compile and test the code.  lxc-test-usernic
      is a script which runs a few tests using lxc-usernic-test, which
      is a version of lxc-user-nic compiled with -DISTEST.
      
      The next step, after issues with this code are raised and addressed,
      is to have lxc-start, when running unprivileged, call out to
      lxc-user-nic (will have to exec so that setuid-root is honored).
      On top of my previous unprivileged-creation patchset, that should
      allow unprivileged users to create and start useful containers.
      
      Also update .gitignore.
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
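
To make the `user type bridge number` entry format concrete, here is a rough Python sketch of a parser for such a file. It is not the lxc-user-nic implementation: `parse_usernet` and `UsernetEntry` are hypothetical names, treating `#` as a comment marker is an assumption, and the meaning of `number` is not interpreted because the commit message does not spell it out.

```python
from collections import namedtuple

# One record per line of the form: user type bridge number
UsernetEntry = namedtuple("UsernetEntry", "user type bridge number")

def parse_usernet(text):
    """Parse the contents of an lxc-usernet style file into entries."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # '#' comments are an assumption
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        user, nic_type, bridge, number = fields
        if nic_type != "veth":
            # The commit message says the type currently must be veth.
            raise ValueError(f"line {lineno}: unsupported type {nic_type!r}")
        entries.append(UsernetEntry(user, nic_type, bridge, int(number)))
    return entries
```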
    • hooks/Makefile.am: add ubuntu-cloud-prep · 3fb18be9
      Serge Hallyn authored
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
  2. 13 Aug, 2013 2 commits
  3. 12 Aug, 2013 2 commits
  4. 09 Aug, 2013 4 commits
  5. 07 Aug, 2013 3 commits
    • Logging: don't confuse command line and config file specified values · b40a606e
      Serge Hallyn authored
      Currently, if loglevel/logfile are specified on the command line of a
      program using the LXC API, and that program calls
      container->save_config(), then the new config is saved with the
      loglevel/logfile given on the command line.  This is wrong, especially
      in the case of
      
      cat > lxc.conf << EOF
      lxc.logfile=a
      EOF
      
      lxc-create -t cirros -n c1 -o b
      
      which will result in a container config with lxc.logfile=b.
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
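
One way to picture the separation this commit describes is to track the two sources of the setting independently, so that the command-line value wins at run time but never leaks into the saved config. The Python sketch below is an illustration under that assumption, not LXC's actual log.c logic; `LogSettings` and its methods are hypothetical.

```python
class LogSettings:
    """Keep config-file and command-line logfile values apart."""

    def __init__(self, config_logfile=None):
        self.config_logfile = config_logfile  # value read from the config file
        self.cmdline_logfile = None           # value given with -o, if any

    def set_from_cmdline(self, path):
        self.cmdline_logfile = path

    def effective_logfile(self):
        # The command line overrides the config file while running.
        if self.cmdline_logfile is not None:
            return self.cmdline_logfile
        return self.config_logfile

    def save_config_lines(self):
        # Only the config-file value is ever written back out.
        if self.config_logfile is None:
            return []
        return [f"lxc.logfile={self.config_logfile}"]
```

With `lxc.logfile=a` in the config and `-o b` on the command line, logging would go to `b` while the saved config would still say `lxc.logfile=a`.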
    • lxc-clone: don't s/oldname/newname in the config file and hooks · 96532523
      Serge Hallyn authored
      1. container hooks should use lxcpath and lxcname from the environment.
      2. the utsname now gets separately updated
      3. the rootfs path gets updated by the bdev backend.
      4. the fstab mount targets should be relative
      5. the fstab source directories could be separately updated if needed.
      
      This leaves one definite bug: the lxc.logfile does not get updated.
      This made me wonder why it was in the configuration file to begin with.
      Digging deeper, I realized that whatever '-o outfile' you give
      lxc-create gets set in log.c and gets used by the lxc_container object
      we create at write_config().  So if you say
      	lxc-create -t cirros -n c1 -o /tmp/out1
      then /var/lib/lxc/c1/config will have lxc.logfile=/tmp/out1 - which is
      clearly wrong.  Therefore I leave fixing that for later.
      
      I'm looking for candidates for $p/$n expansion.  Note we can't expand
      these at config_utsname() etc, because then lxc-clone would see the
      expanded variable.  So we want to read $p/$n verbatim at config_*(),
      and expand them only when they are used.  lxc.logfile is an obvious
      good use case.  lxc.utsname can do it too, in case you want container
      c1 to be called "c1-whatever".  I'm not sure that's worth it though.
      Are there any others, or is that it?
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
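
The "store verbatim, expand on use" idea for $p/$n can be sketched in a few lines of Python. This is only an illustration of the proposal, which the commit message itself leaves open. `expand_vars` is a hypothetical helper, and reading $p as the lxcpath and $n as the container name is an assumption based on the surrounding text.

```python
import re

def expand_vars(value, lxcpath, name):
    """Expand $p and $n in a config value at the point of use.

    The stored value stays verbatim, so a clone that copies it still
    sees the unexpanded variables and gets its own expansion.
    """
    table = {"$p": lxcpath, "$n": name}
    # Single pass, so text inserted for $p is never re-scanned for $n.
    return re.sub(r"\$[pn]", lambda m: table[m.group(0)], value)

# The same verbatim value yields a different result per container.
stored = "$p/$n/$n.log"
print(expand_vars(stored, "/var/lib/lxc", "c1"))
print(expand_vars(stored, "/var/lib/lxc", "c2"))
```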
    • ubuntu-cloud: remove debugging echo · d273b8ab
      Serge Hallyn authored
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
  6. 26 Jul, 2013 1 commit
  7. 23 Jul, 2013 3 commits
  8. 22 Jul, 2013 5 commits
  9. 18 Jul, 2013 1 commit
  10. 17 Jul, 2013 1 commit
    • ubuntu templates: add some kernel filesystems to container fstab · 6f259716
      Serge Hallyn authored
      The debugfs, fusectl, and securityfs filesystems may not be mounted
      inside a non-init userns.  But mountall hangs waiting for them to be
      mounted.  So just pre-mount them using $lxcpath/$name/fstab as
      bind mounts, which will prevent mountall from trying to mount
      them.
      
      If the kernel doesn't provide them, then the bind mount failure
      will be ignored, and mountall in the container will proceed
      without the mount since it is 'optional'.  But without these
      bind mounts, starting a container inside a user namespace
      hangs.
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Stéphane Graber <stgraber@ubuntu.com>
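
As a rough picture of what such pre-mount entries might look like, the Python sketch below generates fstab-style bind-mount lines for the three filesystems. The mount points used are the conventional kernel locations. The exact line format, including the `bind,optional` options and the relative mount target, is an assumption for illustration and not copied from the template. `fstab_bind_lines` is a hypothetical helper.

```python
# Conventional mount points of the three kernel filesystems.
KERNEL_FILESYSTEMS = [
    "/sys/kernel/debug",         # debugfs
    "/sys/fs/fuse/connections",  # fusectl
    "/sys/kernel/security",      # securityfs
]

def fstab_bind_lines(paths=KERNEL_FILESYSTEMS):
    """Build one bind-mount fstab line per host path.

    The target is written relative to the container rootfs, and the
    mount is marked optional so a missing source is not fatal.
    """
    lines = []
    for path in paths:
        target = path.lstrip("/")
        lines.append(f"{path} {target} none bind,optional 0 0")
    return lines

for line in fstab_bind_lines():
    print(line)
```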
  11. 16 Jul, 2013 4 commits
  12. 15 Jul, 2013 1 commit
    • lxc_create: prepend pretty header to config file (v2) · 3ce74686
      Serge Hallyn authored
      Define a sha1sum_file() function in utils.c.  Use that in lxcapi_create
      to write out the sha1sum of the template being used.  If libgnutls is
      not found, then the template sha1sum simply won't be printed into the
      container config.
      
      This patch also trivially fixes some cases where SYSERROR is used after
      a fclose (masking errno) and missing consts in mkdir_p.
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
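
For readers unfamiliar with the operation, computing a file's SHA-1 digest looks roughly like this in Python. The commit's `sha1sum_file()` is a C function in utils.c; the version below only shares its name and purpose, and the chunk size is an arbitrary choice.

```python
import hashlib

def sha1sum_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at path.

    The file is read in chunks so large templates need not fit in memory.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```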
  13. 12 Jul, 2013 4 commits
  14. 11 Jul, 2013 1 commit
    • Accommodate stricter devices cgroup rules · 283678ed
      Serge Hallyn authored
      The 3.10 kernel comes with proper hierarchical enforcement of devices
      cgroup.  To keep that code somewhat sane, certain things are not
      allowed.  Switching from default-allow to default-deny, or vice versa,
      is not allowed when there are child cgroups.  (This *could* be
      simplified in the kernel by checking that all child cgroups are
      unpopulated, but that has not yet been done and may be rejected.)
      
      The mountcgroup hook causes lxc-start to break with 3.10 kernels, because
      you cannot write 'a' to devices.deny once you have a child cgroup.  With
      this patch, (a) lxcpath is passed to hooks, (b) the cgroup mount hook sets
      the container's devices cgroup, and (c) setup_cgroup() during lxc startup
      ignores failures to write to devices subsystem if we are already in a
      child of the container's new cgroup.
      
      ((a) is not really related to this bug, but is definitely needed.
      The follow-up work of making the other hooks use the passed-in lxcpath
      is still to be done.)
      Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
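
Point (c) of this commit, tolerating a failed devices write only when the caller already sits in a child of the container's cgroup, can be sketched as below. This is a Python illustration with hypothetical names (`in_child_cgroup`, `apply_devices_rule`); the real check lives in LXC's C setup_cgroup() path and may differ in detail.

```python
def in_child_cgroup(current, container):
    """True if cgroup path `current` is strictly below `container`."""
    cur = [part for part in current.split("/") if part]
    top = [part for part in container.split("/") if part]
    return len(cur) > len(top) and cur[:len(top)] == top

def apply_devices_rule(write_rule, current, container):
    """Call write_rule(); tolerate an OSError only inside a child cgroup.

    Returns True if the write succeeded and False if a failure was
    ignored; any other failure is re-raised.
    """
    try:
        write_rule()
        return True
    except OSError:
        if in_child_cgroup(current, container):
            return False
        raise
```

Comparing path components rather than raw string prefixes keeps `/lxc/c10` from being mistaken for a child of `/lxc/c1`.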