1. 19 Feb, 2021 12 commits
    • bpf: update device cgroup semantics · 0ede3725
      Christian Brauner authored
      LXC has supported the bpf device controller for a while now. A bpf device
      program can be attached to the container's cgroup if this is a pure cgroup2
      host.
      
      The format for specifying device rules for the cgroup2 bpf device controller is
      the same as for the legacy cgroup device controller; only the configuration key
      prefix has to change. Specifically, device rules for the legacy cgroup device
      controller are specified via lxc.cgroup.devices.{allow,deny} whereas for the
      cgroup2 bpf device controller lxc.cgroup2.devices.{allow,deny} must be used.
      
      The following semantics apply:
      1. The device rule "lxc.cgroup2.devices.deny = a" will cause LXC to instruct
         the kernel to block access to all devices by default. To grant access to
         devices "allow device rules" must be added via the
         "lxc.cgroup2.devices.allow" key. This is referred to as an "allowlist" device
         program.
      2. The device rule "lxc.cgroup2.devices.allow = a" will cause LXC to instruct
         the kernel to allow access to all devices by default. To deny access to
         devices "deny device rules" must be added via the "lxc.cgroup2.devices.deny"
         key. This is referred to as a "denylist" device program.
      3. Specifying a rule as explained in 1. or 2. will cause all previous rules to
         be cleared, i.e. the device list will be reset.
      
      For example the set of rules:
      
      lxc.cgroup2.devices.deny = a
      lxc.cgroup2.devices.allow = c *:* m
      lxc.cgroup2.devices.allow = b *:* m
      lxc.cgroup2.devices.allow = c 1:3 rwm
      
      implements an "allowlist" device program, i.e. the kernel will block access to
      all devices not specifically allowed in this list. This particular program
      states that all character and block devices might be created but only /dev/null
      might be read or written.
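
To illustrate how such an allowlist is read, here is a minimal Python sketch. It is a simplified model of the matching rules described above, not LXC's C implementation or the generated bpf program; the names DeviceRule, parse_rule, rule_matches and allowlist_permits are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceRule:
    dev_type: str          # "c" (character), "b" (block) or "a" (all)
    major: Optional[int]   # None stands for the "*" wildcard
    minor: Optional[int]   # None stands for the "*" wildcard
    access: frozenset      # subset of {"r", "w", "m"}


def parse_rule(value: str) -> DeviceRule:
    """Parse a rule value such as "c 1:3 rwm" or "b *:* m"."""
    dev_type, numbers, access = value.split()
    major, minor = numbers.split(":")
    return DeviceRule(
        dev_type,
        None if major == "*" else int(major),
        None if minor == "*" else int(minor),
        frozenset(access),
    )


def rule_matches(rule, dev_type, major, minor, access):
    """True if the rule covers this device and every requested access bit."""
    return (
        rule.dev_type in ("a", dev_type)
        and rule.major in (None, major)
        and rule.minor in (None, minor)
        and set(access) <= rule.access
    )


def allowlist_permits(rules, dev_type, major, minor, access):
    """Allowlist model (assumed): deny by default, permit if any rule matches."""
    return any(rule_matches(r, dev_type, major, minor, access) for r in rules)


# The allow rules from the example above (they follow "deny = a").
ALLOW_RULES = [parse_rule(v) for v in ("c *:* m", "b *:* m", "c 1:3 rwm")]
```

Under this model, allowlist_permits(ALLOW_RULES, "c", 1, 3, "rw") is true because /dev/null is character device 1:3, while a read on any other device, for example ("c", 1, 5, "r"), is refused.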
      
      If we switch the set of rules to:
      
      lxc.cgroup2.devices.allow = a
      lxc.cgroup2.devices.deny = c *:* m
      lxc.cgroup2.devices.deny = b *:* m
      lxc.cgroup2.devices.deny = c 1:3 rwm
      
      then LXC would instruct the kernel to implement a "denylist", i.e. the kernel
      will allow access to all devices not specifically denied in this list. This
      particular program states that no character devices or block devices might be
      created and that /dev/null is not allowed to be read, written, or
      created.
      
      Consider the same program but followed by a rule as explained in 1. or 2.:
      
      lxc.cgroup2.devices.allow = a
      lxc.cgroup2.devices.deny = c *:* m
      lxc.cgroup2.devices.deny = b *:* m
      lxc.cgroup2.devices.deny = c 1:3 rwm
      lxc.cgroup2.devices.allow = a
      
      The last line will cause LXC to reset the device list without changing the type
      of device program. If the last line is "lxc.cgroup2.devices.deny = a" instead:
      
      lxc.cgroup2.devices.allow = a
      lxc.cgroup2.devices.deny = c *:* m
      lxc.cgroup2.devices.deny = b *:* m
      lxc.cgroup2.devices.deny = c 1:3 rwm
      lxc.cgroup2.devices.deny = a
      
      The last line will cause LXC to reset the device list and switch from a
      "denylist" program to an "allowlist" program.
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
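
The reset semantics from points 1. to 3. can be sketched as follows. This is an illustrative Python model under the assumption that rules are processed in configuration order; it is not LXC's actual code, and build_device_program is a hypothetical name.

```python
def build_device_program(config_lines):
    """Return (program_type, rules) after applying the lines in order.

    program_type is "allowlist", "denylist", or None if no global rule
    ("... = a") has been seen yet. rules is a list of (action, value) pairs.
    """
    program_type = None
    rules = []
    for line in config_lines:
        key, _, value = line.partition("=")
        action = key.strip().rsplit(".", 1)[-1]  # "allow" or "deny"
        value = value.strip()
        if value == "a":
            # Global rule: "deny = a" selects an allowlist, "allow = a" a
            # denylist. Either way, all previously collected rules are dropped.
            program_type = "allowlist" if action == "deny" else "denylist"
            rules = []
        else:
            rules.append((action, value))
    return program_type, rules
```

In this model, any sequence of rules that ends in "lxc.cgroup2.devices.allow = a" yields an empty denylist, and any sequence that ends in "lxc.cgroup2.devices.deny = a" yields an empty allowlist, regardless of what came before.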
    • bpf: fix typos · 15970277
      Christian Brauner authored
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    • Merge pull request #3686 from cyphar/apparmor-attr-subdir · f43ed6a0
      Christian Brauner authored
      apparmor: prefer /proc/.../attr/apparmor/current over legacy interface
    • apparmor: prefer /proc/.../attr/apparmor/current over legacy interface · 47f4914d
      Aleksa Sarai authored
      It turns out that since Linux 5.1 there are now per-LSM subdirectories
      for major LSMs, which users are recommended to use over the "legacy"
      top-level /proc/$pid/attr/... files[1]:
      
      > Process attributes associated with “major” security modules should be
      > accessed and maintained using the special files in /proc/.../attr. A
      > security module may maintain a module specific subdirectory there,
      > named after the module. /proc/.../attr/smack is provided by the Smack
      > security module and contains all its special files. The files directly
      > in /proc/.../attr remain as legacy interfaces for modules that provide
      > subdirectories.
      
      AppArmor has had such a directory since Linux 5.8[2], and it turns out
      that with certain CONFIG_LSM configurations you can end up with AppArmor
      files not being accessible from the legacy interface. Arch Linux
      recently added BPF as one of the enabled LSMs in their configuration, and
      this broke runc[3] and LXC.
      
      The solution is to first try to use /proc/$pid/attr/apparmor/current and
      fall back to /proc/$pid/attr/current if the former is not available.
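
The lookup order can be sketched like this. It is a minimal Python illustration, not the C code in LXC; read_apparmor_label and its proc_root parameter are hypothetical, and treating only a missing file as "not available" is an assumption.

```python
from pathlib import Path


def read_apparmor_label(pid="self", proc_root="/proc"):
    """Return the AppArmor label for pid, or None if neither file exists.

    Tries the per-LSM file first and falls back to the legacy top-level one.
    """
    attr_dir = Path(proc_root) / str(pid) / "attr"
    candidates = (
        attr_dir / "apparmor" / "current",  # per-LSM subdirectory
        attr_dir / "current",               # legacy interface
    )
    for candidate in candidates:
        try:
            return candidate.read_text().strip()
        except FileNotFoundError:
            # Assumption: a missing file means "not available"; try the next.
            continue
    return None
```

The proc_root parameter exists only so the sketch can be exercised against a scratch directory instead of the real /proc.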
      
      [1]: https://www.kernel.org/doc/html/latest/admin-guide/LSM/index.html
      [2]: Linux 5.8; commit 6413f852ce08 ("apparmor: add proc subdir to attrs")
      [3]: https://github.com/opencontainers/runc/issues/2801
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    • apparmor: clean up apparmor_process_label_get · 301a5f8e
      Aleksa Sarai authored
      Rather than open-coding file reading and retry semantics and
      implementing the path generation logic separately to
      apparmor_process_label_fd_get, refactor the logic so that it looks
      closer to the pidfd version.
      
      This will make it easier to implement the two-step handling for
      /proc/self/attr/apparmor/current and makes this code slightly less
      confusing.
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
  2. 18 Feb, 2021 28 commits