Container Breakout (Leaky Vessels) Affecting github.com/opencontainers/runc/libcontainer package, versions >=1.0.0-rc93 <1.1.12
- Snyk ID SNYK-GOLANG-GITHUBCOMOPENCONTAINERSRUNCLIBCONTAINER-6209334
- published 31 Jan 2024
- disclosed 12 Dec 2023
- credit Rory McNamara (Snyk Security Research), @lifubang (acmcoder), Aleksa Sarai (SUSE)
How to fix?
Upgrade github.com/opencontainers/runc/libcontainer to version 1.1.12 or higher.
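To check whether a given version falls inside the affected range (>=1.0.0-rc93 <1.1.12), a small comparison like the following may help. This is a minimal sketch, not a general semver implementation: it assumes versions shaped like X.Y.Z or X.Y.Z-rcN (with an optional leading "v"), and the names version, parseVersion, compare and affected are hypothetical helpers introduced only for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a simplified version: major.minor.patch plus an optional
// "rcN" pre-release number. rc == 0 means a final release.
type version struct {
	major, minor, patch, rc int
}

// parseVersion accepts only "X.Y.Z" and "X.Y.Z-rcN" (optionally prefixed
// with "v"). Anything else is rejected rather than guessed at.
func parseVersion(s string) (version, error) {
	var v version
	core, pre, hasPre := strings.Cut(strings.TrimPrefix(s, "v"), "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("unsupported version %q", s)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("unsupported version %q", s)
		}
		nums[i] = n
	}
	v.major, v.minor, v.patch = nums[0], nums[1], nums[2]
	if hasPre {
		if !strings.HasPrefix(pre, "rc") {
			return v, fmt.Errorf("unsupported pre-release in %q", s)
		}
		n, err := strconv.Atoi(strings.TrimPrefix(pre, "rc"))
		if err != nil || n < 1 {
			return v, fmt.Errorf("unsupported pre-release in %q", s)
		}
		v.rc = n
	}
	return v, nil
}

// compare returns a negative, zero, or positive number when a sorts
// before, equal to, or after b.
func compare(a, b version) int {
	for _, d := range []int{a.major - b.major, a.minor - b.minor, a.patch - b.patch} {
		if d != 0 {
			return d
		}
	}
	// Same core version: a final release (rc == 0) sorts after every rc.
	if a.rc == b.rc {
		return 0
	}
	if a.rc == 0 {
		return 1
	}
	if b.rc == 0 {
		return -1
	}
	return a.rc - b.rc
}

// affected reports whether s lies in the range >=1.0.0-rc93 <1.1.12.
func affected(s string) (bool, error) {
	v, err := parseVersion(s)
	if err != nil {
		return false, err
	}
	lower := version{1, 0, 0, 93}
	fixed := version{1, 1, 12, 0}
	return compare(v, lower) >= 0 && compare(v, fixed) < 0, nil
}

func main() {
	for _, s := range []string{"1.0.0-rc92", "1.0.0-rc93", "1.1.11", "1.1.12"} {
		ok, err := affected(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-12s affected=%t\n", s, ok)
	}
}
```

For anything beyond this narrow version shape, a proper semver library would be the safer choice.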
github.com/opencontainers/runc/libcontainer is a package for a modern container runtime.
Affected versions of this package are vulnerable to Container Breakout (Leaky Vessels). Due to certain leaked file descriptors, an attacker could cause a newly-spawned container process (from runc exec) to have a working directory in the host filesystem namespace, allowing for a container escape by giving access to the host filesystem ("attack 2"). The same attack could be used by a malicious image to allow a container process to gain access to the host filesystem through runc run ("attack 1"). Variants of attacks 1 and 2 could also be used to overwrite semi-arbitrary host binaries, allowing for complete container escapes ("attack 3a" and "attack 3b").
Attack 1: process.cwd "mis-configuration"
Several file descriptors are inadvertently leaked into runc init, including a handle to the host's /sys/fs/cgroup (this leak was added in v1.0.0-rc93). If the container was configured to have process.cwd set to /proc/self/fd/7/ (the actual fd can change depending on file opening order in runc), the resulting pid1 process will have a working directory in the host mount namespace, and thus the spawned process can access the entire host filesystem. This alone is not an exploit against runc; however, a malicious image could make any innocuous-looking non-/ path a symlink to /proc/self/fd/7/ and thus trick a user into starting a container whose binary has access to the host filesystem.
Furthermore, runc does not verify that the final working directory is inside the container's mount namespace after calling chdir(2) (as we have already joined the container namespace, it is incorrectly assumed there is no way to chdir outside the container after
Note: This attack requires a privileged user to be tricked into running a malicious container image. When using higher-level runtimes (such as Docker or Kubernetes), this exploit can be considered critical, as it can be done remotely by anyone with the rights to start a container image (and can be exploited from within Dockerfiles using ONBUILD in the case of Docker).
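The missing verification described for attack 1 (confirming after chdir(2) that the working directory is still reachable from the process root) could look roughly like the sketch below. It relies on documented Linux behaviour: getcwd(2) prefixes the returned path with "(unreachable)" when the current directory is not below the process root. Some Go versions surface that case from syscall.Getwd as ENOENT instead, so the sketch treats both as a failure. This is an illustration under those assumptions, not the patch shipped in runc 1.1.12; checkCwd and errCwdOutsideRoot are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// errCwdOutsideRoot is a hypothetical sentinel for this sketch.
var errCwdOutsideRoot = errors.New("working directory is not reachable from the current root")

// checkCwd interprets the result of a getcwd-style call. A non-absolute
// path such as "(unreachable)/..." or an ENOENT error is treated as a sign
// that the working directory lies outside the process root.
func checkCwd(wd string, err error) error {
	switch {
	case errors.Is(err, syscall.ENOENT):
		return errCwdOutsideRoot
	case err != nil:
		return fmt.Errorf("get working directory: %w", err)
	case !filepath.IsAbs(wd):
		return errCwdOutsideRoot
	}
	return nil
}

func main() {
	// syscall.Getwd asks the kernel directly instead of trusting $PWD.
	wd, getErr := syscall.Getwd()
	if err := checkCwd(wd, getErr); err != nil {
		fmt.Println("refusing to continue:", err)
		os.Exit(1)
	}
	fmt.Println("working directory is reachable:", wd)
}
```

A check of this kind only helps if it runs after the final chdir(2) and before the container binary is executed.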
Attack 2: runc exec container breakout
(This is a modification of attack 1, constructed to allow for a process inside a container to break out.)
The same fd leak and lack of verification of the working directory in attack 1 also apply to runc exec. If a malicious process inside the container knows that some administrative process will call runc exec with the --cwd argument and a given path, in most cases they can replace that path with a symlink to /proc/self/fd/7/. Once the container process has executed the container binary, PR_SET_DUMPABLE protections no longer apply and the attacker can open /proc/$exec_pid/cwd to get access to the host filesystem.
runc exec defaults to a cwd of / (which cannot be replaced with a symlink), so this attack depends on the attacker getting a user (or some administrative process) to use --cwd and figuring out what path the target working directory is. Note that if the target working directory is a parent of the program binary being executed, the attacker might be unable to replace the path with a symlink (the execve will fail in most cases, unless the host filesystem layout specifically matches the container layout in specific ways and the attacker knows which binary the runc exec is executing).
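The symlink swap that attacks 1 and 2 rely on can be illustrated with a small detector that inspects a candidate working directory without following it. This is only a sketch under stated assumptions: symlinkIntoProc and demo are hypothetical helpers, the check looks at the final path component only (intermediate symlinks are not examined), and a check performed before runc exec is inherently racy because the container process can swap the path afterwards. It is not a substitute for upgrading.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// symlinkIntoProc reports whether path itself is a symlink whose target,
// after lexical cleaning, is /proc or lies under /proc/. The link is read
// with Lstat/Readlink and never followed.
func symlinkIntoProc(path string) (bool, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return false, err
	}
	if info.Mode()&os.ModeSymlink == 0 {
		return false, nil // not a symlink at all
	}
	target, err := os.Readlink(path)
	if err != nil {
		return false, err
	}
	if !filepath.IsAbs(target) {
		// Relative targets are resolved against the link's directory.
		target = filepath.Join(filepath.Dir(path), target)
	}
	target = filepath.Clean(target)
	return target == "/proc" || strings.HasPrefix(target, "/proc/"), nil
}

// demo builds a throwaway directory containing one suspicious symlink and
// one ordinary directory, and returns how each is classified.
func demo() (bool, bool, error) {
	tmp, err := os.MkdirTemp("", "cwd-check")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(tmp)

	link := filepath.Join(tmp, "workdir")
	// A dangling target is fine here: the link is inspected, not followed.
	if err := os.Symlink("/proc/self/fd/7/", link); err != nil {
		return false, false, err
	}
	plain := filepath.Join(tmp, "plain")
	if err := os.Mkdir(plain, 0o755); err != nil {
		return false, false, err
	}

	linkFlagged, err := symlinkIntoProc(link)
	if err != nil {
		return false, false, err
	}
	dirFlagged, err := symlinkIntoProc(plain)
	return linkFlagged, dirFlagged, err
}

func main() {
	linkFlagged, dirFlagged, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("symlink to /proc/self/fd/7/ flagged:", linkFlagged)
	fmt.Println("ordinary directory flagged:", dirFlagged)
}
```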
Attacks 3a and 3b: process.args host binary overwrite attack
(These are modifications of attacks 1 and 2, constructed to overwrite a host binary by using execve to bring a magic-link reference into the container.)
Attacks 1 and 2 can be adapted to overwrite a host binary by using a path like /proc/self/fd/7/../../../bin/bash as the process.args binary argument, causing a host binary to be executed by a container process. The /proc/$pid/exe handle can then be used to overwrite the host binary, as seen in CVE-2019-5736 (note that the same #! trick can be used to avoid detection as an attacker). As the overwritten binary could be something like /bin/bash, as soon as a privileged user executes the target binary on the host, the attacker can pivot to gain full access to the host.
Attack 3a is attack 1 but adapted to overwrite a host binary, where a malicious image is set up to execute /proc/self/fd/7/../../../bin/bash and run a shell script that overwrites /proc/self/exe, overwriting the host copy of
Attack 3b is attack 2 but adapted to overwrite a host binary, where the malicious container process overwrites all of the possible runc exec target binaries inside the container (such as /bin/bash) such that a host target binary is executed and then the container process opens /proc/$pid/exe to get access to the host binary and overwrite it.
As mentioned above, 3b is more dangerous than 3a in practice as it doesn't require a user to run a malicious image.
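A path such as /proc/self/fd/7/../../../bin/bash is easy to misjudge because lexical normalisation and kernel resolution disagree: the kernel follows the fd magic link first and only then applies the ".." components, whereas a purely lexical clean-up collapses the same string to /proc/bin/bash and hides the fd reference. The sketch below shows a heuristic that scans the raw string instead. The helpers isProcEntry, referencesProcFd and hiddenByClean are hypothetical names for this illustration, and the heuristic cannot see fd references hidden behind symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isProcEntry reports whether s could name a per-process directory under
// /proc: "self", "thread-self", or a numeric pid.
func isProcEntry(s string) bool {
	if s == "self" || s == "thread-self" {
		return true
	}
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// referencesProcFd scans the raw path, without resolving "..", for the
// component sequence proc/<entry>/fd. Empty and "." components are skipped
// so that "//" or "/./" cannot be used to break up the sequence.
func referencesProcFd(raw string) bool {
	var parts []string
	for _, p := range strings.Split(raw, "/") {
		if p != "" && p != "." {
			parts = append(parts, p)
		}
	}
	for i := 0; i+2 < len(parts); i++ {
		if parts[i] == "proc" && isProcEntry(parts[i+1]) && parts[i+2] == "fd" {
			return true
		}
	}
	return false
}

// hiddenByClean reports whether lexical cleaning would erase an fd
// reference that is present in the raw path.
func hiddenByClean(raw string) bool {
	return referencesProcFd(raw) && !referencesProcFd(filepath.Clean(raw))
}

func main() {
	raw := "/proc/self/fd/7/../../../bin/bash"
	fmt.Println("raw path:        ", raw)
	fmt.Println("lexically clean: ", filepath.Clean(raw))
	fmt.Println("raw flagged:     ", referencesProcFd(raw))
	fmt.Println("hidden by clean: ", hiddenByClean(raw))
}
```

The design point is simply that any such filter has to run on the path exactly as supplied, before normalisation.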
For attacks 1 and 2, only permit containers (and runc exec) to use a working directory of /. It is not possible for / to be replaced with a symlink (the path is resolved from within the container's mount namespace, and you cannot change the root of a mount namespace or an fs root to a symlink).
For attacks 1 and 3a, only permit users to run trusted images.
For attack 3b, there is no practical workaround other than never using runc exec because any binary executed with runc exec could end up being a malicious binary target.
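The first workaround (only allowing a working directory of /) could be enforced with a simple policy check on the container's OCI config.json before it is handed to the runtime. The following is a minimal sketch: it reads only the process.cwd field defined by the OCI runtime specification, and the names ociConfig and cwdIsRoot are hypothetical. It does not cover a working directory passed separately via runc exec --cwd, which would need its own enforcement.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ociConfig mirrors only the config.json field this sketch reads.
type ociConfig struct {
	Process *struct {
		Cwd string `json:"cwd"`
	} `json:"process"`
}

// cwdIsRoot reports whether process.cwd is exactly "/". The comparison is
// deliberately strict: equivalents such as "//" or "/." are rejected too.
func cwdIsRoot(configJSON []byte) (bool, error) {
	var cfg ociConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return false, fmt.Errorf("parse config: %w", err)
	}
	if cfg.Process == nil {
		return false, errors.New("config has no process section")
	}
	return cfg.Process.Cwd == "/", nil
}

func main() {
	configs := []string{
		`{"process": {"cwd": "/"}}`,
		`{"process": {"cwd": "/srv/app"}}`,
	}
	for _, c := range configs {
		ok, err := cwdIsRoot([]byte(c))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s permitted=%t\n", c, ok)
	}
}
```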
This vulnerability was discovered and responsibly disclosed as part of the Leaky Vessels project.
ChangeLog: 2024-02-05 - 1.0.0-rc93 was added as a lower bound to the