By default, container runtimes go to great lengths to shield a container from the host system. Running in --privileged mode disables or bypasses most of these checks.
This means that if you are root in a privileged container, you effectively have the privileges of root on the host system. It is only meant for special cases, such as running Docker in Docker, and should otherwise be avoided.
In this lesson, we will show you why running containers in privileged mode is a bad idea. First, we will show that root in a privileged container has access to the host filesystem, kernel settings, and processes. Next, we will explain alternative configuration options for when you need more access than usual. Finally, we will teach you how to configure your Kubernetes code to prevent containers from running with such settings.
We know it's tempting, but in almost all cases privileged mode is a matter of laziness. Let's see why using it is a bad idea!
The mindset of giving only the privileges needed to complete a task is called the Principle of Least Privilege.
After some careful crafting of a URL request, we manage to get access to a root shell on a remote system (how that's done is left for another lesson). We get really excited and start to explore what we can do.
After we get access, we first check if we are inside of a container by running:
We see the word docker in there so we can confirm we are in a container.
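The exact command is not shown above. One common check, sketched here under that assumption, is to look at the cgroup listing of PID 1 and search it for a container runtime name. The helper name looks_like_container and the pattern list are illustrative, not part of the lesson.

```shell
# Hypothetical helper: succeed if a cgroup listing mentions a container runtime.
# Defaults to /proc/1/cgroup; pass another path to inspect a saved copy.
looks_like_container() {
    cgroup_file="${1:-/proc/1/cgroup}"
    # The runtime names below are an assumed, non-exhaustive list.
    grep -qE 'docker|kubepods|containerd' "$cgroup_file" 2>/dev/null
}
```

Running `cat /proc/1/cgroup` by hand shows the same information. Treat the result as a heuristic: on hosts using cgroup v2, the listing inside a container may contain only `0::/`, in which case this check reports nothing.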
Enabling privileged mode (--privileged) has the following effects, as per the official Docker documentation: The --privileged flag gives all capabilities to the container, and it also lifts all the limitations enforced by the device cgroup controller. In other words, the container can then do almost everything that the host can do. This flag exists to allow special use-cases, like running Docker within Docker.
You can find more detail in this blog post about Docker in Docker by Docker Captain Jérôme Petazzoni (jpetazzo), who submitted this feature.
The result of bypassing all these checks is that a user root inside of a container will have the same access as root on the host system. So we definitely want to avoid running as root in a container that is running in privileged mode.
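One way to observe this from inside a container is to read the effective capability mask from /proc/self/status and test individual bits. The following is a minimal sketch: the helper names cap_eff and cap_is_set are hypothetical, and the bit numbers are taken from linux/capability.h (CAP_SYS_ADMIN is 21, CAP_NET_BIND_SERVICE is 10).

```shell
# Print the effective capability mask (hex) from a /proc/<pid>/status file.
cap_eff() {
    awk '/^CapEff:/ { print $2 }' "${1:-/proc/self/status}"
}

# Succeed if capability number $2 is set in the hex mask $1.
cap_is_set() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}
```

For example, `cap_is_set "$(cap_eff)" 21 && echo "CAP_SYS_ADMIN present"`. A default Docker container typically reports a mask such as 00000000a80425fb, in which bit 21 is clear, whereas a privileged container reports a full mask such as 000001ffffffffff. The exact values depend on the kernel and runtime versions.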
When a container is given privileged mode, it receives all permissions the host has. When running as a regular user this might still be limited, but as root this means having control over the complete host system. Therefore, the first mitigation is to avoid running as root in the first place.
root (id = 0) is the default user within a container. The image developer can create additional users. Those users are accessible by name. When passing a numeric ID, the user does not have to exist in the container.
We can set a default user to run the first process with the Dockerfile USER instruction.
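As a rough illustration, the helper below writes a small Dockerfile that creates an unprivileged user and switches to it with USER. The function name write_nonroot_dockerfile, the user name app, and UID 1001 are assumptions made for this sketch.

```shell
# Hypothetical helper: write a Dockerfile that runs its process as a non-root user.
write_nonroot_dockerfile() {
    cat > "${1:-Dockerfile}" <<'EOF'
FROM ubuntu:latest
# Create an unprivileged user (name and UID are illustrative).
RUN useradd --uid 1001 --create-home app
# Everything from here on, including the container's first process, runs as "app".
USER app
CMD ["id"]
EOF
}
```

After calling `write_nonroot_dockerfile`, building and running the image (for instance with `docker build -t nonroot-demo .` followed by `docker run --rm nonroot-demo`) should report UID 1001 rather than 0.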
But sometimes we don't have control over the image and we download a container image that by default runs as root (for example Ubuntu).
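When starting a container, we can override the USER instruction by passing the -u (--user) flag:
$ docker run -u 1001 -it ubuntu:latest /bin/bash
I have no name!@94f1ded5e4ac:/$ id -a
uid=1001 gid=0(root) groups=0(root)
The prompt reads "I have no name!" because UID 1001 has no entry in the container's /etc/passwd. More importantly, the process no longer runs as root, although its group is still root (gid=0) because only a user ID was passed.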
Even when a container is started as a non-root user, that user might still become root, for example through sudo or another setuid binary, if the image allows it.
We can block this behavior by using the --security-opt no-new-privileges flag when starting the container.
docker container run --rm -it \
  --user 1001:1001 \
  --security-opt no-new-privileges \
  mycontainer
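To confirm that the option took effect, a process inside the container can look at the NoNewPrivs field in /proc/self/status, which recent Linux kernels expose. This is a small sketch; the helper name no_new_privs_enabled is hypothetical.

```shell
# Hypothetical helper: succeed if the status file reports the no_new_privs bit as set.
no_new_privs_enabled() {
    [ "$(awk '/^NoNewPrivs:/ { print $2 }' "${1:-/proc/self/status}")" = "1" ]
}
```

Inside a container started with the flag above, `no_new_privs_enabled && echo "privilege escalation blocked"` should print the message. Without the flag, the field is normally 0.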
The user specified on the CLI or in the Dockerfile refers to the user running inside the container. As a container admin, you can change which host users containers run as by using the userns-remap feature. This is transparent to the container: it still thinks it is running as root, but in reality it runs as a different, unprivileged user on the host.
The explanation of this is out of scope; see the Docker documentation on using userns-remap.
Rootless mode executes the Docker daemon and containers inside a user namespace. This is very similar to userns-remap mode, except that with userns-remap mode, the daemon itself is running with root privileges, whereas in rootless mode, both the daemon and the container are running without root privileges.
The explanation of this is out of scope; see the Docker documentation for more information on running Docker in rootless mode.
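Both userns-remap and rootless mode rely on user namespaces, and the resulting mapping can be inspected in /proc/self/uid_map, whose lines have the form "ID inside, ID outside, range length". The sketch below prints the outside ID that UID 0 maps to. The helper name host_uid_for_root is hypothetical.

```shell
# Hypothetical helper: print the outside UID that UID 0 maps to in a uid_map file.
host_uid_for_root() {
    awk '$1 == 0 { print $2; exit }' "${1:-/proc/self/uid_map}"
}
```

A result of 0 means root in the container is also root on the host. Any other value, such as a subordinate ID assigned by userns-remap, means root inside the container is an unprivileged user outside of it.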
It's really tempting to give a container privileged mode, because this takes away the pain of having to find out which privileges you really need. It's the chmod 777 of containers. As we've shown in the In Action section, this is really a bad idea. We encourage you to look at the documentation of the container to find out which privileges it really needs.
In the next parts, we describe a few of the most common cases where we see people resort to privileged mode when a more specific permission would have sufficed. You can check all the options for the Docker runtime on its documentation page.
When a container needs access to a device on the host, it's tempting to add privileged mode, yet there is a better solution:
To share devices from the host with the container, such as disks or serial ports, there is a specific docker run flag, --device:
$ docker run --device=/dev/sda:/dev/xvdc --rm -it ubuntu fdisk /dev/xvdc
Another common case is when you want to start a web server and it says it can't bind to a low TCP port (say, port 80 for HTTP). If a server needs to listen on a port below 1024, we can use Linux capabilities to grant exactly that ability.
Your error might look like:
(13)Permission denied: AH00072: make_sock: could not bind to address [::]:80
Then you can add the NET_BIND_SERVICE capability to the container instead of enabling privileged mode:
$ docker run -it --rm --cap-drop=ALL --cap-add=NET_BIND_SERVICE php:apache
For more information on using Linux Capabilities see our other lesson about configuring Container and Linux Capabilities
Please note that the bind behavior changed in Docker 20.10, which allows containers to bind to low ports by default. So if you see this error, it is best to check and, if needed, upgrade the container runtime version.
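The kernel setting involved here is net.ipv4.ip_unprivileged_port_start: binding to a port below its value requires CAP_NET_BIND_SERVICE. As far as we know, newer Docker versions set it to 0 inside the container, which is why the error disappears. The sketch below checks a port against that threshold. The helper name port_is_privileged is hypothetical, and the fallback of 1024 when the setting cannot be read is an assumption.

```shell
# Hypothetical helper: succeed if binding port $1 needs CAP_NET_BIND_SERVICE.
# $2 optionally overrides the threshold; otherwise it is read from the kernel.
port_is_privileged() {
    port="$1"
    start="$2"
    if [ -z "$start" ]; then
        # Assumed fallback: 1024, the traditional limit, if the sysctl is unreadable.
        start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)
    fi
    [ "$port" -lt "$start" ]
}
```

For example, `port_is_privileged 80 || echo "port 80 can be bound without extra capabilities"` tells you whether adding NET_BIND_SERVICE is needed at all in the current environment.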
Trust us, we've been there: seccomp is not always easy. Sometimes you think you've set up all the permissions a container needs, but there is still some permission issue. If the problem is access to files on a mounted volume, note that besides the ro/rw flags, the volume mount option also supports a :z or :Z suffix. Strictly speaking, this relabels the mounted content for SELinux rather than relaxing seccomp, but it resolves a similar class of permission errors. Matt Jarvis has written a more detailed blog on privileged mode and seccomp.
There is a known issue with older versions of libseccomp and newer versions of the Docker runtime on Alpine systems/Raspbian. This manifests itself, for example, by not being able to run apt update as root in a container. The solution is to update libseccomp to a newer version; sometimes this is not available through your OS updates, so you have to update manually. See https://docs.linuxserver.io/faq for more details.
Even the developers of the popular Home Assistant automation package added --privileged to the documentation as the recommended way to run their container. As described in this issue, https://github.com/home-assistant/core/issues/52647, it was not really needed.
As a Kubernetes admin, you can configure the cluster to enforce securityContext settings. Kubernetes has the PodSecurityPolicy admission controller built in, which allows you to enforce these settings.
More specifically, you can require that pods (a) do not use privileged mode, (b) run as non-root, and (c) disallow any escalation of privilege inside a container.
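On the workload side, these three requirements correspond to fields of the container securityContext. The helper below writes an example Pod manifest that would satisfy such a policy. The function name write_restricted_pod, the pod name restricted-demo, the image name, and UID 1001 are placeholders chosen for this sketch.

```shell
# Hypothetical helper: write a Pod manifest with a locked-down securityContext.
write_restricted_pod() {
    cat > "${1:-pod.yaml}" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo
spec:
  containers:
    - name: app
      image: mycontainer
      securityContext:
        privileged: false                # (a) no privileged mode
        runAsNonRoot: true               # (b) refuse to start as UID 0
        runAsUser: 1001
        allowPrivilegeEscalation: false  # (c) no setuid-style escalation
        capabilities:
          drop: ["ALL"]
EOF
}
```

The generated file can then be applied with `kubectl apply -f pod.yaml`. Dropping all capabilities goes one step further than the three requirements and can be relaxed with an explicit add list when a workload needs it.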
Please note that PodSecurityPolicy has been deprecated in the v1.21 release and is scheduled for removal in v1.25.
It will eventually be replaced by the new Pod Security Admission feature; however, as of v1.22, it is still in alpha status. Good alternatives are the externally maintained projects Open Policy Agent or Kyverno.
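With Pod Security Admission, enforcement is configured through namespace labels instead of a policy object. A minimal sketch, assuming the feature is enabled on the cluster: the helper name write_restricted_namespace and the namespace name restricted-demo are placeholders, while the label key and the restricted level come from the Pod Security Standards.

```shell
# Hypothetical helper: write a Namespace manifest that enforces the "restricted" level.
write_restricted_namespace() {
    cat > "${1:-namespace.yaml}" <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: restricted-demo
  labels:
    pod-security.kubernetes.io/enforce: restricted
EOF
}
```

Pods created in such a namespace are rejected if they request privileged mode, run as root, or allow privilege escalation.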
To learn more about Kubernetes security, check out our blog posts:
Finally, you can download our Kubernetes settings cheatsheet here.
Woohoo! You've learned what the risks are of using the --privileged flag. We hope you will stop using it as a lazy way to bypass all permission problems and instead spend the time setting the correct permissions.