Container is running in privileged mode

A user with root access in a container running in privileged mode is effectively root on the host system.

Container: privileged mode

What is container privileged mode?

By default, container runtimes go to great lengths to shield a container from the host system. Running in --privileged mode disables or bypasses most of these checks. In practice, this means that if you are root in a container, you have the privileges of root on the host system. It is only meant for special cases, such as running Docker in Docker, and should otherwise be avoided.

About this lesson

In this lesson, we will show you why running containers in privileged mode is a bad idea. First, we will show that root access in a privileged container gives you access to the host filesystem, kernel settings, and processes. Then we will explain alternative configuration options for when you need more access than usual. Finally, we will teach you how to configure Kubernetes to prevent containers from running with such settings.

We know it's tempting, but in almost all cases privileged mode is a matter of laziness. Let's see why using it is a bad idea!

Did you know?

Principle of Least Privilege

The mindset of giving only the privileges needed to complete a task is called the Principle of Least Privilege.

Container privileged mode: all your host privileges belong to me

What can you do if you have access to a privileged container

After some careful crafting of a URL request, we manage to get access to a root shell on a remote system (how that's done, I'll leave that for another lesson). We get really excited and start to explore what we can do.

Are we in a container?

After we get access, we first check if we are inside of a container by running: cat /proc/1/cgroup.

We see the word docker in there so we can confirm we are in a container.
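The check above can be wrapped in a small helper. This is a minimal sketch rather than a definitive detection method: the function name in_container and the list of keywords are assumptions made for illustration. On hosts that use cgroup v2, the file may contain only 0::/ inside a container, in which case this check finds nothing.

```shell
# Hypothetical helper: succeeds if the cgroup file mentions a known
# container runtime. The keyword list is an assumption, not exhaustive.
in_container() {
    cgroup_file="${1:-/proc/1/cgroup}"
    grep -qE 'docker|kubepods|containerd|lxc' "$cgroup_file" 2>/dev/null
}

# Usage (inside the shell you want to inspect):
#   in_container && echo "looks like a container"
```

An empty result does not prove you are on a bare host, so treat it as a hint only.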

Why does privileged mode exist?

Privileged mode disables all these checks

It was first introduced as an easier way to debug and to allow for running Docker inside Docker.

According to the official Docker documentation, enabling privileged mode (--privileged) has the following effects: "The --privileged flag gives all capabilities to the container, and it also lifts all the limitations enforced by the device cgroup controller. In other words, the container can then do almost everything that the host can do. This flag exists to allow special use-cases, like running Docker within Docker."

You can find more detail in this blog post about Docker in Docker by Docker Captain JPetazzo, who submitted this feature.
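Since privileged mode grants all capabilities, one way to spot it from inside a container is to inspect the effective capability mask of the current process. The sketch below assumes a Linux /proc filesystem. The helper names current_capeff and has_cap are hypothetical, and the bit number for CAP_SYS_ADMIN is taken from linux/capability.h.

```shell
# Print the effective capability mask (hex) from a status file.
# Defaults to the current process.
current_capeff() {
    awk '/^CapEff:/ {print $2}' "${1:-/proc/self/status}"
}

# has_cap MASK BIT: succeeds if capability number BIT is set in hex MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_SYS_ADMIN=21   # bit number as defined in linux/capability.h

# Usage (inside the container):
#   has_cap "$(current_capeff)" "$CAP_SYS_ADMIN" && echo "CAP_SYS_ADMIN is set"
```

CAP_SYS_ADMIN can also be added on its own with --cap-add, so its presence suggests an over-privileged container but does not by itself prove that --privileged was used.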

With privileged mode, root in a container means root on the host

The result of bypassing all these checks is that a user root inside of a container will have the same access as root on the host system. So we definitely want to avoid running as root in a container that is running in privileged mode.

What are the best practices for privileged mode?

Avoid running as root user in a container

When a container is given privileged mode it receives all permissions the host has. When running as a regular user this might still be limited, but as root this means having control over the complete host system. Therefore the first mitigation is to avoid running as root in the first place.

root (id = 0) is the default user within a container. The image developer can create additional users. Those users are accessible by name. When passing a numeric ID, the user does not have to exist in the container.

We can set a default user to run the first process with the Dockerfile USER instruction.
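As a minimal sketch of the USER instruction, the following writes an example Dockerfile that creates an unprivileged user and makes it the default. The file name Dockerfile.nonroot, the user name app, and the ID 1001 are assumptions chosen for illustration.

```shell
# Write a hypothetical Dockerfile whose default user is not root.
cat > Dockerfile.nonroot <<'EOF'
FROM ubuntu:latest
# Create an unprivileged group and user (name and IDs are illustrative).
RUN groupadd --gid 1001 app \
 && useradd --uid 1001 --gid 1001 --create-home app
# Run the first process as that user instead of root.
USER 1001:1001
EOF
```

Building it, for example with docker build -f Dockerfile.nonroot -t mycontainer ., should give an image whose processes start as UID 1001 unless the user is overridden at run time.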

But sometimes we don't have control over the image and we download a container image that by default runs as root (for example Ubuntu). When starting a container, we can override the USER instruction by passing the -u (--user) option.

$ docker run -u 1001 -it ubuntu:latest /bin/bash
I have no name!@94f1ded5e4ac:/$ id -a
uid=1001 gid=0(root) groups=0(root)

The "I have no name!" prompt appears because UID 1001 has no entry in the container's /etc/passwd. What matters is that the process no longer runs as root.

Prevent a user from becoming root user

Even when a container is started as a non-root user, that user might still become root given the right permissions, for example through a setuid binary such as sudo. We can block this by passing the --security-opt no-new-privileges flag when starting the container.

docker container run --rm -it \
--user 1001:1001 \
--security-opt no-new-privileges \
mycontainer
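One way to confirm that the flag took effect is to read the NoNewPrivs field in /proc/self/status from inside the container. This is a minimal sketch, assuming a kernel recent enough to expose that field. The helper name no_new_privs_enabled is hypothetical.

```shell
# Succeeds if the status file reports NoNewPrivs: 1.
# Defaults to /proc/self/status. The setting is inherited by child
# processes, so reading it through awk reflects the calling shell.
no_new_privs_enabled() {
    status_file="${1:-/proc/self/status}"
    [ "$(awk '/^NoNewPrivs:/ {print $2}' "$status_file")" = "1" ]
}

# Usage (inside the container):
#   no_new_privs_enabled && echo "privilege escalation is blocked"
```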

Advanced - Remap container process to a non privileged host user

The user specified on the CLI or in the Dockerfile refers to the user inside the container. As a container admin, you can change which host users containers run as with the userns-remap feature. This is transparent to the container: it still thinks it is running as root, but in reality it runs as a different, unprivileged user on the host.

The explanation of this is out of scope; see the Docker documentation on using userns-remap.

Advanced - Running Docker Daemon in Rootless mode

Rootless mode executes the Docker daemon and containers inside a user namespace. This is very similar to userns-remap mode, except that with userns-remap mode, the daemon itself is running with root privileges, whereas in rootless mode, both the daemon and the container are running without root privileges.

The explanation of this is out of scope; see the Docker documentation for more information on running Docker in rootless mode.

Take some time and find the privileges you actually need

It's really tempting to give a container privileged mode, because this takes away the pain of finding out which privileges you really need. It's the chmod 777 of containers. As we've shown in the In Action section, this is a really bad idea. We encourage you to check the container's documentation to see which privileges it really needs.

In the next parts, we describe a few of the most common cases where people resort to privileged mode when they could have granted a specific permission instead. You can check all the options for the Docker runtime on its documentation page.

Use --device to share device access from host to container

When a container needs access to a device on the host, it's tempting to add privileged mode, yet there is a better solution: the --device flag of the container run command shares specific host devices, such as disks or serial ports, with the container.

$ docker run --device=/dev/sda:/dev/xvdc --rm -it ubuntu fdisk /dev/xvdc

Use capabilities to allow listening on ports below 1024

Another common case is a web server that fails to start because it can't bind to a low TCP port (say, port 80 for HTTP). If a server needs to listen on a port below 1024, we can grant it the required Linux capability.

Your error might look like:

(13)Permission denied: AH00072: make_sock: could not bind to address [::]:80

Then you can add the NET_BIND_SERVICE capability to the container instead of enabling privileged mode:

$ docker run -it --rm --cap-drop=ALL --cap-add=NET_BIND_SERVICE php:apache
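To check that a container really ends up with only the capabilities you asked for, you can compute the mask you expect and compare it with the CapEff line in /proc/self/status inside the container. The helper below is a sketch with a hypothetical name (cap_mask). The bit number 10 for CAP_NET_BIND_SERVICE comes from linux/capability.h.

```shell
# cap_mask BIT...: print the 16-digit hex mask with the given capability
# bits set, in the same format as the CapEff field in /proc/<pid>/status.
cap_mask() {
    mask=0
    for bit in "$@"; do
        mask=$(( $mask | (1 << $bit) ))
    done
    printf '%016x\n' "$mask"
}

CAP_NET_BIND_SERVICE=10   # bit number as defined in linux/capability.h

# Mask to expect for a root process started with
# --cap-drop=ALL --cap-add=NET_BIND_SERVICE:
cap_mask "$CAP_NET_BIND_SERVICE"
```

If the value reported inside the container has more bits set than the mask printed here, the container holds capabilities beyond the one it needs.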

For more information on using Linux Capabilities, see our other lesson about configuring Container and Linux Capabilities.

Please note that Docker changed this behavior in version 20.10, which allows containers to bind to low ports by default. So if you see this error, it is best to check or upgrade the container runtime version.

SELinux - filesystem permission frustration

Trust us, we've been there: mandatory access controls are not always easy. Sometimes you think you've set up all the permissions a container needs, but there is still some permission issue. On SELinux-enabled hosts, you can relabel content when mounting volumes: besides the ro/rw flags, the volume mount option also supports the z and Z flags, which relabel the mounted files so that the container is allowed to access them. Matt Jarvis has written a more detailed blog on privileged mode and seccomp.

Seccomp outdated - Raspberry Pi/Alpine

There is a known issue with older versions of libseccomp and newer versions of the Docker runtime on Alpine and Raspbian systems. It manifests itself, for example, as being unable to run apt update as root in a container. The solution is to update libseccomp to a newer version; sometimes this is not available through your OS updates, so you have to update it manually. See https://docs.linuxserver.io/faq for more details.

Even the developers of the popular Home Assistant automation package recommended --privileged in their documentation as the way to run their container. As the issue https://github.com/home-assistant/core/issues/52647 shows, it was not really needed.

Enforcing Kubernetes securityContext capability settings

As a Kubernetes admin, you can configure the cluster to enforce securityContext settings. Kubernetes has the PodSecurityPolicy controller built in which allows you to enforce securityContext settings.

More specifically, you can require that users (a) do not use privileged mode, (b) run as non-root, and (c) disable any privilege escalation inside a container.

  • privileged : determines if any container in a pod can enable privileged mode. By default a container is not allowed to access any devices on the host, but a "privileged" container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host. This is useful for containers that want to use Linux capabilities like manipulating the network stack and accessing devices.
  • MustRunAsNonRoot : Requires that the pod be submitted with a non-zero runAsUser or have the USER directive defined (using a numeric UID) in the image. Pods which have specified neither runAsNonRoot nor runAsUser settings will be mutated to set runAsNonRoot=true, thus requiring a defined non-zero numeric USER directive in the container. No default provided. Setting allowPrivilegeEscalation=false is strongly recommended with this strategy.
  • AllowPrivilegeEscalation : Gates whether or not a user is allowed to set the security context of a container to allowPrivilegeEscalation=true. This defaults to allowed so as to not break setuid binaries. Setting it to false ensures that no child process of a container can gain more privileges than its parent.
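On the workload side, the three settings above correspond to fields in a container's securityContext. The following sketch writes a Pod manifest that sets them explicitly. The file name hardened-pod.yaml, the pod name, the container name, and the UID 1001 are assumptions for illustration. The image name mycontainer is reused from the earlier example.

```shell
# Write a hypothetical Pod manifest with a restrictive securityContext.
cat > hardened-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
spec:
  containers:
    - name: app
      image: mycontainer
      securityContext:
        privileged: false                # (a) no privileged mode
        runAsNonRoot: true               # (b) refuse to start as UID 0
        runAsUser: 1001
        allowPrivilegeEscalation: false  # (c) no setuid-style escalation
        capabilities:
          drop: ["ALL"]
EOF
```

The manifest could then be applied with kubectl apply -f hardened-pod.yaml. A cluster-level policy is still needed to stop other users from submitting pods without these settings.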

See the Kubernetes documentation for more info.

Please note that PodSecurityPolicy was deprecated in the v1.21 release and removed in v1.25.

It has been replaced by the built-in Pod Security Admission feature, which was introduced as alpha in v1.22 and became stable in v1.25. Good alternatives are the externally maintained projects Open Policy Agent or Kyverno.
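Pod Security Admission is configured through namespace labels rather than a separate policy object. As a hedged sketch, the manifest below asks the cluster to enforce the restricted Pod Security Standard in one namespace. Among other things, that profile rejects privileged containers and requires running as non-root with privilege escalation disabled. The namespace name hardened-apps and the file name are assumptions, and the cluster must have Pod Security Admission enabled.

```shell
# Write a hypothetical Namespace manifest that enforces the
# "restricted" Pod Security Standard for every pod created in it.
cat > hardened-namespace.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: hardened-apps
  labels:
    pod-security.kubernetes.io/enforce: restricted
EOF
```

After kubectl apply -f hardened-namespace.yaml, pods in that namespace that request privileged mode should be rejected at admission time.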

Keep learning

To learn more about Kubernetes security, check out our blog posts:

Finally, you can download our Kubernetes settings cheatsheet here.

Congratulations

Woohoo! You've learned what the risks are of using the --privileged flag. We hope you will stop using it as a lazy way to bypass permission problems and instead spend the time setting the correct permissions. Feel free to rate how valuable this lesson was for you and provide feedback to make it even better! Also, make sure to check out our lessons on other common vulnerabilities.