Container does not drop all default capabilities

Default capabilities are not as strict as you think


Container capabilities: the basics

What are those container capabilities or privileges?

A container is, in essence, just a process that runs on the host system. The permissions this container/process receives from the host system depend on how the container was launched.

To make containers more secure, we should give each container only the minimal set of privileges it needs to run. The container runtime assigns a set of default privileges (capabilities) to the container, and in contrast to what you might expect, this default set is broader than most workloads need and can be harmful. The best practice, therefore, is to drop all privileges and add back only the ones you need.

About this lesson

In this lesson, we will show how you can secure your containers by giving them only the privileges they need. First, we will show you that a default configured container system gives you far more privileges than most processes need, unnecessarily opening them up to exploits. We will explain how Linux capabilities work and how they can be used to make your containers more secure. Finally, we will teach you how to configure your Kubernetes code to only allow the least amount of capabilities.

Always thought that what happens in a container stays in a container? Read on to see how we can make that a reality!

FUN FACT

Principle of Least Privilege

This mindset of giving only the privileges needed to complete a task is called the Principle of Least Privilege.

Hacking into a container with default capabilities

What can you do if you have access to a container

After some careful crafting of a URL request, we manage to get access to a root shell on a remote system (how that's done is a story for another lesson). We get really excited and start to explore what we can do.

Are we inside a container?

First, we check what processes are running on the machine. We run ps -aux to see the list of processes. Hmmm, that is a bit disappointing! We only see a couple of processes; where's the rest? Maybe we are in a container? That's easy to check: we run cat /proc/1/cgroup to find out. We see it has the word docker in there, so yes, it's a container. OK, not the full access we hoped for, but hey ... as if that is going to stop us. Next, let's find out more about the network.

To see this, copy and paste the following commands one at a time into the terminal and hit enter:

ps -aux

cat /proc/1/cgroup
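If we are in a Docker container, the cgroup file contains entries along the lines of 12:devices:/docker/0a1b2c... (the container ID shown here is illustrative).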


Finding the Docker host IP address

So what does the network look like? With ip route we can find our default gateway, which is very likely the Docker host system. From the output we figure out that we are on the network 172.17.0.0 with a /16 subnet mask, and that the gateway has the IP address 172.17.0.1.

Note: you may need to install iproute2 by running apt-get install iproute2

Copy and paste the following into the terminal and hit enter:

ip route
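In a typical Docker container, the output looks something like this (the container's own address, 172.17.0.2 here, is illustrative):

default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2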


Finding web servers on the Docker network

What else is running on the network? We find out by running the network scanning tool masscan. We ask it to look for open ports on port 80 and in the range 8000-9000.

Copy and paste the following into the terminal and hit enter:

masscan -p80,8000-9000 172.17.0.0/24 --rate=10000

In the scan results we see that a few other containers do indeed have web server ports open, and that the Docker host itself is running services on ports 8443 and 8080.

We quickly download curl and try to connect.

Copy and paste the following into the terminal and hit enter:

curl http://172.17.0.3:80

You will see the output <html><body><h1>It works!</h1></body></html>

Nice! We can see it has a plain apache2 web server running. Wouldn't it be cool if we could listen in on network traffic? Let's get ready for some ARP spoofing!


ARP spoofing

Armed with the IP address of the host and the IP address of a web server, we can set up ARP spoofing. First, we verify that IP forwarding is enabled in our container so that it can route traffic: cat /proc/sys/net/ipv4/ip_forward yields 1.

We install dsniff, a package that contains the arpspoof tool: apt-get install dsniff

Finally, we put our container in the middle of the traffic by redirecting both directions: traffic from the Docker host to the web server, and traffic from the web server to the Docker host (-t selects the target to poison).

arpspoof -r -t 172.17.0.3 172.17.0.1 > /dev/null 2>&1 &
arpspoof -r -t 172.17.0.1 172.17.0.3 > /dev/null 2>&1 &

We are ready to intercept the traffic.

Now we listen in on the network traffic using tcpdump and wait for an admin to log in to the web server ... patience, my friend. All of a sudden we see traffic coming in, and look, it contains an HTTP Basic Authentication header.

tcpdump -s 0 -A host 172.17.0.3 and dst port 80
...
Authorization: Basic YWRtaW46c255a2xlYXJuaXNhd2Vzb21lCg==
User-Agent: curl/7.64.0

Copy and paste the following into the terminal and hit enter to see it in action:

tcpdump -s 0 -A host 172.17.0.3 and dst port 80

Aha! Now we just need to decode this Basic Authentication value to get the username and password for the web server. Luckily, that's easy because it's just base64 encoded.

Now copy and paste the following into the terminal and hit enter to find the password:

echo YWRtaW46c255a2xlYXJuaXNhd2Vzb21lCg== | base64 -d
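This prints admin:snyklearnisawesome, the username and password in plain text.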

Et voilà, we have successfully gained access to another service.


$fun & $profit

In the example attack we've shown how we gained access to network traffic between other containers, all from getting root access inside a single container. Over the years this technique has been used to cause all sorts of havoc: sniffing traffic (as we did), faking DNS responses in Kubernetes, and even rerouting complete networks over IPv6.

As we'll explain in the next section, the out-of-the-box security of running Docker containers still allows for serious security issues and requires you to harden the configuration to improve security.


How do container and Docker capabilities work?

A container is a process with special permissions

A container is, in essence, just a process that runs on the host system. The permissions the container/process receives from the host system depend on how the container was launched. In Linux there are many ways to give processes permissions. In this explanation we'll first go through the basics of file permissions and how they were extended with the concept of Linux capabilities. From there, we'll learn how we can read and set capabilities to improve security. Finally, we'll explain how containers get launched with a default set of capabilities that gives us permission to do things inside of a container.

The basics of giving permissions - setUID/setGID

In security, a big part of the job is allowing the right people to access certain things while denying others.

Chances are that you've come across the command chmod to set the permissions of files and directories. For example, chmod 775 somedir gives the owner and the group permission to read/write/enter the directory somedir, and others only read and enter permissions. A more human-friendly equivalent of the command is chmod u+rwx,g+rwx,o+rx somedir, which translates those numbers into user-friendly mnemonics.

mkdir somedir
chmod 775 somedir
ls -l
drwxrwxr-x 2 patrick staff 64 Sep 7 13:11 somedir/

As part of the file permission options there are also the setUID (u+s) and setGID (g+s) flags, which make a file run as its owning user/group when executed. This comes in handy for a command like ping: on Linux, sending special ping packets (ICMP) requires the root privilege (to open a raw NET_RAW socket). Older Linux systems therefore used the setUID flag to make the ping command run as root.

ls -l /sbin/ping
-r-sr-xr-x 1 root wheel 55344 Sep 22 2020 /sbin/ping

But ... that's a whole lot of privileges for just sending a ping! There must be a better way. That's why Linux capabilities were invented.

From setUID to capabilities

Starting with kernel 2.2, Linux divides the privileges traditionally associated with the superuser into distinct units, known as capabilities, which can be independently enabled and disabled. To control the capabilities of processes and files we have a couple of useful commands. You get them as part of the libcap2-bin package (or a similar package for your OS).

  • capsh: a CLI to inspect and manage capabilities in a human-friendly way
  • getcap: get the capabilities of a file
  • setcap: set the capabilities of a file

Exploring Linux capabilities

To see the capabilities of your current shell process, we can read its status file under /proc.

Copy and paste the following into the terminal and hit enter:

grep Cap /proc/$BASHPID/status
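In a root shell on a recent kernel, the output looks something like this (the exact bitmasks depend on your kernel version):

CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000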

This gives us a list of capability bitmasks, but they aren't in an easily readable form.

Let's see how we can decode the capability number 000001ffffffffff. capsh to the rescue! We can ask it to decode these capabilities.

Copy and paste the following into the terminal and hit enter:

capsh --decode=000001ffffffffff
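Depending on your libcap version, the output will look something like this (truncated here for brevity):

0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,...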

Each of the capabilities in this long list represents a specific set of permissions. For example, cap_kill allows us to kill processes and cap_mknod allows us to create special device files.

Instead of decoding these bitmasks one by one, we can use the command capsh --print to show the capability sets of the current process in readable form.


Ping capabilities - the right way

Now let's revisit our ping example. On modern versions of Linux, the ping command no longer has the setUID flag; instead it is given the cap_net_raw capability. This is much safer than giving the command complete root access. We can verify this using the getcap command: getcap /usr/bin/ping reports that ping now has exactly the privilege (= capability) it needs, which is much safer than the setUID root approach.

Copy and paste the following into the terminal and hit enter:

getcap /usr/bin/ping
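On a typical system this reports something like the following (older libcap versions print /usr/bin/ping = cap_net_raw+ep instead):

/usr/bin/ping cap_net_raw=ep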


Default container capabilities

To relate this to containers, we must first understand how process permission delegation works.

On Linux, process ID 1 (init) has all the privileges after booting. After that, privileges are delegated and possibly restricted. The container system is no different: a container is nothing but a process with a set of privileges delegated by the container system. Obviously, we want containers to have only a restricted set of permissions, and in the Docker daemon source code (linked at the end of this section) you can see that it starts a container with a default set of capabilities.
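At the time of writing, that default set consists of CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD, NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP, NET_BIND_SERVICE, SYS_CHROOT, KILL, and AUDIT_WRITE. The exact list can change between Docker releases, so check the linked source for the current one.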

This is already a very limited set of capabilities, but it still allows, for example, cap_mknod and cap_net_raw. In our exploit we've shown that just by having cap_net_raw we were able to hijack network traffic. It turns out that in most cases you can actually drop all privileges/capabilities, which results in a better security posture.
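If you have Docker available, you can verify this yourself by reading the effective capability mask of a fresh container and decoding it on the host (the mask shown below is typical for recent Docker releases, but yours may differ):

docker run --rm alpine grep CapEff /proc/1/status
CapEff: 00000000a80425fb

capsh --decode=00000000a80425fb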

In case you do need to give your container a certain capability, it's good practice to first drop all capabilities and then add back only the ones you need. You might hesitate about cap_net_bind_service, though: isn't my app going to have to bind to the network? Well, that permission only applies to binding ports lower than 1024, so when you bind your server to, for example, port 8080, you don't need it.

A diagram that shows the difference between running with default capabilities and with restricted capabilities

Now that we know we don't need the default set of capabilities, let's move on to the configuration syntax for dropping them.

See the Docker capabilities source code for the full default list.


How to improve your Kubernetes capabilities configuration

Dropping capabilities using securityContext

If we follow the principle of least privilege, the best practice from a security perspective is to provide only the capabilities our container actually needs. In Kubernetes, you can manage the capabilities assigned to your container through the securityContext settings.

You can drop all capabilities by adding the following configuration to your .yaml file:
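A minimal sketch (the pod, container, and image names here are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: drop-all-demo
spec:
  containers:
    - name: my-container
      image: my-image:latest
      securityContext:
        capabilities:
          drop:
            - ALL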

Dropping capabilities first, adding later

In cases where you do have to allow capabilities, it is good practice to first drop all default capabilities and only then add back the ones you need. In the example below you can see how we first drop all capabilities and then add the NET_BIND_SERVICE capability.
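A sketch of the relevant container fragment (the container and image names are again placeholders):

spec:
  containers:
    - name: my-container
      image: my-image:latest
      securityContext:
        capabilities:
          drop:
            - ALL
          add:
            - NET_BIND_SERVICE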

Stop privilege escalation

As we've explained, capabilities are delegated from one process to another, but processes can also gain new privileges if they are allowed to. We can disable this behavior in two ways, shown in the sketch after this list:

  • not running the container as root: using Docker's --user option (runAsUser in Kubernetes); if we are a regular user, the kernel will not pass us any special capabilities
  • disabling privilege escalation: using the allowPrivilegeEscalation flag in the Kubernetes config (equivalent to the Docker command line option --security-opt=no-new-privileges), we prevent the privileges given to a process from being extended
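Combined in a Kubernetes container spec, these measures might look like the following fragment (the user ID 10001 is an arbitrary non-root example):

securityContext:
  runAsNonRoot: true
  runAsUser: 10001
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL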

These additional measures help create defense in depth by providing different layers of protection. In addition, Kubernetes and containers have other mechanisms to limit permissions: cgroups, seccomp, and AppArmor. They can be used in combination with the capability restrictions we've presented, but explaining them is beyond the scope of this lesson.

Enforcing Kubernetes securityContext capability settings

As a Kubernetes admin, you can enforce securityContext settings with the built-in Pod Security Admission controller, which replaced the PodSecurityPolicy controller in Kubernetes 1.25. Alternatives include the CNCF projects Open Policy Agent and Kyverno.
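For example, with Pod Security Admission you can label a namespace so that every pod in it must satisfy the restricted profile, which (among other controls) requires dropping ALL capabilities. The namespace name below is a placeholder:

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted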

Congratulations

You've learned what Linux capabilities are and how to apply them to improve container security. We hope you will apply your new knowledge wisely and make your code much safer. Also, make sure to check out our lessons on other common vulnerabilities.