Directory traversal
Unintended disclosure of sensitive files
What is directory traversal?
A directory traversal attack aims to access files and directories that are stored outside the intended folder. By manipulating file paths with "dot-dot-slash (../)" sequences and their variations, or by using absolute file paths, an attacker may be able to access arbitrary files and directories stored on the filesystem, including application source code, configuration, and other critical system files.
About this lesson
In this lesson, you will learn how directory traversal works and how to mitigate it in your application. You will first use a directory traversal attack to hack a vulnerable web server. We will then explain directory traversal by showing you the backend code of that vulnerable server. Finally, we will teach you how to prevent directory traversal from affecting your code.
Ready to learn? Buckle your seat belts, put on your hacker's hat, and let's get started!
Hacking a to-do app
To increase revenue and survive until the next funding round, a company called startup.io decided to create a side product. Since the market for image hosting platforms has recently become a bit saturated, the firm made a call to build an app for managing to-do lists instead.
Sadly, their to-do app is vulnerable to a directory traversal attack. Let's use a terminal window and curl to exploit the vulnerability. Our goal is to view the /etc/passwd file stored on the backend server.
The application is hosted on https://todoapp.startup.io. First, let's try to curl a page we should have access to.
Copy the following into the terminal: curl https://todoapp.startup.io/public/about.html
Listing the public page
We see the about.html page returned, which is to be expected. Notice that this HTML page is being served from the public directory.
Let's remove the about.html filename from this request and see if we can get a directory listing back.
Copy the following into the terminal: curl https://todoapp.startup.io/public/
Bingo! We've managed to list the files in the public directory. This is not a severe hack yet, since we're still in the public directory. However, showing a directory listing like this is a form of unnecessary information disclosure.
List one page up
Since we can list files in the public directory, maybe we could also traverse to other directories and see their contents? Let's add a ../ onto the URL to break up to the parent directory. Run the following command:
Copy the following into the terminal: curl https://todoapp.startup.io/public/../
Uh oh, we've lost our directory listing and are back to HTML! The server sanitizes the request path, and it caught our traversal attempt. It looks like we've been taken back to the to-do app homepage.
Circumventing sanitization
Our hope is not lost yet! There is a different way to represent a . in the web world: URL encoding. Let's try to circumvent the sanitization by URL encoding the . characters. Replace each . with %2e as follows:
Copy the following into the terminal: curl https://todoapp.startup.io/public/%2e%2e/
Congrats! You've broken out of the public directory. We can now step up our game and access some sensitive information.
Accessing sensitive information
I've got a suspicion we're running as the root user, so let's aim big and try to access some sensitive system information. For example, let's access the /etc/passwd file. To do this, run the following command:
Copy the following into the terminal: curl https://todoapp.startup.io/public/%2e%2e/%2e%2e/etc/passwd
Boom! We've managed to view the /etc/passwd file. Imagine what else we could disclose if we poked around the filesystem for a bit longer. Maybe SSL certificates? Or database passwords with read/write access to production databases?
Let's see what went wrong on the startup.io backend server, which allowed us to perform a directory traversal attack.
How does directory traversal work?
Essentially, the attack is accomplished by adding characters such as ../ into a URL that serves content from a directory structure. The content is usually served from a base directory, such as /public. An attacker can supply filenames that contain ../ or a URL-encoded equivalent, %2e%2e%2f. These URLs allow the attacker to break out of the base directory and view files stored in other folders on the filesystem.
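To make the equivalence concrete, here is a minimal Java sketch showing that the encoded sequence decodes to the same characters as the plain one. The class name EncodedTraversalDemo and its decode helper are hypothetical names chosen for illustration; URLDecoder with a Charset argument requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what a server sees after URL decoding.
final class EncodedTraversalDemo {

    // Decodes percent-encoded sequences such as %2e (".") and %2f ("/").
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The encoded form carries no literal "../" but decodes to one.
        System.out.println(decode("%2e%2e%2f"));                        // prints ../
        System.out.println(decode("/public/%2e%2e/%2e%2e/etc/passwd")); // prints /public/../../etc/passwd
    }
}
```

Any filter that inspects the raw URL for ../ before decoding will therefore miss the encoded variant.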
The vulnerable code
To illustrate this, let's jump into the code. Below you will find a function that constructs a filesystem path from the URL. All files and directories returned by the function are served statically by the web server.
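The original listing is not reproduced here, so what follows is only a sketch of what such a function might look like, written to match the behavior observed in the walkthrough. The class name VulnerableStaticFiles, the web root /wwwroot, and the fallback to index.html are all assumptions made for illustration, not the actual startup.io code.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical reconstruction of a vulnerable path builder (Java 10+).
final class VulnerableStaticFiles {

    // Assumed web root; the real server's layout is not known.
    static final String WEB_ROOT = "/wwwroot";

    static Path resolve(String rawUrlPath) {
        // Flawed check: looks for a literal "../" BEFORE URL decoding,
        // so "%2e%2e/" is not detected.
        if (rawUrlPath.contains("../")) {
            return Paths.get(WEB_ROOT, "index.html"); // fall back to the homepage
        }
        // Decoding after the check turns "%2e%2e/" back into "../".
        String decoded = URLDecoder.decode(rawUrlPath, StandardCharsets.UTF_8);
        // Plain string concatenation: nothing keeps the result inside WEB_ROOT.
        return Paths.get(WEB_ROOT + decoded);
    }

    public static void main(String[] args) {
        System.out.println(resolve("/public/about.html"));
        System.out.println(resolve("/public/../"));
        // normalize() shows where the filesystem would actually end up.
        System.out.println(resolve("/public/%2e%2e/%2e%2e/etc/passwd").normalize());
    }
}
```

On a Unix-like system the last call resolves to /etc/passwd. The sketch has two separate flaws: the filter runs on the raw input rather than the decoded input, and the final path is never checked against the base directory.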
Validate canonical path
The most robust way to prevent directory traversal attacks is to avoid relying on user-supplied input when dealing with the filesystem APIs. Unfortunately, this is easier said than done and might require rewriting a considerable chunk of your application.
A more realistic mitigation mechanism is to prevent the user-supplied directory from being higher up on the filesystem than the directory used to serve static content.
For example, if an application serves files from /wwwroot/public/, any canonical representation of the user-requested path must start with /wwwroot/public/. Otherwise, the request could break out of the target directory.
The code example below shows how to normalize the user-supplied path and check whether it starts in the expected directory. To achieve this in Java, use Path.normalize or similar.
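A minimal sketch of this check, assuming the base directory /wwwroot/public from the example above and a user-supplied path that has already been URL decoded. The class name SafePathResolver and the choice of SecurityException are illustrative assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves a user-supplied relative path under a base directory.
final class SafePathResolver {

    // Assumed base directory, normalized once up front.
    private static final Path BASE_DIR =
            Paths.get("/wwwroot/public").toAbsolutePath().normalize();

    // Expects an already-decoded, relative path such as "about.html".
    static Path resolve(String userPath) {
        // resolve() joins the paths; normalize() collapses "." and ".." segments.
        Path candidate = BASE_DIR.resolve(userPath).normalize();
        // Path.startsWith compares whole path components, so
        // "/wwwroot/publicity" does not count as inside "/wwwroot/public".
        if (!candidate.startsWith(BASE_DIR)) {
            throw new SecurityException("Path escapes base directory: " + userPath);
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(resolve("about.html"));
        try {
            resolve("../../etc/passwd");
        } catch (SecurityException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Absolute inputs such as /etc/passwd are also rejected, because Path.resolve returns an absolute argument unchanged and the startsWith check then fails. Note that normalize() is purely lexical and does not follow symbolic links; if links inside the served directory are a concern, Path.toRealPath() on an existing file is one option to consider. The check is only meaningful if no further decoding happens after it.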
Verify the input
The path normalization will deal with malicious inputs such as https://todoapp.startup.io/public/../. However, we can still trick it by encoding the . character as %2e. To be fully protected, you need to sanitize your user-supplied data and get rid of unexpected inputs, for instance:
- Maintain a set of allowed filesystem paths and compare the user input against that set.
- Allow only alphanumeric characters and reject inputs that contain other characters.
If the above measures are impractical, consider disallowing dangerous characters explicitly. For example, the code below removes the URL-encoded characters from the user-supplied input:
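The sketch below shows one way this could look in Java. The class name InputSanitizer, the method names, and the contents of the allowlist are hypothetical. Repeating the removal until the string stops changing is an addition of this sketch, intended to handle inputs like %%2e2e that would otherwise reassemble into %2e after a single pass. Set.of requires Java 9 or later.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sanitizer illustrating the measures described above.
final class InputSanitizer {

    // Matches one percent-encoded byte, e.g. %2e or %2F.
    private static final Pattern ENCODED = Pattern.compile("%[0-9a-fA-F]{2}");

    // Assumed allowlist of servable files; a real application would define its own.
    private static final Set<String> ALLOWED_FILES = Set.of("about.html", "index.html");

    // Removes percent-encoded sequences until none remain.
    static String stripUrlEncoded(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = ENCODED.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    // Allowlist check: accept only inputs that exactly match a known file.
    static boolean isAllowed(String input) {
        return ALLOWED_FILES.contains(input);
    }

    public static void main(String[] args) {
        System.out.println(stripUrlEncoded("%2e%2e/etc/passwd")); // prints /etc/passwd
        System.out.println(isAllowed("about.html"));              // prints true
        System.out.println(isAllowed("../etc/passwd"));           // prints false
    }
}
```

Where it is practical, the allowlist is the stricter of the two options, since it rejects everything that is not explicitly expected instead of trying to enumerate dangerous inputs.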
Don't reinvent the wheel: use open-source libraries
Correct sanitization of user input is hard work and requires constant verification against newly discovered ways to bypass known protection methods. In almost all cases, it is a better choice to use a well-maintained open-source library.
For instance, consider building your web application with Spring Boot, which has built-in support for serving static content.
To decide which libraries to trust, use Snyk Advisor! Snyk Advisor provides information on a given package's popularity, community support, and security. Also, check your open source libraries with vulnerability scanners such as Snyk, which will notify you about all new vulnerabilities discovered in any libraries you are using, and will help you mitigate them easily.
How do you mitigate directory traversal?
To recap, to mitigate directory traversal in your codebase, avoid calling filesystem APIs with user-supplied data as input. If that is not practical, validate that the user-supplied path is a child of the directory which the application serves from. Remember to sanitize the input to prevent malicious payloads from tricking you through techniques such as URL encoding. Finally, instead of writing all the logic yourself, consider using popular open-source libraries which handle things for you.
Keep learning
To learn more about directory traversal, check out some other great content produced by Snyk:
- If you want to discover more about the vulnerability present in st, watch our YouTube video and read our blog post
- Read our white paper on Zip Slip, a directory traversal vulnerability that results in remote code execution
- If the white paper got you worried, learn how to mitigate Zip Slip in your code-base with our cheat sheet