Protecting your application from malicious regular expressions
What is ReDoS?
Regular expressions (regex) are incredibly powerful, but they aren't very easy to read or write, and most developers know just enough to be dangerous.
But first - if you are scratching your head about "what even is regex?", then I recommend you take a quick look at https://regexone.com/ where you can learn about regex with some interactive exercises before continuing on.
Developers use regex for many purposes, including in web applications to validate user input. For example, when validating:
As you can see, things get complex quickly.
Only a few developers realize that a poor construction of regex patterns can ultimately lead to an attacker compromising your application. One possible way is through what is known as Regular Expression Denial of Service (ReDoS).
A ReDoS attack attempts to slow down or even render an application unavailable. It is attacked the
A as in Availability in the famous CIA triad of cybersecurity. Attackers do this by providing an application with a malicious string to be processed by its regex engine against a vulnerable (or "evil") regex pattern. The processing of the malicious string exhausts the computing power or memory available, thus impacting the application's performance and, in certain circumstances, causing a denial of service (or DoS).
Preventing ReDoS attacks usually comes down to good practices when defining your regex patterns.
About this lesson
In this lesson, you will learn how ReDoS works and how to protect your applications against attacks. We will begin by exploiting a ReDoS vulnerability in a simple proof of concept. We will then dive under the hood. We will touch on how regex works and then the causes of ReDoS. Finally, we will provide some options for remediation and prevention.
Ready to learn? Grab your hacker's hat, and let's get started!
In 2016, Stack Overflow went down for 34 minutes. All because of this:
^[\s\u200c]+|[\s\u200c]+$ This regular expression was run on a malformed post that caused the regex to consume high CPU on their web servers. Thankfully, they weren't down too long!
In our example, cryptocurrency trading company Dark Chain Inc is struggling to compete with its competitor Fast Chain Inc and wants to find a way to damage its reputation.
When fuzzing the input fields of Fast Chain Inc email search endpoint, they observe the response time increases for particular failed input. Hmm... this might be a problem with the regex on the backend. Let's see what we can break!
Testing for ReDoS
- STEP 1
- STEP 2
- STEP 3
- STEP 4
- STEP 5
Setting the stage
We want to take down our competitor and we think we might have found a way. Increased response times for longer searches are a big hint! Let's begin!
As we can see from the responses, the application's time to process the request grows exponentially as we keep adding characters to our search term.
As Block Chain Inc keeps increasing the size of its malicious string, at some point, the application just hangs indefinitely. This eventuates into a ReDos attack.
The outage to Fast Chain Inc's system causes significant reputation damage to the business as clients begin to question the reliability of their services.
Let's take a look at some example code. It isn't a complete program but it'll highlight the vulnerable parts.
How does a ReDOS attack happen?
Let's walk through how a ReDoS attack happens. With regex, things can get complex quickly, which is why this vulnerability is so prevalent. So we'll keep the example simple.
Let's take the following regex pattern:
Let's break it down to understand what it is evaluating:
A Character( Capturing group 1. Groups multiple tokens together and creates a capture group for extracting substring or uses a back referenceB Character| Alternation. Acts like a boolean ORC Character+ Quantifier. Match 1 or more of the proceeding token)+ Quantifier. Match 1 or more of the proceeding tokenD Character
Visualizing regex patterns can also be very useful in gaining an understanding of its flow:
In this example, our regex would match:
Our command will be:
time node -e 'console.log(/A(B|C+)+D/.test("ABBD"))'
To break down this command:
time: Times the execution
console.log(): Writes the output to the console
/A(B|C+)+D: our regex
.test(): Executes a search for a match between a regular expression and a specified string. Returns true or false.
"ABBD": The string we're matching on
Try the following and observe the execution times (copy and paste each line separately into terminal below and press
echo "ABBD ACCCCCCCCCCCCCCCCCCCCCCD ACCCCCCCCCCCCCCCCCCCCCCCCD ACCCCCCCCCCCCCCCCCCCCCCCCCCD" | xargs -I PATTERN -n1 time node -e "console.log(/A(B|C+)+D/.test('PATTERN'))"
echo "ABBX ACCCCCCCCCCCCCCCCCCCCCCCCCCE ACCCCCCCCCCCCCCCCCCCCCCCCE ACCCCCCCCCCCCCCCCCCCCCCCCCCE" | xargs -I PATTERN -n1 time node -e "console.log(/A(B|C+)+D/.test('PATTERN'))"
What did you observe with the execution times? Let's graph the output and see what we can see. Execution time for successful matches:
Execution time for unsuccessful matches:
From this, we can see the exponential growth occurring for the failed matches as our string grows.
This is essentially the cause of ReDoS attacks. As the failed string length grows linearly, the execution time of the "evil" regex grows exponentially.
But why does this happen?
Tell me why!
Logically, to understand what ReDoS is, we need to take a quick detour and explain why this is an inherent issue in many regex engines.
To perform a match, a regex engine constructs what is known as a Nondeterministic Finite Automaton (NFA). Without going deep into computer science theory, essentially, what this means is that there may be several possible next states for a given state. To traverse an NFA regex engine uses a deterministic algorithm. This algorithm tries one by one all the possible paths (if needed) until a match is found (or all the paths are tried and fail).
To explain this more practically, the engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will “backtrack” and see if there was another way to digest the previous character.
When we start adding repetition (using "+" or "*") to our regex pattern, the number of possible paths that exist can begin to grow exponentially.
When a regex engine goes down a path, and the string doesn't match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.
Regex patterns resulting in catastrophic backtracking are commonly known as "evil regex" and should be avoided.
Examples of evil regex include (but certainly not limited to):
From our previous example of the evil regex
Using the string: "ACCCX", there are 4 different paths that could match those three C's:
The regex engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can see the engine has to take a total of 38 steps before it can determine the string doesn't match.
From there, the number of steps the engine must use to validate a string just continues to grow.
- Number of C's: 3
- Number of steps: 38
- Number of C's: 4
- Number of steps: 71
- Number of C's: 5
- Number of steps: 136
- Number of C's: 14
- Number of steps: 65,553
By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. Now imagine this process as it would apply to our original string: "ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX". You can see how this backtracking can get out of control very quickly.
Given how the regex engine evaluates a regular expression, the massive jump in the time required to process an invalid expression becomes clearer. There is an exponential relationship between the length of the string and the number of paths the engine has to evaluate for evil regexes.
Impacts of ReDoS
The aim of an attacker executing a ReDoS attack is to provide input to the regex so that it takes a large amount of computation time. The attacker's expected outcome is to either slow down the application or cause it to crash.
As its name suggests, the primary impact of a ReDOS attack is to cause a denial of service against a target system. Many different systems can be the target of a ReDoS attack. For instance, an attacker can target:
- a client-side application causing the user's browser to hang,
- a web application firewall (or WAF) causing a denial of service for all services sitting behind it, or
- a web application, causing it to degrade or crash.
In 2019, Cloudflare's ReDoS outage took down a large chunk of the internet for 27 minutes. The root cause was an evil regex - you can read their post-mortem here.
On July 2nd, 2019, Cloudflare deployed a new regex rule in their WAF. this update caused CPUs to become exhausted on every CPU core that handles HTTP/HTTPS traffic on the Cloudflare network worldwide.
The Node.js runtime architecture implements a single-threaded Event Loop. The single-threaded Event Loop architecture is highly scalable, as threads never sit and "wait", but it introduces a problem when a function takes a long time to complete.
Since the Node.js runtime runs only one computational thread, such a long-running function would make the entire process hang. This is why Node.js is significantly affected by ReDoS vulnerabilities.
The most obvious way to prevent ReDoS attacks would be to avoid using regex for user input validation. Unfortunately, for most applications, this is not a practical solution.
A more realistic mitigation mechanism is to build non-vulnerable expressions. When writing regex, closely review and analyze all patterns before implementation to ensure they do not contain any evil regex patterns.
Unfortunately, it's actually very tricky to spot evil regex patterns. However 3 key patterns to look out for are:
- (a+)+ : Nesting quantifiers
- (a|a)+ : Quantified overlapping disjunctions
- \d+\d+ : Quantified Overlapping Adjacencies
If you are not a regex guru (and who truly is), consider using regex patterns from online sources (such as https://owasp.org/www-community/OWASP_Validation_Regex_Repository) that have already been vetted for evil patterns.
Additionally, do not allow users to provide input into the regex pattern. If an attacker can modify the regex, they can construct a dangerous pattern to make the system vulnerable to ReDoS.
Bonus: ReDoS in the wild
In his Proof-of-Concept, Li demonstrated that by appending a long string of space characters (in his case, 5,000) followed by an exclamation point to the user agent, an attacker could force a condition called catastrophic backtracking.
The maintainer responded in less than 24 hours, indicating that he would publish a fix in the coming days and that it would be acceptable to publish a CVE once the fix was available. On September 12th, less than 72 hours after the initial disclosure, the package maintainer committed a fix to the repository on GitHub and notified Snyk that the new version was available.
For a detailed write-up of this disclosed vulnerability, see https://snyk.io/blog/regular-expression-denial-of-service-redos-in-uaparser-js/
To learn more about ReDoS, check out some other great content produced by Snyk:
- Read a blog post about ReDos and catastrphic backtracking
- ReDoS vulnerabilities continue to grow
- ReDoS in the wild! Regular Expression Denial of Service (REDoS) in UAParser.js
You've learned what ReDoS is and how to protect your systems from it. We hope you will apply your new knowledge wisely and make your code much safer. Feel free to rate how valuable this lesson was for you and provide feedback to make it even better! Also, make sure to check out our lessons on other common vulnerabilities.