The Backtracking Trap: How Regex Engines Can Hold Your Server Hostage
A technical explainer on catastrophic backtracking in regular expressions, written for backend engineers and platform security teams. Everything that follows is preventable—if you know what to look for.
The Thesis
Most regular expression engines don't use the linear-time guarantees of deterministic finite automata (DFA). Instead, they use nondeterministic finite automata (NFA) with backtracking. This means certain regex patterns have exponential worst-case runtime. A pattern that looks innocent—validating an email, parsing a URL, matching an HTML tag—can be forced into catastrophic backtracking by a single crafted input string. One malicious HTTP request, one webhook payload, one form submission, and your server spends minutes evaluating a single regex match while every other request queues behind it. It's a denial of service attack that fits in a few dozen characters.
Why Most Languages Chose Backtracking
Before we get to the attack, understand the design choice.
A deterministic finite automaton (DFA) is guaranteed O(n) runtime: it scans the input once, left-to-right, in a single pass. No backtracking. Linear time, always.
A nondeterministic finite automaton (NFA) simulated by backtracking can express things DFAs cannot: lookaheads, lookbehinds, and backreferences (matching the same text twice). Backtracking engines are more expressive, so Perl, Python, Ruby, JavaScript, PHP, and Java, like most mainstream languages, chose them. The notable exceptions are Go's regexp package and Rust's regex crate, which keep a linear-time guarantee by dropping exactly those features.
The trade-off: "expressive" means "potentially catastrophically slow."
Why? Because an NFA engine, when faced with an ambiguous pattern and a non-matching string, will try every possible path through the state machine before giving up. If the paths branch exponentially and the string is long, the engine explores 2^n possibilities. That's not slow. That's game-over.
The Canonical Disaster: (a+)+$
Here's the simplest ReDoS pattern:
(a+)+$
What does it do? It matches one or more sequences of one or more a's, anchored to the end of the string.
Now test it against this input:
aaaaaaaaaaaaaaaaaaaaaaaaa!
(25 a's followed by a non-matching !.)
The regex engine will:
- Match the inner a+ greedily, consuming all 25 a's.
- Finish one iteration of the outer +; with no a's left, the group can't repeat, so move on.
- Try to match $ at position 25, which fails (we're at the !).
- Backtrack: make the inner a+ give back some a's so the outer + can try another iteration.
- Repeat until every possible way of distributing the 25 a's across iterations of the two + operators has been tried.
The number of ways to split 25 a's into one or more contiguous runs is 2^24, about 17 million (each of the 24 gaps between adjacent a's either starts a new run or doesn't). With 25 a's, your regex engine explores on the order of tens of millions of paths before concluding the match fails.
Try it yourself:
import re
import time
pattern = re.compile(r"(a+)+$")
test_input = "a" * 25 + "!"
start = time.time()
pattern.search(test_input)
end = time.time()
print(f"Time to fail: {end - start:.4f} seconds")
# Output: Time to fail: 0.1623 seconds (varies by CPU; doubles with each extra 'a')
With 30 a's, it takes seconds. With 40, over an hour. The explosion is exponential. This is the fundamental vulnerability.
Working Demonstrations: The Timing Explosion
Let's see the exponential growth in real time.
Python Timing Proof
import re
import time
def test_redos_pattern(pattern_str, prefix_length):
    """Measure how long a failing match takes for a given input length."""
    pattern = re.compile(pattern_str)
    # Construct input: N matching characters, then a non-matching character
    test_input = "a" * prefix_length + "!"
    start = time.time()
    pattern.search(test_input)  # returns None, after exploring every path
    return time.time() - start
# Test the (a+)+$ pattern with increasing input lengths
pattern = r"(a+)+$"
print("Pattern: (a+)+$")
print("Length\tTime (seconds)")
print("------\t---------------")
for length in range(15, 26):
    elapsed = test_redos_pattern(pattern, length)
    print(f"{length}\t{elapsed:.6f}")
    if elapsed > 5:  # stop if a run takes too long
        print("(stopping: runtime exceeded 5 seconds)")
        break
Output:
Pattern: (a+)+$
Length Time (seconds)
------ ---------------
15 0.000089
16 0.000203
17 0.000510
18 0.001067
19 0.002124
20 0.004521
21 0.009234
22 0.018902
23 0.038654
24 0.079331
25 0.162334
Notice the doubling: each additional character roughly doubles the runtime. That's exponential growth: O(2^n).
JavaScript Timing Proof
function testReDoS(patternStr, length) {
  const pattern = new RegExp(patternStr);
  const testInput = "a".repeat(length) + "!";
  const start = performance.now();
  pattern.test(testInput);
  return performance.now() - start;
}

const pattern = "(a+)+$";
console.log("Pattern: " + pattern);
console.log("Length\tTime (ms)");
console.log("------\t---------");
for (let length = 15; length <= 25; length++) {
  const elapsed = testReDoS(pattern, length);
  console.log(length + "\t" + elapsed.toFixed(3));
  if (elapsed > 5000) {
    console.log("(stopping: runtime exceeded 5 seconds)");
    break;
  }
}
Output (Node.js):
Pattern: (a+)+$
Length Time (ms)
------ ---------
15 0.152
16 0.301
17 0.543
18 1.087
19 2.234
20 4.521
21 9.102
22 18.654
23 37.023
24 74.891
25 150.234
Same exponential explosion. JavaScript V8's regex engine uses backtracking. So does Perl, Python, Ruby, Java—they all have this vulnerability.
Real-World Vulnerable Patterns
The (a+)+$ pattern is a teaching toy. Here are the ones that actually hit production:
Email Validation (The Classic)
^([a-zA-Z0-9._%+-]+)+@([a-zA-Z0-9.-]+)+\.([a-zA-Z]{2,})$
This pattern has nested quantifiers: + inside +. It looks reasonable for validating email addresses. But feed it a malformed email:
aaaaaaaaaaaaaaaaaaaaaaaaaaa@aaaaaaaaaaaaaaa!
The regex engine will try every way to distribute the a's between the inner character-class + and the outer group + in ([a-zA-Z0-9._%+-]+)+. When the @ never matches, it has to explore millions of alternatives before it can report failure.
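You can time this directly. A sketch using Python's re, reducing the email pattern to its vulnerable core (absolute numbers vary by machine):

```python
import re
import time

# The nested quantifier at the heart of the email pattern above
vulnerable = re.compile(r"^([a-zA-Z0-9._%+-]+)+@")

for n in (16, 19, 22):
    evil = "a" * n + "!"   # '!' is outside the class, and there is no '@'
    start = time.time()
    assert vulnerable.match(evil) is None
    print(n, round(time.time() - start, 4))  # roughly doubles per character
```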
Real-world incident: the Stack Overflow outage of July 20, 2016. It wasn't even malicious: a post containing roughly 20,000 consecutive whitespace characters sent a whitespace-trimming regex into heavy backtracking (quadratic rather than exponential, and still enough), and the site was down for over half an hour.
URL Validation
^(https?|ftp)://[^\s/$.?#].[^\s]*$
Seems fine. But this widely copied variant, which circulated for years in a popular Stack Overflow answer, is vulnerable:
^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w .-]*)*/?$
The trailing ([/\w .-]*)* is a star nested inside a star. A long path of valid characters ending in one invalid character forces the engine to try every way of splitting the path across iterations of the outer star: catastrophic backtracking.
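A sketch timing that widely copied URL pattern with Python's re (timings are machine-dependent; the point is the growth, not the absolute numbers):

```python
import re
import time

# A widely copied URL regex; the trailing ([/\w .-]*)* is a star in a star
url_re = re.compile(r"^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w .-]*)*/?$")

for n in (10, 14, 18):
    evil = "http://a.com/" + "a" * n + "%"   # '%' is outside the class
    start = time.time()
    assert url_re.match(evil) is None
    print(n, round(time.time() - start, 4))
```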
HTML/XML Tag Matching
<div[^>]*>.*?</div>
If you use greedy .* instead of the lazy .*?:
<div.*>.*</div>
and feed it a long stretch of markup whose closing </div> never comes, the two unbounded .* runs trigger polynomial blowup: not exponential (nothing is nested here), but O(n^2) and worse, which on large inputs is still enough to pin a CPU. Put the ambiguous repetition inside another quantifier, for example (<[^>]*>.*)+</div>, and the blowup becomes exponential.
IP Address Validation (The Deceptive One)
^([0-9]{1,3}\.){3}[0-9]{1,3}$
Looks safe. But consider:
^([0-9]{1,3}\.?)+$
The ? makes the dot optional, and the outer + repeats the whole group. This is vulnerable:
Input: 1111111111111111111111111X (25 ones, then a non-matching X)
The regex engine tries every way of grouping the digits into runs of one to three with optional dots. The number of groupings grows exponentially with the digit count, so 25 digits already mean millions of paths.
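The same timing experiment works on this shape. A sketch with Python's re (machine-dependent timings):

```python
import re
import time

vulnerable = re.compile(r"^([0-9]{1,3}\.?)+$")

for n in (15, 20, 25):
    evil = "1" * n + "X"   # the X blocks the final $
    start = time.time()
    assert vulnerable.match(evil) is None
    print(n, round(time.time() - start, 5))
```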
Real Incidents: When ReDoS Escaped the Lab
Cloudflare (2019): The WAF Catastrophe
Cloudflare's Web Application Firewall (WAF) used regex patterns for attack detection. On July 2, 2019, Cloudflare deployed a new WAF rule containing, at its core, the expression .*(?:.*=.*): adjacent unbounded wildcards that backtrack catastrophically on certain inputs.
No attacker was required. The rule ran against ordinary traffic on every machine at Cloudflare's edge, CPU spiked to 100% across the fleet, and for roughly half an hour a large share of the sites behind Cloudflare returned errors. Cloudflare ReDoS'd themselves with their own rule.
Cloudflare's response: kill the rule globally, then commit to moving the WAF onto a regex engine with linear-time guarantees (RE2 or Rust's regex crate). No more unbounded backtracking.
npm Ecosystem: Vulnerable Validation Packages
Popular npm packages have shipped ReDoS advisories for years (validator, marked, and ua-parser-js among many others), and ecosystem-wide scans by Snyk and others have turned up hundreds of registry packages containing exploitable patterns.
The issue: package authors copied regex patterns from Stack Overflow and documentation without testing for catastrophic backtracking. Downstream applications installing these packages inherited the vulnerability.
A malicious form submission, webhook payload, or user-agent string could trigger the regex and hang a request handler, or an entire build/CI pipeline.
ReDoS Detection: The Hard Problem
Even when you know to look for nested quantifiers, you can't rely on pattern inspection alone. Some vulnerabilities are subtle:
([a-zA-Z]+)*@example\.com
The * and the + are nested. Vulnerable.
(a|aa)+$
Not obvious, but vulnerable. The branches overlap: any run of a's can be split into ones and twos in exponentially many ways, and on a failing input the engine tries them all.
(a|a)*$
Trivially vulnerable (same branch twice), but easy to miss in code review.
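Trivial or not, the engine pays full price. A sketch with Python's re (machine-dependent timings):

```python
import re
import time

pattern = re.compile(r"(a|a)*$")

for n in (14, 18, 22):
    text = "a" * n + "!"
    start = time.time()
    # match(), not search(): (a|a)* can match empty, so search() would
    # eventually "succeed" with a zero-width match at the end of the string
    assert pattern.match(text) is None
    print(n, round(time.time() - start, 4))  # roughly doubles per character
```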
Why Input Length Limits Don't Save You
A common (and wrong) defense:
"We limit input to 100 characters. We're safe."
No, you're not.
Consider this pattern:
(a+)+$
With 30 matching characters it already takes seconds; with 40, over an hour. A 100-character input doesn't need to be exponentially worse; it's catastrophic long before it reaches that limit.
And that's against a simple pattern. Real-world vulnerabilities can trigger with even shorter strings.
Input length limits are necessary but not sufficient. They cap the exponent, but for a vulnerable pattern the runtime is already unacceptable at lengths far below any reasonable limit.
Automated Detection: Tools That Actually Work
rxxr2 (Regular Expression Denial of Service Detector)
A static analyzer from academic work by Rathnayake and Thielecke. It searches the pattern's NFA for the "pumpable" structure that causes exponential blowup and, when it finds one, emits a witness attack string. Note that it is distributed as research software, not as a package-manager install.
No static analyzer is perfect (some patterns are theoretically vulnerable but practically safe, and vice versa), but tools like this catch most red flags.
safe-regex (Node.js)
For JavaScript developers:
const safe = require('safe-regex');
const vulnerable = '(a+)+$';
const safe_pattern = 'a+$';
console.log(safe(vulnerable)); // false
console.log(safe(safe_pattern)); // true
This package analyzes regex patterns and warns about known-vulnerable constructs.
eslint-plugin-redos
An ESLint plugin that scans your codebase for regex literals that look vulnerable:
// .eslintrc.json
{
  "plugins": ["redos"],
  "rules": {
    "redos/no-vulnerable": "error"
  }
}
On a codebase with vulnerable patterns:
const pattern = /(a+)+$/; // ESLint error: Vulnerable regex pattern
regexploit (Python)
Doyensec's regexploit is pip-installable and can check individual patterns or scan whole source trees:
pip install regexploit
regexploit                  # paste patterns on stdin; reports worst-case
                            # complexity and an example attack string
regexploit-py path/to/src/  # extract and check regexes from Python source
What Actually Works: Real Defenses
Defense 1: Use RE2 (Linear Time Guarantee)
Google's RE2 library is a regex engine that guarantees O(n) runtime. It does this by using a DFA-based approach, which means:
- No backtracking.
- Linear time, always.
- No catastrophic slowdowns.
The trade-off: you lose the features that fundamentally require backtracking, namely backreferences and lookaround assertions.
Installation and usage:
# Python binding: pip install google-re2
import re2
pattern = re2.compile(r"(a+)+$")
test_input = "a" * 1000 + "!"
# Completes instantly regardless of input size; RE2 never backtracks
pattern.search(test_input)
// Node.js: re2 package
const RE2 = require('re2');
const pattern = new RE2("(a+)+$");
const testInput = "a".repeat(1000) + "!";
// Completes instantly
pattern.test(testInput);
Go developers have it built in: the standard regexp package implements the RE2 approach and guarantees linear-time matching (which is why it rejects backreferences). Rust's regex crate offers the same guarantee.
Defense 2: Avoid Nested Quantifiers
Review your regex patterns for:
- (a+)+, (a*)*, (a?)? — a quantifier on a quantified group
- (a+)*, (a*)+ — mixed unbounded quantifiers
- (a|ab)+ — overlapping alternation
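You can even grep for the worst shape mechanically. This is a toy heuristic sketched for illustration, not a real analyzer (use a proper detector for anything serious); it flags (x+)+-style nesting only and misses overlapping alternation entirely:

```python
import re

# Toy heuristic: flag a group that ends in an unbounded quantifier and is
# itself quantified, e.g. (a+)+ or (\w*)*. It misses overlapping
# alternation like (a|aa)+, so a pass means "not obviously bad", no more.
NESTED_QUANTIFIER = re.compile(r"\((?:[^()\\]|\\.)*[+*]\)[+*]")

def obviously_nested(pattern: str) -> bool:
    return NESTED_QUANTIFIER.search(pattern) is not None

print(obviously_nested(r"(a+)+$"))                 # True
print(obviously_nested(r"([a-zA-Z0-9._%+-]+)+@"))  # True
print(obviously_nested(r"[a-zA-Z0-9._%+\-]+@"))    # False
print(obviously_nested(r"(a|aa)+$"))               # False: the blind spot
```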
Safe alternatives:
Instead of:
([a-zA-Z0-9._%+-]+)+@
Write:
[a-zA-Z0-9._%+\-]+@
No need for the inner +; character classes don't need nesting.
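A quick sanity check on the flattened pattern (a sketch; timings are machine-dependent): it handles inputs that would hang the nested version for centuries.

```python
import re
import time

# The flattened version: one character class, one quantifier, no nesting
flat = re.compile(r"[a-zA-Z0-9._%+\-]+@")

text = "a" * 2000 + "!"   # worst case: no '@' anywhere
start = time.time()
result = flat.search(text)
print(result, round(time.time() - start, 4))
# Worst case here is quadratic under search() (every start position gets
# retried), which is a world away from 2^n
```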
Instead of:
(https?|ftp)://.*
If you only care about http, https, and ftp:
(?:https?|ftp)://.*
Use a non-capturing group (?:...) when you don't need the capture, and never put a quantifier on an alternation whose branches can match the same text. (This particular alternation is safe either way, since it isn't quantified.)
Defense 3: Use Parser Combinators Instead of Regex
For complex formats (email, URLs, IP addresses), use purpose-built parsers instead of regex:
from email.utils import parseaddr
name, addr = parseaddr("user@example.com")  # returns ('', 'user@example.com')
# Designed for the job; it won't catastrophically backtrack
from urllib.parse import urlparse
url = urlparse("https://example.com/path?query=value")
# Purpose-built parser, not a regex
For Go:
import "net/mail"
addr, err := mail.ParseAddress("user@example.com")
// No regex, no ReDoS risk
Defense 4: Timeout Regex Execution
If you must use regex on untrusted input, wrap execution with a timeout:
import signal
import re
class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException("Regex execution timeout")

pattern = re.compile(r"some_untrusted_pattern")
test_input = untrusted_user_input

signal.signal(signal.SIGALRM, timeout_handler)  # Unix only, main thread only
signal.alarm(2)  # 2-second timeout
try:
    pattern.search(test_input)
except TimeoutException:
    pass  # treat a timeout as no match and reject the input
finally:
    signal.alarm(0)  # cancel the alarm
Caveat: CPython runs Python-level signal handlers between interpreter instructions, so the alarm may not interrupt a match that is stuck inside the C matching loop. Treat this as a mitigation, not a guarantee.
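A sturdier option is to run the match in a disposable process: a process can always be killed, even mid-match. A sketch assuming a Unix "fork" start method; the function names here are illustrative, not a library API:

```python
# Killing a process always works, even when signals can't interrupt the
# C matching loop. Sketch only: assumes a Unix "fork" start method.
import re
from multiprocessing import TimeoutError, get_context

def _search(pattern, text):
    # Runs in the child; return a picklable bool, not a Match object
    return re.search(pattern, text) is not None

def search_with_timeout(pattern, text, seconds=2.0):
    ctx = get_context("fork")  # Unix-only; forked children inherit _search
    with ctx.Pool(processes=1) as pool:
        job = pool.apply_async(_search, (pattern, text))
        try:
            return job.get(timeout=seconds)
        except TimeoutError:
            return None  # timed out: reject the input
        # leaving the with-block terminates a stuck worker
```

For example, search_with_timeout(r"(a+)+$", "a" * 64 + "!", seconds=1.0) returns None after one second instead of running effectively forever.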
JavaScript (Node.js):
const { Worker } = require('worker_threads');
function regexWithTimeout(pattern, input, timeout = 2000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { parentPort } = require('worker_threads');
      parentPort.on('message', (data) => {
        try {
          const regex = new RegExp(data.pattern);
          parentPort.postMessage(regex.test(data.input));
        } catch (e) {
          parentPort.postMessage(null);
        }
      });
    `, { eval: true });
    const timer = setTimeout(() => {
      worker.terminate();
      reject(new Error('Regex timeout'));
    }, timeout);
    worker.on('message', (result) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(result);
    });
    worker.postMessage({ pattern, input });
  });
}

// Usage: note the pattern is passed as a string, not a regex literal
regexWithTimeout("(a+)+$", "a".repeat(25) + "!", 2000)
  .then((matched) => console.log("Matched:", matched))
  .catch(() => console.error("Regex timed out"));
Defense 5: Structured Input Validation
Instead of regex on free-form strings, use structured parsing:
Bad:
if re.match(r"^([0-9]{1,3}\.?)+$", ip_string):
    ...  # "valid"? This accepts "1111111", and the nested
         # quantifier makes it ReDoS-vulnerable
Good:
import ipaddress
try:
    ip = ipaddress.ip_address(ip_string)
    # IP is valid
except ValueError:
    pass  # IP is invalid
The standard library parser is designed for this; it won't have exponential blowup.
The Architecture Problem
Why do ReDoS vulnerabilities keep appearing?
Regex is too expressive for simple formats. Email, URLs, and IP addresses have well-defined structures. Regex is overkill and error-prone.
Developers copy patterns without testing. Stack Overflow, documentation, and examples often contain vulnerable patterns. Copy-paste culture means the vulnerability spreads.
Backtracking engines are the default. Most mainstream languages use NFA with backtracking, not RE2 or equivalent. The default is unsafe.
There's no visual way to spot the problem. Nested quantifiers don't look wrong in a regex. They look normal.
Detection tooling isn't mainstream. Safe-regex and rxxr2 exist, but they're not run by default in CI/CD pipelines. Finding vulnerabilities requires opting in.
What Should Change
Use RE2-like engines by default. Languages should ship with linear-time regex engines, or make them the default.
Lint regex patterns in CI. Make tools like safe-regex and regexploit mandatory checks, not optional.
Deprecate common vulnerable patterns. Publicly maintained lists of vulnerable email/URL/IP regex patterns should be circulated and discouraged.
Provide parser libraries. Standard libraries should include robust parsers for common formats, not leave developers to regex them.
Educate on the problem. ReDoS should be taught alongside regex fundamentals, not treated as an edge case.
Conclusion
ReDoS is not a bug in specific libraries. It's a fundamental property of backtracking regex engines. Given the choice between expressive (but potentially slow) and safe (but less expressive), every mainstream language chose expressive. That choice is defensible—but it comes with responsibility.
Most regex patterns are fine. But the ones that aren't can take down your server with a single request. The Cloudflare incident, the Stack Overflow outage, and hundreds of npm packages all demonstrate that this isn't theoretical.
The defense is structural: use RE2 where possible, avoid nested quantifiers, replace regex with parsers for complex formats, and lint your patterns in CI. None of this is complicated. What's complicated is knowing to do it in the first place.
Now you do.
Last updated: March 2026
References
- OWASP: Regular Expression Denial of Service (ReDoS)
- Google RE2: Fast, Safe, Regular Expressions
- Stack Exchange Blog: Why Was Stack Overflow Down?
- Cloudflare Blog: The 2019 WAF Rules Incident
- rxxr2: Regular Expression Denial of Service Detector
- safe-regex: Detect ReDoS Patterns in JavaScript
- ESLint Plugin: Detect Vulnerable Regex
- regexploit: Doyensec's ReDoS Detection Tool
- npm Security Advisory: Regex DoS Vulnerabilities
- Medium: Understanding ReDoS Attacks
- Regular-Expressions.info: Catastrophic Backtracking
- Regex101: Interactive Regex Tester with Performance Analysis