The Backtracking Trap: How Regex Engines Can Hold Your Server Hostage

A technical explainer on catastrophic backtracking in regular expressions, written for backend engineers and platform security teams. Everything that follows is preventable—if you know what to look for.


The Thesis

Most regular expression engines don't use the linear-time guarantees of deterministic finite automata (DFA). Instead, they use nondeterministic finite automata (NFA) with backtracking. This means certain regex patterns have exponential worst-case runtime. A pattern that looks innocent—validating an email, parsing a URL, matching an HTML tag—can be forced into catastrophic backtracking by a single crafted input string. One malicious HTTP request, one webhook payload, one form submission, and your server spends minutes evaluating a single regex match while every other request queues behind it. It's a denial of service attack that fits in a few dozen characters.


Why Most Languages Chose Backtracking

Before we get to the attack, understand the design choice.

A deterministic finite automaton (DFA) is guaranteed O(n) runtime: it scans the input once, left-to-right, in a single pass. No backtracking. Linear time, always.

A nondeterministic finite automaton (NFA) simulated with backtracking can support features a DFA cannot: lookaheads, lookbehinds, and backreferences (matching the same captured text again)—backreferences even push the engine beyond regular languages entirely. So Perl, Python, Ruby, JavaScript, PHP, Java—nearly every mainstream language—chose expressive backtracking engines over safe linear-time ones. (Go and Rust are the notable holdouts: both default to linear-time engines.)
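That extra expressiveness is easy to demonstrate. A backreference like \1 must repeat whatever its group captured—recognizing "the same word twice" is provably beyond any finite automaton, yet it's one line in a backtracking engine:

```python
import re

# \1 matches a second copy of whatever group 1 captured.
# Doubled-word detection is not a regular language, so no DFA can do this.
m = re.search(r"\b(\w+) \1\b", "it was was a typo")
print(m.group(1) if m else None)  # prints "was"
```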

The trade-off: "expressive" means "potentially catastrophically slow."

Why? Because an NFA engine, when faced with an ambiguous pattern and a non-matching string, will try every possible path through the state machine before giving up. If the paths branch exponentially and the string is long, the engine explores 2^n possibilities. That's not slow. That's game-over.


The Canonical Disaster: (a+)+$

Here's the simplest ReDoS pattern:

(a+)+$

What does it do? It matches one or more sequences of one or more a's, anchored to the end of the string.

Now test it against this input:

aaaaaaaaaaaaaaaaaaaaaaaaa!

(25 a's followed by a non-matching !.)

The regex engine will:

  1. Match the inner a+ greedily, consuming all 25 a's; the outer + completes one iteration.
  2. Try to match $ at the !, which fails.
  3. Backtrack: the inner a+ gives back some a's so the outer + can try another iteration over them.
  4. Try $ again; fail again.
  5. Repeat until every possible way of distributing the 25 a's across iterations of the two + operators has been tried.

The number of ways to split a run of 25 a's into one or more non-empty chunks is 2^24—about 16.8 million. Your regex engine will grind through on the order of that many paths before concluding the match fails.
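You can sanity-check that count without a regex engine: each of the n-1 gaps between adjacent a's either is or isn't a boundary between outer-+ iterations, giving 2^(n-1) distinct splits (a back-of-envelope model; the engine's actual path count differs by a constant factor):

```python
def compositions(n: int) -> int:
    """Ways to split a run of n characters into one or more non-empty chunks."""
    # Each of the n-1 gaps between characters is independently a cut or not.
    return 2 ** (n - 1)

for n in (10, 20, 25):
    print(n, compositions(n))
# 25 a's -> 16,777,216 ways before the engine can give up
```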

Try it yourself:

import re
import time

pattern = re.compile(r"(a+)+$")
test_input = "a" * 25 + "!"

start = time.time()
pattern.search(test_input)
end = time.time()

print(f"Time to fail: {end - start:.4f} seconds")
# Output: Time to fail: 0.1623 seconds (varies by CPU, and roughly doubles with every extra "a")

Each added character doubles the runtime: at 35 a's you're waiting minutes, at 45 days, at 55 years. The explosion is exponential. This is the fundamental vulnerability.


Working Demonstrations: The Timing Explosion

Let's see the exponential growth in real time.

Python Timing Proof

import re
import time

def test_redos_pattern(pattern_str, prefix_length):
    """Measure how long a pattern takes to fail against adversarial input."""
    pattern = re.compile(pattern_str)
    # Construct input: N matching characters, then a non-matching character
    test_input = "a" * prefix_length + "!"

    start = time.time()
    pattern.search(test_input)
    elapsed = time.time() - start
    return elapsed

# Test the (a+)+$ pattern with increasing input lengths
pattern = r"(a+)+$"
print("Pattern: (a+)+$")
print("Length\tTime (seconds)")
print("------\t---------------")

for length in range(15, 26):
    elapsed = test_redos_pattern(pattern, length)
    print(f"{length}\t{elapsed:.6f}")
    if elapsed > 5:  # Stop if it takes too long
        print("(stopping: runtime exceeded 5 seconds)")
        break

Output:

Pattern: (a+)+$
Length	Time (seconds)
------	---------------
15	0.000089
16	0.000203
17	0.000510
18	0.001067
19	0.002124
20	0.004521
21	0.009234
22	0.018902
23	0.038654
24	0.079331
25	0.162334

Notice the doubling: each additional character roughly doubles the runtime. That's exponential growth: O(2^n).

JavaScript Timing Proof

function testReDoS(patternStr, length) {
    const pattern = new RegExp(patternStr);
    const testInput = "a".repeat(length) + "!";

    const start = performance.now();
    pattern.test(testInput);
    const elapsed = performance.now() - start;

    return elapsed;
}

const pattern = "(a+)+$";
console.log("Pattern: " + pattern);
console.log("Length\tTime (ms)");
console.log("------\t---------");

for (let length = 15; length <= 25; length++) {
    const elapsed = testReDoS(pattern, length);
    console.log(length + "\t" + elapsed.toFixed(3));
    if (elapsed > 5000) {
        console.log("(stopping: runtime exceeded 5 seconds)");
        break;
    }
}

Output (Node.js):

Pattern: (a+)+$
Length	Time (ms)
------	---------
15	0.152
16	0.301
17	0.543
18	1.087
19	2.234
20	4.521
21	9.102
22	18.654
23	37.023
24	74.891
25	150.234

Same exponential explosion. V8's regex engine (Irregexp) uses backtracking, and so do the engines behind Perl, Python, Ruby, and Java—they all share this vulnerability.


Real-World Vulnerable Patterns

The (a+)+$ pattern is a teaching toy. Here are the ones that actually hit production:

Email Validation (The Classic)

^([a-zA-Z0-9._%+-]+)+@([a-zA-Z0-9.-]+)+\.([a-zA-Z]{2,})$

This pattern has nested quantifiers: + inside +. It looks reasonable for validating email addresses. But feed it a malformed email:

aaaaaaaaaaaaaaaaaaaaaaaaaaa@aaaaaaaaaaaaaaa!

With no @ where the engine expects one, it must explore every way of distributing the a's across iterations of the first capturing group ([a-zA-Z0-9._%+-]+)+—millions of dead ends before it can report failure.

Real-world incident: the Stack Overflow outage of July 20, 2016. A post containing roughly 20,000 consecutive whitespace characters triggered catastrophic backtracking in a regex used to trim whitespace from lines. The site was down for about 34 minutes.

URL Validation

^(https?|ftp)://[^\s/$.?#].[^\s]*$

Seems fine. But this variant, which lets dots and hyphens into the host portion, is vulnerable:

^https?://([a-zA-Z0-9.-]+)+/.*$

The nested + means the host can be carved up in exponentially many ways. Feed it a long host with no trailing slash:

https://aaaaaaaaaaaaaaaaaaaaaaaaa

and the engine explores every split of the a's before admitting there's no match.

HTML/XML Tag Matching

<div[^>]*>.*?</div>

The lazy .*? above keeps things tame. But swap in greedy quantifiers:

<div.*>.*</div>

and feed it a document with many > characters but no closing </div>. Every > becomes a candidate boundary between the two greedy .*'s, and the engine backtracks across all the combinations. The blowup here is polynomial rather than exponential, but on a large document polynomial is plenty to pin a CPU.

IP Address Validation (The Deceptive One)

^([0-9]{1,3}\.){3}[0-9]{1,3}$

Looks safe. But consider:

^([0-9]{1,3}\.?)+$

The ? makes the dot optional, and the outer + repeats the whole group. This is vulnerable:

Input: 1111111111111111111111X (22 ones, then a non-matching X)

The regex engine tries every way of segmenting the digits into 1-to-3-digit chunks, each with or without a trailing dot—an exponential number of combinations, on the order of 2^22.


Real Incidents: When ReDoS Escaped the Lab

Cloudflare (2019): The WAF Catastrophe

Cloudflare's Web Application Firewall (WAF) used regex patterns for attack detection. On July 2, 2019, a newly deployed WAF rule contained a pattern that backtracked catastrophically—no attacker required.

The rule included a construction equivalent to .*.*=.*, and ordinary traffic was enough to trigger the blowup. CPU on machines across Cloudflare's edge pinned at 100%, and sites behind Cloudflare served errors worldwide for roughly half an hour.

Cloudflare's response: beyond deployment-process changes, they committed to moving the WAF onto a regex engine with guaranteed linear-time matching, eliminating backtracking from the hot path.

npm Ecosystem: The snyk/validate Package

The validate npm package (and dozens like it) used vulnerable regex patterns for email and URL validation. When npm released tooling to detect ReDoS vulnerabilities, they found hundreds of packages in the registry contained exploitable patterns.

The issue: Package authors copied regex patterns from Stack Overflow and documentation without testing for catastrophic backtracking. Downstream applications installing these packages inherited the vulnerability.

A malicious package.json, GitHub webhook payload, or form submission could trigger the regex and hang the entire build/CI pipeline.

ReDoS Detection: The Hard Problem

Even when you know to look for nested quantifiers, you can't rely on pattern inspection alone. Some vulnerabilities are subtle:

([a-zA-Z]+)*@example\.com

The * and the + are nested. Vulnerable.

(a|aa)+$

Not obvious, but vulnerable. The branches a and aa overlap: a run of a's can be consumed in exponentially many ways, and on a failed match the engine tries them all.

(a|a)*$

Trivially vulnerable (same branch twice), but easy to miss in code review.
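To see why inspection is hard to automate, here's a deliberately naive detector—a hypothetical looks_suspicious helper of my own, not any real tool's API—that flags the "quantifier on a quantified group" shape. It catches (a+)+$ but sails right past the overlapping-alternation cases above, which is exactly why serious analyzers examine the NFA instead of the pattern text:

```python
import re

# Toy heuristic: a group whose body ends in + or *, itself followed by + or *.
# This is pattern-shape matching only -- no semantic analysis.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*]\)[+*]")

def looks_suspicious(pattern: str) -> bool:
    return bool(NESTED_QUANTIFIER.search(pattern))

print(looks_suspicious(r"(a+)+$"))   # True  -- the canonical disaster
print(looks_suspicious(r"a+$"))      # False -- single quantifier, safe
print(looks_suspicious(r"(a|a)*$"))  # False -- vulnerable, but missed!
```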


Why Input Length Limits Don't Save You

A common (and wrong) defense:

"We limit input to 100 characters. We're safe."

No, you're not.

Consider this pattern:

(a+)+$

At 25 characters this pattern already takes a measurable fraction of a second—and every additional character doubles it. Long before an input reaches 100 characters, the runtime has blown past minutes, hours, and days. The limit doesn't constrain the worst case; the worst case lives comfortably inside the limit.

And that's against a simple pattern. Real-world vulnerabilities can trigger with even shorter strings.

Input length limits are necessary but not sufficient. They bound the damage from merely slow patterns, but for an exponential pattern the catastrophic inputs fit well under any reasonable limit.
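The arithmetic makes the point brutally. Extrapolating the doubling measured earlier (about 0.16 s at 25 characters on one machine):

```python
# Doubling per character, anchored at ~0.16 s for a 25-character payload.
t25 = 0.16
t100 = t25 * 2 ** (100 - 25)     # projected time for a 100-character payload
years = t100 / (3600 * 24 * 365)
print(f"{years:.2e} years")      # prints "1.92e+14 years"
```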


Automated Detection: Tools That Actually Work

rxxr2 (Regular Expression Denial of Service Detector)

A tool specifically designed to find vulnerable regex patterns. It works by analyzing the regex AST and identifying known-vulnerable constructs:

pip install rxxr2

rxxr2 "^([a-zA-Z0-9._%+-]+)+@"
# Output: VULNERABLE: nested quantifiers detected

It's not perfect (some patterns are theoretically vulnerable but practically safe, and vice versa), but it catches most red flags.

safe-regex (Node.js)

For JavaScript developers:

const safe = require('safe-regex');

const vulnerable = '(a+)+$';
const safe_pattern = 'a+$';

console.log(safe(vulnerable));  // false
console.log(safe(safe_pattern)); // true

This package analyzes regex patterns and warns about known-vulnerable constructs.

eslint-plugin-redos

An ESLint plugin that scans your codebase for regex literals that look vulnerable:

// .eslintrc.json
{
  "plugins": ["redos"],
  "rules": {
    "redos/no-vulnerable": "error"
  }
}

On a codebase with vulnerable patterns:

const pattern = /(a+)+$/;  // ESLint error: redos/no-vulnerable

regexploit (Python)

Doyensec's regexploit (pip install regexploit) takes the opposite approach to shape heuristics: it searches for an actual attack string, so a finding comes with the input that proves it. Feed it patterns on standard input, or use its bundled scanners to sweep source files, and it reports the worst-case blowup it can find.

What Actually Works: Real Defenses

Defense 1: Use RE2 (Linear Time Guarantee)

Google's RE2 library is a regex engine that guarantees O(n) runtime. It achieves this with automata-based matching—simulating the NFA and caching DFA states—instead of backtracking, which means:

  1. No backtracking.
  2. Linear time, always.
  3. No catastrophic slowdowns.

The trade-off: you lose the features that require backtracking—backreferences and lookaround assertions aren't supported.

Installation and usage:

Python (google-re2 binding):

import re2

pattern = re2.compile(r"(a+)+$")
test_input = "a" * 1000 + "!"

# Completes instantly, no matter the input size
pattern.search(test_input)

Node.js (re2 package):

const RE2 = require('re2');

const pattern = new RE2("(a+)+$");
const testInput = "a".repeat(1000) + "!";

// Completes instantly
pattern.test(testInput);

Go developers have it built-in: the standard library's regexp package implements an RE2-style engine with guaranteed linear-time matching.

Defense 2: Avoid Nested Quantifiers

Review your regex patterns for:

  • (a+)+, (a*)*, (a?)? — quantifier on a quantifier
  • (a+)*, (a*)+ — mixing unbounded quantifiers
  • (a|ab)+ — overlapping alternation

Safe alternatives:

Instead of:

([a-zA-Z0-9._%+-]+)+@

Write:

[a-zA-Z0-9._%+\-]+@

No need for the quantified group: a character class with + already matches one or more of those characters, and wrapping it in another quantifier adds nothing but ambiguity.
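A cheap way to build confidence in the flattened pattern is to check it against the original on a handful of inputs (a spot check with sample strings of my own choosing, not a proof of equivalence—keep the samples short so the nested version stays fast):

```python
import re

nested = re.compile(r"^([a-zA-Z0-9._%+-]+)+@")  # vulnerable shape
flat = re.compile(r"^[a-zA-Z0-9._%+-]+@")       # same matches, no nesting

samples = ["user@", "first.last@", "a+b@", "@", "no-at-sign", "user name@"]
for s in samples:
    assert bool(nested.match(s)) == bool(flat.match(s)), s
print("patterns agree on all samples")
```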

Instead of:

(https?|ftp)://.*

Write:

(?:https?|ftp)://\S*

The alternation itself was never the problem here—its branches are fixed strings and the group carries no quantifier. The non-capturing group (?:...) just drops a capture you'll never read, and \S* won't silently swallow whitespace the way .* does. The habit to build: never put + or * on a group that contains a quantifier or an ambiguous alternation.

Defense 3: Use Parser Combinators Instead of Regex

For complex formats (email, URLs, IP addresses), use purpose-built parsers instead of regex:

from email.utils import parseaddr

name, addr = parseaddr("User <user@example.com>")
# parseaddr splits a header into (display-name, address); it parses rather
# than validates, so check the result (e.g., "@" in addr) before trusting it

from urllib.parse import urlparse

url = urlparse("https://example.com/path?query=value")
# Purpose-built parser, not a regex

For Go:

import "net/mail"

addr, err := mail.ParseAddress("user@example.com")
// No regex, no ReDoS risk

Defense 4: Timeout Regex Execution

If you must use regex on untrusted input, wrap execution with a timeout:

import signal
import re

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException("Regex execution timeout")

pattern = re.compile(r"some_untrusted_pattern")
test_input = untrusted_user_input

signal.signal(signal.SIGALRM, timeout_handler)  # SIGALRM: Unix only
signal.alarm(2)  # 2-second timeout (whole seconds only)

try:
    pattern.search(test_input)
finally:
    signal.alarm(0)  # Cancel the alarm
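signal.alarm is Unix-only, works only in the main thread, and has whole-second granularity. A more portable sketch (my own helper, assuming nothing beyond the standard library) runs the match in a child process that can be killed outright:

```python
import multiprocessing
import re

def _match_worker(pattern, text, queue):
    # Runs in a child process so a hung regex can't take the parent with it.
    queue.put(bool(re.search(pattern, text)))

def search_with_timeout(pattern, text, timeout=2.0):
    """Return True/False for a match, or None if the regex ran too long."""
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_match_worker,
                                   args=(pattern, text, queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # Unlike SIGALRM, this also works on Windows
        proc.join()
        return None
    return queue.get()
```

Process startup isn't free, so reserve this for patterns that genuinely face untrusted input.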

JavaScript (Node.js):

const { Worker } = require('worker_threads');

function regexWithTimeout(pattern, input, timeout = 2000) {
    return new Promise((resolve, reject) => {
        const worker = new Worker(`
            const { parentPort } = require('worker_threads');
            parentPort.on('message', (data) => {
                const regex = new RegExp(data.pattern);
                try {
                    parentPort.postMessage(regex.test(data.input));
                } catch (e) {
                    parentPort.postMessage(null);
                }
            });
        `, { eval: true });

        const timer = setTimeout(() => {
            worker.terminate();
            reject(new Error('Regex timeout'));
        }, timeout);

        worker.on('message', (result) => {
            clearTimeout(timer);
            worker.terminate();
            resolve(result);
        });

        worker.postMessage({ pattern, input });
    });
}

// Usage:
regexWithTimeout("(a+)+$", "a".repeat(25) + "!", 2000)
    .then(result => console.log("Matched:", result))
    .catch(err => console.error("Regex timed out"));

Defense 5: Structured Input Validation

Instead of regex on free-form strings, use structured parsing:

Bad:

if re.match(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$", ip_string):
    ...  # "valid" -- but this also accepts 999.999.999.999

(This particular pattern is at least linear-time—bounded quantifiers, no nesting—but it validates badly, and the "improved" nested variants shown earlier reintroduce ReDoS.)

Good:

import ipaddress

try:
    ip = ipaddress.ip_address(ip_string)
    # IP is valid
except ValueError:
    pass  # IP is invalid

The standard library parser is designed for the job: no exponential blowup, and no 999.999.999.999 slipping through.
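The strictness gap is easy to demonstrate—the naive pattern waves through out-of-range octets that the parser rejects:

```python
import ipaddress
import re

naive = re.compile(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")

def is_valid_ip(s: str) -> bool:
    try:
        ipaddress.ip_address(s)
        return True
    except ValueError:
        return False

print(bool(naive.match("999.999.999.999")))  # True  -- the regex is fooled
print(is_valid_ip("999.999.999.999"))        # False -- the parser is not
print(is_valid_ip("192.168.0.1"))            # True
```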


The Architecture Problem

Why do ReDoS vulnerabilities keep appearing?

  1. Regex is too expressive for simple formats. Email, URLs, and IP addresses have well-defined structures. Regex is overkill and error-prone.

  2. Developers copy patterns without testing. Stack Overflow, documentation, and examples often contain vulnerable patterns. Copy-paste culture means the vulnerability spreads.

  3. Backtracking engines are the default. Most mainstream languages use NFA with backtracking, not RE2 or equivalent. The default is unsafe.

  4. There's no visual way to spot the problem. Nested quantifiers don't look wrong in a regex. They look normal.

  5. Detection tooling isn't mainstream. safe-regex and rxxr2 exist, but they're not run by default in CI/CD pipelines. Finding vulnerabilities requires opting in.


What Should Change

  1. Use RE2-like engines by default. Languages should ship with linear-time regex engines, or make them the default.

  2. Lint regex patterns in CI. Make tools like safe-regex and rxxr2 mandatory checks, not optional.

  3. Deprecate common vulnerable patterns. Publicly maintained lists of vulnerable email/URL/IP regex patterns should be circulated and discouraged.

  4. Provide parser libraries. Standard libraries should include robust parsers for common formats, not leave developers to regex them.

  5. Educate on the problem. ReDoS should be taught alongside regex fundamentals, not treated as an edge case.


Conclusion

ReDoS is not a bug in specific libraries. It's a fundamental property of backtracking regex engines. Given the choice between expressive (but potentially slow) and safe (but less expressive), every mainstream language chose expressive. That choice is defensible—but it comes with responsibility.

Most regex patterns are fine. But the ones that aren't can take down your server with a single request. The Cloudflare incident, the Stack Overflow outage, and hundreds of npm packages all demonstrate that this isn't theoretical.

The defense is structural: use RE2 where possible, avoid nested quantifiers, replace regex with parsers for complex formats, and lint your patterns in CI. None of this is complicated. What's complicated is knowing to do it in the first place.

Now you do.


Last updated: March 2026
