
[Bugs] Potential ReDoS bug in caption.py #2716

@ShangzhiXu

Description

Describe the bug
Hi team, thanks for your great work! I think I found a bug that might lead to a denial of service (ReDoS) in the system.
At line 34 in caption.py, the regex `RE_FIG_NUM = re.compile(r'^(\^)?([1-9][0-9]*(?:.[1-9][0-9]*)*)(?= |$)')` is vulnerable to ReDoS when it is used in `m = RE_FIG_NUM.match(argument)`.
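As quoted here, the `.` inside `(?:.[1-9][0-9]*)*` is not escaped, so it matches any character (other than a newline) rather than only a literal period. A quick check of how the quoted pattern treats a few arguments, using a hypothetical `fig_num` helper that wraps the `match` call and returns group 2:

```python
import re

# RE_FIG_NUM as quoted in this report; note the unescaped "." in the group.
RE_FIG_NUM = re.compile(r'^(\^)?([1-9][0-9]*(?:.[1-9][0-9]*)*)(?= |$)')


def fig_num(argument):
    """Hypothetical helper: return the number captured by group 2, or None."""
    m = RE_FIG_NUM.match(argument)
    return m.group(2) if m else None


print(fig_num('1.2 caption'))  # 1.2
print(fig_num('12 caption'))   # 12
print(fig_num('1a2 caption'))  # 1a2  (the "." accepted the letter "a")
```

The same freedom lets the `.` consume a digit, which is where the backtracking described below comes from.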

PoC

I have a test file to simulate the user input handled by this regex:

```python
import re
import time

# Same pattern as RE_FIG_NUM in caption.py
regex_pattern = re.compile(r'^(\^)?([1-9][0-9]*(?:.[1-9][0-9]*)*)(?= |$)')

for i in range(50, 500, 50):
    # A long run of digits followed by a character that makes the lookahead fail
    long_string = '1' * i + 'a'
    start_time = time.perf_counter()
    match = regex_pattern.match(long_string)
    end_time = time.perf_counter()
    print(f"length {i}: execution time {end_time - start_time:.6f} s")
```

As a result, the system hangs here for more than an hour.
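To watch the growth without waiting for the hang, the same pattern can be timed on much shorter inputs. This is a sketch: the lengths 10 to 25 are arbitrary choices, `time_match` is a hypothetical helper, and absolute timings will vary by machine. The time should still rise steeply from one length to the next.

```python
import re
import time

# Same pattern as in the PoC (unescaped "." inside the repeated group).
RE_FIG_NUM = re.compile(r'^(\^)?([1-9][0-9]*(?:.[1-9][0-9]*)*)(?= |$)')


def time_match(n):
    """Time one match attempt on '1' * n + 'a'; return (matched, seconds)."""
    s = '1' * n + 'a'
    start = time.perf_counter()
    m = RE_FIG_NUM.match(s)
    return m is not None, time.perf_counter() - start


results = []
# Short inputs only, so the script finishes in well under a second.
for n in range(10, 26, 3):
    matched, elapsed = time_match(n)
    results.append((n, matched, elapsed))
    print(f"n={n:2d} matched={matched} time={elapsed:.6f} s")
```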

If pymdown is used in a server setup, this bug may cause high CPU usage or a denial of service when users submit malicious or resource-intensive repositories.

Expected behavior
I think we can add a limit, e.g. replace the unbounded `*` repetitions with bounded ones such as `{0,200}`, or cap the length of `argument` before matching. Maybe that can help solve the backtracking problem.
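How much a limit helps depends on how fast the work grows with input length. A rough measure is the number of ways the quoted pattern can split a run of n `1`s into the first segment `[1-9][0-9]*` (at least 1 digit) plus repetitions of `.[1-9][0-9]*` (at least 2 characters each, since the unescaped `.` may also consume a digit). When the lookahead fails, every such split has to be tried, so the count is a lower bound on the backtracking. `count_splits` is a hypothetical helper written only for this illustration.

```python
def count_splits(n):
    """Count the ways the pattern's digit part can cover exactly n digits."""
    # tail[m] = ways to cover m characters with zero or more repetitions of
    # '.[1-9][0-9]*', where each repetition uses 2 or more characters.
    tail = [0] * (n + 1)
    tail[0] = 1
    for m in range(2, n + 1):
        tail[m] = sum(tail[m - k] for k in range(2, m + 1))
    # The first segment '[1-9][0-9]*' takes 1..n digits; the rest is the tail.
    return sum(tail[n - first] for first in range(1, n + 1))


for n in (10, 20, 30, 50):
    print(n, count_splits(n))
# 10 55
# 20 6765
# 30 832040
# 50 12586269025
```

The counts follow the Fibonacci sequence, growing by roughly 1.6x per extra digit, so a limit only helps if it keeps the digit run short: at 200 digits there would still be on the order of 10^41 splits.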

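The core of the problem lies within the nested quantifier `(?:.[1-9][0-9]*)*`. Because the `.` is not escaped, it matches any character (other than a newline), including digits, so a run of digits can be split between `[1-9][0-9]*` and the repeated group in exponentially many ways. When the lookahead `(?= |$)` then fails, as it does for `'1' * n + 'a'`, the engine backtracks through every one of those splits, which leads to massive backtracking on malicious input.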
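One possible fix, assumed here rather than taken from the project, is to escape the dot so the group only accepts a literal period between number segments; a digit run can then be split in only one way. The sketch below compares both patterns on a few arguments, assuming figure numbers look like `1`, `2.3` or `10.1.4`, and reruns the PoC input against the escaped version. `VULNERABLE`, `ESCAPED` and `groups` are illustrative names.

```python
import re
import time

# Pattern as quoted in this report ("." unescaped).
VULNERABLE = re.compile(r'^(\^)?([1-9][0-9]*(?:.[1-9][0-9]*)*)(?= |$)')
# Assumed fix: "\." only matches a literal period.
ESCAPED = re.compile(r'^(\^)?([1-9][0-9]*(?:\.[1-9][0-9]*)*)(?= |$)')


def groups(pattern, argument):
    """Return (caret, number) for a match, or None."""
    m = pattern.match(argument)
    return m.groups() if m else None


# Ordinary figure numbers parse the same way with both patterns.
for arg in ['1 caption', '^2.3 caption', '10.1.4']:
    print(arg, groups(VULNERABLE, arg), groups(ESCAPED, arg))

# The escaped pattern no longer accepts arbitrary separators.
print(groups(ESCAPED, '1a2 caption'))  # None

# It also fails quickly on the PoC input, even a much longer one.
attack = '1' * 100_000 + 'a'
start = time.perf_counter()
result = ESCAPED.match(attack)
print(result, f"{time.perf_counter() - start:.6f} s")
```

With the escaped dot, a failing match only has to give back the digits consumed by `[0-9]*` one at a time, so the work grows linearly with the input length.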

Minimal Reproduction

See the PoC above.

Version(s) & System Info

  • Operating System: ...
  • Python Version: ...
  • Package Version: ...
