Skip to content

Add validations for ERE quantifiers {m}, {m,} and {m,n}#328

Open
LoukasPap wants to merge 1 commit intouutils:mainfrom
LoukasPap:issue289
Open

Add validations for ERE quantifiers {m}, {m,} and {m,n}#328
LoukasPap wants to merge 1 commit intouutils:mainfrom
LoukasPap:issue289

Conversation

@LoukasPap
Copy link
Contributor

@LoukasPap LoukasPap commented Mar 4, 2026

Fixes issue 289

There are two types of errors in quantifiers:

  1. Unmatched \\{ for unterminated braces
  2. Invalid content of \\{\\} for all the others

Unmatched \\{

sed always captures the unmatched braces error first. To deal with that, I scan ahead the regex for unclosed { } and if it is violated it throws the error, otherwise we retreat back to the beginning to start the processing.

Invalid content of \\{\\}

Processing happens only if we have extended mode and checks for:

  • m and/or n are positive
  • m <= n
  • integers and comma are only allowed (the comma between m and n)
  • {,} to be converted to *
  • {,n} to be parsed as literal

There is also an error when m and/or n are very large numbers but I let that for the regex crate to handle (the error message is different from sed's but that is trivial I believe).

@codecov
Copy link

codecov bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 95.15152% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.40%. Comparing base (b5b4cef) to head (47d16fe).
⚠️ Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
src/sed/delimited_parser.rs 94.48% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
+ Coverage   82.07%   82.40%   +0.33%     
==========================================
  Files          13       13              
  Lines        5445     5587     +142     
  Branches      293      307      +14     
==========================================
+ Hits         4469     4604     +135     
- Misses        974      981       +7     
  Partials        2        2              
Flag Coverage Δ
macos_latest 82.84% <95.15%> (+0.32%) ⬆️
ubuntu_latest 82.95% <95.15%> (+0.31%) ⬆️
windows_latest 0.00% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LoukasPap
Copy link
Contributor Author

Forgot to add unit tests. I will add soon.

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 6, 2026

Merging this PR will improve performance by 2.42%

⚡ 1 improved benchmark
✅ 10 untouched benchmarks

Performance Changes

Benchmark BASE HEAD Efficiency
number_fix 521 ms 508.7 ms +2.42%

Comparing LoukasPap:issue289 (47d16fe) with main (61152af)

Open in CodSpeed

@LoukasPap
Copy link
Contributor Author

LoukasPap commented Mar 6, 2026

@e-kwsm Ready for review! 🙂

Copy link
Contributor

@e-kwsm e-kwsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{,n} to be parsed as literal

nitpick: "{,n}" pattern is not specified in ERE, but

<<< xxx sed -E -n -e '/x{,-1}/s//@/p'
  • GNU and BSD: Invalid content of \{\}, exit with one
  • uutils: prints nothing and exits with zero
<<< xxx sed -E -n -e '/x{,0}/s//@/p'
  • GNU and BSD: print @xxx
  • uutils: prints nothing
<<< xxx sed -E -n -e '/x{,1}/s//@/p'
  • GNU and BSD: print @xx
  • uutils: prints nothing

@e-kwsm
Copy link
Contributor

e-kwsm commented Mar 8, 2026

<<< xxx sed -n -E -e '/.{,x}/p'
  • GNU and BSD: Invalid content of \{\}
  • uutils: prints nothing

@e-kwsm
Copy link
Contributor

e-kwsm commented Mar 9, 2026

There is also an error when m and/or n are very large numbers but I let that for the regex crate to handle

GNU and BSD support 255-repetition.

https://www.gnu.org/software/sed/manual/html_node/BRE-syntax.html#index-GNU-extensions_002c-to-basic-regular-expressions-2

\{i\}

As *, but matches exactly i sequences (i is a decimal integer; for portability, keep it between 0 and 255 inclusive).

https://man.freebsd.org/cgi/man.cgi?re_format(7)

A bound is `{' followed by an unsigned decimal integer, possibly followed by `,' possibly followed by another unsigned decimal integer, always followed by `}'. The integers must lie between 0 and RE_DUP_MAX (255<**>) inclusive, and if there are two of them, the first may not exceed the second.

@LoukasPap
Copy link
Contributor Author

Nice catches, I'll check them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ERE quantifiers are not validated: {m}, {m,} and {m,n}

2 participants