feat: add regex literal syntax (/.../) for type-based overload resolution #481
…ution (#458)

Introduce /pattern/ regex literal syntax that produces a Regex type, enabling type-based overload resolution to distinguish regex from string arguments. This allows UFCS-compatible text-first function calls like `"hello".match(/[a-z]+/)` and `"a1b2".split(/[0-9]/)`.

- Lexer: context-aware `/` disambiguation using prev_kind_ (division vs regex)
- AST: RegexExpr node for regex literal expressions
- Codegen: RK_Regex resource tracking with separate global cache to avoid collision with string literals of the same content
- New unprefixed functions: match, search, replace, split, find_all (text-first arg order, Regex type parameter)
- Existing regex_* functions remain for backward compatibility
- Extract buildArcGlobal() to eliminate ARC global construction duplication

Update regex.ry declarations and spec tests to use 'function' keyword per the fn → function rename on v0.0.6.
Pull request overview
This PR introduces /.../ regex literals that produce a Regex-typed value, enabling type-based overload resolution (including UFCS-style calls) while keeping existing regex_* functions for backward compatibility.
Changes:
- Lexer + parser: add `TokenKind::RegexLiteral` and a `RegexExpr` AST node, with context-sensitive `/` disambiguation.
- Codegen: add `RK_Regex` tracking and a separate regex global cache; implement regex overload dispatch for `split`/`replace` and new `match`/`search`/`find_all`.
- Stdlib + docs + tests: add native declarations, reference docs/changelog updates, and new C++ + spec test coverage for regex literals and UFCS.
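
The reason for keeping regex globals in their own cache can be sketched as follows. This is a simplified model, not the project's codegen: `LiteralCache`, `internString`, and `internRegex` are hypothetical names, and each "global" is modelled as an integer id rather than an LLVM global.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical model of literal interning. With a single shared map, the
// string literal "[0-9]+" and the regex literal /[0-9]+/ would share a key
// and therefore the same global. Two maps keep them apart.
struct LiteralCache {
    std::unordered_map<std::string, int> string_globals;
    std::unordered_map<std::string, int> regex_globals;
    int next_id = 0;

    // Returns the id of the global backing a string literal, creating it once.
    int internString(const std::string& text) {
        auto [it, inserted] = string_globals.try_emplace(text, next_id);
        if (inserted) ++next_id;
        return it->second;
    }

    // Returns the id of the global backing a regex literal, creating it once.
    int internRegex(const std::string& pattern) {
        auto [it, inserted] = regex_globals.try_emplace(pattern, next_id);
        if (inserted) ++next_id;
        return it->second;
    }
};
```

Under this model, interning `"[0-9]+"` as a string and as a regex yields two distinct ids, while repeated interning of the same kind reuses the existing one.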
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_lexer.cpp | Updates lexer tests for division vs regex-literal tokenization. |
| tests/test_codegen_regex.cpp | Adds codegen tests covering regex literals, UFCS, and coexistence with division. |
| tests/spec/regex_literal.test.ry | Adds spec tests validating regex literal behavior across the new API surface. |
| src/parser_expr.cpp | Parses RegexLiteral tokens into RegexExpr and updates expression-start lookahead. |
| src/lexer.cpp | Implements / disambiguation and regex literal scanning with escape/newline handling. |
| src/codegen.cpp | Refactors ARC global literal construction via buildArcGlobal() for reuse by regex globals. |
| src/codegen_type.cpp | Resolves Regex as an opaque pointer type for codegen/type resolution. |
| src/codegen_expr.cpp | Emits regex literals via a separate global cache and tags them as RK_Regex. |
| src/codegen_call_string.cpp | Adds split/replace overloads that dispatch to regex runtime when given Regex. |
| src/codegen_call_io.cpp | Adds dispatcher support for unprefixed match/search/find_all with Regex patterns. |
| src/codegen_builtin.cpp | Adds isRegex() helper backed by resource tracking. |
| lib/std/regex.ry | Declares new native text-first regex functions taking Regex patterns. |
| include/ry/lexer.hpp | Adds TokenKind::RegexLiteral. |
| include/ry/codegen.hpp | Adds RK_Regex, regex global cache, and buildArcGlobal() declaration. |
| include/ry/ast.hpp | Adds RegexExpr and wires it into the expression variant. |
| docs/reference/regex.md | Documents regex literals and the new text-first API. |
| CHANGELOG.md | Records the feature addition and new overloadable regex functions. |
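
The "regex literal scanning with escape/newline handling" noted for src/lexer.cpp could look roughly like the sketch below. It is an illustration only: `RegexScan` and `scanRegexLiteral` are hypothetical names, and it assumes the escape pair `\/` is kept verbatim in the pattern text, which the PR does not state.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: the pattern body and the index one past the closing '/'.
struct RegexScan {
    std::string pattern;
    std::size_t end;
};

// Scans a regex literal whose opening '/' is at src[start].
// Returns std::nullopt when the literal is unterminated, i.e. a newline or
// the end of input is reached before an unescaped closing '/'.
std::optional<RegexScan> scanRegexLiteral(std::string_view src, std::size_t start) {
    if (start >= src.size() || src[start] != '/') return std::nullopt;
    std::string pattern;
    std::size_t i = start + 1;
    while (i < src.size()) {
        const char c = src[i];
        if (c == '\n') return std::nullopt;  // literals may not span lines
        if (c == '\\' && i + 1 < src.size() && src[i + 1] != '\n') {
            // Keep the escape pair as written so that `\/` does not end the literal.
            pattern += c;
            pattern += src[i + 1];
            i += 2;
            continue;
        }
        if (c == '/') return RegexScan{pattern, i + 1};
        pattern += c;
        ++i;
    }
    return std::nullopt;  // reached end of input without a closing '/'
}
```

A caller would report the "unterminated regex literal" error whenever the function returns no value.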
    prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::Ident &&
    prev_kind_ != TokenKind::RParen && prev_kind_ != TokenKind::RBracket &&
    prev_kind_ != TokenKind::RBrace && prev_kind_ != TokenKind::True &&
    prev_kind_ != TokenKind::False && prev_kind_ != TokenKind::NoneKw &&
    prev_kind_ != TokenKind::PlusPlus && prev_kind_ != TokenKind::MinusMinus &&
    prev_kind_ != TokenKind::RegexLiteral) {
The regex-vs-division disambiguation treats / as starting a regex literal unless the previous token is value-producing, but TokenKind::FStringEnd is missing from this value-producing list. This is inconsistent with the similar disambiguation used for leading-dot floats later in this file (which does include FStringEnd), and can mis-tokenize / after an f-string as a regex literal (potentially yielding an “unterminated regex literal” error). Consider adding TokenKind::FStringEnd (and any other string-literal end kinds) to the division list.
Suggested change:
    - prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::Ident &&
    - prev_kind_ != TokenKind::RParen && prev_kind_ != TokenKind::RBracket &&
    - prev_kind_ != TokenKind::RBrace && prev_kind_ != TokenKind::True &&
    - prev_kind_ != TokenKind::False && prev_kind_ != TokenKind::NoneKw &&
    - prev_kind_ != TokenKind::PlusPlus && prev_kind_ != TokenKind::MinusMinus &&
    - prev_kind_ != TokenKind::RegexLiteral) {
    + prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::FStringEnd &&
    + prev_kind_ != TokenKind::Ident && prev_kind_ != TokenKind::RParen &&
    + prev_kind_ != TokenKind::RBracket && prev_kind_ != TokenKind::RBrace &&
    + prev_kind_ != TokenKind::True && prev_kind_ != TokenKind::False &&
    + prev_kind_ != TokenKind::NoneKw && prev_kind_ != TokenKind::PlusPlus &&
    + prev_kind_ != TokenKind::MinusMinus && prev_kind_ != TokenKind::RegexLiteral) {
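
The disambiguation rule discussed in this thread can be written as a standalone predicate, which makes a missing token kind easier to spot than in a long `&&` chain. This is a sketch, not the lexer's code: the enum is a reduced stand-in, `Int`, `Float`, `LParen`, `Comma`, `Assign`, and `Plus` are assumed names, and `endsValue` and `slashStartsRegex` are hypothetical helpers.

```cpp
// Reduced stand-in for the lexer's token kinds. Only the kinds named in the
// review thread are taken from the PR; the rest are assumptions.
enum class TokenKind {
    Int, Float, String, FStringEnd, Ident,
    RParen, RBracket, RBrace,
    True, False, NoneKw,
    PlusPlus, MinusMinus, RegexLiteral,
    LParen, Comma, Assign, Plus
};

// True when the token can end an expression, so a following '/' is division.
constexpr bool endsValue(TokenKind k) {
    switch (k) {
        case TokenKind::Int:
        case TokenKind::Float:
        case TokenKind::String:
        case TokenKind::FStringEnd:  // the kind the review asks to add
        case TokenKind::Ident:
        case TokenKind::RParen:
        case TokenKind::RBracket:
        case TokenKind::RBrace:
        case TokenKind::True:
        case TokenKind::False:
        case TokenKind::NoneKw:
        case TokenKind::PlusPlus:
        case TokenKind::MinusMinus:
        case TokenKind::RegexLiteral:
            return true;
        default:
            return false;
    }
}

// '/' starts a regex literal only where an expression is expected.
constexpr bool slashStartsRegex(TokenKind prev) {
    return !endsValue(prev);
}
```

With this shape, a `/` after an f-string is treated as division, while a `/` after `(`, `,`, or `=` starts a regex literal.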
    - After values (identifiers, numbers, `)`, `]`): `/` is division
    - After operators, keywords, `(`, `[`, `,`, `=`: `/` starts a regex literal
The “Division vs Regex” bullet list has malformed markdown/typos (a stray backtick and mismatched parentheses in “identifiers, numbers, ), ])”). It also omits some of the value-producing tokens the lexer actually checks (e.g., strings). Please tighten this description so it accurately reflects the lexer’s disambiguation rules.
Suggested change:
    - - After values (identifiers, numbers, `)`, `]`): `/` is division
    - - After operators, keywords, `(`, `[`, `,`, `=`: `/` starts a regex literal
    + - After value-producing tokens (identifiers, numbers, string literals, `)` or `]`), `/` is parsed as division
    + - After operators, keywords, or delimiters that expect an expression (`(`, `[`, `,`, `=`), `/` starts a regex literal
    it("replaces first match", function():
      expect(replace("abc123def", /[0-9]+/, "X")).to_eq("abcXdef")
    )
    it("UFCS", function():
      expect("abc123".replace(/[0-9]+/, "X")).to_eq("abcX")
This test case description says “replaces first match”, but the underlying runtime implementation of __ry_regex_replace replaces all matches (see existing regex_replace("[0-9]+", "a1b2c3", "X") tests expecting aXbXcX). Please update the wording and consider adding a multi-match assertion here to ensure the new replace(text, Regex, repl) overload preserves the same semantics.
Suggested change:
    - it("replaces first match", function():
    -   expect(replace("abc123def", /[0-9]+/, "X")).to_eq("abcXdef")
    - )
    - it("UFCS", function():
    -   expect("abc123".replace(/[0-9]+/, "X")).to_eq("abcX")
    + it("replaces all matches", function():
    +   expect(replace("a1b2c3", /[0-9]+/, "X")).to_eq("aXbXcX")
    + )
    + it("UFCS replaces all matches", function():
    +   expect("a1b2c3".replace(/[0-9]+/, "X")).to_eq("aXbXcX")
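
Why a single-match input cannot tell "first match" from "all matches" apart can be shown with the C++ standard library. This uses `std::regex_replace` purely as an illustration; it is not a claim about how `__ry_regex_replace` is implemented, and `replaceAll` and `replaceFirst` are hypothetical helper names.

```cpp
#include <regex>
#include <string>

// Replaces every match of `pattern` in `text` (the std::regex_replace default).
std::string replaceAll(const std::string& text, const std::string& pattern,
                       const std::string& repl) {
    return std::regex_replace(text, std::regex(pattern), repl);
}

// Replaces only the leftmost match, using the format_first_only flag.
std::string replaceFirst(const std::string& text, const std::string& pattern,
                         const std::string& repl) {
    return std::regex_replace(text, std::regex(pattern), repl,
                              std::regex_constants::format_first_only);
}

// replaceAll("abc123def", "[0-9]+", "X")   == "abcXdef"
// replaceFirst("abc123def", "[0-9]+", "X") == "abcXdef"  (same result, one match)
// replaceAll("a1b2c3", "[0-9]+", "X")      == "aXbXcX"
// replaceFirst("a1b2c3", "[0-9]+", "X")    == "aXb2c3"   (semantics now differ)
```

Only the multi-match input `"a1b2c3"` distinguishes the two behaviours, which is why the suggested assertions use it.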
    TEST_F(CodeGenTest, RegexLiteralReplace) {
      EXPECT_EQ(runSource(R"(
    print(replace("abc123def", /[0-9]+/, "X"))
    )"), "abcXdef\n");
RegexLiteralReplace only exercises a single-match case. Since this PR adds a new overload path (replace(text, Regex, repl) in the string call handler), it would be good to add a multi-match input (e.g., something like the existing regex_replace coverage that expects repeated substitutions) to ensure the overload wiring preserves the same semantics as regex_replace.
Suggested change:
    - )"), "abcXdef\n");
    + print(replace("a1b2c3", /[0-9]/, "X"))
    + )"), "abcXdef\naXbXcX\n");
…adlock

Re-check the completion predicate after reacquiring the lock in waitWithWorkerHelp. Previously, when tryRunOne() returned false, the code fell through to cv.wait() without rechecking pred(). If the task completed and called cv.notify_all() while the lock was released (between unlock and lock), the notification was lost and the caller blocked forever.

This race was observable under ASan + trace overhead, where the timing window was wide enough for the task to complete during the unlock gap.

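The lost-notification fix described in this commit can be sketched as below. This is a minimal stand-in, not the project's waitWithWorkerHelp: `waitWithHelp` is a hypothetical name, the two callbacks are passed in as `std::function` for simplicity, and it assumes the completing task updates the awaited state while holding the same mutex.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>

// Precondition: `lock` is held on entry. It is held again on return.
// `pred` reports completion; `tryRunOne` runs one queued task with the lock
// released and returns false when there was nothing to run.
inline void waitWithHelp(std::unique_lock<std::mutex>& lock,
                         std::condition_variable& cv,
                         const std::function<bool()>& pred,
                         const std::function<bool()>& tryRunOne) {
    while (!pred()) {
        lock.unlock();
        const bool ran = tryRunOne();  // the awaited task may finish in this gap
        lock.lock();
        // A notify_all() issued during the gap is not remembered by the
        // condition variable, so pred() is checked again before sleeping.
        if (!ran && !pred()) {
            cv.wait(lock);
        }
    }
}
```

Without the second `pred()` check, a task that completes between `unlock()` and `lock()` leaves the caller in `cv.wait()` with no further notification coming.
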
- Add FStringEnd to value-producing token list in lexer disambiguation
to prevent f-string followed by `/` being mis-tokenized as regex
- Fix docs to accurately describe the lexer's disambiguation rules
- Fix replace test descriptions ("first match" → "all matches") and
add multi-match assertions to verify semantics match regex_replace
Summary
- Regex literal syntax (`/pattern/`) that produces a `Regex` type, enabling type-based overload resolution and UFCS-compatible function calls
- `match`, `search`, `replace`, `split`, `find_all`: overloaded to accept `Regex` type patterns alongside existing string functions
- `regex_*` functions remain for backward compatibility
Changes
Lexer
- Context-aware `/` disambiguation using `prev_kind_` (JavaScript-style): after value-producing tokens → division, otherwise → regex literal
- Escape handling (`\/`) with newline boundary check
AST & Parser
- `RegexExpr` node for regex literal expressions
- `RegexLiteral` token kind in `parsePrimary()` and newline lookahead
Codegen
- `RK_Regex` resource tracking via `resource_sets_` (same pattern as TcpListener, HttpRequest, etc.)
- `regex_global_cache_` to prevent collision with string literals of same content
- `buildArcGlobal()` extracted to eliminate ARC global construction duplication between `cachedGlobalString()` and regex globals
- `isRegex()` helper for type-based dispatch
- Regex overloads in `split`/`replace` string handlers and new `match`/`search`/`find_all` in regex dispatcher
- `Regex` type resolution in `codegen_type.cpp`
Usage
Closes #458
Test plan
- `regex_literal.test.ry` + all 62 spec files pass
- `regex_*` functions and `split`/`replace` string overloads verified