
feat: add regex literal syntax (/.../) for type-based overload resolution#481

Merged
t0k0sh1 merged 4 commits into v0.0.6 from feat/regex-literal-458 on Mar 31, 2026



t0k0sh1 (Owner) commented Mar 31, 2026

Summary

  • Add regex literal syntax (/pattern/) that produces a Regex type, enabling type-based overload resolution and UFCS-compatible function calls
  • New text-first regex functions (the input text is the first argument, the pattern second): match, search, replace, split, and find_all, overloaded to accept Regex-typed patterns alongside the existing string functions
  • Existing regex_* functions remain for backward compatibility

Changes

Lexer

  • Context-aware disambiguation of `/` using prev_kind_ (JavaScript-style): after a value-producing token, `/` is division; otherwise it starts a regex literal
  • Proper escape handling (\/) with newline boundary check
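The lexer rule above can be sketched in C++ as follows. This is a minimal illustration, not the PR's lexer.cpp: the trimmed TokenKind enum and the helpers endsValue and scanRegexLiteral are hypothetical. The value-producing list mirrors the condition quoted later in review (including the FStringEnd fix that was merged), plus Number, which the docs list. Unescaping `\/` to `/` in the pattern body is an assumption about how the literal is stored.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, trimmed-down token kinds; the real TokenKind enum has many more.
enum class TokenKind {
    Number, String, FStringEnd, Ident, RParen, RBracket, RBrace,
    True, False, NoneKw, PlusPlus, MinusMinus, RegexLiteral,
    LParen, Comma, Assign
};

// True when the previous token ends a value, so a following '/' is division.
// Any other previous token means '/' opens a regex literal.
bool endsValue(TokenKind prev) {
    switch (prev) {
        case TokenKind::Number: case TokenKind::String:
        case TokenKind::FStringEnd: case TokenKind::Ident:
        case TokenKind::RParen: case TokenKind::RBracket:
        case TokenKind::RBrace: case TokenKind::True:
        case TokenKind::False: case TokenKind::NoneKw:
        case TokenKind::PlusPlus: case TokenKind::MinusMinus:
        case TokenKind::RegexLiteral:
            return true;
        default:
            return false;
    }
}

// Scans a regex literal whose opening '/' is at src[pos]. Returns the pattern
// body (assumption: "\/" is stored as "/") or std::nullopt when a newline or
// the end of input is reached before the closing '/'.
std::optional<std::string> scanRegexLiteral(std::string_view src, std::size_t pos) {
    std::string body;
    for (std::size_t i = pos + 1; i < src.size(); ++i) {
        char c = src[i];
        if (c == '\n') return std::nullopt;  // literal may not span lines
        if (c == '\\' && i + 1 < src.size() && src[i + 1] != '\n') {
            if (src[i + 1] == '/') {
                body += '/';                 // escaped delimiter
            } else {
                body += c;                   // keep other escapes for the regex engine
                body += src[i + 1];
            }
            ++i;
            continue;
        }
        if (c == '/') return body;           // closing delimiter
        body += c;
    }
    return std::nullopt;                      // end of input, unterminated
}
```

A call such as `scanRegexLiteral("/abc\n/", 0)` returns `std::nullopt`, which is the point where a lexer could report an unterminated regex literal.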

AST & Parser

  • RegexExpr node for regex literal expressions
  • RegexLiteral token kind handled in parsePrimary() and in the newline lookahead

Codegen

  • RK_Regex resource tracking via resource_sets_ (same pattern as TcpListener, HttpRequest, etc.)
  • Separate regex_global_cache_ to prevent collision with string literals of same content
  • buildArcGlobal() extracted to eliminate ARC global construction duplication between cachedGlobalString() and regex globals
  • isRegex() helper for type-based dispatch
  • Regex overloads in the split/replace string handlers, plus new match/search/find_all handling in the regex dispatcher
  • Regex type resolution in codegen_type.cpp
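The reason for a separate regex cache can be shown with a small C++ model. MiniCodegen, Global, lookupOrBuild, replaceTarget, and RK_String are hypothetical. buildArcGlobal, cachedGlobalString, isRegex, and RK_Regex are names from the PR, but their signatures here are assumptions, and a kind field on each global stands in for the PR's resource_sets_ tracking. In this model, a single content-keyed cache would return the string global for `/[a-z]+/` once `"[a-z]+"` had been seen, and type-based dispatch would then pick the string overload.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical resource kinds; only RK_Regex is named in the PR.
enum ResourceKind { RK_String, RK_Regex };

// Stand-in for an ARC-managed global emitted by codegen.
struct Global {
    std::string content;
    ResourceKind kind;
};

class MiniCodegen {
public:
    int globals_built = 0;  // counts how many globals were actually constructed

    // Shared construction path used by both caches (the role of buildArcGlobal()).
    std::shared_ptr<Global> buildArcGlobal(const std::string& content, ResourceKind kind) {
        ++globals_built;
        return std::make_shared<Global>(Global{content, kind});
    }

    std::shared_ptr<Global> cachedGlobalString(const std::string& s) {
        return lookupOrBuild(string_cache_, s, RK_String);
    }

    // Separate cache, so equal text never aliases a string global.
    std::shared_ptr<Global> cachedGlobalRegex(const std::string& pattern) {
        return lookupOrBuild(regex_cache_, pattern, RK_Regex);
    }

    static bool isRegex(const Global& g) { return g.kind == RK_Regex; }

    // Type-based dispatch for replace(text, pattern, repl).
    std::string replaceTarget(const Global& pattern) const {
        return isRegex(pattern) ? "regex" : "string";
    }

private:
    using Cache = std::map<std::string, std::shared_ptr<Global>>;

    std::shared_ptr<Global> lookupOrBuild(Cache& cache, const std::string& key, ResourceKind kind) {
        auto it = cache.find(key);
        if (it != cache.end()) return it->second;  // reuse the cached global
        auto g = buildArcGlobal(key, kind);
        cache.emplace(key, g);
        return g;
    }

    Cache string_cache_;
    Cache regex_cache_;
};
```

Routing both caches through one builder keeps the ARC construction logic in a single place, which is the duplication the PR says buildArcGlobal() removes.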

Usage

```
from regex import match, split, replace

# Regex literals enable type-based overloading
"hello".match(/[a-z]+/)          # true
"a1b2c".split(/[0-9]/)           # ["a", "b", "c"]
"abc123".replace(/[0-9]+/, "X")  # "abcX"

# Variables work too
pat = /[a-z]+/
"hello".match(pat)               # true

# Division coexists
x = 10 / 2                       # 5
```
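To reproduce the expected results above outside Ry, the following C++ sketch uses std::regex as a stand-in. The PR does not say which regex engine the Ry runtime uses. Treating match as a whole-string match (std::regex_match) and search as a find-anywhere test (std::regex_search) is an assumption; replace follows the replace-all behavior confirmed later in review. The ry_sketch namespace is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical text-first helpers mirroring the Ry argument order.
namespace ry_sketch {

// Assumption: match requires the whole text to match.
bool match(const std::string& text, const std::regex& re) {
    return std::regex_match(text, re);
}

// Assumption: search succeeds if the pattern occurs anywhere.
bool search(const std::string& text, const std::regex& re) {
    return std::regex_search(text, re);
}

// Replaces every match, like the runtime's replace-all semantics.
std::string replace(const std::string& text, const std::regex& re, const std::string& repl) {
    return std::regex_replace(text, re, repl);
}

// Submatch index -1 yields the pieces between matches.
std::vector<std::string> split(const std::string& text, const std::regex& re) {
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
    return std::vector<std::string>(it, end);
}

// Collects the text of every non-overlapping match.
std::vector<std::string> find_all(const std::string& text, const std::regex& re) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}

}  // namespace ry_sketch
```

std::regex_replace interprets `$` sequences in the replacement string, so replacement text containing `$` may behave differently from Ry's runtime.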

Closes #458

Test plan

  • C++ tests: 14 new regex literal tests + all 1259 existing tests pass
  • Ry spec tests: 21 new tests in regex_literal.test.ry + all 62 spec files pass
  • ASan: clean (no memory safety issues)
  • Backward compatibility: existing regex_* functions and split/replace string overloads verified
  • Lexer disambiguation: division vs regex tested

t0k0sh1 added 2 commits March 31, 2026 23:30
…ution (#458)

Introduce /pattern/ regex literal syntax that produces a Regex type,
enabling type-based overload resolution to distinguish regex from string
arguments. This allows UFCS-compatible text-first function calls like
`"hello".match(/[a-z]+/)` and `"a1b2".split(/[0-9]/)`.

- Lexer: context-aware `/` disambiguation using prev_kind_ (division vs regex)
- AST: RegexExpr node for regex literal expressions
- Codegen: RK_Regex resource tracking with separate global cache to avoid
  collision with string literals of the same content
- New unprefixed functions: match, search, replace, split, find_all
  (text-first arg order, Regex type parameter)
- Existing regex_* functions remain for backward compatibility
- Extract buildArcGlobal() to eliminate ARC global construction duplication

Update regex.ry declarations and spec tests to use 'function' keyword
per the fn → function rename on v0.0.6.

Copilot AI left a comment


Pull request overview

This PR introduces /.../ regex literals that produce a Regex-typed value, enabling type-based overload resolution (including UFCS-style calls) while keeping existing regex_* functions for backward compatibility.

Changes:

  • Lexer + parser: add TokenKind::RegexLiteral and a RegexExpr AST node, with context-sensitive / disambiguation.
  • Codegen: add RK_Regex tracking and a separate regex global cache; implement regex overload dispatch for split/replace and new match/search/find_all.
  • Stdlib + docs + tests: add native declarations, reference docs/changelog updates, and new C++ + spec test coverage for regex literals and UFCS.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_lexer.cpp | Updates lexer tests for division vs regex-literal tokenization. |
| tests/test_codegen_regex.cpp | Adds codegen tests covering regex literals, UFCS, and coexistence with division. |
| tests/spec/regex_literal.test.ry | Adds spec tests validating regex literal behavior across the new API surface. |
| src/parser_expr.cpp | Parses RegexLiteral tokens into RegexExpr and updates expression-start lookahead. |
| src/lexer.cpp | Implements / disambiguation and regex literal scanning with escape/newline handling. |
| src/codegen.cpp | Refactors ARC global literal construction via buildArcGlobal() for reuse by regex globals. |
| src/codegen_type.cpp | Resolves Regex as an opaque pointer type for codegen/type resolution. |
| src/codegen_expr.cpp | Emits regex literals via a separate global cache and tags them as RK_Regex. |
| src/codegen_call_string.cpp | Adds split/replace overloads that dispatch to regex runtime when given Regex. |
| src/codegen_call_io.cpp | Adds dispatcher support for unprefixed match/search/find_all with Regex patterns. |
| src/codegen_builtin.cpp | Adds isRegex() helper backed by resource tracking. |
| lib/std/regex.ry | Declares new native text-first regex functions taking Regex patterns. |
| include/ry/lexer.hpp | Adds TokenKind::RegexLiteral. |
| include/ry/codegen.hpp | Adds RK_Regex, regex global cache, and buildArcGlobal() declaration. |
| include/ry/ast.hpp | Adds RegexExpr and wires it into the expression variant. |
| docs/reference/regex.md | Documents regex literals and the new text-first API. |
| CHANGELOG.md | Records the feature addition and new overloadable regex functions. |


Comment thread src/lexer.cpp Outdated
Comment on lines +295 to +300
```cpp
prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::Ident &&
prev_kind_ != TokenKind::RParen && prev_kind_ != TokenKind::RBracket &&
prev_kind_ != TokenKind::RBrace && prev_kind_ != TokenKind::True &&
prev_kind_ != TokenKind::False && prev_kind_ != TokenKind::NoneKw &&
prev_kind_ != TokenKind::PlusPlus && prev_kind_ != TokenKind::MinusMinus &&
prev_kind_ != TokenKind::RegexLiteral) {
```

Copilot AI Mar 31, 2026


The regex-vs-division disambiguation treats / as starting a regex literal unless the previous token is value-producing, but TokenKind::FStringEnd is missing from this value-producing list. This is inconsistent with the similar disambiguation used for leading-dot floats later in this file (which does include FStringEnd), and can mis-tokenize / after an f-string as a regex literal (potentially yielding an “unterminated regex literal” error). Consider adding TokenKind::FStringEnd (and any other string-literal end kinds) to the division list.

Suggested change
```diff
-prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::Ident &&
-prev_kind_ != TokenKind::RParen && prev_kind_ != TokenKind::RBracket &&
-prev_kind_ != TokenKind::RBrace && prev_kind_ != TokenKind::True &&
-prev_kind_ != TokenKind::False && prev_kind_ != TokenKind::NoneKw &&
-prev_kind_ != TokenKind::PlusPlus && prev_kind_ != TokenKind::MinusMinus &&
-prev_kind_ != TokenKind::RegexLiteral) {
+prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::FStringEnd &&
+prev_kind_ != TokenKind::Ident && prev_kind_ != TokenKind::RParen &&
+prev_kind_ != TokenKind::RBracket && prev_kind_ != TokenKind::RBrace &&
+prev_kind_ != TokenKind::True && prev_kind_ != TokenKind::False &&
+prev_kind_ != TokenKind::NoneKw && prev_kind_ != TokenKind::PlusPlus &&
+prev_kind_ != TokenKind::MinusMinus && prev_kind_ != TokenKind::RegexLiteral) {
```

Comment thread docs/reference/regex.md Outdated
Comment on lines +35 to +36
- After values (identifiers, numbers, `)`, `]`): `/` is division
- After operators, keywords, `(`, `[`, `,`, `=`: `/` starts a regex literal

Copilot AI Mar 31, 2026


The “Division vs Regex” bullet list has malformed markdown/typos (a stray backtick and mismatched parentheses in “identifiers, numbers, ), ])”). It also omits some of the value-producing tokens the lexer actually checks (e.g., strings). Please tighten this description so it accurately reflects the lexer’s disambiguation rules.

Suggested change
```diff
-- After values (identifiers, numbers, `)`, `]`): `/` is division
-- After operators, keywords, `(`, `[`, `,`, `=`: `/` starts a regex literal
+- After value-producing tokens (identifiers, numbers, string literals, `)` or `]`), `/` is parsed as division
+- After operators, keywords, or delimiters that expect an expression (`(`, `[`, `,`, `=`), `/` starts a regex literal
```

Comment thread tests/spec/regex_literal.test.ry Outdated
Comment on lines +26 to +30
```
it("replaces first match", function():
expect(replace("abc123def", /[0-9]+/, "X")).to_eq("abcXdef")
)
it("UFCS", function():
expect("abc123".replace(/[0-9]+/, "X")).to_eq("abcX")
```

Copilot AI Mar 31, 2026


This test case description says “replaces first match”, but the underlying runtime implementation of __ry_regex_replace replaces all matches (see existing regex_replace("[0-9]+", "a1b2c3", "X") tests expecting aXbXcX). Please update the wording and consider adding a multi-match assertion here to ensure the new replace(text, Regex, repl) overload preserves the same semantics.

Suggested change
```diff
-it("replaces first match", function():
-expect(replace("abc123def", /[0-9]+/, "X")).to_eq("abcXdef")
-)
-it("UFCS", function():
-expect("abc123".replace(/[0-9]+/, "X")).to_eq("abcX")
+it("replaces all matches", function():
+expect(replace("a1b2c3", /[0-9]+/, "X")).to_eq("aXbXcX")
+)
+it("UFCS replaces all matches", function():
+expect("a1b2c3".replace(/[0-9]+/, "X")).to_eq("aXbXcX")
```

Comment thread tests/test_codegen_regex.cpp Outdated
```cpp
TEST_F(CodeGenTest, RegexLiteralReplace) {
EXPECT_EQ(runSource(R"(
print(replace("abc123def", /[0-9]+/, "X"))
)"), "abcXdef\n");
```

Copilot AI Mar 31, 2026


RegexLiteralReplace only exercises a single-match case. Since this PR adds a new overload path (replace(text, Regex, repl) in the string call handler), it would be good to add a multi-match input (e.g., something like the existing regex_replace coverage that expects repeated substitutions) to ensure the overload wiring preserves the same semantics as regex_replace.

Suggested change
```diff
-)"), "abcXdef\n");
+print(replace("a1b2c3", /[0-9]/, "X"))
+)"), "abcXdef\naXbXcX\n");
```

t0k0sh1 added 2 commits March 31, 2026 23:51
…adlock

Re-check the completion predicate after reacquiring the lock in
waitWithWorkerHelp. Previously, when tryRunOne() returned false, the
code fell through to cv.wait() without rechecking pred(). If the task
completed and called cv.notify_all() while the lock was released
(between unlock and lock), the notification was lost and the caller
blocked forever.

This race was observable under ASan + trace overhead where the timing
window was wide enough for the task to complete during the unlock gap.

- Add FStringEnd to value-producing token list in lexer disambiguation
  to prevent f-string followed by `/` being mis-tokenized as regex
- Fix docs to accurately describe the lexer's disambiguation rules
- Fix replace test descriptions ("first match" → "all matches") and
  add multi-match assertions to verify semantics match regex_replace
t0k0sh1 merged commit 84bc07b into v0.0.6 on Mar 31, 2026
6 checks passed
t0k0sh1 deleted the feat/regex-literal-458 branch March 31, 2026 15:11