feat: add regex literal syntax (/.../) for type-based overload resolution by t0k0sh1 · Pull Request #481 · t0k0sh1/ry

t0k0sh1 · 2026-03-31T14:31:16Z

Summary

Add regex literal syntax (/pattern/) that produces a Regex type, enabling type-based overload resolution and UFCS-compatible function calls
New text-first regex functions: match, search, replace, split, find_all — overloaded to accept Regex type patterns alongside existing string functions
Existing regex_* functions remain for backward compatibility

Changes

Lexer

Context-aware / disambiguation using prev_kind_ (JavaScript-style): after value-producing tokens → division, otherwise → regex literal
Proper escape handling (\/) with newline boundary check
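
The two lexer rules above can be illustrated with a minimal sketch. This is not the PR's code: the trimmed `TokenKind` enum, `endsValue()`, and `scanRegexBody()` are hypothetical names, and keeping `\/` verbatim in the pattern text is an assumption about how escapes are stored.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, trimmed-down token kinds; the real enum in the PR has more members.
enum class TokenKind {
    Start, Number, String, Ident, RParen, RBracket,
    LParen, Comma, Assign, RegexLiteral
};

// True when the previous token ends a value, so a following '/' is division.
// Anything else (operators, '(', ',', '=', start of input) lets '/' open a regex literal.
bool endsValue(TokenKind prev) {
    switch (prev) {
        case TokenKind::Number:
        case TokenKind::String:
        case TokenKind::Ident:
        case TokenKind::RParen:
        case TokenKind::RBracket:
        case TokenKind::RegexLiteral:
            return true;
        default:
            return false;
    }
}

// Scans a regex body starting just after the opening '/'.
// On success returns the pattern text (escapes kept verbatim, an assumption)
// and leaves pos just past the closing '/'.
// Returns std::nullopt if a newline or end of input is reached first.
std::optional<std::string> scanRegexBody(std::string_view src, std::size_t& pos) {
    std::string pattern;
    while (pos < src.size()) {
        char c = src[pos];
        if (c == '\n') return std::nullopt;  // literals may not span lines
        if (c == '\\') {
            // An escape consumes the next character, but never across a newline or EOF.
            if (pos + 1 >= src.size() || src[pos + 1] == '\n') return std::nullopt;
            pattern += c;
            pattern += src[pos + 1];
            pos += 2;
            continue;
        }
        if (c == '/') {  // unescaped slash closes the literal
            ++pos;
            return pattern;
        }
        pattern += c;
        ++pos;
    }
    return std::nullopt;  // unterminated regex literal
}
```

With this shape, `x = 10 / 2` takes the division branch because the previous token is a number, while `split(/[0-9]/)` takes the regex branch because the previous token is `(`.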

AST & Parser

RegexExpr node for regex literal expressions
RegexLiteral token kind in parsePrimary() and newline lookahead

Codegen

RK_Regex resource tracking via resource_sets_ (same pattern as TcpListener, HttpRequest, etc.)
Separate regex_global_cache_ to prevent collision with string literals of same content
buildArcGlobal() extracted to eliminate ARC global construction duplication between cachedGlobalString() and regex globals
isRegex() helper for type-based dispatch
Regex overloads in split/replace string handlers and new match/search/find_all in regex dispatcher
Regex type resolution in codegen_type.cpp
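
To see why a separate regex cache matters, consider a sketch in which literal globals are interned by content. `LiteralGlobals` and `GlobalId` are hypothetical stand-ins and do not reflect the actual layout of `regex_global_cache_` or the value type the codegen stores.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical handle to an emitted global; the real codegen stores its own value type.
struct GlobalId {
    int index;
    bool is_regex;
};

class LiteralGlobals {
public:
    // String literals and regex literals are interned in different maps,
    // so "[0-9]+" and /[0-9]+/ never share a global even though the text is identical.
    GlobalId stringGlobal(const std::string& text) {
        return intern(string_cache_, text, false);
    }
    GlobalId regexGlobal(const std::string& pattern) {
        return intern(regex_cache_, pattern, true);
    }

private:
    GlobalId intern(std::unordered_map<std::string, GlobalId>& cache,
                    const std::string& key, bool is_regex) {
        auto it = cache.find(key);
        if (it != cache.end()) return it->second;  // reuse the existing global
        GlobalId id{next_index_++, is_regex};
        cache.emplace(key, id);
        return id;
    }

    std::unordered_map<std::string, GlobalId> string_cache_;
    std::unordered_map<std::string, GlobalId> regex_cache_;
    int next_index_ = 0;
};
```

Had both kinds shared one map keyed only by content, the second lookup would return the first literal's global, and a regex could be handed a value tagged as a plain string (or the reverse).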

Usage

from regex import match, split, replace

# Regex literals enable type-based overloading
"hello".match(/[a-z]+/)          # true
"a1b2c".split(/[0-9]/)           # ["a", "b", "c"]
"abc123".replace(/[0-9]+/, "X")  # "abcX"

# Variables work too
pat = /[a-z]+/
"hello".match(pat)               # true

# Division coexists
x = 10 / 2                       # 5

Closes #458

Test plan

C++ tests: 14 new regex literal tests + all 1259 existing tests pass
Ry spec tests: 21 new tests in regex_literal.test.ry + all 62 spec files pass
ASan: clean (no memory safety issues)
Backward compatibility: existing regex_* functions and split/replace string overloads verified
Lexer disambiguation: division vs regex tested

…ution (#458)

Introduce /pattern/ regex literal syntax that produces a Regex type, enabling type-based overload resolution to distinguish regex from string arguments. This allows UFCS-compatible text-first function calls like `"hello".match(/[a-z]+/)` and `"a1b2".split(/[0-9]/)`.

- Lexer: context-aware `/` disambiguation using prev_kind_ (division vs regex)
- AST: RegexExpr node for regex literal expressions
- Codegen: RK_Regex resource tracking with separate global cache to avoid collision with string literals of the same content
- New unprefixed functions: match, search, replace, split, find_all (text-first arg order, Regex type parameter)
- Existing regex_* functions remain for backward compatibility
- Extract buildArcGlobal() to eliminate ARC global construction duplication

Update regex.ry declarations and spec tests to use 'function' keyword per the fn → function rename on v0.0.6.

Copilot

Pull request overview

This PR introduces /.../ regex literals that produce a Regex-typed value, enabling type-based overload resolution (including UFCS-style calls) while keeping existing regex_* functions for backward compatibility.

Changes:

Lexer + parser: add TokenKind::RegexLiteral and a RegexExpr AST node, with context-sensitive / disambiguation.
Codegen: add RK_Regex tracking and a separate regex global cache; implement regex overload dispatch for split/replace and new match/search/find_all.
Stdlib + docs + tests: add native declarations, reference docs/changelog updates, and new C++ + spec test coverage for regex literals and UFCS.
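
The idea of choosing a `split` implementation from the pattern's type can be sketched as follows. This is only an analogy: the PR resolves the overload at compile time in codegen via `isRegex()`, whereas this sketch dispatches at run time on a `std::variant` tag, and it uses `std::regex` in place of the Ry regex runtime. `RegexValue`, `Pattern`, `splitLiteral`, and `splitRegex` are hypothetical names.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <variant>
#include <vector>

// Hypothetical Regex-typed value, distinct from a plain string with the same text.
struct RegexValue {
    std::string pattern;
};
using Pattern = std::variant<std::string, RegexValue>;

// Splits on a literal separator; an empty separator yields the text unchanged.
std::vector<std::string> splitLiteral(const std::string& text, const std::string& sep) {
    std::vector<std::string> parts;
    if (sep.empty()) {
        parts.push_back(text);
        return parts;
    }
    std::size_t start = 0;
    for (std::size_t hit = text.find(sep); hit != std::string::npos;
         hit = text.find(sep, start)) {
        parts.push_back(text.substr(start, hit - start));
        start = hit + sep.size();
    }
    parts.push_back(text.substr(start));
    return parts;
}

// Splits on regex matches using the token iterator's "-1" (non-matching parts) mode.
std::vector<std::string> splitRegex(const std::string& text, const std::string& pattern) {
    std::regex re(pattern);
    std::vector<std::string> parts;
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
    for (; it != end; ++it) parts.push_back(it->str());
    return parts;
}

// The pattern's type, not its text, selects the implementation.
std::vector<std::string> split(const std::string& text, const Pattern& p) {
    if (const auto* re = std::get_if<RegexValue>(&p)) {
        return splitRegex(text, re->pattern);
    }
    return splitLiteral(text, std::get<std::string>(p));
}
```

The same text `[0-9]` therefore behaves differently depending on its type: as a `RegexValue` it splits `a1b2c` on digits, while as a `std::string` it only splits where the five literal characters appear.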

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_lexer.cpp | Updates lexer tests for division vs regex-literal tokenization. |
| tests/test_codegen_regex.cpp | Adds codegen tests covering regex literals, UFCS, and coexistence with division. |
| tests/spec/regex_literal.test.ry | Adds spec tests validating regex literal behavior across the new API surface. |
| src/parser_expr.cpp | Parses `RegexLiteral` tokens into `RegexExpr` and updates expression-start lookahead. |
| src/lexer.cpp | Implements `/` disambiguation and regex literal scanning with escape/newline handling. |
| src/codegen.cpp | Refactors ARC global literal construction via `buildArcGlobal()` for reuse by regex globals. |
| src/codegen_type.cpp | Resolves `Regex` as an opaque pointer type for codegen/type resolution. |
| src/codegen_expr.cpp | Emits regex literals via a separate global cache and tags them as `RK_Regex`. |
| src/codegen_call_string.cpp | Adds `split`/`replace` overloads that dispatch to regex runtime when given `Regex`. |
| src/codegen_call_io.cpp | Adds dispatcher support for unprefixed `match`/`search`/`find_all` with `Regex` patterns. |
| src/codegen_builtin.cpp | Adds `isRegex()` helper backed by resource tracking. |
| lib/std/regex.ry | Declares new native text-first regex functions taking `Regex` patterns. |
| include/ry/lexer.hpp | Adds `TokenKind::RegexLiteral`. |
| include/ry/codegen.hpp | Adds `RK_Regex`, regex global cache, and `buildArcGlobal()` declaration. |
| include/ry/ast.hpp | Adds `RegexExpr` and wires it into the expression variant. |
| docs/reference/regex.md | Documents regex literals and the new text-first API. |
| CHANGELOG.md | Records the feature addition and new overloadable regex functions. |

Copilot · 2026-03-31T14:44:24Z

+            prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::Ident &&
+            prev_kind_ != TokenKind::RParen && prev_kind_ != TokenKind::RBracket &&
+            prev_kind_ != TokenKind::RBrace && prev_kind_ != TokenKind::True &&
+            prev_kind_ != TokenKind::False && prev_kind_ != TokenKind::NoneKw &&
+            prev_kind_ != TokenKind::PlusPlus && prev_kind_ != TokenKind::MinusMinus &&
+            prev_kind_ != TokenKind::RegexLiteral) {


The regex-vs-division disambiguation treats / as starting a regex literal unless the previous token is value-producing, but TokenKind::FStringEnd is missing from this value-producing list. This is inconsistent with the similar disambiguation used for leading-dot floats later in this file (which does include FStringEnd), and can mis-tokenize / after an f-string as a regex literal (potentially yielding an “unterminated regex literal” error). Consider adding TokenKind::FStringEnd (and any other string-literal end kinds) to the division list.

Suggested change

-            prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::Ident &&
-            prev_kind_ != TokenKind::RParen && prev_kind_ != TokenKind::RBracket &&
-            prev_kind_ != TokenKind::RBrace && prev_kind_ != TokenKind::True &&
-            prev_kind_ != TokenKind::False && prev_kind_ != TokenKind::NoneKw &&
-            prev_kind_ != TokenKind::PlusPlus && prev_kind_ != TokenKind::MinusMinus &&
-            prev_kind_ != TokenKind::RegexLiteral) {
+            prev_kind_ != TokenKind::String && prev_kind_ != TokenKind::FStringEnd &&
+            prev_kind_ != TokenKind::Ident && prev_kind_ != TokenKind::RParen &&
+            prev_kind_ != TokenKind::RBracket && prev_kind_ != TokenKind::RBrace &&
+            prev_kind_ != TokenKind::True && prev_kind_ != TokenKind::False &&
+            prev_kind_ != TokenKind::NoneKw && prev_kind_ != TokenKind::PlusPlus &&
+            prev_kind_ != TokenKind::MinusMinus && prev_kind_ != TokenKind::RegexLiteral) {

Copilot · 2026-03-31T14:44:25Z

+- After values (identifiers, numbers, `)`, `]`): `/` is division
+- After operators, keywords, `(`, `[`, `,`, `=`: `/` starts a regex literal


The “Division vs Regex” bullet list has malformed markdown/typos (a stray backtick and mismatched parentheses in “identifiers, numbers, ), ])”). It also omits some of the value-producing tokens the lexer actually checks (e.g., strings). Please tighten this description so it accurately reflects the lexer’s disambiguation rules.

Suggested change

-- After values (identifiers, numbers, `)`, `]`): `/` is division
-- After operators, keywords, `(`, `[`, `,`, `=`: `/` starts a regex literal
+- After value-producing tokens (identifiers, numbers, string literals, `)` or `]`), `/` is parsed as division
+- After operators, keywords, or delimiters that expect an expression (`(`, `[`, `,`, `=`), `/` starts a regex literal

Copilot · 2026-03-31T14:44:25Z

+  it("replaces first match", function():
+    expect(replace("abc123def", /[0-9]+/, "X")).to_eq("abcXdef")
+  )
+  it("UFCS", function():
+    expect("abc123".replace(/[0-9]+/, "X")).to_eq("abcX")


This test case description says “replaces first match”, but the underlying runtime implementation of __ry_regex_replace replaces all matches (see existing regex_replace("[0-9]+", "a1b2c3", "X") tests expecting aXbXcX). Please update the wording and consider adding a multi-match assertion here to ensure the new replace(text, Regex, repl) overload preserves the same semantics.

Suggested change

-  it("replaces first match", function():
-    expect(replace("abc123def", /[0-9]+/, "X")).to_eq("abcXdef")
-  )
-  it("UFCS", function():
-    expect("abc123".replace(/[0-9]+/, "X")).to_eq("abcX")
+  it("replaces all matches", function():
+    expect(replace("a1b2c3", /[0-9]+/, "X")).to_eq("aXbXcX")
+  )
+  it("UFCS replaces all matches", function():
+    expect("a1b2c3".replace(/[0-9]+/, "X")).to_eq("aXbXcX")
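
The reason a single-match input cannot pin down the semantics is easy to show outside Ry. The sketch below uses C++ `std::regex` as a stand-in for the runtime (the implementation of `__ry_regex_replace` is not shown in this PR page); `replaceAll` and `replaceFirst` are hypothetical helper names.

```cpp
#include <regex>
#include <string>

// std::regex_replace substitutes every match by default.
std::string replaceAll(const std::string& text, const std::string& pattern,
                       const std::string& repl) {
    return std::regex_replace(text, std::regex(pattern), repl);
}

// With format_first_only, only the first match is substituted.
std::string replaceFirst(const std::string& text, const std::string& pattern,
                         const std::string& repl) {
    return std::regex_replace(text, std::regex(pattern), repl,
                              std::regex_constants::format_first_only);
}
```

On `"abc123def"` both helpers return `"abcXdef"`, so the original test passes under either behaviour. Only an input with several matches, such as `"a1b2c3"`, separates `"aXbXcX"` (replace all) from `"aXb2c3"` (replace first).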

Copilot · 2026-03-31T14:44:25Z

+TEST_F(CodeGenTest, RegexLiteralReplace) {
+    EXPECT_EQ(runSource(R"(
+print(replace("abc123def", /[0-9]+/, "X"))
+)"), "abcXdef\n");


RegexLiteralReplace only exercises a single-match case. Since this PR adds a new overload path (replace(text, Regex, repl) in the string call handler), it would be good to add a multi-match input (e.g., something like the existing regex_replace coverage that expects repeated substitutions) to ensure the overload wiring preserves the same semantics as regex_replace.

Suggested change

-)"), "abcXdef\n");
+print(replace("a1b2c3", /[0-9]/, "X"))
+)"), "abcXdef\naXbXcX\n");

…adlock

Re-check the completion predicate after reacquiring the lock in waitWithWorkerHelp. Previously, when tryRunOne() returned false, the code fell through to cv.wait() without rechecking pred(). If the task completed and called cv.notify_all() while the lock was released (between unlock and lock), the notification was lost and the caller blocked forever. This race was observable under ASan + trace overhead where the timing window was wide enough for the task to complete during the unlock gap.
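
The lost-wakeup fix described in this commit message can be sketched as below. This is a hypothetical reconstruction rather than the project's code: only the names `waitWithWorkerHelp`, `tryRunOne`, `pred`, and `cv` come from the message, while the signature and loop structure are assumptions. It also assumes the completing task updates its state while holding the same mutex.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>

// Waits until pred() holds, helping to run queued work in the meantime.
// Assumed signature; the real function is likely a member with different parameters.
void waitWithWorkerHelp(std::mutex& m, std::condition_variable& cv,
                        const std::function<bool()>& pred,
                        const std::function<bool()>& tryRunOne) {
    std::unique_lock<std::mutex> lock(m);
    while (!pred()) {
        lock.unlock();
        bool ran = tryRunOne();  // run one queued task without holding the lock
        lock.lock();
        if (ran) continue;       // the loop condition re-evaluates pred()
        // The awaited task may have finished and notified while the lock was
        // released. Re-check under the lock before sleeping, otherwise that
        // notification is lost and cv.wait() never returns.
        if (pred()) break;
        cv.wait(lock);
    }
}
```

Because the re-check and `cv.wait(lock)` both happen with the mutex held, a task that sets its completion flag under the same mutex cannot slip in between them.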

- Add FStringEnd to value-producing token list in lexer disambiguation to prevent f-string followed by `/` being mis-tokenized as regex
- Fix docs to accurately describe the lexer's disambiguation rules
- Fix replace test descriptions ("first match" → "all matches") and add multi-match assertions to verify semantics match regex_replace

t0k0sh1 added 2 commits March 31, 2026 23:30

merge: resolve conflict with v0.0.6 (fn → function rename)

1541fd4

t0k0sh1 requested a review from Copilot March 31, 2026 14:35

Copilot started reviewing on behalf of t0k0sh1 March 31, 2026 14:36

Copilot AI reviewed Mar 31, 2026

t0k0sh1 added 2 commits March 31, 2026 23:51

t0k0sh1 merged commit 84bc07b into v0.0.6 Mar 31, 2026
6 checks passed

t0k0sh1 deleted the feat/regex-literal-458 branch March 31, 2026 15:11

t0k0sh1 merged 4 commits into v0.0.6 from feat/regex-literal-458