Bypass search cache on truncated searches at embed-escape boundary#675
Draft
stefanobaghino wants to merge 32 commits into
Draft
Bypass search cache on truncated searches at embed-escape boundary#675stefanobaghino wants to merge 32 commits into
stefanobaghino wants to merge 32 commits into
Conversation
`handle_fail` already re-emits the branch_point match's `pat.scope`
so a keyword like `LIKE` retains its scope after a branch swap. It
did not do the same for `captures:`-derived scopes, because those
are computed from the match's `regions` at emit time and not stored
on the BranchPoint record. On same-line and cross-line fail paths
the original capture Push/Pop pair was truncated off `ops` together
with alt[0]'s subsequent work and never replayed.
Minimal Haskell repro:
data CtxCls ctx => ModId.QTyCls
-- ^^^^ keyword.declaration.data.haskell
The `datas` branch_point match is
`(data)(?:\s+(family|instance))?` with
`captures: { 1: keyword.declaration.data.haskell, 2: storage.modifier.family.haskell }`.
When `data-signature` (alt[0]) fires `fail: data-declaration` on
the `=>` lookahead, replay into `data-context` (alt[1]) only
restored `scope: meta.declaration.data.haskell` — the `data` token
lost its inner `keyword.declaration.data.haskell` scope.
Fix: snapshot the computed capture ops on the BranchPoint record
at creation time (new `capture_ops` field), and replay them inside
the pat_scope brackets in both fail paths. Extracts the capture-op
build logic (previously inline in `exec_pattern`) into a
free-function helper `build_capture_ops` used at both call sites.
Regression test `branch_point_capture_scopes_survive_fail_retry`
mirrors the Haskell shape with a synthetic two-alt branch whose
trigger match carries both `scope:` and `captures:`.
Syntest net:
- Haskell/syntax_test_haskell.hs: 72 -> 50 (Cluster A closes;
`data CtxCls ctx => …`, `data family … => …`, and
`type … => …` all now retain their declaration-keyword scope —
22 assertions).
- No collateral deltas on any other baseline file (both regex
backends).
Refs: trishume#631
Mechanical: Haskell drops from 72 to 50 failing assertions on both regex backends. No other baseline lines change.
- Inline `build_capture_ops` signature so it fits on one line under the older rustfmt in CI's Minimum supported rust version job. - Escape `alt[0]`/`alt[1]` in the `BranchPoint.capture_ops` doc so rustdoc stops treating them as intra-doc links under `-Dwarnings`. Refs: trishume#631
push_meta_ops emitted the compound Pop before the Restore when the leaving context carried `clear_scopes:` and `set_pop_count > 1`. The intermediate popped frame's meta_content_scope is split across the live scope stack and clear_stack; Pop — counting the full pre-Clear total — then ate atoms from frames below the popped range. Observed on Batch File `cmd-set-quoted-value-inner-end` (`clear_scopes: 1`) firing `pop: 2, set: ignored-tail-outer`, which dropped `meta.command.set.dosbatch` from the trailing content of every quoted `set "var"=...` line. Emit Restore before Pop when `set_pop_count > 1`; plain `set:` keeps the existing Pop-then-Restore order (gated by the Lisp `defun` test `v2_set_to_target_with_clear_scopes_clears_parent_meta_content_scope`). syntax_test_batch_file.bat: 74 → 0 on both backends; no other baseline entry changes. New regression `pop_n_set_with_cur_clear_scopes_restores_before_popping_deeper_frames`. Refs: trishume#631
Sublime Text applies `captures: N:` to the overlap between group N's span and the rule's consumed match range. syntect dropped lookaround- internal captures in `parse_captures` and would have emitted Pop past match_end in `build_capture_ops` had they reached it. Keep every non-negative `captures:` key at load time; clip `(cap_start, cap_end)` to `regions.pos(0)` at apply time. Removes the now-unused `get_consuming_capture_indexes` walker and its tests. Baseline (both backends): clears ASP/syntax_test_asp.asp (53), C#/tests/syntax_test_Generics.cs (3), Rails/tests/syntax_test_rails.html.erb (23). Refs: trishume#631
ST drops the popped context's `meta_scope` and `meta_content_scope`
from the trigger match's text for `pop: N + embed:`, unlike
`pop: N + set:` which preserves both. Rules in the wild re-add the
meta_scope atom explicitly in their match `scope:` so it still
appears exactly once on the trigger — HTML (JSP).sublime-syntax's
`tag-jsp-{declaration,expression,scriptlet}-attributes` all do.
syntect's Embed → synthetic Set routing in `push_meta_ops`
inherited plain-Set semantics, so cur.meta_scope stayed on the
stack and the match's explicit scope duplicated it on top.
Fix: when `pop_count > 0`, emit initial-phase Pops for cur's mcs
and ms, then pass a scope-stripped clone of cur through to the
recursive Set call so its non-initial `num_to_pop` doesn't
double-account atoms that are already off the stack. Probe and
ordering invariant in `v2_pop_embed_suppresses_cur_meta_scope_on_match`.
Net syntest: Java/jsp 44 → 39 on both baselines (5
`- meta.tag.jsp meta.tag.jsp` assertions cleared). The other 39
are three unrelated root causes, not addressed here.
Refs: trishume#631
A cross-line `fail` replay commits a Push(meta_scope) to
`flushed_ops` for a speculative context; a later same-line `fail`
for a branch_point created *during* the replay then truncates the
owning context out of `self.stack` without emitting a balancing
Pop (the Push is in `flushed_ops`, beyond `ops.truncate`'s reach).
`exec_escape` pops based on the truncated stack, leaving an
orphan atom at the top.
Track a `shadow: ScopeStack` mirror of the consumer view, synced
at `parse_line` boundaries the same way `syntest` applies
`replayed` + `ops`. `exec_escape` now emits a corrective Pop for
any atoms exceeding the sum of `self.stack`'s `meta_scope` /
gated `meta_content_scope` contributions.
Drops `syntax_test_latex.tex: 76` from both
`testdata/known_syntest_failures{,_fancy}.txt`.
`IncludeWithPrototype` in `MatchIter` pushed the included target on top of the prototype, and `MatchIter::next` reads the stack top (`ctx_stack[len-1]`) — so the target's patterns were iterated first and the external prototype's second. The parser's tie-break on match_start is strict `<`, so whichever rule is enumerated first wins a same-position match. ST's `apply_prototype` semantics and `ParseState::find_best_match`'s own `context.prototype` chaining (`chain(cur_prototype).chain(cur_context)`) both put prototype patterns ahead of the target. Swap the two pushes so the prototype lands on top of the stack and is iterated first. Concretely: in HAML `tag-attributes-content`, `ruby-code` does `include: scope:source.ruby.rails.embedded.haml apply_prototype: true`. Ruby-for-HAML's prototype injects HAML's `pipe-continuations` (match `|\s*$`). Before this fix, Ruby's bitwise-or rule (`[~|^]`) at the same position was iterated first and won the tie, so `|` at EOL got `keyword.operator.bitwise.ruby` instead of `punctuation.separator.continuation.haml`; the attribute braces popped at the newline, and every continuation assertion below cascaded. After the swap, the prototype's pipe-continuation wins the tie. Refs: trishume#631
Strengthens the existing `apply_prototype_includes_external_prototype` from build-only to parse-and-assert. Adds precedence, opt-out, and HAML-Rails end-to-end guards alongside it in `src/parsing/syntax_set.rs`. Refs: trishume#631
All 65 assertions in `syntax_test_rails.haml` pass after the apply_prototype ordering fix. Delta applies to both baselines.
ST's `text_point(row, col)` overflows past-EOL columns into the next row, so its syntax-test framework evaluates past-EOL assertions against the corresponding column on the next line. syntect's harness was instead testing against the consumed `\n`'s scope — silent divergence whenever the `\n` carried parent meta_scopes that the EOL pop chain dropped. Reorder the loop to parse-before-assert; thread the first post-target line's scopes into `process_assertions` (`examples/syntest.rs`); fall back to the previous behaviour when next-line scopes aren't available (EOF, replay path). Closes 17 `syntax_test_git_config` and 1 `syntax_test_clojure` stale baselines. Refs: trishume#631
Two cross-line branches failing on the same parse_line grew
`flushed_ops` by append, so `ParseLineOutput::replayed` doubled and
consumers that pair `replayed[i]` with the i-th pending line slid
ops from one buffered line onto another's text. Observed as the
byte-77 panic at `syntax_test_java.java` line 624.
Track `flushed_ops_start` alongside `flushed_ops` and merge
subsequent fails against the prior snapshot's range. See
`ParseState::merge_flushed` docs for the composition rule.
`known_syntest_failures{,_fancy}` absorb the unmasking: Python /
TypeScript / Bash / Zsh files previously panicking now report their
real path-1 counts. Java stays at `1` — next panic site is a
pre-existing stale-`line_number` on branches created during replay,
tracked as follow-up. Refs: trishume#631
Branches created while `handle_fail` re-parses a buffered past line snapshotted `self.line_number` / `self.pending_lines.len()`, which still reflect the *outer* `parse_line`'s current line. A later fail on the outer line would then see `bp.line_number == cur_line`, classify the branch as same-line, and apply the branch's replay-line-relative `match_start` to a shorter outer line — shipped as `byte index 20 out of bounds of " foo = BAR,\n"` on `syntax_test_java.java:10263` inside `@MultiLineAnnotation(...)`, and the matching byte-2 panic in `syntax_test_markdown.md`'s multi-line math blocks. Introduce a `replay_ctx: Option<ReplayCtx>` set around each inner `parse_line_inner*` call in both replay loops. Branch creation and `handle_fail`'s `cur_line` read through it, so branches born in the re-parse of line `L+i` record `line_number = L+i` and `pending_lines_snapshot_len = <slot for L+i>`. Baselines absorb the unmasking: TypeScript drops 230 to 12 (cascading replay-branch misclassifications fixed), Markdown moves 1 to 897 (the `1` was the byte-2 panic artefact; real count surfaces). `syntax_test_java.java` stays at `1`: a distinct pre-existing `NoClearedScopesToRestore` surfaces further into the same file, tracked as a follow-up. Refs: trishume#631
A `branch_point` born inside `handle_fail`'s cross-line replay recorded only the inner re-parse's local `res` Vec as its `prefix_ops`. When that nested branch later failed cross-line, its own replay reconstructed the line from an empty prefix and the captures emitted before the *outer* branch trigger vanished. Shipped as `[foo]: /url` losing its `meta.link.reference.def.markdown` and capture scopes whenever the outer `link-def-title-continuation` branch's `immediately-pop2` alt-1 spawned a nested `link-def-attr-continuation` whose own fail then replayed line 3 without the original LRD opener captures. Compose the first-line prefix (outer `prefix_ops` + new-alt meta/pat/capture/meta_content) up front in both cross-line replay paths, surface it via `ParseState::replay_prefix_ops`, and prepend it to inner branch creations' `prefix_ops`. Baselines: Markdown 897 → 565, TypeScript 12 → 0 (file disappears — `syntax_test_typescript.ts` exercised the same nested-replay shape). Refs: trishume#631
`parse_line` captured the buffered shadow snapshot BEFORE the line ran, and the syntest consumer captured `stack_before` similarly. A replay applied during that line's parse may have corrected ops for prior buffered lines, leaving the captured snapshot reflecting the uncorrected baseline. A LATER replay covering the same line then resets to that stale snapshot, re-applies the corrected ops on top, and resurrects any meta_scope the prior replay had unwound. Manifested as `meta.link.reference.def.markdown` leaking past back-to-back Markdown link reference definitions and polluting all subsequent paragraphs, code blocks, blockquotes, autolinks, footnotes, etc. for the rest of the file (~408 chars / 88 assertions in `syntax_test_markdown.md`). After applying replays in `parse_line`, overwrite each buffered `pending_line_start_shadows[start_idx + i + 1]` with the post-i shadow, and use the post-replay shadow as the snapshot for the current line being pushed. Mirror the same correction in `syntest`'s consumer loop on `parsed_line_buffer[..].stack_before`. Baselines: - Markdown 565 → 158 (the LRD-leak family) - Java 1 (panic) → 18953 (real failures unmasked — the `NoClearedScopesToRestore` panic that the same drift was triggering is gone) Refs: trishume#631
A `pop: N + branch_point` snapshots `stack_depth` pre-pop; the synthetic Set's post-Set retain (`bp.stack_depth <= final_len`) and `handle_fail`'s validity check (`stack.len() < bp.stack_depth`) both ignored that `pop_count`, dropping the freshly-created bp at creation. Same-line re-emit also missed the popped contexts' meta_scope clearance Pop — route it through `push_meta_ops` like the original push. Symptom: `meta.annotation.identifier.java meta.path.java` leaking past nested-annotation extends paths in `syntax_test_java.java`. Drops Java baseline 18953 -> 9956.
Mirrors the trishume#660 same-line fix into the cross-line branch — the bespoke re-emit of the new alternative's meta_scope/meta_content_scope was missing the popped contexts' Pop, leaking the popped meta_scope (annotation-qualified-identifier's meta_scope in Java) plus the surrounding declaration's meta_scope when an annotation crosses a line into a class/enum/interface declaration.
…ctions
When an outer cross-line `fail`'s replay re-parses buffered lines, an
inner cross-line `fail` firing during the loop writes its correction
into `self.flushed_ops`. Previously, the outer's locally-computed
`replayed_ops[i]` overwrote that correction via `merge_flushed`, freezing
a stale interpretation for indices the inner had already corrected.
Fixes the leak in `src/parsing/parser.rs::handle_fail` for both the
alt-N and exhaustion cross-line paths. Repro: Java
`@A.B\n(par=1)\nenum E {}\n` — the outer `declarations` fail's line-1
reparse froze the dotted annotation as `path` alt before the inner
`annotation-qualified-identifier` fail's `name`-alt resolution landed.
Drops Java syntest baseline 9935 → 9774; no regressions in other
languages or in `Markdown` (still 158).
When a same-line branch_point exhausts at a zero-width lookahead, rewind the cursor to the BP's original position and skip the same-name Branch pattern on retry — letting the parent context's next rule fire instead of advancing past the lookahead, which let stale keyword rules match inside identifiers (`package` in `$package`, `class` in `Foo.class;`). Drops Java syntest 9774 → 1987 (-7787, -80%); jsp 39 → 0; Zsh 604 → 410. Markdown unchanged at 158. No regressions elsewhere. See parser.rs::handle_fail same-line exhaustion handler and the new `skipped_branches` field; new test `exhausted_branch_point_falls_through_to_parent_next_rule`.
`push_meta_ops`'s non-initial phase emitted the deep-context meta_scope/mcs Pops before restoring `cur_context.clear_scopes`. When the cleared atom belonged to one of the deeper contexts being popped, the Pops landed on the wrong (still-visible) scope — observed on Java's `case DayType when -> "incomplete"`, where `case-label-expression`'s `clear_scopes: 1` hid `case-label`'s `meta.case.java` and `case-label-end`'s `pop: 2` then popped the surrounding switch block off the consumer's stack. Move the cur_context Restore to before the depth loop so the previously-cleared atom is visible again when the deeper-context Pop lands on it. Drops Java syntest 1987 → 949 (-1038, additional -50%); fixes C#'s `syntax_test_GeneralStructure.cs` (was 2 → 0) and Haskell -1. Markdown unchanged at 158, no other regressions. See `parser.rs::push_meta_ops` Pop arm and the new test `pop_n_restores_clear_before_unwinding_deeper_meta_scopes`.
The YAML loader checked `set:`, `branch:`, and `embed:` after a `pop:` key but never `push:`. Combined `pop: N + push: X` rules degraded to a plain `Pop(N)` and silently dropped the push, leaving the parser on the outer context instead of the intended target. Affected rules in vendored syntaxes: Java's `pop: 2 + push: annotation-parameters-body` (lambda3 line 10069 and many others) and `pop: 1 + push: case-label-expression`; Python's `pop: 2 + push: function-parameter-list-body` and `type-parameter-list-body`. Java syntest 641 → 245 (-396); Python 66 → 45 (-21). Other language baselines unchanged.
The Set initial-phase Pop at parser.rs:1992 unconditionally popped `cur_context.meta_content_scope.len()` even when cur_context's mcs was never pushed because the context immediately below has `embed_scope_replaces=true`. This dropped the topmost wrapper-pushed embed_scope token. Mirrors the skip already in the Pop branch at parser.rs:1912. Markdown 158 -> 31; Python 45 -> 32 (free benefit).
Plain `set:` (no `pop_count`) into a target with `clear_scopes` emitted that Clear in `push_meta_ops`'s initial phase even when the leaving context carried its own `meta_scope` / `meta_content_scope`. Cur's ms sits on top of the visible stack at that point; Clear hid it instead of the parent atom the optimization was meant to strip. The non-initial Pop then ate atoms below cur's hidden ms, and the trailing Restore resurrected cur's ms — leaving cur's meta_scope where the parent's atom used to be. Bash repro `: ~/`: `~` set: `tilde-modifier` (clear+ms); `''` zero-width set: `tilde-modifier-username` (clear+mcs); `/` lookahead pops. ST scopes `/` as `meta.string.glob.shell string.unquoted.shell`; syntect emitted `meta.interpolation.tilde.shell string.unquoted.shell`. Fix: when cur has `meta_scope` or `meta_content_scope`, defer the single-context-set target Clear to the non-initial phase, after Pop+Restore (so Pop finds cur's ms visible and Restore brings the parent atoms back) and before pushing target's ms/mcs. The cur-empty case (Lisp `(defun fn (...)`, pinned by `v2_set_to_target_with_clear_scopes_clears_parent_meta_content_scope`) is unchanged. Net syntest: bash 249 → 30, zsh 410 → 25, java 245 → 221 on both regex backends; no other baseline lines change. New regression `cur_meta_scope_set_to_target_with_clear_scopes` mirrors the bash shape. Refs: trishume#631
Multi-context `set:` whose target body has both `clear_scopes: N` and
a non-empty `meta_scope`, fired from a cur with no ms/mcs/clear,
needs an extra atom dropped on the trigger token beyond Clear's
reach. ST drops `N + 1` atoms on the trigger and `N` on the body
content, anchoring the extra drop on the target's `meta_scope`.
`push_meta_ops` previously kept both atoms on the trigger, leaking
nested `meta.function.php` / `meta.function.return-type.php` into
the `:` of PHP `function bye(): never {`. The fix emits a combined
`Clear(N + 1)` in the initial phase and a paired `Restore` in the
non-initial phase, leaving the body content's existing per-context
Clear+Push to land it at the same place as before.
Gated on the clear-bearing target carrying a non-empty `meta_scope`
so syntaxes whose target has only `meta_content_scope` are
unaffected — Zsh's `zsh-redirection-glob-range-end` (clear+mcs, no
ms) on the `<` redirection trigger otherwise loses
`source.shell.zsh` and `meta.function-call.arguments.shell`.
PHP 1 -> 0.
push_meta_ops's `MatchOperation::Set` arm with `set_pop_count > 1` lumped target.ms + cur.ms + every popped deeper frame's mcs+ms into a single Pop. Per-frame clear_scopes were never restored — their cleared atoms stayed in clear_stack out of reach, and the new target's clear_scopes then bit one atom too deep. Observed on Python `r'''(?ix:some text(?-i:hello))(?iLmsux)(?a)foo'''`: the `(?ix:` rule's `pop: 3 + set:[group-body-extended, maybe-unexpected-quantifiers]` left `group-body-extended_outer`'s cleared `meta.mode.extended.regexp` in clear_stack; `group-body-extended_target`'s `clear_scopes: 1` then cleared `source.regexp.python` (the embed wrapper's mcs) instead of `mode_outer`. ST keeps `source.regexp.python` visible from col 22 through col 47+; syntect previously dropped it from col 27 onward. Split the lumped Pop into a head Pop (target.ms + cur.ms) and a per-depth Pop+Restore loop mirroring `MatchOperation::Pop` arm at parser.rs:1954-1971. New regression tests `pop_n_set_restores_deeper_frame_clear_scopes` (positive) and `pop_n_set_without_deeper_clear_scopes_unaffected` (negative gate against regressing Java's `pop:2 + push:annotation-parameters-body` shape). Refs: trishume#631
Resolved by per-depth clear_scopes Restore on pop:N + set:. Refs: trishume#631
`yaml_load`'s `parse_embed_op` was setting `embed_scope_replaces=true` on the wrapper unconditionally. That flag tells the per-target loop in `parser.rs` to suppress the next context's `meta_content_scope` push, to avoid duplicating the embedded syntax's top-level scope (auto- inserted into `main`'s mcs at `yaml_load.rs:706-713`) with the wrapper's last `embed_scope` atom. That dedup is only needed when the embed enters via `main`. Fragment embeds (e.g. `embed: scope:source.toml.embedded.python#toml`) bypass `main`, so the fragment context's mcs is independent of the syntax's top-level scope. Suppressing it strips a real grammar atom (TOML's `meta.mapping.toml`) and the next `clear_scopes:` then bites the wrapper instead of the intended grammar atom — leaking the wrapper out of every nested scope inside the embed. Mark the wrapper as `embed_scope_replaces=true` only when the embed target has no `#fragment`. Two regression tests: - `fragment_embed_preserves_target_meta_content_scope` (positive) - `non_fragment_embed_still_suppresses_main_mcs` (negative gate) The b31b727 test `embed_scope_replaces_preserves_wrapper_mcs_across_inner_set` is unaffected — Markdown's bash code-fence embed has no fragment. Python 32 -> 0 on both regex backends; no other baseline moves.
When a child syntax has multiple parents in `extends:` and the parents disagree on a shared context or variable, a parent's directly-defined entry now outranks another parent's inherited entry. Same-provenance ties still resolve last-wins. Fixes the indented zsh shebang in Markdown fenced blocks: `Zsh (for Markdown)` extends `[Bash (for Markdown), Zsh]`. Bash (for Markdown) owns a lenient `main` (`^(?=\s*#!)`); Zsh inherits Bash's strict column-0 main. The previous last-wins merge let Zsh's inherited main override, so the indented ` #!/usr/bin/env zsh` fell into the regular comments rule.
`get_line_assertion_details` recognised any line where the testtoken
appeared mid-text and where valid assertion markers followed. ST's
syntax-test format only allows assertions on dedicated comment-only
lines, so source code preceding the testtoken means the markers are
coincidental. The harness was processing such lines as assertions
anyway, producing spurious failures and pinning
`test_against_line_number` away from the source line so the *next*
genuine assertion tested against stale scopes.
Fix: early-return `None` when source code precedes the testtoken or
non-whitespace follows the closing testtoken_end. The two bash repros
are `: ${#^pattern}` (the `#` is the parameter-length operator) and
`[ <<doc ] # <- ]` (a trailing comment whose body starts with `<-`).
Doing this also exposed a latent bug in
`only_whitespace_after_token_end`: `after_token_end` was the substring
*from* the end-token, so the end-token glyphs themselves always
counted as non-whitespace, and `/* ^ scope */` lines were silently
classified as non-pure. Under the old gate this was harmless (the
flag only fed the `parse_test_lines` path), but the early-return
turned every C-style block-comment assertion into a non-assertion
source line. Skip the end-token before checking the trailing content.
Three new harness unit tests cover the corrected predicate and both
shell repros. Two existing tests already exercise the pure-assertion
path; their `is_pure_assertion_line` field assertions are now
invariant-true at the constructor, but kept as documentation.
Net syntest deltas (both regex backends):
- Bash 30 -> 4 (residual: backtick `for...done` interaction)
- Zsh 25 -> 10 (residual: zsh glob-range scoping)
- Haskell 49 -> 43
Stacks on trishume#673.
The per-line search cache stored full-line `regex.search` results keyed
by MatchPattern pointer, then reused them on every later search regardless
of `search_end`. Inside an embed where `search_end` is clipped to the
escape position, that reuse can flip rule outcomes whose lookaheads sit
exactly at the boundary — the cached "no match" was computed against the
escape glyph, but a fresh truncated search would see end-of-input there.
Concretely: in `` `for i in $(seq 100); do echo $i; done` `` the
`done{{cmd_break}}` rule (`done(?!cmd_char)`) was searched at the outer
level with full-line text, where the lookahead saw the closing backtick
(itself a cmd_char) and failed. That `None` was cached. Inside the
backtick embed, with search_end clipped to the close, the cache
short-circuited the lookup before the regex could re-run with
end-of-input semantics, so `done` fell through to `cmd-name-body` and
got `variable.function.shell` instead of `keyword.control.loop.end.shell`.
Skip the cache lookup whenever `search_end < line.len()`. Insertion was
already gated on `search_end == line.len()`, so the cache stays
populated by full-line answers; truncated searches just re-run.
This was referenced Apr 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacks on #674.
Inside an embed where
search_endis clipped to the escape position,the per-line search cache returned cached full-line results regardless
of the current search bound. That flipped rule outcomes whose lookaheads
sat exactly at the boundary.
In
`for i in $(seq 100); do echo $i; done`(Bash syntest), thedone{{cmd_break}}rule (done(?!cmd_char)) was searched at the outerlevel with full-line text, where the lookahead saw the closing backtick
(itself a cmd_char) and failed. That
Nonewas cached. Inside thebacktick embed, with
search_endclipped to the close, the cacheshort-circuited the lookup before the regex could re-run with
end-of-input semantics, so
donefell through tocmd-name-bodyandemitted
variable.function.shellinstead ofkeyword.control.loop.end.shell.Skip the cache lookup whenever
search_end < line.len(). Insertion wasalready gated on
search_end == line.len(), so the cache stayspopulated by full-line answers; truncated searches just re-run.
Drops bash baseline 4 → 0 on both backends.
Compare against #674: stefanobaghino/syntect@631-syntest-non-pure-assertion-gating...631-clip-lookahead-at-embed-escape