Substitute outer-side cross-line replay with inner correction when inner pushes a meta atom outer drops #689

Draft
stefanobaghino wants to merge 49 commits into trishume:master from stefanobaghino:631-java-field-type-meta-path-drop

@stefanobaghino
Contributor

Review just this PR's changes: stefanobaghino/syntect@631-java-annotation-parameters-drift...631-java-field-type-meta-path-drop

Adds a substitution arm to `prefer_inner_replay_corrections`'s `SkippedDeepNonExtension` branch. When `is_replace_shape` matches (any inner-side `meta.*` atom not present on the outer side), substitute outer's local ops with inner's corrected ops. A per-atom comp-pop balances the over-push: insert a `Pop(1)` after each over-pushed `Push`, gated by G2 (skip the comp-pop when any over-push atom is already on the running shadow stack, which signals legitimate preservation, not fresh introduction).

Fixes the cluster of multi-line qualified Java field declarations interrupted by `/**/` and EOL comments where `meta.path.java` was lost and the parser flipped into `meta.function.identifier.java` for several columns. Java syntest baseline drops 89 → 75 cols. Cluster-B reproducers (`cross_line_path_field_type_keeps_meta_path_on_continuation_line`, `inline_path_field_type_keeps_meta_path_when_uninterrupted`, `cross_line_alternative_replacement_substitution_does_not_double_meta_scope`) defend the rule, comp-pop, and G2 gate.

Stacked on #686, #687, #688.

Refs: #631

`push_meta_ops` emitted the compound Pop before the Restore when the
leaving context carried `clear_scopes:` and `set_pop_count > 1`. The
intermediate popped frame's meta_content_scope is split across the
live scope stack and clear_stack; Pop — counting the full pre-Clear
total — then ate atoms from frames below the popped range.

Observed on Batch File `cmd-set-quoted-value-inner-end`
(`clear_scopes: 1`) firing `pop: 2, set: ignored-tail-outer`, which
dropped `meta.command.set.dosbatch` from the trailing content of every
quoted `set "var"=...` line.

Emit Restore before Pop when `set_pop_count > 1`; plain `set:` keeps
the existing Pop-then-Restore order (gated by the Lisp `defun` test
`v2_set_to_target_with_clear_scopes_clears_parent_meta_content_scope`).

syntax_test_batch_file.bat: 74 → 0 on both backends; no other baseline
entry changes. New regression
`pop_n_set_with_cur_clear_scopes_restores_before_popping_deeper_frames`.

Refs: trishume#631
Sublime Text applies `captures: N:` to the overlap between group N's
span and the rule's consumed match range. syntect dropped lookaround-
internal captures in `parse_captures` and would have emitted Pop past
match_end in `build_capture_ops` had they reached it.

Keep every non-negative `captures:` key at load time; clip
`(cap_start, cap_end)` to `regions.pos(0)` at apply time. Removes the
now-unused `get_consuming_capture_indexes` walker and its tests.
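The clipping step can be illustrated with a minimal sketch. `clip_capture` is a hypothetical helper, not the name used in `build_capture_ops`; it assumes half-open byte ranges and treats an empty overlap as "no capture to emit", which is an assumption rather than something the commit states.

```rust
/// Clip a capture group's span to its overlap with the consumed match range.
/// Both spans are half-open `(start, end)` byte ranges on the same line.
/// Returns `None` when the capture lies entirely outside the consumed range,
/// for example a group that only matched inside a lookahead.
fn clip_capture(cap: (usize, usize), consumed: (usize, usize)) -> Option<(usize, usize)> {
    let start = cap.0.max(consumed.0);
    let end = cap.1.min(consumed.1);
    if start < end {
        Some((start, end))
    } else {
        None
    }
}
```

A group that straddles `match_end` is shortened to the part inside the match, so no Pop would be emitted past the end of the consumed text.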

Baseline (both backends): clears
ASP/syntax_test_asp.asp (53), C#/tests/syntax_test_Generics.cs (3),
Rails/tests/syntax_test_rails.html.erb (23).

Refs: trishume#631
ST drops the popped context's `meta_scope` and `meta_content_scope`
from the trigger match's text for `pop: N + embed:`, unlike
`pop: N + set:` which preserves both. Rules in the wild re-add the
meta_scope atom explicitly in their match `scope:` so it still
appears exactly once on the trigger — HTML (JSP).sublime-syntax's
`tag-jsp-{declaration,expression,scriptlet}-attributes` all do.

syntect's Embed → synthetic Set routing in `push_meta_ops`
inherited plain-Set semantics, so cur.meta_scope stayed on the
stack and the match's explicit scope duplicated it on top.

Fix: when `pop_count > 0`, emit initial-phase Pops for cur's mcs
and ms, then pass a scope-stripped clone of cur through to the
recursive Set call so its non-initial `num_to_pop` doesn't
double-account atoms that are already off the stack. Probe and
ordering invariant in `v2_pop_embed_suppresses_cur_meta_scope_on_match`.

Net syntest: Java/jsp 44 → 39 on both baselines (5
`- meta.tag.jsp meta.tag.jsp` assertions cleared). The other 39
are three unrelated root causes, not addressed here.

Refs: trishume#631
A cross-line `fail` replay commits a Push(meta_scope) to
`flushed_ops` for a speculative context; a later same-line `fail`
for a branch_point created *during* the replay then truncates the
owning context out of `self.stack` without emitting a balancing
Pop (the Push is in `flushed_ops`, beyond `ops.truncate`'s reach).
`exec_escape` pops based on the truncated stack, leaving an
orphan atom at the top.

Track a `shadow: ScopeStack` mirror of the consumer view, synced
at `parse_line` boundaries the same way `syntest` applies
`replayed` + `ops`. `exec_escape` now emits a corrective Pop for
any atoms exceeding the sum of `self.stack`'s `meta_scope` /
gated `meta_content_scope` contributions.
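As a rough sketch of the arithmetic only: the corrective Pop count is the number of shadow atoms beyond what the live context stack accounts for. `Frame`, its fields, and `corrective_pop` are hypothetical names for illustration; the real `exec_escape` works on syntect's own context and `ScopeStack` types.

```rust
/// Simplified stand-in for one entry of the parser's context stack.
struct Frame {
    /// Number of atoms this context pushed via `meta_scope`.
    meta_scope_len: usize,
    /// Number of atoms in this context's `meta_content_scope`.
    meta_content_scope_len: usize,
    /// Whether the `meta_content_scope` was actually pushed (the "gate").
    mcs_pushed: bool,
}

/// How many atoms a corrective Pop would need to remove so that the shadow
/// stack matches the contributions of the remaining frames.
fn corrective_pop(shadow_len: usize, stack: &[Frame]) -> usize {
    let expected: usize = stack
        .iter()
        .map(|f| f.meta_scope_len + if f.mcs_pushed { f.meta_content_scope_len } else { 0 })
        .sum();
    // Never underflow: a shadow at or below the expected depth needs no Pop.
    shadow_len.saturating_sub(expected)
}
```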

Drops `syntax_test_latex.tex: 76` from both
`testdata/known_syntest_failures{,_fancy}.txt`.
`IncludeWithPrototype` in `MatchIter` pushed the included target
on top of the prototype, and `MatchIter::next` reads the stack
top (`ctx_stack[len-1]`) — so the target's patterns were iterated
first and the external prototype's second. The parser's
tie-break on match_start is strict `<`, so whichever rule is
enumerated first wins a same-position match.

ST's `apply_prototype` semantics and `ParseState::find_best_match`'s
own `context.prototype` chaining (`chain(cur_prototype).chain(cur_context)`)
both put prototype patterns ahead of the target. Swap the two
pushes so the prototype lands on top of the stack and is iterated
first.

Concretely: in HAML `tag-attributes-content`, `ruby-code` does
`include: scope:source.ruby.rails.embedded.haml apply_prototype: true`.
Ruby-for-HAML's prototype injects HAML's `pipe-continuations`
(match `|\s*$`). Before this fix, Ruby's bitwise-or rule
(`[~|^]`) at the same position was iterated first and won the
tie, so `|` at EOL got `keyword.operator.bitwise.ruby` instead of
`punctuation.separator.continuation.haml`; the attribute braces
popped at the newline, and every continuation assertion below
cascaded. After the swap, the prototype's pipe-continuation wins
the tie.
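The effect of a strict `<` tie-break can be shown in isolation. This is a simplified sketch, not syntect's `find_best_match`; `Candidate` and `pick_first_best` are hypothetical names, and real candidates carry far more than a name and a start offset.

```rust
/// A rule that matched somewhere on the line (illustrative only).
struct Candidate {
    name: &'static str,
    match_start: usize,
}

/// Pick the earliest match. Because the comparison is strict, a later
/// candidate never displaces an earlier one that starts at the same
/// position, so enumeration order decides ties.
fn pick_first_best(candidates: &[Candidate]) -> Option<&Candidate> {
    let mut best: Option<&Candidate> = None;
    for c in candidates {
        let better = match best {
            None => true,
            Some(b) => c.match_start < b.match_start,
        };
        if better {
            best = Some(c);
        }
    }
    best
}
```

With the prototype's rule enumerated first, it wins a same-position tie against the target's rule; with the order reversed, the target's rule wins instead.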

Refs: trishume#631
Strengthens the existing `apply_prototype_includes_external_prototype`
from build-only to parse-and-assert. Adds precedence, opt-out, and
HAML-Rails end-to-end guards alongside it in `src/parsing/syntax_set.rs`.

Refs: trishume#631
All 65 assertions in `syntax_test_rails.haml` pass after the
apply_prototype ordering fix. Delta applies to both baselines.
ST's `text_point(row, col)` overflows past-EOL columns into the next
row, so its syntax-test framework evaluates past-EOL assertions
against the corresponding column on the next line. syntect's harness
was instead testing against the consumed `\n`'s scope — silent
divergence whenever the `\n` carried parent meta_scopes that the EOL
pop chain dropped.
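A minimal sketch of the overflow rule, assuming it behaves like plain offset arithmetic (row start plus column). `resolve_overflow` is a hypothetical helper; it counts bytes, expects each line to include its trailing `\n`, and treats the newline column itself as still belonging to the row, which the description above does not settle.

```rust
/// Map `(row, col)` to the position it lands on when columns past the end
/// of a row spill into the following rows. Returns `None` past end of input.
fn resolve_overflow(lines: &[&str], row: usize, col: usize) -> Option<(usize, usize)> {
    let mut r = row;
    let mut c = col;
    while r < lines.len() {
        let len = lines[r].len();
        if c < len {
            return Some((r, c));
        }
        // Column is past this row: carry the remainder into the next row.
        c -= len;
        r += 1;
    }
    None
}
```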

Reorder the loop to parse-before-assert; thread the first post-target
line's scopes into `process_assertions` (`examples/syntest.rs`); fall
back to the previous behaviour when next-line scopes aren't available
(EOF, replay path). Closes 17 `syntax_test_git_config` and 1
`syntax_test_clojure` stale baselines. Refs: trishume#631
Two cross-line branches failing on the same parse_line grew
`flushed_ops` by append, so `ParseLineOutput::replayed` doubled and
consumers that pair `replayed[i]` with the i-th pending line slid
ops from one buffered line onto another's text. Observed as the
byte-77 panic at `syntax_test_java.java` line 624.

Track `flushed_ops_start` alongside `flushed_ops` and merge
subsequent fails against the prior snapshot's range. See
`ParseState::merge_flushed` docs for the composition rule.

`known_syntest_failures{,_fancy}` absorb the unmasking: Python /
TypeScript / Bash / Zsh files previously panicking now report their
real path-1 counts. Java stays at `1` — next panic site is a
pre-existing stale-`line_number` on branches created during replay,
tracked as follow-up. Refs: trishume#631
Branches created while `handle_fail` re-parses a buffered past line
snapshotted `self.line_number` / `self.pending_lines.len()`, which
still reflect the *outer* `parse_line`'s current line. A later fail
on the outer line would then see `bp.line_number == cur_line`,
classify the branch as same-line, and apply the branch's
replay-line-relative `match_start` to a shorter outer line —
shipped as `byte index 20 out of bounds of "  foo = BAR,\n"` on
`syntax_test_java.java:10263` inside `@MultiLineAnnotation(...)`,
and the matching byte-2 panic in `syntax_test_markdown.md`'s
multi-line math blocks.

Introduce a `replay_ctx: Option<ReplayCtx>` set around each inner
`parse_line_inner*` call in both replay loops. Branch creation and
`handle_fail`'s `cur_line` read through it, so branches born in the
re-parse of line `L+i` record `line_number = L+i` and
`pending_lines_snapshot_len = <slot for L+i>`.

Baselines absorb the unmasking: TypeScript drops 230 to 12
(cascading replay-branch misclassifications fixed), Markdown moves
1 to 897 (the `1` was the byte-2 panic artefact; real count
surfaces). `syntax_test_java.java` stays at `1`: a distinct
pre-existing `NoClearedScopesToRestore` surfaces further into the
same file, tracked as a follow-up.

Refs: trishume#631
A `branch_point` born inside `handle_fail`'s cross-line replay
recorded only the inner re-parse's local `res` Vec as its
`prefix_ops`. When that nested branch later failed cross-line, its
own replay reconstructed the line from an empty prefix and the
captures emitted before the *outer* branch trigger vanished.

Shipped as `[foo]: /url` losing its
`meta.link.reference.def.markdown` and capture scopes whenever the
outer `link-def-title-continuation` branch's `immediately-pop2`
alt-1 spawned a nested `link-def-attr-continuation` whose own fail
then replayed line 3 without the original LRD opener captures.

Compose the first-line prefix (outer `prefix_ops` + new-alt
meta/pat/capture/meta_content) up front in both cross-line replay
paths, surface it via `ParseState::replay_prefix_ops`, and
prepend it to inner branch creations' `prefix_ops`.

Baselines: Markdown 897 → 565, TypeScript 12 → 0 (file disappears
— `syntax_test_typescript.ts` exercised the same nested-replay
shape).

Refs: trishume#631
`parse_line` captured the buffered shadow snapshot BEFORE the line
ran, and the syntest consumer captured `stack_before` similarly. A
replay applied during that line's parse may have corrected ops for
prior buffered lines, leaving the captured snapshot reflecting the
uncorrected baseline. A LATER replay covering the same line then
resets to that stale snapshot, re-applies the corrected ops on top,
and resurrects any meta_scope the prior replay had unwound.

Manifested as `meta.link.reference.def.markdown` leaking past
back-to-back Markdown link reference definitions and polluting all
subsequent paragraphs, code blocks, blockquotes, autolinks, footnotes,
etc. for the rest of the file (~408 chars / 88 assertions in
`syntax_test_markdown.md`).

After applying replays in `parse_line`, overwrite each buffered
`pending_line_start_shadows[start_idx + i + 1]` with the post-i
shadow, and use the post-replay shadow as the snapshot for the
current line being pushed. Mirror the same correction in `syntest`'s
consumer loop on `parsed_line_buffer[..].stack_before`.

Baselines:
- Markdown 565 → 158 (the LRD-leak family)
- Java 1 (panic) → 18953 (real failures unmasked — the
  `NoClearedScopesToRestore` panic that the same drift was
  triggering is gone)

Refs: trishume#631
A `pop: N + branch_point` snapshots `stack_depth` pre-pop; the synthetic
Set's post-Set retain (`bp.stack_depth <= final_len`) and `handle_fail`'s
validity check (`stack.len() < bp.stack_depth`) both ignored that
`pop_count`, dropping the freshly-created bp at creation. Same-line
re-emit also missed the popped contexts' meta_scope clearance Pop —
route it through `push_meta_ops` like the original push.

Symptom: `meta.annotation.identifier.java meta.path.java` leaking past
nested-annotation extends paths in `syntax_test_java.java`. Drops
Java baseline 18953 -> 9956.
Mirrors the trishume#660 same-line fix into the cross-line branch — the
bespoke re-emit of the new alternative's meta_scope/meta_content_scope
was missing the popped contexts' Pop, leaking the popped meta_scope
(annotation-qualified-identifier's meta_scope in Java) plus the
surrounding declaration's meta_scope when an annotation crosses a
line into a class/enum/interface declaration.
…ctions

When an outer cross-line `fail`'s replay re-parses buffered lines, an
inner cross-line `fail` firing during the loop writes its correction
into `self.flushed_ops`. Previously, the outer's locally-computed
`replayed_ops[i]` overwrote that correction via `merge_flushed`, freezing
a stale interpretation for indices the inner had already corrected.

Fixes the leak in `src/parsing/parser.rs::handle_fail` for both the
alt-N and exhaustion cross-line paths. Repro: Java
`@A.B\n(par=1)\nenum E {}\n` — the outer `declarations` fail's line-1
reparse froze the dotted annotation as `path` alt before the inner
`annotation-qualified-identifier` fail's `name`-alt resolution landed.

Drops Java syntest baseline 9935 → 9774; no regressions in other
languages or in `Markdown` (still 158).
When a same-line branch_point exhausts at a zero-width
lookahead, rewind the cursor to the BP's original position
and skip the same-name Branch pattern on retry — letting the
parent context's next rule fire instead of advancing past the
lookahead, which let stale keyword rules match inside
identifiers (`package` in `$package`, `class` in `Foo.class;`).

Drops Java syntest 9774 → 1987 (-7787, -80%); jsp 39 → 0; Zsh
604 → 410. Markdown unchanged at 158. No regressions
elsewhere.

See parser.rs::handle_fail same-line exhaustion handler and
the new `skipped_branches` field; new test
`exhausted_branch_point_falls_through_to_parent_next_rule`.
`push_meta_ops`'s non-initial phase emitted the deep-context
meta_scope/mcs Pops before restoring `cur_context.clear_scopes`.
When the cleared atom belonged to one of the deeper contexts
being popped, the Pops landed on the wrong (still-visible)
scope — observed on Java's `case DayType when -> "incomplete"`,
where `case-label-expression`'s `clear_scopes: 1` hid
`case-label`'s `meta.case.java` and `case-label-end`'s
`pop: 2` then popped the surrounding switch block off the
consumer's stack.

Move the cur_context Restore to before the depth loop so the
previously-cleared atom is visible again when the deeper-context
Pop lands on it.

Drops Java syntest 1987 → 949 (-1038, additional -50%); fixes
C#'s `syntax_test_GeneralStructure.cs` (2 → 0) and
Haskell -1. Markdown unchanged at 158, no other regressions.

See `parser.rs::push_meta_ops` Pop arm and the new test
`pop_n_restores_clear_before_unwinding_deeper_meta_scopes`.
The YAML loader checked `set:`, `branch:`, and `embed:` after a `pop:`
key but never `push:`. Combined `pop: N + push: X` rules degraded to a
plain `Pop(N)` and silently dropped the push, leaving the parser on the
outer context instead of the intended target.
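The shape of the fix can be sketched as follows. `Action` and `resolve_pop_rule` are hypothetical and much simpler than the real loader in `yaml_load.rs`; in particular the order in which companion keys are checked here is an assumption.

```rust
/// What a rule with a `pop:` key resolves to (illustrative only).
#[derive(Debug, PartialEq)]
enum Action {
    /// Plain `pop: N` with no companion key.
    Pop(usize),
    /// `pop: N` combined with `set:`, `push:`, `branch:`, or `embed:`.
    PopThen { pop: usize, kind: &'static str, target: String },
}

/// Resolve a `pop: N` rule given the other keys present on the same match.
/// The bug described above corresponds to leaving "push" out of this list.
fn resolve_pop_rule(pop: usize, keys: &[(&str, &str)]) -> Action {
    for &kind in ["set", "push", "branch", "embed"].iter() {
        if let Some((_, target)) = keys.iter().find(|(k, _)| *k == kind) {
            return Action::PopThen { pop, kind, target: target.to_string() };
        }
    }
    Action::Pop(pop)
}
```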

Affected rules in vendored syntaxes: Java's
`pop: 2 + push: annotation-parameters-body` (lambda3 line 10069 and
many others) and `pop: 1 + push: case-label-expression`; Python's
`pop: 2 + push: function-parameter-list-body` and `type-parameter-list-body`.

Java syntest 641 → 245 (-396); Python 66 → 45 (-21). Other language
baselines unchanged.
The Set initial-phase Pop at parser.rs:1992 unconditionally popped
`cur_context.meta_content_scope.len()` even when cur_context's mcs
was never pushed because the context immediately below has
`embed_scope_replaces=true`. This dropped the topmost wrapper-pushed
embed_scope token. Mirrors the skip already in the Pop branch at
parser.rs:1912.

Markdown 158 -> 31; Python 45 -> 32 (free benefit).
Plain `set:` (no `pop_count`) into a target with `clear_scopes`
emitted that Clear in `push_meta_ops`'s initial phase even when the
leaving context carried its own `meta_scope` / `meta_content_scope`.
Cur's ms sits on top of the visible stack at that point; Clear hid
it instead of the parent atom the optimization was meant to strip.
The non-initial Pop then ate atoms below cur's hidden ms, and the
trailing Restore resurrected cur's ms — leaving cur's meta_scope
where the parent's atom used to be.

Bash repro `: ~/`: `~` set: `tilde-modifier` (clear+ms); `''`
zero-width set: `tilde-modifier-username` (clear+mcs); `/` lookahead
pops. ST scopes `/` as `meta.string.glob.shell string.unquoted.shell`;
syntect emitted `meta.interpolation.tilde.shell string.unquoted.shell`.

Fix: when cur has `meta_scope` or `meta_content_scope`, defer the
single-context-set target Clear to the non-initial phase, after
Pop+Restore (so Pop finds cur's ms visible and Restore brings the
parent atoms back) and before pushing target's ms/mcs. The cur-empty
case (Lisp `(defun fn (...)`, pinned by
`v2_set_to_target_with_clear_scopes_clears_parent_meta_content_scope`)
is unchanged.

Net syntest: bash 249 → 30, zsh 410 → 25, java 245 → 221 on both
regex backends; no other baseline lines change. New regression
`cur_meta_scope_set_to_target_with_clear_scopes` mirrors the bash
shape.

Refs: trishume#631
Multi-context `set:` whose target body has both `clear_scopes: N` and
a non-empty `meta_scope`, fired from a cur with no ms/mcs/clear,
needs an extra atom dropped on the trigger token beyond Clear's
reach. ST drops `N + 1` atoms on the trigger and `N` on the body
content, anchoring the extra drop on the target's `meta_scope`.

`push_meta_ops` previously kept both atoms on the trigger, leaking
nested `meta.function.php` / `meta.function.return-type.php` into
the `:` of PHP `function bye(): never {`. The fix emits a combined
`Clear(N + 1)` in the initial phase and a paired `Restore` in the
non-initial phase, leaving the body content's existing per-context
Clear+Push to land it at the same place as before.

Gated on the clear-bearing target carrying a non-empty `meta_scope`
so syntaxes whose target has only `meta_content_scope` are
unaffected — Zsh's `zsh-redirection-glob-range-end` (clear+mcs, no
ms) on the `<` redirection trigger otherwise loses
`source.shell.zsh` and `meta.function-call.arguments.shell`.

PHP 1 -> 0.
`push_meta_ops`'s `MatchOperation::Set` arm with `set_pop_count > 1`
lumped target.ms + cur.ms + every popped deeper frame's mcs+ms
into a single Pop. Per-frame clear_scopes were never restored —
their cleared atoms stayed in clear_stack out of reach, and the
new target's clear_scopes then bit one atom too deep.

Observed on Python `r'''(?ix:some text(?-i:hello))(?iLmsux)(?a)foo'''`:
the `(?ix:` rule's `pop: 3 + set:[group-body-extended,
maybe-unexpected-quantifiers]` left `group-body-extended_outer`'s
cleared `meta.mode.extended.regexp` in clear_stack;
`group-body-extended_target`'s `clear_scopes: 1` then cleared
`source.regexp.python` (the embed wrapper's mcs) instead of
`mode_outer`. ST keeps `source.regexp.python` visible from col 22
through col 47+; syntect previously dropped it from col 27 onward.

Split the lumped Pop into a head Pop (target.ms + cur.ms) and a
per-depth Pop+Restore loop mirroring the `MatchOperation::Pop` arm at
parser.rs:1954-1971. New regression tests
`pop_n_set_restores_deeper_frame_clear_scopes` (positive) and
`pop_n_set_without_deeper_clear_scopes_unaffected` (negative gate
against regressing Java's `pop:2 + push:annotation-parameters-body`
shape).

Refs: trishume#631
Resolved by per-depth clear_scopes Restore on pop:N + set:.

Refs: trishume#631
`yaml_load`'s `parse_embed_op` was setting `embed_scope_replaces=true`
on the wrapper unconditionally. That flag tells the per-target loop in
`parser.rs` to suppress the next context's `meta_content_scope` push,
to avoid duplicating the embedded syntax's top-level scope (auto-
inserted into `main`'s mcs at `yaml_load.rs:706-713`) with the
wrapper's last `embed_scope` atom.

That dedup is only needed when the embed enters via `main`. Fragment
embeds (e.g. `embed: scope:source.toml.embedded.python#toml`) bypass
`main`, so the fragment context's mcs is independent of the syntax's
top-level scope. Suppressing it strips a real grammar atom (TOML's
`meta.mapping.toml`) and the next `clear_scopes:` then bites the
wrapper instead of the intended grammar atom — leaking the wrapper
out of every nested scope inside the embed.

Mark the wrapper as `embed_scope_replaces=true` only when the embed
target has no `#fragment`. Two regression tests:
- `fragment_embed_preserves_target_meta_content_scope` (positive)
- `non_fragment_embed_still_suppresses_main_mcs` (negative gate)

The b31b727 test
`embed_scope_replaces_preserves_wrapper_mcs_across_inner_set` is
unaffected — Markdown's bash code-fence embed has no fragment.

Python 32 -> 0 on both regex backends; no other baseline moves.
When a child syntax has multiple parents in `extends:` and the parents
disagree on a shared context or variable, a parent's directly-defined
entry now outranks another parent's inherited entry. Same-provenance
ties still resolve last-wins.
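The precedence rule can be sketched as a small merge over the parents' candidate entries. `Entry` and `merge_parents` are hypothetical names; the sketch only models the direct-versus-inherited distinction and the last-wins tie rule stated above.

```rust
/// One parent's candidate for a shared context or variable.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    value: &'static str,
    /// True when the parent defines the entry itself rather than inheriting it.
    direct: bool,
}

/// Merge candidates in `extends:` order. A directly-defined entry is never
/// displaced by an inherited one; entries of equal provenance resolve last-wins.
fn merge_parents(parents: &[Option<Entry>]) -> Option<Entry> {
    let mut winner: Option<Entry> = None;
    for candidate in parents.iter().flatten() {
        let replace = match &winner {
            None => true,
            Some(w) => candidate.direct || !w.direct,
        };
        if replace {
            winner = Some(candidate.clone());
        }
    }
    winner
}
```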

Fixes the indented zsh shebang in Markdown fenced blocks: `Zsh (for
Markdown)` extends `[Bash (for Markdown), Zsh]`. Bash (for Markdown)
owns a lenient `main` (`^(?=\s*#!)`); Zsh inherits Bash's strict
column-0 main. The previous last-wins merge let Zsh's inherited main
override, so the indented `   #!/usr/bin/env zsh` fell into the
regular comments rule.
`get_line_assertion_details` recognised any line where the testtoken
appeared mid-text and where valid assertion markers followed. ST's
syntax-test format only allows assertions on dedicated comment-only
lines, so source code preceding the testtoken means the markers are
coincidental. The harness was processing such lines as assertions
anyway, producing spurious failures and pinning
`test_against_line_number` away from the source line so the *next*
genuine assertion tested against stale scopes.

Fix: early-return `None` when source code precedes the testtoken or
non-whitespace follows the closing testtoken_end. The two bash repros
are `: ${#^pattern}` (the `#` is the parameter-length operator) and
`[ <<doc ]  # <- ]` (a trailing comment whose body starts with `<-`).

Doing this also exposed a latent bug in
`only_whitespace_after_token_end`: `after_token_end` was the substring
*from* the end-token, so the end-token glyphs themselves always
counted as non-whitespace, and `/* ^ scope */` lines were silently
classified as non-pure. Under the old gate this was harmless (the
flag only fed the `parse_test_lines` path), but the early-return
turned every C-style block-comment assertion into a non-assertion
source line. Skip the end-token before checking the trailing content.
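A minimal sketch of the corrected predicate, assuming it receives the whole line and the closing token as strings. The signature is illustrative and may differ from the one in `examples/syntest.rs`; using the last occurrence of the token is also an assumption.

```rust
/// True when nothing but whitespace follows the closing test token.
/// The slice starts *after* the token, so the token's own glyphs are not
/// counted as trailing content.
fn only_whitespace_after_token_end(line: &str, token_end: &str) -> bool {
    match line.rfind(token_end) {
        Some(idx) => line[idx + token_end.len()..].trim().is_empty(),
        None => false,
    }
}
```

Slicing from `idx` instead of `idx + token_end.len()` reproduces the latent bug: the `*/` itself would always count as non-whitespace.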

Three new harness unit tests cover the corrected predicate and both
shell repros. Two existing tests already exercise the pure-assertion
path; their `is_pure_assertion_line` field assertions are now
invariant-true at the constructor, but kept as documentation.

Net syntest deltas (both regex backends):
- Bash 30 -> 4 (residual: backtick `for...done` interaction)
- Zsh 25 -> 10 (residual: zsh glob-range scoping)
- Haskell 49 -> 43

Stacks on trishume#673.
The per-line search cache stored full-line `regex.search` results keyed
by MatchPattern pointer, then reused them on every later search regardless
of `search_end`. Inside an embed where `search_end` is clipped to the
escape position, that reuse can flip rule outcomes whose lookaheads sit
exactly at the boundary — the cached "no match" was computed against the
escape glyph, but a fresh truncated search would see end-of-input there.

Concretely: in `` `for i in $(seq 100); do echo $i; done` `` the
`done{{cmd_break}}` rule (`done(?!cmd_char)`) was searched at the outer
level with full-line text, where the lookahead saw the closing backtick
(itself a cmd_char) and failed. That `None` was cached. Inside the
backtick embed, with search_end clipped to the close, the cache
short-circuited the lookup before the regex could re-run with
end-of-input semantics, so `done` fell through to `cmd-name-body` and
got `variable.function.shell` instead of `keyword.control.loop.end.shell`.

Skip the cache lookup whenever `search_end < line.len()`. Insertion was
already gated on `search_end == line.len()`, so the cache stays
populated by full-line answers; truncated searches just re-run.
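The gating can be sketched with a toy cache. `SearchCache`, the `usize` pattern id, and the closure standing in for the regex are all assumptions made for illustration; the real cache is keyed by `MatchPattern` pointer and stores regex regions.

```rust
use std::collections::HashMap;

/// Per-line cache of full-line search results, keyed by a pattern id.
struct SearchCache {
    results: HashMap<usize, Option<(usize, usize)>>,
}

impl SearchCache {
    fn new() -> Self {
        SearchCache { results: HashMap::new() }
    }

    /// Search `line[..search_end]`. The cache is consulted and populated only
    /// for full-line searches; truncated searches always re-run the matcher.
    fn search<F>(&mut self, pattern_id: usize, line: &str, search_end: usize, run: F) -> Option<(usize, usize)>
    where
        F: Fn(&str) -> Option<(usize, usize)>,
    {
        let full_line = search_end == line.len();
        if full_line {
            if let Some(hit) = self.results.get(&pattern_id) {
                return *hit;
            }
        }
        let result = run(&line[..search_end]);
        if full_line {
            self.results.insert(pattern_id, result);
        }
        result
    }
}
```

With a matcher that rejects `done` when a backtick follows, a full-line search on ``x done` `` caches `None`, while a search clipped just before the backtick re-runs and matches.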
In a multi-context `set:` whose non-topmost target declares
`clear_scopes: N` plus a `meta_content_scope`-only body (empty
`meta_scope`), Sublime applies the Clear to atoms that earlier targets
pushed via their `meta_scope` and the strip is visible to the trigger
match's own scope/captures. Syntect was deferring the Clear to the
non-initial phase, so the trigger token leaked the cleared atom even
though body content saw it removed.

Surfaces on Zsh glob-range openings inside `[ <1-2> ]` etc.: the
`zsh-redirection-glob-range-begin` `set:` lists `string-path-pattern-body`
(meta_scope `meta.string.glob.shell string.unquoted.shell`) before
`zsh-redirection-glob-range-end` (`clear_scopes: 1` +
`meta_content_scope: meta.range.shell.zsh`), and the `<` carries a
capture scope asserted with `- string`. Drops the residual 10-char Zsh
syntest failure on both backends.
Two-part guard against `branch_point` exhaustion collapsing a parent
`meta_scope` one line boundary too early on empty lines:

1. In `parse_next_token`, a non-consuming `Branch` match that lands at
   or past the replay line's end is skipped when inside `replay_ctx`.
   Without this, the outer fail-replay would chain another `branch_point`
   at end-of-replay-line whose own cross-line exhaustion later attaches
   pops to the wrong line.
2. In `handle_fail`'s same-line path, when the rewind position is 0 of
   a purely empty line (length ≤ 1, just `\n`), advance the cursor to
   `line.len()`. The next-iteration `match: ''` of an `immediately-pop`-
   style alt then emits its scope pops past-EOL, which
   `ScopeRegionIterator` wraps onto the next line's baseline.

Together they make Markdown's non-terminated link reference definition
keep `meta.link.reference.def.markdown` on the empty line between
`blah` and the closing `text` paragraph, matching ST.

Baseline: Markdown 1 → 0 (the `syntax_test_markdown.md` line drops
from `known_syntest_failures{,_fancy}.txt`). No other rows change.
The harness's `SYNTAX_TEST_HEADER_PATTERN` restricted `testtoken_end`
to punctuation glyphs (`*/`, `-->`, …), assuming alphabetic tails like
`dmd`, `clojure`, or `dotnet run` were shebang-style instructions to
ignore. ST disagrees: those tails *are* the closing testtoken, and ST
clips each assertion line's selector at the first substring match.
The D shebang test's ` #! <- keyword.operator.logical.d dmd` and the
Clojure shebang's `<- comment.line.shebang.clojure …` both relied on
that clipping; under the old regex `dmd` / `clojure` leaked into the
selector and the assertions failed against scopes the parser had
correct.

Two-part fix in `examples/syntest.rs`:

- Broaden the `testtoken_end` capture to the entire whitespace-stripped
  trailing tail (`\S(?:.*\S)?`), so multi-word tails like `dotnet run`
  also round-trip cleanly.
- Drop the `only_whitespace_after_token_end` gate. The Clojure case
  has `clojure` inside `comment.line.shebang.clojure`, so clipping
  succeeds but content follows the closing token; ST still treats the
  line as a pure assertion (with the clipped selector) rather than as
  source code, and so should we. The before-`testtoken_start`
  whitespace check alone is enough to reject the bash `: ${#^pat}` and
  `[ <<doc ]  # <- ]` repros that motivated the gate.
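The clipping behaviour described for ST can be sketched as below. `clip_selector` is a hypothetical helper, not the harness's actual function, and it only covers the "first substring match" rule.

```rust
/// Cut an assertion's selector text at the first occurrence of the closing
/// test token, then drop trailing whitespace. An empty or absent token
/// leaves the selector unclipped.
fn clip_selector<'a>(selector: &'a str, testtoken_end: &str) -> &'a str {
    let cut = if testtoken_end.is_empty() {
        None
    } else {
        selector.find(testtoken_end)
    };
    match cut {
        Some(idx) => selector[..idx].trim_end(),
        None => selector.trim_end(),
    }
}
```

For the D shebang case this removes the trailing `dmd`; for the Clojure case the first match sits inside the scope name itself, so the clip happens there and content follows the closing token.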

Baseline drops both `syntax_test_shebang.d` and
`syntax_test_shebang.clj` rows from
`testdata/known_syntest_failures{,_fancy}.txt`. No other rows change.

Stacked on trishume#677.
Before this fix, `recursively_mark_no_prototype` followed every `Push` / `Set` /
`Branch` / `Embed` AND every nested `include` from the prototype's
include chain unconditionally, marking every reachable context as
"don't include the prototype". For Haskell that meant marking
`function-name`, `variable-name`, and `variable-name-end` because of
the chain
  prototype → preprocessor-pragmas
            → push: preprocessor-pragma-body
            → embed: preprocessor-pragma-signature-value
            → include: functions
            → branch: variable-name, function-name
            → push: variable-name-end
With the prototype's `line-comments` rule no longer applied inside
`variable-name-end`, the `(?=\S)` pop:2 rule fired on every `--` of
the assertion-comment lines that sit between an infix operator
declaration and its `:: a -> Bool` continuation. That popped the
branch alternative off the stack mid-air, orphaned the `functions`
branch_point, and prevented the `(?=::)` `fail: functions` rule from
ever installing `meta.function.identifier.haskell` via cross-line
replay. Verified against ST via `scope_at_test`: every position the harness
flagged as wrong is `source.haskell meta.function.identifier.haskell …`
in ST.

The fix tracks a `via_push` flag through the recursion: includes are
followed only while still in the prototype's include chain
(`via_push: false`); once we've crossed a Push/Set/Branch/Embed we
keep following further match-op targets but stop following the body's
own `include:`s. That preserves the YAML and Lua cases (where
prototype-pushed bodies chain via `set:` to other prototype-pushed
bodies that DO need the no_prototype mark to break the loop —
`property → property-body`, `line-doc-comment-body → maybe-line-doc-
comment → line-doc-comment-body`) while keeping prototype attached
to general code-parsing contexts that are merely included from a
body for its local rule access.
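The traversal can be sketched on a toy context graph. `Ctx` and `mark_no_prototype` are hypothetical simplifications: contexts are indices, `targets` stands for any Push/Set/Branch/Embed destination, and the `seen` set is only there to terminate on cycles.

```rust
use std::collections::HashSet;

/// A context reduced to the edges that matter for the marking pass.
#[derive(Default)]
struct Ctx {
    /// Contexts pulled in with `include:`.
    includes: Vec<usize>,
    /// Contexts reached through Push / Set / Branch / Embed.
    targets: Vec<usize>,
}

/// Mark every context that must not receive the prototype. Match-op targets
/// are always followed; `include:` edges are followed only while still in
/// the prototype's own include chain (`via_push == false`).
fn mark_no_prototype(
    ctxs: &[Ctx],
    idx: usize,
    via_push: bool,
    seen: &mut HashSet<(usize, bool)>,
    marked: &mut HashSet<usize>,
) {
    if !seen.insert((idx, via_push)) {
        return;
    }
    marked.insert(idx);
    for &t in &ctxs[idx].targets {
        mark_no_prototype(ctxs, t, true, seen, marked);
    }
    if !via_push {
        for &i in &ctxs[idx].includes {
            mark_no_prototype(ctxs, i, false, seen, marked);
        }
    }
}
```

On a graph shaped like the Haskell chain, the pragma bodies are marked but `functions`, which is only included from a pushed body, keeps the prototype.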

Baseline: `syntax_test_haskell.hs` 43 → 1 (just the orthogonal
`variable.other..haskell` double-dot selector failure remains, fixed
in the next commit). `syntax_test_java.java` 221 → 212 incidentally —
same underlying mechanism unmasked nine column-failures that the
over-marking had been hiding.

Stacked on trishume#678.
`Scope::new("variable.other..haskell")` (double dot from a typo or a
test author writing `variable.other..haskell` to bypass ST's symbol-
test heuristics) used to pack `""` as a real atom, producing a 4-atom
scope `[variable, other, "", haskell]` that no longer prefix-matched
the 3-atom `variable.other.haskell` it was meant to equal.

ST's selector engine collapses runs of dots — `score_selector(
'variable.other..haskell', 'source.haskell variable.other.haskell')`
returns 48, the same as the single-dot form. Mirror that by filtering
empty segments in `ScopeRepository::build`. Symmetric: applies to
both selector parsing in syntest assertions and to scope construction
where a syntax accidentally has `scope: foo..bar`.
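The filtering itself is small enough to sketch directly. `scope_atoms` and `is_prefix_of` are hypothetical helpers working on plain strings; the real change lives in `ScopeRepository::build`, which packs atoms into syntect's `Scope` representation.

```rust
/// Split a scope string into atoms, dropping the empty segments that
/// consecutive dots would otherwise produce.
fn scope_atoms(scope: &str) -> Vec<&str> {
    scope.split('.').filter(|atom| !atom.is_empty()).collect()
}

/// True when every atom of `selector` matches the leading atoms of `scope`.
fn is_prefix_of(selector: &str, scope: &str) -> bool {
    let sel = scope_atoms(selector);
    let sc = scope_atoms(scope);
    sc.starts_with(&sel)
}
```

With empty segments filtered, the double-dot form yields the same three atoms as `variable.other.haskell` and prefix-matches it again.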

Surfaces as the last `syntax_test_haskell.hs` failure
(`syntax_test_haskell.hs:2348` line `:: a -> Bool`,
`--         ^ variable.other..haskell` against scope
`source.haskell variable.other.haskell`).

Baseline: `syntax_test_haskell.hs` drops out of both
`testdata/known_syntest_failures{,_fancy}.txt`. Java incidentally went
from 221 to 212 with the prior commit's prototype-attachment fix; the
new line is recorded here.
Submodule moves from `1ba99a47` (`v4201-119-g1ba99a47`) to the
shipped `v4202` tag (`91ad8085`, "[D, Makefile, Rust] Standardize
build output scopes"). v4202 is the most recent stable release tag
before the C# v2 migration `8621831d` and the regex embed grammar
`c735169b`; pinning here keeps `regex_string` on the legacy
`embed: scope:source.regexp; embed_scope: meta.string.cs meta.regexp.cs`
form, sidestepping the wrapper-mcs divergence between syntect and
ST DEV's renderer that produced the `syntax_test_C#11.cs: 35`
baseline entry.

Compared with v4200 and v4204/v4205:
- v4200 requires regenerating `testdata/test4.html` against the
  older `Cargo.sublime-syntax` (pre-`91ad8085` scope rename); v4202
  matches the existing fixture as-is.
- v4204/v4205 reintroduce the C#11 row (35) plus a
  `parser.rs::can_parse_preprocessor_rules` divergence from the C
  directive-scope refactor `44871676`.

Baseline movement: `make syntest` and `make syntest-fancy` both end
clean ("No new failures!"). C#11 row drops (-35); Java row at 212
unchanged. Net -35 failures.

Companion fixes for v4202's older fixtures:

- `parsing::syntax_set::tests::can_load`: Rails `main`'s
  `context_iter` count drops from 185 to 184 (one context added
  upstream post-v4202).

- `parser.rs::push_meta_ops`: keep the auto-injected top-level scope
  across v2 set's cur.mcs Pop. The initial-phase Pop was popping
  `cur_context.meta_content_scope.len()` atoms at `match_start` so
  the matched text wouldn't see cur's `meta_content_scope`. That
  overcounts when cur is `main`: `add_initial_contexts` injects the
  syntax's top-level scope at `main.meta_content_scope[0]`, which
  ST keeps on the visible stack across the trigger (verified against
  ST 4200 stable on TOML's `[section]` rule, where the `[` trigger
  emits `source.toml` alongside `meta.section.toml`). Without this,
  the v4202-era `Rust/tests/syntax_test_frontmatter.{rs,md}` would
  fail at the `[section]` trigger position — the upstream fix
  `20212766` for the same divergence is post-v4202 and not in
  scope. Regression coverage in
  `v2_set_does_not_apply_parent_meta_content_scope_to_matched_text`
  still pins user-declared cur.mcs as popped.
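
The count adjustment described in the `push_meta_ops` item can be sketched as follows. This is an assumption-laden simplification: `initial_phase_pop_count` and its two parameters are hypothetical, and the real code works on context structs rather than a length and a flag.

```rust
/// Number of `meta_content_scope` atoms the initial-phase Pop removes at
/// `match_start`. When the current context is `main`, index 0 holds the
/// auto-injected top-level scope, which stays on the visible stack.
/// Hypothetical helper, not the signature used in parser.rs.
fn initial_phase_pop_count(mcs_len: usize, cur_is_main: bool) -> usize {
    if cur_is_main {
        // Keep the injected top-level scope (e.g. `source.toml`).
        mcs_len.saturating_sub(1)
    } else {
        // User-declared meta_content_scope is still popped in full.
        mcs_len
    }
}
```
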
Cross-line all-exhaustion in `handle_fail` advanced one char past the
branch_point's lookahead, leaving the rest of the matched identifier
to be reparsed without the branch_point in scope. The same-line arm
already does the rewind+skipped_branches dance from f3e497a; extend it
to the cross-line arm so the parent context's NEXT rule fires at the
BP's match position.

Drops Java syntest 212 → 119 (-93 char-assertions). The three
unique-line wins are `package apple dot` line 572, and the
`    variable` after `import no.terminator` / `import static
no.terminator` on lines 656 and 671 — top-level-`java` cases where
`declarations` exhausts and ST falls through to
`else-expressions → expressions → constant-expressions → variables`.

Drops `outer_cross_line_replay_prefers_inner_correction`. The test
was added in trishume#663 to guard the inner-correction-preference machinery
under the path "outer `declarations` 0 → 1, inner
`annotation-qualified-identifier` 0 → 1". Intervening parser fixes
between trishume#663's baseline (9774) and current HEAD (212) shifted control
flow so that the test's 3-line input now hits the cross-line
all-exhaust path instead, with the outer cycling all 5 alts. The
test's coverage of `prefer_inner_replay_corrections` was already lost
before this change; deleting it reflects that. The current Java
baseline failures still exercise the alt-N path through other inputs.
…lings only

The previous gate skipped substitution iff
`inner.stack_depth > outer.stack_depth`, which collapsed two
structurally distinct cases — sibling refinement (substitute
needed, e.g. `outer=declarations(3)`,
`inner=annotation-identifier(4)` on the cluster-1 input
`@Anno\n.\nAnno\n(par=1)\nenum E {}`) and child-of-resolved-alt
nesting (substitute must skip, the multigen16 doubling guarded by
`deeper_inner_bp_correction_does_not_double_outer_meta_scope`).

Tighten to `depth_diff in {0, 1}`. Java syntest 119 → 117. Adds
`cross_line_all_exhaust_with_pop_count_emits_popped_meta_scope_pops`
as a passing regression test for the cluster-1 input. Doubling
guard stays green.
For `pop: N + set:` rules where N > 1 and the target declares its own
meta_scope, the trigger token's effective scope leaked the (N-1)
deeper popped contexts' meta_scope atoms — observed on Java's
`@RunWith(JUnit4.class)`, where `annotation-unqualified-parameters`'s
`pop: 2 push: annotation-parameters-body` left
`meta.annotation.identifier.java` on the `(` token. ST drops those
atoms before the trigger sees its scope when the target has a
meta_scope.

Move the deeper `meta_scope` / `meta_content_scope` pops from
`push_meta_ops`'s non-initial phase to its initial phase, gated on
target `meta_scope` being non-empty and no `clear_scopes`
interactions. The TS quirk where ST keeps the deeper `meta_scope`
visible to the trigger when the target has no `meta_scope` (e.g.
`(?:get|set|async){{identifier_break}} pop: 2 + set:` in
`object-property-name`) keeps the original non-initial path.

Java syntest baseline drops 117 → 113.
For a `pop: N + set/branch/push:` rule whose `scope:` leads with the
same atoms as the popped frames' `meta_scope` (the rule re-states a
meta_scope it's in the process of unwinding), the matched text saw
both copies — the still-on-stack popped meta_scope AND the rule's
own re-statement. Observed on Java's
`annotation-qualified-identifier-name` whose `scope:
meta.annotation.identifier.java meta.path.java
variable.annotation.java` + `pop: 2 + branch:` doubled
`meta.annotation.identifier.java meta.path.java` on
`@ClassName.FixMethodOrder(...)`. ST collapses these adjacent
duplicates rather than re-pushing them.

`pat_scope_skip_count` computes how many leading atoms of the
rule's `scope:` match a trailing slice of the popped frames'
`meta_scope` and skips that many pushes (and the matching closing
Pop) at the call site. Gated on `MatchOperation::Set` with non-zero
`pop_count` and a non-empty popped meta_scope so pure `Push:` rules
keep their genuinely-nested same-named atoms intact — the gate that
preserves CSS `selector(.bar)` inside `@import supports(...)`'s
inner `meta.function-call.arguments.css meta.group.css` level.
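
A minimal sketch of the overlap computation, assuming scopes are compared as plain strings. `leading_overlap` is a hypothetical stand-in for `pat_scope_skip_count`, and the `MatchOperation::Set` / `pop_count` gate is left to the caller.

```rust
/// Largest `k` such that the first `k` entries of the rule's `scope:` equal
/// the last `k` entries of the popped frames' `meta_scope`. Returns 0 when
/// there is no such overlap. Hypothetical illustration only.
fn leading_overlap(rule_scope: &[&str], popped_meta_scope: &[&str]) -> usize {
    let max = rule_scope.len().min(popped_meta_scope.len());
    (1..=max)
        .rev()
        .find(|&k| rule_scope[..k] == popped_meta_scope[popped_meta_scope.len() - k..])
        .unwrap_or(0)
}
```

For the Java case above, the rule scope leads with `meta.annotation.identifier.java meta.path.java`, which is exactly the popped `meta_scope`, so two pushes would be skipped.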

Java syntest baseline drops 113 → 99.
`make update-known-failures{,-fancy}` after the prior commit. Java
drops 113 → 99 cols on `syntax_test_java.java:10108`; no other
language shifts.
`prefer_inner_replay_corrections` skipped substitution when the inner
BP was more than one frame deeper than the outer BP, to avoid the
multigen16-style doubling where the inner reparse re-pushes atoms
outer's resolved alt already provides
(`deeper_inner_bp_correction_does_not_double_outer_meta_scope`).

That same skip dropped legitimate end-of-line pops from an inner BP's
`immediately-pop` failover when the outer's per-line replay
terminated before the nested fail fired. Multi-line declarations like
`@Number\n final\n int\n` leaked `meta.annotation.identifier.java`
past the `\n`: the inner `annotation-unqualified-parameters` BP's
commit pop sat at the BP trigger position (col 11 of line 3), but
the outer `declarations` cross-line replay's locally-computed line 3
ops never received it — `prefer_inner_replay_corrections`'s depth gate
discarded the inner correction entirely.

Allow substitution when inner ops are an `immediately-pop`-style
tail-extension of outer's: identical prefix + appended `Pop` ops at
positions outer already covers. New helper `inner_extends_outer`
checks this. Multigen16's reparse adds `Push` ops, so the new gate
declines that case unchanged.
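
The tail-extension check might look roughly like this. The `Op` enum and the `(position, op)` pairs are simplified stand-ins, and "positions outer already covers" is read here as "no later than outer's last op position", which is an assumption rather than the helper's confirmed definition.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Op {
    Push(&'static str),
    Pop(usize),
}

/// True when `inner` is `outer` plus at least one appended `Pop`, and every
/// appended op sits at a position no later than outer's last op.
/// Sketch only; the real `inner_extends_outer` may differ in detail.
fn inner_extends_outer(outer: &[(usize, Op)], inner: &[(usize, Op)]) -> bool {
    if inner.len() <= outer.len() || inner[..outer.len()] != *outer {
        return false;
    }
    let covered_end = outer.iter().map(|(pos, _)| *pos).max();
    inner[outer.len()..].iter().all(|(pos, op)| {
        matches!(op, Op::Pop(_)) && covered_end.map_or(false, |end| *pos <= end)
    })
}
```

An appended `Push` makes the check fail, which is how a multigen16-style reparse would be declined under this sketch.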

Java syntest baseline: 99 → 97 (covers `syntax_test_java.java:5018-5020`
and `:5036-5038`).
Java line drops from 99 to 97 in both default and fancy regex
baselines.

Refs: trishume#631
Sublime Text treats `pop:N + push:X` as lookahead (the trigger token
excludes popped frames' meta_scope) and `pop:N + set:X` as stacking
(the trigger token receives popped frames' meta_scope AND the pushed
meta_scope). The YAML loader was merging both forms into
`MatchOperation::Set { ctx_refs, pop_count }`, conflating two
semantically distinct operations.

Commit 610412f patched the symptom by stripping deeper popped
meta_scope before pop:N+set: trigger tokens, which fixed the Java
`@RunWith(JUnit4.class)` case (actually `pop:2 push:`) but broke
genuine `pop:N + set:` rules — probe-witnessed against ST 4200.

Fix the conflation structurally:

- Extend `MatchOperation::Push` to a struct variant carrying
  `pop_count`. `pop_count == 0` is plain push; `pop_count > 0` is
  `pop:N + push:` lookahead.
- Route `pop:N + push:` to Push instead of Set in the YAML loader.
- In `push_meta_ops`, gate the initial-phase deeper-pop on
  Push-with-pop (lookahead, unconditional), and emit Set-with-pop's
  deeper-pop in the non-initial phase (stacking, unconditional).
- In `perform_op`, when handling Push with `pop_count > 0`, pop the
  runtime stack and prune branch_points/escape_stack — without this
  the target context would be pushed on top of frames that should
  have been popped.
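
The shape of the split can be sketched with a cut-down enum. `MatchOp`, `Action`, `lower`, and `trigger_excludes_popped_meta_scope` are hypothetical names for illustration; the real `MatchOperation` carries context references rather than strings and has more variants.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum MatchOp {
    /// `pop_count == 0` is a plain push; `pop_count > 0` is `pop:N + push:`.
    Push { ctx_refs: Vec<String>, pop_count: usize },
    /// `pop:N + set:` keeps its own variant (stacking semantics).
    Set { ctx_refs: Vec<String>, pop_count: usize },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Push,
    Set,
}

/// Loader-side routing: the two YAML forms no longer merge into `Set`.
fn lower(pop: usize, action: Action, target: &str) -> MatchOp {
    let ctx_refs = vec![target.to_string()];
    match action {
        Action::Push => MatchOp::Push { ctx_refs, pop_count: pop },
        Action::Set => MatchOp::Set { ctx_refs, pop_count: pop },
    }
}

/// Lookahead applies only to Push-with-pop: the trigger token does not see
/// the popped frames' meta_scope. Set-with-pop stacks instead.
fn trigger_excludes_popped_meta_scope(op: &MatchOp) -> bool {
    matches!(op, MatchOp::Push { pop_count, .. } if *pop_count > 0)
}
```
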

Branch+pop and embed+pop have the same conflation problem (they
synthesize a `Set { pop_count }` and currently apply stacking where
ST wants lookahead). Marked with TODOs at the synthesis sites; full
fix tracked separately.

The Push tuple → struct variant change breaks bincode wire format,
so the committed packdumps are regenerated.

Refs: trishume#631
Sublime Text treats `pop:N + branch:X` as lookahead — the trigger token
excludes popped frames' meta_scope. Branch dispatch was synthesising
`Set { pop_count }` (stacking) and a per-rule scope-dedup
(`pat_scope_skip_count`) collapsed the resulting doubled atoms when a
rule's `scope:` led with the popped frame's meta_scope. That dedup also
fired for genuine `pop:N + set:` rules where ST keeps the duplicate
(probe-witnessed via SetDedupProbe).

After the prior commit's Push-with-pop lookahead, the fix is mechanical:
flip Branch's four synthetic-op sites in parser.rs from
`Set { pop_count }` to `Push { pop_count }` (primary `_parse_match`,
fail-replay first-line prefix, same-line fail re-emission, and the
push_meta_ops safety fallback). Then delete `pat_scope_skip_count` and
restore the simpler unconditional push/pop pair around `pat.scope`.

Java's `@ClassName.FixMethodOrder(...)` (`syntax_test_java.java:10108`)
is now defended by Branch+pop's lookahead path rather than the dedup
mask. Java baseline stays at 97; full syntest matches the existing
known-failures file.

`pop:N + embed:` has the same conflation but is interlocked with
cur_context meta_scope suppression in the Embed arm; left for a
follow-up with a refreshed TODO at the synthesis site.

Refs: trishume#631
Closes the last conflation site from the pop+action lookahead/stacking series. Per ST docs ("for `push`, `embed` and `branch` actions, the pop treats the match as if it were a lookahead"), `pop:N + embed:` should exclude popped frames' meta_scope from the trigger token. Embed dispatch in `push_meta_ops` was synthesising `Set { pop_count }` (stacking) — the original Branch/Embed/Push conflation predates the bug-#1 IR split.

Flip the synthetic to `Push { pop_count }`. The pre-recursive Pops at parser.rs:2683-2700 still cover ST's embed-specific cur_context meta_scope suppression (depth 0); the Push-with-pop deeper-pop loop covers depths 1..N. The existing JSP `<jsp:declaration>` regression test (`v2_pop_embed_suppresses_cur_meta_scope_on_match`) still passes.

New regression `pop_n_embed_drops_deeper_meta_scope_at_trigger` mirrors the bug-#1 PopFirstProbe shape for embed: stack `main → mid → inner` with non-empty meta_scopes, rule `pop: 2 + embed:` on `c`, asserts trigger excludes both `meta.mid` and `meta.inner`.

Java baseline holds at 97; full syntest matches the existing baselines.

Refs: trishume#631
…ay arbitration

`merge_flushed` and `prefer_inner_replay_corrections` previously discriminated overlapping ops slots only by stack depth. Two cross-line patterns slipped through that check: a wrapping earlier-line retry overwriting later-line corrections in the SnapLeStart arm (Java `@Anno\n.\nAnno\n(par=1)\nenum E {}` losing the AQI alt's `variable.annotation.java` at line 3); and a SnapGtStart roll-up shrinking the rolled-up depth attribution and overwriting a deeper-producer slot (the same Java input losing `meta.annotation.parameters.java` at the `(` when the body spans comment-broken lines).

Track `flushed_ops` BP identity per slot rather than per buffer, and extend `BpInfo` with an `inner_producer: Option<Box<BpInfo>>` chain that survives `prefer_inner_replay_corrections` substitutions. With per-slot attribution available, two production rules fall out:

- Per-slot line-number rule (commit point: SnapLeStart in `merge_flushed`, depth-window bypass in `prefer_inner_replay_corrections`): preserve a slot whose attributing BP was created on a later line than the wrapping retry. Same-line and earlier-line slots fall through to the existing peer-BP and fresher-finding behaviours.
- Per-slot effective-producer-depth rule (SnapGtStart in `merge_flushed`): preserve a slot whose effective producer (max of its own `stack_depth` and any recursive `inner_producer.stack_depth`) strictly exceeds the new BP's depth. The roll-up that hides a deep AQI behind shallower outer chains keeps the deep producer reachable through the chain.
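
The two per-slot rules can be sketched over a reduced `BpInfo`. Only `stack_depth` and `inner_producer` come from the description above; the `line` field and the function names are assumptions made for this illustration.

```rust
/// Reduced stand-in for the parser's `BpInfo`.
struct BpInfo {
    stack_depth: usize,
    /// Line on which the branch point was created (assumed field).
    line: usize,
    inner_producer: Option<Box<BpInfo>>,
}

impl BpInfo {
    /// Max of this BP's own depth and the depths along the
    /// `inner_producer` chain.
    fn effective_depth(&self) -> usize {
        let inner = self
            .inner_producer
            .as_ref()
            .map_or(0, |p| p.effective_depth());
        self.stack_depth.max(inner)
    }
}

/// Line-number rule: keep a slot attributed to a BP created on a later
/// line than the wrapping retry.
fn preserve_by_line(slot: &BpInfo, wrapping_retry: &BpInfo) -> bool {
    slot.line > wrapping_retry.line
}

/// Effective-producer-depth rule: keep a slot whose effective producer is
/// strictly deeper than the new BP.
fn preserve_by_depth(slot: &BpInfo, new_bp: &BpInfo) -> bool {
    slot.effective_depth() > new_bp.stack_depth
}
```
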

Cluster-A reproducers (`cross_line_chained_fail_swaps_leaf_scope_on_buffered_line`, `cross_line_chained_fail_pushes_target_meta_scope_on_continuation_line`, `cross_line_chained_fail_pushes_target_meta_scope_on_inline_continuation`) lock the rule against regression. Java syntest baseline drops 97 → 89 cols.

Refs: trishume#631
Java drops from 97 to 89 cols in both default and fancy regex baselines after the line-number and effective-depth discriminators land.

Refs: trishume#631
…ner pushes a meta atom outer drops

`prefer_inner_replay_corrections` skipped substitution under `SkippedDeepNonExtension` (inner more than one frame deeper than outer, no `immediately-pop` tail-extension). That left a class of cross-line failures untouched: when the inner reparse genuinely IS the better arbitration and pushes a `meta.*` atom outer's locally-computed parse drops. Java's multi-line qualified field declarations interrupted by `/**/` and EOL comments lose their `meta.path.java` push and flip into `meta.function.identifier.java` for several columns before recovering. The same path-shape works inline.

Add a substitution arm to the `SkippedDeepNonExtension` branch gated on `is_replace_shape`: any inner-side `meta.*` atom not present on the outer side. When it fires, swap outer's local ops for inner's corrected ops. The substitution can over-push a meta atom outer doesn't push; balance with a per-atom comp-pop that inserts a `Pop(1)` after each over-pushed `Push` in the substituted ops. Skip the comp-pop when any over-push atom is already on the running shadow stack (G2 gate) — that signals the inner correction is preserving an existing meta scope rather than introducing a fresh one, so popping it would break the consumer.
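
A minimal sketch of the rule, the comp-pop, and the G2 gate, under loud simplifications: ops are a bare `Push`/`Pop` enum without positions, the shadow stack is a snapshot slice rather than a running stack, and `over_pushed`, `is_replace_shape` (as written here), and `substitute` are illustrative stand-ins rather than the parser's actual functions.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Op {
    Push(String),
    Pop(usize),
}

/// `meta.*` atoms that inner pushes and outer does not.
fn over_pushed<'a>(outer: &[Op], inner: &'a [Op]) -> Vec<&'a str> {
    inner
        .iter()
        .filter_map(|op| match op {
            Op::Push(a) if a.starts_with("meta.") && !outer.contains(op) => Some(a.as_str()),
            _ => None,
        })
        .collect()
}

/// Replace shape: at least one inner-side `meta.*` atom is absent from outer.
fn is_replace_shape(outer: &[Op], inner: &[Op]) -> bool {
    !over_pushed(outer, inner).is_empty()
}

/// Swap outer's ops for inner's, inserting `Pop(1)` after each over-pushed
/// `Push` unless the G2 gate fires (an over-pushed atom is already on the
/// shadow stack, so it is being preserved rather than introduced).
fn substitute(outer: &[Op], inner: &[Op], shadow_stack: &[String]) -> Vec<Op> {
    let over = over_pushed(outer, inner);
    if over.is_empty() {
        return outer.to_vec();
    }
    let g2 = over
        .iter()
        .any(|a| shadow_stack.iter().any(|s| s.as_str() == *a));
    let mut out = Vec::with_capacity(inner.len() + over.len());
    for op in inner {
        out.push(op.clone());
        if let Op::Push(a) = op {
            if !g2 && over.contains(&a.as_str()) {
                out.push(Op::Pop(1));
            }
        }
    }
    out
}
```
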

Cluster-B reproducers (`cross_line_path_field_type_keeps_meta_path_on_continuation_line`, `inline_path_field_type_keeps_meta_path_when_uninterrupted`, `cross_line_alternative_replacement_substitution_does_not_double_meta_scope`) defend the rule + comp-pop + G2 gate.

Refs: trishume#631
Java drops from 89 to 75 cols in both default and fancy regex baselines.

Refs: trishume#631