Skip to content

feat: add a stop position field to the parser#10043

Merged
david-christiansen merged 1 commit intoleanprover:masterfrom
david-christiansen:attempt2
Aug 23, 2025
Merged

feat: add a stop position field to the parser#10043
david-christiansen merged 1 commit intoleanprover:masterfrom
david-christiansen:attempt2

Conversation

@david-christiansen
Copy link
Copy Markdown
Contributor

@david-christiansen david-christiansen commented Aug 22, 2025

This PR allows Lean's parser to run with a final position prior to the end of the string, so it can be invoked on a sub-region of the input.

This has applications in Verso proper, which parses Lean syntax in contexts such as code blocks and docstrings, and it is a prerequisite to parsing the contents of Lean docstrings.

@david-christiansen
Copy link
Copy Markdown
Contributor Author

This is a draft PR for benchmarking purposes. Once everything's good to go, it can be split into two in order to use the normal infrastructure for stage0 updates.

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Aug 22, 2025
@ghost
Copy link
Copy Markdown

ghost commented Aug 22, 2025

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 890722f571c0f16a5d42c9afcab144aec6e15885 --onto 21f5263f2f28e5fd7b6a99c40f0c11351f04aedb. You can force Mathlib CI using the force-mathlib-ci label. (2025-08-22 05:11:44)

@leanprover-bot
Copy link
Copy Markdown
Collaborator

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 890722f571c0f16a5d42c9afcab144aec6e15885 --onto d3dda9f6d4428a906c096067ecb75e432afc4615. You can force reference manual CI using the force-manual-ci label. (2025-08-22 05:11:45)

@david-christiansen
Copy link
Copy Markdown
Contributor Author

!bench

@david-christiansen david-christiansen added the release-ci Enable all CI checks for a PR, like is done for releases label Aug 22, 2025
@leanprover-bot
Copy link
Copy Markdown
Collaborator

Here are the benchmark results for commit 4ed7b9f.
There were significant changes against commit 890722f:

  Benchmark   Metric         Change
  ============================================
- parser      instructions     2.0% (1989.3 σ)

@david-christiansen
Copy link
Copy Markdown
Contributor Author

!bench

@leanprover-bot
Copy link
Copy Markdown
Collaborator

Here are the benchmark results for commit c6a3eb8.
There were significant changes against commit 890722f:

  Benchmark       Metric          Change
  =================================================
+ bv_decide_mod   task-clock       -1.2%  (-71.9 σ)
+ bv_decide_mod   wall-clock       -1.2%  (-42.5 σ)
+ iterators       branch-misses    -5.0%  (-32.5 σ)
- parser          instructions      2.1% (1073.1 σ)

@david-christiansen
Copy link
Copy Markdown
Contributor Author

david-christiansen commented Aug 22, 2025

That must have been a fluke...

@david-christiansen
Copy link
Copy Markdown
Contributor Author

!bench

@leanprover-bot
Copy link
Copy Markdown
Collaborator

Here are the benchmark results for commit 487eed1.
There were significant changes against commit 890722f:

  Benchmark        Metric             Change
  =====================================================
+ binarytrees.st   maxrss              -7.4%  (-58.9 σ)
- parser           instructions         2.1% (1758.8 σ)
+ riscv-ast.lean   wall-clock          -1.2%  (-54.3 σ)
- stdlib           tactic execution     2.4%   (78.9 σ)
+ unionfind        maxrss              -4.7%  (-39.8 σ)

All parsers now respect the end position field by virtue of using the
appropriate context operations instead of operations on the underlying
string.
@david-christiansen david-christiansen removed the release-ci Enable all CI checks for a PR, like is done for releases label Aug 23, 2025
@david-christiansen david-christiansen added this pull request to the merge queue Aug 23, 2025
Merged via the queue into leanprover:master with commit c9727c2 Aug 23, 2025
26 of 28 checks passed
ejgallego added a commit to ejgallego/verso that referenced this pull request Feb 9, 2026
This is possible after leanprover/lean4#10043
improved upstream parsing API.

This replaces the current use of `parserInputString` and seems clearer overall.

There are two main cases for string literals for us to process:

- string literals coming from Verso: in this case, `.getPos` and
  `.getTailPos` do provide the correct start / end of the string,
  without the quotes.

- string literals coming from Lean: in this case, we must account for
  the quotes and fixup the positions.

We have handled this by convention, but `VersoUtils.parseString` takes
an optional parameter as it is used in both modes. Eventually we'd
like to move Verso-style strings to its own type instead of `StrLit`.

Notes:

- I couldn't port `VersoBlog.leanInit` as `Parser.parseHeader` always
  parses from `pos := 0`. This is the last blocker to completely
  remove `parserInputString`.

- In general, it seems like most functions using `mkParserState`
  upstream could benefit from an update to take positions.

- Tests in `UsersGuide.Markup` had to be adapted due to use of
  `contents.getString.trimAsciiEnd.copy`. I did this to pass CI, must
  implement a better fix before merging.

- Code that depends on MD4Lean hasn't been ported as MD4 doesn't seem
  to provide the right API for us.

- TODO: we should test `canonical := true` and quoting properly. Note
  special cases such as `` `code ``
ejgallego added a commit to ejgallego/verso that referenced this pull request Feb 9, 2026
This is possible after leanprover/lean4#10043
improved upstream parsing API.

This replaces the current use of `parserInputString`, which pads the
input, and seems clearer overall.

There are two main cases for string literals for us to process:

- string literals coming from Verso: in this case, `.getPos` and
  `.getTailPos` do provide the correct start / end of the string,
  without the quotes.

- string literals coming from Lean: in this case, we must account for
  the quotes and fixup the positions.

We have handled this by convention, but `VersoUtils.parseString` takes
an optional parameter as it is used in both modes. Eventually we'd
like to move Verso-style strings to its own type instead of `StrLit`.

Notes:

- I couldn't port `VersoBlog.leanInit` as `Parser.parseHeader` always
  parses from `pos := 0`. This is the last blocker to completely
  remove `parserInputString`.

- In general, it seems like most functions using `mkParserState`
  upstream could benefit from an update to take positions.

- Tests in `UsersGuide.Markup` had to be adapted due to use of
  `contents.getString.trimAsciiEnd.copy`. I did this to pass CI, must
  implement a better fix before merging.

- Code that depends on MD4Lean hasn't been ported as MD4 doesn't seem
  to provide the right API for us.

- TODO: we should test `canonical := true` and quoting properly. Note
  special cases such as `` `code ``
ejgallego added a commit to ejgallego/verso that referenced this pull request Feb 20, 2026
This is possible after leanprover/lean4#10043
improved upstream parsing API.

This replaces the current use of `parserInputString`, which pads the
input, and seems clearer overall.

There are two main cases for string literals for us to process:

- string literals coming from Verso: in this case, `.getPos` and
  `.getTailPos` do provide the correct start / end of the string,
  without the quotes.

- string literals coming from Lean: in this case, we must account for
  the quotes and fixup the positions.

We have handled this by convention, but `VersoUtils.parseString` takes
an optional parameter as it is used in both modes. Eventually we'd
like to move Verso-style strings to its own type instead of `StrLit`.

Notes:

- I couldn't port `VersoBlog.leanInit` as `Parser.parseHeader` always
  parses from `pos := 0`. This is the last blocker to completely
  remove `parserInputString`.

- In general, it seems like most functions using `mkParserState`
  upstream could benefit from an update to take positions.

- Tests in `UsersGuide.Markup` had to be adapted due to use of
  `contents.getString.trimAsciiEnd.copy`. I did this to pass CI, must
  implement a better fix before merging.

- Code that depends on MD4Lean hasn't been ported as MD4 doesn't seem
  to provide the right API for us.

- TODO: we should test `canonical := true` and quoting properly. Note
  special cases such as `` `code ``
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-other toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants