feat: add a stop position field to the parser#10043
Merged
david-christiansen merged 1 commit intoleanprover:masterfrom Aug 23, 2025
Merged
feat: add a stop position field to the parser#10043david-christiansen merged 1 commit intoleanprover:masterfrom
david-christiansen merged 1 commit intoleanprover:masterfrom
Conversation
Contributor
Author
|
This is a draft PR for benchmarking purposes. Once everything's good to go, it can be split into two in order to use the normal infrastructure for stage0 updates. |
|
Mathlib CI status (docs):
|
Collaborator
|
Reference manual CI status:
|
Contributor
Author
|
!bench |
Collaborator
|
Here are the benchmark results for commit 4ed7b9f. Benchmark Metric Change
============================================
- parser instructions 2.0% (1989.3 σ) |
Contributor
Author
|
!bench |
Collaborator
|
Here are the benchmark results for commit c6a3eb8. Benchmark Metric Change
=================================================
+ bv_decide_mod task-clock -1.2% (-71.9 σ)
+ bv_decide_mod wall-clock -1.2% (-42.5 σ)
+ iterators branch-misses -5.0% (-32.5 σ)
- parser instructions 2.1% (1073.1 σ) |
Contributor
Author
|
That must have been a fluke... |
Contributor
Author
|
!bench |
Collaborator
|
Here are the benchmark results for commit 487eed1. Benchmark Metric Change
=====================================================
+ binarytrees.st maxrss -7.4% (-58.9 σ)
- parser instructions 2.1% (1758.8 σ)
+ riscv-ast.lean wall-clock -1.2% (-54.3 σ)
- stdlib tactic execution 2.4% (78.9 σ)
+ unionfind maxrss -4.7% (-39.8 σ) |
487eed1 to
7abffad
Compare
Kha
reviewed
Aug 22, 2025
7abffad to
6923fb6
Compare
All parsers now respect the end position field by virtue of using the appropriate context operations instead of operations on the underlying string.
6923fb6 to
8e2760b
Compare
ejgallego
added a commit
to ejgallego/verso
that referenced
this pull request
Feb 9, 2026
This is possible after leanprover/lean4#10043 improved upstream parsing API. This replaces the current use of `parserInputString` and seems clearer overall. There are two main cases for string literals for us to process: - string literals coming from Verso: in this case, `.getPos` and `.getTailPos` do provide the correct start / end of the string, without the quotes. - string literals coming from Lean: in this case, we must account for the quotes and fixup the positions. We have handled this by convention, but `VersoUtils.parseString` takes an optional parameter as it is used in both modes. Eventually we'd like to move Verso-style strings to its own type instead of `StrLit`. Notes: - I couldn't port `VersoBlog.leanInit` as `Parser.parseHeader` always parses from `pos := 0`. This is the last blocker to completely remove `parserInputString`. - In general, it seems like most functions using `mkParserState` upstream could benefit from an update to take positions. - Tests in `UsersGuide.Markup` had to be adapted due to use of `contents.getString.trimAsciiEnd.copy`. I did this to pass CI, must implement a better fix before merging. - Code that depends on MD4Lean hasn't been ported as MD4 doesn't seem to provide the right API for us. - TODO: we should test `canonical := true` and quoting properly. Note special cases such as `` `code ``
ejgallego
added a commit
to ejgallego/verso
that referenced
this pull request
Feb 9, 2026
This is possible after leanprover/lean4#10043 improved upstream parsing API. This replaces the current use of `parserInputString`, which pads the input, and seems clearer overall. There are two main cases for string literals for us to process: - string literals coming from Verso: in this case, `.getPos` and `.getTailPos` do provide the correct start / end of the string, without the quotes. - string literals coming from Lean: in this case, we must account for the quotes and fixup the positions. We have handled this by convention, but `VersoUtils.parseString` takes an optional parameter as it is used in both modes. Eventually we'd like to move Verso-style strings to its own type instead of `StrLit`. Notes: - I couldn't port `VersoBlog.leanInit` as `Parser.parseHeader` always parses from `pos := 0`. This is the last blocker to completely remove `parserInputString`. - In general, it seems like most functions using `mkParserState` upstream could benefit from an update to take positions. - Tests in `UsersGuide.Markup` had to be adapted due to use of `contents.getString.trimAsciiEnd.copy`. I did this to pass CI, must implement a better fix before merging. - Code that depends on MD4Lean hasn't been ported as MD4 doesn't seem to provide the right API for us. - TODO: we should test `canonical := true` and quoting properly. Note special cases such as `` `code ``
ejgallego
added a commit
to ejgallego/verso
that referenced
this pull request
Feb 20, 2026
This is possible after leanprover/lean4#10043 improved upstream parsing API. This replaces the current use of `parserInputString`, which pads the input, and seems clearer overall. There are two main cases for string literals for us to process: - string literals coming from Verso: in this case, `.getPos` and `.getTailPos` do provide the correct start / end of the string, without the quotes. - string literals coming from Lean: in this case, we must account for the quotes and fixup the positions. We have handled this by convention, but `VersoUtils.parseString` takes an optional parameter as it is used in both modes. Eventually we'd like to move Verso-style strings to its own type instead of `StrLit`. Notes: - I couldn't port `VersoBlog.leanInit` as `Parser.parseHeader` always parses from `pos := 0`. This is the last blocker to completely remove `parserInputString`. - In general, it seems like most functions using `mkParserState` upstream could benefit from an update to take positions. - Tests in `UsersGuide.Markup` had to be adapted due to use of `contents.getString.trimAsciiEnd.copy`. I did this to pass CI, must implement a better fix before merging. - Code that depends on MD4Lean hasn't been ported as MD4 doesn't seem to provide the right API for us. - TODO: we should test `canonical := true` and quoting properly. Note special cases such as `` `code ``
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR allows Lean's parser to run with a final position prior to the end of the string, so it can be invoked on a sub-region of the input.
This has applications in Verso proper, which parses Lean syntax in contexts such as code blocks and docstrings, and it is a prerequisite to parsing the contents of Lean docstrings.