Skip to content

Conversation

@rmuir
Copy link
Member

@rmuir rmuir commented Jan 29, 2026

Integrates prek as a repository scanner and pre-commit tool. It has some built-in rust checks that run across the repository, and can also integrate external tools.

External tools are managed as pypi dependencies with uv, just explicitly instead of using prek's built-in uv support. You need uv either way.

shellcheck, actionlint, zizmor, ruff, rumdl, and ast-grep tools are initially integrated.
We can migrate other checks from the gradle build here with follow-up commits.

Closes #15626

The CI is reorganized to run this "quick check" before any of the heavy java jobs. Current job in the worst case (cache miss) runs in less than a minute: so it does not slow down feedback to the user, but saves a lot of CI minutes. good task for ubuntu slim runner.

Screen_Shot_2026-01-28_at_19 50 58

To try the functionality out locally:

You need uv package manager
You can then change some code and try out a precommit-run with uvx prek
To do this automatically as git-hook, use uvx prek install.

Locally, the checks are fast, since generally only the files you touched are being passed to the checker. So the pre-commit git hook is actually not annoying. On my 2-core with modifications to IndexWriter.java staged:

$ time prek
Check for large files.....................................................................Passed
Check for files that would conflict in case-insensitive filesystems.......................Passed
Check JSON files......................................................(no files to check)Skipped
Check TOML files......................................................(no files to check)Skipped
Check YAML files......................................................(no files to check)Skipped
Check XML files.......................................................(no files to check)Skipped
Check for merge conflicts.................................................................Passed
Check for private keys....................................................................Passed
Check that (non-binary) executables have a shebang....................(no files to check)Skipped
Check Shell Scripts...................................................(no files to check)Skipped
Check Github Actions..................................................(no files to check)Skipped
Check Github Actions Security.........................................(no files to check)Skipped
Check ast-grep rules..................................................(no files to check)Skipped
Check Sources with ast-grep...............................................................Passed
Check prek hooks match at least one file..............................(no files to check)Skipped
Check prek excludes match include pattern.............................(no files to check)Skipped
Fix trailing whitespace...................................................................Passed
Fix newline at EOF........................................................................Passed
Fix UTF-8 byte order marker...............................................................Passed
Fix line endings..........................................................................Passed
Fix Markdown..........................................................(no files to check)Skipped
Fix Python............................................................(no files to check)Skipped
Fix Python formatting.................................................(no files to check)Skipped

real    0m0.579s
user    0m0.216s
sys     0m0.244

The checker is configured with some autofix support. All of the current Check steps run in parallel, then the Fix steps run sequentially in the order they are defined in the .pre-commit-config.yml file.

Fix solution is always preferred to make it easier on the user, the individual steps will use multiple cores anyway.

Fixes will still "fail" the git commit, but they'll make the changes for you locally. You just need to review and git add the fixes made, then git commit again.

rmuir added 4 commits January 28, 2026 19:38
Integrates prek as a repository scanner. It has some built-in rust
checks that run across the repository, and can also integrate external
tools.

External tools are managed as python dependencies with `uv`, just
explicitly instead of using prek's built-in uv support. You need uv
either way.

shellcheck, actionlint, and zizmor tools are initially integrated.
We can migrate checks from the gradle build here (e.g. ast-grep,
java-based checks) with follow-up commits.

Closes apache#15626
@rmuir
Copy link
Member Author

rmuir commented Jan 29, 2026

I think this one is ok, although I didn't move a huge portion of stuff out of the gradle build yet, since some of it is a bit hairy :)

Basically just added the easy-wins, keeping everything fast (none of the tooling here is actually in python). I think we should try to keep the precommit hook subsecond so that people will want to actually use it.

Copy link
Contributor

@dweiss dweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... I like it very much. Especially how it's all consolidated in one config file. I guess Windows users may suffer a bit... but these days you'd go through a PR which runs this for you anyway, so I don't see any problem here.

I'm +1 to add this and replace the hand-coded stuff.

@rmuir
Copy link
Member Author

rmuir commented Jan 29, 2026

It is configured with the default which is a universal lockfile. Check out the uv.lock in the PR diff to see details. Windows should be supported (how well it works, I do not know)

rmuir added 2 commits January 29, 2026 11:44
Add self-checks for the configuration. They fail if hook doesn't apply
to any files in the repository, or if the excludes contain extraneous
files that wouldn't be included anyway. It is helpful for detecting
simple mistakes. For this reason, remove the symlink target check since
we have no symlinks in the repository.
@rmuir rmuir marked this pull request as ready for review January 29, 2026 17:06
@rmuir
Copy link
Member Author

rmuir commented Jan 29, 2026

I plan to followup to address the TODOs in the (long) exclusion lists.

Purpose of adding all these exclusions was to avoid reformatting/fixing tons of the codebase in this PR.

@github-actions github-actions bot added this to the 11.0.0 milestone Jan 30, 2026
@rmuir rmuir requested a review from uschindler January 30, 2026 04:04
@uschindler
Copy link
Contributor

uschindler commented Jan 31, 2026

Ah figured it out but does not work on Windows, so it can't run on Windows.

C:\Users\Uwe Schindler\Projects\lucene\lucene>uv run prek --all-files
error: Failed to spawn: `prek`
  Caused by: program not found

So I assume also the ast-grep "uv run" does not work. I have no idea what all this means, as I have no idea where it looks for the binary.

@uschindler
Copy link
Contributor

Aaah figured out. So it works!

image

You should read documentation.

@rmuir: So to basically run the checks on jenkins just add a job and call uvx prek all-files in checkout folder?

@uschindler
Copy link
Contributor

P.S.: It also survived the "whitespace in username" hardcore check!

@rmuir
Copy link
Member Author

rmuir commented Jan 31, 2026

@rmuir: So to basically run the checks on jenkins just add a job and call uvx prek all-files in checkout folder?

@uschindler yes. If you want to be very pedantic, you can improve it for CI:

Either of these does same thing:

$ uv run --project dev-tools --frozen prek all-files
$ UV_PROJECT=dev-tools UV_FROZEN=1 uv run prek all-files

This will definitely ensure it uses exact version of prek in our lockfile and is good for CI. If you look at the github workflow in here and the precommit configuration, you can see everything is done that way internally.

edited: add "frozen" argument for more safety.

@rmuir
Copy link
Member Author

rmuir commented Jan 31, 2026

P.S.: It also survived the "whitespace in username" hardcore check!

nice!, question is, does it work fast on windows?

@uschindler
Copy link
Contributor

P.S.: It also survived the "whitespace in username" hardcore check!

nice!, question is, does it work fast on windows?

The full checks take a few seconds (feels like 8 seconds on my device but there's terminal animations). The preconmit is milliseconds.

Can't measure, not sure how to measure time on windows.

@rmuir
Copy link
Member Author

rmuir commented Jan 31, 2026

Yeah the full checks across all files are several seconds on my machine too. It is really for CI purposes only.

Glad to hear precommit works fast if you only change a few files, that is the whole purpose :)

rmuir and others added 2 commits January 31, 2026 14:30
There's no desire to update the uv.lock in source code control when
executing these commands: it can only trigger noise and confusion.

This can happen if you have some custom uv config on your machine which
would resolve packages differently (e.g. UV_DEFAULT_REGISTRY=)
@rmuir
Copy link
Member Author

rmuir commented Jan 31, 2026

I tested this on MacOS just now as well.

@rmuir
Copy link
Member Author

rmuir commented Jan 31, 2026

@uschindler I updated those suggested CI commands (it matches what the github workflow does) with the extra safety.

rmuir added 3 commits January 31, 2026 14:57
This delivers a clear error message if the user happens to have
a too-old prek, rather than something confusing.

We require the 0.2.29 which supports includes/excludes as yaml glob
lists rather than the previous regexp approach, this is just a massive
improvement to the maintainers of these files:

https://github.com/j178/prek/releases/tag/v0.2.29
Improves the output from the github workflow if something had to be
reformatted or fixed.
@rmuir rmuir merged commit 68e90db into apache:main Jan 31, 2026
13 checks passed
rmuir added a commit to rmuir/prek that referenced this pull request Jan 31, 2026
@uschindler
Copy link
Contributor

uschindler commented Jan 31, 2026

I will add the check-all-files command to the main branch Jenkins job on Policeman's instance.
Of course I need to install uv on the nodes. About ASF Jenkins: Not sure if the ASF Jenkins has it available.

priority: 0
language: system
entry: uv
args: [ 'run', 'ast-grep', '-c', './gradle/validation/ast-grep/sgconfig.yml', 'scan', '--format', 'github' ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we no longer run ast-grep from withing Gradle we should move the config files to the dev tools folder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uschindler I agree. I am working followups for a lot of this. I was trying to avoid reformatting / messing with entire codebase in one big PR.

priority: 0
language: system
entry: uv
args: [ 'run', 'ast-grep', '-c', './gradle/validation/ast-grep/sgconfig.yml', 'test', '--skip-snapshot-tests' ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@rmuir
Copy link
Member Author

rmuir commented Jan 31, 2026

Of course I need to install uv on the nodes. About ASF Jenkins: Not sure if the ASF Jenkins has it available.

See the install link. There are many ways, including pip install to do it.

@uschindler
Copy link
Contributor

Of course I need to install uv on the nodes. About ASF Jenkins: Not sure if the ASF Jenkins has it available.

See the install link. There are many ways, including pip install to do it.

That's known. On ASF Jenkins we should check if it's available by default like Java.

@uschindler
Copy link
Contributor

I added the prek checks to the Policeman Jenkins Windows, MacOS and Linux builds on main branch (only the default test, not the mmap ones).

Sorry for the failed builds, I had to fight against "uv" not on path, because Jenkins opens no login shell on POSIX.

image

Windows:

image

(UV was of course installed before for the user who executes Jenkins; at moment it does not get updated automatically. We may add a simple cronjob Jenkins that calls uv self update. I will keep an eye on it.

@uschindler
Copy link
Contributor

The PREK execution is now executed at end of build, temporarily for testing it was at beginning. I think this also makes sure the repo has no uncommitted changes.

finnroblin pushed a commit to finnroblin/lucene that referenced this pull request Feb 2, 2026
Integrates prek as a repository scanner and pre-commit tool. It has some built-in 
rust checks that run across the repository, and can also integrate external tools.

External tools are managed as pypi dependencies with uv, just explicitly instead 
of using prek's built-in uv support. You need uv either way.

shellcheck, actionlint, zizmor, ruff, rumdl, and ast-grep tools are initially integrated.
We can migrate other checks from the gradle build here with follow-up commits.

The CI is reorganized to run this "quick check" before any of the heavy java jobs. 
Current job in the worst case (cache miss) runs in less than a minute: so it does 
not slow down feedback to the user, but saves a lot of CI minutes. good task for 
ubuntu slim runner.

To try the functionality out locally:

You need uv package manager: https://docs.astral.sh/uv/getting-started/installation/
You can then change some code and try out a precommit-run with `uvx prek`
To do this automatically as git-hook, use `uvx prek install`.

Locally, the checks are fast, since generally only the files you touched are being
passed to the checker. So the pre-commit git hook is actually not annoying (< 1s).

The checker is configured with some autofix support. All of the current "Check" 
steps run in parallel, then the "Fix" steps run sequentially in the order they are 
defined in the `.pre-commit-config.yml` file. 

`Fix` solution is always preferred to make it easier on the user, the individual
steps will use multiple cores anyway.

Fixes will still "fail" the `git commit`, but they'll make the changes for you locally. 
You just need to review and `git add` the fixes made, then `git commit` again.

Closes apache#15626
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Switch some of the precommit/ validation tasks to pre-commit

3 participants