-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Automatically generate examples documentation adv (#19294) #19750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Automatically generate examples documentation adv (#19294) #19750
Conversation
High-Level OverviewIn previous PRs, a basic script was introduced that only parsed example group names #19371 and #19491 This PR implements a more comprehensive solution for keeping the examples README up-to-date:
This ensures that the README for examples stays in sync with the actual examples and encourages maintainers to keep examples.toml accurate. |
|
@Jefffrey since you helped with previous PRs related to example documentation, it would be great if you could take a look at this one as well. Your feedback on the CI check approach or any improvements would be much appreciated. |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review. I haven't looked too in-depth but I have some concerns with this approach:
- A lot of bash scripting; I know we have other scripts like this, but I wonder if this amount of logic is better served as Python or even Rust scripts
- Some of the Bash scripting behaviour is not ideal too, like how it doesn't capitalze
SQLandIOproperly
- Some of the Bash scripting behaviour is not ideal too, like how it doesn't capitalze
- Needing to have a separate metadata toml file doesn't seem ideal, but I can't think of another way to handle this other than manually parsing the Rust code of the examples (which is worse)
- We managed to use a Rust based approach for function docs because we used a Rust binary to access the functions which are exposed as a lib; that won't work for examples 🤔
I don't have too many ideas myself unfortunately, would love if anyone else can chime in 😅
|
@Jefffrey Thanks for the review and for raising these points - they’re all fair concerns. I agree this is pushing Bash beyond simple glue logic. I started there to avoid introducing new dependencies and to stay consistent with existing CI scripts, but if the overall approach makes sense, I’d be happy to follow up with a Rust or Python implementation to improve maintainability. Regarding The capitalization issues you noticed (e.g. SQL / IO) are artifacts of the current heuristics and can be fixed - either via normalization or explicit metadata. I’m definitely open to alternative ideas here and would love more input if others have suggestions. |
|
Based on this discussion, I’m working on a new Rust-based implementation to replace Once this new implementation is ready, I’ll share it here so we can review the updated approach and decide whether it’s a better fit. |
|
I replaced the shell-based examples documentation generator with a Rust implementation.
Design change:
Now:
Personally, I find this approach cleaner and more “Rust-native” than maintaining a complex shell script and a separate TOML file. A follow-up PR could switch this to a dedicated parsing crate (e.g. |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer this Rust solution 👍
I think we can merge this PR and do any improvements in further PRs 🚀
| continue | ||
| fi | ||
| echo "▶ Formatting generated README with Prettier…" | ||
| npx [email protected] \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to look at in a followup issue is unifying our prettier versions somehow 🤔
datafusion/dev/update_config_docs.sh
Lines 241 to 242 in 8023947
| echo "Running prettier" | |
| npx [email protected] --write "$TARGET_FILE" |
datafusion/dev/update_function_docs.sh
Lines 116 to 117 in 4d63f8c
| echo "Running prettier" | |
| npx [email protected] --write "$TARGET_FILE" |
datafusion/ci/scripts/doc_prettier_check.sh
Lines 23 to 27 in 4d63f8c
| SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")" | |
| PRETTIER_VERSION="2.7.1" | |
| PRETTIER_TARGETS=( | |
| '{datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md' | |
| '!datafusion/CHANGELOG.md' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, it's a good point
datafusion-examples/README.md
Outdated
| | file_stream_provider | [`custom_data_source/file_stream_provider.rs`](examples/custom_data_source/file_stream_provider.rs) | Read/write via FileStreamProvider for streams | | ||
|
|
||
| ## Data IO Examples | ||
| ## Data Io Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice if we could fix these capitalization cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to work on this
|
@Jefffrey Thanks for the review! I’ve fixed the capitalization issues for group titles (e.g. Data IO, SQL Ops, UDF, etc.) by introducing explicit handling for common abbreviations instead of naive Title Case. For now, I kept the existing lightweight parser for extracting metadata from Happy to follow whatever direction makes the most sense here. |
I'm not overly concerned about dependency footprint for this; we can always feature gate it so people running datafusion examples won't have it by default. My only concern is complexity in the code in the DataFusion repo. If we pull in |
|
@Jefffrey Thanks, that makes sense. I’ll keep this PR focused on improving the current implementation and avoid adding extra complexity here. For now, I think it’s best to keep the existing lightweight parser as-is and only make targeted improvements where needed. As follow-ups, I’m planning:
Happy to adjust direction if you’d prefer a different approach. |
|
Should be good to merge once conflicts are resolved @cj-zhukov |
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?