@cj-zhukov
Contributor

@cj-zhukov cj-zhukov commented Dec 17, 2025

#19294)

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the development-process label (Related to development process of DataFusion) Dec 17, 2025
@cj-zhukov
Contributor Author

High-Level Overview

This PR introduces a bash script to ensure all example groups in datafusion-examples have corresponding documentation in the README.

  • Parses group names from "### Group: `<group>`" headers.
  • Compares documented groups to the actual folder names.

Adds a CI step examples-docs-check (modeled after config-docs-check) that fails if any example group is missing documentation.

This keeps the README in sync with the example groups.
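The two bullets above can be sketched roughly as follows. This is a minimal illustration, not the merged script: the `missing_groups` helper, the temporary fixture, and the `alpha`/`beta` group names are assumptions made for the example. Only the `find` and `grep`/`sed` pipelines mirror the lines quoted later in this review.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not the merged script): print every example folder
# that has no matching "### Group: `name`" header in the README.
# $1 = examples directory, $2 = README path.
missing_groups() {
  local examples_dir="$1" readme="$2"
  local folders groups folder

  # Folder names directly under the examples directory.
  folders=$(find "$examples_dir" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | sort)

  # Group names captured from the README headers; "|| true" keeps a README
  # without any group header from aborting the script under "set -e".
  groups=$(grep '^### Group:' "$readme" | sed -E 's/^### Group: `([^`]+)`.*/\1/' || true)

  # Assumes folder names contain no whitespace.
  for folder in $folders; do
    # -F fixed string, -x whole-line match, -q quiet.
    if ! grep -Fxq -- "$folder" <<<"$groups"; then
      echo "$folder"
    fi
  done
}

# Demo on a throwaway fixture (paths and group names are made up).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/examples/alpha" "$demo_dir/examples/beta"
printf '### Group: `alpha`\n' > "$demo_dir/README.md"

missing=$(missing_groups "$demo_dir/examples" "$demo_dir/README.md")
if [ -n "$missing" ]; then
  echo "Undocumented example groups: $missing"
fi
rm -rf "$demo_dir"
```

In the fixture only `alpha` has a header, so the demo reports `beta`. A CI step would presumably exit with a non-zero status at that point instead of only printing.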

Contributor

@Jefffrey Jefffrey left a comment

Minor note: the issue + PR title mention auto generating the README but this PR seems to only check that example (groups) exist in the README; is this intended?

EXAMPLES_DIR="datafusion-examples/examples"
README="datafusion-examples/README.md"

SKIP_LIST=("ffi")
Contributor

What's the reason for skipping ffi?

Contributor Author

ffi is skipped because it wasn’t part of the recent example consolidation work. It doesn’t follow the new example grouping and execution pattern, and therefore isn’t represented in the README using the new structure. Removing it from the check avoids false failures for a group that isn’t aligned with the documented example format.
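One way such a skip list could be applied is sketched below. Only `SKIP_LIST=("ffi")` comes from the diff. The `is_skipped` helper and the `udf` folder name are assumptions used to exercise it, and the merged script may filter differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Groups excluded from the README check. "ffi" was not part of the example
# consolidation and is not documented in the new grouped format.
SKIP_LIST=("ffi")

# Hypothetical helper: succeed (exit status 0) if $1 is in SKIP_LIST.
is_skipped() {
  local name="$1" skipped
  for skipped in "${SKIP_LIST[@]}"; do
    if [ "$name" = "$skipped" ]; then
      return 0
    fi
  done
  return 1
}

# Placeholder folder names, used only to exercise the helper.
for folder in ffi udf; do
  if is_skipped "$folder"; then
    echo "skip  $folder"
  else
    echo "check $folder"
  fi
done
```

Keeping the exclusions in one commented array means the README comparison itself does not change when a group is later brought into the documented format. The entry is simply removed.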

Contributor

Can we leave a small comment next to this skip list explaining this, for future reference?

Contributor Author

Good point, let's do it

Comment on lines +39 to +43
# collect folder names
folders=$(find "$EXAMPLES_DIR" -mindepth 1 -maxdepth 1 -type d -exec basename {} \;)

# collect group names from README headers
groups=$(grep "^### Group:" "$README" | sed -E 's/^### Group: `([^`]+)`.*/\1/')
Contributor

We only check at the group granularity? So if we add a new example to an existing group this check can miss that?

Contributor Author

Good point. Yes, this initial check operates at the group granularity, not at the individual example level. The idea was to introduce a lightweight, easy-to-maintain first layer of validation that ensures every group is represented in the README.

Individual examples within a group are expected to follow the existing documentation pattern. This script doesn’t enforce that yet, but it establishes a foundation we can extend in future CI improvements if needed.
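For illustration only, an example-level extension might look something like the sketch below. It is not part of this PR and rests on two assumptions: each example is a `.rs` file directly inside its group folder, and the README mentions each example by its `group/file.rs` path. The `undocumented_examples` helper and the fixture names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical extension: print "group/file.rs" for every source file whose
# relative path does not appear verbatim anywhere in the README.
# $1 = examples directory, $2 = README path.
undocumented_examples() {
  local examples_dir="$1" readme="$2"
  local file rel
  while IFS= read -r file; do
    # Strip the leading examples directory to get "group/file.rs".
    rel="${file#"$examples_dir"/}"
    # -F fixed string, -q quiet: succeed if the path occurs in the README.
    if ! grep -Fq -- "$rel" "$readme"; then
      echo "$rel"
    fi
  done < <(find "$examples_dir" -mindepth 2 -maxdepth 2 -type f -name '*.rs' | sort)
}

# Demo on a throwaway fixture (all names are made up).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/examples/alpha"
touch "$demo_dir/examples/alpha/one.rs" "$demo_dir/examples/alpha/two.rs"
printf '### Group: `alpha`\n\n- alpha/one.rs\n' > "$demo_dir/README.md"

undocumented_examples "$demo_dir/examples" "$demo_dir/README.md"
rm -rf "$demo_dir"
```

Here the README lists only `alpha/one.rs`, so the demo prints `alpha/two.rs`. A plain substring match is deliberately loose, and a real check would need to follow whatever format the README actually uses for individual examples.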

Contributor Author

Good catch -- the current PR title isn’t fully aligned with what the PR delivers. This change focuses only on validating that example groups are documented, not on auto-generating the README. If auto-generation is considered valuable, that can certainly be explored as a follow-up PR.

Contributor

Given these reasons, the PR title should be changed, and preferably we don't close the issue, as it seems there's still some future work to be done.

@cj-zhukov
Contributor Author

Added comment about skipping ffi examples in check_examples_docs.sh

@cj-zhukov cj-zhukov changed the title from "Automatically generate examples documentation and add CI sync check (…" to "Add CI check to ensure examples are documented in README" Dec 24, 2025
@cj-zhukov
Contributor Author

Updated the title of the PR to one that matches the current scope

@Jefffrey Jefffrey added this pull request to the merge queue Dec 27, 2025
Merged via the queue into apache:main with commit ae35177 Dec 27, 2025
30 checks passed
@Jefffrey
Contributor

Thanks @cj-zhukov

@cj-zhukov cj-zhukov deleted the cj-zhukov/readme-docs-stays-sync-with-code branch December 28, 2025 06:41