docs: Added starter dev notes on push to hugging face hub #355

Open
nabinchha wants to merge 6 commits into main from nmulepati/docs/dev-notes-push-to-huggingface-hub

@nabinchha nabinchha commented Feb 26, 2026

Adds a dev note post covering the push_to_hub feature of Data Designer.

@nabinchha nabinchha requested a review from a team as a code owner February 26, 2026 18:20

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR adds a new developer notes blog post documenting the push_to_hub feature of Data Designer, along with four supporting images and the corresponding nav entry in mkdocs.yml. The post covers the full push_to_hub API surface: two entry points (results.push_to_hub() and HuggingFaceHubClient.push_to_hub_from_folder()), the upload pipeline order, processor-to-HF-config mapping, automatic dataset card generation, auth token resolution, and the reproducibility round-trip via builder_config.json.

Key observations:

  • Previously noted issues have been addressed: the <!-- more --> excerpt marker is now correctly placed after the two-line intro (line 14) with no redundant occurrences, and the dataset card template path at line 229 is the full, correct path (packages/data-designer/src/data_designer/integrations/huggingface/dataset_card_template.md), confirmed against the repository file system.
  • The code example on line 23 uses data_designer as a variable name for a DataDesigner instance, which shares the name of the imported package — a minor readability concern for developers following along.
  • The mkdocs.yml nav entry follows the same explicit-listing pattern used by all other dev notes posts, so there is no structural inconsistency.

Confidence Score: 4/5

  • Documentation-only PR with accurate content; safe to merge with a minor style fix in the code example.
  • All changes are documentation (markdown, images, YAML nav entry). Previously flagged issues with excerpt markers and template path are resolved. The only remaining note is a minor variable naming style concern in the code snippet that doesn't affect correctness.
  • docs/devnotes/posts/push-datasets-to-hugging-face-hub.md — minor variable naming style in the opening code example.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/devnotes/posts/push-datasets-to-hugging-face-hub.md | New dev note covering push_to_hub feature; placement and dataset card template path are now correct; minor code example style inconsistency noted. |
| docs/devnotes/.authors.yml | Added nmulepati author entry; format is consistent with existing entries. |
| mkdocs.yml | New post added to nav; consistent with the pattern used for all other dev notes posts. |
| docs/devnotes/posts/images/push-to-hub-hero.png | New hero image for the blog post; referenced correctly relative to the post's location. |

Sequence Diagram

sequenceDiagram
    participant User
    participant DataDesigner
    participant HuggingFaceHubClient
    participant HuggingFaceHub

    User->>DataDesigner: create(config_builder, num_records)
    DataDesigner-->>User: results

    User->>HuggingFaceHubClient: results.push_to_hub(repo_id, description, tags)
    HuggingFaceHubClient->>HuggingFaceHub: Upload README.md (dataset card)
    HuggingFaceHubClient->>HuggingFaceHub: Upload data/*.parquet (remapped from parquet-files/)
    HuggingFaceHubClient->>HuggingFaceHub: Upload images/* (if image columns exist)
    HuggingFaceHubClient->>HuggingFaceHub: Upload {processor}/* (remapped from processors-files/)
    HuggingFaceHubClient->>HuggingFaceHub: Upload builder_config.json
    HuggingFaceHubClient->>HuggingFaceHub: Upload metadata.json (paths rewritten)
    HuggingFaceHubClient-->>User: dataset URL

    User->>DataDesigner: DataDesignerConfigBuilder.from_config(HF_blob_URL)
    DataDesigner->>HuggingFaceHub: Fetch builder_config.json (blob → raw URL rewrite)
    HuggingFaceHub-->>DataDesigner: builder_config.json
    DataDesigner-->>User: config_builder (fully hydrated)
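
Two steps in the diagram are plain string rewrites: the local-to-Hub path remapping during upload and the blob → raw URL rewrite when loading builder_config.json. The sketch below illustrates both under stated assumptions. The helper names `remap_to_hub_path` and `blob_to_raw_url`, the example file names, and the repo id `user/my-dataset` are hypothetical and are not taken from the Data Designer code base; the actual implementation may handle more cases or use a different URL form.

```python
from pathlib import PurePosixPath


def remap_to_hub_path(local_path: str) -> str:
    """Map a path inside the local results folder to its path in the Hub repo.

    Assumed rules, read off the sequence diagram:
      parquet-files/<file>            -> data/<file>
      processors-files/<processor>/*  -> <processor>/*
      anything else                   -> unchanged
    """
    parts = PurePosixPath(local_path).parts
    if len(parts) > 1 and parts[0] == "parquet-files":
        return str(PurePosixPath("data", *parts[1:]))
    if len(parts) > 1 and parts[0] == "processors-files":
        return str(PurePosixPath(*parts[1:]))
    return local_path


def blob_to_raw_url(url: str) -> str:
    """Rewrite a Hub "blob" (HTML viewer) URL to a raw-content URL.

    Assumption: swapping the first "/blob/" segment for "/raw/" is enough.
    """
    return url.replace("/blob/", "/raw/", 1)


print(remap_to_hub_path("parquet-files/batch_00000.parquet"))
print(remap_to_hub_path("processors-files/dedupe/out.parquet"))
print(
    blob_to_raw_url(
        "https://huggingface.co/datasets/user/my-dataset/blob/main/builder_config.json"
    )
)
```

Keeping both rewrites as pure functions makes them easy to unit test without touching the network.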

Last reviewed commit: a923a2a

dhruvnathawani previously approved these changes Feb 26, 2026

@dhruvnathawani dhruvnathawani left a comment

Did you use AI for the images?
LGTM

Move the single <!-- more --> to after the intro paragraph for a shorter
blog teaser and remove the 6 redundant markers throughout the post.
@nabinchha

> Did you use AI for the images? LGTM

@dhruvnathawani, yes!

nabinchha and others added 2 commits March 9, 2026 09:45
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()

Variable name shadows package name

The local variable data_designer (assigned a DataDesigner instance) shares its name with the data_designer package that was just imported on lines 20–21. While Python won't error here (the imports use dd and DataDesigner as their local bindings), readers skimming the snippet may confuse the instance with the module. A less ambiguous name like designer or dd_client would make the example clearer, especially since the second Round-Trip code block (line 272) already uses the inline DataDesigner().create(...) style without assigning to a named variable at all.

Suggested change
- data_designer = DataDesigner()
+ designer = DataDesigner()

Then update line 65 accordingly:

results = designer.create(config_builder, num_records=10_000)
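
The point that Python won't error here can be checked directly: `import pkg.sub as alias` and `from pkg.sub import Name` bind only `alias` and `Name`, never the top-level package name. The sketch below uses standard-library modules as stand-ins for the post's imports so it runs without Data Designer installed; the choice of stand-ins is an assumption made purely for illustration.

```python
# Stand-ins for the snippet under review:
#   import data_designer.config as dd                -> import os.path as osp
#   from data_designer.interface import DataDesigner -> from json.decoder import JSONDecoder
namespace: dict = {}
exec(
    "import os.path as osp\n"
    "from json.decoder import JSONDecoder\n",
    namespace,
)

# Collect the names the two import statements actually bound
# (exec adds __builtins__, which is filtered out here).
bound = sorted(name for name in namespace if not name.startswith("__"))
print(bound)  # ['JSONDecoder', 'osp']

# The top-level package names were never bound, so a later assignment to
# `os` or `json` in this namespace would not overwrite an imported module.
assert "os" not in namespace
assert "json" not in namespace
```

So the rename to designer is about readability for someone skimming the snippet, not about avoiding a runtime conflict.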