docs: Added starter dev notes on push to hugging face hub#355
Greptile Summary
This PR adds a new developer notes blog post documenting the `push_to_hub` feature. Key observations:
| Filename | Overview |
|---|---|
| docs/devnotes/posts/push-datasets-to-hugging-face-hub.md | New dev note covering push_to_hub feature; placement and dataset card template path are now correct; minor code example style inconsistency noted. |
| docs/devnotes/.authors.yml | Added nmulepati author entry; format is consistent with existing entries. |
| mkdocs.yml | New post added to nav; consistent with the pattern used for all other dev notes posts. |
| docs/devnotes/posts/images/push-to-hub-hero.png | New hero image for the blog post; referenced correctly relative to the post's location. |
Sequence Diagram
sequenceDiagram
participant User
participant DataDesigner
participant HuggingFaceHubClient
participant HuggingFaceHub
User->>DataDesigner: create(config_builder, num_records)
DataDesigner-->>User: results
User->>HuggingFaceHubClient: results.push_to_hub(repo_id, description, tags)
HuggingFaceHubClient->>HuggingFaceHub: Upload README.md (dataset card)
HuggingFaceHubClient->>HuggingFaceHub: Upload data/*.parquet (remapped from parquet-files/)
HuggingFaceHubClient->>HuggingFaceHub: Upload images/* (if image columns exist)
HuggingFaceHubClient->>HuggingFaceHub: Upload {processor}/* (remapped from processors-files/)
HuggingFaceHubClient->>HuggingFaceHub: Upload builder_config.json
HuggingFaceHubClient->>HuggingFaceHub: Upload metadata.json (paths rewritten)
HuggingFaceHubClient-->>User: dataset URL
User->>DataDesigner: DataDesignerConfigBuilder.from_config(HF_blob_URL)
DataDesigner->>HuggingFaceHub: Fetch builder_config.json (blob → raw URL rewrite)
HuggingFaceHub-->>DataDesigner: builder_config.json
DataDesigner-->>User: config_builder (fully hydrated)
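The path remapping and the blob-to-raw URL rewrite shown in the diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not Data Designer's actual implementation: `remap_artifact_path` and `blob_to_raw_url` are hypothetical helper names, and the repository in the usage lines is made up. The only details taken from the diagram are the `parquet-files/` to `data/` and `processors-files/` to `{processor}/` mappings and the fact that a blob URL is rewritten to a raw URL; the `/raw/` path segment is assumed from the Hugging Face Hub's public URL layout.

```python
from urllib.parse import urlsplit, urlunsplit


def remap_artifact_path(local_path: str) -> str:
    """Map a local artifact path to its assumed location in the Hub repo."""
    parquet_prefix = "parquet-files/"
    processors_prefix = "processors-files/"
    if local_path.startswith(parquet_prefix):
        # parquet-files/<name> is uploaded under data/<name>
        return "data/" + local_path[len(parquet_prefix):]
    if local_path.startswith(processors_prefix):
        # processors-files/<processor>/<name> is uploaded under <processor>/<name>
        return local_path[len(processors_prefix):]
    # Everything else (e.g. builder_config.json) keeps its relative path.
    return local_path


def blob_to_raw_url(url: str) -> str:
    """Rewrite the first /blob/ path segment of a Hub file URL to /raw/."""
    parts = urlsplit(url)
    raw_path = parts.path.replace("/blob/", "/raw/", 1)
    return urlunsplit(parts._replace(path=raw_path))


# Hypothetical repository, used only for illustration.
blob_url = (
    "https://huggingface.co/datasets/example-user/example-dataset"
    "/blob/main/builder_config.json"
)
print(remap_artifact_path("parquet-files/batch_0.parquet"))
print(blob_to_raw_url(blob_url))
```

Rewriting only the URL path, rather than the whole string, keeps the host and any query string untouched.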
Last reviewed commit: a923a2a
dhruvnathawani
left a comment
Did you use AI for the images?
LGTM
Move the single `<!-- more -->` to after the intro paragraph for a shorter blog teaser and remove the 6 redundant markers throughout the post.
@dhruvnathawani, yes!
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
**Variable name shadows package name**
Path: docs/devnotes/posts/push-datasets-to-hugging-face-hub.md, line 23
The local variable `data_designer` (assigned a `DataDesigner` instance) shares its name with the `data_designer` package that was just imported on lines 20–21. While Python won't error here (the imports use `dd` and `DataDesigner` as their local bindings), readers skimming the snippet may confuse the instance with the module. A less ambiguous name like `designer` or `dd_client` would make the example clearer, especially since the second Round-Trip code block (line 272) already uses the inline `DataDesigner().create(...)` style without assigning to a named variable at all.
```suggestion
designer = DataDesigner()
```
Then update line 65 accordingly:
```python
results = designer.create(config_builder, num_records=10_000)
```
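The claim in the comment above, that the two imports bind only their aliases and never the top-level package name, can be checked with a small standard-library analogue. This is a sketch under assumptions: `json` stands in for the `data_designer` package, and `bound_names` is a hypothetical helper written for this illustration.

```python
def bound_names(source: str) -> set[str]:
    """Run source in a fresh namespace and return the names it binds."""
    namespace: dict[str, object] = {}
    exec(source, namespace)
    # exec() injects __builtins__; ignore dunder entries like that one.
    return {name for name in namespace if not name.startswith("__")}


# Same import shapes as the reviewed snippet, with json as the stand-in package.
names = bound_names(
    "import json.decoder as jd\n"
    "from json.encoder import JSONEncoder\n"
)
print(sorted(names))
print("json" in names)
```

Because the package name itself is never bound, reusing it as a variable name is legal, which matches the comment's framing of this as a readability concern rather than a runtime error.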
Adds a dev note post covering the `push_to_hub` feature of Data Designer.