
docs(pretrain): add TinyStories pretraining section #2236

Open
nuthalapativarun wants to merge 2 commits into Lightning-AI:main from nuthalapativarun:docs/tinystories-pretrain-tutorial


@nuthalapativarun

What does this PR do?

Adds a dedicated Pretrain on TinyStories section to tutorials/pretrain.md, as proposed in #1082.

Changes

  • New ## Pretrain on TinyStories section in tutorials/pretrain.md that explains:
    • What TinyStories is and why it's useful for quick experiments
    • Step-by-step instructions using the existing debug.yaml config (the fastest path)
    • An alternative manual CLI invocation for users who want to configure parameters directly
    • A tip pointing to litgpt pretrain --data.help TinyStories for all available dataset options
  • Fix typo: scenarioes → scenarios in the continued-pretraining section

The TinyStories data module and debug.yaml config already exist — this PR is purely documentation.
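
For reviewers who want to see the two documented paths side by side, here is a minimal sketch that only assembles the command lines and does not run them. The config path `config_hub/pretrain/debug.yaml`, the `--config` and `--out_dir` flags, and the `pythia-14m` model name are assumptions for illustration; the PR itself only names `debug.yaml` and `litgpt pretrain --data.help TinyStories`, so check them against your litgpt checkout.

```python
from __future__ import annotations

import shlex

# Assumed location of the debug config in a litgpt checkout. The PR only
# names the file "debug.yaml", so adjust this path if your layout differs.
DEBUG_CONFIG = "config_hub/pretrain/debug.yaml"


def config_command(config_path: str = DEBUG_CONFIG) -> list[str]:
    """Config path: let the YAML file supply every setting."""
    return ["litgpt", "pretrain", "--config", config_path]


def manual_command(model_name: str, out_dir: str) -> list[str]:
    """Manual path: choose the model and data module on the command line.

    model_name and out_dir are placeholders supplied by the caller, and the
    --data / --out_dir flag names are assumptions, not confirmed by the PR.
    """
    return [
        "litgpt", "pretrain", model_name,
        "--data", "TinyStories",
        "--out_dir", out_dir,
    ]


def data_help_command() -> list[str]:
    """Command from the PR's tip: list the TinyStories data options."""
    return ["litgpt", "pretrain", "--data.help", "TinyStories"]


if __name__ == "__main__":
    commands = (
        config_command(),
        manual_command("pythia-14m", "out/tinystories"),  # hypothetical values
        data_help_command(),
    )
    for argv in commands:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Each function returns an argument list, so a command can be executed with `subprocess.run(argv, check=True)` once the flags have been confirmed.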

Closes #1082

Add a dedicated 'Pretrain on TinyStories' section to tutorials/pretrain.md
explaining how to use the TinyStories data module for quick end-to-end
pretraining experiments. Covers both the debug.yaml config path and
manual CLI configuration.

Also fix a typo: 'scenarioes' -> 'scenarios'.

Closes Lightning-AI#1082
@azure-pipelines

Azure Pipelines:
4 pipeline(s) require an authorized user to comment /azp run to run.

@nuthalapativarun
Author

Hi! Just checking in — CI appears to be waiting on an authorized /azp run trigger. Happy to make any changes needed to move this forward. Thanks!


Development

Successfully merging this pull request may close these issues.

Add TinyStories to the pretraining docs

1 participant