
Conversation

@lukestanley (Contributor) commented Oct 14, 2025

  • Updates speedrun.sh to select the Torch package index based on an NVIDIA GPU check:
    the Torch CPU package index is used if nvidia-smi does not indicate an NVIDIA GPU.

  • Device-conditional dtype changes.

  • Added a link in the README to the very useful DeepWiki page for the nanochat repo.
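The GPU check and index selection described above can be sketched in Python. This is a minimal sketch, not the PR's actual implementation (which lives in speedrun.sh); the function name and the index URLs here are illustrative assumptions:

```python
import shutil
import subprocess

def has_nvidia_gpu() -> bool:
    """Return True if nvidia-smi is on PATH and reports at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # lists detected GPUs, one per line
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

# Fall back to the CPU-only wheel index when no NVIDIA GPU is detected.
# (Index URLs are illustrative; the real script may pin a different CUDA version.)
torch_index = (
    "https://download.pytorch.org/whl/cu121" if has_nvidia_gpu()
    else "https://download.pytorch.org/whl/cpu"
)
```

The same check works on machines without the NVIDIA driver installed, since a missing nvidia-smi binary simply selects the CPU index.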

$ python -m scripts.base_train --depth=2 --device_batch_size=1 --total_batch_size=2048 --num_iterations=1 --eval_every=1 --eval_tokens=2048 --core_metric_every=100000 --sample_every=100000

                                                   █████                 █████
                                                  ░░███                 ░░███
 ████████    ██████   ████████    ██████   ██████  ░███████    ██████   ███████
░░███░░███  ░░░░░███ ░░███░░███  ███░░███ ███░░███ ░███░░███  ░░░░░███ ░░░███░
 ░███ ░███   ███████  ░███ ░███ ░███ ░███░███ ░░░  ░███ ░███   ███████   ░███
 ░███ ░███  ███░░███  ░███ ░███ ░███ ░███░███  ███ ░███ ░███  ███░░███   ░███ ███
 ████ █████░░████████ ████ █████░░██████ ░░██████  ████ █████░░████████  ░░█████
░░░░ ░░░░░  ░░░░░░░░ ░░░░ ░░░░░  ░░░░░░   ░░░░░░  ░░░░ ░░░░░  ░░░░░░░░    ░░░░░

Overriding: depth = 2
Overriding: device_batch_size = 1
Overriding: total_batch_size = 2048
Overriding: num_iterations = 1
Overriding: eval_every = 1
Overriding: eval_tokens = 2048
Overriding: core_metric_every = 100000
Overriding: sample_every = 100000
2025-10-14 00:20:24,167 - nanochat.common - INFO - Distributed world size: 1
/workspaces/nanochat/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:283: UserWarning: In CPU autocast, but the target dtype is not supported. Disabling autocast.
CPU Autocast only supports dtype of torch.bfloat16, torch.float16 currently.
  warnings.warn(error_message)
Vocab size: 34,301
num_layers: 2
model_dim: 128
num_heads: 1
num_kv_heads: 1
Tokens / micro-batch / rank: 1 x 2048 = 2,048
Tokens / micro-batch: 2,048
Total batch size 2,048 => gradient accumulation steps: 1
Number of parameters: 9,174,272
Estimated FLOPs per token: 3.499392e+07
Using user-provided number of iterations: 1
Total number of training tokens: 2,048
Tokens : Params ratio: 0.00
Total training FLOPs estimate: 7.166755e+10
Scaling the LR for the AdamW parameters ∝1/√(128/768) = 2.449490
Step 00000 | Validation bpb: 3.2639
step 00000/00001 (0.00%) | loss: 10.442895 | lrm: 1.00 | dt: 2396.07ms | tok/sec: 854 | mfu: N/A | total time: 0.00m
Step 00001 | Validation bpb: 3.2430
Evaluating: hellaswag_zeroshot (0-shot, type: multiple_choice)... accuracy: 0.2480 | centered: -0.0027 | time: 56.06s
Evaluating: jeopardy (10-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 53.14s
Evaluating: bigbench_qa_wikidata (10-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 19.92s
Evaluating: arc_easy (10-shot, type: multiple_choice)... accuracy: 0.2660 | centered: 0.0213 | time: 200.33s
Evaluating: arc_challenge (10-shot, type: multiple_choice)... accuracy: 0.2220 | centered: -0.0373 | time: 218.14s
Evaluating: copa (0-shot, type: multiple_choice)... accuracy: 0.4400 | centered: -0.1200 | time: 1.25s
Evaluating: commonsense_qa (10-shot, type: multiple_choice)... accuracy: 0.2120 | centered: 0.0150 | time: 253.94s
Evaluating: piqa (10-shot, type: multiple_choice)... accuracy: 0.4940 | centered: -0.0120 | time: 110.09s
Evaluating: openbook_qa (0-shot, type: multiple_choice)... accuracy: 0.2260 | centered: -0.0320 | time: 10.72s
Evaluating: lambada_openai (0-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 11.26s
Evaluating: hellaswag (10-shot, type: multiple_choice)... accuracy: 0.2480 | centered: -0.0027 | time: 459.59s
Evaluating: winograd (0-shot, type: schema)... accuracy: 0.4945 | centered: -0.0110 | time: 4.01s
Evaluating: winogrande (0-shot, type: schema)... accuracy: 0.5260 | centered: 0.0520 | time: 7.76s
Evaluating: bigbench_dyck_languages (10-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 83.10s
Evaluating: agi_eval_lsat_ar (3-shot, type: multiple_choice)... accuracy: 0.2261 | centered: 0.0326 | time: 201.11s
Evaluating: bigbench_cs_algorithms (10-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 69.72s
Evaluating: bigbench_operators (10-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 22.28s
Evaluating: bigbench_repeat_copy_logic (10-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 4.13s
Evaluating: squad (10-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 276.78s
Evaluating: coqa (0-shot, type: language_modeling)... accuracy: 0.0000 | centered: 0.0000 | time: 71.01s
Evaluating: boolq (10-shot, type: multiple_choice)... accuracy: 0.3720 | centered: -0.6526 | time: 423.89s
Evaluating: bigbench_language_identification (10-shot, type: multiple_choice)... accuracy: 0.2380 | centered: 0.1617 | time: 844.53s
Step 00001 | CORE metric: -0.0267
W1014 01:17:32.378000 59893 .venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:1016] [0/8] torch._dynamo hit config.recompile_limit (8)
W1014 01:17:32.378000 59893 .venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:1016] [0/8]    function: 'forward' (/workspaces/nanochat/nanochat/gpt.py:261)
W1014 01:17:32.378000 59893 .venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:1016] [0/8]    last reason: 0/7: kv_cache.pos == 10                                     
W1014 01:17:32.378000 59893 .venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:1016] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W1014 01:17:32.378000 59893 .venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:1016] [0/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.
<|bos|>The capital of France is seen-mile than a seamless movement total traded shaped local s—’ moment,” recalls former
<|bos|>The chemical symbol of gold is seen-mile than a seamless movement total traded shaped local s—’ moment,” recalls former
<|bos|>If yesterday was Friday, then tomorrow will be subject new journeys tend to grow an important consideration when required their rivals taking only needs
<|bos|>The opposite of hot is seen-mile than a seamless movement total traded shaped local s—’ moment,” recalls former
<|bos|>The planets of the solar system are: at each end move via this article shall prevent km) access year them a seamless
<|bos|>My favorite color is seen-mile than a seamless movement total traded shaped local s—’ moment,” recalls former
<|bos|>If 5*x + 3 = 13, then x is seen-mile than a seamless movement total traded shaped local s—’ moment,” recalls former
2025-10-14 01:17:32,671 - nanochat.checkpoint_manager - INFO - Saved model file to: /home/codespace/.cache/nanochat/base_checkpoints/d2/model_000001.pt
2025-10-14 01:17:32,742 - nanochat.checkpoint_manager - INFO - Saved optimizer file to: /home/codespace/.cache/nanochat/base_checkpoints/d2/optim_000001.pt
2025-10-14 01:17:32,742 - nanochat.checkpoint_manager - INFO - Saved metadata file to: /home/codespace/.cache/nanochat/base_checkpoints/d2/meta_000001.json
Peak memory usage: N/A (CPU run)
Total training time: 0.00m
Minimum validation bpb: 3.2430
$ 
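As a side note, the batch-size arithmetic and the AdamW LR scaling shown in the log above can be reproduced directly. A sketch using the run's own values (variable names here are illustrative, not nanochat's):

```python
import math

# Values from the run above.
device_batch_size = 1     # sequences per micro-batch, per rank
seq_len = 2048            # tokens per sequence
world_size = 1            # single (CPU) process
total_batch_size = 2048   # target tokens per optimizer step

# "Tokens / micro-batch / rank: 1 x 2048 = 2,048" and
# "Total batch size 2,048 => gradient accumulation steps: 1"
tokens_per_micro_batch = device_batch_size * seq_len * world_size
grad_accum_steps = total_batch_size // tokens_per_micro_batch

# "Scaling the LR for the AdamW parameters ∝1/√(128/768) = 2.449490"
model_dim = 128
lr_scale = 1.0 / math.sqrt(model_dim / 768)
print(tokens_per_micro_batch, grad_accum_steps, f"{lr_scale:.6f}")
```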

Well, I'd better sleep now that it has finished!

Thanks for this great end-to-end project, @karpathy!

Makes the Torch index an extra in pyproject.toml; speedrun.sh selects the GPU index if supported, with a CPU fallback.
Coded with the help of the Devin DeepWiki planner and GPT-5 Codex execution in VS Code agent mode.

https://deepwiki.com/search/suggest-how-to-modify-the-code_80cebbfc-0ad0-4b92-addd-2b4210fa9f04
@karpathy (Owner)

I think... you're going to wait a long time. :D

@lukestanley (Contributor, Author)

Haha, the little test finished!
My favorite color is seen-mile than a seamless movement total traded shaped local s—’ moment,” recalls former
Thanks again for the great little end-to-end magic, @karpathy!

LokiMetaSmith referenced this pull request in LokiMetaSmith/nanochat Oct 14, 2025
This change incorporates the changes from pull requests #17 and #21 to add support for CPU-only and macOS environments. It introduces dynamic detection of hardware and data types, and updates the dependency installation process to select the appropriate PyTorch build.
@kbastani

Nice work!

@svlandeg (Collaborator) left a comment

Maintenance update: the edits in this PR seem to be mostly covered by #88, so I suggest closing this one.

@svlandeg closed this Nov 14, 2025


4 participants