nanoAction

The best VLA that $500 can buy for a $500 robot

This repo extends the nanochat repo directly and will be a full-stack implementation of a VLA (vision-language-action model) trained on a corpus of robotics data. The script will produce a lightweight VLA that can control single-camera, low-DOF robots.

We will make the following enhancements to the nanochat repo:

Add a vision encoder to the model
- v1 will use an existing encoder
- v2 will train a custom encoder
Add an action head to the model in the spirit of Pi0.5
- Train on a corpus of robotics data
Finetune the model for a specific robot in IsaacSim
- v1 Koch Robot arm with wrist camera
Record a corpus of robotics data from a real robot
- v1 finetune on the corpus on live robot
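
As a rough illustration of what an action head "in the spirit of Pi0.5" might do at inference time, the sketch below assumes the head is trained with flow matching, so an action is produced by integrating a learned velocity field from noise (t = 0) to an action (t = 1) with a few Euler steps. Everything named here is hypothetical: `toy_velocity` stands in for the learned network (which would also be conditioned on image and text features), and the 6-dimensional action and 10 steps are illustrative choices, not values from this repo.

```python
import random
from typing import Callable, List

Vector = List[float]


def toy_velocity(x: Vector, t: float, target: Vector) -> Vector:
    """Hypothetical stand-in for the learned action head.

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries x_t to the target is (target - x_t) / (1 - t).
    A real head would predict this from observations; it is never told
    the target directly.
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action(
    velocity: Callable[[Vector, float], Vector],
    dim: int,
    num_steps: int = 10,
    seed: int = 0,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so toy_velocity never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Illustrative 6-dimensional joint target (made-up values).
target_action = [0.1, -0.2, 0.3, 0.0, 0.5, 1.0]
action = sample_action(
    lambda x, t: toy_velocity(x, t, target_action),
    dim=len(target_action),
)
```

With the toy velocity field the sampled action lands on `target_action` up to floating-point error. In a trained model the velocity would come from the network, so the result would depend on the observation instead.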

A major goal of this project is a basic foundational robotics policy that runs on 3D-printed, open-source hardware:

$250 Jetson Orin Nano Super
~$250 3D-printed arm, like the Koch arm or SO-ARM100

We will reference best practices of the LeRobot team and community to make this VLA compatible with the LeRobot ecosystem.
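
One way to picture LeRobot compatibility is at the level of a single recorded frame. The sketch below assumes the dotted key style used by LeRobot datasets (`observation.state`, `observation.images.<camera>`, `action`) plus a `task` string for the language instruction. The camera name `wrist`, the 6-value joint vector, and the `check_frame` helper are all hypothetical and only illustrate the shape of the data, not the LeRobot API.

```python
from typing import Any, Dict, List

# Assumed key names in the LeRobot dotted style; the camera name is hypothetical.
STATE_KEY = "observation.state"
IMAGE_KEY = "observation.images.wrist"
ACTION_KEY = "action"
TASK_KEY = "task"


def check_frame(frame: Dict[str, Any], num_joints: int) -> List[str]:
    """Return a list of problems found in one frame (empty if it looks fine)."""
    problems = []
    for key in (STATE_KEY, IMAGE_KEY, ACTION_KEY, TASK_KEY):
        if key not in frame:
            problems.append(f"missing key: {key}")
    # State and action are expected to be one value per joint.
    for key in (STATE_KEY, ACTION_KEY):
        if key in frame and len(frame[key]) != num_joints:
            problems.append(f"{key} has {len(frame[key])} values, expected {num_joints}")
    return problems


# Made-up example frame for an arm with 6 actuated joints (an assumption).
frame = {
    STATE_KEY: [0.0, 0.1, -0.3, 0.2, 0.0, 0.5],
    IMAGE_KEY: "frame_000000.png",  # placeholder for the image data
    ACTION_KEY: [0.0, 0.1, -0.2, 0.2, 0.0, 0.6],
    TASK_KEY: "pick up the cube",
}
problems = check_frame(frame, num_joints=6)
```

A check like this could run while recording or converting data, so that mismatched joint counts or missing camera streams are caught before training.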

Additional references:

Background Knowledge:

LeRobot Tutorial: https://huggingface.co/papers/2510.12403 (code, pdf)
NanoVLA: Routing Decoupled Vision-Language Understanding for Nano-sized Generalist Robotic Policies (pdf and code)

Step 0: Starting from scratch

Referencing README-NANOCHAT.md and speedrun.sh, we could train from scratch. However, we will leverage some of the existing images from the community to make the process faster.

We should be able to pull down a pre-trained model from the nanochat repo and use it as a starting point for our VLA

karpathy/nanochat-d32 and description
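
Before pulling the checkpoint, it can help to estimate how large the starting model is relative to the target hardware. The sketch below is a back-of-the-envelope count, not a reading of the actual checkpoint. It assumes a plain decoder with full multi-head attention, a 4x-wide MLP, untied input and output embeddings, no biases, a model width of 64 x depth, and a 65,536-token vocabulary. Each of these is an assumption to verify against the nanochat code.

```python
def estimate_params(depth: int, dim: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer (assumptions above)."""
    attn = 4 * dim * dim               # q, k, v, and output projections
    mlp = 8 * dim * dim                # up and down projections, 4x hidden width
    embeddings = 2 * vocab_size * dim  # untied token embedding and output head
    return depth * (attn + mlp) + embeddings


depth = 32            # the "d32" in nanochat-d32
dim = 64 * depth      # assumed width rule
vocab_size = 65536    # assumed vocabulary size

params = estimate_params(depth, dim, vocab_size)
bf16_gib = params * 2 / 2**30  # 2 bytes per parameter in bf16

print(f"{params / 1e9:.2f}B parameters, {bf16_gib:.1f} GiB of weights in bf16")
```

Under these assumptions the estimate is about 1.88B parameters, or roughly 3.5 GiB of weights in bf16. On the Jetson that has to fit alongside the vision encoder, activations, and KV cache, so a smaller depth or quantization may be worth evaluating.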

Task list

Chat locally with the pre-trained model and confirm behavior is as expected
- PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python -m scripts.chat_cli -p "Why is the sky blue?"
Compare the architecture of nanochat to existing VLMs like nanoVLM (https://huggingface.co/blog/nanovlm) and VLAs like Pi0.5 and OpenVLA
Review initial VLM extensions of nanochat, such as "make nanochat multimodal for < $10!"

Step 1: Working nanochat (with VL)

The chat interface should accept one or more images and a text prompt. The model should generate a response.
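
A minimal sketch of how such a prompt might be laid out before it reaches the model: each image reserves a fixed run of placeholder positions (later filled with projected vision-encoder features), followed by the text tokens. The `<|image|>` token, the 64 positions per image, the `MultimodalPrompt` class, and the whitespace tokenizer are all hypothetical stand-ins, not part of nanochat.

```python
from dataclasses import dataclass, field
from typing import Callable, List

IMAGE_PLACEHOLDER = "<|image|>"  # hypothetical special token
TOKENS_PER_IMAGE = 64            # hypothetical number of vision tokens per image


@dataclass
class MultimodalPrompt:
    text: str
    image_paths: List[str] = field(default_factory=list)


def build_sequence(
    prompt: MultimodalPrompt,
    tokenize: Callable[[str], List[str]] = str.split,  # stand-in tokenizer
) -> List[str]:
    """Lay out image placeholders first, then the text tokens."""
    sequence: List[str] = []
    for _ in prompt.image_paths:
        # Reserve positions that vision features would overwrite.
        sequence.extend([IMAGE_PLACEHOLDER] * TOKENS_PER_IMAGE)
    sequence.extend(tokenize(prompt.text))
    return sequence


prompt = MultimodalPrompt(
    text="pick up the cube",
    image_paths=["wrist_0.png", "wrist_1.png"],  # made-up file names
)
sequence = build_sequence(prompt)
```

Placing all images before the text is only one possible layout. Interleaving images with text would need the placeholder runs to be inserted at the positions where the images appear in the conversation.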

Step 2: Train on robotics data

TODO

Step 3: Finetune on robotics data

TODO

Step 4: Record robotics data

TODO

Citation

If you find nanoaction helpful in your research, cite it simply as:

@misc{nanoaction,
  author = {Sean Kruzel},
  title = {nanoaction: The best VLA that \$500 can buy for a \$500 robot},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/closedloop-tech/nanoaction}
}

A huge thanks to Andrej Karpathy for the nanochat repo and the nanochat-d32 model.

@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that \$100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}

License

MIT
