GitHub - Huster-YZY/GenieDrive: [CVPR 2026] "GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation"

GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation

Zhenya Yang¹, Zhe Liu¹,†, Yuxiang Lu¹, Liping Hou², Chenxuan Miao¹, Siyi Peng², Bailan Feng², Xiang Bai³, Hengshuang Zhao¹,✉

¹ The University of Hong Kong, ² Huawei Noah's Ark Lab, ³ Huazhong University of Science and Technology
† Project leader, ✉ Corresponding author.

📑 [arXiv], ⚙️ [project page], 🤗 [model weights]

Overview of our GenieDrive

📢 News

2026/4/29: We have released the code for GenieDrive.
2026/2/21: DrivePI and GenieDrive have been accepted by CVPR 2026!
2025/12/15: We released the GenieDrive paper on arXiv. 🔥

2025.12.15: DrivePI paper released! A novel spatial-aware 4D MLLM that serves as a unified Vision-Language-Action (VLA) framework and is also compatible with vision-action (VA) models. 🔥
2025.11.04: Our previous work UniLION has been released. Check out the codebase for a unified autonomous driving model with Linear Group RNNs. 🚀
2024.09.26: Our work LION has been accepted by NeurIPS 2024. Visit the codebase for Linear Group RNN for 3D Object Detection. 🚀

📋 TODO List

✅ Release 4D occupancy forecasting code and model weights.
✅ Release multi-view video generator code and weights.

Getting Started

This repository contains a three-stage pipeline for driving scene generation:

occ_gen: generate occupancy (occ)
occ_rasterizer: rasterize the occupancy into semantic maps
occ_render: generate the final videos based on the rendered semantic maps

Environment Setup

For occupancy generation and forecasting, please refer to occ_gen/README.md (environment: geniedrive-occ).

For occupancy-conditioned video generation, please refer to occ_render/README.md and occ_rasterizer/README.md (environment: geniedrive-video).

Data Preparation

We have provided preprocessed items that can be used directly for video generation, without the need to run occ_gen or download the full NuScenes dataset. For more details, please refer to occ_render/README.md.

Before running the steps below, make sure that you have downloaded nuScenes and Occ3D-nuScenes.

To simplify data loading and processing, we recommend creating symbolic links from your NuScenes dataset to the data/ directories under occ_gen, occ_rasterizer, and occ_render.

cd occ_gen
ln -s [Your Nuscenes Path] data
cd ../occ_rasterizer
ln -s [Your Nuscenes Path] data
cd ../occ_render
ln -s [Your Nuscenes Path] data
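
The same links can also be created from the repository root in one call. The following is a minimal Python sketch and is not part of the repository: the helper name `link_dataset` and its arguments are hypothetical, and it assumes the three stage directories already exist.

```python
from pathlib import Path

# Stage directories named in this README.
STAGES = ("occ_gen", "occ_rasterizer", "occ_render")


def link_dataset(repo_root, nuscenes_path):
    """Create <stage>/data symlinks pointing at the nuScenes directory.

    Hypothetical helper for illustration only. Existing links or
    directories are left untouched. Returns the newly created links.
    """
    target = Path(nuscenes_path).resolve()
    created = []
    for stage in STAGES:
        link = Path(repo_root) / stage / "data"
        if link.is_symlink() or link.exists():
            continue  # never overwrite an existing link or directory
        link.symlink_to(target, target_is_directory=True)
        created.append(link)
    return created
```

Calling `link_dataset(".", "/path/to/nuscenes")` from the repository root would mirror the shell commands above; running it a second time creates nothing new.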

Download the pickle files from Hugging Face:

cd occ_gen
huggingface-cli download --resume-download ANIYA673/GenieDrive --include="*.pkl" --local-dir data

Your directory should then look like:

.
├── occ_gen/
│   └── data/
├── occ_rasterizer/
│   └── data/
└── occ_render/
    └── data/
          ├── v1.0-trainval/
          ├── gts/                         # Occ3D-nus occupancy labels
          ├── samples/                     # nuScenes keyframes
          ├── sweeps/                      # nuScenes non-keyframes / intermediate frames
          ├── world-nuscenes_infos_train.pkl
          ├── world-nuscenes_infos_val.pkl
          ├── nuscenes_interp_12Hz_infos_train.pkl
          ├── nuscenes_interp_12Hz_infos_val.pkl
          ├── nuscenes_infos_temporal_train_3keyframes.pkl
          └── nuscenes_infos_temporal_val_3keyframes.pkl
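
Before launching a long job, it can help to confirm that a data/ directory matches the tree above. The following is a minimal sketch, not a script from the repository: the function name `missing_entries` is hypothetical, and the entry list is copied from the tree shown here.

```python
from pathlib import Path

# Entries listed in the directory tree above, relative to a data/ directory.
EXPECTED = [
    "v1.0-trainval",
    "gts",
    "samples",
    "sweeps",
    "world-nuscenes_infos_train.pkl",
    "world-nuscenes_infos_val.pkl",
    "nuscenes_interp_12Hz_infos_train.pkl",
    "nuscenes_interp_12Hz_infos_val.pkl",
    "nuscenes_infos_temporal_train_3keyframes.pkl",
    "nuscenes_infos_temporal_val_3keyframes.pkl",
]


def missing_entries(data_dir):
    """Return the expected entries that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED if not (data_dir / name).exists()]
```

For example, `missing_entries("occ_render/data")` returns an empty list when every folder and pickle file is in place.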

Workflow Overview

We provide two workflows for generating multi-view driving videos. The only difference is the source of the occupancy. Workflow 1 generates videos from the occupancy predicted by our model:

occ_gen -> occ_rasterizer -> occ_render

Workflow 2 instead uses existing occupancy (nuScenes occupancy, edited occupancy, or CARLA occupancy) to generate videos:

gt_occ -> occ_rasterizer -> occ_render
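
The two workflows share their last two stages and differ only in the first one. A minimal sketch of that structure, assuming hypothetical source labels `"predicted"` and `"existing"` (neither the labels nor the function come from the repository):

```python
# Stage names follow this README; the occupancy source labels are hypothetical.
SHARED_STAGES = ["occ_rasterizer", "occ_render"]


def pipeline_stages(occupancy_source):
    """Return the ordered stages for the chosen occupancy source."""
    if occupancy_source == "predicted":
        # Workflow 1: occupancy comes from the occ_gen model.
        return ["occ_gen"] + SHARED_STAGES
    if occupancy_source == "existing":
        # Workflow 2: occupancy is supplied (ground truth, edited, or CARLA).
        return ["gt_occ"] + SHARED_STAGES
    raise ValueError(f"unknown occupancy source: {occupancy_source!r}")
```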

Model Inference & Training

For occupancy generation, please refer to occ_gen/README.md.

For occupancy-conditioned video generation, please refer to occ_render/README.md.

📈 Results

Our method improves 4D occupancy forecasting performance, with a 7.2% increase in mIoU and a 4% increase in IoU. Moreover, our tri-plane VAE compresses occupancy into a latent tri-plane that is only 58% of the size used in previous methods, while still maintaining superior reconstruction performance. This compact latent representation also contributes to fast inference (41 FPS) and a small parameter count of only 3.47M (including the VAE and prediction module).

Performance of 4D Occupancy Forecasting

We train three driving video generation models that differ only in video length: S (8 frames, ~0.7 s), M (37 frames, ~3 s), and L (81 frames, ~7 s). Through rollout, the L model can further generate long multi-view driving videos of up to 241 frames (~20 s). GenieDrive consistently outperforms previous occupancy-based methods across all metrics, while also enabling much longer video generation.
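
The frame counts and durations above are consistent with a frame rate of about 12 frames per second. That rate is an assumption here, inferred from the `nuscenes_interp_12Hz` pickle file names and not stated explicitly in this README. The rollout formula below is also only an assumption: it supposes that each extra segment reuses one overlapping frame, which happens to give 81 + 2 × 80 = 241, and the actual rollout scheme may differ.

```python
ASSUMED_FPS = 12  # assumption, inferred from the "12Hz" pickle file names


def duration_seconds(num_frames, fps=ASSUMED_FPS):
    """Convert a frame count to seconds at the assumed frame rate."""
    return num_frames / fps


def rollout_frames(segment_len, num_segments, overlap=1):
    """Total frames when each later segment shares `overlap` frames
    with the previous one (hypothetical rollout scheme)."""
    return segment_len + (num_segments - 1) * (segment_len - overlap)


for name, frames in [("S", 8), ("M", 37), ("L", 81), ("L rollout", 241)]:
    print(f"{name}: {frames} frames ~ {duration_seconds(frames):.2f} s")
```

Under these assumptions, three chained segments of the L model would account for the 241-frame videos.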

Performance of Multi-View Video Generation

📝 Citation

@article{yang2025geniedrive,
  author    = {Yang, Zhenya and Liu, Zhe and Lu, Yuxiang and Hou, Liping and Miao, Chenxuan and Peng, Siyi and Feng, Bailan and Bai, Xiang and Zhao, Hengshuang},
  title     = {GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation},
  journal   = {arXiv:2512.12751},
  year      = {2025},
}

Acknowledgements

We thank these great works and open-source repositories: I2-World, UniScene, DynamicCity, MMDetection3D and VideoX-Fun.
