Using PyTorch 2.9.1 and CUDA 12.8
MEGAPAK uses the same base mechanism as the slim images. The key differences are:

- Includes 40+ custom nodes. See the full list.
- Includes the CUDA development kit for compiling PyTorch C++ extensions, `.cu` files, etc.
- Includes performance optimization libraries such as Nunchaku and SageAttention (powerful but may have compatibility issues).
- Includes additional tools and dependencies.
What’s special about this cu128-megapak-pt29 image:

- Pinned to PyTorch 2.9.1 and CUDA 12.8.
- Pre-installed:
  - SageAttention 2.2.0
  - SpargeAttention
  - FlashAttention 2.8.3
  - Nunchaku
- With:
  - Python 3.12
  - GCC 11
  - glibc 2.38 (from openSUSE Leap 15.6)
Please make sure the slim image runs successfully before attempting the megapak image. The prerequisites/setup sections are omitted from this document.
```bash
mkdir -p \
  storage \
  storage-models/models \
  storage-models/hf-hub \
  storage-models/torch-hub \
  storage-user/input \
  storage-user/output \
  storage-user/workflows
```
```bash
docker run -it --rm \
  --name comfyui-megapak \
  --runtime nvidia \
  --gpus all \
  -p 8188:8188 \
  -v "$(pwd)"/storage:/root \
  -v "$(pwd)"/storage-models/models:/root/ComfyUI/models \
  -v "$(pwd)"/storage-models/hf-hub:/root/.cache/huggingface/hub \
  -v "$(pwd)"/storage-models/torch-hub:/root/.cache/torch/hub \
  -v "$(pwd)"/storage-user/input:/root/ComfyUI/input \
  -v "$(pwd)"/storage-user/output:/root/ComfyUI/output \
  -v "$(pwd)"/storage-user/workflows:/root/ComfyUI/user/default/workflows \
  -e CLI_ARGS="" \
  yanwk/comfyui-boot:cu128-megapak-pt29
```

```bash
mkdir -p \
  storage \
  storage-models/models \
  storage-models/hf-hub \
  storage-models/torch-hub \
  storage-user/input \
  storage-user/output \
  storage-user/workflows
```
```bash
podman run -it --rm \
  --name comfyui-megapak \
  --device nvidia.com/gpu=all \
  --security-opt label=disable \
  -p 8188:8188 \
  -v "$(pwd)"/storage:/root \
  -v "$(pwd)"/storage-models/models:/root/ComfyUI/models \
  -v "$(pwd)"/storage-models/hf-hub:/root/.cache/huggingface/hub \
  -v "$(pwd)"/storage-models/torch-hub:/root/.cache/torch/hub \
  -v "$(pwd)"/storage-user/input:/root/ComfyUI/input \
  -v "$(pwd)"/storage-user/output:/root/ComfyUI/output \
  -v "$(pwd)"/storage-user/workflows:/root/ComfyUI/user/default/workflows \
  -e CLI_ARGS="" \
  docker.io/yanwk/comfyui-boot:cu128-megapak-pt29
```

| args | description |
|---|---|
| --use-sage-attention | Use SageAttention. Keep current config for xFormers. |
| --use-flash-attention | Use FlashAttention. Keep current config for xFormers. |
| --use-pytorch-cross-attention | Use PyTorch’s built-in cross-attention. Disable xFormers. |

- Only one attention implementation can be selected at a time. If none is specified, xFormers is enabled by default.
- For example, you may want to use one of these:
  - `--use-flash-attention`
  - `--use-flash-attention --disable-xformers`
  - `--use-sage-attention`
  - `--use-sage-attention --disable-xformers`
  - `--use-pytorch-cross-attention`
  - Leave empty (will use xFormers by default).
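One way to manage this is to keep the chosen flags in a variable before passing them to the container. A sketch (the flag choice shown is just an example), including a check for the one-implementation-at-a-time constraint:

```bash
# Sketch: hold the chosen flags in a variable, then sanity-check that at
# most one --use-* attention flag is present before starting the container.
CLI_ARGS="--use-sage-attention --disable-xformers"

count=0
for f in $CLI_ARGS; do
  case "$f" in
    --use-*) count=$((count + 1)) ;;   # counts attention-selection flags only
  esac
done
[ "$count" -le 1 ] && echo "ok: $CLI_ARGS"
```

The variable can then be passed through as `-e CLI_ARGS="$CLI_ARGS"` in the `docker run` / `podman run` commands above.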
| GPU Architecture | Blackwell | Hopper | Ada Lovelace | Ampere | Turing | Volta |
|---|---|---|---|---|---|---|
| Example GPU | RTX 5090 | H100 | RTX 4090 | RTX 3090 | RTX 2080 | TITAN V |
| SageAttention | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ |
| FlashAttention | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
| xFormers | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
| PyTorch Native | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
- Note: the xFormers compatibility issues on Blackwell GPUs were fixed in this version, so xFormers is now enabled by default.
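The compatibility table can be summarized as a small helper that maps a GPU architecture to a flag. This is only a sketch mirroring the table above, not a script shipped with the image; unlisted architectures fall back to the xFormers default:

```bash
# Sketch: choose an attention flag from the GPU architecture name,
# following the compatibility table above (illustrative mapping only).
pick_attention() {
  case "$1" in
    Blackwell|"Ada Lovelace"|Ampere) echo "--use-sage-attention" ;;
    Hopper)                          echo "--use-flash-attention" ;;
    Volta)                           echo "--use-pytorch-cross-attention" ;;
    *)                               echo "" ;;  # Turing etc.: xFormers default
  esac
}

pick_attention "Ada Lovelace"   # prints: --use-sage-attention
```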
| args | description |
|---|---|
| --disable-xformers | Disable xFormers. |
| --fast | Enable experimental optimizations (e.g. float8_e4m3fn matrix multiplication on Ada Lovelace and later GPUs). Might lower image quality. |
| --disable-smart-memory | Force ComfyUI to offload models from VRAM to RAM more frequently. Slows performance but reduces memory leaks. |
| --lowvram | Force ComfyUI to split the model (UNET) into parts to use less VRAM, at the cost of speed. Use only if your GPU has less than 6 GB of VRAM. |
| --novram | Use system RAM only, no VRAM at all. Very slow. |
| --cpu | Run on CPU. Very slow. Used for testing purposes. |

More CLI_ARGS are available in ComfyUI’s cli_args.py.
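Several of these flags can be combined. For example, a low-VRAM setup might be assembled like this (a sketch; the 6 GB threshold follows the --lowvram note above, and the VRAM value is a placeholder):

```bash
# Sketch: build CLI_ARGS conditionally. VRAM_GB is a placeholder value;
# on a real machine, query it with a tool such as nvidia-smi.
FLAGS="--use-pytorch-cross-attention"
VRAM_GB=4                      # placeholder, for illustration only
if [ "$VRAM_GB" -lt 6 ]; then
  FLAGS="$FLAGS --lowvram"     # split the UNET to fit small GPUs
fi
echo "$FLAGS"                  # prints: --use-pytorch-cross-attention --lowvram
```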
| Variable | Example Value | Memo |
|---|---|---|
| HTTP_PROXY | | Set HTTP proxy. Works the same as |
| PIP_INDEX_URL | | Set mirror site for the Python Package Index. |
| HF_ENDPOINT | | Set mirror site for HuggingFace Hub. |
| HF_TOKEN | 'hf_your_token' | Set HuggingFace Access Token. More info |
| HF_XET_HIGH_PERFORMANCE | 1 | Enable HuggingFace Hub’s high-performance mode. Only makes sense if you have a >5 Gbps and VERY STABLE connection (e.g. a cloud server). More info |
| TORCH_CUDA_ARCH_LIST | 8.6 | Build target for PyTorch and its extensions. For most users, no setup is needed, as it is selected automatically on Linux. When needed, set just one build target for your GPU. More info |
| CMAKE_ARGS | '-DBUILD_opencv_world=ON -DWITH_CUDA=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON -DWITH_NVCUVID=ON' | Build options for CMake projects using CUDA. |
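These variables are passed into the container with `-e`, the same way as CLI_ARGS. A sketch, with the volume mounts from the earlier commands trimmed for brevity and all values being placeholders:

```bash
# Sketch only: placeholder values; add the -v volume mounts shown earlier.
docker run -it --rm \
  --name comfyui-megapak \
  --runtime nvidia \
  --gpus all \
  -p 8188:8188 \
  -e HF_TOKEN='hf_your_token' \
  -e HF_XET_HIGH_PERFORMANCE=1 \
  -e TORCH_CUDA_ARCH_LIST=8.6 \
  -e CLI_ARGS="" \
  yanwk/comfyui-boot:cu128-megapak-pt29
```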