
# MEGAPAK image for ComfyUI

Using PyTorch 2.9.1 and CUDA 12.8


MEGAPAK uses the same base mechanism as the slim images. The key differences are:

- Includes 40+ custom nodes. See the full list.
- Includes the CUDA development kit for compiling PyTorch C++ extensions, `.cu` files, etc.
- Includes performance-optimization libraries such as Nunchaku and SageAttention (powerful, but may have compatibility issues).
- Includes additional tools and dependencies.

What’s special about this cu128-megapak-pt29 image:

- Pinned to PyTorch 2.9.1 and CUDA 12.8.
- Pre-installed:
  - SageAttention 2.2.0
  - SpargeAttention
  - FlashAttention 2.8.3
  - Nunchaku
- With:
  - Python 3.12
  - GCC 11
  - glibc 2.38 (from openSUSE Leap 15.6)

## Usage

Please run the slim image successfully before attempting the megapak image. The prerequisites and setup sections are omitted from this document.

### Run with Docker

```shell
mkdir -p \
  storage \
  storage-models/models \
  storage-models/hf-hub \
  storage-models/torch-hub \
  storage-user/input \
  storage-user/output \
  storage-user/workflows
```

```shell
docker run -it --rm \
  --name comfyui-megapak \
  --runtime nvidia \
  --gpus all \
  -p 8188:8188 \
  -v "$(pwd)"/storage:/root \
  -v "$(pwd)"/storage-models/models:/root/ComfyUI/models \
  -v "$(pwd)"/storage-models/hf-hub:/root/.cache/huggingface/hub \
  -v "$(pwd)"/storage-models/torch-hub:/root/.cache/torch/hub \
  -v "$(pwd)"/storage-user/input:/root/ComfyUI/input \
  -v "$(pwd)"/storage-user/output:/root/ComfyUI/output \
  -v "$(pwd)"/storage-user/workflows:/root/ComfyUI/user/default/workflows \
  -e CLI_ARGS="" \
  yanwk/comfyui-boot:cu128-megapak-pt29
```
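Before launching the full stack, it can help to confirm that the container sees the GPU and ships the pinned PyTorch build. This is a sketch, assuming the image's default command can be overridden; the image tag is the one used above.

```shell
# Sanity-check sketch: print the PyTorch version and CUDA availability
# from inside the container (assumption: the default command is overridable).
IMAGE="yanwk/comfyui-boot:cu128-megapak-pt29"
docker run --rm --runtime nvidia --gpus all "$IMAGE" \
  python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If the GPU is visible, the second value printed should be `True`.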

### Run with Podman

```shell
mkdir -p \
  storage \
  storage-models/models \
  storage-models/hf-hub \
  storage-models/torch-hub \
  storage-user/input \
  storage-user/output \
  storage-user/workflows
```

```shell
podman run -it --rm \
  --name comfyui-megapak \
  --device nvidia.com/gpu=all \
  --security-opt label=disable \
  -p 8188:8188 \
  -v "$(pwd)"/storage:/root \
  -v "$(pwd)"/storage-models/models:/root/ComfyUI/models \
  -v "$(pwd)"/storage-models/hf-hub:/root/.cache/huggingface/hub \
  -v "$(pwd)"/storage-models/torch-hub:/root/.cache/torch/hub \
  -v "$(pwd)"/storage-user/input:/root/ComfyUI/input \
  -v "$(pwd)"/storage-user/output:/root/ComfyUI/output \
  -v "$(pwd)"/storage-user/workflows:/root/ComfyUI/user/default/workflows \
  -e CLI_ARGS="" \
  docker.io/yanwk/comfyui-boot:cu128-megapak-pt29
```

## CLI_ARGS - Attention Selection

| args | description |
| --- | --- |
| `--use-sage-attention` | Use SageAttention. Keep current config for xFormers. |
| `--use-flash-attention` | Use FlashAttention. Keep current config for xFormers. |
| `--use-pytorch-cross-attention` | Use PyTorch's built-in cross-attention. Disable xFormers. |

- Only one attention implementation can be selected at a time. If none is specified, xFormers is enabled by default.
- For example, you may want to use one of these:
  - `--use-flash-attention`
  - `--use-flash-attention --disable-xformers`
  - `--use-sage-attention`
  - `--use-sage-attention --disable-xformers`
  - `--use-pytorch-cross-attention`
  - Leave empty (xFormers will be used by default).
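As a sketch, the selection is made by building a single space-separated string and passing it via `-e CLI_ARGS="$CLI_ARGS"` in the docker or podman commands above:

```shell
# Select SageAttention and explicitly disable xFormers; pass the resulting
# string to the container with -e CLI_ARGS="$CLI_ARGS".
CLI_ARGS="--use-sage-attention --disable-xformers"
echo "$CLI_ARGS"
```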

## Compatibility (only applies to this image)

| GPU Architecture | Blackwell | Hopper | Ada Lovelace | Ampere | Turing | Volta |
| --- | --- | --- | --- | --- | --- | --- |
| Example GPU | RTX 5090 | H100 | RTX 4090 | RTX 3090 | RTX 2080<br>GTX 1660 | TITAN V |
| SageAttention | ✔️ | ✔️ | ✔️ | | | |
| FlashAttention | ✔️ | ✔️ | ✔️ | ✔️ | | |
| xFormers | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
| PyTorch Native | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |

- Note: the xFormers compatibility issues on Blackwell GPUs were fixed in this version. xFormers is now enabled by default.

## CLI_ARGS - Frequently Used

| args | description |
| --- | --- |
| `--disable-xformers` | Disable xFormers. |
| `--fast` | Enable experimental optimizations (e.g. float8_e4m3fn matrix multiplication on Ada Lovelace and later GPUs). Might lower image quality. Turn it off if you want stability over speed. |
| `--disable-smart-memory` | Force ComfyUI to offload models from VRAM to RAM more frequently. Slows performance but reduces memory leaks. |
| `--lowvram` | Force ComfyUI to split the model (UNet) into parts to use less VRAM, at the cost of speed. Use only if your GPU has less than 6 GB of VRAM. |
| `--novram` | Use system RAM only, no VRAM at all. Very slow. |
| `--cpu` | Run on CPU. Very slow. Used for testing purposes. |

More CLI_ARGS are available in ComfyUI's cli_args.py.
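These flags can be combined in the same `CLI_ARGS` string as the attention flags. A sketch, e.g. favoring stability on a GPU with less than 6 GB of VRAM:

```shell
# Combine flags in one space-separated string and pass it via
# -e CLI_ARGS="$CLI_ARGS" in the run commands above.
CLI_ARGS="--lowvram --disable-smart-memory"
echo "$CLI_ARGS"
```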

## Environment Variables Reference

| Variable | Example Value | Memo |
| --- | --- | --- |
| HTTP_PROXY<br>HTTPS_PROXY | `http://localhost:1081`<br>`http://localhost:1081` | Set HTTP proxy. Works the same as set-proxy.sh. |
| PIP_INDEX_URL | `'https://pypi.org/simple'` | Set mirror site for the Python Package Index. |
| HF_ENDPOINT | `'https://huggingface.co'` | Set mirror site for HuggingFace Hub. |
| HF_TOKEN | `'hf_your_token'` | Set HuggingFace access token. More info |
| HF_XET_HIGH_PERFORMANCE | `1` | Enable HuggingFace Hub's high-performance mode. Only makes sense if you have a >5 Gbps and VERY STABLE connection (e.g. a cloud server). More info |
| TORCH_CUDA_ARCH_LIST | `8.6`<br>or<br>`'7.0;7.5;8.0;8.6;9.0;10.0;12.0+PTX'` | Build target for PyTorch and its extensions. For most users, no setup is needed, as it is selected automatically on Linux. When needed, set a single build target just for your GPU. More info |
| CMAKE_ARGS | `'-DBUILD_opencv_world=ON -DWITH_CUDA=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON -DWITH_NVCUVID=ON'` | Build options for CMake projects using CUDA. |
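Each of these variables is passed to the container with `-e`, just like `CLI_ARGS`. A minimal sketch, reusing the example values from the table above (adjust them for your environment):

```shell
# Pass environment variables at container start. The values below are the
# table's example values, not recommendations for every setup.
docker run -it --rm \
  --runtime nvidia --gpus all \
  -p 8188:8188 \
  -e HF_ENDPOINT="https://huggingface.co" \
  -e PIP_INDEX_URL="https://pypi.org/simple" \
  -e CLI_ARGS="" \
  yanwk/comfyui-boot:cu128-megapak-pt29
```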