I am using the 910B NPU with the Qwen2.5-VL-3B model to evaluate BLINK, and it reported an error:
[2026-01-25 15:38:11] ERROR - RUN - run.py: main - 2090: Model Qwen2.5-VL-3B-Instruct x Dataset VStarBench combination failed: NPU out of memory. Tried to allocate 5.51 GiB (NPU 0; 60.96 GiB total capacity; 53.99 GiB already allocated; 53.99 GiB current active; 4.53 GiB free; 55.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
How can I resolve this issue? Even when I use 4×910B NPUs, the memory usage does not seem to be distributed evenly across the devices.
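Is something like the following environment setup the right direction? This is only a sketch: I am assuming torch_npu reads `PYTORCH_NPU_ALLOC_CONF` analogously to how stock PyTorch reads `PYTORCH_CUDA_ALLOC_CONF`, and that `ASCEND_RT_VISIBLE_DEVICES` controls which cards the process can see; the `128` value is a guess, not a tuned number.

```shell
# Expose all four 910B cards to the process
# (Ascend analogue of CUDA_VISIBLE_DEVICES)
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3

# Try to reduce allocator fragmentation, as the OOM message suggests.
# Assumption: torch_npu honors this variable the way stock PyTorch
# honors PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb.
export PYTORCH_NPU_ALLOC_CONF="max_split_size_mb:128"
```

Even with all four devices visible, I suspect the model still loads onto a single NPU unless the evaluation framework itself shards it across devices, so I would also appreciate guidance on whether multi-device sharding is supported here.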