convert_hf : faster lazy safetensors #8482
This seems to cause a memory usage regression when lazily converting MoE models. Looks a lot like a memory leak. I'll try to fix this before merging.

EDIT: So in my tests it's only happening with MoE models, so this means it's likely not a

EDIT2: Seems like it might be related to a reference cycle in

EDIT3: implemented the recursive solution in b971122.
* convert_hf : faster lazy safetensors

  This makes '--dry-run' much, much faster.

* convert_hf : fix memory leak in lazy MoE conversion

  The '_lazy' queue was sometimes self-referential, which caused reference cycles of objects old enough to avoid garbage collection until potential memory exhaustion.
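To illustrate the kind of leak described in the commit message above, here is a minimal sketch of a self-referential queue. `LazyNode` and its `queue` attribute are hypothetical stand-ins, not the actual classes in `gguf-py`, and the sketch relies on CPython's reference counting. An object whose queue contains itself is not freed when the last outside reference goes away; it has to wait for the cyclic garbage collector, which visits long-lived objects only rarely.

```python
import gc
import weakref
from collections import deque


class LazyNode:
    """Hypothetical stand-in for a lazy tensor; not the class from gguf-py."""

    def __init__(self, payload_size: int):
        self.payload = bytearray(payload_size)  # stands in for tensor data
        self.queue: deque = deque()             # stands in for the '_lazy' queue


def make_acyclic_node() -> weakref.ref:
    node = LazyNode(1024)
    return weakref.ref(node)  # node is freed as soon as this function returns


def make_cyclic_node() -> weakref.ref:
    node = LazyNode(1024)
    node.queue.append(node)  # the queue now refers back to its owner
    return weakref.ref(node)


gc.disable()  # keep the automatic collector out of the way for a deterministic demo
try:
    acyclic = make_acyclic_node()
    cyclic = make_cyclic_node()

    # Without a cycle, reference counting frees the object right away.
    acyclic_freed_immediately = acyclic() is None
    # With a cycle, the object (and its payload) stays alive...
    cyclic_alive_before_gc = cyclic() is not None
    # ...until the cyclic collector runs.
    gc.collect()
    cyclic_freed_after_gc = cyclic() is None
finally:
    gc.enable()

print(acyclic_freed_immediately, cyclic_alive_before_gc, cyclic_freed_after_gc)
```

When each node holds a large tensor, many such cycles can pile up between collector runs, which is consistent with the memory growth reported here. Avoiding the self-reference lets plain reference counting free the data promptly.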
Currently, with lazy conversion, a relatively big portion of the model files is read before even beginning to write the output file, and then, if the disk cache is smaller than the model, it will be read from disk again when actually converting.
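The metadata needed up front (tensor names, shapes and dtypes) lives in a small JSON header at the start of a safetensors file, so obtaining it does not have to touch the tensor data. The sketch below shows this using only the standard library. It is not the `safetensors` API and not the code in this PR; `write_tiny_safetensors` and `read_tensor_metadata` are hypothetical helpers, and the demo file is written by hand following the format's layout (8-byte little-endian header length, JSON header, raw data).

```python
import json
import struct
import tempfile
from pathlib import Path


def write_tiny_safetensors(path: Path) -> None:
    """Write a minimal single-tensor file by hand (2x3 float32 zeros)."""
    header = {"weight": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, 24]}}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # header length, u64 LE
        f.write(header_bytes)                          # JSON header
        f.write(bytes(24))                             # 6 float32 values, all zero


def read_tensor_metadata(path: Path) -> dict:
    """Return {name: (dtype, shape)} by reading only the header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.safetensors"
    write_tiny_safetensors(demo_path)
    metadata = read_tensor_metadata(demo_path)

print(metadata)  # {'weight': ('F32', (2, 3))}
```

Only the 8-byte length and the header are read; the tensor bytes after them are never touched, which is why a metadata-only pass can finish quickly even for very large files.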
Most of the time in the initial read is spent on `model_part.get_tensor(name)` (at least when using `safetensors`).

Turns out `safetensors` has the much faster `.get_slice(name)` which doesn't read the tensor data before it's needed, while still giving access to the shape and dtype of each tensor.

As a nice result, this makes `convert_hf_to_gguf.py --dry-run` much, much faster than before for slow disks and/or big models (seconds instead of minutes). Normal lazy conversion is also faster, since the initial metadata reading step doesn't unnecessarily read all the data anymore.

Note that I've also removed some unused code in `gguf-py/gguf/tensor_mapping.py` related to the number of experts. `xid` does not exist in the mappings since stacked experts were implemented, so `.format(xid = xid)` does not do anything.

Testing
After fixing the problem found in #8482 (comment), I ran some more tests.
`-no-slices-` means `master` at commit 97bdd26, while `-slices-recurse-` means after the memory leak was fixed in b971122.