This repository was archived by the owner on May 20, 2025. It is now read-only.

Memory leak when using nogil Cython #51

@ogrisel

Description


Code to reproduce:

pip install numpy scipy cython psutil
git clone https://github.com/scikit-learn/scikit-learn
cd scikit-learn
pip install -e .

Then run the following script, saved as debug_memleak.py:

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_classification
import psutil
import gc


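# 10,000 samples x 20 features (the make_classification default) in float64: ~1.6 MB
X, y = make_classification(n_samples=int(1e4), random_state=42)
print(f"data size: {X.nbytes / 1e6:.1f} MB")

# Refit the same model repeatedly; without a leak, RSS should stay roughly flat
for i in range(5):
    clf = HistGradientBoostingClassifier(max_iter=100).fit(X, y)
    gc.collect()  # make sure growth is not just uncollected cyclic garbage
    print(f"memory usage: {psutil.Process().memory_info().rss / 1e6:.1f} MB")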

Using the nogil-Cython build of scikit-learn (CPython):

$ OPENMP_NUM_THREADS=1 python ~/code/sanbox/debug_memleak.py 
data size: 1.6 MB
memory usage: 701.8 MB
memory usage: 1290.5 MB
memory usage: 1878.0 MB
memory usage: 2466.0 MB
memory usage: 3053.4 MB

Using scikit-learn installed from conda-forge:

$ OPENMP_NUM_THREADS=1 python ~/code/sanbox/debug_memleak.py 
data size: 1.6 MB
memory usage: 124.6 MB
memory usage: 124.6 MB
memory usage: 125.1 MB
memory usage: 125.1 MB
memory usage: 125.1 MB
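The two logs can be compared numerically. The sketch below (the helper name rss_deltas is hypothetical) extracts the RSS readings printed by the script and computes the growth between consecutive fits. With the readings above, the nogil build grows by roughly 588 MB per fit, while the conda-forge build changes by at most 0.5 MB, which suggests a leak of roughly constant size per call to fit.

```python
import re


def rss_deltas(log: str) -> list:
    """Return the RSS change in MB between consecutive 'memory usage' lines."""
    # Pull every number printed as "memory usage: <value> MB"
    values = [float(v) for v in re.findall(r"memory usage: ([\d.]+) MB", log)]
    # Difference between each reading and the previous one, rounded like the log
    return [round(b - a, 1) for a, b in zip(values, values[1:])]


# Readings copied from the two runs above
nogil_log = """
memory usage: 701.8 MB
memory usage: 1290.5 MB
memory usage: 1878.0 MB
memory usage: 2466.0 MB
memory usage: 3053.4 MB
"""

conda_log = """
memory usage: 124.6 MB
memory usage: 124.6 MB
memory usage: 125.1 MB
memory usage: 125.1 MB
memory usage: 125.1 MB
"""

print("nogil build:", rss_deltas(nogil_log))
print("conda-forge build:", rss_deltas(conda_log))
```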

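Note: this code uses OpenMP-based threading, but the leak still happens when disabling the OpenMP threading layer by setting OPENMP_NUM_THREADS=1, so the problem is probably not related to OpenMP. (OpenMP runtimes read OMP_NUM_THREADS rather than OPENMP_NUM_THREADS, though, so it is worth rerunning with OMP_NUM_THREADS=1 to confirm that threading was actually disabled.)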
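A minimal sketch for rerunning the reproducer with OpenMP limited to one thread through OMP_NUM_THREADS, the variable defined by the OpenMP specification. The helper name run_with_omp_threads is hypothetical. It starts a child process so the variable is already in the environment when the OpenMP runtime loads; setting it from inside a process that has already imported scikit-learn may be too late.

```python
import os
import subprocess
import sys


def run_with_omp_threads(args: list, n_threads: int = 1) -> str:
    """Run `python <args...>` in a child process with OMP_NUM_THREADS set.

    Returns the child's stdout; raises CalledProcessError on failure.
    """
    # Copy the current environment and pin the OpenMP thread count
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    result = subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example (assumes the script path used in the logs above exists):
# print(run_with_omp_threads([os.path.expanduser("~/code/sanbox/debug_memleak.py")]))
```

The shell equivalent is simply to prefix the command with OMP_NUM_THREADS=1 instead of OPENMP_NUM_THREADS=1.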

Not sure how to debug this. Maybe I could try using valgrind.
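Before reaching for valgrind, one cheap first check is Python's built-in tracemalloc module. It only sees allocations made through CPython's memory allocators, plus memory that extensions explicitly report to it (NumPy does this for array data). If RSS keeps growing while the traced total stays flat, the leaked memory is being allocated outside those allocators, which is where a native tool such as valgrind becomes useful. This is a minimal sketch: the helper name traced_growth is hypothetical, and it is an assumption that tracemalloc behaves the same on the nogil interpreter.

```python
import gc
import tracemalloc
from typing import Callable, List


def traced_growth(fn: Callable[[], object], repeats: int = 5) -> List[int]:
    """Call fn repeatedly; return the change in tracemalloc-traced bytes per call."""
    tracemalloc.start()
    try:
        gc.collect()  # start from a clean heap so the baseline is stable
        baseline, _peak = tracemalloc.get_traced_memory()
        growth = []
        for _ in range(repeats):
            fn()
            gc.collect()  # drop cyclic garbage before measuring
            current, _peak = tracemalloc.get_traced_memory()
            growth.append(current - baseline)
            baseline = current
        return growth
    finally:
        tracemalloc.stop()
```

For the reproducer, fn could be something like lambda: HistGradientBoostingClassifier(max_iter=100).fit(X, y), reusing X and y from the script, and the per-call growth can then be compared with the roughly 588 MB RSS increase per fit.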
