Memory leak in LogisticRegression
#28993
Comments
The culprit is not |
I confirm that I can reproduce the problem on macOS with the current state of However I also confirm that inserting a manual call to Since running
I have not tried to diagnose the root cause of the problem yet, but it's possible that the following references might help for anyone with a desire to look into the problem: |
I am almost sure that the issue is happening when calling the |
We can also try to use tracemalloc to get more information, for instance: |
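A minimal sketch of that kind of tracemalloc comparison, using only the standard library. `leaky_step`, `_retained` and `top_growth` are hypothetical names made up for illustration; the workload is a stand-in that deliberately keeps memory alive and is not scikit-learn or SciPy code. It takes a snapshot before and after a few repetitions and lists the allocation sites that grew the most.

```python
import tracemalloc

# Hypothetical stand-in for objects that are never released.
_retained = []


def leaky_step():
    # Stand-in workload: keeps about 1 MB alive per call, mimicking a leak.
    _retained.append(bytearray(1024 * 1024))


def top_growth(workload, reps=5, limit=3):
    """Return the allocation sites whose traced size grew most over `reps` calls."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for _ in range(reps):
            workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() returns StatisticDiff entries, largest change first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growth(leaky_step):
        print(stat)
```

Replacing `leaky_step` with a function that fits the estimator once should point at the lines responsible for the growth, assuming the memory is allocated through Python's allocators so that tracemalloc can see it.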
I used this code to test loss_gradient

import tracemalloc
import warnings

import numpy as np
from sklearn.linear_model import LogisticRegression


def do_train() -> None:
    np.random.seed(42)
    X = np.random.rand(2000, 3000)
    y = np.random.randint(2, size=2000)
    clf = LogisticRegression(max_iter=20)
    clf.fit(X, y)


def run(reps):
    tracemalloc.start()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        for _ in range(reps):
            do_train()
    base, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"| {reps:^6} | {base/ 1024**2:^9.2f} | {peak/ 1024**2:^9.2f} | {(peak-base)/ 1024**2:^9.2f} |")


if __name__ == "__main__":
    print("| # runs | base (MB) | peak (MB) | diff (MB) |")
    print("| ------ | --------- | --------- | --------- |")
    run(1)
    run(2)
    run(4)
    run(8)
    run(16)

I did this change here

def f(*args, **kwargs):
    return 0.0

opt_res = optimize.minimize(
    # func,
    f,
    w0,
    method="L-BFGS-B",
    jac=False,
    args=(X, target, sample_weight, l2_reg_strength, n_threads),
    options={
        "maxiter": max_iter,
        "maxls": 50,  # default is 20
        "iprint": iprint,
        "gtol": tol,
        "ftol": 64 * np.finfo(float).eps,
    },
)

and leak still exists
I also tried this script in new venv

import tracemalloc

from scipy.optimize import minimize
import numpy as np


def do_something(*args, **kwargs):
    return np.random.rand()


def do_train():
    minimize(
        do_something,
        np.random.rand(300),
        args=(np.random.rand(2000, 3000)),
        method="L-BFGS-B"
    )


def run(reps):
    tracemalloc.start()
    for _ in range(reps):
        do_train()
    base, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"| {reps:^6} | {base/ 1024**2:^9.2f} | {peak/ 1024**2:^9.2f} | {(peak-base)/ 1024**2:^9.2f} |")


if __name__ == "__main__":
    print("| # runs | base (MB) | peak (MB) | diff (MB) |")
    print("| ------ | --------- | --------- | --------- |")
    run(1)
    run(2)
    run(3)
    run(4)
    run(5)
    run(6)
    run(7)
    run(8)
    run(9)
    run(14)
    run(16)
    run(8)

Result is
Looks like the problem occurs because of scipy.optimize.minimize |
@Tialo Thanks for the further investigations. I propose to open an issue upstream in scipy and close here. |
I opened scipy/scipy#20768. |
FYI, we discovered that there were circular references in a helper class (ScalarFunction) that prevented instances of that class from being collected when the calling scope exited. There is a PR underway to remove those circular references, which should fix this issue. |
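To illustrate why a circular reference delays reclamation, here is a small standard-library sketch. The `Helper` class and `cycle_demo` function are hypothetical and are not SciPy's `ScalarFunction`; the class merely stores a bound method on itself, which is one common way to create a cycle. Because of the cycle, reference counting alone cannot free the object when the last external name goes away, so it lingers until the cyclic garbage collector runs.

```python
import gc
import weakref


class Helper:
    """Hypothetical stand-in for a helper object; not SciPy's ScalarFunction."""

    def __init__(self, payload):
        self.payload = payload
        # The bound method references self, and self references the bound
        # method: a reference cycle.
        self.callback = self.evaluate

    def evaluate(self):
        return len(self.payload)


def cycle_demo():
    """Return (alive_after_del, alive_after_gc_collect) for a cyclic object."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from hiding the effect
    try:
        helper = Helper(bytearray(1024 * 1024))
        probe = weakref.ref(helper)
        del helper  # the refcount does not reach zero because of the cycle
        alive_before = probe() is not None
        gc.collect()  # the cycle detector finds and frees the object
        alive_after = probe() is not None
    finally:
        if gc_was_enabled:
            gc.enable()
    return alive_before, alive_after


if __name__ == "__main__":
    print(cycle_demo())
```

With large payloads such as the feature matrix passed through `args`, objects kept alive this way can add up between collector runs, which is consistent with a manual `gc.collect()` releasing the memory.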
Thanks @andyfaff for the heads up |
Special "Thank You" from my side @Tialo for crafting the minimal example that I did not get time writing on my first pass ;) |
Hello, SciPy 1.14.0rc1 is out, with a fix for this issue. It can be installed with
As the name implies, it is a release candidate, so there is a higher risk of bugs than usual. Please report any bugs you find to the SciPy issue tracker. |
Confirming that |
Describe the bug
repro
LogisticRegression().fit(X, y)
on same size feature matrix. Or run the attached repro-script.
expected
actual
notes
scikit-learn==1.2.2, and Python 3.10.11
SGDClassifier instead of LogisticRegression. The memory usage behaves as expected when using this solver. I therefore suspect this has to do with libsvm not releasing memory correctly after completed training.
Steps/Code to Reproduce
Expected Results
Expected memory usage: 7.63 MB
Actual Results
Expected memory usage: 7.63 MB
Versions