This repository has been archived by the owner on Feb 28, 2024. It is now read-only.
I am running into problems with BayesSearchCV when cross-validation (CV) produces at least one test fold of size 1 (e.g. LOO-CV) and the scoring metric is r2. A reproducible example:
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor
from skopt import BayesSearchCV
from sklearn.model_selection import LeaveOneOut
# Generate random data
n_samples = 10
n_features = 2
X = pd.DataFrame(np.random.randn(n_samples, n_features))
y = pd.Series(np.random.randn(n_samples))
# Define your model
model = CatBoostRegressor()
# Define the hyperparameter search space
search_spaces = {
    'learning_rate': (0.01, 0.95, 'log-uniform'),
    'max_depth': (2, 4),
    'n_estimators': (10, 200),
    'l2_leaf_reg': (1, 10),
    'random_strength': (1, 10, 'log-uniform')
}
# Define the search object
search = BayesSearchCV(
    model,
    search_spaces,
    n_iter=3,
    cv=LeaveOneOut(),
    scoring='r2',
    n_jobs=-1)
# Fit the search object to your data
search.fit(X, y)
for which I get warnings like:
"UserWarning: One or more of the test scores are non-finite: [nan]"
and the best_score_ is nan.
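The non-finite scores can be reproduced without skopt. The following is a minimal sketch, assuming plain scikit-learn and using `Ridge` as a stand-in for `CatBoostRegressor` (a substitution made here only to keep the example small; it is not part of the original report). It suggests the nan values come from evaluating r2 on single-observation test folds:

```python
import warnings

import numpy as np
from sklearn.linear_model import Ridge  # stand-in model, an assumption for illustration
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Random data of the same shape as in the report
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
y = rng.standard_normal(10)

with warnings.catch_warnings():
    # r2 on fewer than two samples emits a warning; silence it for the demo
    warnings.simplefilter("ignore")
    # r2 evaluated on a single observation
    single = r2_score([1.0], [0.9])
    # Per-fold r2 under leave-one-out: every test fold has exactly one row
    fold_scores = cross_val_score(Ridge(), X, y, cv=LeaveOneOut(), scoring="r2")

print(single)
print(fold_scores)
```

If every fold score is nan, their mean is nan as well, which would be consistent with the nan `best_score_` reported above.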
If I change the cv parameter (in BayesSearchCV) to 3, there is no warning. Alternatively, if I change scoring from r2 to neg_mean_squared_error, there is also no warning.
r2 cannot be calculated from a single observation (one row in X), because the total sum of squares around the mean is then zero. So I guess that skopt computes the CV score for each fold separately and then averages the results. Shouldn't the score be computed over all observations together? That is, predictions would be made for the validation/test observations of each CV iteration, and the score would then be calculated once, from all the predicted values (from all the CV folds) and all the true values.
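The pooled scoring described above can be sketched outside of BayesSearchCV. This is a minimal illustration, assuming scikit-learn's `cross_val_predict` to collect one out-of-fold prediction per observation and, again, `Ridge` as a hypothetical stand-in for the CatBoost model. It is not how skopt computes its scores, only what the proposed alternative would look like:

```python
import numpy as np
from sklearn.linear_model import Ridge  # stand-in model, an assumption for illustration
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Random data of the same shape as in the report
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
y = rng.standard_normal(10)

# Each observation is predicted by a model trained on the other nine
oof_pred = cross_val_predict(Ridge(), X, y, cv=LeaveOneOut())

# r2 is computed once, over all out-of-fold predictions and all true values
pooled_r2 = r2_score(y, oof_pred)
print(pooled_r2)
```

Note that a pooled score is not the same quantity as the mean of per-fold scores, and the scikit-learn documentation cautions that results derived from `cross_val_predict` can differ from those of `cross_val_score`, so the two should not be compared directly.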
I am using skopt version 0.9.0.