This repository has been archived by the owner on Feb 28, 2024. It is now read-only.

LOO in BayesSearchCV #1186

Open
wiktorolszowy opened this issue Oct 27, 2023 · 0 comments

wiktorolszowy commented Oct 27, 2023

I am running into problems with BayesSearchCV when the cross-validation (CV) scheme produces at least one fold of size 1 (e.g. LOO-CV) and the scoring metric is r2. A reproducible example:

import numpy as np
import pandas as pd
from catboost import CatBoostRegressor
from skopt import BayesSearchCV
from sklearn.model_selection import LeaveOneOut

# Generate random data
n_samples = 10
n_features = 2
X = pd.DataFrame(np.random.randn(n_samples, n_features))
y = pd.Series(np.random.randn(n_samples))

# Define your model
model = CatBoostRegressor()

# Define the hyperparameter search space
search_spaces = {
    'learning_rate': (0.01, 0.95, 'log-uniform'),
    'max_depth': (2, 4),
    'n_estimators': (10, 200),
    'l2_leaf_reg': (1, 10),
    'random_strength': (1, 10, 'log-uniform')
}

# Define the search object
search = BayesSearchCV(
    model,
    search_spaces,
    n_iter=3,
    cv=LeaveOneOut(),
    scoring='r2',
    n_jobs=-1)

# Fit the search object to your data
search.fit(X, y)

Running this produces warnings such as:

"UserWarning: One or more of the test scores are non-finite: [nan]"

and best_score_ is nan.
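
The non-finite scores do not appear to be specific to skopt or CatBoost. The following is a minimal sketch, assuming scikit-learn's Ridge as a stand-in estimator and seeded random data (neither is part of the original example). It checks what r2_score returns for a single sample and what plain cross_val_score returns for each LeaveOneOut fold:

```python
import warnings

import numpy as np
from sklearn.linear_model import Ridge  # stand-in estimator (assumption, not from the report)
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small random regression problem, same shape as in the report
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
y = rng.standard_normal(10)

with warnings.catch_warnings():
    # r2_score emits UndefinedMetricWarning for fewer than two samples
    warnings.simplefilter("ignore")

    # R^2 on a single observation is undefined
    single = r2_score([1.0], [1.2])

    # One score per LeaveOneOut fold, each computed on a single test sample
    fold_scores = cross_val_score(Ridge(), X, y, cv=LeaveOneOut(), scoring="r2")

print("single-sample r2:", single)
print("per-fold r2:", fold_scores)
```

If every per-fold score is nan, their mean is nan as well, which would be consistent with the nan best_score_ reported above.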

If I change the cv parameter of BayesSearchCV to 3, there is no warning. Likewise, if I change scoring from r2 to neg_mean_squared_error, there is no warning. r2 cannot be calculated from a single observation (one row of X), so I guess that the CV scores in skopt are computed for each fold separately and then averaged. Shouldn't the score instead be computed over all observations together? That is, in each CV iteration, predictions would be made for that iteration's validation/test observations, and the score would then be calculated once, from all the predicted values (across all CV folds) and all the true values.
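
The pooled scoring described here can be sketched outside of BayesSearchCV with scikit-learn's cross_val_predict. This is only an illustration of the idea, not something skopt does internally; Ridge and the seeded random data are stand-ins chosen for the sketch:

```python
import numpy as np
from sklearn.linear_model import Ridge  # stand-in estimator (assumption)
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
y = rng.standard_normal(10)

# Each sample is predicted by a model fitted on the other nine samples
y_pred = cross_val_predict(Ridge(), X, y, cv=LeaveOneOut())

# R^2 is computed once, over all out-of-fold predictions
pooled_r2 = r2_score(y, y_pred)
print("pooled LOO r2:", pooled_r2)
```

Note that with folds of size 1, the mean of the per-fold neg_mean_squared_error values already equals the negated pooled mean squared error, which may be why that metric behaves as expected with LeaveOneOut.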

I am using skopt version 0.9.0.
