Hyperparameter tuning#

Previous notebooks showed how model parameters impact statistical performance. We want to optimize these parameters to achieve the best possible model performance. This optimization process is called hyperparameter tuning.

This notebook demonstrates several methods to tune model hyperparameters.

Introductory example#

We revisit an example from the linear models notebook about the impact of the \(\alpha\) parameter in a Ridge model. The \(\alpha\) parameter controls model regularization strength. No general rule exists for selecting a good \(\alpha\) value - it depends on the specific dataset.
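
As a reminder, the regularization strength is set when constructing the model. A minimal sketch (the alpha values below are illustrative, not tuned):

from sklearn.linear_model import Ridge

# alpha controls the regularization strength: larger values shrink the
# coefficients more aggressively. These values are arbitrary examples.
weakly_regularized = Ridge(alpha=0.01)
strongly_regularized = Ridge(alpha=100)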

Let’s load a dataset for regression:

# When using JupyterLite, uncomment and install the `skrub` and `pyodide-http` packages.
# %pip install skrub
# %pip install pyodide-http
import matplotlib.pyplot as plt
import skrub

# import pyodide_http
# pyodide_http.patch_all()

skrub.patch_display()  # makes nice display for pandas tables
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X

y
0        4.526
1        3.585
2        3.521
3        3.413
4        3.422
         ...  
20635    0.781
20636    0.771
20637    0.923
20638    0.847
20639    0.894
Name: MedHouseVal, Length: 20640, dtype: float64

Now we define a Ridge model that processes data by adding feature interactions using a PolynomialFeatures transformer.

from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

model = Pipeline(
    [
        ("poly", PolynomialFeatures()),
        ("scaler", StandardScaler()),
        ("ridge", Ridge()),
    ]
)
model
Pipeline(steps=[('poly', PolynomialFeatures()), ('scaler', StandardScaler()),
                ('ridge', Ridge())])

We start with scikit-learn’s default parameters. Let’s evaluate this basic model:

import pandas as pd
from sklearn.model_selection import KFold, cross_validate

cv = KFold(n_splits=10, shuffle=True, random_state=42)
cv_results = cross_validate(model, X, y, cv=cv)
cv_results = pd.DataFrame(cv_results)
cv_results

cv_results.aggregate(["mean", "std"])

Nothing indicates that our pipeline achieves optimal performance. The PolynomialFeatures degree might need adjustment, or the Ridge regressor might need a different regularization strength. Let’s examine which parameters we could tune:

for params in model.get_params():
    print(params)
memory
steps
verbose
poly
scaler
ridge
poly__degree
poly__include_bias
poly__interaction_only
poly__order
scaler__copy
scaler__with_mean
scaler__with_std
ridge__alpha
ridge__copy_X
ridge__fit_intercept
ridge__max_iter
ridge__positive
ridge__random_state
ridge__solver
ridge__tol

Two key parameters are poly__degree and ridge__alpha. We will find their optimal values for this dataset.
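
As a baseline, we can search manually: loop over candidate values and cross-validate each combination. Below is a minimal sketch; the candidate values are illustrative choices, not recommendations:

from sklearn.base import clone

best_score, best_params = -float("inf"), None
for degree in [1, 2, 3]:
    for alpha in [0.01, 0.1, 1, 10]:
        # Clone the pipeline so that the original `model` is left untouched.
        candidate = clone(model).set_params(poly__degree=degree, ridge__alpha=alpha)
        scores = cross_validate(candidate, X, y, cv=cv)["test_score"]
        if scores.mean() > best_score:
            best_score = scores.mean()
            best_params = {"poly__degree": degree, "ridge__alpha": alpha}
print(best_params, best_score)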

Hyperparameter search using a grid#

The manual search above implements a grid search: it tries every possible parameter combination. Scikit-learn provides GridSearchCV to automate this process. During fitting, it performs an internal cross-validation and selects the optimal hyperparameters.

from sklearn.model_selection import GridSearchCV, train_test_split

# Hold out a test set for the final evaluation (the split parameters are an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
parameter_grid = {"poly__degree": [1, 2, 3], "ridge__alpha": [0.01, 0.1, 1, 10]}
search_cv = GridSearchCV(model, param_grid=parameter_grid)
search_cv.fit(X_train, y_train)
GridSearchCV(estimator=Pipeline(steps=[('poly', PolynomialFeatures()),
                                       ('scaler', StandardScaler()),
                                       ('ridge', Ridge())]),
             param_grid={'poly__degree': [1, 2, 3],
                         'ridge__alpha': [0.01, 0.1, 1, 10]})

The best_params_ attribute shows the optimal parameters found:

search_cv.best_params_
{'poly__degree': 1, 'ridge__alpha': 0.01}

The cv_results_ attribute provides details about all hyperparameter combinations tried during fitting:

cv_results = pd.DataFrame(search_cv.cv_results_)
cv_results
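
For instance, we can rank the combinations by their mean cross-validated test score; GridSearchCV stores each hyperparameter in a param_<name> column:

columns = ["param_poly__degree", "param_ridge__alpha", "mean_test_score"]
cv_results[columns].sort_values("mean_test_score", ascending=False)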

When refit=True (default), the search trains a final model using the best parameters. Access this model through best_estimator_:

search_cv.best_estimator_
Pipeline(steps=[('poly', PolynomialFeatures(degree=1)),
                ('scaler', StandardScaler()), ('ridge', Ridge(alpha=0.01))])

Calls to predict and score on the GridSearchCV instance are delegated to best_estimator_:

search_cv.score(X_test, y_test)
0.5910512173880501

EXERCISE:

GridSearchCV behaves like any classifier or regressor. Use cross_validate to evaluate the grid-search model we created.

# Write your code here.

QUESTION:

What limitations does the grid-search approach have?

Model with internal hyperparameter tuning#

Some estimators perform internal hyperparameter selection that is more efficient than a grid search. Their names typically end with CV (e.g. RidgeCV).
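
As a minimal illustration of the API (the alpha grid is an arbitrary choice), RidgeCV selects alpha internally using an efficient leave-one-out cross-validation:

import numpy as np
from sklearn.linear_model import RidgeCV

ridge_cv = RidgeCV(alphas=np.logspace(-2, 2, num=50)).fit(X, y)
ridge_cv.alpha_  # the regularization strength selected internally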

EXERCISE:

  1. Create a pipeline with PolynomialFeatures, StandardScaler, and Ridge

  2. Create a grid-search with this pipeline and tune alpha using np.logspace(-2, 2, num=50)

  3. Fit the grid-search on the training set and time it

  4. Repeat using RidgeCV instead of Ridge and remove GridSearchCV

  5. Compare computational performance between approaches

# Write your code here.

Inspection of hyperparameters in cross-validation#

When a hyperparameter search runs inside an outer evaluation cross-validation (nested cross-validation), different hyperparameter values may be selected on each outer split. Let’s examine this with GridSearchCV:

import numpy as np

inner_model = Pipeline(
    [
        ("poly", PolynomialFeatures()),
        ("scaler", StandardScaler()),
        ("ridge", Ridge()),
    ]
)
param_grid = {"poly__degree": [1, 2], "ridge__alpha": np.logspace(-2, 2, num=10)}
model = GridSearchCV(inner_model, param_grid=param_grid, n_jobs=-1)
model
GridSearchCV(estimator=Pipeline(steps=[('poly', PolynomialFeatures()),
                                       ('scaler', StandardScaler()),
                                       ('ridge', Ridge())]),
             n_jobs=-1,
             param_grid={'poly__degree': [1, 2],
                         'ridge__alpha': array([1.00000000e-02, 2.78255940e-02, 7.74263683e-02, 2.15443469e-01,
       5.99484250e-01, 1.66810054e+00, 4.64158883e+00, 1.29154967e+01,
       3.59381366e+01, 1.00000000e+02])})
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

We run cross-validation and store models from each split by setting return_estimator=True:

cv_results = cross_validate(model, X, y, cv=cv, return_estimator=True)
cv_results = pd.DataFrame(cv_results)
cv_results

The estimator column contains the fitted GridSearchCV instances. We examine the best_params_ selected on each fold:

for estimator_cv_fold in cv_results["estimator"]:
    print(estimator_cv_fold.best_params_)
{'poly__degree': 1, 'ridge__alpha': np.float64(12.915496650148826)}
{'poly__degree': 1, 'ridge__alpha': np.float64(0.01)}
{'poly__degree': 2, 'ridge__alpha': np.float64(0.027825594022071243)}
{'poly__degree': 1, 'ridge__alpha': np.float64(35.93813663804626)}
{'poly__degree': 1, 'ridge__alpha': np.float64(12.915496650148826)}
{'poly__degree': 1, 'ridge__alpha': np.float64(12.915496650148826)}
{'poly__degree': 2, 'ridge__alpha': np.float64(0.01)}
{'poly__degree': 1, 'ridge__alpha': np.float64(35.93813663804626)}
{'poly__degree': 1, 'ridge__alpha': np.float64(12.915496650148826)}
{'poly__degree': 1, 'ridge__alpha': np.float64(12.915496650148826)}

This inspection reveals the stability of hyperparameter values across folds.
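
To quantify this stability, we can count how often each combination was selected across folds (a small sketch using the standard library):

from collections import Counter

selected_params = Counter(
    (est.best_params_["poly__degree"], est.best_params_["ridge__alpha"])
    for est in cv_results["estimator"]
)
selected_params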

Note regarding the scoring metric to optimize during tuning#

The GridSearchCV and RandomizedSearchCV classes use the scoring parameter to define the metric to optimize during tuning. If not specified, it defaults to accuracy for classification and the R² score (r2_score) for regression.

These defaults are not optimal for hyperparameter tuning: it is now recognized that proper scoring rules are a better choice, because optimizing them yields calibrated models.

Therefore, we recommend using brier_score_loss or log_loss for classification and mean_squared_error for regression.
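
For example, reusing the grid-search defined above, we can optimize the mean squared error instead; scikit-learn scorers follow a "greater is better" convention, hence the neg_ prefix:

model_mse = GridSearchCV(
    inner_model,
    param_grid=param_grid,
    scoring="neg_mean_squared_error",  # proper scoring rule for regression
    n_jobs=-1,
)
# For a classifier, one would pass scoring="neg_log_loss" or
# scoring="neg_brier_score" instead.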