
Release Highlights for scikit-learn 1.7

We are pleased to announce the release of scikit-learn 1.7! Many bug fixes and improvements were added, as well as some key new features. Below we detail the highlights of this release. For an exhaustive list of all the changes, please refer to the release notes.

To install the latest version (with pip):

pip install --upgrade scikit-learn

or with conda:

conda install -c conda-forge scikit-learn

Improved estimator’s HTML representation

The HTML representation of estimators now includes a section containing the list of parameters and their values. Non-default parameters are highlighted in orange. A copy button is also available to copy the “fully-qualified” parameter name without the need to call the get_params method. It is particularly useful when defining a parameter grid for a grid-search or a randomized-search with a complex pipeline.

See the example below and click on the different estimator blocks to see the improved HTML representation.

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(with_std=False), LogisticRegression(C=2.0))
model

Pipeline(steps=[('standardscaler', StandardScaler(with_std=False)),
                ('logisticregression', LogisticRegression(C=2.0))])


[Interactive HTML representation: a not-fitted Pipeline block (steps, transform_input=None, memory=None, verbose=False) containing a StandardScaler block (copy=True, with_mean=True, with_std=False) and a LogisticRegression block listing all of its parameters, including the non-default C=2.0 alongside defaults such as penalty='l2', solver='lbfgs' and max_iter=100.]
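The fully-qualified names that the copy button provides follow the pattern step name, double underscore, parameter name. As an illustrative sketch, the snippet below rebuilds the same pipeline, lists those names via get_params, and uses two of them in a GridSearchCV grid. The grid values and the synthetic make_classification data are arbitrary choices for illustration, not part of the release.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(with_std=False), LogisticRegression(C=2.0))

# make_pipeline names each step after its lowercased class name; nested
# parameters are addressed as "<step name>__<parameter name>".
nested_names = sorted(name for name in model.get_params() if "__" in name)
print(nested_names[:3])

# Illustrative grid built from fully-qualified parameter names.
param_grid = {
    "standardscaler__with_std": [True, False],
    "logisticregression__C": [0.1, 1.0, 10.0],
}

X, y = make_classification(random_state=0)
search = GridSearchCV(model, param_grid, cv=3).fit(X, y)
print(search.best_params_)
```

Typos in such names make GridSearchCV raise an error about an invalid parameter, which is why copying them from the HTML representation is convenient.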

Custom validation set for histogram-based Gradient Boosting estimators

The ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support directly passing a custom validation set for early stopping to the fit method, using the X_val, y_val, and sample_weight_val parameters. In a pipeline.Pipeline, the validation set X_val can be transformed along with X using the transform_input parameter. Routing these parameters through a pipeline requires metadata routing to be enabled, which is why the example calls sklearn.set_config(enable_metadata_routing=True).

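import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Metadata routing is needed to forward X_val and y_val through the pipeline.
sklearn.set_config(enable_metadata_routing=True)

X, y = make_classification(random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Ask the pipeline to route X_val and y_val to the classifier's fit method.
# Note: with the default early_stopping="auto", early stopping (and hence the
# validation set) is only used when there are more than 10,000 samples.
clf = HistGradientBoostingClassifier()
clf.set_fit_request(X_val=True, y_val=True)

# transform_input=["X_val"] transforms X_val with the fitted StandardScaler
# before it reaches the classifier.
model = Pipeline([("sc", StandardScaler()), ("clf", clf)], transform_input=["X_val"])
# Fit on the training split only so that the validation set stays held out.
model.fit(X_train, y_train, X_val=X_val, y_val=y_val)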

Pipeline(steps=[('sc', StandardScaler()),
                ('clf', HistGradientBoostingClassifier())],
         transform_input=['X_val'])


[Interactive HTML representation: a fitted Pipeline block (transform_input=['X_val'], memory=None, verbose=False) containing a StandardScaler block with default parameters and a HistGradientBoostingClassifier block listing all of its parameters at their defaults, including loss='log_loss', early_stopping='auto', scoring='loss', validation_fraction=0.1 and n_iter_no_change=10.]

Plotting ROC curves from cross-validation results

The class metrics.RocCurveDisplay has a new class method, from_cv_results, that makes it easy to plot multiple ROC curves, one per fold, from the results of model_selection.cross_validate.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=150, random_state=0)
clf = LogisticRegression(random_state=0)
cv_results = cross_validate(clf, X, y, cv=5, return_estimator=True, return_indices=True)
_ = RocCurveDisplay.from_cv_results(cv_results, X, y)

Array API support

Since version 1.6, several more functions have been updated to support array API compatible inputs, in particular metrics from the sklearn.metrics module.

In addition, it is no longer required to install the array-api-compat package to use the experimental array API support in scikit-learn.

Please refer to the array API support page for instructions on using scikit-learn with array API compatible libraries such as PyTorch or CuPy.

Improved API consistency of Multi-layer Perceptron

The neural_network.MLPRegressor has a new parameter, loss, and now supports the “poisson” loss in addition to the default “squared_error” loss. Moreover, the neural_network.MLPClassifier and neural_network.MLPRegressor estimators now support sample weights. These changes make both estimators more consistent with the other estimators in scikit-learn.

Migration toward sparse arrays

To prepare for SciPy's migration from sparse matrices to sparse arrays, all scikit-learn estimators that accept sparse matrices as input now also accept sparse arrays.
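As a small illustrative check, the sketch below fits the same LogisticRegression on identical data stored once as a scipy.sparse.csr_matrix and once as its sparse array counterpart, scipy.sparse.csr_array, and compares the learned coefficients. The data and the threshold used to sparsify it are arbitrary.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
X[np.abs(X) < 1.0] = 0.0  # zero out small entries to make X mostly sparse

X_matrix = sparse.csr_matrix(X)  # legacy sparse matrix
X_array = sparse.csr_array(X)    # sparse array, the format SciPy is moving to

coef_matrix = LogisticRegression().fit(X_matrix, y).coef_
coef_array = LogisticRegression().fit(X_array, y).coef_
print(np.allclose(coef_matrix, coef_array))
```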
