Version 1.7#
Legend for changelogs
Major Feature: something big that you couldn’t do before.
Feature: something that you couldn’t do before.
Efficiency: an existing feature now may not require as much computation or memory.
Enhancement: a miscellaneous minor improvement.
Fix: something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.7.dev0#
November 2024
Changes impacting many modules#
Enhancement __sklearn_tags__ was introduced for setting tags in estimators. More details in Estimator Tags. By Thomas Fan and Adrin Jalali #29677
Enhancement Scikit-learn classes and functions can be used while only having an import sklearn import line. For example, import sklearn; sklearn.svm.SVC() now works. By Thomas Fan #29793
Fix Classes metrics.ConfusionMatrixDisplay, metrics.RocCurveDisplay, calibration.CalibrationDisplay, metrics.PrecisionRecallDisplay, metrics.PredictionErrorDisplay and inspection.PartialDependenceDisplay now properly handle Matplotlib aliases for style parameters (e.g., c and color, ls and linestyle, etc.). By Joseph Barbier #30023
API Change utils.validation.validate_data is introduced and replaces the previously private base.BaseEstimator._validate_data method (see the sketch below). It is intended for third-party estimator developers, who should use this function in most cases instead of utils.check_array and utils.check_X_y. By Adrin Jalali #29696
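A minimal sketch of how a third-party estimator might use the new public helper; the MeanRegressor class below is purely illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import validate_data


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Illustrative estimator that predicts the training mean."""

    def fit(self, X, y):
        # Replaces the previously private self._validate_data(X, y)
        X, y = validate_data(self, X, y)
        self.mean_ = np.mean(y)
        return self

    def predict(self, X):
        # reset=False checks the number of features against what was seen in fit
        X = validate_data(self, X, reset=False)
        return np.full(X.shape[0], self.mean_)
```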
Support for Array API#
Additional estimators and functions have been updated to include support for all Array API compliant inputs.
See Array API support (experimental) for more details.
Feature model_selection.GridSearchCV, model_selection.RandomizedSearchCV, model_selection.HalvingGridSearchCV and model_selection.HalvingRandomSearchCV now support Array API compatible inputs when their base estimators do. By Tim Head and Olivier Grisel #27096
Feature preprocessing.LabelEncoder now supports Array API compatible inputs. By Omar Salman #27381
Feature sklearn.metrics.mean_absolute_error now supports Array API compatible inputs. By Edoardo Abati #27736
Feature sklearn.metrics.mean_tweedie_deviance now supports Array API compatible inputs. By Thomas Li #28106
Feature sklearn.metrics.pairwise.cosine_similarity now supports Array API compatible inputs. By Edoardo Abati #29014
Feature sklearn.metrics.pairwise.paired_cosine_distances now supports Array API compatible inputs. By Edoardo Abati #29112
Feature sklearn.metrics.cluster.entropy now supports Array API compatible inputs. By Yaroslav Korobko #29141
Feature sklearn.metrics.mean_squared_error now supports Array API compatible inputs. By Yaroslav Korobko #29142
Feature sklearn.metrics.pairwise.additive_chi2_kernel now supports Array API compatible inputs. By Yaroslav Korobko #29144
Feature sklearn.metrics.d2_tweedie_score now supports Array API compatible inputs. By Emily Chen #29207
Feature sklearn.metrics.max_error now supports Array API compatible inputs. By Edoardo Abati #29212
Feature sklearn.metrics.mean_poisson_deviance now supports Array API compatible inputs. By Emily Chen #29227
Feature sklearn.metrics.mean_gamma_deviance now supports Array API compatible inputs. By Emily Chen #29239
Feature sklearn.metrics.pairwise.cosine_distances now supports Array API compatible inputs. By Emily Chen #29265
Feature sklearn.metrics.pairwise.chi2_kernel now supports Array API compatible inputs. By Yaroslav Korobko #29267
Feature sklearn.metrics.mean_absolute_percentage_error now supports Array API compatible inputs. By Emily Chen #29300
Feature sklearn.metrics.pairwise.paired_euclidean_distances now supports Array API compatible inputs. By Emily Chen #29389
Feature sklearn.metrics.pairwise.euclidean_distances and sklearn.metrics.pairwise.rbf_kernel now support Array API compatible inputs. By Omar Salman #29433
Feature sklearn.metrics.pairwise.linear_kernel, sklearn.metrics.pairwise.sigmoid_kernel, and sklearn.metrics.pairwise.polynomial_kernel now support Array API compatible inputs. By Omar Salman #29475
Feature sklearn.metrics.mean_squared_log_error and sklearn.metrics.root_mean_squared_log_error now support Array API compatible inputs. By Virgil Chan #29709
Feature preprocessing.MinMaxScaler with clip=True now supports Array API compatible inputs. By Shreekant Nandiyawar #29751
Support for the soon to be deprecated cupy.array_api module has been removed in favor of directly supporting the top-level cupy module, possibly via the array_api_compat.cupy compatibility wrapper. By Olivier Grisel #29639
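A hedged sketch of how the entries above are used: array API dispatch is enabled via the configuration context and the metric is then computed with the input's own array namespace. It assumes the optional array-api-compat package is installed; plain NumPy arrays are used only to keep the snippet self-contained (e.g. torch or CuPy arrays would be passed the same way).

```python
import numpy as np
import sklearn
from sklearn.metrics import mean_absolute_error

y_true = np.asarray([3.0, -0.5, 2.0, 7.0])
y_pred = np.asarray([2.5, 0.0, 2.0, 8.0])

# Requires the array-api-compat package; raises an informative error otherwise.
with sklearn.config_context(array_api_dispatch=True):
    # The computation stays in the namespace of the input arrays.
    print(mean_absolute_error(y_true, y_pred))
```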
Metadata routing#
Refer to the Metadata Routing User Guide for more details.
Feature semi_supervised.SelfTrainingClassifier now supports metadata routing. The fit method now accepts **fit_params, which are passed to the underlying estimators via their fit methods. In addition, the predict, predict_proba, predict_log_proba, score and decision_function methods also accept **params, which are passed to the underlying estimators via their respective methods. By Adam Li #28494
Feature ensemble.StackingClassifier and ensemble.StackingRegressor now support metadata routing and pass **fit_params to the underlying estimators via their fit methods (see the example below). By Stefanie Senger #28701
Feature model_selection.learning_curve now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. By Stefanie Senger #28975
Feature compose.TransformedTargetRegressor now supports metadata routing in its fit and predict methods and routes the corresponding params to the underlying regressor. By Omar Salman #29136
Feature feature_selection.SequentialFeatureSelector now supports metadata routing in its fit method and passes the corresponding params to the model_selection.cross_val_score function. By Omar Salman #29260
Feature model_selection.permutation_test_score now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. By Adam Li #29266
Feature feature_selection.RFE and feature_selection.RFECV now support metadata routing. By Omar Salman #29312
Feature model_selection.validation_curve now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. By Stefanie Senger #29329
Fix Metadata is routed correctly to grouped CV splitters via linear_model.RidgeCV and linear_model.RidgeClassifierCV, and UnsetMetadataPassedError is fixed for linear_model.RidgeClassifierCV with default scoring. By Stefanie Senger #29634
Fix Many method arguments which shouldn't be included in the routing mechanism are now excluded, and the set_{method}_request methods are not generated for them. By Adrin Jalali #29920
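A short sketch of routing sample_weight through ensemble.StackingClassifier, one of the newly supporting meta-estimators listed above. Routing has to be enabled explicitly and each sub-estimator must request the metadata; the data and weights below are synthetic.

```python
import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

sklearn.set_config(enable_metadata_routing=True)

X, y = make_classification(n_samples=100, random_state=0)
sample_weight = np.ones(len(y))

clf = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression().set_fit_request(sample_weight=True)),
        ("dt", DecisionTreeClassifier().set_fit_request(sample_weight=True)),
    ],
    final_estimator=LogisticRegression().set_fit_request(sample_weight=True),
)
# sample_weight is routed to the fit methods of the requesting sub-estimators.
clf.fit(X, y, sample_weight=sample_weight)

sklearn.set_config(enable_metadata_routing=False)
```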
Dropping official support for PyPy#
Due to limited maintainer resources and a small number of users, official PyPy support has been dropped. Some parts of scikit-learn may still work, but PyPy is no longer tested in the scikit-learn continuous integration. By Loïc Estève #29128
Dropping support for building with setuptools#
From scikit-learn 1.6 onwards, support for building with setuptools has been removed. Meson is the only supported way to build scikit-learn, see Building from source for more details. By Loïc Estève #29400
sklearn.base#
Enhancement Added a function base.is_clusterer which determines whether a given estimator is of category clusterer. By Christian Veenhuis #28936
API Change Passing a class object to is_classifier, is_regressor, is_transformer, and is_outlier_detector is now deprecated. Pass an instance instead. By Adrin Jalali #30122
sklearn.calibration#
API Change cv="prefit" is deprecated for CalibratedClassifierCV. Use FrozenEstimator instead, as CalibratedClassifierCV(FrozenEstimator(estimator)). By Adrin Jalali #30171
sklearn.cluster#
API Change The copy parameter of cluster.Birch was deprecated in 1.6 and will be removed in 1.8. It has no effect as the estimator does not perform in-place operations on the input data. By Yao Xiao #29124
sklearn.compose#
Enhancement sklearn.compose.ColumnTransformer's verbose_feature_names_out now accepts a string format or a callable to generate feature names (see the sketch below). By Marc Bresson #28934
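A hedged sketch of the two new verbose_feature_names_out forms; it assumes the string format uses the {transformer_name} and {feature_name} placeholders and that a callable receives those two strings and returns the output feature name.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = pd.DataFrame({"age": [10, 20, 30], "height": [140, 160, 180]})

# String format: placeholders are filled in for each output feature.
ct = ColumnTransformer(
    [("scale", StandardScaler(), ["age", "height"])],
    verbose_feature_names_out="{transformer_name}.{feature_name}",
)
print(ct.fit(X).get_feature_names_out())  # e.g. ['scale.age', 'scale.height']

# Callable: full control over the generated names.
ct = ColumnTransformer(
    [("scale", StandardScaler(), ["age", "height"])],
    verbose_feature_names_out=lambda name, feature: f"{feature}_scaled",
)
print(ct.fit(X).get_feature_names_out())  # e.g. ['age_scaled', 'height_scaled']
```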
sklearn.covariance#
Efficiency covariance.MinCovDet fitting is now slightly faster. By Antony Lee #29835
sklearn.cross_decomposition#
Fix cross_decomposition.PLSRegression properly raises an error when n_components is larger than n_samples. By Thomas Fan #29710
sklearn.datasets#
Feature datasets.fetch_file allows downloading arbitrary data files from the web. It handles local caching, integrity checks with SHA256 digests and automatic retries in case of HTTP errors. By Olivier Grisel #29354
sklearn.decomposition#
Enhancement LatentDirichletAllocation now has a normalize parameter in the transform and fit_transform methods to control whether the document topic distribution is normalized. By Adrin Jalali #30097
Fix IncrementalPCA will now only raise a ValueError when the number of samples in the input data to partial_fit is less than the number of components on the first call to partial_fit. Subsequent calls to partial_fit no longer face this restriction. By Thomas Gessey-Jones #30224
sklearn.discriminant_analysis#
Fix discriminant_analysis.QuadraticDiscriminantAnalysis now raises a LinAlgWarning in case of collinear variables. This warning can be silenced using the reg_param parameter. By Alihan Zihna #19731
sklearn.ensemble#
Feature ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now support missing values in the data matrix X (example below). Missing values are handled by randomly moving all of the samples to the left or right child node as the tree is traversed. By Adam Li #28268
Efficiency Small runtime improvement of fitting ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor by parallelizing the initial search for bin thresholds. By Christian Lorentzen #28064
Efficiency ensemble.IsolationForest now runs parallel jobs during predict, offering a speedup of up to 2-4x on sample sizes larger than 2000 using joblib. By Adam Li and Sérgio Pereira #28622
Enhancement The verbosity of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now has more granular control: verbose = 1 prints only summary messages, while verbose >= 2 prints the full information as before. By Christian Lorentzen #28179
API Change The algorithm parameter of ensemble.AdaBoostClassifier is deprecated and will be removed in 1.8. By Jérémie du Boisberranger #29997
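To illustrate the ExtraTrees entry above, a minimal sketch fitting on a matrix that contains NaN values; the tiny dataset and default parameters are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# X contains missing values; per the entry above they are handled during splitting.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])

clf = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict([[np.nan, 2.5]]))
```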
sklearn.feature_extraction#
Fix feature_extraction.text.TfidfVectorizer now correctly preserves the dtype of idf_ based on the input data. By Guillaume Lemaitre #30022
sklearn.frozen#
Major Feature FrozenEstimator is now introduced, which allows freezing an estimator. This means calling .fit on it has no effect, and clone(frozenestimator) returns the same estimator instead of an unfitted clone. By Adrin Jalali #29705
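A short sketch of the freezing workflow, combining FrozenEstimator with the CalibratedClassifierCV migration mentioned in the calibration section above; the split of the data into a fitting part and a calibration part is illustrative.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

clf = LogisticRegression().fit(X[:100], y[:100])  # already fitted

# Replaces the deprecated cv="prefit": the wrapped classifier is never refit,
# only the calibration step is fitted on the held-out data.
calibrated = CalibratedClassifierCV(FrozenEstimator(clf))
calibrated.fit(X[100:], y[100:])
```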
sklearn.impute#
Fix impute.KNNImputer excludes samples with nan distances when computing the mean value for uniform weights. By Xuefeng Xu #29135
Fix When min_value and max_value are array-like and some features are dropped due to keep_empty_features=False, impute.IterativeImputer no longer raises an error and now indexes correctly. By Guntitat Sawadwuthikul #29451
Fix Fixed impute.IterativeImputer to make sure that it does not skip the iterative process when keep_empty_features is set to True. By Arif Qodari #29779
API Change Add a warning in impute.SimpleImputer when keep_empty_features=False and strategy="constant". In this case empty features are not dropped and this behaviour will change in 1.8. By Arthur Courselle and Simon Riou #29950
sklearn.inspection#
Enhancement Add a custom_values parameter in inspection.partial_dependence. It enables users to pass their own grid of values at which the partial dependence should be calculated (example below). By Freddy A. Boulton and Stephen Pardy #26202
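A hedged sketch of the new custom_values parameter, assuming it accepts a dict mapping a feature index (or column name) to the grid of values at which the partial dependence is evaluated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=100, n_features=3, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

result = partial_dependence(
    model,
    X,
    features=[0],
    custom_values={0: np.linspace(-2, 2, 5)},  # evaluate feature 0 on this grid only
)
print(result["grid_values"])
```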
sklearn.linear_model#
Enhancement The solver="newton-cholesky" in linear_model.LogisticRegression and linear_model.LogisticRegressionCV is extended to support the full multinomial loss in a multiclass setting (example below). By Christian Lorentzen #28840
Fix In linear_model.Ridge and linear_model.RidgeCV, after fit, the coef_ attribute is now of shape (n_features,) like in other linear models. By Maxwell Liu, Guillaume Lemaitre, and Adrin Jalali #19746
Fix linear_model.LogisticRegressionCV corrects sample weight handling for the calculation of test scores. By Shruti Nath #29419
Fix linear_model.LassoCV and linear_model.ElasticNetCV now take sample weights into account to define the search grid for the internally tuned alpha hyperparameter. By John Hopfensperger and Shruti Nath #29442
Fix linear_model.LogisticRegression, linear_model.PoissonRegressor, linear_model.GammaRegressor and linear_model.TweedieRegressor now take sample weights into account to decide when to fall back to solver='lbfgs' whenever solver='newton-cholesky' becomes numerically unstable. By Antoine Baker #29818
Fix linear_model.RidgeCV now properly uses predictions on the same scale as the target seen during fit. These predictions are stored in cv_results_ when scoring != None. Previously, the predictions were rescaled by the square root of the sample weights and offset by the mean of the target, leading to an incorrect estimate of the score. By Guillaume Lemaitre, Jérôme Dockes and Hanmin Qin #29842
Fix linear_model.RidgeCV now properly supports custom multioutput scorers by letting the scorer manage the multioutput averaging. Previously, the predictions and true targets were both squeezed to a 1D array before computing the error. By Guillaume Lemaitre #29884
Fix linear_model.LinearRegression now sets the cond parameter when calling the scipy.linalg.lstsq solver on dense input data. This ensures more numerically robust results on rank-deficient data. In particular, it empirically fixes the expected equivalence property between fitting with reweighted or with repeated data points. By Antoine Baker #30040
Fix linear_model.LogisticRegression and other linear models that accept solver="newton-cholesky" now report the correct number of iterations when they fall back to the "lbfgs" solver because of a rank-deficient Hessian matrix. By Olivier Grisel #30100
Fix SGDOneClassSVM now correctly inherits from OutlierMixin and the tags are correctly set. By Guillaume Lemaitre #30227
API Change Deprecates copy_X in linear_model.TheilSenRegressor as the parameter has no effect. copy_X will be removed in 1.8. By Adam Li #29105
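To illustrate the extended solver, a minimal multiclass fit with solver="newton-cholesky"; the iris dataset is used only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # three classes
clf = LogisticRegression(solver="newton-cholesky").fit(X, y)
print(clf.predict(X[:3]))
```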
sklearn.manifold#
Efficiency manifold.locally_linear_embedding and manifold.LocallyLinearEmbedding now allocate the memory of sparse matrices more efficiently in the Hessian, Modified and LTSA methods. By Giorgio Angelotti #28096
sklearn.metrics#
Efficiency sklearn.metrics.classification_report is now faster by caching classification labels. By Adrin Jalali #29738
Enhancement metrics.RocCurveDisplay.from_estimator, metrics.RocCurveDisplay.from_predictions, metrics.PrecisionRecallDisplay.from_estimator, and metrics.PrecisionRecallDisplay.from_predictions now accept a new keyword despine to remove the top and right spines of the plot in order to make it clearer (example below). By Yao Xiao #26367
Enhancement sklearn.metrics.check_scoring now accepts raise_exc to specify whether to raise an exception if a subset of the scorers in multimetric scoring fails or to return an error code. By Stefanie Senger #28992
Fix metrics.roc_auc_score will now correctly return np.nan and warn the user if only one class is present in the labels. By Gleb Levitski and Janez Demšar #27412, #30013
Fix The functions metrics.mean_squared_log_error and metrics.root_mean_squared_log_error now check whether the inputs are within the correct domain for the function \(y=\log(1+x)\), rather than \(y=\log(x)\). The functions metrics.mean_absolute_error, metrics.mean_absolute_percentage_error, metrics.mean_squared_error and metrics.root_mean_squared_error now explicitly check whether a scalar will be returned when multioutput=uniform_average. By Virgil Chan #29709
API Change The force_all_finite parameter of the functions metrics.pairwise.check_pairwise_arrays and metrics.pairwise_distances is renamed to ensure_all_finite. force_all_finite will be removed in 1.8. By Jérémie du Boisberranger #29404
API Change scoring="neg_max_error" should be used instead of scoring="max_error", which is now deprecated. By Farid “Freddie” Taba #29462
API Change The default value of the response_method parameter of metrics.make_scorer will change from None to "predict" and None will be removed in 1.8. In the meantime, None is equivalent to "predict". By Jérémie du Boisberranger #30001
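A hedged sketch of the new despine keyword on the curve displays (matplotlib is required to render the figure); it removes the top and right spines of the generated plot.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# despine=True drops the top and right spines for a cleaner figure.
RocCurveDisplay.from_estimator(clf, X_test, y_test, despine=True)
```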
sklearn.model_selection#
Enhancement GroupKFold now has the ability to shuffle groups into different folds when shuffle=True (example below). By Zachary Vealey #28519
Enhancement There is no need to call fit on a FixedThresholdClassifier if the underlying estimator is already fitted. By Adrin Jalali #30172
Fix Improve the error message when model_selection.RepeatedStratifiedKFold.split is called without a y argument. By Anurag Varma #29402
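A minimal sketch of the new shuffling behaviour in GroupKFold: groups are shuffled before being assigned to folds, with random_state controlling reproducibility. Samples sharing a group still never appear in both the train and test side of the same split.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3])

cv = GroupKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y, groups):
    print(test_idx)  # each test fold holds the samples of one (shuffled) group
```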
sklearn.neighbors#
Enhancement neighbors.NearestNeighbors, neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor, neighbors.KNeighborsTransformer, neighbors.RadiusNeighborsTransformer, and neighbors.LocalOutlierFactor now work with metric="nan_euclidean", supporting nan inputs. By Carlo Lemos, Guillaume Lemaitre, and Adrin Jalali #25330
Enhancement Add neighbors.NearestCentroid.decision_function, neighbors.NearestCentroid.predict_proba and neighbors.NearestCentroid.predict_log_proba to the neighbors.NearestCentroid estimator class. Support the case when X is sparse and shrinking_threshold is not None in neighbors.NearestCentroid. By Matthew Ning #26689
Enhancement Make predict, predict_proba, and score of neighbors.KNeighborsClassifier and neighbors.RadiusNeighborsClassifier accept X=None as input (example below). In this case predictions for all training set points are returned, and points are not included in their own neighbors. By Dmitry Kobak #30047
Fix neighbors.LocalOutlierFactor raises a warning in the fit method when duplicate values in the training data lead to inaccurate outlier detection. By Henrique Caroço #28773
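A minimal sketch of the new X=None behaviour: predictions are computed for the training points themselves with each point excluded from its own neighborhood, which gives a leave-one-out style training score.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# X=None: predict on the training points, excluding each point from its own neighbors.
print(knn.score(None, y))
```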
sklearn.neural_network#
Fix neural_network.MLPRegressor no longer crashes when the model diverges and early_stopping is enabled. By Marc Bresson #29773
sklearn.pipeline#
Major Feature pipeline.Pipeline can now transform metadata up to the step requiring the metadata, which can be set using the transform_input parameter (see the sketch below). By Adrin Jalali #28901
Enhancement pipeline.Pipeline now warns about not being fitted before calling methods that require the pipeline to be fitted. This warning will become an error in 1.8. By Adrin Jalali #29868
Fix Fixed an issue with the tags and estimator type of Pipeline when the pipeline is empty. This allows the HTML representation of an empty pipeline to be rendered correctly. By Gennaro Daniele Acciaro #30203
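A hedged sketch of transform_input: the final step below is a hypothetical estimator whose fit accepts a validation set, and transform_input makes the pipeline run X_val through the earlier, already-fitted steps before handing it to that final step. Metadata routing must be enabled for this to work.

```python
import numpy as np
import sklearn
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ValidatedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator whose fit consumes a validation set."""

    def fit(self, X, y, X_val=None, y_val=None):
        self.classes_ = np.unique(y)
        # X_val arrives here already scaled by the earlier pipeline steps.
        return self

    def predict(self, X):
        return np.full(len(X), self.classes_[0])


sklearn.set_config(enable_metadata_routing=True)  # transform_input requires routing

pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", ValidatedClassifier().set_fit_request(X_val=True, y_val=True)),
    ],
    transform_input=["X_val"],  # run X_val through "scale" before it reaches clf.fit
)

X = np.random.RandomState(0).rand(20, 3)
y = np.r_[np.zeros(10), np.ones(10)]
pipe.fit(X, y, X_val=X[:5], y_val=y[:5])

sklearn.set_config(enable_metadata_routing=False)
```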
sklearn.preprocessing#
Enhancement Added a warn option to the handle_unknown parameter in preprocessing.OneHotEncoder (example below). By Gleb Levitski #28637
Enhancement The HTML representation of preprocessing.FunctionTransformer will show the function name in the label. By Yao Xiao #29158
Fix preprocessing.PowerTransformer now uses scipy.special.inv_boxcox to output nan if the input of the Box-Cox inverse is invalid. By Xuefeng Xu #27875
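A minimal sketch of the new handle_unknown="warn" option: an unknown category seen at transform time now triggers a warning instead of an error, and the row is still encoded (the exact encoding of the unknown category follows the documented behaviour of the option).

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown="warn", sparse_output=False)
enc.fit(np.array([["cat"], ["dog"]]))

# "bird" was not seen during fit: a warning is raised, no error.
print(enc.transform(np.array([["bird"]])))
```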
sklearn.semi_supervised#
API Change semi_supervised.SelfTrainingClassifier deprecated the base_estimator parameter in favor of estimator. By Adam Li #28494
sklearn.tree#
Feature tree.ExtraTreeClassifier and tree.ExtraTreeRegressor now support missing values in the data matrix X. Missing values are handled by randomly moving all of the samples to the left or right child node as the tree is traversed. By Adam Li and Loïc Estève #27966, #30318
Fix Escape double quotes for labels and feature names when exporting trees to Graphviz format. By Santiago M. Mola #17575
sklearn.utils#
Enhancement utils.check_array now accepts ensure_non_negative to check for negative values in the passed array, until now only available through calling utils.check_non_negative. By Tamara Atanasoska #29540
Enhancement check_estimator and parametrize_with_checks now check and fail if the classifier has the tags.classifier_tags.multi_class = False tag but does not fail on multi-class data. By Adrin Jalali #29874
Enhancement utils.validation.check_is_fitted now passes on stateless estimators. An estimator can indicate it is stateless by setting the requires_fit tag. See Estimator Tags for more information. By Adrin Jalali #29880
Enhancement Changes to check_estimator and parametrize_with_checks. check_estimator introduces new arguments: on_skip, on_fail, and callback to control the behavior of the check runner; refer to the API documentation for more details. generate_only=True is deprecated in check_estimator; use estimator_checks_generator instead. The _xfail_checks estimator tag is now removed; to indicate which tests are expected to fail, you can instead pass a dictionary to check_estimator as the expected_failed_checks parameter (example below). Similarly, the expected_failed_checks parameter in parametrize_with_checks can be used, which is a callable returning a dictionary of the form {"check_name": "reason to mark this check as xfail"}.
Fix utils.estimator_checks.parametrize_with_checks and utils.estimator_checks.check_estimator now support estimators that have set_output called on them. By Adrin Jalali #29869
API Change The force_all_finite parameter of the functions utils.check_array, utils.check_X_y and utils.as_float_array is renamed to ensure_all_finite. force_all_finite will be removed in 1.8. By Jérémie du Boisberranger #29404
API Change utils.estimator_checks.check_sample_weights_invariance is replaced by utils.estimator_checks.check_sample_weight_equivalence_on_dense_data, which uses integer (including zero) weights, and utils.estimator_checks.check_sample_weight_equivalence_on_sparse_data, which does the same on sparse data. By Antoine Baker #29818, #30137
API Change Using _estimator_type to set the estimator type is deprecated. Inherit from ClassifierMixin, RegressorMixin, TransformerMixin, or OutlierMixin instead. Alternatively, you can set estimator_type in Tags in the __sklearn_tags__ method. By Adrin Jalali #30122
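A hedged sketch of the new expected_failed_checks parameter that replaces the removed _xfail_checks tag; the estimator and the commented-out check name below are purely illustrative.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

check_estimator(
    LogisticRegression(),
    expected_failed_checks={
        # "check_name": "reason to mark this check as xfail",
    },
)
```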
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.7, including:
TODO: update at the time of the release.