Version 1.2
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 1.2.
Legend for changelogs
Major Feature something big that you couldn’t do before.
Feature something that you couldn’t do before.
Efficiency an existing feature now may not require as much computation or memory.
Enhancement a miscellaneous minor improvement.
Fix something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.2.2
March 2023
Changelog
sklearn.base
Fix When set_output(transform="pandas") is set, base.TransformerMixin maintains the index if the transform output is already a DataFrame. #25747 by Thomas Fan.
sklearn.calibration
Fix A deprecation warning is raised when using the base_estimator__ prefix to set parameters of the estimator used in calibration.CalibratedClassifierCV. #25477 by Tim Head.
sklearn.cluster
Fix Fixed a bug in cluster.BisectingKMeans that could make fit fail at random due to a permutation of the labels when running multiple inits. #25563 by Jérémie du Boisberranger.
sklearn.compose
Fix Fixes a bug in compose.ColumnTransformer, which now supports an empty selection of columns when set_output(transform="pandas") is set. #25570 by Thomas Fan.
sklearn.ensemble
Fix A deprecation warning is raised when using the base_estimator__ prefix to set parameters of the estimator used in ensemble.AdaBoostClassifier, ensemble.AdaBoostRegressor, ensemble.BaggingClassifier, and ensemble.BaggingRegressor. #25477 by Tim Head.
sklearn.feature_selection
Fix Fixed a regression where a negative tol was no longer accepted by feature_selection.SequentialFeatureSelector. #25664 by Jérémie du Boisberranger.
sklearn.inspection
Fix Raise a more informative error message in inspection.partial_dependence when dealing with mixed data type categories that cannot be sorted by numpy.unique. This problem usually happens when categories are str and missing values are present as np.nan. #25774 by Guillaume Lemaitre.
sklearn.isotonic
Fix Fixes a bug in isotonic.IsotonicRegression where isotonic.IsotonicRegression.predict would return a pandas DataFrame when the global configuration sets transform_output="pandas". #25500 by Guillaume Lemaitre.
sklearn.preprocessing
Fix preprocessing.OneHotEncoder.drop_idx_ now properly references the dropped category in the categories_ attribute when there are infrequent categories. #25589 by Thomas Fan.
Fix preprocessing.OrdinalEncoder now correctly supports encoded_missing_value or unknown_value set to the categories' cardinality when there are missing values in the training data. #25704 by Thomas Fan.
sklearn.tree
Fix Fixed a regression in tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor where an error was no longer raised in version 1.2 when min_samples_split=1. #25744 by Jérémie du Boisberranger.
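Here is a minimal sketch of the restored validation, assuming scikit-learn >= 1.2.2 and a made-up toy dataset. scikit-learn raises its parameter-validation errors as a subclass of ValueError, so catching ValueError is enough here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# min_samples_split must be an int >= 2 or a float in (0.0, 1.0],
# so the integer 1 is rejected when fit validates the parameters.
try:
    DecisionTreeClassifier(min_samples_split=1).fit(X, y)
    raised = False
except ValueError as exc:
    raised = True
    print(f"{type(exc).__name__}: {exc}")
```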
sklearn.utils
Fix Fixes a bug in utils.check_array which now correctly performs non-finite validation with the Array API specification. #25619 by Thomas Fan.
Fix utils.multiclass.type_of_target can identify pandas nullable data types as classification targets. #25638 by Thomas Fan.
Version 1.2.1
January 2023
Changed models
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Fix The fitted components in decomposition.MiniBatchDictionaryLearning might differ. The online updates of the sufficient statistics now properly take the sizes of the batches into account. #25354 by Jérémie du Boisberranger.
Fix The categories_ attribute of preprocessing.OneHotEncoder now always contains an array of objects when using predefined categories that are strings. Predefined categories encoded as bytes will no longer work with X encoded as strings. #25174 by Tim Head.
Changes impacting all modules
Fix Support pandas.Int64 dtyped y for classifiers and regressors. #25089 by Tim Head.
Fix Remove spurious warnings for estimators internally using neighbors search methods. #25129 by Julien Jerphanion.
Fix Fix a bug where the current configuration was ignored in estimators using n_jobs > 1. This bug was triggered for tasks dispatched by the auxiliary thread of joblib, because sklearn.get_config used to access an empty thread-local configuration instead of the configuration visible from the thread where joblib.Parallel was first called. #25363 by Guillaume Lemaitre.
Changelog
sklearn.base
Fix Fix a regression in BaseEstimator.__getstate__ that would prevent certain estimators from being pickled when using Python 3.11. #25188 by Benjamin Bossan.
Fix Inheriting from base.TransformerMixin will only wrap the transform method if the class defines transform itself. #25295 by Thomas Fan.
sklearn.datasets
Fix Fixes an inconsistency in datasets.fetch_openml between the liac-arff and pandas parsers when a leading space is introduced after the delimiter. The ARFF specification requires the leading space to be ignored. #25312 by Guillaume Lemaitre.
Fix Fixes a bug in datasets.fetch_openml when using parser="pandas" where single quote and backslash escape characters were not properly handled. #25511 by Guillaume Lemaitre.
sklearn.decomposition
Fix Fixed a bug in decomposition.MiniBatchDictionaryLearning where the online updates of the sufficient statistics were not correct when calling partial_fit on batches of different sizes. #25354 by Jérémie du Boisberranger.
Fix decomposition.DictionaryLearning better supports readonly NumPy arrays. In particular, it better supports large datasets which are memory-mapped when it is used with coordinate descent algorithms (i.e. when fit_algorithm='cd'). #25172 by Julien Jerphanion.
sklearn.ensemble
Fix ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now support sparse readonly datasets. #25341 by Julien Jerphanion.
sklearn.feature_extraction
Fix feature_extraction.FeatureHasher raises an informative error when the input is a list of strings. #25094 by Thomas Fan.
sklearn.linear_model
Fix Fix a regression in linear_model.SGDClassifier and linear_model.SGDRegressor that made them unusable with the verbose parameter set to a value greater than 0. #25250 by Jérémie Du Boisberranger.
sklearn.manifold
Fix manifold.TSNE now works correctly when the output type is set to pandas. #25370 by Tim Head.
sklearn.model_selection
Fix With multimetric scoring, model_selection.cross_validate now returns proper scores for the non-failing scorers, instead of error_score values, when some of the scorers fail. #23101 by András Simon and Thomas Fan.
sklearn.neural_network
Fix neural_network.MLPClassifier and neural_network.MLPRegressor no longer raise warnings when fitting data with feature names. #24873 by Tim Head.
Fix Improves error message in neural_network.MLPClassifier and neural_network.MLPRegressor when early_stopping=True and partial_fit is called. #25694 by Thomas Fan.
sklearn.preprocessing
Fix preprocessing.FunctionTransformer.inverse_transform correctly supports DataFrames that are all numerical when check_inverse=True. #25274 by Thomas Fan.
Fix preprocessing.SplineTransformer.get_feature_names_out correctly returns feature names when extrapolation="periodic". #25296 by Thomas Fan.
sklearn.tree
Fix tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor now support sparse readonly datasets. #25341 by Julien Jerphanion.
sklearn.utils
Fix Restore utils.check_array's behaviour for pandas Series of type boolean. The type is maintained, instead of being converted to float64. #25147 by Tim Head.
API Change utils.fixes.delayed is deprecated in 1.2.1 and will be removed in 1.5. Instead, import utils.parallel.delayed and use it in conjunction with the newly introduced utils.parallel.Parallel to ensure proper propagation of the scikit-learn configuration to the workers. #25363 by Guillaume Lemaitre.
Version 1.2.0
December 2022
Changed models
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Enhancement The default eigen_tol for cluster.SpectralClustering, manifold.SpectralEmbedding, cluster.spectral_clustering, and manifold.spectral_embedding is now None when using the 'amg' or 'lobpcg' solvers. This change improves numerical stability of the solver, but may result in a different model.
Enhancement linear_model.GammaRegressor, linear_model.PoissonRegressor and linear_model.TweedieRegressor can reach higher precision with the lbfgs solver, in particular when tol is set to a tiny value. Moreover, verbose is now properly propagated to L-BFGS-B. #23619 by Christian Lorentzen.
Enhancement The default value for eps in metrics.log_loss has changed from 1e-15 to "auto". "auto" sets eps to np.finfo(y_pred.dtype).eps. #24354 by Safiuddin Khaja and gsiisg.
Fix Make sign of components_ deterministic in decomposition.SparsePCA. #23935 by Guillaume Lemaitre.
Fix The components_ signs in decomposition.FastICA might differ. It is now consistent and deterministic with all SVD solvers. #22527 by Meekail Zain and Thomas Fan.
Fix The condition for early stopping has now been changed in linear_model._sgd_fast._plain_sgd, which is used by linear_model.SGDRegressor and linear_model.SGDClassifier. The old condition did not disambiguate between training and validation set and had an effect of overscaling the error tolerance. This has been fixed in #23798 by Harsh Agrawal.
Fix For model_selection.GridSearchCV and model_selection.RandomizedSearchCV, ranks corresponding to nan scores will all be set to the maximum possible rank. #24543 by Guillaume Lemaitre.
API Change The default value of tol was changed from 1e-3 to 1e-4 for linear_model.ridge_regression, linear_model.Ridge and linear_model.RidgeClassifier. #24465 by Christian Lorentzen.
Changes impacting all modules
Major Feature The set_output API has been adopted by all transformers. Meta-estimators that contain transformers, such as pipeline.Pipeline or compose.ColumnTransformer, also define a set_output method. For details, see SLEP018. #23734 and #24699 by Thomas Fan.
Efficiency Low-level routines for reductions on pairwise distances for dense float32 datasets have been refactored. The following functions and estimators now benefit from improved performance in terms of hardware scalability and speed-ups:
For instance, sklearn.neighbors.NearestNeighbors.kneighbors and sklearn.neighbors.NearestNeighbors.radius_neighbors can respectively be up to ×20 and ×5 faster than previously on a laptop. Moreover, implementations of those two algorithms are now suitable for machines with many cores, making them usable for datasets consisting of millions of samples.
Enhancement Finiteness checks (detection of NaN and infinite values) in all estimators are now significantly more efficient for float32 data by leveraging NumPy’s SIMD optimized primitives. #23446 by Meekail Zain.
Enhancement Finiteness checks (detection of NaN and infinite values) in all estimators are now faster by utilizing a more efficient stop-on-first second-pass algorithm. #23197 by Meekail Zain.
Enhancement Support for combinations of dense and sparse dataset pairs for all distance metrics and for float32 and float64 datasets has been added or has seen its performance improved for the following estimators:
#23604 and #23585 by Julien Jerphanion, Olivier Grisel, and Thomas Fan, #24556 by Vincent Maladière.
Fix Systematically check the sha256 digest of dataset tarballs used in code examples in the documentation. #24617 by Olivier Grisel and Thomas Fan. Thanks to Sim4n6 for the report.
Changelog
sklearn.base
Enhancement Introduces base.ClassNamePrefixFeaturesOutMixin and base.OneToOneFeatureMixin mixins that define get_feature_names_out for common transformer use cases. #24688 by Thomas Fan.
sklearn.calibration
API Change Rename base_estimator to estimator in calibration.CalibratedClassifierCV to improve readability and consistency. The parameter base_estimator is deprecated and will be removed in 1.4. #22054 by Kevin Roice.
sklearn.cluster
Efficiency cluster.KMeans with algorithm="lloyd" is now faster and uses less memory. #24264 by Vincent Maladiere.
Enhancement The predict and fit_predict methods of cluster.OPTICS now accept sparse input data. #14736 by Hunt Zhan, #20802 by Brandon Pokorny, and #22965 by Meekail Zain.
Enhancement cluster.Birch now preserves dtype for numpy.float32 inputs. #22968 by Meekail Zain.
Enhancement cluster.KMeans and cluster.MiniBatchKMeans now accept a new 'auto' option for n_init, which changes the number of random initializations to one when using init='k-means++' for efficiency. This begins deprecation for the default values of n_init in the two classes, and both will have their defaults changed to n_init='auto' in 1.4. #23038 by Meekail Zain.
Enhancement cluster.SpectralClustering and cluster.spectral_clustering now propagate the eigen_tol parameter to all choices of eigen_solver. Includes a new option eigen_tol="auto" and begins deprecation to change the default from eigen_tol=0 to eigen_tol="auto" in version 1.3. #23210 by Meekail Zain.
Fix cluster.KMeans now supports readonly attributes when predicting. #24258 by Thomas Fan.
API Change The affinity attribute is now deprecated for cluster.AgglomerativeClustering and will be renamed to metric in v1.4. #23470 by Meekail Zain.
sklearn.datasets
Enhancement Introduce the new parameter parser in datasets.fetch_openml. parser="pandas" allows using the CPU- and memory-efficient pandas.read_csv parser to load dense ARFF formatted dataset files. It is possible to pass parser="liac-arff" to use the old LIAC parser. When parser="auto", dense datasets are loaded with “pandas” and sparse datasets are loaded with “liac-arff”. Currently, the default is parser="liac-arff"; it will change to parser="auto" in version 1.4. #21938 by Guillaume Lemaitre.
Enhancement datasets.dump_svmlight_file is now accelerated with a Cython implementation, providing 2-4x speedups. #23127 by Meekail Zain.
Enhancement Path-like objects, such as those created with pathlib, are now allowed as paths in datasets.load_svmlight_file and datasets.load_svmlight_files. #19075 by Carlos Ramos Carreño.
Fix Make sure that datasets.fetch_lfw_people and datasets.fetch_lfw_pairs internally crop images based on the slice_ parameter. #24951 by Guillaume Lemaitre.
sklearn.decomposition
Efficiency decomposition.FastICA.fit has been optimised w.r.t. its memory footprint and runtime. #22268 by MohamedBsh.
Enhancement decomposition.SparsePCA and decomposition.MiniBatchSparsePCA now implement an inverse_transform function. #23905 by Guillaume Lemaitre.
Enhancement decomposition.FastICA now allows the user to select how whitening is performed through the new whiten_solver parameter, which supports svd and eigh. whiten_solver defaults to svd, although eigh may be faster and more memory efficient in cases where num_features > num_samples. #11860 by Pierre Ablin, #22527 by Meekail Zain and Thomas Fan.
Enhancement decomposition.LatentDirichletAllocation now preserves dtype for numpy.float32 input. #24528 by Takeshi Oura and Jérémie du Boisberranger.
Fix Make sign of components_ deterministic in decomposition.SparsePCA. #23935 by Guillaume Lemaitre.
API Change The n_iter parameter of decomposition.MiniBatchSparsePCA is deprecated and replaced by the parameters max_iter, tol, and max_no_improvement to be consistent with decomposition.MiniBatchDictionaryLearning. n_iter will be removed in version 1.3. #23726 by Guillaume Lemaitre.
API Change The n_features_ attribute of decomposition.PCA is deprecated in favor of n_features_in_ and will be removed in 1.4. #24421 by Kshitij Mathur.
sklearn.discriminant_analysis
Major Feature discriminant_analysis.LinearDiscriminantAnalysis now supports the Array API for solver="svd". Array API support is considered experimental and might evolve without being subjected to our usual rolling deprecation cycle policy. See Array API support (experimental) for more details. #22554 by Thomas Fan.
Fix Validate parameters only in fit and not in __init__ for discriminant_analysis.QuadraticDiscriminantAnalysis. #24218 by Stefanie Molin.
sklearn.ensemble
Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support interaction constraints via the argument interaction_cst of their constructors. #21020 by Christian Lorentzen. Using interaction constraints also makes fitting faster. #24856 by Christian Lorentzen.
Feature Adds class_weight to ensemble.HistGradientBoostingClassifier. #22014 by Thomas Fan.
Efficiency Improve runtime performance of ensemble.IsolationForest by avoiding data copies. #23252 by Zhehao Liu.
Enhancement ensemble.StackingClassifier now accepts any kind of base estimator. #24538 by Guillem G Subies.
Enhancement Make it possible to pass the categorical_features parameter of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor as feature names. #24889 by Olivier Grisel.
Enhancement ensemble.StackingClassifier now supports multilabel-indicator target. #24146 by Nicolas Peretti, Nestor Navarro, Nati Tomattis, and Vincent Maladiere.
Enhancement ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now accept their monotonic_cst parameter to be passed as a dictionary in addition to the previously supported array-like format. Such a dictionary has feature names as keys and one of -1, 0, 1 as value to specify monotonicity constraints for each feature. #24855 by Olivier Grisel.
Enhancement Interaction constraints for ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor can now be specified as strings for two common cases: “no_interactions” and “pairwise” interactions. #24849 by Tim Head.
Fix Fixed the issue where ensemble.AdaBoostClassifier outputs NaN in feature importance when fitted with very small sample weight. #20415 by Zhehao Liu.
Fix ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor no longer error when predicting on categories encoded as negative values and instead consider them a member of the “missing category”. #24283 by Thomas Fan.
Fix ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor, with verbose>=1, print detailed timing information on computing histograms and finding best splits. The time spent in the root node was previously missing and is now included in the printed information. #24894 by Christian Lorentzen.
API Change Rename the constructor parameter base_estimator to estimator in the following classes: ensemble.BaggingClassifier, ensemble.BaggingRegressor, ensemble.AdaBoostClassifier, ensemble.AdaBoostRegressor. base_estimator is deprecated in 1.2 and will be removed in 1.4. #23819 by Adrian Trujillo and Edoardo Abati.
API Change Rename the fitted attribute base_estimator_ to estimator_ in the following classes: ensemble.BaggingClassifier, ensemble.BaggingRegressor, ensemble.AdaBoostClassifier, ensemble.AdaBoostRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor, ensemble.RandomTreesEmbedding, ensemble.IsolationForest. base_estimator_ is deprecated in 1.2 and will be removed in 1.4. #23819 by Adrian Trujillo and Edoardo Abati.
sklearn.feature_selection
Fix Fix a bug in feature_selection.mutual_info_regression and feature_selection.mutual_info_classif: the continuous features in X are now scaled to unit variance regardless of whether the target y is continuous or discrete. #24747 by Guillaume Lemaitre.
sklearn.gaussian_process
Fix Fix gaussian_process.kernels.Matern gradient computation with nu=0.5 for PyPy (and possibly other non-CPython interpreters). #24245 by Loïc Estève.
Fix The fit method of gaussian_process.GaussianProcessRegressor no longer modifies the input X when a custom kernel is used whose diag method returns part of the input X. #24405 by Omar Salman.
sklearn.impute
Enhancement Added keep_empty_features parameter to impute.SimpleImputer, impute.KNNImputer and impute.IterativeImputer, preventing removal of features containing only missing values when transforming. #16695 by Vitor Santa Rosa.
sklearn.inspection
Major Feature Extended inspection.partial_dependence and inspection.PartialDependenceDisplay to handle categorical features. #18298 by Madhura Jayaratne and Guillaume Lemaitre.
Fix inspection.DecisionBoundaryDisplay now raises an error if input data is not 2-dimensional. #25077 by Arturo Amor.
sklearn.kernel_approximation
Enhancement kernel_approximation.RBFSampler now preserves dtype for numpy.float32 inputs. #24317 by Tim Head.
Enhancement kernel_approximation.SkewedChi2Sampler now preserves dtype for numpy.float32 inputs. #24350 by Rahil Parikh.
Enhancement kernel_approximation.RBFSampler now accepts the 'scale' option for the gamma parameter. #24755 by Gleb Levitski.
sklearn.linear_model
Enhancement linear_model.LogisticRegression, linear_model.LogisticRegressionCV, linear_model.GammaRegressor, linear_model.PoissonRegressor and linear_model.TweedieRegressor gained a new solver, solver="newton-cholesky". This is a 2nd order (Newton) optimisation routine that uses a Cholesky decomposition of the Hessian matrix. When n_samples >> n_features, the "newton-cholesky" solver has been observed to converge both faster and to a higher precision solution than the "lbfgs" solver on problems with one-hot encoded categorical variables with some rare categorical levels. #24637 and #24767 by Christian Lorentzen.
Enhancement linear_model.GammaRegressor, linear_model.PoissonRegressor and linear_model.TweedieRegressor can reach higher precision with the lbfgs solver, in particular when tol is set to a tiny value. Moreover, verbose is now properly propagated to L-BFGS-B. #23619 by Christian Lorentzen.
Fix linear_model.SGDClassifier and linear_model.SGDRegressor will raise an error when all the validation samples have zero sample weight. #23275 by Zhehao Liu.
Fix linear_model.SGDOneClassSVM no longer performs parameter validation in the constructor. All validation is now handled in fit() and partial_fit(). #24433 by Yogendrasingh, Arisa Y. and Tim Head.
Fix Fix average loss calculation when early stopping is enabled in linear_model.SGDRegressor and linear_model.SGDClassifier. Also updated the condition for early stopping accordingly. #23798 by Harsh Agrawal.
API Change The default value for the solver parameter in linear_model.QuantileRegressor will change from "interior-point" to "highs" in version 1.4. #23637 by Guillaume Lemaitre.
API Change String option "none" is deprecated for the penalty argument in linear_model.LogisticRegression, and will be removed in version 1.4. Use None instead. #23877 by Zhehao Liu.
API Change The default value of tol was changed from 1e-3 to 1e-4 for linear_model.ridge_regression, linear_model.Ridge and linear_model.RidgeClassifier. #24465 by Christian Lorentzen.
sklearn.manifold
Feature Adds an option to use the normalized stress in manifold.MDS. This is enabled by setting the new normalized_stress parameter to True. #10168 by Łukasz Borchmann, #12285 by Matthias Miltenberger, #13042 by Matthieu Parizy, #18094 by Roth E Conrad and #22562 by Meekail Zain.
Enhancement Adds eigen_tol parameter to manifold.SpectralEmbedding. Both manifold.spectral_embedding and manifold.SpectralEmbedding now propagate eigen_tol to all choices of eigen_solver. Includes a new option eigen_tol="auto" and begins deprecation to change the default from eigen_tol=0 to eigen_tol="auto" in version 1.3. #23210 by Meekail Zain.
Enhancement manifold.Isomap now preserves dtype for np.float32 inputs. #24714 by Rahil Parikh.
API Change Added an "auto" option to the normalized_stress argument in manifold.MDS and manifold.smacof. Note that normalized_stress is only valid for non-metric MDS, therefore the "auto" option enables normalized_stress when metric=False and disables it when metric=True. "auto" will become the default value for normalized_stress in version 1.4. #23834 by Meekail Zain.
sklearn.metrics
Feature metrics.ConfusionMatrixDisplay.from_estimator, metrics.ConfusionMatrixDisplay.from_predictions, and metrics.ConfusionMatrixDisplay.plot accept a text_kw parameter which is passed to matplotlib’s text function. #24051 by Thomas Fan.
Feature metrics.class_likelihood_ratios is added to compute the positive and negative likelihood ratios derived from the confusion matrix of a binary classification problem. #22518 by Arturo Amor.
Feature Add metrics.PredictionErrorDisplay to plot residuals vs predicted and actual vs predicted to qualitatively assess the behavior of a regressor. The display can be created with the class methods metrics.PredictionErrorDisplay.from_estimator and metrics.PredictionErrorDisplay.from_predictions. #18020 by Guillaume Lemaitre.
Feature metrics.roc_auc_score now supports micro-averaging (average="micro") for the One-vs-Rest multiclass case (multi_class="ovr"). #24338 by Arturo Amor.
Enhancement Adds an "auto" option to eps in metrics.log_loss. This option will automatically set the eps value depending on the data type of y_pred. In addition, the default value of eps is changed from 1e-15 to the new "auto" option. #24354 by Safiuddin Khaja and gsiisg.
Fix Allows csr_matrix as input for the parameter y_true of the metrics.label_ranking_average_precision_score metric. #23442 by Sean Atukorala.
Fix metrics.ndcg_score will now trigger a warning when y_true contains a negative value. Users may still use negative values, but the result may not be between 0 and 1. Starting in v1.4, passing in negative values for y_true will raise an error. #22710 by Conroy Trinh and #23461 by Meekail Zain.
Fix metrics.log_loss with eps=0 now returns a correct value of 0 or np.inf instead of nan for predictions at the boundaries (0 or 1). It also accepts integer input. #24365 by Christian Lorentzen.
API Change The parameter sum_over_features of metrics.pairwise.manhattan_distances is deprecated and will be removed in 1.4. #24630 by Rushil Desai.
sklearn.model_selection
Feature Added the class model_selection.LearningCurveDisplay that makes it easy to plot learning curves obtained by the function model_selection.learning_curve. #24084 by Guillaume Lemaitre.
Fix For all SearchCV classes and scipy >= 1.10, the rank corresponding to a nan score is correctly set to the maximum possible rank, rather than np.iinfo(np.int32).min. #24141 by Loïc Estève.
Fix In both model_selection.HalvingGridSearchCV and model_selection.HalvingRandomSearchCV, parameter combinations with a NaN score now share the lowest rank. #24539 by Tim Head.
Fix For model_selection.GridSearchCV and model_selection.RandomizedSearchCV, ranks corresponding to nan scores will all be set to the maximum possible rank. #24543 by Guillaume Lemaitre.
sklearn.multioutput
Feature Added boolean verbose flag to classes: multioutput.ClassifierChain and multioutput.RegressorChain. #23977 by Eric Fiegel, Chiara Marmo, Lucy Liu, and Guillaume Lemaitre.
sklearn.naive_bayes
Feature Add method predict_joint_log_proba to all naive Bayes classifiers. #23683 by Andrey Melnik.
Enhancement A new parameter force_alpha was added to naive_bayes.BernoulliNB, naive_bayes.ComplementNB, naive_bayes.CategoricalNB, and naive_bayes.MultinomialNB, allowing the user to set the parameter alpha to a very small number, greater than or equal to 0, which was previously changed automatically to 1e-10. #16747 by @arka204, #18805 by @hongshaoyang, #22269 by Meekail Zain.
sklearn.neighbors
Feature Adds new function neighbors.sort_graph_by_row_values to sort a CSR sparse graph such that each row is stored with increasing values. This is useful to improve efficiency when using precomputed sparse distance matrices in a variety of estimators and avoid an EfficiencyWarning. #23139 by Tom Dupre la Tour.
Efficiency neighbors.NearestCentroid is faster and requires less memory as it better leverages CPUs’ caches to compute predictions. #24645 by Olivier Grisel.
Enhancement neighbors.KernelDensity bandwidth parameter now accepts definition using Scott’s and Silverman’s estimation methods. #10468 by Ruben and #22993 by Jovan Stojanovic.
Enhancement neighbors.NeighborsBase now accepts Minkowski semi-metric (i.e. when 0 < p < 1 for metric="minkowski") for algorithm="auto" or algorithm="brute". #24750 by Rudresh Veerkhare.
Fix neighbors.NearestCentroid now raises an informative error message at fit-time instead of failing with a low-level error message at predict-time. #23874 by Juan Gomez.
Fix Set n_jobs=None by default (instead of 1) for neighbors.KNeighborsTransformer and neighbors.RadiusNeighborsTransformer. #24075 by Valentin Laurent.
Enhancement neighbors.LocalOutlierFactor now preserves dtype for numpy.float32 inputs. #22665 by Julien Jerphanion.
sklearn.neural_network
Fix neural_network.MLPClassifier and neural_network.MLPRegressor always expose the attributes best_loss_, validation_scores_, and best_validation_score_. best_loss_ is set to None when early_stopping=True, while validation_scores_ and best_validation_score_ are set to None when early_stopping=False. #24683 by Guillaume Lemaitre.
sklearn.pipeline
Enhancement pipeline.FeatureUnion.get_feature_names_out can now be used when one of the transformers in the pipeline.FeatureUnion is "passthrough". #24058 by Diederik Perdok.
Enhancement The pipeline.FeatureUnion class now has a named_transformers attribute for accessing transformers by name. #20331 by Christopher Flynn.
sklearn.preprocessing
Enhancement preprocessing.FunctionTransformer will always try to set n_features_in_ and feature_names_in_ regardless of the validate parameter. #23993 by Thomas Fan.
Fix preprocessing.LabelEncoder correctly encodes NaNs in transform. #22629 by Thomas Fan.
API Change The sparse parameter of preprocessing.OneHotEncoder is now deprecated and will be removed in version 1.4. Use sparse_output instead. #24412 by Rushil Desai.
sklearn.svm
API Change The class_weight_ attribute is now deprecated for svm.NuSVR, svm.SVR, svm.OneClassSVM. #22898 by Meekail Zain.
sklearn.tree
Enhancement tree.plot_tree and tree.export_graphviz now use a lower case x[i] to represent feature i. #23480 by Thomas Fan.
sklearn.utils
Feature A new module exposes development tools to discover estimators (i.e. utils.discovery.all_estimators), displays (i.e. utils.discovery.all_displays) and functions (i.e. utils.discovery.all_functions) in scikit-learn. #21469 by Guillaume Lemaitre.
Enhancement utils.extmath.randomized_svd now accepts an argument, lapack_svd_driver, to specify the lapack driver used in the internal deterministic SVD used by the randomized SVD algorithm. #20617 by Srinath Kailasa.
Enhancement utils.validation.column_or_1d now accepts a dtype parameter to specify y’s dtype. #22629 by Thomas Fan.
Enhancement utils.extmath.cartesian now accepts arrays with different dtype and will cast the output to the most permissive dtype. #25067 by Guillaume Lemaitre.
Fix utils.multiclass.type_of_target now properly handles sparse matrices. #14862 by Léonard Binet.
Fix HTML representation no longer errors when an estimator class is a value in get_params. #24512 by Thomas Fan.
Fix utils.estimator_checks.check_estimator now takes into account the requires_positive_X tag correctly. #24667 by Thomas Fan.
Fix utils.check_array now supports Pandas Series with pd.NA by raising a better error message or returning a compatible ndarray. #25080 by Thomas Fan.
API Change The extra keyword parameters of utils.extmath.density are deprecated and will be removed in 1.4. #24523 by Mia Bajic.
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.1, including:
2357juan, 3lLobo, Adam J. Stewart, Adam Kania, Adam Li, Aditya Anulekh, Admir Demiraj, adoublet, Adrin Jalali, Ahmedbgh, Aiko, Akshita Prasanth, Ala-Na, Alessandro Miola, Alex, Alexandr, Alexandre Perez-Lebel, Alex Buzenet, Ali H. El-Kassas, aman kumar, Amit Bera, András Simon, Andreas Grivas, Andreas Mueller, Andrew Wang, angela-maennel, Aniket Shirsat, Anthony22-dev, Antony Lee, anupam, Apostolos Tsetoglou, Aravindh R, Artur Hermano, Arturo Amor, as-90, ashah002, Ashwin Mathur, avm19, Azaria Gebremichael, b0rxington, Badr MOUFAD, Bardiya Ak, Bartłomiej Gońda, BdeGraaff, Benjamin Bossan, Benjamin Carter, berkecanrizai, Bernd Fritzke, Bhoomika, Biswaroop Mitra, Brandon TH Chen, Brett Cannon, Bsh, cache-missing, carlo, Carlos Ramos Carreño, ceh, chalulu, Changyao Chen, Charles Zablit, Chiara Marmo, Christian Lorentzen, Christian Ritter, Christian Veenhuis, christianwaldmann, Christine P. Chai, Claudio Salvatore Arcidiacono, Clément Verrier, crispinlogan, Da-Lan, DanGonite57, Daniela Fernandes, DanielGaerber, darioka, Darren Nguyen, davidblnc, david-cortes, David Gilbertson, David Poznik, Dayne, Dea María Léon, Denis, Dev Khant, Dhanshree Arora, Diadochokinetic, diederikwp, Dimitri Papadopoulos Orfanos, Dimitris Litsidis, drewhogg, Duarte OC, Dwight Lindquist, Eden Brekke, Edern, Edoardo Abati, Eleanore Denies, EliaSchiavon, Emir, ErmolaevPA, Fabrizio Damicelli, fcharras, Felipe Siola, Flynn, francesco-tuveri, Franck Charras, ftorres16, Gael Varoquaux, Geevarghese George, genvalen, GeorgiaMayDay, Gianr Lazz, Gleb Levitski, Glòria Macià Muñoz, Guillaume Lemaitre, Guillem García Subies, Guitared, gunesbayir, Haesun Park, Hansin Ahuja, Hao Chun Chang, Harsh Agrawal, harshit5674, hasan-yaman, henrymooresc, Henry Sorsky, Hristo Vrigazov, htsedebenham, humahn, i-aki-y, Ian Thompson, Ido M, Iglesys, Iliya Zhechev, Irene, ivanllt, Ivan Sedykh, Jack McIvor, jakirkham, JanFidor, Jason G, Jérémie du Boisberranger, Jiten Sidhpura, jkarolczak, João David, JohnathanPi, John Koumentis, John P, John Pangas, johnthagen, Jordan Fleming, Joshua Choo Yun Keat, Jovan Stojanovic, Juan Carlos Alfaro Jiménez, juanfe88, Juan Felipe Arias, JuliaSchoepp, Julien Jerphanion, jygerardy, ka00ri, Kanishk Sachdev, Kanissh, Kaushik Amar Das, Kendall, Kenneth Prabakaran, Kento Nozawa, kernc, Kevin Roice, Kian Eliasi, Kilian Kluge, Kilian Lieret, Kirandevraj, Kraig, krishna kumar, krishna vamsi, Kshitij Kapadni, Kshitij Mathur, Lauren Burke, Léonard Binet, lingyi1110, Lisa Casino, Logan Thomas, Loic Esteve, Luciano Mantovani, Lucy Liu, Maascha, Madhura Jayaratne, madinak, Maksym, Malte S. Kurz, Mansi Agrawal, Marco Edward Gorelli, Marco Wurps, Maren Westermann, Maria Telenczuk, Mario Kostelac, martin-kokos, Marvin Krawutschke, Masanori Kanazu, mathurinm, Matt Haberland, mauroantonioserrano, Max Halford, Maxi Marufo, maximeSaur, Maxim Smolskiy, Maxwell, m. bou, Meekail Zain, Mehgarg, mehmetcanakbay, Mia Bajić, Michael Flaks, Michael Hornstein, Michel de Ruiter, Michelle Paradis, Mikhail Iljin, Misa Ogura, Moritz Wilksch, mrastgoo, Naipawat Poolsawat, Naoise Holohan, Nass, Nathan Jacobi, Nawazish Alam, Nguyễn Văn Diễn, Nicola Fanelli, Nihal Thukarama Rao, Nikita Jare, nima10khodaveisi, Nima Sarajpoor, nitinramvelraj, NNLNR, npache, Nwanna-Joseph, Nymark Kho, o-holman, Olivier Grisel, Olle Lukowski, Omar Hassoun, Omar Salman, osman tamer, ouss1508, Oyindamola Olatunji, PAB, Pandata, partev, Paulo Sergio Soares, Petar Mlinarić, Peter Jansson, Peter Steinbach, Philipp Jung, Piet Brömmel, Pooja M, Pooja Subramaniam, priyam kakati, puhuk, Rachel Freeland, Rachit Keerti Das, Rafal Wojdyla, Raghuveer Bhat, Rahil Parikh, Ralf Gommers, ram vikram singh, Ravi Makhija, Rehan Guha, Reshama Shaikh, Richard Klima, Rob Crockett, Robert Hommes, Robert Juergens, Robin Lenz, Rocco Meli, Roman4oo, Ross Barnowski, Rowan Mankoo, Rudresh Veerkhare, Rushil Desai, Sabri Monaf Sabri, Safikh, Safiuddin Khaja, Salahuddin, Sam Adam Day, Sandra Yojana Meneses, Sandro Ephrem, Sangam, SangamSwadik, SANJAI_3, SarahRemus, Sashka Warner, SavkoMax, Scott Gigante, Scott Gustafson, Sean Atukorala, sec65, SELEE, seljaks, Shady el Gewily, Shane, shellyfung, Shinsuke Mori, Shiva chauhan, Shoaib Khan, Shogo Hida, Shrankhla Srivastava, Shuangchi He, Simon, sonnivs, Sortofamudkip, Srinath Kailasa, Stanislav (Stanley) Modrak, Stefanie Molin, stellalin7, Stéphane Collot, Steven Van Vaerenbergh, Steve Schmerler, Sven Stehle, Tabea Kossen, TheDevPanda, the-syd-sre, Thijs van Weezel, Thomas Bonald, Thomas Germer, Thomas J. Fan, Ti-Ion, Tim Head, Timofei Kornev, toastedyeast, Tobias Pitters, Tom Dupré la Tour, tomiock, Tom Mathews, Tom McTiernan, tspeng, Tyler Egashira, Valentin Laurent, Varun Jain, Vera Komeyer, Vicente Reyes-Puerta, Vinayak Mehta, Vincent M, Vishal, Vyom Pathak, wattai, wchathura, WEN Hao, William M, x110, Xiao Yuan, Xunius, yanhong-zhao-ef, Yusuf Raji, Z Adil Khwaja, zeeshan lone