Version 0.22
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.22.
Legend for changelogs
Major Feature something big that you couldn’t do before.
Feature something that you couldn’t do before.
Efficiency an existing feature now may not require as much computation or memory.
Enhancement a miscellaneous minor improvement.
Fix something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 0.22.2.post1
March 3 2020
The 0.22.2.post1 release includes a packaging fix for the source distribution but the content of the packages is otherwise identical to the content of the wheels with the 0.22.2 version (without the .post1 suffix). Both contain the following changes.
Changelog
sklearn.impute
Efficiency Reduce impute.KNNImputer asymptotic memory usage by chunking pairwise distance computation. #16397 by Joel Nothman.
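Chunking bounds memory by never holding the full query-by-reference distance matrix at once: each block of rows is reduced to its nearest neighbors before the next block is computed. A minimal stdlib sketch of the idea for plain Euclidean k-nearest-neighbor search; the nearest_neighbors_chunked helper and its chunk_size argument are hypothetical, and this is not the code used inside impute.KNNImputer:

```python
import math
from typing import Iterator, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two complete rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors_chunked(
    queries: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
    k: int,
    chunk_size: int = 2,
) -> Iterator[List[int]]:
    """Yield the indices of the k nearest reference rows for each query row.

    Only a chunk_size x len(reference) block of distances exists at a time,
    instead of the full len(queries) x len(reference) matrix.
    """
    for start in range(0, len(queries), chunk_size):
        chunk = queries[start:start + chunk_size]
        # Distance block for this chunk of queries only.
        block = [[euclidean(q, r) for r in reference] for q in chunk]
        for row in block:
            # Reduce each row to the indices of its k smallest distances.
            yield sorted(range(len(row)), key=row.__getitem__)[:k]


reference = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
queries = [[0.1, 0.0], [4.0, 4.0]]
print(list(nearest_neighbors_chunked(queries, reference, k=1)))  # [[0], [2]]
```

The chunk size trades peak memory against per-chunk overhead; the result does not depend on it.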
sklearn.metrics
Fix Fixed a bug in metrics.plot_roc_curve where the name of the estimator was passed to metrics.RocCurveDisplay instead of the name parameter. This resulted in a different plot when calling metrics.RocCurveDisplay.plot on subsequent calls. #16500 by Guillaume Lemaitre.
Fix Fixed a bug in metrics.plot_precision_recall_curve where the name of the estimator was passed to metrics.PrecisionRecallDisplay instead of the name parameter. This resulted in a different plot when calling metrics.PrecisionRecallDisplay.plot on subsequent calls. #16505 by Guillaume Lemaitre.
sklearn.neighbors
Fix Fix a bug which converted a list of arrays into a 2-D object array instead of a 1-D array containing NumPy arrays. This bug was affecting neighbors.NearestNeighbors.radius_neighbors. #16076 by Guillaume Lemaitre and Alex Shacked.
Version 0.22.1
January 2 2020
This is a bug-fix release to primarily resolve some packaging issues in version 0.22.0. It also includes minor documentation improvements and some bug fixes.
Changelog
sklearn.cluster
Fix cluster.KMeans with algorithm="elkan" now uses the same stopping criterion as with the default algorithm="full". #15930 by @inder128.
sklearn.inspection
Fix inspection.permutation_importance now returns the same importances when a random_state is given, for both n_jobs=1 and n_jobs>1, with both shared-memory backends (thread-safety) and isolated-memory, process-based backends. It also avoids casting the data as object dtype and avoids a read-only error on large dataframes with n_jobs>1, as reported in #15810. Follow-up of #15898 by Shivam Gargsya. #15933 by Guillaume Lemaitre and Olivier Grisel.
Fix inspection.plot_partial_dependence and inspection.PartialDependenceDisplay.plot now consistently check the number of axes passed in. #15760 by Thomas Fan.
sklearn.metrics
Fix metrics.plot_confusion_matrix now raises an error when normalize is invalid. Previously, it ran fine with no normalization. #15888 by Hanmin Qin.
Fix metrics.plot_confusion_matrix now colors the label color correctly to maximize contrast with its background. #15936 by Thomas Fan and @DizietAsahi.
Fix metrics.classification_report no longer ignores the value of the zero_division keyword argument. #15879 by Bibhash Chandra Mitra.
Fix Fixed a bug in metrics.plot_confusion_matrix to correctly pass the values_format parameter to the metrics.ConfusionMatrixDisplay plot() call. #15937 by Stephen Blystone.
sklearn.model_selection
Fix model_selection.GridSearchCV and model_selection.RandomizedSearchCV accept scalar values provided in fit_params. A change in 0.22 was breaking backward compatibility. #15863 by Adrin Jalali and Guillaume Lemaitre.
sklearn.naive_bayes
Fix Removed the abstractmethod decorator for the method _check_X in naive_bayes.BaseNB that could break downstream projects inheriting from this deprecated public base class. #15996 by Brigitta Sipőcz.
sklearn.preprocessing
Fix preprocessing.QuantileTransformer now guarantees that the quantiles_ attribute is completely sorted in non-decreasing order. #15751 by Tirth Patel.
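Interpolated quantiles can come out very slightly out of order because of floating-point rounding. A running maximum is one simple way to restore a non-decreasing sequence; the stdlib sketch below (hypothetical enforce_non_decreasing helper) illustrates that guarantee and is not presented as scikit-learn's own code:

```python
from itertools import accumulate
from typing import List, Sequence


def enforce_non_decreasing(quantiles: Sequence[float]) -> List[float]:
    """Return a copy of quantiles where each value is at least the previous one."""
    # accumulate(..., max) yields the running maximum of the sequence.
    return list(accumulate(quantiles, max))


# A tiny rounding glitch makes the second value smaller than the first.
raw = [0.30000000000000004, 0.3, 0.5, 0.9]
print(enforce_non_decreasing(raw))
# [0.30000000000000004, 0.30000000000000004, 0.5, 0.9]
```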
sklearn.semi_supervised
Fix semi_supervised.LabelPropagation and semi_supervised.LabelSpreading now allow a callable kernel function to return a sparse weight matrix. #15868 by Niklas Smedemark-Margulies.
sklearn.utils
Fix utils.check_array now correctly converts pandas DataFrame with boolean columns to floats. #15797 by Thomas Fan.
Fix utils.validation.check_is_fitted again accepts an explicit attributes argument to check for specific attributes as explicit markers of a fitted estimator. When no explicit attributes are provided, only the attributes that end with an underscore and do not start with a double underscore are used as "fitted" markers. The all_or_any argument is also no longer deprecated. This change is made to restore some backward compatibility with the behavior of this utility in version 0.21. #15947 by Thomas Fan.
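The default rule (attributes that end with an underscore and do not start with a double underscore) fits in a one-line predicate. This stdlib sketch uses a hypothetical looks_fitted helper and ToyEstimator class; it is not utils.validation.check_is_fitted itself, which raises NotFittedError instead of returning a boolean:

```python
class ToyEstimator:
    """Hypothetical estimator used only to illustrate the fitted-attribute rule."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # constructor parameter: no trailing underscore

    def fit(self, X):
        self.mean_ = sum(X) / len(X)  # learned attribute: trailing underscore
        return self


def looks_fitted(estimator) -> bool:
    """True if any instance attribute ends with '_' but does not start with '__'."""
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )


est = ToyEstimator()
print(looks_fitted(est))                  # False
print(looks_fitted(est.fit([1, 2, 3])))   # True
```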
Version 0.22.0
December 3 2019
Website update
Our website was revamped and given a fresh new look. #14849 by Thomas Fan.
Clear definition of the public API
Scikit-learn has a public API, and a private API.
We do our best not to break the public API, and to only introduce backward-compatible changes that do not require any user action. However, in cases where that’s not possible, any change to the public API is subject to a deprecation cycle of two minor versions. The private API isn’t publicly documented and isn’t subject to any deprecation cycle, so users should not rely on its stability.
A function or object is public if it is documented in the API Reference and if it can be imported with an import path without leading underscores. For example, sklearn.pipeline.make_pipeline is public, while sklearn.pipeline._name_estimators is private. sklearn.ensemble._gb.BaseEnsemble is private too because the whole _gb module is private.
Up to 0.22, some tools were de facto public (no leading underscore), while they should have been private in the first place. In version 0.22, these tools have been made properly private, and the public API space has been cleaned. In addition, importing from most sub-modules is now deprecated: you should, for example, use from sklearn.cluster import Birch instead of from sklearn.cluster.birch import Birch (in practice, birch.py has been moved to _birch.py).
Note
All the tools in the public API should be documented in the API Reference. If you find a public tool (without leading underscore) that isn’t in the API reference, that means it should either be private or documented. Please let us know by opening an issue!
This work was tracked in issue 9250 and issue 12927.
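The underscore half of the public-API rule can be checked from an import path alone. A stdlib sketch with a hypothetical has_public_import_path helper; it cannot tell whether a name is documented in the API Reference, which is the other half of the rule:

```python
def has_public_import_path(path: str) -> bool:
    """True if no dotted component of the import path starts with '_'."""
    return not any(part.startswith("_") for part in path.split("."))


for path in (
    "sklearn.pipeline.make_pipeline",      # public
    "sklearn.pipeline._name_estimators",   # private function
    "sklearn.ensemble._gb.BaseEnsemble",   # private because _gb is private
):
    print(path, has_public_import_path(path))
```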
Deprecations: using FutureWarning from now on
When deprecating a feature, previous versions of scikit-learn used to raise a DeprecationWarning. Since DeprecationWarnings aren't shown by default by Python, scikit-learn needed to resort to a custom warning filter to always show the warnings. That filter would sometimes interfere with users' custom warning filters.
Starting from version 0.22, scikit-learn will show FutureWarnings for deprecations, as recommended by the Python documentation. FutureWarnings are always shown by default by Python, so the custom filter has been removed and scikit-learn no longer interferes with user filters. #15080 by Nicolas Hug.
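The practical difference can be sketched with the standard warnings module. The resolve_strategy helper below is hypothetical: it emits a FutureWarning the way a deprecated default value might, and the last lines show that users stay free to install their own filter:

```python
import warnings


def resolve_strategy(strategy="warn"):
    """Hypothetical helper whose default value is scheduled to change."""
    if strategy == "warn":
        warnings.warn(
            "The default value of strategy will change in a future version.",
            FutureWarning,
        )
        strategy = "old_default"
    return strategy


# FutureWarning is not a DeprecationWarning subclass, so Python's default
# filters, which hide DeprecationWarning outside __main__, do not hide it.
print(issubclass(FutureWarning, DeprecationWarning))  # False

# A user-installed filter still silences it; nothing overrides that choice.
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=FutureWarning)
    print(resolve_strategy())  # old_default
```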
Changed models
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
cluster.KMeans when n_jobs=1. Fix
decomposition.SparseCoder, decomposition.DictionaryLearning, and decomposition.MiniBatchDictionaryLearning Fix
decomposition.SparseCoder with algorithm='lasso_lars' Fix
decomposition.SparsePCA where normalize_components has no effect due to deprecation.
ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor Fix, Feature, Enhancement.
impute.IterativeImputer when X has features with no missing values. Feature
linear_model.Ridge when X is sparse. Fix
model_selection.StratifiedKFold and any use of cv=int with a classifier. Fix
cross_decomposition.CCA when using scipy >= 1.3 Fix
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
Changelog
sklearn.base
API Change From version 0.24, base.BaseEstimator.get_params will raise an AttributeError rather than return None for parameters that are in the estimator's constructor but not stored as attributes on the instance. #14464 by Joel Nothman.
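get_params discovers parameter names from the constructor signature and reads each one back from the instance. The stdlib sketch below (hypothetical get_params_sketch function and Forgetful class) contrasts the old lenient lookup, which returned None for a missing attribute, with the stricter lookup that raises AttributeError:

```python
import inspect


def get_params_sketch(estimator, strict=True):
    """Read constructor parameters back from an instance.

    strict=False mimics the old behavior (missing attribute -> None);
    strict=True mimics the announced behavior (missing attribute -> AttributeError).
    """
    names = [
        p.name
        for p in inspect.signature(type(estimator).__init__).parameters.values()
        if p.name != "self" and p.kind != p.VAR_KEYWORD
    ]
    if strict:
        return {name: getattr(estimator, name) for name in names}
    return {name: getattr(estimator, name, None) for name in names}


class Forgetful:
    """Hypothetical estimator that does not store its beta parameter."""

    def __init__(self, alpha=1.0, beta=2.0):
        self.alpha = alpha  # stored, as scikit-learn conventions require
        # beta is never stored as an attribute: the problematic case


print(get_params_sketch(Forgetful(), strict=False))  # {'alpha': 1.0, 'beta': None}
try:
    get_params_sketch(Forgetful())
except AttributeError as exc:
    print("AttributeError:", exc)
```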
sklearn.calibration
Fix Fixed a bug that made calibration.CalibratedClassifierCV fail when given a sample_weight parameter of type list (in the case where sample_weights are not supported by the wrapped estimator). #13575 by William de Vazelhes.
sklearn.cluster
Feature cluster.SpectralClustering now accepts a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh.
Enhancement cluster.SpectralClustering now accepts an n_components parameter. This parameter extends SpectralClustering class functionality to match cluster.spectral_clustering. #13726 by Shuzhe Xiao.
Fix Fixed a bug where cluster.KMeans produced inconsistent results between n_jobs=1 and n_jobs>1 due to the handling of the random state. #9288 by Bryan Yang.
Fix Fixed a bug where the elkan algorithm in cluster.KMeans was producing a segmentation fault on large arrays due to integer index overflow. #15057 by Vladimir Korolev.
Fix MeanShift now accepts a max_iter with a default value of 300 instead of always using the default 300. It also now exposes an n_iter_ indicating the maximum number of iterations performed on each seed. #15120 by Adrin Jalali.
Fix cluster.AgglomerativeClustering and cluster.FeatureAgglomeration now raise an error if affinity='cosine' and X has samples that are all-zeros. #7943 by @mthorrell.
sklearn.compose
Feature Adds compose.make_column_selector which is used with compose.ColumnTransformer to select DataFrame columns on the basis of name and dtype. #12303 by Thomas Fan.
Fix Fixed a bug in compose.ColumnTransformer which failed to select the proper columns when using a boolean list, with NumPy older than 1.12. #14510 by Guillaume Lemaitre.
Fix Fixed a bug in compose.TransformedTargetRegressor which did not pass **fit_params to the underlying regressor. #14890 by Miguel Cabrera.
Fix The compose.ColumnTransformer now requires the number of features to be consistent between fit and transform. A FutureWarning is raised now, and this will raise an error in 0.24. If the number of features isn't consistent and negative indexing is used, an error is raised. #14544 by Adrin Jalali.
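compose.make_column_selector returns a callable that picks columns by a name pattern and by dtype. The stdlib sketch below only mimics that selection logic on a plain mapping of column name to dtype string; the select_columns helper and the mapping format are assumptions for illustration, since the real selector operates on a pandas DataFrame:

```python
import re
from typing import Dict, Iterable, List, Optional


def select_columns(
    schema: Dict[str, str],
    pattern: Optional[str] = None,
    dtype_include: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return column names matching pattern (regex search) whose dtype is included."""
    include = set(dtype_include) if dtype_include is not None else None
    selected = []
    for name, dtype in schema.items():
        if include is not None and dtype not in include:
            continue
        if pattern is not None and re.search(pattern, name) is None:
            continue
        selected.append(name)
    return selected


schema = {"age": "int64", "income": "float64", "city": "object", "city_code": "object"}
print(select_columns(schema, dtype_include=["object"]))  # ['city', 'city_code']
print(select_columns(schema, pattern="^city$"))          # ['city']
```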
sklearn.cross_decomposition
Feature cross_decomposition.PLSCanonical and cross_decomposition.PLSRegression have a new inverse_transform method to transform data back to the original space. #15304 by Jaime Ferrando Huertas.
Enhancement decomposition.KernelPCA now properly checks the eigenvalues found by the solver for numerical or conditioning issues. This ensures consistency of results across solvers (different choices for eigen_solver), including approximate solvers such as 'randomized' and 'lobpcg' (see #12068). #12145 by Sylvain Marié.
Fix Fixed a bug where cross_decomposition.PLSCanonical and cross_decomposition.PLSRegression were raising an error when fitted with a target matrix Y in which the first column was constant. #13609 by Camila Williamson.
Fix cross_decomposition.CCA now produces the same results with scipy 1.3 and previous scipy versions. #15661 by Thomas Fan.
sklearn.datasets
Feature datasets.fetch_openml now supports heterogeneous data using pandas by setting as_frame=True. #13902 by Thomas Fan.
Feature datasets.fetch_openml now includes the target_names in the returned Bunch. #15160 by Thomas Fan.
Enhancement The parameter return_X_y was added to datasets.fetch_20newsgroups and datasets.fetch_olivetti_faces. #14259 by Sourav Singh.
Enhancement datasets.make_classification now accepts an array-like weights parameter, i.e. list or numpy.array, instead of list only. #14764 by Cat Chenal.
Enhancement The parameter normalize was added to datasets.fetch_20newsgroups_vectorized. #14740 by Stéphan Tulkens.
Fix Fixed a bug in datasets.fetch_openml, which failed to load an OpenML dataset that contains an ignored feature. #14623 by Sarra Habchi.
sklearn.decomposition
Efficiency decomposition.NMF with solver="mu" fitted on sparse input matrices now uses batching to avoid briefly allocating an array with size (#non-zero elements, n_components). #15257 by Mart Willocx.
Enhancement decomposition.dict_learning and decomposition.dict_learning_online now accept method_max_iter and pass it to decomposition.sparse_encode. #12650 by Adrin Jalali.
Enhancement decomposition.SparseCoder, decomposition.DictionaryLearning, and decomposition.MiniBatchDictionaryLearning now take a transform_max_iter parameter and pass it to either decomposition.dict_learning or decomposition.sparse_encode. #12650 by Adrin Jalali.
Enhancement decomposition.IncrementalPCA now accepts sparse matrices as input, converting them to dense in batches thereby avoiding the need to store the entire dense matrix at once. #13960 by Scott Gigante.
Fix decomposition.sparse_encode now passes the max_iter to the underlying linear_model.LassoLars when algorithm='lasso_lars'. #12650 by Adrin Jalali.
sklearn.dummy
Fix dummy.DummyClassifier now handles checking the existence of the provided constant in multioutput cases. #14908 by Martina G. Vilas.
API Change The default value of the strategy parameter in dummy.DummyClassifier will change from 'stratified' in version 0.22 to 'prior' in 0.24. A FutureWarning is raised when the default value is used. #15382 by Thomas Fan.
API Change The outputs_2d_ attribute is deprecated in dummy.DummyClassifier and dummy.DummyRegressor. It is equivalent to n_outputs > 1. #14933 by Nicolas Hug.
sklearn.ensemble
Major Feature Added ensemble.StackingClassifier and ensemble.StackingRegressor to stack predictors using a final classifier or regressor. #11047 by Guillaume Lemaitre and Caio Oliveira and #15138 by Jon Cusick.
Major Feature Many improvements were made to ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor:
- Feature Estimators now natively support dense data with missing values both for training and predicting. They also support infinite values. #13911 and #14406 by Nicolas Hug, Adrin Jalali and Olivier Grisel.
- Feature Estimators now have an additional warm_start parameter that enables warm starting. #14012 by Johann Faouzi.
- Feature inspection.partial_dependence and inspection.plot_partial_dependence now support the fast ‘recursion’ method for both estimators. #13769 by Nicolas Hug.
- Enhancement For ensemble.HistGradientBoostingClassifier the training loss or score is now monitored on a class-wise stratified subsample to preserve the class balance of the original training set. #14194 by Johann Faouzi.
- Enhancement ensemble.HistGradientBoostingRegressor now supports the ‘least_absolute_deviation’ loss. #13896 by Nicolas Hug.
- Fix Estimators now bin the training and validation data separately to avoid any data leak. #13933 by Nicolas Hug.
- Fix Fixed a bug where early stopping would break with string targets. #14710 by Guillaume Lemaitre.
- Fix ensemble.HistGradientBoostingClassifier now raises an error if categorical_crossentropy loss is given for a binary classification problem. #14869 by Adrin Jalali.
Note that pickles from 0.21 will not work in 0.22.
Enhancement Addition of the max_samples argument allows limiting the size of bootstrap samples to be less than the size of the dataset. Added to ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor. #14682 by Matt Hancock and #5963 by Pablo Duboue.
Fix ensemble.VotingClassifier.predict_proba will no longer be present when voting='hard'. #14287 by Thomas Fan.
Fix The named_estimators_ attribute in ensemble.VotingClassifier and ensemble.VotingRegressor now correctly maps to dropped estimators. Previously, the named_estimators_ mapping was incorrect whenever one of the estimators was dropped. #15375 by Thomas Fan.
Fix utils.estimator_checks.check_estimator is now run by default on both ensemble.VotingClassifier and ensemble.VotingRegressor. This solves issues regarding shape consistency during predict, which was failing when the underlying estimators were not outputting consistent array dimensions. Note that it should be replaced by refactoring the common tests in the future. #14305 by Guillaume Lemaitre.
Fix ensemble.AdaBoostClassifier computes probabilities based on the decision function as in the literature. Thus, predict and predict_proba give consistent results. #14114 by Guillaume Lemaitre.
Fix Stacking and Voting estimators now ensure that their underlying estimators are either all classifiers or all regressors. ensemble.StackingClassifier, ensemble.StackingRegressor, ensemble.VotingClassifier and ensemble.VotingRegressor now raise consistent error messages. #15084 by Guillaume Lemaitre.
Fix ensemble.AdaBoostRegressor now normalizes the loss by the maximum over the samples with non-null weights only. #14294 by Guillaume Lemaitre.
API Change presort is now deprecated in ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor, and the parameter has no effect. Users are recommended to use ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor instead. #14907 by Adrin Jalali.
sklearn.feature_extraction
Enhancement A warning will now be raised if a parameter choice means that another parameter will be unused on calling the fit() method for feature_extraction.text.HashingVectorizer, feature_extraction.text.CountVectorizer and feature_extraction.text.TfidfVectorizer. #14602 by Gaurav Chawla.
Fix Functions created by build_preprocessor and build_analyzer of feature_extraction.text.VectorizerMixin can now be pickled. #14430 by Dillon Niederhut.
Fix feature_extraction.text.strip_accents_unicode now correctly removes accents from strings that are in NFKD normalized form. #15100 by Daniel Grady.
Fix Fixed a bug that caused feature_extraction.DictVectorizer to raise an OverflowError during the transform operation when producing a scipy.sparse matrix on large input data. #15463 by Norvan Sahiner.
API Change Deprecated the unused copy param for feature_extraction.text.TfidfVectorizer.transform; it will be removed in v0.24. #14520 by Guillem G. Subies.
sklearn.feature_selection
Enhancement Updated the following sklearn.feature_selection estimators to allow NaN/Inf values in transform and fit: feature_selection.RFE, feature_selection.RFECV, feature_selection.SelectFromModel, and feature_selection.VarianceThreshold. Note that if the underlying estimator of the feature selector does not allow NaN/Inf then it will still error, but the feature selectors themselves no longer enforce this restriction unnecessarily. #11635 by Alec Peters.
Fix Fixed a bug where feature_selection.VarianceThreshold with threshold=0 did not remove constant features due to numerical instability, by using range rather than variance in this case. #13704 by Roddy MacSween.
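The range-based check for threshold=0 can be sketched in a few lines: a column is constant exactly when its maximum equals its minimum, which needs no floating-point variance at all. A simplified stdlib sketch with a hypothetical kept_features helper, keeping features whose variance is strictly above the threshold; it mirrors the idea, not scikit-learn's implementation:

```python
from statistics import pvariance
from typing import List, Sequence


def kept_features(X: Sequence[Sequence[float]], threshold: float = 0.0) -> List[int]:
    """Indices of the columns that survive a variance-threshold filter.

    With threshold == 0, a column is dropped exactly when max == min
    (zero range), so no tiny, noisy variance estimate is involved.
    """
    columns = list(zip(*X))
    if threshold == 0:
        return [j for j, col in enumerate(columns) if max(col) - min(col) > 0]
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


X = [[0.1, 1.0, 3.0], [0.1, 2.0, 3.0], [0.1, 4.0, 3.0]]
print(kept_features(X))  # [1]: columns 0 and 2 are constant
```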
sklearn.gaussian_process
Feature Gaussian process models on structured data: gaussian_process.GaussianProcessRegressor and gaussian_process.GaussianProcessClassifier can now accept a list of generic objects (e.g. strings, trees, graphs, etc.) as the X argument to their training/prediction methods. A user-defined kernel should be provided for computing the kernel matrix among the generic objects, and should inherit from gaussian_process.kernels.GenericKernelMixin to notify the GPR/GPC model that it handles non-vectorial samples. #15557 by Yu-Hang Tang.
Efficiency gaussian_process.GaussianProcessClassifier.log_marginal_likelihood and gaussian_process.GaussianProcessRegressor.log_marginal_likelihood now accept a clone_kernel=True keyword argument. When set to False, the kernel attribute is modified, but this may result in a performance improvement. #14378 by Masashi Shibata.
API Change From version 0.24, gaussian_process.kernels.Kernel.get_params will raise an AttributeError rather than return None for parameters that are in the estimator's constructor but not stored as attributes on the instance. #14464 by Joel Nothman.
sklearn.impute
Major Feature Added impute.KNNImputer, to impute missing values using k-Nearest Neighbors. #12852 by Ashim Bhattarai and Thomas Fan and #15010 by Guillaume Lemaitre.
Feature impute.IterativeImputer has a new skip_complete flag that is False by default, which, when True, will skip computation on features that have no missing values during the fit phase. #13773 by Sergey Feldman.
Efficiency impute.MissingIndicator.fit_transform avoids repeated computation of the masked matrix. #14356 by Harsh Soni.
Fix impute.IterativeImputer now works when there is only one feature. By Sergey Feldman.
Fix Fixed a bug in impute.IterativeImputer where features were imputed in the reverse of the desired order with imputation_order either "ascending" or "descending". #15393 by Venkatachalam N.
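impute.KNNImputer measures closeness with a distance that ignores missing coordinates (its default metric is nan_euclidean, see metrics.pairwise.nan_euclidean_distances under sklearn.metrics). A simplified stdlib sketch, assuming None marks a missing value, uniform weights, and that only rows observing the target feature can act as donors; knn_impute and nan_euclidean are hypothetical helpers, not the scikit-learn implementation:

```python
import math
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over coordinates present in both rows, scaled by
    (number of coordinates / number of coordinates present in both)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        # No shared coordinates: this sketch treats the rows as infinitely far apart.
        return math.inf
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(len(a) / len(pairs) * squared)


def knn_impute(X: Sequence[Row], n_neighbors: int = 2) -> List[List[Optional[float]]]:
    """Fill each missing entry with the mean of that feature over the
    n_neighbors closest rows that observe the feature."""
    imputed = [list(row) for row in X]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [k for k, other in enumerate(X) if k != i and other[j] is not None]
            donors.sort(key=lambda k: nan_euclidean(row, X[k]))
            nearest = donors[:n_neighbors]
            if nearest:  # leave the entry missing if no row observes feature j
                imputed[i][j] = sum(X[k][j] for k in nearest) / len(nearest)
    return imputed


X = [[1.0, 2.0, None], [3.0, 4.0, 3.0], [None, 6.0, 5.0], [8.0, 8.0, 7.0]]
print(knn_impute(X, n_neighbors=2))
# [[1.0, 2.0, 4.0], [3.0, 4.0, 3.0], [5.5, 6.0, 5.0], [8.0, 8.0, 7.0]]
```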
sklearn.inspection
Major Feature inspection.permutation_importance has been added to measure the importance of each feature in an arbitrary trained model with respect to a given scoring function. #13146 by Thomas Fan.
Feature inspection.partial_dependence and inspection.plot_partial_dependence now support the fast ‘recursion’ method for ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. #13769 by Nicolas Hug.
Enhancement inspection.plot_partial_dependence has been extended to support the new visualization API described in the User Guide. #14646 by Thomas Fan.
Enhancement inspection.partial_dependence accepts pandas DataFrame and pipeline.Pipeline containing compose.ColumnTransformer. In addition, inspection.plot_partial_dependence will use the column names by default when a dataframe is passed. #14028 and #15429 by Guillaume Lemaitre.
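The idea behind inspection.permutation_importance is to measure how much a score drops when one feature's values are shuffled, which breaks that feature's link with the target. A stdlib sketch with a hypothetical fixed "model" and a negative mean squared error score; the real function takes a fitted estimator and a scorer, and also exposes n_repeats and random_state:

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def neg_mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return -mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importances(
    predict: Callable[[Sequence[float]], float],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in score when each column of X is shuffled."""
    rng = random.Random(seed)  # fixed seed -> reproducible importances
    baseline = neg_mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - neg_mse(y, [predict(row) for row in X_perm]))
        importances.append(mean(drops))
    return importances


# Hypothetical "model" that uses feature 0 and ignores feature 1.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [row[0] for row in X]
print(permutation_importances(lambda row: row[0], X, y))
```

Shuffling the ignored feature leaves every prediction unchanged, so its importance is exactly zero, while the used feature gets a positive score.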
sklearn.kernel_approximation
Fix Fixed a bug where kernel_approximation.Nystroem raised a KeyError when using kernel="precomputed". #14706 by Venkatachalam N.
sklearn.linear_model
Efficiency The ‘liblinear’ logistic regression solver is now faster and requires less memory. #14108, #14170, #14296 by Alex Henrie.
Enhancement linear_model.BayesianRidge now accepts hyperparameters alpha_init and lambda_init which can be used to set the initial value of the maximization procedure in fit. #13618 by Yoshihiro Uchida.
Fix linear_model.Ridge now correctly fits an intercept when X is sparse, solver="auto" and fit_intercept=True, because the default solver in this configuration has changed to sparse_cg, which can fit an intercept with sparse data. #13995 by Jérôme Dockès.
Fix linear_model.Ridge with solver='sag' now accepts F-ordered and non-contiguous arrays and makes a conversion instead of failing. #14458 by Guillaume Lemaitre.
Fix linear_model.LassoCV no longer forces precompute=False when fitting the final model. #14591 by Andreas Müller.
Fix linear_model.RidgeCV and linear_model.RidgeClassifierCV now correctly score when cv=None. #14864 by Venkatachalam N.
Fix Fixed a bug in linear_model.LogisticRegressionCV where the scores_, n_iter_ and coefs_paths_ attributes would have a wrong ordering with penalty='elasticnet'. #15044 by Nicolas Hug.
Fix Fixed linear_model.MultiTaskLassoCV and linear_model.MultiTaskElasticNetCV with X of dtype int and fit_intercept=True. #15086 by Alex Gramfort.
Fix The liblinear solver now supports sample_weight. #15038 by Guillaume Lemaitre.
sklearn.manifold
Feature manifold.Isomap, manifold.TSNE, and manifold.SpectralEmbedding now accept a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh.
Feature Exposed the n_jobs parameter in manifold.TSNE for multi-core calculation of the neighbors graph. This parameter has no impact when metric="precomputed" or (metric="euclidean" and method="exact"). #15082 by Roman Yurchak.
Efficiency Improved efficiency of manifold.TSNE when method="barnes-hut" by computing the gradient in parallel. #13213 by Thomas Moreau.
Fix Fixed a bug where manifold.spectral_embedding (and therefore manifold.SpectralEmbedding and cluster.SpectralClustering) computed wrong eigenvalues with eigen_solver='amg' when n_samples < 5 * n_components. #14647 by Andreas Müller.
Fix Fixed a bug in manifold.spectral_embedding used in manifold.SpectralEmbedding and cluster.SpectralClustering where eigen_solver="amg" would sometimes result in a LinAlgError. #13393 by Andrew Knyazev and #13707 by Scott White.
API Change Deprecated the unused training_data_ attribute in manifold.Isomap. #10482 by Tom Dupre la Tour.
sklearn.metrics
Major Feature metrics.plot_roc_curve has been added to plot ROC curves. This function introduces the visualization API described in the User Guide. #14357 by Thomas Fan.
Feature Added a new parameter zero_division to multiple classification metrics: metrics.precision_score, metrics.recall_score, metrics.f1_score, metrics.fbeta_score, metrics.precision_recall_fscore_support, metrics.classification_report. This allows setting the returned value for ill-defined metrics. #14900 by Marc Torrellas Socastro.
Feature Added the metrics.pairwise.nan_euclidean_distances metric, which calculates Euclidean distances in the presence of missing values. #12852 by Ashim Bhattarai and Thomas Fan.
Feature New ranking metrics metrics.ndcg_score and metrics.dcg_score have been added to compute Discounted Cumulative Gain and Normalized Discounted Cumulative Gain. #9951 by Jérôme Dockès.
Feature metrics.plot_precision_recall_curve has been added to plot precision recall curves. #14936 by Thomas Fan.
Feature metrics.plot_confusion_matrix has been added to plot confusion matrices. #15083 by Thomas Fan.
Feature Added multiclass support to metrics.roc_auc_score with corresponding scorers 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_ovr_weighted', and 'roc_auc_ovo_weighted'. #12789 and #15274 by Kathy Chen, Mohamed Maskani, and Thomas Fan.
Feature Add metrics.mean_tweedie_deviance measuring the Tweedie deviance for a given power parameter. Also add mean Poisson deviance metrics.mean_poisson_deviance and mean Gamma deviance metrics.mean_gamma_deviance that are special cases of the Tweedie deviance for power=1 and power=2 respectively. #13938 by Christian Lorentzen and Roman Yurchak.
Efficiency Improved performance of metrics.pairwise.manhattan_distances in the case of sparse matrices. #15049 by Paolo Toccaceli.
Enhancement The parameter beta in metrics.fbeta_score is updated to accept zero and float('+inf') values. #13231 by Dong-hee Na.
Enhancement Added parameter squared in metrics.mean_squared_error to return root mean squared error. #13467 by Urvang Patel.
Enhancement Allow computing averaged metrics in the case of no true positives. #14595 by Andreas Müller.
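For reference, the unit deviances behind metrics.mean_tweedie_deviance, as given in the scikit-learn User Guide, reduce to squared error for power=0, Poisson deviance for power=1 and Gamma deviance for power=2. A stdlib sketch; the function names are hypothetical and the code skips the input validation and the domain restrictions on y and y_pred that the real metric enforces:

```python
import math
from statistics import mean
from typing import Sequence


def unit_tweedie_deviance(y: float, y_pred: float, power: float) -> float:
    if power == 0:  # Normal distribution: squared error
        return (y - y_pred) ** 2
    if power == 1:  # Poisson; y * log(y / y_pred) is taken as 0 when y == 0
        term = y * math.log(y / y_pred) if y > 0 else 0.0
        return 2 * (term - y + y_pred)
    if power == 2:  # Gamma
        return 2 * (math.log(y_pred / y) + y / y_pred - 1)
    return 2 * (
        max(y, 0.0) ** (2 - power) / ((1 - power) * (2 - power))
        - y * y_pred ** (1 - power) / (1 - power)
        + y_pred ** (2 - power) / (2 - power)
    )


def mean_tweedie_deviance_sketch(
    y_true: Sequence[float], y_pred: Sequence[float], power: float = 0.0
) -> float:
    return mean(unit_tweedie_deviance(t, p, power) for t, p in zip(y_true, y_pred))


print(mean_tweedie_deviance_sketch([1, 2], [2, 2], power=0))  # 0.5, same as the MSE
```

Every branch is zero for a perfect prediction (y == y_pred), which is a quick sanity check for any power.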
Enhancement Multilabel metrics now support a list of lists as input. #14865 by Srivatsan Ramesh, Herilalaina Rakotoarison, Léonard Binet.
Enhancement metrics.median_absolute_error now supports the multioutput parameter. #14732 by Agamemnon Krasoulis.
Enhancement ‘roc_auc_ovr_weighted’ and ‘roc_auc_ovo_weighted’ can now be used as the scoring parameter of model-selection tools. #14417 by Thomas Fan.
Enhancement metrics.confusion_matrix accepts a parameter normalize allowing the confusion matrix to be normalized by columns, rows, or overall. #15625 by Guillaume Lemaitre.
Fix Raise a ValueError in metrics.silhouette_score when a precomputed distance matrix contains non-zero diagonal entries. #12258 by Stephen Tierney.
API Change scoring="neg_brier_score" should be used instead of scoring="brier_score_loss" which is now deprecated. #14898 by Stefan Matcovici.
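The normalize option of metrics.confusion_matrix accepts 'true' (divide each row, i.e. each true class, by its total), 'pred' (divide each column by its total) and 'all' (divide by the grand total). A stdlib sketch of that arithmetic on a list-of-lists matrix whose rows are true labels and columns are predictions; normalize_confusion is a hypothetical helper, and unlike a careful implementation it would fail with ZeroDivisionError on an all-zero row or column:

```python
from typing import List, Optional, Sequence


def normalize_confusion(
    cm: Sequence[Sequence[int]], normalize: Optional[str] = None
) -> List[List[float]]:
    """Normalize a confusion matrix (rows: true labels, columns: predictions)."""
    if normalize is None:
        return [[float(v) for v in row] for row in cm]
    if normalize == "true":
        return [[v / sum(row) for v in row] for row in cm]
    if normalize == "pred":
        col_sums = [sum(col) for col in zip(*cm)]
        return [[v / s for v, s in zip(row, col_sums)] for row in cm]
    if normalize == "all":
        total = sum(map(sum, cm))
        return [[v / total for v in row] for row in cm]
    raise ValueError("normalize must be one of 'true', 'pred', 'all', or None")


cm = [[3, 1], [2, 4]]
print(normalize_confusion(cm, "pred"))  # [[0.6, 0.2], [0.4, 0.8]]
print(normalize_confusion(cm, "all"))   # [[0.3, 0.1], [0.2, 0.4]]
```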
sklearn.model_selection
Efficiency Improved performance of multimetric scoring in model_selection.cross_validate, model_selection.GridSearchCV, and model_selection.RandomizedSearchCV. #14593 by Thomas Fan.
Enhancement model_selection.learning_curve now accepts the parameter return_times which can be used to retrieve computation times in order to plot model scalability (see learning_curve example). #13938 by Hadrien Reboul.
Enhancement model_selection.RandomizedSearchCV now accepts lists of parameter distributions. #14549 by Andreas Müller.
Fix Reimplemented model_selection.StratifiedKFold to fix an issue where one test set could be n_classes larger than another. Test sets should now be near-equally sized. #14704 by Joel Nothman.
Fix The cv_results_ attribute of model_selection.GridSearchCV and model_selection.RandomizedSearchCV now only contains unfitted estimators. This potentially saves a lot of memory since the state of the estimators isn't stored. #15096 by Andreas Müller.
API Change model_selection.KFold and model_selection.StratifiedKFold now raise a warning if random_state is set but shuffle is False. This will raise an error in 0.24.
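Near-equal test folds with preserved class proportions can be obtained by sorting the samples by class and dealing them out to the folds in turn. The stdlib sketch below (hypothetical stratified_fold_ids helper, no shuffling) shows that property; it is a simplified illustration, not scikit-learn's StratifiedKFold code:

```python
from collections import Counter
from typing import List, Sequence


def stratified_fold_ids(y: Sequence[str], n_splits: int) -> List[int]:
    """Assign each sample a test-fold id in range(n_splits), without shuffling."""
    # Stable sort: samples of the same class become contiguous, in input order.
    order = sorted(range(len(y)), key=lambda i: y[i])
    fold_ids = [0] * len(y)
    for position, sample in enumerate(order):
        # Round-robin dealing gives every fold floor(n/k) or ceil(n/k) samples,
        # and each class's per-fold counts differ by at most one.
        fold_ids[sample] = position % n_splits
    return fold_ids


y = ["a"] * 7 + ["b"] * 3 + ["c"] * 2
fold_ids = stratified_fold_ids(y, n_splits=5)
print(sorted(Counter(fold_ids).items()))  # [(0, 3), (1, 3), (2, 2), (3, 2), (4, 2)]
```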
sklearn.multioutput
Fix multioutput.MultiOutputClassifier now has the attribute classes_. #14629 by Agamemnon Krasoulis.
Fix multioutput.MultiOutputClassifier now has predict_proba as a property and can be checked with hasattr. #15488 and #15490 by Rebekah Kim.
sklearn.naive_bayes
Major Feature Added naive_bayes.CategoricalNB that implements the Categorical Naive Bayes classifier. #12569 by Tim Bicker and Florian Wilhelm.
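naive_bayes.CategoricalNB models each feature as a categorical distribution per class, with additive smoothing: P(x_i = t | y = c) = (N_tic + alpha) / (N_c + alpha * n_i), where N_tic counts class-c samples with category t in feature i, N_c counts class-c samples, and n_i is the number of categories of feature i. A stdlib sketch of that estimate; fit_categorical_nb and predict_categorical_nb are hypothetical helpers that skip the numerical safeguards of the real estimator:

```python
import math
from collections import Counter, defaultdict


def fit_categorical_nb(X, y, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from categorical rows."""
    classes = sorted(set(y))
    class_counts = Counter(y)
    categories = [sorted({row[i] for row in X}) for i in range(len(X[0]))]
    # counts[(c, i)][t] = number of class-c samples whose feature i equals t
    counts = defaultdict(Counter)
    for row, label in zip(X, y):
        for i, value in enumerate(row):
            counts[(label, i)][value] += 1
    log_prior = {c: math.log(class_counts[c] / len(y)) for c in classes}
    log_lik = {}
    for c in classes:
        for i, cats in enumerate(categories):
            for t in cats:
                # (N_tic + alpha) / (N_c + alpha * n_i), with n_i = len(cats)
                log_lik[(c, i, t)] = math.log(
                    (counts[(c, i)][t] + alpha) / (class_counts[c] + alpha * len(cats))
                )
    return classes, log_prior, log_lik


def predict_categorical_nb(model, row):
    """Return the class with the highest log posterior score for one row.

    Categories never seen during fitting raise KeyError in this sketch.
    """
    classes, log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[(c, i, t)] for i, t in enumerate(row))
        for c in classes
    }
    return max(classes, key=scores.__getitem__)


X = [["red", "small"], ["red", "large"], ["blue", "small"], ["blue", "large"], ["red", "small"]]
y = ["yes", "yes", "no", "no", "yes"]
model = fit_categorical_nb(X, y)
print(predict_categorical_nb(model, ["red", "small"]))   # yes
print(predict_categorical_nb(model, ["blue", "large"]))  # no
```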
sklearn.neighbors
Major Feature Added neighbors.KNeighborsTransformer and neighbors.RadiusNeighborsTransformer, which transform an input dataset into a sparse neighbors graph. They give finer control on nearest neighbors computations and enable easy pipeline caching for multiple use. #10482 by Tom Dupre la Tour.
Feature neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor, and neighbors.LocalOutlierFactor now accept a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh.
Feature neighbors.RadiusNeighborsClassifier now supports predicting probabilities by using predict_proba and supports more outlier_label options: ‘most_frequent’, or different outlier_labels for multi-outputs. #9597 by Wenbo Zhao.
Efficiency Efficiency improvements for neighbors.RadiusNeighborsClassifier.predict. #9597 by Wenbo Zhao.
Fix neighbors.KNeighborsRegressor now throws an error when metric='precomputed' and fit on non-square data. #14336 by Gregory Dexter.
sklearn.neural_network
Feature Add the max_fun parameter in neural_network.BaseMultilayerPerceptron, neural_network.MLPRegressor, and neural_network.MLPClassifier to give control over the maximum number of function evaluations allowed when the tol improvement criterion is not met. #9274 by Daniel Perry.
sklearn.pipeline
Enhancement pipeline.Pipeline now supports score_samples if the final estimator does. #13806 by Anaël Beaugnon.
Fix The fit method of FeatureUnion now accepts fit_params to pass to the underlying transformers. #15119 by Adrin Jalali.
API Change None as a transformer is now deprecated in pipeline.FeatureUnion. Please use 'drop' instead. #15053 by Thomas Fan.
sklearn.preprocessing
Efficiency preprocessing.PolynomialFeatures is now faster when the input data is dense. #13290 by Xavier Dupré.
Enhancement Avoid unnecessary data copy when fitting preprocessors preprocessing.StandardScaler, preprocessing.MinMaxScaler, preprocessing.MaxAbsScaler, preprocessing.RobustScaler and preprocessing.QuantileTransformer, which results in a slight performance improvement. #13987 by Roman Yurchak.
Fix preprocessing.KernelCenterer now throws an error when fit on non-square data. #14336 by Gregory Dexter.
sklearn.model_selection
Fix model_selection.GridSearchCV and model_selection.RandomizedSearchCV now support the _pairwise property, which prevents an error during cross-validation for estimators with pairwise inputs (such as neighbors.KNeighborsClassifier when metric is set to ‘precomputed’). #13925 by Isaac S. Robson and #15524 by Xun Tang.
sklearn.svm
#
Enhancement svm.SVC and svm.NuSVC now accept a break_ties parameter. When it is set to True, decision_function_shape='ovr', and the number of target classes is greater than 2, predict breaks ties according to the confidence values of decision_function. #12557 by Adrin Jalali.
Enhancement SVM estimators now raise a more specific error when kernel='precomputed' and fit on non-square data. #14336 by Gregory Dexter.
Fix svm.SVC, svm.SVR, svm.NuSVR and svm.OneClassSVM generated an invalid model when fit() received negative or zero values for the parameter sample_weight. This behavior occurred only in some border scenarios. In these cases, fit() now fails with an exception. #14286 by Alex Shacked.
Fix The n_support_ attribute of svm.SVR and svm.OneClassSVM was previously uninitialized and had size 2. It now has size 1 with the correct value. #15099 by Nicolas Hug.
Fix Fixed a bug in BaseLibSVM._sparse_fit where n_SV=0 raised a ZeroDivisionError. #14894 by Danna Naser.
Fix The liblinear solver now supports sample_weight. #15038 by Guillaume Lemaitre.
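A short sketch of break_ties on the three-class iris data. It checks the behavior described in the break_ties entry: with decision_function_shape='ovr' and break_ties=True, predict returns the class with the largest one-vs-rest decision value. Breaking ties this way costs more than a plain predict.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# break_ties only has an effect with decision_function_shape='ovr' and
# more than two classes (iris has three).
clf = SVC(decision_function_shape="ovr", break_ties=True).fit(X, y)

pred = clf.predict(X)
# Class with the largest one-vs-rest decision value for each sample.
expected = clf.classes_[np.argmax(clf.decision_function(X), axis=1)]
print((pred == expected).all())
```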
sklearn.tree
Feature Adds minimal cost complexity pruning, controlled by ccp_alpha, to tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier, tree.ExtraTreeRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor, ensemble.GradientBoostingClassifier, and ensemble.GradientBoostingRegressor. #12887 by Thomas Fan.
API Change presort is now deprecated in tree.DecisionTreeClassifier and tree.DecisionTreeRegressor, and the parameter has no effect. #14907 by Adrin Jalali.
API Change The classes_ and n_classes_ attributes of tree.DecisionTreeRegressor are now deprecated. #15028 by Mei Guan, Nicolas Hug, and Adrin Jalali.
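As an illustration of ccp_alpha, the sketch below computes the pruning path of a decision tree on the breast cancer dataset and refits with a mid-path alpha. Picking the middle alpha is an arbitrary assumption; in practice the value is usually tuned by cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Effective alphas at which subtrees of the fully grown tree get pruned.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas

# ccp_alpha=0 (the default) keeps the fully grown tree.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# A positive ccp_alpha prunes the grown tree; larger values prune more.
mid_alpha = alphas[len(alphas) // 2]
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=mid_alpha).fit(X, y)
print(full.get_n_leaves(), pruned.get_n_leaves())
```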
sklearn.utils
Feature check_estimator can now generate checks by setting generate_only=True. Previously, running check_estimator stopped when the first check failed. With generate_only=True, all checks can run independently and report the ones that are failing. Read more in Rolling your own estimator. #14381 by Thomas Fan.
Feature Added a pytest-specific decorator, parametrize_with_checks, to parametrize estimator checks for a list of estimators. #14381 by Thomas Fan.
Feature A new random variable, utils.fixes.loguniform, implements a log-uniform random variable (e.g., for use in RandomizedSearchCV). For example, the outcomes 1, 10 and 100 are all equally likely for loguniform(1, 100). See #11232 by Scott Sievert and Nathaniel Saul, and SciPy PR 10815 <https://github.com/scipy/scipy/pull/10815>.
Enhancement utils.safe_indexing (now deprecated) accepts an axis parameter to index array-like across rows and columns. The column indexing can be done on NumPy array, SciPy sparse matrix, and Pandas DataFrame. An additional refactoring was done. #14035 and #14475 by Guillaume Lemaitre.
Enhancement utils.extmath.safe_sparse_dot works between 3D+ ndarray and sparse matrix. #14538 by Jérémie du Boisberranger.
Fix utils.check_array now raises an error instead of casting NaN to integer. #14872 by Roman Yurchak.
Fix utils.check_array now correctly detects numeric dtypes in pandas dataframes, fixing a bug where float32 was upcast to float64 unnecessarily. #15094 by Andreas Müller.
API Change The following utils have been deprecated and are now private:
choose_check_classifiers_labels
enforce_estimator_tags_y
mocking.MockDataFrame
mocking.CheckingClassifier
optimize.newton_cg
random.random_choice_csc
utils.choose_check_classifiers_labels
utils.enforce_estimator_tags_y
utils.optimize.newton_cg
utils.random.random_choice_csc
utils.safe_indexing
utils.mocking
utils.fast_dict
utils.seq_dataset
utils.weight_vector
utils.fixes.parallel_helper (removed)
All of utils.testing except for all_estimators, which is now in utils.
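SciPy 1.4 added scipy.stats.loguniform (the SciPy PR cited in the loguniform entry), which provides the same distribution as utils.fixes.loguniform. The sketch below assumes SciPy 1.4 or later and uses the SciPy version: it checks empirically that the decades [1, 10) and [10, 100] of loguniform(1, 100) receive equal probability, then passes the distribution to RandomizedSearchCV. The C range for SVC is an illustrative assumption.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# log(sample) is uniform on [log(1), log(100)], so each decade gets half the mass.
samples = loguniform(1, 100).rvs(size=100_000, random_state=0)
frac_first_decade = np.mean(samples < 10)

# Typical use: sample a scale parameter such as C across several orders of magnitude.
X, y = load_iris(return_X_y=True)
search = RandomizedSearchCV(
    SVC(), {"C": loguniform(1e-3, 1e3)}, n_iter=5, random_state=0
).fit(X, y)
print(frac_first_decade, search.best_params_)
```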
sklearn.isotonic
Fix Fixed a bug where isotonic.IsotonicRegression.fit raised an error when X.dtype == 'float32' and X.dtype != y.dtype. #14902 by Lucas.
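A minimal check of this fix, assuming arbitrary toy data: a float32 X combined with a float64 y now fits without error, and the fitted values are non-decreasing.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# float32 X combined with float64 y: the dtype mismatch that used to fail in fit.
X = np.arange(10, dtype=np.float32)
y = np.array([1, 3, 2, 4, 3, 5, 6, 5, 7, 8], dtype=np.float64)

iso = IsotonicRegression().fit(X, y)
y_fit = iso.predict(X)
print(y_fit)
```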
Miscellaneous
Fix Port lobpcg from SciPy, which implements some bug fixes that are only available in SciPy 1.3+. #13609 and #14971 by Guillaume Lemaitre.
API Change Scikit-learn now converts any input data structure implementing a duck array to a NumPy array (using __array__) to ensure consistent behavior, instead of relying on __array_function__ (see NEP 18). #14702 by Andreas Müller.
API Change Replace manual checks with check_is_fitted. Errors thrown when using non-fitted estimators are now more uniform. #13013 by Agamemnon Krasoulis.
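A small sketch of the more uniform behavior, using LogisticRegression as an arbitrary unfitted estimator: both check_is_fitted and predict raise NotFittedError, which subclasses both ValueError and AttributeError.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted

clf = LogisticRegression()

# check_is_fitted raises NotFittedError for an estimator that has not been fit.
try:
    check_is_fitted(clf)
    check_error = None
except NotFittedError as exc:
    check_error = type(exc)

# predict on the unfitted estimator raises the same exception type.
try:
    clf.predict([[0.0, 1.0]])
    predict_error = None
except NotFittedError as exc:
    predict_error = type(exc)

print(check_error, predict_error)
```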
Changes to estimator checks
These changes mostly affect library developers.
Estimators are now expected to raise a NotFittedError if predict or transform is called before fit; previously an AttributeError or ValueError was acceptable. #13013 by Agamemnon Krasoulis.
Binary-only classifiers are now supported in estimator checks. Such classifiers need to have the binary_only=True estimator tag. #13875 by Trevor Stephens.
Estimators are expected to convert input data (X, y, sample_weights) to numpy.ndarray and never call __array_function__ on the original datatype that is passed (see NEP 18). #14702 by Andreas Müller.
The requires_positive_X estimator tag (for models that require X to be non-negative) is now used by utils.estimator_checks.check_estimator to make sure a proper error message is raised if X contains some negative entries. #14680 by Alex Gramfort.
Added a check that pairwise estimators raise an error on non-square data. #14336 by Gregory Dexter.
Added two common multioutput estimator tests: utils.estimator_checks.check_classifier_multioutput and utils.estimator_checks.check_regressor_multioutput. #13392 by Rok Mihevc.
Fix Added check_transformer_data_not_an_array to checks where it was missing.
Fix The estimator tags resolution now follows the regular MRO. They used to be overridable only once. #14884 by Andreas Müller.
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.21, including:
Aaron Alphonsus, Abbie Popa, Abdur-Rahmaan Janhangeer, abenbihi, Abhinav Sagar, Abhishek Jana, Abraham K. Lagat, Adam J. Stewart, Aditya Vyas, Adrin Jalali, Agamemnon Krasoulis, Alec Peters, Alessandro Surace, Alexandre de Siqueira, Alexandre Gramfort, alexgoryainov, Alex Henrie, Alex Itkes, alexshacked, Allen Akinkunle, Anaël Beaugnon, Anders Kaseorg, Andrea Maldonado, Andrea Navarrete, Andreas Mueller, Andreas Schuderer, Andrew Nystrom, Angela Ambroz, Anisha Keshavan, Ankit Jha, Antonio Gutierrez, Anuja Kelkar, Archana Alva, arnaudstiegler, arpanchowdhry, ashimb9, Ayomide Bamidele, Baran Buluttekin, barrycg, Bharat Raghunathan, Bill Mill, Biswadip Mandal, blackd0t, Brian G. Barkley, Brian Wignall, Bryan Yang, c56pony, camilaagw, cartman_nabana, catajara, Cat Chenal, Cathy, cgsavard, Charles Vesteghem, Chiara Marmo, Chris Gregory, Christian Lorentzen, Christos Aridas, Dakota Grusak, Daniel Grady, Daniel Perry, Danna Naser, DatenBergwerk, David Dormagen, deeplook, Dillon Niederhut, Dong-hee Na, Dougal J. Sutherland, DrGFreeman, Dylan Cashman, edvardlindelof, Eric Larson, Eric Ndirangu, Eunseop Jeong, Fanny, federicopisanu, Felix Divo, flaviomorelli, FranciDona, Franco M. Luque, Frank Hoang, Frederic Haase, g0g0gadget, Gabriel Altay, Gabriel do Vale Rios, Gael Varoquaux, ganevgv, gdex1, getgaurav2, Gideon Sonoiya, Gordon Chen, gpapadok, Greg Mogavero, Grzegorz Szpak, Guillaume Lemaitre, Guillem García Subies, H4dr1en, hadshirt, Hailey Nguyen, Hanmin Qin, Hannah Bruce Macdonald, Harsh Mahajan, Harsh Soni, Honglu Zhang, Hossein Pourbozorg, Ian Sanders, Ingrid Spielman, J-A16, jaehong park, Jaime Ferrando Huertas, James Hill, James Myatt, Jay, jeremiedbb, Jérémie du Boisberranger, jeromedockes, Jesper Dramsch, Joan Massich, Joanna Zhang, Joel Nothman, Johann Faouzi, Jonathan Rahn, Jon Cusick, Jose Ortiz, Kanika Sabharwal, Katarina Slama, kellycarmody, Kennedy Kang’ethe, Kensuke Arai, Kesshi Jordan, Kevad, Kevin Loftis, Kevin Winata, Kevin Yu-Sheng Li, Kirill Dolmatov, Kirthi Shankar Sivamani, krishna katyal, Lakshmi Krishnan, Lakshya KD, LalliAcqua, lbfin, Leland McInnes, Léonard Binet, Loic Esteve, loopyme, lostcoaster, Louis Huynh, lrjball, Luca Ionescu, Lutz Roeder, MaggieChege, Maithreyi Venkatesh, Maltimore, Maocx, Marc Torrellas, Marie Douriez, Markus, Markus Frey, Martina G. Vilas, Martin Oywa, Martin Thoma, Masashi SHIBATA, Maxwell Aladago, mbillingr, m-clare, Meghann Agarwal, m.fab, Micah Smith, miguelbarao, Miguel Cabrera, Mina Naghshhnejad, Ming Li, motmoti, mschaffenroth, mthorrell, Natasha Borders, nezar-a, Nicolas Hug, Nidhin Pattaniyil, Nikita Titov, Nishan Singh Mann, Nitya Mandyam, norvan, notmatthancock, novaya, nxorable, Oleg Stikhin, Oleksandr Pavlyk, Olivier Grisel, Omar Saleem, Owen Flanagan, panpiort8, Paolo, Paolo Toccaceli, Paresh Mathur, Paula, Peng Yu, Peter Marko, pierretallotte, poorna-kumar, pspachtholz, qdeffense, Rajat Garg, Raphaël Bournhonesque, Ray, Ray Bell, Rebekah Kim, Reza Gharibi, Richard Payne, Richard W, rlms, Robert Juergens, Rok Mihevc, Roman Feldbauer, Roman Yurchak, R Sanjabi, RuchitaGarde, Ruth Waithera, Sackey, Sam Dixon, Samesh Lakhotia, Samuel Taylor, Sarra Habchi, Scott Gigante, Scott Sievert, Scott White, Sebastian Pölsterl, Sergey Feldman, SeWook Oh, she-dares, Shreya V, Shubham Mehta, Shuzhe Xiao, SimonCW, smarie, smujjiga, Sönke Behrends, Soumirai, Sourav Singh, stefan-matcovici, steinfurt, Stéphane Couvreur, Stephan Tulkens, Stephen Cowley, Stephen Tierney, SylvainLan, th0rwas, theoptips, theotheo, Thierno Ibrahima DIOP, Thomas Edwards, Thomas J Fan, Thomas Moreau, Thomas Schmitt, Tilen Kusterle, Tim Bicker, Timsaur, Tim Staley, Tirth Patel, Tola A, Tom Augspurger, Tom Dupré la Tour, topisan, Trevor Stephens, ttang131, Urvang Patel, Vathsala Achar, veerlosar, Venkatachalam N, Victor Luzgin, Vincent Jeanselme, Vincent Lostanlen, Vladimir Korolev, vnherdeiro, Wenbo Zhao, Wendy Hu, willdarnell, William de Vazelhes, wolframalpha, xavier dupré, xcjason, x-martian, xsat, xun-tang, Yinglr, yokasre, Yu-Hang “Maxin” Tang, Yulia Zamriy, Zhao Feng