Version 0.23#
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.23.
Legend for changelogs
Major Feature something big that you couldn’t do before.
Feature something that you couldn’t do before.
Efficiency an existing feature now may not require as much computation or memory.
Enhancement a miscellaneous minor improvement.
Fix something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 0.23.2#
Changed models#
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Fix inertia_ attribute of cluster.KMeans and cluster.MiniBatchKMeans.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
Changelog#
sklearn.cluster#
Fix Fixed a bug in cluster.KMeans where rounding errors could prevent convergence from being declared when tol=0. #17959 by Jérémie du Boisberranger.
Fix Fixed a bug in cluster.KMeans and cluster.MiniBatchKMeans where the reported inertia was incorrectly weighted by the sample weights. #17848 by Jérémie du Boisberranger.
Fix Fixed a bug in cluster.MeanShift with bin_seeding=True. When the estimated bandwidth is 0, the behavior is equivalent to bin_seeding=False. #17742 by Jeremie du Boisberranger.
Fix Fixed a bug in cluster.AffinityPropagation that gave incorrect clusters when the array dtype is float32. #17995 by Thomaz Santana and Amanda Dsouza.
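As an illustration of the inertia fix (#17848), the sketch below recomputes the inertia by hand. It assumes a release that contains the fix, where inertia_ is expected to equal the sum of squared distances to the assigned centers, each scaled once by its sample weight. The data and weights are synthetic and purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
sample_weight = rng.uniform(0.5, 2.0, size=200)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X, sample_weight=sample_weight)

# Squared distance from each sample to the center it was assigned to.
sq_dist = ((X - km.cluster_centers_[km.labels_]) ** 2).sum(axis=1)

# Each sample contributes its squared distance once, scaled by its weight.
weighted_inertia = float((sample_weight * sq_dist).sum())

print(km.inertia_, weighted_inertia)
```

The two printed values should agree up to floating-point rounding.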
sklearn.decomposition#
Fix Fixed a bug in decomposition.MiniBatchDictionaryLearning.partial_fit which should update the dictionary by iterating only once over a mini-batch. #17433 by Chiara Marmo.
Fix Avoid overflows on Windows in decomposition.IncrementalPCA.partial_fit for large batch_size and n_samples values. #17985 by Alan Butler and Amanda Dsouza.
sklearn.ensemble#
Fix Fixed a bug in ensemble.MultinomialDeviance where the average of logloss was incorrectly calculated as the sum of logloss. #17694 by Markus Rempfler and Tsutomu Kusanagi.
Fix Fixes ensemble.StackingClassifier and ensemble.StackingRegressor compatibility with estimators that do not define n_features_in_. #17357 by Thomas Fan.
sklearn.feature_extraction#
Fix Fixes bug in feature_extraction.text.CountVectorizer where sample order invariance was broken when max_features was set and features had the same count. #18016 by Thomas Fan, Roman Yurchak, and Joel Nothman.
sklearn.linear_model#
Fix linear_model.lars_path does not overwrite X when X_copy=True and Gram='auto'. #17914 by Thomas Fan.
sklearn.manifold#
Fix Fixed a bug where metrics.pairwise_distances would raise an error if metric='seuclidean' and X is not type np.float64. #15730 by Forrest Koch.
sklearn.metrics#
Fix Fixed a bug in metrics.mean_squared_error where the average of multiple RMSE values was incorrectly calculated as the root of the average of multiple MSE values. #17309 by Swier Heeres.
sklearn.pipeline#
Fix pipeline.FeatureUnion raises a deprecation warning when None is included in transformer_list. #17360 by Thomas Fan.
sklearn.utils#
Fix Fix utils.estimator_checks.check_estimator so that all test cases support the binary_only estimator tag. #17812 by Bruno Charron.
Version 0.23.1#
May 18 2020
Changelog#
sklearn.cluster#
Efficiency cluster.KMeans efficiency has been improved for very small datasets. In particular, it can no longer spawn idle threads. #17210 and #17235 by Jeremie du Boisberranger.
Fix Fixed a bug in cluster.KMeans where the sample weights provided by the user were modified in place. #17204 by Jeremie du Boisberranger.
Miscellaneous#
Fix Fixed a bug in the repr of third-party estimators that use a **kwargs parameter in their constructor, when changed_only is True, which is now the default. #17205 by Nicolas Hug.
Version 0.23.0#
May 12 2020
Enforcing keyword-only arguments#
In an effort to promote clear and non-ambiguous use of the library, most constructor and function parameters are now expected to be passed as keyword arguments (i.e. using the param=value syntax) instead of positional. To ease the transition, a FutureWarning is raised if a keyword-only parameter is used as positional. In version 1.0 (renaming of 0.25), these parameters will be strictly keyword-only, and a TypeError will be raised. #15005 by Joel Nothman, Adrin Jalali, Thomas Fan, and Nicolas Hug. See SLEP009 for more details.
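The transition mechanism described above can be sketched in plain Python. This is not scikit-learn's actual implementation: the decorator name deprecate_positional and the function make_model are hypothetical, and the sketch only shows how a FutureWarning can be raised when a keyword-only parameter is passed positionally while the call still succeeds.

```python
import functools
import inspect
import warnings


def deprecate_positional(func):
    """Warn when keyword-only parameters of ``func`` are passed positionally.

    Hypothetical helper for illustration; not part of scikit-learn's API.
    """
    params = inspect.signature(func).parameters
    positional = [
        name for name, p in params.items()
        if p.kind == inspect.Parameter.POSITIONAL_OR_KEYWORD
    ]
    keyword_only = [
        name for name, p in params.items()
        if p.kind == inspect.Parameter.KEYWORD_ONLY
    ]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Anything beyond the positional parameters was meant for a
        # keyword-only parameter: warn, then forward it by name.
        extra = args[len(positional):]
        if extra:
            names = keyword_only[:len(extra)]
            warnings.warn(
                f"Pass {', '.join(names)} as keyword arguments.",
                FutureWarning,
            )
            kwargs.update(zip(names, extra))
            args = args[:len(positional)]
        return func(*args, **kwargs)

    return wrapper


@deprecate_positional
def make_model(n_clusters, *, tol=1e-4, max_iter=300):
    """Hypothetical constructor-like function with keyword-only options."""
    return {"n_clusters": n_clusters, "tol": tol, "max_iter": max_iter}


# Preferred style: keyword-only parameters are passed by name, no warning.
print(make_model(3, tol=0.5))
```

Calling make_model(3, 0.5) in this sketch still works but emits a FutureWarning. Once the transition period ends, the wrapper would be removed and the same call would fail with a TypeError.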
Changed models#
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Fix ensemble.BaggingClassifier, ensemble.BaggingRegressor, and ensemble.IsolationForest.
Fix cluster.KMeans with algorithm="elkan" and algorithm="full".
Fix cluster.Birch.
Fix compose.ColumnTransformer.get_feature_names.
Fix decomposition.PCA with n_components='mle'.
Enhancement decomposition.NMF and decomposition.non_negative_factorization with float32 dtype input.
API Change ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor.
Fix estimators_samples_ in ensemble.BaggingClassifier, ensemble.BaggingRegressor and ensemble.IsolationForest.
Fix ensemble.StackingClassifier and ensemble.StackingRegressor with sample_weight.
Fix linear_model.RANSACRegressor with sample_weight.
Fix metrics.mean_squared_error with squared and multioutput='raw_values'.
Fix metrics.mutual_info_score with negative scores.
Fix metrics.confusion_matrix with zero length y_true and y_pred.
Fix preprocessing.StandardScaler with partial_fit and sparse input.
Fix preprocessing.Normalizer with norm='max'.
Fix Any model using the svm.libsvm or the svm.liblinear solver, including svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, svm.OneClassSVM, svm.SVC, svm.SVR, linear_model.LogisticRegression.
Fix tree.DecisionTreeClassifier, tree.ExtraTreeClassifier and ensemble.GradientBoostingClassifier as well as predict method of tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, and ensemble.GradientBoostingRegressor and read-only float32 input in predict, decision_path and predict_proba.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
Changelog#
sklearn.cluster#
Efficiency cluster.Birch implementation of the predict method avoids high memory footprint by calculating the distances matrix using a chunked scheme. #16149 by Jeremie du Boisberranger and Alex Shacked.
Efficiency Major Feature The critical parts of cluster.KMeans have a more optimized implementation. Parallelism is now over the data instead of over initializations, allowing better scalability. #11950 by Jeremie du Boisberranger.
Enhancement cluster.KMeans now supports sparse data when algorithm="elkan". #11950 by Jeremie du Boisberranger.
Enhancement cluster.AgglomerativeClustering has a faster and more memory efficient implementation of single linkage clustering. #11514 by Leland McInnes.
Fix cluster.KMeans with algorithm="elkan" now converges with tol=0 as with the default algorithm="full". #16075 by Erich Schubert.
Fix Fixed a bug in cluster.Birch where the n_clusters parameter could not have a np.int64 type. #16484 by Jeremie du Boisberranger.
Fix cluster.AgglomerativeClustering now raises a specific error when the distance matrix is not square and affinity=precomputed. #16257 by Simona Maggio.
API Change The n_jobs parameter of cluster.KMeans, cluster.SpectralCoclustering and cluster.SpectralBiclustering is deprecated. They now use OpenMP based parallelism. For more details on how to control the number of threads, please refer to our Parallelism notes. #11950 by Jeremie du Boisberranger.
API Change The precompute_distances parameter of cluster.KMeans is deprecated. It has no effect. #11950 by Jeremie du Boisberranger.
API Change The random_state parameter has been added to cluster.AffinityPropagation. #16801 by @rcwoolston and Chiara Marmo.
sklearn.compose#
Efficiency compose.ColumnTransformer is now faster when working with dataframes and strings are used to specify subsets of data for transformers. #16431 by Thomas Fan.
Enhancement compose.ColumnTransformer method get_feature_names now supports 'passthrough' columns, with the feature name being either the column name for a dataframe, or 'xi' for column index i. #14048 by Lewis Ball.
Fix compose.ColumnTransformer method get_feature_names now returns correct results when one of the transformer steps applies on an empty list of columns. #15963 by Roman Yurchak.
Fix compose.ColumnTransformer.fit will error when selecting a column name that is not unique in the dataframe. #16431 by Thomas Fan.
sklearn.datasets#
Efficiency datasets.fetch_openml has reduced memory usage because it no longer stores the full dataset text stream in memory. #16084 by Joel Nothman.
Feature datasets.fetch_california_housing now supports heterogeneous data using pandas by setting as_frame=True. #15950 by Stephanie Andrews and Reshama Shaikh.
Feature Embedded dataset loaders datasets.load_breast_cancer, datasets.load_diabetes, datasets.load_digits, datasets.load_iris, datasets.load_linnerud and datasets.load_wine now support loading as a pandas DataFrame by setting as_frame=True. #15980 by @wconnell and Reshama Shaikh.
Enhancement Added return_centers parameter in datasets.make_blobs, which can be used to return centers for each cluster. #15709 by @shivamgargsya and Venkatachalam N.
Enhancement Functions datasets.make_circles and datasets.make_moons now accept a two-element tuple. #15707 by Maciej J Mikulski.
Fix datasets.make_multilabel_classification now raises ValueError for arguments n_classes < 1 or length < 1. #16006 by Rushabh Vasani.
API Change The StreamHandler was removed from sklearn.logger to avoid double logging of messages in common cases where a handler is attached to the root logger, and to follow the Python logging documentation recommendation for libraries to leave the log message handling to users and application code. #16451 by Christoph Deil.
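A short sketch of the return_centers option of datasets.make_blobs mentioned above. The sample count, feature count and seed are arbitrary illustrative choices.

```python
from sklearn.datasets import make_blobs

# With return_centers=True a third array holding the cluster centers is returned.
X, y, centers = make_blobs(
    n_samples=60, n_features=2, centers=3, random_state=0, return_centers=True
)

print(X.shape, y.shape, centers.shape)  # (60, 2) (60,) (3, 2)
```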
sklearn.decomposition#
Enhancement decomposition.NMF and decomposition.non_negative_factorization now preserve float32 dtype. #16280 by Jeremie du Boisberranger.
Enhancement decomposition.TruncatedSVD.transform is now faster on sparse csc matrices. #16837 by @wornbb.
Fix decomposition.PCA with a float n_components parameter will exclusively choose the components that explain the variance greater than n_components. #15669 by Krishna Chaitanya.
Fix decomposition.PCA with n_components='mle' now correctly handles small eigenvalues, and does not infer 0 as the correct number of components. #16224 by Lisa Schwetlick, Gelavizh Ahmadi, and Marija Vlajic Wheeler, and #16841 by Nicolas Hug.
Fix decomposition.KernelPCA method inverse_transform now applies the correct inverse transform to the transformed data. #16655 by Lewis Ball.
Fix Fixed bug that was causing decomposition.KernelPCA to sometimes raise invalid value encountered in multiply during fit. #16718 by Gui Miotto.
Feature Added n_components_ attribute to decomposition.SparsePCA and decomposition.MiniBatchSparsePCA. #16981 by Mateusz Górski.
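To illustrate the float n_components behaviour of decomposition.PCA (#15669), the sketch below checks that the retained components explain strictly more variance than the requested threshold, while one component fewer would not. The synthetic data and the 0.9 threshold are illustrative assumptions, and svd_solver="full" is set explicitly so the float threshold is handled the same way across releases.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Five features with clearly different variances.
X = rng.normal(size=(300, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

threshold = 0.9
pca = PCA(n_components=threshold, svd_solver="full").fit(X)

# Cumulative share of variance explained by the retained components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = pca.n_components_

print(k, cumulative)
```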
sklearn.ensemble#
Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support sample_weight. #14696 by Adrin Jalali and Nicolas Hug.
Feature Early stopping in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor is now determined with a new early_stopping parameter instead of n_iter_no_change. Default value is 'auto', which enables early stopping if there are at least 10,000 samples in the training set. #14516 by Johann Faouzi.
Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. #15582 by Nicolas Hug.
API Change Added boolean verbose flag to classes: ensemble.VotingClassifier and ensemble.VotingRegressor. #16069 by Sam Bail, Hanna Bruce MacDonald, Reshama Shaikh, and Chiara Marmo.
API Change Fixed a bug in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor that would not respect the max_leaf_nodes parameter if the criterion was reached at the same time as the max_depth criterion. #16183 by Nicolas Hug.
Fix Changed the convention for max_depth parameter of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. The depth now corresponds to the number of edges to go from the root to the deepest leaf. Stumps (trees with one split) are now allowed. #16182 by Santhosh B.
Fix Fixed a bug in ensemble.BaggingClassifier, ensemble.BaggingRegressor and ensemble.IsolationForest where the attribute estimators_samples_ did not generate the proper indices used during fit. #16437 by Jin-Hwan CHO.
Fix Fixed a bug in ensemble.StackingClassifier and ensemble.StackingRegressor where the sample_weight argument was not being passed to cross_val_predict when evaluating the base estimators on cross-validation folds to obtain the input to the meta estimator. #16539 by Bill DeRose.
Feature Added additional option loss="poisson" to ensemble.HistGradientBoostingRegressor, which adds Poisson deviance with log-link useful for modeling count data. #16692 by Christian Lorentzen.
Fix Fixed a bug where ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier would fail with multiple calls to fit when warm_start=True, early_stopping=True, and there is no validation set. #16663 by Thomas Fan.
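A minimal sketch of the monotonic constraints support mentioned above, using the monotonic_cst parameter with a single feature constrained to have a non-decreasing effect. The data is synthetic. The fallback import is an assumption to cover releases before 1.0 (including 0.23), where the estimator sits behind an experimental flag.

```python
import numpy as np

try:
    from sklearn.ensemble import HistGradientBoostingRegressor
except ImportError:
    # Releases before 1.0 require enabling the experimental estimator first.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
    from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
# Noisy but globally increasing target.
y = X[:, 0] + rng.normal(scale=2.0, size=500)

# monotonic_cst=[1] asks for a non-decreasing effect of feature 0.
model = HistGradientBoostingRegressor(monotonic_cst=[1], max_iter=50, random_state=0)
model.fit(X, y)

grid = np.linspace(0, 10, 200).reshape(-1, 1)
pred = model.predict(grid)

# Predictions along the sorted grid should never decrease.
print(bool(np.all(np.diff(pred) >= -1e-9)))
```

Without the constraint, the noise in y would typically produce local dips in the fitted curve.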
sklearn.feature_extraction#
Efficiency feature_extraction.text.CountVectorizer now sorts features after pruning them by document frequency. This improves performance for datasets with large vocabularies combined with min_df or max_df. #15834 by Santiago M. Mola.
sklearn.feature_selection#
Enhancement Added support for multioutput data in feature_selection.RFE and feature_selection.RFECV. #16103 by Divyaprabha M.
API Change Adds feature_selection.SelectorMixin back to public API. #16132 by @trimeta.
sklearn.gaussian_process#
Enhancement gaussian_process.kernels.Matern returns the RBF kernel when nu=np.inf. #15503 by Sam Dixon.
Fix Fixed bug in gaussian_process.GaussianProcessRegressor that caused predicted standard deviations to only be between 0 and 1 when WhiteKernel is not used. #15782 by @plgreenLIRU.
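The Matern enhancement above can be checked directly: with nu=np.inf the kernel matrix is expected to match the RBF kernel with the same length scale. The input points and the length scale of 1.5 are arbitrary illustrative values.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Matern

X = np.random.RandomState(0).normal(size=(5, 2))

# Evaluate both kernels on the same points with the same length scale.
K_matern = Matern(length_scale=1.5, nu=np.inf)(X)
K_rbf = RBF(length_scale=1.5)(X)

print(np.allclose(K_matern, K_rbf))
```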
sklearn.impute#
Enhancement impute.IterativeImputer accepts both scalar and array-like inputs for max_value and min_value. Array-like inputs allow a different max and min to be specified for each feature. #16403 by Narendra Mukherjee.
Enhancement impute.SimpleImputer, impute.KNNImputer, and impute.IterativeImputer accept pandas' nullable integer dtype with missing values. #16508 by Thomas Fan.
sklearn.inspection#
Feature inspection.partial_dependence and inspection.plot_partial_dependence now support the fast 'recursion' method for ensemble.RandomForestRegressor and tree.DecisionTreeRegressor. #15864 by Nicolas Hug.
sklearn.linear_model#
Major Feature Added generalized linear models (GLM) with non normal error distributions, including linear_model.PoissonRegressor, linear_model.GammaRegressor and linear_model.TweedieRegressor which use Poisson, Gamma and Tweedie distributions respectively. #14300 by Christian Lorentzen, Roman Yurchak, and Olivier Grisel.
Major Feature Support of sample_weight in linear_model.ElasticNet and linear_model.Lasso for dense feature matrix X. #15436 by Christian Lorentzen.
Efficiency linear_model.RidgeCV and linear_model.RidgeClassifierCV no longer allocate a potentially large array to store dual coefficients for all hyperparameters during fit, nor an array to store all error or LOO predictions unless store_cv_values is True. #15652 by Jérôme Dockès.
Enhancement linear_model.LassoLars and linear_model.Lars now support a jitter parameter that adds random noise to the target. This might help with stability in some edge cases. #15179 by @angelaambroz.
Fix Fixed a bug where if a sample_weight parameter was passed to the fit method of linear_model.RANSACRegressor, it would not be passed to the wrapped base_estimator during the fitting of the final model. #15773 by Jeremy Alexandre.
Fix Add best_score_ attribute to linear_model.RidgeCV and linear_model.RidgeClassifierCV. #15655 by Jérôme Dockès.
Fix Fixed a bug in linear_model.RidgeClassifierCV to pass a specific scoring strategy. Previously, the internal estimator output scores instead of predictions. #14848 by Venkatachalam N.
Fix linear_model.LogisticRegression will now avoid an unnecessary iteration when solver='newton-cg' by checking for less than or equal instead of strictly less than for the maximum of absgrad and tol in utils.optimize._newton_cg. #16266 by Rushabh Vasani.
API Change Deprecated public attributes standard_coef_, standard_intercept_, average_coef_, and average_intercept_ in linear_model.SGDClassifier, linear_model.SGDRegressor, linear_model.PassiveAggressiveClassifier, linear_model.PassiveAggressiveRegressor. #16261 by Carlos Brandt.
Fix Efficiency linear_model.ARDRegression is more stable and much faster when n_samples > n_features. It can now scale to hundreds of thousands of samples. The stability fix might imply changes in the number of non-zero coefficients and in the predicted output. #16849 by Nicolas Hug.
Fix Fixed a bug in linear_model.ElasticNetCV, linear_model.MultiTaskElasticNetCV, linear_model.LassoCV and linear_model.MultiTaskLassoCV where fitting would fail when using joblib loky backend. #14264 by Jérémie du Boisberranger.
Efficiency Speed up linear_model.MultiTaskLasso, linear_model.MultiTaskLassoCV, linear_model.MultiTaskElasticNet, linear_model.MultiTaskElasticNetCV by avoiding slower BLAS Level 2 calls on small arrays. #17021 by Alex Gramfort and Mathurin Massias.
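As a sketch of the new generalized linear models, the example below fits linear_model.PoissonRegressor to synthetic count data generated with a log link. The true coefficients (1.0 and -0.5), the sample size and the regularization strength are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(500, 2))
# Counts drawn from a Poisson distribution whose log-mean is linear in X.
y = rng.poisson(lam=np.exp(0.5 + 1.0 * X[:, 0] - 0.5 * X[:, 1]))

glm = PoissonRegressor(alpha=1e-3, max_iter=300)
glm.fit(X, y)

# Predictions are expected counts, so they are strictly positive.
pred = glm.predict(X)
print(glm.coef_, glm.intercept_)
```

Because of the log link, the fitted coefficients act multiplicatively on the expected count, which is why the predictions cannot be negative, unlike those of an ordinary least squares fit on the same data.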
sklearn.metrics#
Enhancement metrics.pairwise_distances_chunked now allows its reduce_func to not have a return value, enabling in-place operations. #16397 by Joel Nothman.
Fix Fixed a bug in metrics.mean_squared_error to not ignore argument squared when argument multioutput='raw_values'. #16323 by Rushabh Vasani.
Fix Fixed a bug in metrics.mutual_info_score where negative scores could be returned. #16362 by Thomas Fan.
Fix Fixed a bug in metrics.confusion_matrix that would raise an error when y_true and y_pred were length zero and labels was not None. In addition, we raise an error when an empty list is given to the labels parameter. #16442 by Kyle Parsons.
API Change Changed the formatting of values in metrics.ConfusionMatrixDisplay.plot and metrics.plot_confusion_matrix to pick the shorter format (either '2g' or 'd'). #16159 by Rick Mackenbach and Thomas Fan.
API Change From version 0.25, metrics.pairwise_distances will no longer automatically compute the VI parameter for Mahalanobis distance and the V parameter for seuclidean distance if Y is passed. The user will be expected to compute this parameter on the training data of their choice and pass it to pairwise_distances. #16993 by Joel Nothman.
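Following the API change above, here is one way to compute the V parameter for seuclidean distance yourself and pass it to metrics.pairwise_distances. Using the per-feature sample variance of the training data (ddof=1) is an assumption made for this sketch; the choice of data from which to estimate V is left to the user.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
scales = np.array([1.0, 10.0, 100.0])
X_train = rng.normal(size=(50, 3)) * scales
X_test = rng.normal(size=(4, 3)) * scales

# Per-feature variance estimated on the training data only.
V = np.var(X_train, axis=0, ddof=1)
D = pairwise_distances(X_test, X_train, metric="seuclidean", V=V)

# Reference computation: Euclidean distance after dividing each squared
# coordinate difference by the corresponding variance.
diff = X_test[:, None, :] - X_train[None, :, :]
D_ref = np.sqrt((diff ** 2 / V).sum(axis=-1))

print(D.shape, np.allclose(D, D_ref))
```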
sklearn.model_selection#
Enhancement model_selection.GridSearchCV and model_selection.RandomizedSearchCV yield stack trace information in fit failed warning messages in addition to previously emitted type and details. #15622 by Gregory Morse.
Fix model_selection.cross_val_predict supports method="predict_proba" when y=None. #15918 by Luca Kubin.
Fix model_selection.fit_grid_point is deprecated in 0.23 and will be removed in 0.25. #16401 by Arie Pratama Sutiono.
sklearn.multioutput#
Feature multioutput.MultiOutputRegressor.fit and multioutput.MultiOutputClassifier.fit now can accept fit_params to pass to the estimator.fit method of each step. #15953 #15959 by Ke Huang.
Enhancement multioutput.RegressorChain now supports fit_params for base_estimator during fit. #16111 by Venkatachalam N.
sklearn.naive_bayes#
Fix A correctly formatted error message is shown in naive_bayes.CategoricalNB when the number of features in the input differs between predict and fit. #16090 by Madhura Jayaratne.
sklearn.neural_network#
Efficiency neural_network.MLPClassifier and neural_network.MLPRegressor have a reduced memory footprint when using stochastic solvers, 'sgd' or 'adam', and shuffle=True. #14075 by @meyer89.
Fix Increases the numerical stability of the logistic loss function in neural_network.MLPClassifier by clipping the probabilities. #16117 by Thomas Fan.
sklearn.inspection#
Enhancement inspection.PartialDependenceDisplay now exposes the deciles lines as attributes so they can be hidden or customized. #15785 by Nicolas Hug.
sklearn.preprocessing#
Feature The drop argument of preprocessing.OneHotEncoder now accepts the value 'if_binary', which drops the first category of each feature with two categories. #16245 by Rushabh Vasani.
Enhancement preprocessing.OneHotEncoder's drop_idx_ ndarray can now contain None, where drop_idx_[i] = None means that no category is dropped for index i. #16585 by Chiara Marmo.
Enhancement preprocessing.MaxAbsScaler, preprocessing.MinMaxScaler, preprocessing.StandardScaler, preprocessing.PowerTransformer, preprocessing.QuantileTransformer, preprocessing.RobustScaler now support pandas' nullable integer dtype with missing values. #16508 by Thomas Fan.
Efficiency preprocessing.OneHotEncoder is now faster at transforming. #15762 by Thomas Fan.
Fix Fix a bug in preprocessing.StandardScaler which was incorrectly computing statistics when calling partial_fit on sparse inputs. #16466 by Guillaume Lemaitre.
Fix Fix a bug in preprocessing.Normalizer with norm='max', which was not taking the absolute value of the maximum values before normalizing the vectors. #16632 by Maura Pintor and Battista Biggio.
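A small sketch of the drop='if_binary' option of preprocessing.OneHotEncoder and of the None entries in drop_idx_ described above. The toy data, with one binary column and one three-category column, is illustrative.

```python
from sklearn.preprocessing import OneHotEncoder

# First column is binary ("no"/"yes"); second column has three categories.
X = [["yes", "red"], ["no", "green"], ["yes", "blue"]]

enc = OneHotEncoder(drop="if_binary")
encoded = enc.fit_transform(X).toarray()

# The binary column collapses to a single indicator; the other keeps all three.
print(encoded.shape)  # (3, 4)
# Index of the dropped category per feature, or None when nothing is dropped.
print(enc.drop_idx_)
```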
sklearn.semi_supervised#
Fix semi_supervised.LabelSpreading and semi_supervised.LabelPropagation avoid divide by zero warnings when normalizing label_distributions_. #15946 by @ngshya.
sklearn.svm#
Fix Efficiency Improved libsvm and liblinear random number generators used to randomly select coordinates in the coordinate descent algorithms. Platform-dependent C rand() was used, which is only able to generate numbers up to 32767 on Windows (see this blog post) and also has poor randomization power as suggested by this presentation. It was replaced with C++11 mt19937, a Mersenne Twister that correctly generates 31-bit/63-bit random numbers on all platforms. In addition, the crude "modulo" postprocessor used to get a random number in a bounded interval was replaced by the tweaked Lemire method as suggested by this blog post. Any model using the svm.libsvm or the svm.liblinear solver, including svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, svm.OneClassSVM, svm.SVC, svm.SVR, linear_model.LogisticRegression, is affected. In particular users can expect a better convergence when the number of samples (LibSVM) or the number of features (LibLinear) is large. #13511 by Sylvain Marié.
Fix Fix use of custom kernel not taking float entries such as string kernels in svm.SVC and svm.SVR. Note that custom kernels are now expected to validate their input where they previously received valid numeric arrays. #11296 by Alexandre Gramfort and Georgi Peev.
API Change svm.SVR and svm.OneClassSVM attributes, probA_ and probB_, are now deprecated as they were not useful. #15558 by Thomas Fan.
sklearn.tree#
Fix tree.plot_tree rotate parameter was unused and has been deprecated. #15806 by Chiara Marmo.
Fix Fix support of read-only float32 array input in predict, decision_path and predict_proba methods of tree.DecisionTreeClassifier, tree.ExtraTreeClassifier and ensemble.GradientBoostingClassifier as well as predict method of tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, and ensemble.GradientBoostingRegressor. #16331 by Alexandre Batisse.
sklearn.utils#
Major Feature Estimators can now be displayed with a rich html representation. This can be enabled in Jupyter notebooks by setting display='diagram' in set_config. The raw html can be returned by using utils.estimator_html_repr. #14180 by Thomas Fan.
Enhancement Improve error message in utils.validation.column_or_1d. #15926 by Loïc Estève.
Enhancement Add warning in utils.check_array for pandas sparse DataFrame. #16021 by Rushabh Vasani.
Enhancement utils.check_array now constructs a sparse matrix from a pandas DataFrame that contains only SparseArray columns. #16728 by Thomas Fan.
Enhancement utils.check_array supports pandas' nullable integer dtype with missing values when force_all_finite is set to False or 'allow-nan', in which case the data is converted to floating point values where pd.NA values are replaced by np.nan. As a consequence, all sklearn.preprocessing transformers that accept numeric inputs with missing values represented as np.nan now also accept being directly fed pandas dataframes with pd.Int* or pd.UInt* typed columns that use pd.NA as a missing value marker. #16508 by Thomas Fan.
API Change Passing classes to utils.estimator_checks.check_estimator and utils.estimator_checks.parametrize_with_checks is now deprecated, and support for classes will be removed in 0.24. Pass instances instead. #17032 by Nicolas Hug.
API Change The private utility _safe_tags in utils.estimator_checks was removed, hence all tags should be obtained through estimator._get_tags(). Note that Mixins like RegressorMixin must come before base classes in the MRO for _get_tags() to work properly. #16950 by Nicolas Hug.
Fix utils.all_estimators now only returns public estimators. #15380 by Thomas Fan.
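The rich HTML representation mentioned above can be tried outside a notebook by asking for the raw HTML string. The pipeline below is an arbitrary illustrative estimator. In a Jupyter notebook, evaluating pipe in a cell after the set_config call should render the diagram instead of the text repr.

```python
from sklearn import set_config
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr

pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Raw HTML for the estimator, usable in reports or custom front ends.
html = estimator_html_repr(pipe)
print(len(html))

# Opt in to the diagram display for notebook sessions.
set_config(display="diagram")
```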
Miscellaneous#
Major Feature Adds an HTML representation of estimators to be shown in a Jupyter notebook or lab. This visualization is activated by setting the display option in sklearn.set_config. #14180 by Thomas Fan.
Enhancement scikit-learn now works with mypy without errors. #16726 by Roman Yurchak.
API Change Most estimators now expose a n_features_in_ attribute. This attribute is equal to the number of features passed to the fit method. See SLEP010 for details. #16112 by Nicolas Hug.
API Change Estimators now have a requires_y tag which is False by default except for estimators that inherit from sklearn.base.RegressorMixin or sklearn.base.ClassifierMixin. This tag is used to ensure that a proper error message is raised when y was expected but None was passed. #16622 by Nicolas Hug.
API Change The default setting print_changed_only has been changed from False to True. This means that the repr of estimators is now more concise and only shows the parameters whose default value has been changed when printing an estimator. You can restore the previous behaviour by using sklearn.set_config(print_changed_only=False). Also, note that it is always possible to quickly inspect the parameters of any estimator using est.get_params(deep=False). #17061 by Nicolas Hug.
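The effect of the new print_changed_only default and of the n_features_in_ attribute can be seen in a short session. The estimator and toy data are illustrative. config_context is used here instead of set_config so that the previous behaviour is restored only temporarily.

```python
import numpy as np
from sklearn import config_context
from sklearn.linear_model import LogisticRegression

est = LogisticRegression(C=2.0)

# Default since 0.23: only parameters that differ from their defaults are shown.
short_repr = repr(est)
print(short_repr)  # LogisticRegression(C=2.0)

# Temporarily restore the verbose repr listing every parameter.
with config_context(print_changed_only=False):
    full_repr = repr(est)

# The full parameter set is always available programmatically.
params = est.get_params(deep=False)

# After fitting, n_features_in_ records the number of input features.
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
est.fit(X, [0, 1, 0, 1])
print(est.n_features_in_)  # 2
```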
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.22, including:
Abbie Popa, Adrin Jalali, Aleksandra Kocot, Alexandre Batisse, Alexandre Gramfort, Alex Henrie, Alex Itkes, Alex Liang, alexshacked, Alonso Silva Allende, Ana Casado, Andreas Mueller, Angela Ambroz, Ankit810, Arie Pratama Sutiono, Arunav Konwar, Baptiste Maingret, Benjamin Beier Liu, bernie gray, Bharathi Srinivasan, Bharat Raghunathan, Bibhash Chandra Mitra, Brian Wignall, brigi, Brigitta Sipőcz, Carlos H Brandt, CastaChick, castor, cgsavard, Chiara Marmo, Chris Gregory, Christian Kastner, Christian Lorentzen, Corrie Bartelheimer, Daniël van Gelder, Daphne, David Breuer, david-cortes, dbauer9, Divyaprabha M, Edward Qian, Ekaterina Borovikova, ELNS, Emily Taylor, Erich Schubert, Eric Leung, Evgeni Chasnovski, Fabiana, Facundo Ferrín, Fan, Franziska Boenisch, Gael Varoquaux, Gaurav Sharma, Geoffrey Bolmier, Georgi Peev, gholdman1, Gonthier Nicolas, Gregory Morse, Gregory R. Lee, Guillaume Lemaitre, Gui Miotto, Hailey Nguyen, Hanmin Qin, Hao Chun Chang, HaoYin, Hélion du Mas des Bourboux, Himanshu Garg, Hirofumi Suzuki, huangk10, Hugo van Kemenade, Hye Sung Jung, indecisiveuser, inderjeet, J-A16, Jérémie du Boisberranger, Jin-Hwan CHO, JJmistry, Joel Nothman, Johann Faouzi, Jon Haitz Legarreta Gorroño, Juan Carlos Alfaro Jiménez, judithabk6, jumon, Kathryn Poole, Katrina Ni, Kesshi Jordan, Kevin Loftis, Kevin Markham, krishnachaitanya9, Lam Gia Thuan, Leland McInnes, Lisa Schwetlick, lkubin, Loic Esteve, lopusz, lrjball, lucgiffon, lucyleeow, Lucy Liu, Lukas Kemkes, Maciej J Mikulski, Madhura Jayaratne, Magda Zielinska, maikia, Mandy Gu, Manimaran, Manish Aradwad, Maren Westermann, Maria, Mariana Meireles, Marie Douriez, Marielle, Mateusz Górski, mathurinm, Matt Hall, Maura Pintor, mc4229, meyer89, m.fab, Michael Shoemaker, Michał Słapek, Mina Naghshhnejad, mo, Mohamed Maskani, Mojca Bertoncelj, narendramukherjee, ngshya, Nicholas Won, Nicolas Hug, nicolasservel, Niklas, @nkish, Noa Tamir, Oleksandr Pavlyk, olicairns, Oliver Urs Lenz, Olivier Grisel, parsons-kyle-89, 
Paula, Pete Green, Pierre Delanoue, pspachtholz, Pulkit Mehta, Qizhi Jiang, Quang Nguyen, rachelcjordan, raduspaimoc, Reshama Shaikh, Riccardo Folloni, Rick Mackenbach, Ritchie Ng, Roman Feldbauer, Roman Yurchak, Rory Hartong-Redden, Rüdiger Busche, Rushabh Vasani, Sambhav Kothari, Samesh Lakhotia, Samuel Duan, SanthoshBala18, Santiago M. Mola, Sarat Addepalli, scibol, Sebastian Kießling, SergioDSR, Sergul Aydore, Shiki-H, shivamgargsya, SHUBH CHATTERJEE, Siddharth Gupta, simonamaggio, smarie, Snowhite, stareh, Stephen Blystone, Stephen Marsh, Sunmi Yoon, SylvainLan, talgatomarov, tamirlan1, th0rwas, theoptips, Thomas J Fan, Thomas Li, Thomas Schmitt, Tim Nonner, Tim Vink, Tiphaine Viard, Tirth Patel, Titus Christian, Tom Dupré la Tour, trimeta, Vachan D A, Vandana Iyer, Venkatachalam N, waelbenamara, wconnell, wderose, wenliwyan, Windber, wornbb, Yu-Hang “Maxin” Tang