Version 1.4#
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 1.4.
Legend for changelogs
Major Feature something big that you couldn’t do before.
Feature something that you couldn’t do before.
Efficiency an existing feature now may not require as much computation or memory.
Enhancement a miscellaneous minor improvement.
Fix something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.4.2#
April 2024
This release only includes support for numpy 2.
Version 1.4.1#
February 2024
Changed models#
API Change The tree_.value attribute in tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor changed from a weighted absolute count of number of samples to a weighted fraction of the total number of samples. #27639 by Samuel Ronsin.
Metadata Routing#
Fix Fix routing issue with ColumnTransformer when used inside another meta-estimator. #28188 by Adrin Jalali.
Fix No error is raised when no metadata is passed to a meta-estimator that includes a sub-estimator which doesn't support metadata routing. #28256 by Adrin Jalali.
Fix Fix multioutput.MultiOutputRegressor and multioutput.MultiOutputClassifier to work with estimators that don't consume any metadata when metadata routing is enabled. #28240 by Adrin Jalali.
DataFrame Support#
Enhancement Fix Pandas and Polars dataframes are validated directly without duck-typing checks. #28195 by Thomas Fan.
Changes impacting many modules#
Efficiency Fix Partial revert of #28191 to avoid a performance regression for estimators relying on euclidean pairwise computation with sparse matrices.
Fix Fixes a bug for all scikit-learn transformers when using set_output with transform set to pandas or polars. The bug could lead to wrong naming of the columns of the returned dataframe. #28262 by Guillaume Lemaitre.
Fix When users try to use a method in StackingClassifier, SelectFromModel, RFE, SelfTrainingClassifier, OneVsOneClassifier, OutputCodeClassifier or OneVsRestClassifier that their sub-estimators don't implement, the AttributeError now reraises in the traceback. #28167 by Stefanie Senger.
Changelog#
sklearn.calibration#
Fix calibration.CalibratedClassifierCV supports predict_proba with float32 output from the inner estimator. #28247 by Thomas Fan.
sklearn.cluster#
Fix cluster.AffinityPropagation now avoids assigning multiple different clusters for equal points. #28121 by Pietro Peterlongo and Yao Xiao.
Fix Avoid infinite loop in cluster.KMeans when the number of clusters is larger than the number of non-duplicate samples. #28165 by Jérémie du Boisberranger.
sklearn.compose#
Fix compose.ColumnTransformer now transforms into a polars dataframe when verbose_feature_names_out=True and the transformers internally use the same columns several times. Previously, it would raise a ValueError due to duplicated column names. #28262 by Guillaume Lemaitre.
sklearn.ensemble#
Fix HistGradientBoostingClassifier and HistGradientBoostingRegressor when fitted on a pandas DataFrame with extension dtypes, for example pd.Int64Dtype. #28385 by Loïc Estève.
Fix Fixes error message raised by ensemble.VotingClassifier when the target is multilabel or multiclass-multioutput in a DataFrame format. #27702 by Guillaume Lemaitre.
sklearn.impute#
Fix impute.SimpleImputer now raises an error in .fit and .transform if fill_value cannot be cast to the input dtype with casting='same_kind'. #28365 by Leo Grinsztajn.
sklearn.inspection#
Fix inspection.permutation_importance now properly handles sample_weight together with subsampling (i.e. max_samples < 1.0). #28184 by Michael Mayer.
sklearn.linear_model#
Fix linear_model.ARDRegression now handles pandas input types for predict(X, return_std=True). #28377 by Eddie Bergman.
sklearn.preprocessing#
Fix Make preprocessing.FunctionTransformer more lenient and overwrite output column names with get_feature_names_out in the following cases: (i) the input and output column names remain the same (which happens when using a NumPy ufunc); (ii) the input column names are numbers; (iii) the output is set to a Pandas or Polars dataframe. #28241 by Guillaume Lemaitre.
Fix preprocessing.FunctionTransformer now also warns when set_output is called with transform="polars" and func does not return a Polars dataframe or feature_names_out is not specified. #28263 by Guillaume Lemaitre.
Fix preprocessing.TargetEncoder no longer fails when target_type="continuous" and the input is read-only. In particular, it now works with pandas copy-on-write mode enabled. #28233 by John Hopfensperger.
sklearn.tree#
Fix tree.DecisionTreeClassifier and tree.DecisionTreeRegressor now handle missing values properly. The internal criterion was not initialized when no missing values were present in the data, leading to potentially wrong criterion values. #28295 by Guillaume Lemaitre and #28327 by Adam Li.
sklearn.utils#
Enhancement Fix utils.metaestimators.available_if now reraises the error from the check function as the cause of the AttributeError. #28198 by Thomas Fan.
Fix utils._safe_indexing now raises a ValueError when X is a Python list and axis=1, as documented in the docstring. #28222 by Guillaume Lemaitre.
Version 1.4.0#
January 2024
Changed models#
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Efficiency linear_model.LogisticRegression and linear_model.LogisticRegressionCV now have much better convergence for solvers "lbfgs" and "newton-cg". Both solvers can now reach much higher precision for the coefficients depending on the specified tol. Additionally, lbfgs can make better use of tol, i.e., stop sooner or reach higher precision. Note: lbfgs is the default solver, so this change might affect many models. It also means that, with this new version of scikit-learn, the resulting coefficients coef_ and intercept_ of your models will change for these two solvers (when fit on the same data again). The amount of change depends on the specified tol; for small values you will get more precise results. #26721 by Christian Lorentzen.
Fix Fixes a memory leak seen in PyPy for estimators using the Cython loss functions. #27670 by Guillaume Lemaitre.
Changes impacting all modules#
Major Feature Transformers now support polars output with set_output(transform="polars"). #27315 by Thomas Fan.
Enhancement All estimators now recognize the column names from any dataframe that adopts the DataFrame Interchange Protocol. Dataframes that return a correct representation through np.asarray(df) are expected to work with our estimators and functions. #26464 by Thomas Fan.
Enhancement The HTML representation of estimators now includes a link to the documentation and is color-coded to denote whether the estimator is fitted or not (unfitted estimators are orange, fitted estimators are blue). #26616 by Riccardo Cappuzzo, Ines Ibnukhsein, Gael Varoquaux, Joel Nothman and Lilian Boulard.
Fix Fixed a bug in most estimators and functions where setting a parameter to a large integer would cause a TypeError. #26648 by Naoise Holohan.
Metadata Routing#
The following models now support metadata routing in one or more of their methods. Refer to the Metadata Routing User Guide for more details.
Feature LarsCV and LassoLarsCV now support metadata routing in their fit method and route metadata to the CV splitter. #27538 by Omar Salman.
Feature multiclass.OneVsRestClassifier, multiclass.OneVsOneClassifier and multiclass.OutputCodeClassifier now support metadata routing in their fit and partial_fit, and route metadata to the underlying estimator's fit and partial_fit. #27308 by Stefanie Senger.
Feature pipeline.Pipeline now supports metadata routing according to the metadata routing user guide. #26789 by Adrin Jalali.
Feature cross_validate, cross_val_score, and cross_val_predict now support metadata routing. The metadata are routed to the estimator's fit, the scorer, and the CV splitter's split. The metadata is accepted via the new params parameter. fit_params is deprecated and will be removed in version 1.6. The groups parameter is also not accepted as a separate argument when metadata routing is enabled and should be passed via the params parameter. #26896 by Adrin Jalali.
Feature GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV, and HalvingRandomSearchCV now support metadata routing in their fit and score, and route metadata to the underlying estimator's fit, the CV splitter, and the scorer. #27058 by Adrin Jalali.
Feature ColumnTransformer now supports metadata routing according to the metadata routing user guide. #27005 by Adrin Jalali.
Feature linear_model.LogisticRegressionCV now supports metadata routing. linear_model.LogisticRegressionCV.fit now accepts **params which are passed to the underlying splitter and scorer. linear_model.LogisticRegressionCV.score now accepts **score_params which are passed to the underlying scorer. #26525 by Omar Salman.
Feature feature_selection.SelectFromModel now supports metadata routing in fit and partial_fit. #27490 by Stefanie Senger.
Feature linear_model.OrthogonalMatchingPursuitCV now supports metadata routing. Its fit now accepts **fit_params, which are passed to the underlying splitter. #27500 by Stefanie Senger.
Feature ElasticNetCV, LassoCV, MultiTaskElasticNetCV and MultiTaskLassoCV now support metadata routing and route metadata to the CV splitter. #27478 by Omar Salman.
Fix All meta-estimators for which metadata routing is not yet implemented now raise a NotImplementedError on get_metadata_routing and on fit if metadata routing is enabled and any metadata is passed to them. #27389 by Adrin Jalali.
Support for SciPy sparse arrays#
Several estimators now support SciPy sparse arrays. The following functions and classes are impacted:
Functions:
cluster.compute_optics_graph in #27104 by Maren Westermann and in #27250 by Yao Xiao;
decomposition.non_negative_factorization in #27100 by Isaac Virshup;
feature_selection.f_regression in #27239 by Yaroslav Korobko;
feature_selection.r_regression in #27239 by Yaroslav Korobko;
Classes:
cluster.HDBSCAN in #27250 by Yao Xiao;
cluster.KMeans in #27179 by Nurseit Kamchyev;
cluster.OPTICS in #27104 by Maren Westermann and in #27250 by Yao Xiao;
decomposition.NMF in #27100 by Isaac Virshup;
feature_extraction.text.TfidfTransformer in #27219 by Yao Xiao;
manifold.Isomap in #27250 by Yao Xiao;
manifold.TSNE in #27250 by Yao Xiao;
impute.SimpleImputer in #27277 by Yao Xiao;
impute.KNNImputer in #27277 by Yao Xiao;
kernel_approximation.PolynomialCountSketch in #27301 by Lohit SundaramahaLingam;
random_projection.GaussianRandomProjection in #27314 by Stefanie Senger;
random_projection.SparseRandomProjection in #27314 by Stefanie Senger.
Support for Array API#
Several estimators and functions now support the Array API, which allows using them with other libraries such as JAX, CuPy, and PyTorch, and therefore enables some GPU-accelerated computations.
See Array API support (experimental) for more details.
Functions:
sklearn.metrics.accuracy_score and sklearn.metrics.zero_one_loss in #27137 by Edoardo Abati;
sklearn.model_selection.train_test_split in #26855 by Tim Head;
is_multilabel in #27601 by Yaroslav Korobko.
Classes:
decomposition.PCA for the full and randomized solvers (with QR power iterations) in #26315, #27098 and #27431 by Mateusz Sokół, Olivier Grisel and Edoardo Abati;
Private Loss Function Module#
Fix The gradient computation of the binomial log loss is now numerically more stable for inputs (raw predictions) that are very large in absolute value. Before, it could result in np.nan. Among the models that profit from this change are ensemble.GradientBoostingClassifier, ensemble.HistGradientBoostingClassifier and linear_model.LogisticRegression. #28048 by Christian Lorentzen.
Changelog#
sklearn.base#
Enhancement base.ClusterMixin.fit_predict and base.OutlierMixin.fit_predict now accept **kwargs which are passed to the fit method of the estimator. #26506 by Adrin Jalali.
Enhancement base.TransformerMixin.fit_transform and base.OutlierMixin.fit_predict now raise a warning if transform / predict consume metadata, but no custom fit_transform / fit_predict is defined in the class inheriting from them correspondingly. #26831 by Adrin Jalali.
Enhancement base.clone now supports dict as input and creates a copy. #26786 by Adrin Jalali.
API Change process_routing now has a different signature. The first two arguments (the object and the method) are positional only, and all metadata are passed as keyword arguments. #26909 by Adrin Jalali.
sklearn.calibration#
Enhancement The internal objective and gradient of the sigmoid method of calibration.CalibratedClassifierCV have been replaced by the private loss module. #27185 by Omar Salman.
sklearn.cluster#
Fix The degree parameter in the cluster.SpectralClustering constructor now accepts real values instead of only integral values, in accordance with the degree parameter of sklearn.metrics.pairwise.polynomial_kernel. #27668 by Nolan McMahon.
Fix Fixes a bug in cluster.OPTICS where the cluster correction based on predecessor was not using the right indexing. It would lead to inconsistent results dependent on the order of the data. #26459 by Haoying Zhang and Guillaume Lemaitre.
Fix Improve error message when checking the number of connected components in the fit method of cluster.HDBSCAN. #27678 by Ganesh Tata.
Fix Create a copy of a precomputed sparse matrix within the fit method of cluster.DBSCAN to avoid in-place modification of the sparse matrix. #27651 by Ganesh Tata.
Fix Raises a proper ValueError when metric="precomputed" and storing centers is requested via the parameter store_centers. #27898 by Guillaume Lemaitre.
API Change The kdtree and balltree values are now deprecated and renamed to kd_tree and ball_tree respectively for the algorithm parameter of cluster.HDBSCAN, ensuring consistency in naming convention. The kdtree and balltree values will be removed in 1.6. #26744 by Shreesha Kumar Bhat.
API Change The option metric=None in cluster.AgglomerativeClustering and cluster.FeatureAgglomeration is deprecated in version 1.4 and will be removed in version 1.6. Use the default value instead. #27828 by Guillaume Lemaitre.
sklearn.compose#
Major Feature Adds polars input support to compose.ColumnTransformer through the DataFrame Interchange Protocol. The minimum supported version for polars is 0.19.12. #26683 by Thomas Fan.
Fix cluster.spectral_clustering and cluster.SpectralClustering now raise an explicit error message indicating that sparse matrices and arrays with np.int64 indices are not supported. #27240 by Yao Xiao.
API Change Outputs that use pandas extension dtypes and contain pd.NA in ColumnTransformer now result in a FutureWarning and will cause a ValueError in version 1.6, unless the output container has been configured as "pandas" with set_output(transform="pandas"). Before, such outputs resulted in numpy arrays of dtype object containing pd.NA which could not be converted to numpy floats and caused errors when passed to other scikit-learn estimators. #27734 by Jérôme Dockès.
sklearn.covariance#
Enhancement Allow covariance.shrunk_covariance to process multiple covariance matrices at once by handling nd-arrays. #25275 by Quentin Barthélemy.
API Change Fix ColumnTransformer now replaces "passthrough" with a corresponding FunctionTransformer in the fitted transformers_ attribute. #27204 by Adrin Jalali.
sklearn.datasets#
Enhancement datasets.make_sparse_spd_matrix now uses a more memory-efficient sparse layout. It also accepts a new keyword sparse_format that allows specifying the output format of the sparse matrix. By default sparse_format=None, which returns a dense numpy ndarray as before. #27438 by Yao Xiao.
Fix datasets.dump_svmlight_file now does not raise a ValueError when X is read-only, e.g., a numpy.memmap instance. #28111 by Yao Xiao.
API Change datasets.make_sparse_spd_matrix deprecated the keyword argument dim in favor of n_dim. dim will be removed in version 1.6. #27718 by Adam Li.
sklearn.decomposition#
Feature decomposition.PCA now supports scipy.sparse.sparray and scipy.sparse.spmatrix inputs when using the arpack solver. When used on sparse data like datasets.fetch_20newsgroups_vectorized this can lead to speed-ups of 100x (single threaded) and 70x lower memory usage. Based on Alexander Tarashansky's implementation in scanpy. #18689 by Isaac Virshup and Andrey Portnoy.
Enhancement An "auto" option was added to the n_components parameter of decomposition.non_negative_factorization, decomposition.NMF and decomposition.MiniBatchNMF to automatically infer the number of components from the W or H shapes when using a custom initialization. The default value of this parameter will change from None to auto in version 1.6. #26634 by Alexandre Landeau and Alexandre Vigny.
Fix decomposition.dict_learning_online no longer ignores the parameter max_iter. #27834 by Guillaume Lemaitre.
Fix The degree parameter in the decomposition.KernelPCA constructor now accepts real values instead of only integral values, in accordance with the degree parameter of sklearn.metrics.pairwise.polynomial_kernel. #27668 by Nolan McMahon.
API Change The option max_iter=None in decomposition.MiniBatchDictionaryLearning, decomposition.MiniBatchSparsePCA, and decomposition.dict_learning_online is deprecated and will be removed in version 1.6. Use the default value instead. #27834 by Guillaume Lemaitre.
sklearn.ensemble#
Major Feature ensemble.RandomForestClassifier and ensemble.RandomForestRegressor support missing values when the criterion is gini, entropy, or log_loss for classification, or squared_error, friedman_mse, or poisson for regression. #26391 by Thomas Fan.
Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor support categorical_features="from_dtype", which treats columns with Pandas or Polars Categorical dtype as categories in the algorithm. categorical_features="from_dtype" will become the default in v1.6. Categorical features no longer need to be encoded with numbers. When categorical features are numbers, the maximum value no longer needs to be smaller than max_bins; only the number of (unique) categories must be smaller than max_bins. #26411 by Thomas Fan and #27835 by Jérôme Dockès.
Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor got the new parameter max_features to specify the proportion of randomly chosen features considered in each split. #27139 by Christian Lorentzen.
Feature ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. Missing values in the training data and multi-output targets are not supported. #13649 by Samuel Ronsin, initiated by Patrick O'Reilly.
Efficiency ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor are now a bit faster by reusing the parent node's histogram as the children nodes' histogram in the subtraction trick. In effect, less memory has to be allocated and deallocated. #27865 by Christian Lorentzen.
Efficiency ensemble.GradientBoostingClassifier is faster, for binary and in particular for multiclass problems, thanks to the private loss function module. #26278 and #28095 by Christian Lorentzen.
Efficiency Improves runtime and memory usage for ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor when trained on sparse data. #26957 by Thomas Fan.
Efficiency ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor are now faster when scoring is a predefined metric listed in metrics.get_scorer_names and early stopping is enabled. #26163 by Thomas Fan.
Enhancement A fitted property, estimators_samples_, was added to all Forest methods, including ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor, which allows retrieving the training sample indices used for each tree estimator. #26736 by Adam Li.
Fix Fixes ensemble.IsolationForest when the input is a sparse matrix and contamination is set to a float value. #27645 by Guillaume Lemaitre.
Fix Raises a ValueError in ensemble.RandomForestRegressor and ensemble.ExtraTreesRegressor when requesting the OOB score with a multioutput model whose targets are all rounded to integers; it was recognized as a multiclass problem. #27817 by Daniele Ongari.
Fix Changes estimator tags to acknowledge that ensemble.VotingClassifier, ensemble.VotingRegressor, ensemble.StackingClassifier, and ensemble.StackingRegressor support missing values if all estimators support missing values. #27710 by Guillaume Lemaitre.
Fix Support loading pickles of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor when the pickle has been generated on a platform with a different bitness. A typical example is to train and pickle the model on a 64-bit machine and load the model on a 32-bit machine for prediction. #28074 by Christian Lorentzen and Loïc Estève.
API Change In ensemble.AdaBoostClassifier, the algorithm argument SAMME.R was deprecated and will be removed in 1.6. #26830 by Stefanie Senger.
sklearn.feature_extraction#
API Change Changed error type from AttributeError to exceptions.NotFittedError in unfitted instances of feature_extraction.DictVectorizer for the following methods: feature_extraction.DictVectorizer.inverse_transform, feature_extraction.DictVectorizer.restrict, feature_extraction.DictVectorizer.transform. #24838 by Lorenz Hertel.
sklearn.feature_selection#
Enhancement feature_selection.SelectKBest, feature_selection.SelectPercentile, and feature_selection.GenericUnivariateSelect now support unsupervised feature selection by providing a score_func taking X and y=None. #27721 by Guillaume Lemaitre.
Enhancement feature_selection.SelectKBest and feature_selection.GenericUnivariateSelect with mode='k_best' now show a warning when k is greater than the number of features. #27841 by Thomas Fan.
Fix feature_selection.RFE and feature_selection.RFECV do not check for nans during input validation. #21807 by Thomas Fan.
sklearn.inspection#
Enhancement inspection.DecisionBoundaryDisplay now accepts a parameter class_of_interest to select the class of interest when plotting the response provided by response_method="predict_proba" or response_method="decision_function". It allows plotting the decision boundary for both binary and multiclass classifiers. #27291 by Guillaume Lemaitre.
Fix inspection.DecisionBoundaryDisplay.from_estimator and inspection.PartialDependenceDisplay.from_estimator now return the correct type for subclasses. #27675 by John Cant.
API Change inspection.DecisionBoundaryDisplay raises an AttributeError instead of a ValueError when an estimator does not implement the requested response method. #27291 by Guillaume Lemaitre.
sklearn.kernel_ridge#
Fix The degree parameter in the kernel_ridge.KernelRidge constructor now accepts real values instead of only integral values, in accordance with the degree parameter of sklearn.metrics.pairwise.polynomial_kernel. #27668 by Nolan McMahon.
sklearn.linear_model#
Efficiency linear_model.LogisticRegression and linear_model.LogisticRegressionCV now have much better convergence for solvers "lbfgs" and "newton-cg". Both solvers can now reach much higher precision for the coefficients depending on the specified tol. Additionally, lbfgs can make better use of tol, i.e., stop sooner or reach higher precision. This is accomplished by better scaling of the objective function, i.e., using average per-sample losses instead of the sum of per-sample losses. #26721 by Christian Lorentzen.
Efficiency linear_model.LogisticRegression and linear_model.LogisticRegressionCV with solver "newton-cg" can now be considerably faster for some data and parameter settings. This is accomplished by a better line search convergence check for negligible loss improvements that takes gradient information into account. #26721 by Christian Lorentzen.
Efficiency Solver "newton-cg" in linear_model.LogisticRegression and linear_model.LogisticRegressionCV uses a little less memory. The effect is proportional to the number of coefficients (n_features * n_classes). #27417 by Christian Lorentzen.
Fix Ensure that the sigma_ attribute of linear_model.ARDRegression and linear_model.BayesianRidge always has a float32 dtype when fitted on float32 data, even with the type promotion rules of NumPy 2. #27899 by Olivier Grisel.
API Change The attribute loss_function_ of linear_model.SGDClassifier and linear_model.SGDOneClassSVM has been deprecated and will be removed in version 1.6. #27979 by Christian Lorentzen.
sklearn.metrics#
Efficiency Computing pairwise distances via metrics.DistanceMetric for CSR x CSR, Dense x CSR, and CSR x Dense datasets is now 1.5x faster. #26765 by Meekail Zain.
Efficiency Computing distances via metrics.DistanceMetric for CSR x CSR, Dense x CSR, and CSR x Dense now uses ~50% less memory, and outputs distances in the same dtype as the provided data. #27006 by Meekail Zain.
Enhancement Improve the rendering of the plot obtained with the metrics.PrecisionRecallDisplay and metrics.RocCurveDisplay classes. The x- and y-axis limits are set to [0, 1] and the aspect ratio between both axes is set to 1 to get a square plot. #26366 by Mojdeh Rastgoo.
Enhancement Added neg_root_mean_squared_log_error_scorer as a scorer. #26734 by Alejandro Martin Gil.
Enhancement metrics.confusion_matrix now warns when only one label is found in y_true and y_pred. #27650 by Lucy Liu.
Fix Computing pairwise distances with metrics.pairwise.euclidean_distances no longer raises an exception when X is provided as a float64 array and X_norm_squared as a float32 array. #27624 by Jérôme Dockès.
Fix f1_score now provides correct values when handling various cases in which division by zero occurs, by using a formulation that does not depend on the precision and recall values. #27577 by Omar Salman and Guillaume Lemaitre.
Fix metrics.make_scorer now raises an error when using a regressor on a scorer requesting a non-thresholded decision function (from decision_function or predict_proba). Such scorers are specific to classification. #26840 by Guillaume Lemaitre.
Fix metrics.DetCurveDisplay.from_predictions, metrics.PrecisionRecallDisplay.from_predictions, metrics.PredictionErrorDisplay.from_predictions, and metrics.RocCurveDisplay.from_predictions now return the correct type for subclasses. #27675 by John Cant.
API Change Deprecated needs_threshold and needs_proba from metrics.make_scorer. These parameters will be removed in version 1.6. Instead, use response_method, which accepts "predict", "predict_proba" or "decision_function", or a list of such values. needs_proba=True is equivalent to response_method="predict_proba" and needs_threshold=True is equivalent to response_method=("decision_function", "predict_proba"). #26840 by Guillaume Lemaitre.
API Change The squared parameter of metrics.mean_squared_error and metrics.mean_squared_log_error is deprecated and will be removed in 1.6. Use the new functions metrics.root_mean_squared_error and metrics.root_mean_squared_log_error instead. #26734 by Alejandro Martin Gil.
sklearn.model_selection#
Enhancement model_selection.learning_curve raises a warning when every cross-validation fold fails. #26299 by Rahil Parikh.
Fix model_selection.GridSearchCV, model_selection.RandomizedSearchCV, and model_selection.HalvingGridSearchCV now don't change the given object in the parameter grid if it's an estimator. #26786 by Adrin Jalali.
sklearn.multioutput#
Enhancement Add method predict_log_proba to multioutput.ClassifierChain. #27720 by Guillaume Lemaitre.
sklearn.neighbors#
Efficiency sklearn.neighbors.KNeighborsRegressor.predict and sklearn.neighbors.KNeighborsClassifier.predict_proba now efficiently support pairs of dense and sparse datasets. #27018 by Julien Jerphanion.
Efficiency The performance of neighbors.RadiusNeighborsClassifier.predict and of neighbors.RadiusNeighborsClassifier.predict_proba has been improved when radius is large and algorithm="brute" with non-Euclidean metrics. #26828 by Omar Salman.
Fix Improve error message for neighbors.LocalOutlierFactor when it is invoked with n_samples=n_neighbors. #23317 by Bharat Raghunathan.
Fix neighbors.KNeighborsClassifier.predict and neighbors.KNeighborsClassifier.predict_proba now raise an error when the weights of all neighbors of some sample are zero. This can happen when weights is a user-defined function. #26410 by Yao Xiao.
API Change neighbors.KNeighborsRegressor now accepts metrics.DistanceMetric objects directly via the metric keyword argument, allowing for the use of accelerated third-party metrics.DistanceMetric objects. #26267 by Meekail Zain.
sklearn.preprocessing#
Efficiency preprocessing.OrdinalEncoder avoids calculating missing indices twice to improve efficiency. #27017 by Xuefeng Xu.
Efficiency Improves efficiency in preprocessing.OneHotEncoder and preprocessing.OrdinalEncoder in checking nan. #27760 by Xuefeng Xu.
Enhancement Improves warnings in preprocessing.FunctionTransformer when func returns a pandas dataframe and the output is configured to be pandas. #26944 by Thomas Fan.
Enhancement preprocessing.TargetEncoder now supports target_type 'multiclass'. #26674 by Lucy Liu.
Fix preprocessing.OneHotEncoder and preprocessing.OrdinalEncoder raise an exception when nan is a category and is not the last in the user's provided categories. #27309 by Xuefeng Xu.
Fix preprocessing.OneHotEncoder and preprocessing.OrdinalEncoder raise an exception if the user-provided categories contain duplicates. #27328 by Xuefeng Xu.
Fix preprocessing.FunctionTransformer raises an error at transform if the output of get_feature_names_out is not consistent with the column names of the output container, if those are defined. #27801 by Guillaume Lemaitre.
Fix Raise a NotFittedError in preprocessing.OrdinalEncoder when calling transform without calling fit, since categories always needs to be checked. #27821 by Guillaume Lemaitre.
sklearn.tree#
Feature tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. Missing values in the training data and multi-output targets are not supported. #13649 by Samuel Ronsin, initiated by Patrick O'Reilly.
sklearn.utils#
Enhancement sklearn.utils.estimator_html_repr dynamically adapts diagram colors based on the browser's prefers-color-scheme, providing improved adaptability to dark mode environments. #26862 by Andrew Goh Yisheng, Thomas Fan, Adrin Jalali.
Enhancement MetadataRequest and MetadataRouter now have a consumes method which can be used to check whether a given set of parameters would be consumed. #26831 by Adrin Jalali.
Enhancement Make sklearn.utils.check_array attempt to output int32-indexed CSR and COO arrays when converting from DIA arrays, if the number of non-zero entries is small enough. This ensures that estimators implemented in Cython that do not accept int64-indexed sparse data structures now consistently accept the same sparse input formats for SciPy sparse matrices and arrays. #27372 by Guillaume Lemaitre.
Fix sklearn.utils.check_array now accepts both matrices and arrays from the sparse SciPy module. The previous implementation would fail if copy=True by calling the NumPy-specific np.may_share_memory, which does not work with SciPy sparse arrays and does not return the correct result for SciPy sparse matrices. #27336 by Guillaume Lemaitre.
Fix check_estimators_pickle with readonly_memmap=True now relies on joblib's own capability to allocate aligned memory-mapped arrays when loading a serialized estimator, instead of calling a dedicated private function that would crash when OpenBLAS misdetects the CPU architecture. #27614 by Olivier Grisel.
Fix The error message in check_array when a sparse matrix is passed but accept_sparse is False now suggests using .toarray() and not X.toarray(). #27757 by Lucy Liu.
Fix Fix the function check_array to output the right error message when the input is a Series instead of a DataFrame. #28090 by Stan Furrer and Yao Xiao.
API Change sklearn.utils.extmath.log_logistic is deprecated and will be removed in 1.6. Use -np.logaddexp(0, -x) instead. #27544 by Christian Lorentzen.
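The suggested replacement expression can be checked with plain NumPy (toy inputs for illustration); it computes log(1 / (1 + exp(-x))), i.e., the log of the logistic sigmoid, without overflowing for large |x|:

```python
import numpy as np

# Includes extreme values where a naive np.log(1 / (1 + np.exp(-x)))
# would overflow or return -inf.
x = np.array([-800.0, -2.0, 0.0, 2.0, 800.0])

# Numerically stable equivalent of the deprecated log_logistic(x).
out = -np.logaddexp(0, -x)
print(out)
```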
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.3, including:
101AlexMartin, Abhishek Singh Kushwah, Adam Li, Adarsh Wase, Adrin Jalali, Advik Sinha, Alex, Alexander Al-Feghali, Alexis IMBERT, AlexL, Alex Molas, Anam Fatima, Andrew Goh, andyscanzio, Aniket Patil, Artem Kislovskiy, Arturo Amor, ashah002, avm19, Ben Holmes, Ben Mares, Benoit Chevallier-Mames, Bharat Raghunathan, Binesh Bannerjee, Brendan Lu, Brevin Kunde, Camille Troillard, Carlo Lemos, Chad Parmet, Christian Clauss, Christian Lorentzen, Christian Veenhuis, Christos Aridas, Cindy Liang, Claudio Salvatore Arcidiacono, Connor Boyle, cynthias13w, DaminK, Daniele Ongari, Daniel Schmitz, Daniel Tinoco, David Brochart, Deborah L. Haar, DevanshKyada27, Dimitri Papadopoulos Orfanos, Dmitry Nesterov, DUONG, Edoardo Abati, Eitan Hemed, Elabonga Atuo, Elisabeth Günther, Emma Carballal, Emmanuel Ferdman, epimorphic, Erwan Le Floch, Fabian Egli, Filip Karlo Došilović, Florian Idelberger, Franck Charras, Gael Varoquaux, Ganesh Tata, Hleb Levitski, Guillaume Lemaitre, Haoying Zhang, Harmanan Kohli, Ily, ioangatop, IsaacTrost, Isaac Virshup, Iwona Zdzieblo, Jakub Kaczmarzyk, James McDermott, Jarrod Millman, JB Mountford, Jérémie du Boisberranger, Jérôme Dockès, Jiawei Zhang, Joel Nothman, John Cant, John Hopfensperger, Jona Sassenhagen, Jon Nordby, Julien Jerphanion, Kennedy Waweru, kevin moore, Kian Eliasi, Kishan Ved, Konstantinos Pitas, Koustav Ghosh, Kushan Sharma, ldwy4, Linus, Lohit SundaramahaLingam, Loic Esteve, Lorenz, Louis Fouquet, Lucy Liu, Luis Silvestrin, Lukáš Folwarczný, Lukas Geiger, Malte Londschien, Marcus Fraaß, Marek Hanuš, Maren Westermann, Mark Elliot, Martin Larralde, Mateusz Sokół, mathurinm, mecopur, Meekail Zain, Michael Higgins, Miki Watanabe, Milton Gomez, MN193, Mohammed Hamdy, Mohit Joshi, mrastgoo, Naman Dhingra, Naoise Holohan, Narendra Singh dangi, Noa Malem-Shinitski, Nolan, Nurseit Kamchyev, Oleksii Kachaiev, Olivier Grisel, Omar Salman, partev, Peter Hull, Peter Steinbach, Pierre de Fréminville, Pooja Subramaniam, Puneeth K, qmarcou, 
Quentin Barthélemy, Rahil Parikh, Rahul Mahajan, Raj Pulapakura, Raphael, Ricardo Peres, Riccardo Cappuzzo, Roman Lutz, Salim Dohri, Samuel O. Ronsin, Sandip Dutta, Sayed Qaiser Ali, scaja, scikit-learn-bot, Sebastian Berg, Shreesha Kumar Bhat, Shubhal Gupta, Søren Fuglede Jørgensen, Stefanie Senger, Tamara, Tanjina Afroj, THARAK HEGDE, thebabush, Thomas J. Fan, Thomas Roehr, Tialo, Tim Head, tongyu, Venkatachalam N, Vijeth Moudgalya, Vincent M, Vivek Reddy P, Vladimir Fokow, Xiao Yuan, Xuefeng Xu, Yang Tao, Yao Xiao, Yuchen Zhou, Yuusuke Hiramatsu