Version 0.13
Version 0.13.1
February 23, 2013
The 0.13.1 release only fixes some bugs and does not add any new functionality.
Changelog
Fixed a testing error caused by the function cross_validation.train_test_split being interpreted as a test, by Yaroslav Halchenko.
Fixed a bug in the reassignment of small clusters in cluster.MiniBatchKMeans, by Gael Varoquaux.
Fixed default value of gamma in decomposition.KernelPCA, by Lars Buitinck.
Updated joblib to 0.7.0d, by Gael Varoquaux.
Fixed scaling of the deviance in ensemble.GradientBoostingClassifier, by Peter Prettenhofer.
Better tie-breaking in multiclass.OneVsOneClassifier, by Andreas Müller.
Other small improvements to tests and documentation.
People
List of contributors for release 0.13.1 by number of commits.
5 Robert Marchman
2 Hrishikesh Huilgolkar
1 Bastiaan van den Berg
1 Diego Molla
1 Rafael Cunha de Almeida
1 Rolando Espinoza La fuente
Version 0.13
January 21, 2013
New Estimator Classes
dummy.DummyClassifier and dummy.DummyRegressor, two data-independent predictors by Mathieu Blondel. Useful to sanity-check your estimators. See Dummy estimators in the user guide. Multioutput support added by Arnaud Joly.
decomposition.FactorAnalysis, a transformer implementing the classical factor analysis, by Christian Osendorfer and Alexandre Gramfort. See Factor Analysis in the user guide.
feature_extraction.FeatureHasher, a transformer implementing the “hashing trick” for fast, low-memory feature extraction from string fields, by Lars Buitinck, and feature_extraction.text.HashingVectorizer for text documents, by Olivier Grisel. See Feature hashing and Vectorizing a large text corpus with the hashing trick for the documentation and sample usage.
pipeline.FeatureUnion, a transformer that concatenates results of several other transformers, by Andreas Müller. See FeatureUnion: composite feature spaces in the user guide.
random_projection.GaussianRandomProjection, random_projection.SparseRandomProjection and the function random_projection.johnson_lindenstrauss_min_dim. The first two are transformers implementing Gaussian and sparse random projection matrices, by Olivier Grisel and Arnaud Joly. See Random Projection in the user guide.
kernel_approximation.Nystroem, a transformer for approximating arbitrary kernels, by Andreas Müller. See Nystroem Method for Kernel Approximation in the user guide.
preprocessing.OneHotEncoder, a transformer that computes binary encodings of categorical features, by Andreas Müller. See Encoding categorical features in the user guide.
linear_model.PassiveAggressiveClassifier and linear_model.PassiveAggressiveRegressor, predictors implementing an efficient stochastic optimization for linear models, by Rob Zinkov and Mathieu Blondel. See Passive Aggressive Algorithms in the user guide.
ensemble.RandomTreesEmbedding, a transformer for creating high-dimensional sparse representations using ensembles of totally random trees, by Andreas Müller. See Totally Random Trees Embedding in the user guide.
manifold.SpectralEmbedding and function manifold.spectral_embedding, implementing the “laplacian eigenmaps” transformation for non-linear dimensionality reduction, by Wei Li. See Spectral Embedding in the user guide.
isotonic.IsotonicRegression, by Fabian Pedregosa, Alexandre Gramfort and Nelle Varoquaux.
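The “hashing trick” mentioned for feature_extraction.FeatureHasher can be illustrated with a small standard-library sketch. This is not scikit-learn's implementation: the function name hash_features, the use of MD5 as the hash, and the way the sign is derived are all assumptions chosen to keep the example self-contained. It only shows the general idea of mapping arbitrary string tokens into a fixed-length vector without storing a vocabulary.

```python
import hashlib


def hash_features(tokens, n_features=8):
    """Map string tokens to a fixed-length signed count vector.

    Hypothetical helper for illustration only; the hash function and
    sign rule are assumptions, not scikit-learn's actual choices.
    """
    vec = [0] * n_features
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # Use the first four bytes as an integer hash of the token.
        h = int.from_bytes(digest[:4], "little")
        index = h % n_features
        # Take one further bit as a sign so that collisions tend to
        # cancel out rather than accumulate.
        sign = 1 if digest[4] & 1 == 0 else -1
        vec[index] += sign
    return vec


doc_a = hash_features(["cat", "sat", "mat"])
doc_b = hash_features(["cat", "sat", "mat"])
```

Because the column index depends only on the token itself, the same tokens always produce the same vector, and no fitted vocabulary needs to be kept in memory. The price is that distinct tokens may collide in the same column.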
Changelog
metrics.zero_one_loss (formerly metrics.zero_one) now has an option for normalized output that reports the fraction of misclassifications, rather than the raw number of misclassifications. By Kyle Beauchamp.
tree.DecisionTreeClassifier and all derived ensemble models now support sample weighting, by Noel Dawe and Gilles Louppe.
Speedup improvement when using bootstrap samples in forests of randomized trees, by Peter Prettenhofer and Gilles Louppe.
Partial dependence plots for Gradient-boosted trees in ensemble.partial_dependence.partial_dependence, by Peter Prettenhofer. See Partial Dependence and Individual Conditional Expectation Plots for an example.
The table of contents on the website has now been made expandable, by Jaques Grobler.
feature_selection.SelectPercentile now breaks ties deterministically instead of returning all equally ranked features.
feature_selection.SelectKBest and feature_selection.SelectPercentile are more numerically stable since they use scores, rather than p-values, to rank results. This means that they might sometimes select different features than they did previously.
Ridge regression and ridge classification fitting with the sparse_cg solver no longer has quadratic memory complexity, by Lars Buitinck and Fabian Pedregosa.
Ridge regression and ridge classification now support a new fast solver called lsqr, by Mathieu Blondel.
Speed up of metrics.precision_recall_curve, by Conrad Lee.
Added support for reading/writing svmlight files with pairwise preference attribute (qid in svmlight file format) in datasets.dump_svmlight_file and datasets.load_svmlight_file, by Fabian Pedregosa.
Faster and more robust metrics.confusion_matrix and Clustering performance evaluation, by Wei Li.
cross_validation.cross_val_score now works with precomputed kernels and affinity matrices, by Andreas Müller.
LARS algorithm made more numerically stable, with heuristics to drop regressors that are too correlated and to stop the path when numerical noise becomes predominant, by Gael Varoquaux.
Faster implementation of metrics.precision_recall_curve, by Conrad Lee.
New kernel metrics.chi2_kernel, often used in computer vision applications, by Andreas Müller.
Fixed a longstanding bug in naive_bayes.BernoulliNB, by Shaun Jackman.
Implemented predict_proba in multiclass.OneVsRestClassifier, by Andrew Winterman.
Improved consistency in gradient boosting: estimators ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier use the estimator tree.DecisionTreeRegressor instead of the tree._tree.Tree data structure, by Arnaud Joly.
Fixed a floating point exception in the decision trees module, by Seberg.
Fixed metrics.roc_curve failing when y_true has only one class, by Wei Li.
Added the metrics.mean_absolute_error function, which computes the mean absolute error. The metrics.mean_squared_error, metrics.mean_absolute_error and metrics.r2_score metrics support multioutput, by Arnaud Joly.
Fixed class_weight support in svm.LinearSVC and linear_model.LogisticRegression, by Andreas Müller. In earlier releases the meaning of class_weight was reversed: a higher weight erroneously meant fewer positives of a given class.
Improved narrative documentation and consistency in sklearn.metrics for regression and classification metrics, by Arnaud Joly.
Fixed a bug in sklearn.svm.SVC when using csr-matrices with unsorted indices, by Xinfan Meng and Andreas Müller.
cluster.MiniBatchKMeans: added random reassignment of cluster centers that have few observations attached to them, by Gael Varoquaux.
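The difference between the normalized and raw output of metrics.zero_one_loss, described in the changelog above, can be shown with a plain-Python sketch. The function zero_one_loss_sketch is a hypothetical stand-in written for illustration and is not the library's code. It assumes two equal-length label sequences and a normalize flag with the meaning given above.

```python
def zero_one_loss_sketch(y_true, y_pred, normalize=True):
    """Count misclassifications, optionally as a fraction of all samples.

    Hypothetical re-implementation for illustration only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    # normalize=True reports the fraction of misclassified samples,
    # normalize=False reports the raw number of misclassifications.
    return errors / len(y_true) if normalize else errors


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 1]
fraction = zero_one_loss_sketch(y_true, y_pred)                   # 2 of 4 wrong
count = zero_one_loss_sketch(y_true, y_pred, normalize=False)     # 2 errors
```

Code that compared the old raw count against a threshold needs to be revisited if it switches to the normalized form, because the two values live on different scales.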
API changes summary
Renamed all occurrences of n_atoms to n_components for consistency. This applies to decomposition.DictionaryLearning, decomposition.MiniBatchDictionaryLearning, decomposition.dict_learning, decomposition.dict_learning_online.
Renamed all occurrences of max_iters to max_iter for consistency. This applies to semi_supervised.LabelPropagation and semi_supervised.label_propagation.LabelSpreading.
Renamed all occurrences of learn_rate to learning_rate for consistency in ensemble.BaseGradientBoosting and ensemble.GradientBoostingRegressor.
The module sklearn.linear_model.sparse is gone. Sparse matrix support was already integrated into the “regular” linear models.
sklearn.metrics.mean_square_error, which incorrectly returned the accumulated error, was removed. Use metrics.mean_squared_error instead.
Passing class_weight parameters to fit methods is no longer supported. Pass them to estimator constructors instead.
GMMs no longer have decode and rvs methods. Use the score, predict or sample methods instead.
The solver fit option in Ridge regression and classification is now deprecated and will be removed in v0.14. Use the constructor option instead.
feature_extraction.text.DictVectorizer now returns sparse matrices in the CSR format, instead of COO.
Renamed k in cross_validation.KFold and cross_validation.StratifiedKFold to n_folds, renamed n_bootstraps to n_iter in cross_validation.Bootstrap.
Renamed all occurrences of n_iterations to n_iter for consistency. This applies to cross_validation.ShuffleSplit, cross_validation.StratifiedShuffleSplit, utils.extmath.randomized_range_finder and utils.extmath.randomized_svd.
Replaced rho in linear_model.ElasticNet and linear_model.SGDClassifier by l1_ratio. The rho parameter had different meanings; l1_ratio was introduced to avoid confusion. It has the same meaning as previously rho in linear_model.ElasticNet and (1-rho) in linear_model.SGDClassifier.
linear_model.LassoLars and linear_model.Lars now store a list of paths in the case of multiple targets, rather than an array of paths.
The attribute gmm of hmm.GMMHMM was renamed to gmm_ to adhere more strictly with the API.
cluster.spectral_embedding was moved to manifold.spectral_embedding.
Renamed eig_tol in manifold.spectral_embedding, cluster.SpectralClustering to eigen_tol, renamed mode to eigen_solver.
Renamed mode in manifold.spectral_embedding and cluster.SpectralClustering to eigen_solver.
classes_ and n_classes_ attributes of tree.DecisionTreeClassifier and all derived ensemble models are now flat in case of single output problems and nested in case of multi-output problems.
The estimators_ attribute of ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier is now an array of tree.DecisionTreeRegressor.
Renamed chunk_size to batch_size in decomposition.MiniBatchDictionaryLearning and decomposition.MiniBatchSparsePCA for consistency.
svm.SVC and svm.NuSVC now provide a classes_ attribute and support arbitrary dtypes for labels y. Also, the dtype returned by predict now reflects the dtype of y during fit (used to be np.float).
Changed default test_size in cross_validation.train_test_split to None, added possibility to infer test_size from train_size in cross_validation.ShuffleSplit and cross_validation.StratifiedShuffleSplit.
Renamed function sklearn.metrics.zero_one to sklearn.metrics.zero_one_loss. Be aware that the default behavior in sklearn.metrics.zero_one_loss is different from sklearn.metrics.zero_one: normalize=False is changed to normalize=True.
Renamed function metrics.zero_one_score to metrics.accuracy_score.
datasets.make_circles now has the same number of inner and outer points.
In the Naive Bayes classifiers, the class_prior parameter was moved from fit to __init__.
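When migrating old code, the rho to l1_ratio replacement described above amounts to a small conversion that depends on the estimator. The sketch below encodes only the rule stated in this summary. The helper name rho_to_l1_ratio and the use of plain strings to identify the estimator are assumptions made for illustration, not part of scikit-learn.

```python
def rho_to_l1_ratio(rho, estimator):
    """Translate a pre-0.13 rho value into the equivalent l1_ratio.

    Hypothetical migration helper; `estimator` is the class name as a
    string, an assumption made to keep the example dependency-free.
    """
    if estimator == "ElasticNet":
        # l1_ratio has the same meaning as the old rho here.
        return rho
    if estimator == "SGDClassifier":
        # l1_ratio corresponds to (1 - rho) for this estimator.
        return 1.0 - rho
    raise ValueError("no rho conversion rule known for %r" % estimator)


enet_ratio = rho_to_l1_ratio(0.15, "ElasticNet")
sgd_ratio = rho_to_l1_ratio(0.85, "SGDClassifier")
```

In this example both calls describe the same mix of penalties, which is why the old shared parameter name was a source of confusion.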
People
List of contributors for release 0.13 by number of commits.
364 Andreas Müller
143 Arnaud Joly
131 Gael Varoquaux
117 Mathieu Blondel
108 Lars Buitinck
106 Wei Li
101 Olivier Grisel
65 Vlad Niculae
30 Rob Zinkov
19 Aymeric Masurelle
18 Andrew Winterman
17 Nelle Varoquaux
14 Daniel Nouri
13 syhw
10 Corey Lynch
10 Kyle Beauchamp
9 Brian Cheung
9 Immanuel Bayer
9 mr.Shu
8 Conrad Lee
7 Tadej Janež
6 Brian Cajes
6 Michael
6 Noel Dawe
6 Tiago Nunes
6 cow
5 Anze
5 Shiqiao Du
4 Christian Jauvin
4 Jacques Kvam
4 Richard T. Guy
3 Alexandre Abraham
3 Doug Coleman
3 Scott Dickerson
2 ApproximateIdentity
2 John Benediktsson
2 Mark Veronda
2 Matti Lyra
2 Mikhail Korobov
2 Xinfan Meng
1 Alejandro Weinstein
1 Christoph Deil
1 Eugene Nizhibitsky
1 Kenneth C. Arnold
1 Luis Pedro Coelho
1 Miroslav Batchkarov
1 Pavel
1 Sebastian Berg
1 Shaun Jackman
1 Subhodeep Moitra
1 bob
1 dengemann
1 emanuele
1 x006