Version 0.13#
Version 0.13.1#
February 23, 2013
The 0.13.1 release only fixes some bugs and does not add any new functionality.
Changelog#
Fixed a testing error caused by the function cross_validation.train_test_split being interpreted as a test by Yaroslav Halchenko.
Fixed a bug in the reassignment of small clusters in the cluster.MiniBatchKMeans by Gael Varoquaux.
Fixed default value of gamma in decomposition.KernelPCA by Lars Buitinck.
Updated joblib to 0.7.0d by Gael Varoquaux.
Fixed scaling of the deviance in ensemble.GradientBoostingClassifier by Peter Prettenhofer.
Better tie-breaking in multiclass.OneVsOneClassifier by Andreas Müller.
Other small improvements to tests and documentation.
People#
List of contributors for release 0.13.1 by number of commits.
5 Robert Marchman
2 Hrishikesh Huilgolkar
1 Bastiaan van den Berg
1 Diego Molla
1 Rafael Cunha de Almeida
1 Rolando Espinoza La fuente
Version 0.13#
January 21, 2013
New Estimator Classes#
dummy.DummyClassifier and dummy.DummyRegressor, two data-independent predictors by Mathieu Blondel. Useful to sanity-check your estimators. See Dummy estimators in the user guide. Multioutput support added by Arnaud Joly.
decomposition.FactorAnalysis, a transformer implementing the classical factor analysis, by Christian Osendorfer and Alexandre Gramfort. See Factor Analysis in the user guide.
feature_extraction.FeatureHasher, a transformer implementing the “hashing trick” for fast, low-memory feature extraction from string fields by Lars Buitinck, and feature_extraction.text.HashingVectorizer for text documents by Olivier Grisel. See Feature hashing and Vectorizing a large text corpus with the hashing trick for the documentation and sample usage.
pipeline.FeatureUnion, a transformer that concatenates results of several other transformers by Andreas Müller. See FeatureUnion: composite feature spaces in the user guide.
random_projection.GaussianRandomProjection, random_projection.SparseRandomProjection and the function random_projection.johnson_lindenstrauss_min_dim. The first two are transformers implementing Gaussian and sparse random projection matrices by Olivier Grisel and Arnaud Joly. See Random Projection in the user guide.
kernel_approximation.Nystroem, a transformer for approximating arbitrary kernels by Andreas Müller. See Nystroem Method for Kernel Approximation in the user guide.
preprocessing.OneHotEncoder, a transformer that computes binary encodings of categorical features by Andreas Müller. See Encoding categorical features in the user guide.
linear_model.PassiveAggressiveClassifier and linear_model.PassiveAggressiveRegressor, predictors implementing an efficient stochastic optimization for linear models by Rob Zinkov and Mathieu Blondel. See Passive Aggressive Algorithms in the user guide.
ensemble.RandomTreesEmbedding, a transformer for creating high-dimensional sparse representations using ensembles of totally random trees by Andreas Müller. See Totally Random Trees Embedding in the user guide.
manifold.SpectralEmbedding and function manifold.spectral_embedding, implementing the “Laplacian eigenmaps” transformation for non-linear dimensionality reduction by Wei Li. See Spectral Embedding in the user guide.
isotonic.IsotonicRegression by Fabian Pedregosa, Alexandre Gramfort and Nelle Varoquaux.
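The “hashing trick” mentioned for feature_extraction.FeatureHasher can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the function name hash_features, the choice of SHA-256, and the signed-count scheme are assumptions made for illustration. The only point shown is that each string token is mapped to a column of a fixed-width vector by hashing, so no vocabulary has to be stored.

```python
import hashlib


def hash_features(tokens, n_features=16):
    """Map string tokens to a fixed-length vector of signed counts.

    Hypothetical helper for illustration; not scikit-learn's FeatureHasher.
    """
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    vector = [0] * n_features
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first four bytes of the digest choose the column.
        index = int.from_bytes(digest[:4], "big") % n_features
        # One further bit chooses the sign, so that colliding tokens
        # can cancel instead of always accumulating.
        sign = 1 if digest[4] & 1 else -1
        vector[index] += sign
    return vector


doc = ["the", "quick", "brown", "fox", "the"]
features = hash_features(doc, n_features=8)
print(features)
```

The output width depends only on n_features, not on how many distinct tokens appear, which is what keeps memory use low on large corpora. The price is that distinct tokens may collide in the same column.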
Changelog#
metrics.zero_one_loss (formerly metrics.zero_one) now has an option for normalized output that reports the fraction of misclassifications, rather than the raw number of misclassifications. By Kyle Beauchamp.
tree.DecisionTreeClassifier and all derived ensemble models now support sample weighting, by Noel Dawe and Gilles Louppe.
Speedup improvement when using bootstrap samples in forests of randomized trees, by Peter Prettenhofer and Gilles Louppe.
Partial dependence plots for Gradient-boosted trees in ensemble.partial_dependence.partial_dependence by Peter Prettenhofer. See Partial Dependence and Individual Conditional Expectation Plots for an example.
The table of contents on the website has now been made expandable by Jaques Grobler.
feature_selection.SelectPercentile now breaks ties deterministically instead of returning all equally ranked features.
feature_selection.SelectKBest and feature_selection.SelectPercentile are more numerically stable since they use scores, rather than p-values, to rank results. This means that they might sometimes select different features than they did previously.
Ridge regression and ridge classification fitting with sparse_cg solver no longer has quadratic memory complexity, by Lars Buitinck and Fabian Pedregosa.
Ridge regression and ridge classification now support a new fast solver called lsqr, by Mathieu Blondel.
Speed up of metrics.precision_recall_curve by Conrad Lee.
Added support for reading/writing svmlight files with pairwise preference attribute (qid in svmlight file format) in datasets.dump_svmlight_file and datasets.load_svmlight_file by Fabian Pedregosa.
Faster and more robust metrics.confusion_matrix and Clustering performance evaluation by Wei Li.
cross_validation.cross_val_score now works with precomputed kernels and affinity matrices, by Andreas Müller.
LARS algorithm made more numerically stable with heuristics to drop regressors too correlated as well as to stop the path when numerical noise becomes predominant, by Gael Varoquaux.
Faster implementation of metrics.precision_recall_curve by Conrad Lee.
New kernel metrics.chi2_kernel by Andreas Müller, often used in computer vision applications.
Longstanding bug in naive_bayes.BernoulliNB fixed by Shaun Jackman.
Implemented predict_proba in multiclass.OneVsRestClassifier, by Andrew Winterman.
Improved consistency in gradient boosting: the estimators ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier use the estimator tree.DecisionTreeRegressor instead of the tree._tree.Tree data structure, by Arnaud Joly.
Fixed a floating point exception in the decision trees module, by Seberg.
Fixed metrics.roc_curve failing when y_true has only one class, by Wei Li.
Added the metrics.mean_absolute_error function, which computes the mean absolute error. The metrics.mean_squared_error, metrics.mean_absolute_error and metrics.r2_score metrics support multioutput, by Arnaud Joly.
Fixed class_weight support in svm.LinearSVC and linear_model.LogisticRegression by Andreas Müller. The meaning of class_weight was reversed: in earlier releases, a higher weight erroneously meant fewer positives of a given class.
Improved narrative documentation and consistency in sklearn.metrics for regression and classification metrics, by Arnaud Joly.
Fixed a bug in sklearn.svm.SVC when using csr-matrices with unsorted indices, by Xinfan Meng and Andreas Müller.
cluster.MiniBatchKMeans: Added random reassignment of cluster centers with few observations attached to them, by Gael Varoquaux.
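The normalized output option described for metrics.zero_one_loss can be illustrated with a small pure-Python sketch. It is an illustration under assumptions, not the library code: the function name zero_one_loss_sketch is hypothetical, and only the normalize flag mirrors the option named in the changelog entry.

```python
def zero_one_loss_sketch(y_true, y_pred, normalize=True):
    """Count misclassifications, optionally as a fraction of all samples.

    Hypothetical helper for illustration; not scikit-learn's function.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    # Raw number of positions where the prediction differs from the label.
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    # normalize=True reports the fraction, normalize=False the raw count.
    return errors / len(y_true) if normalize else errors


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 1]
print(zero_one_loss_sketch(y_true, y_pred))                   # 0.5
print(zero_one_loss_sketch(y_true, y_pred, normalize=False))  # 2
```

The fraction is comparable across test sets of different sizes, whereas the raw count is not.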
API changes summary#
Renamed all occurrences of n_atoms to n_components for consistency. This applies to decomposition.DictionaryLearning, decomposition.MiniBatchDictionaryLearning, decomposition.dict_learning, decomposition.dict_learning_online.
Renamed all occurrences of max_iters to max_iter for consistency. This applies to semi_supervised.LabelPropagation and semi_supervised.label_propagation.LabelSpreading.
Renamed all occurrences of learn_rate to learning_rate for consistency in ensemble.BaseGradientBoosting and ensemble.GradientBoostingRegressor.
The module sklearn.linear_model.sparse is gone. Sparse matrix support was already integrated into the “regular” linear models.
sklearn.metrics.mean_square_error, which incorrectly returned the accumulated error, was removed. Use metrics.mean_squared_error instead.
Passing class_weight parameters to fit methods is no longer supported. Pass them to estimator constructors instead.
GMMs no longer have decode and rvs methods. Use the score, predict or sample methods instead.
The solver fit option in Ridge regression and classification is now deprecated and will be removed in v0.14. Use the constructor option instead.
feature_extraction.text.DictVectorizer now returns sparse matrices in the CSR format, instead of COO.
Renamed k in cross_validation.KFold and cross_validation.StratifiedKFold to n_folds, renamed n_bootstraps to n_iter in cross_validation.Bootstrap.
Renamed all occurrences of n_iterations to n_iter for consistency. This applies to cross_validation.ShuffleSplit, cross_validation.StratifiedShuffleSplit, utils.extmath.randomized_range_finder and utils.extmath.randomized_svd.
Replaced rho in linear_model.ElasticNet and linear_model.SGDClassifier by l1_ratio. The rho parameter had different meanings; l1_ratio was introduced to avoid confusion. It has the same meaning as previously rho in linear_model.ElasticNet and (1-rho) in linear_model.SGDClassifier.
linear_model.LassoLars and linear_model.Lars now store a list of paths in the case of multiple targets, rather than an array of paths.
The attribute gmm of hmm.GMMHMM was renamed to gmm_ to adhere more strictly with the API.
cluster.spectral_embedding was moved to manifold.spectral_embedding.
Renamed eig_tol in manifold.spectral_embedding, cluster.SpectralClustering to eigen_tol, renamed mode to eigen_solver.
Renamed mode in manifold.spectral_embedding and cluster.SpectralClustering to eigen_solver.
classes_ and n_classes_ attributes of tree.DecisionTreeClassifier and all derived ensemble models are now flat in case of single output problems and nested in case of multi-output problems.
The estimators_ attribute of ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier is now an array of tree.DecisionTreeRegressor.
Renamed chunk_size to batch_size in decomposition.MiniBatchDictionaryLearning and decomposition.MiniBatchSparsePCA for consistency.
svm.SVC and svm.NuSVC now provide a classes_ attribute and support arbitrary dtypes for labels y. Also, the dtype returned by predict now reflects the dtype of y during fit (used to be np.float).
Changed default test_size in cross_validation.train_test_split to None, added possibility to infer test_size from train_size in cross_validation.ShuffleSplit and cross_validation.StratifiedShuffleSplit.
Renamed function sklearn.metrics.zero_one to sklearn.metrics.zero_one_loss. Be aware that the default behavior in sklearn.metrics.zero_one_loss is different from sklearn.metrics.zero_one: normalize=False is changed to normalize=True.
Renamed function metrics.zero_one_score to metrics.accuracy_score.
datasets.make_circles now has the same number of inner and outer points.
In the Naive Bayes classifiers, the class_prior parameter was moved from fit to __init__.
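The relationship between the old rho parameter and the new l1_ratio in the list above can be written as a small conversion helper. This is a sketch under stated assumptions: both function names are hypothetical and not part of scikit-learn, and the penalty shown is the usual elastic net form, alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2), included only to show what the mixing parameter controls.

```python
def l1_ratio_from_old_rho(rho, estimator):
    """Translate a pre-0.13 rho value into the equivalent l1_ratio.

    Hypothetical helper: per the changelog, l1_ratio equals the old rho
    for ElasticNet and (1 - rho) for SGDClassifier.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if estimator == "ElasticNet":
        return rho
    if estimator == "SGDClassifier":
        return 1.0 - rho
    raise ValueError("unknown estimator: %r" % (estimator,))


def elastic_net_penalty(weights, alpha, l1_ratio):
    """Elastic net penalty (assumed standard form) for a weight vector."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    # l1_ratio = 1 gives a pure L1 penalty, l1_ratio = 0 a pure L2 penalty.
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)


print(l1_ratio_from_old_rho(0.15, "ElasticNet"))
print(l1_ratio_from_old_rho(0.85, "SGDClassifier"))
print(elastic_net_penalty([1.0, -2.0], alpha=1.0, l1_ratio=1.0))
```

Code written against the old SGDClassifier convention therefore needs its value flipped when migrating, while ElasticNet code only needs the keyword renamed.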
People#
List of contributors for release 0.13 by number of commits.
364 Andreas Müller
143 Arnaud Joly
131 Gael Varoquaux
117 Mathieu Blondel
108 Lars Buitinck
106 Wei Li
101 Olivier Grisel
65 Vlad Niculae
30 Rob Zinkov
19 Aymeric Masurelle
18 Andrew Winterman
17 Nelle Varoquaux
14 Daniel Nouri
13 syhw
10 Corey Lynch
10 Kyle Beauchamp
9 Brian Cheung
9 Immanuel Bayer
9 mr.Shu
8 Conrad Lee
7 Tadej Janež
6 Brian Cajes
6 Michael
6 Noel Dawe
6 Tiago Nunes
6 cow
5 Anze
5 Shiqiao Du
4 Christian Jauvin
4 Jacques Kvam
4 Richard T. Guy
3 Alexandre Abraham
3 Doug Coleman
3 Scott Dickerson
2 ApproximateIdentity
2 John Benediktsson
2 Mark Veronda
2 Matti Lyra
2 Mikhail Korobov
2 Xinfan Meng
1 Alejandro Weinstein
1 Christoph Deil
1 Eugene Nizhibitsky
1 Kenneth C. Arnold
1 Luis Pedro Coelho
1 Miroslav Batchkarov
1 Pavel
1 Sebastian Berg
1 Shaun Jackman
1 Subhodeep Moitra
1 bob
1 dengemann
1 emanuele
1 x006