sklearn.datasets

Utilities to load popular datasets and artificial data generators.

User guide. See the Dataset loading utilities section for further details.

Loaders

clear_data_home

Delete all the content of the data home cache.

dump_svmlight_file

Dump the dataset in svmlight / libsvm file format.

fetch_20newsgroups

Load the filenames and data from the 20 newsgroups dataset (classification).

fetch_20newsgroups_vectorized

Load and vectorize the 20 newsgroups dataset (classification).

fetch_california_housing

Load the California housing dataset (regression).

fetch_covtype

Load the covertype dataset (classification).

fetch_file

Fetch a file from the web if not already present in the local folder.

fetch_kddcup99

Load the kddcup99 dataset (classification).

fetch_lfw_pairs

Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).

fetch_lfw_people

Load the Labeled Faces in the Wild (LFW) people dataset (classification).

fetch_olivetti_faces

Load the Olivetti faces data-set from AT&T (classification).

fetch_openml

Fetch dataset from openml by name or dataset id.

fetch_rcv1

Load the RCV1 multilabel dataset (classification).

fetch_species_distributions

Loader for species distribution dataset from Phillips et al.

get_data_home

Return the path of the scikit-learn data directory.
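As a minimal sketch of how `get_data_home` and `clear_data_home` interact, the snippet below points both at a throwaway directory so the real cache is never touched. The directory name `sk_cache` and the `placeholder.txt` file are hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path

from sklearn.datasets import clear_data_home, get_data_home

# Hypothetical scratch location; the default cache is left alone.
scratch = Path(tempfile.mkdtemp()) / "sk_cache"

# get_data_home creates the directory if it does not exist yet.
cache_dir = Path(get_data_home(data_home=str(scratch)))
created = cache_dir.is_dir()

# Simulate a cached file, then wipe the cache content.
marker = cache_dir / "placeholder.txt"
marker.write_text("cached bytes would live here")
clear_data_home(data_home=str(cache_dir))
marker_removed = not marker.exists()
```

Passing `data_home` explicitly is a cautious choice here: calling `clear_data_home()` with no argument would delete the content of the default cache.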

load_breast_cancer

Load and return the breast cancer Wisconsin dataset (classification).

load_diabetes

Load and return the diabetes dataset (regression).

load_digits

Load and return the digits dataset (classification).
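For illustration, a short sketch of what `load_digits` returns. The dataset ships with scikit-learn, so no download is involved; the variable names are arbitrary.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each sample is an 8x8 grayscale image; `data` holds the same pixels flattened.
image_shape = digits.images.shape
data_shape = digits.data.shape
n_classes = len(digits.target_names)
```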

load_files

Load text files with categories as subfolder names.
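The folder layout expected by `load_files` can be sketched as follows. The category names `ham` and `spam`, the file names, and the texts are made up for this example; only the one-subfolder-per-category convention comes from the description above.

```python
import tempfile
from pathlib import Path

from sklearn.datasets import load_files

# Build a tiny hypothetical corpus: one subfolder per category.
root = Path(tempfile.mkdtemp())
samples = {
    "ham": ["see you at lunch", "meeting moved to 3pm"],
    "spam": ["win a free prize now"],
}
for category, texts in samples.items():
    (root / category).mkdir()
    for i, text in enumerate(texts):
        (root / category / f"doc_{i}.txt").write_text(text, encoding="utf-8")

# With `encoding` set, the documents are decoded to str instead of bytes.
corpus = load_files(str(root), encoding="utf-8", random_state=0)
```

The returned object exposes the decoded documents in `corpus.data`, the integer labels in `corpus.target`, and the folder names in `corpus.target_names`.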

load_iris

Load and return the iris dataset (classification).
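A minimal usage sketch for the bundled toy loaders, using `load_iris` as the example. The `return_X_y=True` option skips the container object and returns the arrays directly.

```python
from sklearn.datasets import load_iris

# Full container: arrays plus metadata such as feature and class names.
iris = load_iris()
feature_names = iris.feature_names
class_names = list(iris.target_names)

# Shortcut when only the arrays are needed.
X, y = load_iris(return_X_y=True)
```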

load_linnerud

Load and return the physical exercise Linnerud dataset.

load_sample_image

Load the numpy array of a single sample image.

load_sample_images

Load sample images for image manipulation.

load_svmlight_file

Load datasets in the svmlight / libsvm format into sparse CSR matrix.
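A round trip through the svmlight / libsvm format might look like the sketch below, which writes to an in-memory buffer rather than a file on disk. The small arrays are invented for the example.

```python
from io import BytesIO

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

X = np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0]])
y = np.array([0, 1])

# Write to a binary buffer using zero-based column indices.
buffer = BytesIO()
dump_svmlight_file(X, y, buffer, zero_based=True)
buffer.seek(0)

# n_features is given explicitly because the format stores only non-zero entries.
X_loaded, y_loaded = load_svmlight_file(buffer, n_features=3, zero_based=True)
X_dense = X_loaded.toarray()
```

Stating `zero_based` on both sides avoids relying on the loader's index-base heuristic.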

load_svmlight_files

Load dataset from multiple files in SVMlight format.

load_wine

Load and return the wine dataset (classification).

Sample generators

make_biclusters

Generate a constant block diagonal structure array for biclustering.

make_blobs

Generate isotropic Gaussian blobs for clustering.
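As a quick sketch, `make_blobs` can produce a small clustering toy set. The parameter values below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Three isotropic Gaussian clusters in 2D; an integer n_samples is split evenly.
X, y = make_blobs(
    n_samples=150, centers=3, n_features=2, cluster_std=0.5, random_state=42
)
samples_per_cluster = np.bincount(y)
```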

make_checkerboard

Generate an array with block checkerboard structure for biclustering.

make_circles

Make a large circle containing a smaller circle in 2d.

make_classification

Generate a random n-class classification problem.
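A hedged example of `make_classification` with three classes follows. The specific counts are assumptions picked so that the informative features can accommodate the requested number of class clusters.

```python
import numpy as np
from sklearn.datasets import make_classification

# 10 features: 3 informative, 2 redundant (linear combinations), the rest noise.
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=3,
    n_redundant=2,
    n_classes=3,
    random_state=0,
)
classes = np.unique(y)
```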

make_friedman1

Generate the "Friedman #1" regression problem.

make_friedman2

Generate the "Friedman #2" regression problem.

make_friedman3

Generate the "Friedman #3" regression problem.

make_gaussian_quantiles

Generate isotropic Gaussian and label samples by quantile.

make_hastie_10_2

Generate data for binary classification used in Hastie et al. 2009, Example 10.2.
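The labeling rule behind `make_hastie_10_2` can be checked directly. To my understanding, the ten features are standard Gaussian and the target is 1 when the squared norm exceeds 9.34 and -1 otherwise; the sketch below recomputes the labels under that assumption.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2

X, y = make_hastie_10_2(n_samples=1000, random_state=0)

# Recompute the labels from the features using the assumed threshold rule.
squared_norm = (X ** 2).sum(axis=1)
y_expected = np.where(squared_norm > 9.34, 1.0, -1.0)
labels_match = np.array_equal(y, y_expected)
```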

make_low_rank_matrix

Generate a mostly low rank matrix with bell-shaped singular values.

make_moons

Make two interleaving half circles.
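For illustration, `make_moons` yields a two-class 2D dataset that is not linearly separable. The noise level below is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_moons

# Two interleaving half circles with a little Gaussian noise on the coordinates.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
class_counts = np.bincount(y)
```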

make_multilabel_classification

Generate a random multilabel classification problem.
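A small sketch of the multilabel generator: with the default dense indicator output, each row of `Y` marks which labels apply to the corresponding sample. The sizes are illustrative.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification

# n_labels is the average number of labels per sample, not a hard limit.
X, Y = make_multilabel_classification(
    n_samples=50, n_features=20, n_classes=5, n_labels=2, random_state=0
)
is_binary_indicator = bool(np.isin(Y, [0, 1]).all())
```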

make_regression

Generate a random regression problem.
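As a sketch, `make_regression` can also return the coefficients of the underlying linear model when `coef=True`, which is handy for checking that an estimator recovers the informative features. Parameter values are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_regression

# Only n_informative features contribute to y; the other coefficients are zero.
X, y, coef = make_regression(
    n_samples=100,
    n_features=5,
    n_informative=2,
    noise=1.0,
    coef=True,
    random_state=0,
)
n_nonzero_coef = int(np.count_nonzero(coef))
```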

make_s_curve

Generate an S curve dataset.

make_sparse_coded_signal

Generate a signal as a sparse combination of dictionary elements.

make_sparse_spd_matrix

Generate a sparse symmetric positive-definite matrix.

make_sparse_uncorrelated

Generate a random regression problem with sparse uncorrelated design.

make_spd_matrix

Generate a random symmetric, positive-definite matrix.
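The two properties named above can be verified numerically. This is a minimal check rather than a proof: symmetry is tested against the transpose, and positive definiteness via the smallest eigenvalue.

```python
import numpy as np
from sklearn.datasets import make_spd_matrix

A = make_spd_matrix(n_dim=4, random_state=0)

# Symmetric: A equals its transpose (up to floating-point tolerance).
is_symmetric = bool(np.allclose(A, A.T))

# Positive definite: every eigenvalue of the symmetric matrix is strictly positive.
min_eigenvalue = float(np.linalg.eigvalsh(A).min())
```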

make_swiss_roll

Generate a swiss roll dataset.
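A brief sketch of `make_swiss_roll`: it returns 3D points together with the univariate position `t` of each point along the roll. With `noise=0.0`, my understanding is that the first and third coordinates follow `t * cos(t)` and `t * sin(t)`, which the snippet checks under that assumption.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll

X, t = make_swiss_roll(n_samples=300, noise=0.0, random_state=0)

# Without noise, the spiral coordinates are determined by t alone.
spiral_x_matches = bool(np.allclose(X[:, 0], t * np.cos(t)))
spiral_z_matches = bool(np.allclose(X[:, 2], t * np.sin(t)))
```

The `t` values are commonly used to color the points when evaluating manifold learning methods.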