Utilities to load popular datasets and artificial data generators.
User guide. See the Dataset loading utilities section for further details.
Delete all the content of the data home cache.
Dump the dataset in svmlight / libsvm file format.
Load the filenames and data from the 20 newsgroups dataset (classification).
Load and vectorize the 20 newsgroups dataset (classification).
Load the California housing dataset (regression).
Load the covertype dataset (classification).
Fetch a file from the web if not already present in the local folder.
Load the kddcup99 dataset (classification).
Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).
Load the Labeled Faces in the Wild (LFW) people dataset (classification).
Load the Olivetti faces data-set from AT&T (classification).
Fetch dataset from OpenML by name or dataset id.
Load the RCV1 multilabel dataset (classification).
Loader for species distribution dataset from Phillips et al.
Return the path of the scikit-learn data directory.
Load and return the breast cancer Wisconsin dataset (classification).
Load and return the diabetes dataset (regression).
Load and return the digits dataset (classification).
Load text files with categories as subfolder names.
Load and return the iris dataset (classification).
Load and return the physical exercise Linnerud dataset.
Load the numpy array of a single sample image.
Load sample images for image manipulation.
Load datasets in the svmlight / libsvm format into a sparse CSR matrix.
Load dataset from multiple files in svmlight format.
Load and return the wine dataset (classification).
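
As a rough illustration of how the loaders above fit together, the sketch below loads the bundled iris dataset and round-trips it through the svmlight / libsvm format. It assumes the public `sklearn.datasets` functions `load_iris`, `dump_svmlight_file` and `load_svmlight_file`. The file name `iris.svmlight` is a hypothetical choice for the example, and no network access is needed because iris ships with scikit-learn.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_iris, load_svmlight_file

# load_iris reads a small dataset bundled with scikit-learn (no download).
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)

# Round-trip the data through the svmlight / libsvm text format.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris.svmlight")  # hypothetical file name
    dump_svmlight_file(X, y, path)
    # n_features is passed explicitly so the loaded matrix keeps 4 columns.
    X_loaded, y_loaded = load_svmlight_file(path, n_features=X.shape[1])

# The loader returns a SciPy sparse matrix and a NumPy label array.
print(X_loaded.format)
print(np.allclose(X_loaded.toarray(), X), np.array_equal(y_loaded, y))
```

The `fetch_*` loaders follow the same calling pattern but download their data on first use and cache it in the scikit-learn data directory, so they are left out of this offline sketch.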
Sample generators
Generate a constant block diagonal structure array for biclustering.
Generate isotropic Gaussian blobs for clustering.
Generate an array with block checkerboard structure for biclustering.
Make a large circle containing a smaller circle in 2D.
Generate a random n-class classification problem.
Generate the "Friedman #1" regression problem.
Generate the "Friedman #2" regression problem.
Generate the "Friedman #3" regression problem.
Generate isotropic Gaussian and label samples by quantile.
Generate data for binary classification used in Hastie et al. 2009, Example 10.2.
Generate a mostly low rank matrix with bell-shaped singular values.
Make two interleaving half circles.
Generate a random multilabel classification problem.
Generate a random regression problem.
Generate an S curve dataset.
Generate a signal as a sparse combination of dictionary elements.
Generate a sparse symmetric positive-definite matrix.
Generate a random regression problem with sparse uncorrelated design.
Generate a random symmetric, positive-definite matrix.
Generate a swiss roll dataset.
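
A minimal sketch of a few of the generators listed above, assuming the `sklearn.datasets` functions `make_blobs`, `make_moons`, `make_classification` and `make_regression`. The sample counts, noise levels and seed are arbitrary values picked for illustration. Passing `random_state` makes each call reproducible.

```python
import numpy as np
from sklearn.datasets import (
    make_blobs,
    make_classification,
    make_moons,
    make_regression,
)

SEED = 0  # arbitrary seed chosen for this example

# Three isotropic Gaussian clusters in 2D, for clustering demos.
X_blobs, y_blobs = make_blobs(
    n_samples=300, centers=3, n_features=2, random_state=SEED
)

# Two interleaving half circles with a little Gaussian noise.
X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=SEED)

# A random 3-class problem with 5 informative features out of 20.
X_clf, y_clf = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=5,
    n_classes=3,
    random_state=SEED,
)

# A random regression problem with noisy targets.
X_reg, y_reg = make_regression(
    n_samples=100, n_features=10, noise=1.0, random_state=SEED
)

# The same seed yields the same data on a second call.
X_again, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=SEED)
same_data = np.array_equal(X_blobs, X_again)

print(X_blobs.shape, X_moons.shape, X_clf.shape, X_reg.shape)
print(same_data)
```

Each generator returns a feature array and a target array, so the results can be passed to an estimator's `fit` method in the same way as the output of a loader called with `return_X_y=True`.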