fetch_lfw_people

sklearn.datasets.fetch_lfw_people(*, data_home=None, funneled=True, resize=0.5, min_faces_per_person=0, color=False, slice_=(slice(70, 195, None), slice(78, 172, None)), download_if_missing=True, return_X_y=False, n_retries=3, delay=1.0)

Load the Labeled Faces in the Wild (LFW) people dataset (classification).

Download it if necessary.

Classes           5749
Samples total     13233
Dimensionality    5828
Features          real, between 0 and 255

For a usage example of this dataset, see Faces recognition example using eigenfaces and SVMs.

Read more in the User Guide.

Parameters:
data_home : str or path-like, default=None

Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

funneled : bool, default=True

Download and use the funneled variant of the dataset.

resize : float or None, default=0.5

Ratio used to resize each face picture. If None, no resizing is performed.

min_faces_per_person : int, default=0

The extracted dataset will only retain pictures of people that have at least min_faces_per_person different pictures.

color : bool, default=False

Keep the 3 RGB channels instead of averaging them to a single gray level channel. If color is True, the shape of the data has one more dimension than with color=False.

slice_ : tuple of slice, default=(slice(70, 195), slice(78, 172))

Provide a custom 2D slice (height, width) to extract the ‘interesting’ part of the jpeg files and avoid using statistical correlation from the background.

download_if_missing : bool, default=True

If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.

return_X_y : bool, default=False

If True, returns (dataset.data, dataset.target) instead of a Bunch object. See below for more information about the dataset.data and dataset.target objects.

Added in version 0.20.

n_retries : int, default=3

Number of retries when HTTP errors are encountered.

Added in version 1.5.

delay : float, default=1.0

Number of seconds between retries.

Added in version 1.5.
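The interaction between slice_, resize and color can be sketched with a little arithmetic. The helper below is hypothetical (it is not part of scikit-learn) and assumes that each slice has an explicit start and stop and that resized dimensions are truncated to integers. Under those assumptions it reproduces the 62 x 47 image size and the 2914 features documented for the default parameters.

```python
import math


def lfw_image_shape(slice_=(slice(70, 195), slice(78, 172)), resize=0.5, color=False):
    """Estimate the per-image shape for the given parameters (illustrative only)."""
    h_slice, w_slice = slice_
    # Extent of each slice, honouring an optional step.
    h = (h_slice.stop - h_slice.start) // (h_slice.step or 1)
    w = (w_slice.stop - w_slice.start) // (w_slice.step or 1)
    if resize is not None:
        # Assumption: scaled sizes are truncated, not rounded.
        h, w = int(resize * h), int(resize * w)
    # Keeping the RGB channels adds a trailing dimension of size 3.
    return (h, w, 3) if color else (h, w)


default_shape = lfw_image_shape()
print(default_shape, math.prod(default_shape))
print(lfw_image_shape(color=True), lfw_image_shape(resize=None))
```

Under these assumptions the default slice covers 125 x 94 pixels, which halves to 62 x 47; with color=True each image would flatten to three times as many features.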

Returns:
dataset : Bunch

Dictionary-like object, with the following attributes.

data : numpy array of shape (13233, 2914)

Each row corresponds to a ravelled face image of original size 62 x 47 pixels. Changing the slice_ or resize parameters will change the shape of the output.

images : numpy array of shape (13233, 62, 47)

Each row is a face image corresponding to one of the 5749 people in the dataset. Changing the slice_ or resize parameters will change the shape of the output.

target : numpy array of shape (13233,)

Labels associated with each face image. The labels range from 0 to 5748 and correspond to the person IDs.

target_names : numpy array of shape (5749,)

Names of all persons in the dataset. Position in array corresponds to the person ID in the target array.

DESCR : str

Description of the Labeled Faces in the Wild (LFW) dataset.

(data, target) : tuple if return_X_y is True

A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing one feature. The second is an array of shape (n_samples,) containing the target labels.

Added in version 0.20.
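How target, target_names and min_faces_per_person relate can be illustrated on toy data. The function below is a hypothetical sketch, not the library's implementation: it assumes that people with too few pictures are dropped and that the remaining person IDs are re-indexed so they stay contiguous. The picture counts are made up for illustration.

```python
from collections import Counter


def filter_by_min_faces(target, target_names, min_faces_per_person):
    """Keep only people with at least `min_faces_per_person` pictures (toy sketch)."""
    counts = Counter(target)
    kept_ids = sorted(pid for pid, n in counts.items() if n >= min_faces_per_person)
    # Assumption: surviving IDs are renumbered 0..k-1 in their original order.
    new_id = {old: new for new, old in enumerate(kept_ids)}
    new_target = [new_id[t] for t in target if t in new_id]
    new_names = [target_names[pid] for pid in kept_ids]
    return new_target, new_names


# Toy labels: person 0 has 2 pictures, person 1 has 1, person 2 has 3.
names = ["AJ Cook", "AJ Lamas", "Aaron Eckhart"]
target = [0, 2, 2, 1, 2, 0]
new_target, new_names = filter_by_min_faces(target, names, min_faces_per_person=2)
print(new_target, new_names)
```

In both the original and the filtered arrays, the name of the person in sample i is found by indexing the names with that sample's label, for example new_names[new_target[i]].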

Examples

>>> from sklearn.datasets import fetch_lfw_people
>>> lfw_people = fetch_lfw_people()
>>> lfw_people.data.shape
(13233, 2914)
>>> lfw_people.target.shape
(13233,)
>>> for name in lfw_people.target_names[:5]:
...    print(name)
AJ Cook
AJ Lamas
Aaron Eckhart
Aaron Guiel
Aaron Patterson
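When the dataset may or may not be cached already, one possible pattern is to try the local copy first and download only on failure. The wrapper below is a hypothetical helper, not part of scikit-learn. It relies only on the documented behaviour that download_if_missing=False raises an OSError when the data is not available locally. The fetch argument exists solely so that the control flow can be exercised without network access, and n_retries and delay assume scikit-learn 1.5 or later.

```python
def load_lfw_offline_first(fetch=None, n_retries=3, delay=1.0, **kwargs):
    """Load LFW from the local cache, downloading only if it is missing."""
    if fetch is None:
        # Lazy import so the helper can be defined without scikit-learn installed.
        from sklearn.datasets import fetch_lfw_people as fetch
    try:
        # First attempt: never touch the network.
        return fetch(download_if_missing=False, **kwargs)
    except OSError:
        # Not cached locally: allow the download, retrying on HTTP errors.
        return fetch(
            download_if_missing=True, n_retries=n_retries, delay=delay, **kwargs
        )
```

A call such as load_lfw_offline_first(min_faces_per_person=70, resize=0.4, return_X_y=True) would then return the (data, target) tuple; the parameter values here are only illustrative.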