fetch_lfw_pairs

sklearn.datasets.fetch_lfw_pairs(*, subset='train', data_home=None, funneled=True, resize=0.5, color=False, slice_=(slice(70, 195, None), slice(78, 172, None)), download_if_missing=True, n_retries=3, delay=1.0)

Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).

Download it if necessary.

Classes: 2
Samples total: 13233
Dimensionality: 5828
Features: real, between 0 and 255

In the official README.txt this task is described as the “Restricted” task. As I am not sure how to implement the “Unrestricted” variant correctly, I left it as unsupported for now.

The original images are 250 x 250 pixels, but the default slice and resize arguments reduce them to 62 x 47.
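The 62 x 47 figure, and the dimensionality of 5828, can be reproduced from the default arguments. The sketch below is illustrative only: `lfw_output_shape` is a hypothetical helper, not part of scikit-learn, and it assumes that the cropped size is multiplied by `resize` and truncated to an integer.

```python
def lfw_output_shape(slice_=(slice(70, 195), slice(78, 172)), resize=0.5):
    """Return (height, width) of one face image after slicing and resizing."""
    h_slice, w_slice = slice_
    # Size of the cropped region, honouring an optional slice step.
    h = (h_slice.stop - h_slice.start) // (h_slice.step or 1)
    w = (w_slice.stop - w_slice.start) // (w_slice.step or 1)
    if resize is not None:
        # Assumption: the scaled size is truncated to an integer.
        h = int(resize * h)
        w = int(resize * w)
    return h, w


height, width = lfw_output_shape()
print(height, width)       # 62 47
print(2 * height * width)  # 5828 values per pair (two gray-level images)
```

The default slice crops a 125 x 94 region out of each 250 x 250 image; halving it gives 62 x 47, and two such images per pair give 5828 features.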

Read more in the User Guide.

Parameters:
subset : {‘train’, ‘test’, ‘10_folds’}, default=‘train’

Select the dataset to load: ‘train’ for the development training set, ‘test’ for the development test set, and ‘10_folds’ for the official evaluation set that is meant to be used with 10-fold cross-validation.

data_home : str or path-like, default=None

Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

funneled : bool, default=True

Download and use the funneled variant of the dataset.

resize : float, default=0.5

Ratio used to resize each face picture.

color : bool, default=False

Keep the 3 RGB channels instead of averaging them to a single gray level channel. If color is True the shape of the data has one more dimension than the shape with color = False.

slice_ : tuple of slice, default=(slice(70, 195), slice(78, 172))

Provide a custom 2D slice (height, width) to extract the ‘interesting’ part of the jpeg files and avoid using statistical correlation from the background.

download_if_missing : bool, default=True

If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.

n_retries : int, default=3

Number of retries when HTTP errors are encountered.

Added in version 1.5.

delay : float, default=1.0

Number of seconds between retries.

Added in version 1.5.
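As a sketch of how `download_if_missing=False` might be used, the helper below returns the cached dataset when it is available and `None` otherwise, relying on the documented `OSError`. The name `load_lfw_pairs_offline` and its `fetcher` argument are hypothetical and exist only for illustration; `fetcher` lets the helper be exercised without touching the network or the cache.

```python
def load_lfw_pairs_offline(subset="train", data_home=None, fetcher=None):
    """Return the cached dataset, or None if it is not available locally."""
    if fetcher is None:
        # Imported lazily so the helper can be exercised with a stand-in fetcher.
        from sklearn.datasets import fetch_lfw_pairs as fetcher
    try:
        # download_if_missing=False turns a missing cache into an OSError
        # instead of starting a download.
        return fetcher(
            subset=subset, data_home=data_home, download_if_missing=False
        )
    except OSError:
        return None
```

A caller could fall back to `fetch_lfw_pairs(subset=..., n_retries=..., delay=...)` when the helper returns `None` and a download is acceptable.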

Returns:
data : Bunch

Dictionary-like object, with the following attributes.

data : ndarray of shape (2200, 5828). Shape depends on subset.

Each row holds the 2 flattened face images of one pair, each 62 x 47 pixels with the default parameters. Changing the slice_, resize or subset parameters will change the shape of the output.

pairs : ndarray of shape (2200, 2, 62, 47). Shape depends on subset.

Each row has 2 face images corresponding to the same or different persons from the dataset containing 5749 people. Changing the slice_, resize or subset parameters will change the shape of the output.

target : numpy array of shape (2200,). Shape depends on subset.

Labels associated with each pair of images. The two label values indicate whether the images show different persons or the same person.

target_names : numpy array of shape (2,)

Explains the target values of the target array. 0 corresponds to “Different persons”, 1 corresponds to “Same person”.

DESCR : str

Description of the Labeled Faces in the Wild (LFW) dataset.
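The relationship between the `pairs` and `data` shapes listed above can be checked on a synthetic array, without downloading anything. This is a minimal sketch that assumes `data` is a row-wise flattening of `pairs`; the array contents, the 4-pair size and the variable names are stand-ins, not real LFW data.

```python
import numpy as np

n_pairs, h, w = 4, 62, 47  # 4 stand-in pairs instead of 2200

# Stand-in for `pairs` with color=False: (n_pairs, 2, h, w).
pairs = np.zeros((n_pairs, 2, h, w))
# Flatten each pair of images into a single row.
data = pairs.reshape(len(pairs), -1)
print(data.shape)  # (4, 5828)

# With color=True there is one more trailing axis for the 3 RGB channels.
pairs_rgb = np.zeros((n_pairs, 2, h, w, 3))
data_rgb = pairs_rgb.reshape(len(pairs_rgb), -1)
print(data_rgb.shape)  # (4, 17484)

# A flat row can be turned back into the two images of a pair.
first_image, second_image = data[0].reshape(2, h, w)
print(first_image.shape)  # (62, 47)
```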

Examples

>>> from sklearn.datasets import fetch_lfw_pairs
>>> lfw_pairs_train = fetch_lfw_pairs(subset='train')
>>> list(lfw_pairs_train.target_names)
[np.str_('Different persons'), np.str_('Same person')]
>>> lfw_pairs_train.pairs.shape
(2200, 2, 62, 47)
>>> lfw_pairs_train.data.shape
(2200, 5828)
>>> lfw_pairs_train.target.shape
(2200,)