nan_euclidean_distances#

sklearn.metrics.pairwise.nan_euclidean_distances(X, Y=None, *, squared=False, missing_values=nan, copy=True)[source]#

Calculate the euclidean distances in the presence of missing values.

Compute the euclidean distance between each pair of samples in X and Y, where Y=X is assumed if Y=None. When calculating the distance between a pair of samples, this formulation ignores feature coordinates with a missing value in either sample and scales up the weight of the remaining coordinates:

dist(x,y) = sqrt(weight * sq. distance from present coordinates) where, weight = Total # of coordinates / # of present coordinates

For example, the distance between [3, na, na, 6] and [1, na, 4, 5] is:

\[\sqrt{\frac{4}{2}((3-1)^2 + (6-5)^2)}\]

If all the coordinates are missing or if there are no common present coordinates then NaN is returned for that pair.

Read more in the User Guide.

Added in version 0.22.

Parameters:
Xarray-like of shape (n_samples_X, n_features)

An array where each row is a sample and each column is a feature.

Yarray-like of shape (n_samples_Y, n_features), default=None

An array where each row is a sample and each column is a feature. If None, method uses Y=X.

squaredbool, default=False

Return squared Euclidean distances.

missing_valuesnp.nan, float or int, default=np.nan

Representation of missing value.

copybool, default=True

Make and use a deep copy of X and Y (if Y exists).

Returns:
distancesndarray of shape (n_samples_X, n_samples_Y)

Returns the distances between the row vectors of X and the row vectors of Y.

See also

paired_distances

Distances between pairs of elements of X and Y.

References

Examples

>>> from sklearn.metrics.pairwise import nan_euclidean_distances
>>> nan = float("NaN")
>>> X = [[0, 1], [1, nan]]
>>> nan_euclidean_distances(X, X) # distance between rows of X
array([[0.        , 1.41421356],
       [1.41421356, 0.        ]])
>>> # get distance to origin
>>> nan_euclidean_distances(X, [[0, 0]])
array([[1.        ],
       [1.41421356]])