trustworthiness

sklearn.manifold.trustworthiness(X, X_embedded, *, n_neighbors=5, metric='euclidean')

Indicate to what extent the local structure is retained.

The trustworthiness is within [0, 1]. It is defined as

\[T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1} \sum_{j \in \mathcal{N}_{i}^{k}} \max(0, (r(i, j) - k))\]

where for each sample i, \(\mathcal{N}_{i}^{k}\) are its k nearest neighbors in the output space, and every sample j is its \(r(i, j)\)-th nearest neighbor in the input space. In other words, any unexpected nearest neighbors in the output space are penalised in proportion to their rank in the input space.
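The formula can be illustrated with a minimal pure-Python sketch. This is not scikit-learn's implementation: the function name `trustworthiness_sketch` is hypothetical, Euclidean distance is assumed in both spaces, each sample is assumed to be a sequence of floats, and ties in distance are broken by sort order.

```python
from math import dist


def trustworthiness_sketch(X, X_embedded, k=5):
    """Evaluate T(k) directly from its definition (illustrative only)."""
    n = len(X)
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")

    penalty = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]

        # Rank every other sample by input-space distance to i:
        # rank[j] is r(i, j), with 1 meaning the nearest neighbor.
        by_input = sorted(others, key=lambda j: dist(X[i], X[j]))
        rank = {j: r for r, j in enumerate(by_input, start=1)}

        # The k nearest neighbors of i in the output (embedded) space.
        by_output = sorted(others, key=lambda j: dist(X_embedded[i], X_embedded[j]))

        # Penalise output-space neighbors that rank beyond k in the input space.
        penalty += sum(max(0, rank[j] - k) for j in by_output[:k])

    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0), (15.0, 0.0)]
# An embedding identical to the input keeps every neighborhood intact.
print(trustworthiness_sketch(points, points, k=2))  # 1.0
```

The loop is quadratic in the number of samples per point and is meant only to make the roles of k, the output-space neighborhoods, and the input-space ranks r(i, j) concrete.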

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples)

If the metric is ‘precomputed’, X must be a square distance matrix. Otherwise it contains a sample per row.

X_embedded{array-like, sparse matrix} of shape (n_samples, n_components)

Embedding of the training data in low-dimensional space.

n_neighborsint, default=5

The number of neighbors that will be considered. Should be fewer than n_samples / 2 to ensure that the trustworthiness lies within [0, 1], as mentioned in [1]. An error will be raised otherwise.

metricstr or callable, default=’euclidean’

Which metric to use for computing pairwise distances between samples from the original input space. If metric is ‘precomputed’, X must be a matrix of pairwise distances or squared distances. Otherwise, for a list of available metrics, see the documentation of the metric argument in sklearn.metrics.pairwise.pairwise_distances and the metrics listed in sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS. Note that the “cosine” metric uses cosine_distances.

Added in version 0.20.
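As a hedged illustration of the metric and n_neighbors parameters, the sketch below compares passing raw samples with a named metric against passing the corresponding distance matrix with metric='precomputed'. The random data, the choice of the "manhattan" metric, and the variable names are assumptions made for this example. It also assumes that the error mentioned for n_neighbors is a ValueError, as in recent scikit-learn releases.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness
from sklearn.metrics import pairwise_distances

# Illustrative data: 40 random samples with 6 features, embedded in 2D.
rng = np.random.RandomState(0)
X = rng.rand(40, 6)
X_embedded = PCA(n_components=2).fit_transform(X)

# Input-space distances computed internally from the raw samples.
t_direct = trustworthiness(X, X_embedded, n_neighbors=5, metric="manhattan")

# The same distances supplied as a square (n_samples, n_samples) matrix.
D = pairwise_distances(X, metric="manhattan")
t_precomputed = trustworthiness(D, X_embedded, n_neighbors=5, metric="precomputed")

print(np.isclose(t_direct, t_precomputed))

# n_neighbors should stay below n_samples / 2, which is 20 here.
try:
    trustworthiness(X, X_embedded, n_neighbors=20)
except ValueError as exc:
    print(f"rejected: {exc}")
```

The metric argument only affects how distances in the original input space are computed, which is why a precomputed matrix replaces X and not X_embedded.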

Returns:
trustworthinessfloat

Trustworthiness of the low-dimensional embedding.

References

[1]

Jarkko Venna and Samuel Kaski. 2001. Neighborhood Preservation in Nonlinear Projection Methods: An Experimental Study. In Proceedings of the International Conference on Artificial Neural Networks (ICANN '01). Springer-Verlag, Berlin, Heidelberg, 485-491.

[2]

Laurens van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:384-391, 2009.

Examples

>>> from sklearn.datasets import make_blobs
>>> from sklearn.decomposition import PCA
>>> from sklearn.manifold import trustworthiness
>>> X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)
>>> X_embedded = PCA(n_components=2).fit_transform(X)
>>> print(f"{trustworthiness(X, X_embedded, n_neighbors=5):.2f}")
0.92
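To put a score in context, it can help to compare it against a deliberately poor embedding and to vary n_neighbors. The sketch below reuses the data from the example above. The shuffled baseline and the particular values of n_neighbors are assumptions chosen for illustration; exact scores are not shown because they depend on the data and the library version.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)
X_pca = PCA(n_components=2).fit_transform(X)

# Baseline: permuting the rows breaks the correspondence between each
# sample and its embedded position, so local structure is not retained.
rng = np.random.RandomState(0)
X_shuffled = X_pca[rng.permutation(len(X_pca))]

# All values of n_neighbors stay below n_samples / 2 = 50.
for k in (5, 15, 30):
    t_pca = trustworthiness(X, X_pca, n_neighbors=k)
    t_shuffled = trustworthiness(X, X_shuffled, n_neighbors=k)
    print(f"n_neighbors={k}: PCA={t_pca:.2f}, shuffled={t_shuffled:.2f}")
```

A larger n_neighbors evaluates wider neighborhoods, so the score shifts from measuring very local structure toward more global structure.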