trustworthiness
- sklearn.manifold.trustworthiness(X, X_embedded, *, n_neighbors=5, metric='euclidean')
Indicate to what extent the local structure is retained.
The trustworthiness is within [0, 1]. It is defined as
\[T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1} \sum_{j \in \mathcal{N}_{i}^{k}} \max(0, (r(i, j) - k))\]
where for each sample i, \(\mathcal{N}_{i}^{k}\) are its k nearest neighbors in the output space, and every sample j is its \(r(i, j)\)-th nearest neighbor in the input space. In other words, any unexpected nearest neighbors in the output space are penalised in proportion to their rank in the input space.
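To make the formula concrete, here is a minimal NumPy sketch that evaluates T(k) term by term. It is an illustration only, not the library's implementation: the function name `trustworthiness_sketch` is hypothetical, it assumes Euclidean distances in both spaces, it builds full n x n distance matrices, and it makes no attempt to handle ties between distances.

```python
import numpy as np


def trustworthiness_sketch(X, X_embedded, k=5):
    """Unoptimised evaluation of T(k), assuming Euclidean distances."""
    X = np.asarray(X, dtype=float)
    X_embedded = np.asarray(X_embedded, dtype=float)
    n = X.shape[0]

    def pairwise(A):
        # Full n x n Euclidean distance matrix. The diagonal is set to inf
        # so that a sample is never counted as its own neighbor.
        diff = A[:, None, :] - A[None, :, :]
        D = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(D, np.inf)
        return D

    D_in = pairwise(X)
    D_out = pairwise(X_embedded)

    # rank_in[i, j] = r(i, j): 1 for the nearest neighbor of i in the input space.
    order_in = np.argsort(D_in, axis=1)
    rows = np.arange(n)[:, None]
    rank_in = np.empty((n, n), dtype=int)
    rank_in[rows, order_in] = np.arange(1, n + 1)[None, :]

    # N_i^k: indices of the k nearest neighbors of each sample in the output space.
    neighbors_out = np.argsort(D_out, axis=1)[:, :k]

    # Sum of max(0, r(i, j) - k) over all i and all j in N_i^k.
    penalty = np.maximum(0, rank_in[rows, neighbors_out] - k).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 5))
# A crude "embedding" that simply keeps the first two coordinates.
print(trustworthiness_sketch(X_demo, X_demo[:, :2], k=5))
```

An output-space neighbor that is also among the k nearest in the input space has r(i, j) <= k and contributes nothing, so only intruding neighbors lower the score.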
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples)
If the metric is 'precomputed', X must be a square distance matrix. Otherwise it contains a sample per row.
- X_embedded : {array-like, sparse matrix} of shape (n_samples, n_components)
Embedding of the training data in low-dimensional space.
- n_neighbors : int, default=5
The number of neighbors that will be considered. Should be fewer than n_samples / 2 to ensure the trustworthiness lies within [0, 1], as mentioned in [1]. An error will be raised otherwise.
- metric : str or callable, default='euclidean'
Which metric to use for computing pairwise distances between samples from the original input space. If metric is 'precomputed', X must be a matrix of pairwise distances or squared distances. Otherwise, for a list of available metrics, see the documentation of the argument metric in sklearn.metrics.pairwise_distances and the metrics listed in sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS. Note that the "cosine" metric uses cosine_distances.
Added in version 0.20.
- Returns:
- trustworthiness : float
Trustworthiness of the low-dimensional embedding.
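As a rough guide to reading the returned value, the sketch below compares two extreme cases on synthetic data: an "embedding" that is the input itself, which preserves every neighborhood, and an embedding of pure noise, which preserves none by design. The data, seed, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import trustworthiness

X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)

# Identity "embedding": output-space neighbors coincide with input-space
# neighbors, so no penalty term is incurred (barring ties between distances).
t_identity = trustworthiness(X, X, n_neighbors=5)

# Random 2-D coordinates unrelated to X: neighborhoods are scrambled.
rng = np.random.default_rng(0)
t_random = trustworthiness(X, rng.normal(size=(100, 2)), n_neighbors=5)

print(f"identity: {t_identity:.2f}, random: {t_random:.2f}")
```

A score close to 1 therefore indicates that the nearest neighbors found in the embedding are also close in the input space, while lower scores indicate that the embedding has introduced neighbors that were far apart originally.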
References
[1] Jarkko Venna and Samuel Kaski. 2001. Neighborhood Preservation in Nonlinear Projection Methods: An Experimental Study. In Proceedings of the International Conference on Artificial Neural Networks (ICANN '01). Springer-Verlag, Berlin, Heidelberg, 485-491.
[2] Laurens van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:384-391, 2009.
Examples
>>> from sklearn.datasets import make_blobs
>>> from sklearn.decomposition import PCA
>>> from sklearn.manifold import trustworthiness
>>> X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)
>>> X_embedded = PCA(n_components=2).fit_transform(X)
>>> print(f"{trustworthiness(X, X_embedded, n_neighbors=5):.2f}")
0.92