pair_confusion_matrix

sklearn.metrics.cluster.pair_confusion_matrix(labels_true, labels_pred)

Pair confusion matrix arising from two clusterings.

The pair confusion matrix \(C\) computes a 2 by 2 similarity matrix between two clusterings by considering all pairs of samples and counting pairs that are assigned into the same or into different clusters under the true and predicted clusterings [1].

If a pair of samples that is clustered together is treated as a positive pair, then, as in binary classification, the count of true negatives is \(C_{00}\), false positives is \(C_{01}\), false negatives is \(C_{10}\) and true positives is \(C_{11}\).
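The counting described above can be illustrated with a brute-force sketch in plain Python. The helper name `pair_confusion_bruteforce` is hypothetical and is not part of scikit-learn; it assumes that each pair is counted in both orders, which is consistent with the documented example outputs summing to n * (n - 1) for n samples. It is meant only to make the definition concrete, not to mirror the library's implementation.

```python
from itertools import permutations


def pair_confusion_bruteforce(labels_true, labels_pred):
    """Hypothetical helper: count ordered sample pairs by brute force.

    Row index: 1 if the pair shares a cluster in labels_true, else 0.
    Column index: 1 if the pair shares a cluster in labels_pred, else 0.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labels_true and labels_pred must have equal length")
    C = [[0, 0], [0, 0]]
    # Assumption: every pair (i, j) with i != j is visited in both orders.
    for i, j in permutations(range(len(labels_true)), 2):
        same_true = int(labels_true[i] == labels_true[j])
        same_pred = int(labels_pred[i] == labels_pred[j])
        C[same_true][same_pred] += 1
    return C


C = pair_confusion_bruteforce([0, 0, 1, 2], [0, 0, 1, 1])
# Unpack using the binary-classification naming from the text.
(tn, fp), (fn, tp) = C
print(C)
```

This quadratic loop is only practical for small inputs; it is useful mainly for checking one's reading of the four cells.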

See also

sklearn.metrics.rand_score: Rand Score.
sklearn.metrics.adjusted_rand_score: Adjusted Rand Score.
sklearn.metrics.adjusted_mutual_info_score: Adjusted Mutual Information.

References

[1] Hubert, L., Arabie, P. “Comparing partitions.” Journal of Classification 2, 193–218 (1985).

Examples

Perfectly matching labelings have all their non-zero entries on the diagonal, regardless of the actual label values:

>>> from sklearn.metrics.cluster import pair_confusion_matrix
>>> pair_confusion_matrix([0, 0, 1, 1], [1, 1, 0, 0])
array([[8, 0],
       [0, 4]]...

Labelings that assign all class members to the same clusters are complete but may not always be pure; they are therefore penalized and have some non-zero off-diagonal entries:

>>> pair_confusion_matrix([0, 0, 1, 2], [0, 0, 1, 1])
array([[8, 2],
       [0, 2]]...

Note that the matrix is not symmetric in general.
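As a small worked illustration, the entries of the second example can be unpacked by hand. The sketch below copies the matrix values from that example rather than calling the library, and computes the fraction of pairs on which the two clusterings agree, which is the usual definition of the Rand index; the variable names are illustrative only.

```python
# Entries copied from the second example: rows index the true
# clustering, columns the predicted one.
C = [[8, 2],
     [0, 2]]

(tn, fp), (fn, tp) = C
total = tn + fp + fn + tp

# Fraction of pairs on which both clusterings agree
# (both "same cluster" or both "different clusters").
rand_index = (tn + tp) / total  # 10 / 12, about 0.83

# The off-diagonal counts differ here, so this matrix is not symmetric.
is_symmetric = fp == fn
print(rand_index, is_symmetric)
```

Here the two false-positive pairs come from the samples that the true labeling separates but the predicted labeling merges, while no pair is split by the prediction, so the false-negative count is zero.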