fowlkes_mallows_score

sklearn.metrics.fowlkes_mallows_score(labels_true, labels_pred, *, sparse=False)

Measure the similarity of two clusterings of a set of points.

Added in version 0.18.

The Fowlkes-Mallows index (FMI) is defined as the geometric mean of the pairwise precision and recall:

FMI = TP / sqrt((TP + FP) * (TP + FN))

where TP is the number of true positives (i.e. the number of pairs of points that belong to the same cluster in both labels_true and labels_pred), FP is the number of false positives (i.e. the number of pairs of points that belong to the same cluster in labels_pred but not in labels_true), and FN is the number of false negatives (i.e. the number of pairs of points that belong to the same cluster in labels_true but not in labels_pred).

The score ranges from 0 to 1. A high value indicates a high similarity between the two clusterings.
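
As a rough illustration of the pair-counting definition above, the following sketch counts TP, FP and FN by brute force over all pairs of points and then applies the formula. The helper name `pairwise_fmi` is hypothetical and is not part of scikit-learn; the loop is quadratic in the number of samples and is meant only to make the definition concrete, not to mirror how the library computes the score.

```python
from itertools import combinations
from math import sqrt


def pairwise_fmi(labels_true, labels_pred):
    """Brute-force FMI from pair counts (hypothetical helper, O(n^2))."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both clusterings
        elif same_pred:
            fp += 1  # grouped together only in labels_pred
        elif same_true:
            fn += 1  # grouped together only in labels_true
    if tp == 0:
        # No pair is grouped together in both clusterings; this also
        # avoids dividing by zero when there are no positive pairs.
        return 0.0
    # Equivalent to sqrt(precision * recall) with
    # precision = tp / (tp + fp) and recall = tp / (tp + fn).
    return tp / sqrt((tp + fp) * (tp + fn))
```

For instance, with `labels_true = [0, 0, 1, 1]` and `labels_pred = [0, 0, 0, 1]` the counts are TP = 1, FP = 2 and FN = 1, so the sketch returns 1 / sqrt(3 * 2), roughly 0.408.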

Read more in the User Guide.

Parameters:

labels_true (array-like of shape (n_samples,), dtype=int): A clustering of the data into disjoint subsets.
labels_pred (array-like of shape (n_samples,), dtype=int): A clustering of the data into disjoint subsets.
sparse (bool, default=False): If True, compute the contingency matrix internally as a sparse matrix.

Returns:

score (float): The resulting Fowlkes-Mallows score.

References

[1] E. B. Fowlkes and C. L. Mallows, 1983. “A method for comparing two hierarchical clusterings”. Journal of the American Statistical Association.

[2] Wikipedia entry for the Fowlkes-Mallows Index.

Examples

Perfect labelings are both homogeneous and complete, hence have score 1.0:

>>> from sklearn.metrics.cluster import fowlkes_mallows_score
>>> fowlkes_mallows_score([0, 0, 1, 1], [0, 0, 1, 1])
np.float64(1.0)
>>> fowlkes_mallows_score([0, 0, 1, 1], [1, 1, 0, 0])
np.float64(1.0)

If class members are completely split across different clusters, the assignment is totally random, hence the FMI is zero:

>>> fowlkes_mallows_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0