sklearn.metrics.fowlkes_mallows_score

sklearn.metrics.fowlkes_mallows_score(labels_true, labels_pred, sparse=False)[source]

Measure the similarity of two clusterings of a set of points.

The Fowlkes-Mallows index (FMI) is defined as the geometric mean between of the precision and recall:

FMI = TP / sqrt((TP + FP) * (TP + FN))

Where TP is the number of True Positive (i.e. the number of pair of points that belongs in the same clusters in both labels_true and labels_pred), FP is the number of False Positive (i.e. the number of pair of points that belongs in the same clusters in labels_true and not in labels_pred) and FN is the number of False Negative (i.e the number of pair of points that belongs in the same clusters in labels_pred and not in labels_True).

The score ranges from 0 to 1. A high value indicates a good similarity between two clusters.

Read more in the User Guide.

Parameters

labels_true : int array, shape = (n_samples,)

A clustering of the data into disjoint subsets.

labels_pred : array, shape = (n_samples, )

A clustering of the data into disjoint subsets.

sparse : bool

Compute contingency matrix internally with sparse matrix.

Returns

score : float

The resulting Fowlkes-Mallows score.

References

R35

E. B. Fowkles and C. L. Mallows, 1983. “A method for comparing two hierarchical clusterings”. Journal of the American Statistical Association

R36

Wikipedia entry for the Fowlkes-Mallows Index

Examples

Perfect labelings are both homogeneous and complete, hence have score 1.0:

>>> from sklearn.metrics.cluster import fowlkes_mallows_score
>>> fowlkes_mallows_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> fowlkes_mallows_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

If classes members are completely split across different clusters, the assignment is totally random, hence the FMI is null:

>>> fowlkes_mallows_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0