cophenet#
- scipy.cluster.hierarchy.cophenet(Z, Y=None)[source]#
Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage
Z.Suppose
pandqare original observations in disjoint clusterssandt, respectively andsandtare joined by a direct parent clusteru. The cophenetic distance between observationsiandjis simply the distance between clusterssandt.- Parameters:
- Zndarray
The hierarchical clustering encoded as an array (see
linkagefunction).- Yndarray (optional)
Calculates the cophenetic correlation coefficient
cof a hierarchical clustering defined by the linkage matrix Z of a set of \(n\) observations in \(m\) dimensions. Y is the condensed distance matrix from which Z was generated.
- Returns:
- cndarray
The cophentic correlation distance (if
Yis passed).- dndarray
The cophenetic distance matrix in condensed form. The \(ij\) th entry is the cophenetic distance between original observations \(i\) and \(j\).
See also
linkagefor a description of what a linkage matrix is.
scipy.spatial.distance.squareformtransforming condensed matrices into square ones.
Examples
>>> from scipy.cluster.hierarchy import single, cophenet >>> from scipy.spatial.distance import pdist, squareform
Given a dataset
Xand a linkage matrixZ, the cophenetic distance between two points ofXis the distance between the largest two distinct clusters that each of the points:>>> X = [[0, 0], [0, 1], [1, 0], ... [0, 4], [0, 3], [1, 4], ... [4, 0], [3, 0], [4, 1], ... [4, 4], [3, 4], [4, 3]]
Xcorresponds to this datasetx x x x x x x x x x x x
>>> Z = single(pdist(X)) >>> Z array([[ 0., 1., 1., 2.], [ 2., 12., 1., 3.], [ 3., 4., 1., 2.], [ 5., 14., 1., 3.], [ 6., 7., 1., 2.], [ 8., 16., 1., 3.], [ 9., 10., 1., 2.], [11., 18., 1., 3.], [13., 15., 2., 6.], [17., 20., 2., 9.], [19., 21., 2., 12.]]) >>> cophenet(Z) array([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 1., 1., 1.])
The output of the
scipy.cluster.hierarchy.cophenetmethod is represented in condensed form. We can usescipy.spatial.distance.squareformto see the output as a regular matrix (where each elementijdenotes the cophenetic distance between eachi,jpair of points inX):>>> squareform(cophenet(Z)) array([[0., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.], [1., 0., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.], [1., 1., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2.], [2., 2., 2., 0., 1., 1., 2., 2., 2., 2., 2., 2.], [2., 2., 2., 1., 0., 1., 2., 2., 2., 2., 2., 2.], [2., 2., 2., 1., 1., 0., 2., 2., 2., 2., 2., 2.], [2., 2., 2., 2., 2., 2., 0., 1., 1., 2., 2., 2.], [2., 2., 2., 2., 2., 2., 1., 0., 1., 2., 2., 2.], [2., 2., 2., 2., 2., 2., 1., 1., 0., 2., 2., 2.], [2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 1., 1.], [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 0., 1.], [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 0.]])
In this example, the cophenetic distance between points on
Xthat are very close (i.e., in the same corner) is 1. For other pairs of points is 2, because the points will be located in clusters at different corners - thus, the distance between these clusters will be larger.