Clusters: cluster analysis

The Clusters button opens the cluster menu shown in Fig. 19; this button becomes sensitive only after the number of significant components has been chosen in the Components widget. Before running cluster analysis, you must choose the number of clusters to be sought with the Seek slide-bar in the upper left corner. As a rule of thumb, search for several more clusters than the number of components you have identified as significant. (You can also change your choice of the number of significant components here with the Significant components button.)

Figure 19: Cluster widget used for cluster analysis.
\includegraphics[width=\textwidth]{pca_cluster}

When searching for clusters, you are looking for pixels with similar weightings of PCA components. As described in our first paper [2, Eq. 19], it is often useful to scale the $s$ components according to

\begin{displaymath}
R^{\rm scaled}_{s \times P} =
\left(R_{s \times P}-\langle R_{s} \rangle\right)
\left(\frac{\lambda_{1}}{\lambda_{s}}\right)^{\gamma}.
\end{displaymath}

That is, the weightings are centered on zero by subtracting their average value $\langle R_{s} \rangle$, and they are scaled by eigenvalue according to $(\lambda_{1}/\lambda_{s})^{\gamma}$. Values of $\gamma$ between 0.3 and 0.6 often work well.
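The scaling can be sketched in a few lines of NumPy. This is only an illustration, not the program's own code: the function name `scale_weights` and the array layout are hypothetical, and it assumes that $\langle R_{s} \rangle$ is the mean of each component over all pixels and that the eigenvalue factor is applied to each component separately.

```python
import numpy as np

def scale_weights(R, eigenvalues, gamma=0.5):
    """Center and eigenvalue-scale PCA component weightings.

    R           : array of shape (s, P), weightings of s components at P pixels
    eigenvalues : array of shape (s,), lambda_1 ... lambda_s
    gamma       : scaling exponent (0.3 to 0.6 is the range suggested in the text)
    """
    R = np.asarray(R, dtype=float)
    lam = np.asarray(eigenvalues, dtype=float)
    # Assumption: <R_s> is the per-component mean over all pixels.
    centered = R - R.mean(axis=1, keepdims=True)
    # Assumption: each component row gets its own factor (lambda_1 / lambda_s)^gamma.
    factors = (lam[0] / lam) ** gamma
    return centered * factors[:, np.newaxis]

# Two components, three pixels; the weak second component is boosted.
R = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
scaled = scale_weights(R, eigenvalues=[100.0, 1.0], gamma=0.5)
```

With $\gamma=0$ only the centering is applied; larger values of $\gamma$ give more weight to the components with smaller eigenvalues.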

In some cases, it is also useful to exclude the first component from the cluster search, as discussed in [2], so as to reduce sensitivity to thickness variations. However, in our second paper [3] we describe what we feel is a much better strategy: the use of an angle distance measure for clustering. In this case, you will generally want to select No for the option Cluster without using first principal component. When you use the angle distance measure, you must select Cutoff for angle distance measure to specify a radius below which pixels will not be included in the determination of cluster centers (Fig. 20). Pick a radius that excludes only a relatively small number of pixels, but that lies above the radius at which different compositions begin to blur into each other due to noise at low absorptivity (Fig. 21).

Figure 20: The screen used to set the radial cutoff for angle distance measure cluster analysis. The histogram shows the number of pixels with various distances from the origin in component space; pixels with small radii are then excluded from the calculation of cluster centers.
\includegraphics{pca_radius_cutoff}

Figure 21: Illustration of the use of angle distance measure in cluster analysis. For a particular composition, a fixed ratio of PCA components is expected for all thicknesses, suggesting that an angle distance measure is appropriate for clustering. For clustering, one must exclude pixels below a radius at which different compositions will intermingle due to noise at low absorptivities.
\includegraphics[width=0.35\textwidth]{pca_angle_measure}
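The idea behind the angle distance measure and the radial cutoff can be illustrated with a short NumPy sketch. The function names and the sample data are hypothetical, and the sketch assumes that the radius of a pixel is the Euclidean length of its weight vector in component space; it is not taken from the program itself.

```python
import numpy as np

def angle_distance(a, b):
    """Angle in radians between two weight vectors in component space."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))

def pixels_above_cutoff(R, cutoff):
    """Boolean mask of pixels whose radius is at least `cutoff`.

    R has shape (s, P): s component weightings at P pixels.
    """
    radii = np.linalg.norm(R, axis=0)
    return radii >= cutoff

# Four pixels in a two-component space (hypothetical values).
# Pixels 0 and 1 have the same component ratio but different thickness;
# pixel 2 lies close to the origin; pixel 3 has a different composition.
R = np.array([[1.0, 2.0, 0.01, 0.0],
              [1.0, 2.0, 0.00, 3.0]])

mask = pixels_above_cutoff(R, cutoff=0.1)
same_composition = angle_distance(R[:, 0], R[:, 1])
different_composition = angle_distance(R[:, 0], R[:, 3])
```

Pixels 0 and 1 are separated by an angle of essentially zero even though their radii differ by a factor of two, while pixel 2 is excluded by the cutoff because its direction is dominated by noise.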

The Calculate button starts the clustering calculation. When the calculation is finished, results are displayed in five graphical areas:

  1. The Histogram shows how many members each cluster contains; this is located in the lower left corner.
  2. A pseudo-color image of cluster pixels for the first twelve clusters is generated by displaying members of different clusters in different colors.
  3. An image of pixels which are part of the currently-selected cluster is shown in the upper right corner. The index for this cluster can be chosen using the slider Cluster number above it.
  4. The spectrum of the current cluster is displayed in the lower right corner. Below the spectrum, the component weights are listed. Component weights indicate the location of the cluster center. If one were to fit the cluster spectrum using principal components, the cluster weights would be the fit parameters.
  5. Distances from cluster centers provide a way to detect errors in clustering. This image shows the scaled distance from each pixel to the center of the cluster it belongs to. White regions are the pixels that are most distant from their cluster center. Maximum value, RMS, and 95% error values are displayed above the image.
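The distance image and its summary values in item 5 can be approximated as follows. This is a sketch under stated assumptions: it uses plain Euclidean distance in component space (the program may apply additional scaling), it interprets the 95% value as the 95th percentile of the distances, and the function name `distance_summary` and the sample data are hypothetical.

```python
import numpy as np

def distance_summary(R, centers, labels):
    """Distance of every pixel to the center of its own cluster.

    R       : (s, P) component weightings
    centers : (s, K) cluster centers
    labels  : (P,) integer cluster index of each pixel, in [0, K)
    """
    R = np.asarray(R, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    # Pick, for each pixel, the center of the cluster it belongs to.
    diffs = R - centers[:, labels]
    d = np.linalg.norm(diffs, axis=0)
    summary = {
        "max": d.max(),
        "rms": np.sqrt(np.mean(d ** 2)),
        "p95": np.percentile(d, 95),  # assumed meaning of the 95% value
    }
    return d, summary

# Four pixels, two clusters (hypothetical values); the last pixel is an outlier.
R = np.array([[0.0, 1.0, 4.0, 8.0],
              [0.0, 0.0, 0.0, 0.0]])
centers = np.array([[0.5, 5.0],
                    [0.0, 0.0]])
labels = np.array([0, 0, 1, 1])

d, summary = distance_summary(R, centers, labels)
outlier = int(np.argmax(d))  # index of the pixel farthest from its cluster center
```

A maximum that is much larger than the RMS value, as in this example, points to a few poorly fitting pixels rather than to a generally poor clustering.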

The Show outliers and spectra button opens a widget which displays a pseudo-colored image of the stack. The pixels that are farthest from the cluster center are colored white, and the rest are red. Moving the mouse above the image shows the spectrum at the pixel below the cursor.

The Scatterplots of pixel weightings button opens the scatterplot widget shown in Fig. 22. (Encapsulated PostScript .eps files of these plots can be saved within this widget.) If the angle distance measure is used, the scatterplot gives an option for a spherical projection view of the pixel weightings (Fig. 23). The Histogram of aggregate cluster distances button displays a histogram of the distances of all pixels from their respective cluster centers. The Histogram of distances by cluster button displays gray-scale image histograms of the distances of pixels from the cluster center, for each cluster separately.

The Dendrogram button (Fig. 24) displays a dendrogram and saves a drawing of it in the current directory. This dendrogram indicates the degree of similarity between different clusters. If there are several very short branches at the end of one limb of the dendrogram, consider reducing the number of clusters to be sought and re-running the clustering algorithm, as discussed in our first paper [2].
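To show how such a tree relates to the cluster centers, here is a minimal sketch using SciPy's hierarchical clustering routines. The choice of average linkage with Euclidean distance is an assumption made for illustration, not necessarily what the Dendrogram button uses, and the cluster centers are made-up values.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical cluster centers, one row per cluster, one column per component.
centers = np.array([[0.0, 0.0],
                    [0.1, 0.0],
                    [5.0, 5.0],
                    [5.0, 5.2]])

# Each row of Z describes one merge: the two groups joined, the distance
# at which they are joined, and the number of clusters in the new group.
Z = linkage(centers, method="average", metric="euclidean")

# no_plot=True returns the tree layout without requiring a plotting backend.
tree = dendrogram(Z, no_plot=True)
merge_heights = Z[:, 2]
```

Here the first two merges happen at very small distances (0.1 and 0.2) compared with the final merge, which corresponds to the short branches mentioned above and suggests that two clusters would describe these data about as well as four.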

Figure 22: The scatterplot shows pixels plotted according to their weights along two selected component axes. Because clustering is done over a number of dimensions equal to the number of selected components, a particular two-dimensional view will give an incomplete sense of how the data are clustered.
\includegraphics{pca_scatterplots}

Figure 23: A scatterplot like Fig. 22, but with a spherical projection of the angle distance measure pixel weightings.
\includegraphics{pca_scatterplot_spherical}

Figure 24: The Dendrogram widget shows a sort of genetic tree of the data. In this view, the distance between various cluster groupings is represented by a distance on the horizontal axis.
\includegraphics{pca_dendrogram}

Finally, one can save the results of cluster analysis in various formats. Save cluster ``.roi'' files saves region of interest files for all clusters; these .roi files can be read by stack_analyze. Save all cluster spectra as ``.eps'' saves all cluster spectra in Encapsulated PostScript .eps format. Save all cluster spectra, images as ``.png'' saves portable network graphics .png images of cluster indices and spectra. Save all cluster spectra, images as ``.csv'', ``.nc'' saves spectra in Excel-readable .csv text files, and images in NetCDF .nc format.

Holger Fleckenstein 2008-07-08