Component exposures in population and subgroups

The SNMU solution for matrix \(V\) is used to group individuals with similar mixture exposure profiles. Figure 71 illustrates the idea of clustering.

../../../_images/snmuvmatrix.svg

Figure 71 SNMU: matrix \(V\), individual scores to components.

Crépet et al. (2022) propose to identify components by coupling statistical criteria with the relevance of the combined exposure profiles and the component composition. First, the optimal choice for \(k\), the number of components, is determined using a trade-off between the decrease in the residual sum of squares and the number of components. Then, hierarchical clustering is applied to the matrix of individual scores \(V\) to group individual(day)s with similar exposure profiles to the \(k\) components. This identification of components is repeated for different values of \(k\); a mixture is rejected when inspection shows components that are not relevant to characterize any cluster, or that concern only a small part of the population.
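
The following minimal sketch illustrates the trade-off criterion for \(k\) in Python. Standard non-negative matrix factorization from scikit-learn is used here as a stand-in for SNMU, and the exposure matrix is simulated; the sketch is illustrative only and is not the MCRA implementation.

# Sketch: choose the number of components k by the trade-off between the
# decrease in residual sum of squares (RSS) and the number of components.
# Standard NMF is used as a stand-in for SNMU; X is a hypothetical
# exposure matrix (individual-days x substances).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
X = rng.gamma(shape=2.0, scale=1.0, size=(200, 12))  # hypothetical exposures

rss = {}
for k in range(1, 9):
    model = NMF(n_components=k, init="nndsvda", max_iter=1000, random_state=0)
    V = model.fit_transform(X)          # individual scores to components
    U = model.components_               # component composition (loadings)
    rss[k] = np.sum((X - V @ U) ** 2)   # residual sum of squares

# Inspect the decrease in RSS; pick k where adding another component no
# longer gives a substantial reduction (elbow criterion).
for k in sorted(rss):
    print(k, round(rss[k], 1))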

In MCRA, two clustering methods are available. The first, hierarchical clustering, is implemented as described in Crépet et al. (2022). Ward’s clustering criterion is applied using Euclidean distances (Ward.D2, Murtagh and Legendre (2014)). Specification of the optimal number of clusters is not needed. The results of the clustering are displayed in a dendrogram, Figure 72. The second method, based on K-means, requires the number of clusters to be specified. The results of the clustering are shown in a scatter plot of the first principal components, with convex envelopes identifying the clusters, Figure 74.

../../../_images/hierarchical.svg

Figure 72 Hierarchical clustering of human monitoring data, 3 clusters, largest and smallest clusters contain 152 and 37 individuals, respectively
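
As an illustration, the hierarchical clustering step can be sketched in Python with SciPy, whose ward linkage implements Ward’s criterion on Euclidean distances (Ward.D2). The score matrix \(V\) is simulated here; in practice it would be the SNMU matrix of individual scores. The sketch is illustrative only.

# Sketch: hierarchical clustering of the individual score matrix V with
# Ward's criterion on Euclidean distances (Ward.D2).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(1)
V = rng.gamma(shape=2.0, scale=1.0, size=(200, 4))  # hypothetical scores (individual-days x components)

# SciPy's "ward" linkage on the raw observations uses Euclidean distances.
Z = linkage(V, method="ward")

# Dendrogram to inspect the cluster structure, as in Figure 72.
dendrogram(Z, no_labels=True)
plt.ylabel("distance")
plt.show()

# Cut the tree into, e.g., 3 clusters of individual(day)s with similar profiles.
clusters = fcluster(Z, t=3, criterion="maxclust")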

In Figure 73, the relative exposure to components in the total population is shown. These plots are also available for the subgroups resulting from the clustering.

../../../_images/populationcontributions.svg

Figure 73 Relative exposure to components in the population
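
A simple way to compute such relative exposures is sketched below, taking the relative exposure of a component as its share of the summed individual scores, for the total population and for each cluster subgroup. This definition and the data are illustrative and may differ from the exact computation in MCRA.

# Sketch: relative exposure to components, total population and per cluster.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
V = rng.gamma(shape=2.0, scale=1.0, size=(200, 4))     # hypothetical score matrix
clusters = rng.integers(1, 4, size=V.shape[0])         # hypothetical cluster labels

scores = pd.DataFrame(V, columns=[f"component {i + 1}" for i in range(V.shape[1])])
scores["cluster"] = clusters

# Share of each component in the summed scores, total population.
population_share = scores.drop(columns="cluster").sum()
population_share /= population_share.sum()

# The same share within each cluster subgroup.
cluster_share = scores.groupby("cluster").sum()
cluster_share = cluster_share.div(cluster_share.sum(axis=1), axis=0)

print(population_share.round(3))
print(cluster_share.round(3))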

Advantages of K-means clustering are that it is simple and fast and that it can handle large datasets. Visualisation is straightforward even for large datasets, whereas dendrograms from hierarchical clustering may become very dense. A disadvantage of K-means is that the number of clusters must be specified in advance. For large datasets, hierarchical clustering may be slow (its complexity is \(O(n^2)\)), but for small datasets the dendrogram helps in interpreting the results and in selecting the optimal number of clusters.

../../../_images/kmeans.svg

Figure 74 K-means clustering of human monitoring data, 3 clusters, largest and smallest clusters contain 204 and 21 individuals, respectively
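
The K-means variant can be sketched along the same lines: cluster the score matrix \(V\) with a pre-specified number of clusters and display the clusters as convex envelopes in the plane of the first two principal components, analogous to Figure 74. The data and settings below are illustrative only.

# Sketch: K-means clustering of V with the number of clusters fixed up front,
# visualised on the first two principal components with convex envelopes.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
V = rng.gamma(shape=2.0, scale=1.0, size=(200, 4))  # hypothetical score matrix

n_clusters = 3
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(V)

# Project the scores on the first two principal components and draw the
# convex envelope of each cluster.
pcs = PCA(n_components=2).fit_transform(V)
for c in range(n_clusters):
    pts = pcs[labels == c]
    plt.scatter(pts[:, 0], pts[:, 1], s=10, label=f"cluster {c + 1}")
    if len(pts) >= 3:
        hull = ConvexHull(pts)
        plt.fill(pts[hull.vertices, 0], pts[hull.vertices, 1], alpha=0.2)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()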