Screening calculation for large Cumulative Assessment Groups

Statistical model for the screening step (acute exposure)

The screening step implements a simple model that is applied to each SCC. Assume independent NonDetectSpike-LogNormal (NDS-LN) models for both the consumptions of modelled food in source S and the concentrations of substance C in source S. A ‘non-detect’ consumption is assumed to be a zero consumption. A censored concentration is imputed as a user-specified fraction \(f\) of the Limit of Reporting (LOR). The model for consumption then has three parameters and the model for concentration has four parameters, as specified in Table 100. Note that the parameters of the consumption distribution are estimated from the consumption data using sampling weights if these have been provided in the consumption data set.

Table 100 Parameters for screening models (per source/substance)
parameter	consumptions	concentrations
probability of a positive	\(\pi_{x}\)	\(\pi_{c}\)
mean positives (ln scale)	\(\mu_{x}\)	\(\mu_{c}\)
standard deviation positives (ln scale)	\(\sigma_{x}\)	\(\sigma_{c}\)
value to use for censored values (ln scale)		\(f \cdot L_{c}\)
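The parameters in Table 100 could be estimated along the following lines. This is a minimal sketch, not the actual estimation routine: the function name `fit_nds_ln` is hypothetical, and the use of weighted moments of the log-transformed positives (dividing by the total weight) is an assumption made for illustration.

```python
import math

def fit_nds_ln(values, weights=None):
    """Return (pi, mu, sigma) of a NonDetectSpike-LogNormal model.

    values  : raw amounts; values <= 0 are treated as the spike at zero
    weights : optional sampling weights, one per value (assumed input)
    """
    if weights is None:
        weights = [1.0] * len(values)
    total_w = sum(weights)
    # Log-transform the positive values and keep their weights.
    positives = [(math.log(v), w) for v, w in zip(values, weights) if v > 0]
    pos_w = sum(w for _, w in positives)
    if pos_w == 0:
        return 0.0, float("nan"), float("nan")
    pi = pos_w / total_w                                   # probability of a positive
    mu = sum(lv * w for lv, w in positives) / pos_w        # mean of positives (ln scale)
    var = sum(w * (lv - mu) ** 2 for lv, w in positives) / pos_w
    return pi, mu, math.sqrt(var)                          # sd of positives (ln scale)
```

Without weights every record counts equally, which corresponds to a consumption data set that provides no sampling weights.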

Exposure is consumption times concentration, so on logarithmic scale they can be added:

\(e=x+c\).

The assessment focuses on a chosen percentile of exposure, e.g. p95. The relevant fraction is denoted by \(p\), for example \(p = 0.95\) for the 95th percentile. The two NDS-LN models combine into three possibilities: no consumption, consumption with a censored concentration, and consumption with a positive concentration. In the screening model, the two possibilities that lead to potential exposure are modelled as a mixture of two lognormal distributions. For the censored case, the positive exposure distribution equals the positive consumption distribution multiplied by the imputed concentration, i.e. the user-chosen factor \(f\) times an estimate of the average worst-case limit value for concentration (LOR):

\[\pi_{1} = \pi_{x} ( 1 - \pi_{c} ) ; \mu_{1} = \mu_{x} + f \cdot L_{c} ; \sigma_{1} = \sigma_{x}\]

where \(L_{c}\) is the logarithm of the LOR, or, if there are multiple analytical methods with different LOR, a weighted average of these different LORs.

For the detect case the positive exposure distribution is easily combined from the positive consumption distribution and the positive concentration distribution:

\[\pi_{2} = \pi_{x} \pi_{c} ; \mu_{2} = \mu_{x} + \mu_{c} ; \sigma_{2} = \sqrt{\sigma_{x}^2 + \sigma_{c}^2}\]
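As an illustration, the two mixture components can be assembled from the parameters of Table 100 as in the sketch below. The function name `mixture_components` and the tuple layout `(pi, mu, sigma)` are hypothetical choices for this example; the formulas are taken as written in the text.

```python
import math

def mixture_components(pi_x, mu_x, sigma_x, pi_c, mu_c, sigma_c, f, L_c):
    """Return the censored and detect components as (pi, mu, sigma) tuples.

    All means and standard deviations are on the natural-log scale;
    L_c is the logarithm of the LOR, f the user-specified factor.
    """
    # Censored case: consumption distribution shifted by the imputed value,
    # using the term f * L_c exactly as given in the text.
    censored = (pi_x * (1 - pi_c), mu_x + f * L_c, sigma_x)
    # Detect case: sum of two independent normals on the log scale,
    # so the variances add.
    detect = (pi_x * pi_c, mu_x + mu_c, math.hypot(sigma_x, sigma_c))
    return censored, detect
```

The two component probabilities add up to \(\pi_{x}\); the remaining probability \(1 - \pi_{x}\) belongs to the non-consumptions.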

\(p\) can be corrected for the non-consumptions to the appropriate fraction needed in the mixture of the two positive distributions:

\[p' = \frac{p - (1 - \pi_{x})}{\pi_{x}}\]

If \(p'\leq 0\) then all positive exposures are beyond the requested fraction, and the estimated exposure is just 0.

If \(p'> 0\) then the relevant log exposure \(e_{p}\) satisfies

\[(1 - \pi_{c}) \cdot \Phi \left(\frac{e_{p} - \mu_{1}}{\sigma_{1}}\right) + \pi_{c} \cdot \Phi \left(\frac{e_{p} - \mu_{2}}{\sigma_{2}}\right) = p'\]

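where \(\Phi( \cdot )\) represents the cumulative standard normal distribution function. The value of \(e_{p}\) can easily be found by a bisection search within the interval

\[[ \mu_{min} - 4 \sigma_{max}, \mu_{max} + \max(0, z_{p'} \sigma_{max}) ],\]

where \(\mu_{min}\) and \(\mu_{max}\) are the smaller and larger of \(\mu_{1}\) and \(\mu_{2}\), \(\sigma_{max}\) is the larger of \(\sigma_{1}\) and \(\sigma_{2}\), and \(z_{p'}\) is the \(p'\) quantile of the standard normal distribution.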

The final exposure percentile estimate then is \(\exp(e_{p})\).
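The percentile calculation described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the actual code: the function name `acute_percentile`, the tolerance `tol` and the reading of \(\mu_{min}\), \(\mu_{max}\), \(\sigma_{max}\) as the extremes over the two components are assumptions. It requires \(0 < p < 1\).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, pdf and inv_cdf

def acute_percentile(p, pi_x, pi_c, mu1, sigma1, mu2, sigma2, tol=1e-10):
    """Exposure at fraction p for one SCC, on the original (not log) scale."""
    if pi_x <= 0:
        return 0.0
    # Correct p for the non-consumptions.
    p_prime = (p - (1 - pi_x)) / pi_x
    if p_prime <= 0:
        return 0.0  # the requested fraction falls among the zero exposures

    def mixture_cdf(e):
        return ((1 - pi_c) * STD_NORMAL.cdf((e - mu1) / sigma1)
                + pi_c * STD_NORMAL.cdf((e - mu2) / sigma2))

    # Search interval as given in the text. The lower bound assumes that
    # p' is not extremely small (roughly, not below Phi(-4)).
    s_max = max(sigma1, sigma2)
    lo = min(mu1, mu2) - 4 * s_max
    hi = max(mu1, mu2) + max(0.0, STD_NORMAL.inv_cdf(p_prime) * s_max)

    # Bisection: the mixture CDF is increasing in e.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid) < p_prime:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))
```

Bisection is a reasonable choice here because the mixture distribution function is monotone, so no derivative information or starting value is needed.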

Denote by \(e_{p,max}\) the highest estimate on the log scale, and by \(SCC_{\mathit{highest}}\) the SCC for which it is obtained. Then evaluate for each SCC the probability of exceeding \(e_{p,max}\):

\[P_{i} = Pr(e > e_{p,max}) = \pi_{x} \cdot \left[ (1 - \pi_{c}) \cdot \left( 1 - \Phi \left( \frac{e_{p,max} - \mu_{1}}{\sigma_{1}} \right) \right) + \pi_{c} \cdot \left( 1 - \Phi \left( \frac{e_{p,max} - \mu_{2}}{\sigma_{2}} \right) \right) \right]\]

\(P_{i}\) is a tentative measure for the ‘probability of a high exposure’. For \(SCC_{\mathit{highest}}\), \(P_{i}=1-p\); for all other SCCs it will be lower. The sum of all these probabilities is not a meaningful probability in itself. However, this sum is used to scale the individual \(P_{i}\) values to measures of relative importance for the SCCs:

\[\mathit{Imp}_{i} = P_{i} / \sum_{j} P_{j}\]
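A minimal sketch of the exceedance probability and its scaling to relative importance is given below. The function names are hypothetical. The sketch uses the upper tail \(1 - \Phi\) of each component, which is the form for which \(P_{i}\) equals \(1-p\) for the SCC with the highest percentile.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def exceedance_probability(e_max, pi_x, pi_c, mu1, sigma1, mu2, sigma2):
    """Pr(e > e_max) for one SCC; e_max is on the natural-log scale."""
    tail_censored = 1 - STD_NORMAL.cdf((e_max - mu1) / sigma1)
    tail_detect = 1 - STD_NORMAL.cdf((e_max - mu2) / sigma2)
    # Only consumers (probability pi_x) can have a positive exposure.
    return pi_x * ((1 - pi_c) * tail_censored + pi_c * tail_detect)

def relative_importance(probabilities):
    """Scale the P_i values so that they sum to 1."""
    total = sum(probabilities)
    if total == 0:
        return [0.0 for _ in probabilities]
    return [p_i / total for p_i in probabilities]
```

For example, `relative_importance([0.05, 0.03, 0.02])` gives shares of about 0.5, 0.3 and 0.2.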

Rank all SCCs according to \(\mathit{Imp}_{i}\) and calculate cumulative importance. The relative importance of the two mixture components at \(e_{p}\) can be estimated as

\[w_{1,2} = \frac{\pi_{1,2} \cdot \phi \left( \frac{ e_{p} - \mu_{1,2}}{ \sigma_{1,2}} \right) / \sigma_{1,2}} { \pi_{1} \cdot \phi \left( \frac{e_{p} - \mu_{1}}{\sigma_{1}} \right) / \sigma_{1} + \pi_{2} \cdot \phi \left( \frac{e_{p} - \mu_{2}}{\sigma_{2}} \right) / \sigma_{2}}\]

where \(\phi(\cdot)\) represents the standard normal probability density function.

The user interface should allow the user to select the top-\(N\) SCCs from the list, based on a chosen percentage (e.g. 95%) of cumulative importance to be included. The full analysis that follows the screening calculates exactly the same exposure distribution as a full analysis without screening; however, less information is retained in the output. This concerns the tables with information on foods-as-eaten, which are only shown for the selected risk driver components (SCCs). Risk drivers are groupings of SCCs (risk driver components) at the level of measured-source-substance combinations (MSCCs). Note that output for an MSCC (e.g. APPLE/captan) only covers the selected SCCs (e.g. APPLE from apple juice/captan and APPLE from apple pie/captan), but not unselected SCCs (e.g. APPLE from fruit yoghurt/captan).
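The component weights and the top-\(N\) selection can be sketched as follows. The function names, the `(pi, mu, sigma)` tuple layout and the rule "keep adding SCCs until the cumulative importance reaches the chosen percentage" are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def component_weights(e_p, censored, detect):
    """Relative importance (w1, w2) of the two mixture components at e_p.

    Each component is a (pi, mu, sigma) tuple on the natural-log scale.
    """
    densities = [pi * STD_NORMAL.pdf((e_p - mu) / sigma) / sigma
                 for pi, mu, sigma in (censored, detect)]
    total = sum(densities)
    return densities[0] / total, densities[1] / total

def select_top_sccs(importance, coverage=0.95):
    """Return SCC names, most important first, up to the chosen coverage.

    importance : dict mapping an SCC name to its Imp_i value
    coverage   : fraction of cumulative importance to include (assumed rule:
                 stop as soon as the running total reaches this value)
    """
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, imp in ranked:
        selected.append(name)
        cumulative += imp
        if cumulative >= coverage:
            break
    return selected
```

With importances of 0.6, 0.3, 0.08 and 0.02 and a coverage of 95%, the first three SCCs are selected, since the first two together cover only 90%.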

Statistical model for the screening step (chronic exposure)

In chronic exposure assessments, the mean concentration of chemicals is calculated first, and combined with the consumption distribution. For this reason a chronic calculation uses less memory, and therefore larger datasets can be handled.

The model described under acute exposure can be simplified for a chronic screening. The concentration distribution is only used to estimate a mean concentration, incorporating any effect from the imputation of censored values. The exposure distribution is therefore only a scaled version of the consumption distribution.

\[\pi_{2} = \pi_{x} \pi_{c} ; \mu_{2} = \mu_{x} + \mu_{c} ; \sigma_{2} = \sigma_{x}\]

The parameters of the consumption distribution \((\pi_{x}, \mu_{x}, \sigma_{x})\) are calculated from the observed individual means (OIM), i.e. the mean daily consumptions over the survey days of each person in the data (allowing for sampling weights). The percentiles are calculated as \(e_{p} = \mu_{2} + z_{p} \sigma_{2}\), where \(z_{p}\) is the corresponding percentile of the standard normal distribution. The exceedances of the maximum percentile are calculated as

\[P_{i} = Pr(e > e_{p,max}) = \pi_{x} \cdot \left[ 1 - \Phi \left( \frac{e_{p,max} - \mu_{2}}{\sigma_{2}} \right) \right]\]
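The chronic screening quantities can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, the normal quantile is scaled by \(\sigma_{2}\), and the exceedance is taken as the upper tail \(1 - \Phi\), in line with its definition as \(Pr(e > e_{p,max})\).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def chronic_log_percentile(p, mu_x, sigma_x, mu_c):
    """Log-scale exposure percentile e_p for one SCC (requires 0 < p < 1)."""
    mu2, sigma2 = mu_x + mu_c, sigma_x   # exposure is a scaled consumption
    return mu2 + STD_NORMAL.inv_cdf(p) * sigma2

def chronic_exceedance(e_max, pi_x, mu_x, sigma_x, mu_c):
    """Pr(e > e_max) for one SCC; e_max is on the natural-log scale."""
    mu2, sigma2 = mu_x + mu_c, sigma_x
    return pi_x * (1 - STD_NORMAL.cdf((e_max - mu2) / sigma2))
```

Because no mixture is involved, the percentile has a closed form and no bisection search is needed.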