Modelling unit-to-unit variation

The basic model for an acute exposure assessment assumes that the concentration of the substance displays the variation of residues between units in the marketplace. In general, both monitoring data and controlled field trial data are obtained using composite samples. As a result some of the unit-to-unit variation is averaged out. The model for unit variability aims to adjust the composite sample mean such that sampled concentrations represent the originally unit-to-unit variation of the units in the composite sample.

MCRA offers three distributions to sample from:

  1. the beta distribution,

  2. the lognormal distribution,

  3. and the bernoulli distribution.

The beta distribution simulates values for a unit in the composite sample. It requires knowledge of the number of units in a composite sample and of the variability between units.

The lognormal distribution simulates values for a new unit in the batch. It requires only knowledge of the variability between units.

The bernoulli distribution is considered as a limiting case of the beta distribution when knowledge of the variability between units is lacking and only the number of units in the composite sample is known. For the beta and lognormal distribution, estimates of unit variability are either realistic (no censoring at the value of the monitoring residue) or conservative (unit values are left-censored at the value of the monitoring residue). For the lognormal distribution sampled concentrations have no upper limit. Whereas for the beta distribution, sampled concentration values for a unit are never higher than the monitoring residue times the number of units in the composite sample.

Variability between units is specified using a variability factor \(v\) (defined as 97.5th percentile divided by mean) or a coefficient of variation \(c_\mathtt{V}\) (standard deviation divided by mean). Following FAO/WHO recommendations, the default variability factor \(v\) = 1 for small crops (unit weight < 25 g). For large crops (unit weight ≥ 25 g) \(v\) = 5. For foods which are processed in large batches, e.g. juicing, marmalade/jam, sauce/puree, bulking/blending the variability factor \(v\) = 1 is proposed.

Estimation of intake values using the concept of unit variability

A composite sample for food \(k\) is composed of \(nu_{k}\) units with nominal (whole food/RAC) unit weight \(wu_{k}\). The weight of a composite sample is \(wm_{k} = nu_{k} \cdot wu_{k}\) with mean residue value \(cm_{k}\).

  • For each iteration \(i\) in the MC-simulation, obtain for each food k a simulated intake \(x_{ik}\), and a simulated composite sample concentration \(cm_{ik}\).

  • Calculate the number of unit intakes \(nux_{ik}\) in \(x_{ik}\) (round upwards) and set weights \(w_{ikl}\) equal to unit weight \(wu_{k}\), except for the last partial intake, which has weight \(w_{ikl} = x_{ik} - (nux_{ik} - 1) wu_{k}\) .

  • For the beta or bernoulli distribution: draw \(nux_{ik}\) simulated values \(bc_{ikl}\) from a beta or bernoulli distribution. Calculate concentration values as \(c_{ikl} = bc_{ikl} \cdot cm_{ik, max} = bc_{ikl} \cdot cm_{ik} \cdot nu_{k} = \mathit{svf}_{ikl} \cdot cm_{ik}\), where \(nu_{k}\) is the number of units in a composite sample of food \(k\), and \(svf_{ikl}\) is the stochastic variability factor for this simulated unit, i.e. the ratio between simulated concentration \(c_{ikl}\) and the simulated composite sample concentration \(cm_{ik}\). Sum to obtain the simulated concentration in the consumed portion:

\[c_{ik} = \sum_{l=1}^{nux_{ik}}w_{ikl}c_{ikl} / x_{ik}\]
  • For the lognormal distribution: draw \(nux_{ik}\) simulated logconcentration values \(\mathit{lc}_{ikl}\) from a normal distribution with (optional) a biased mean \(\mu = ln(cm_{ik})\) or (default) unbiased mean \(\mu = ln(cm_{ik}) - 1/2 \sigma^2\) and standard deviation \(\sigma\). Calculate concentration values as

\[c_{ikl} = \exp(lc_{ikl}) = \mathit{svf}_{ikl} * cm_{ik}\]

where \(\mathit{svf}_{ikl}\) is the stochastic variability factor for this simulated unit, i.e. the ratio between simulated concentration \(c_{ikl}\) and the simulated composite sample concentration \(cm_{ik}\). Back transform and sum to obtain the simulated concentration in the consumed portion:

\[c_{ik} = \sum_{l=1}^{nux_{ik}}w_{ikl}c_{ikl} / x_{ik}\]

For cumulative exposure assessments, a sensitivity analysis may be performed by specifying a full correlation between concentrations from different substances on the same unit. As a result, high (or low) concentrations from different substances occur together on the same unit. In MCRA, for each unit the random sequence is repeatedly used to generate concentration values for all substances.

Beta distribution

Under the beta model simulated unit values are drawn from a bounded distribution on the interval (0, \(c_{max}\)) with \(c_{max} = \mathit{nu}_{k} \cdot \mathit{cm}_{k}\). The standard beta distribution is defined on the interval (0, 1) and is usually characterised by two parameters \(a\) and \(b\), with \(a>0\), \(b>0\) (see e.g. Mood et al. 1974) [Mood et al., 1974]. Alternatively, it can be parameterised by the mean

\[\mu = a/(a+b)\]

and the variance

\[\sigma^2 = ab/(a+b+1)^{-1}(a+b)^{-2}\]

or, as applied in MCRA, by the mean \(\mu\) and the squared coefficient of variation

\[c_\mathtt{V}^2 = ba^{-1}(a+b+1)^{-1}\]

For the simulated unit values in each iteration of the program we require an expected value \(cm_{k}\). This scales down to a mean value \(\mu = \mathit{cm}_{k}/c_{max} = 1/\mathit{nu}_{k}\) in the (standard) beta distribution. From this value for \(\mu\) and an externally specified value for \(\mathit{cv}_{k}\) the parameters \(a\) and \(b\) of the beta distribution are calculated as:

\[a = b(\mathit{nu}_{k} - 1)^{-1}\]

and

\[b = \frac{(\mathit{nu}_{k} - 1)(\mathit{nu}_{k} - 1 - \mathit{cv}_{k}^2)}{\mathit{nu}_{k}\mathit{cv}_{k}^2}\]

From the second formula it can be seen that \(\mathit{cv}_{k}\) should not be larger than \(\sqrt{\mathit{nu}_{k} -1}\) in order to avoid negative values for \(b\). When the unit variability is specified by a variability factor

\[v_{k} = \frac{\mathit{p97.5}_{k}}{\mathit{cm}_{k}}\]

instead of a coefficient of variation \(\mathit{cv}_{k}\) then MCRA applies a bisection algorithm to find a such that the cumulative probability

\[P[Beta(a,b)] = 0.975\]

for \(b = a(\mathit{nu}_{k}-1)\).

Sampled values from the beta distribution are rescaled by multiplication with \(\mathit{cm}_{max}\) to unit concentrations \(c_{ijk}\) on the interval \((0, \mathit{cm}_{max}\)).

Lognormal distribution

The lognormal distribution is characterised by \(\mu\) and \(\sigma\), which are the mean and standard deviation of the log-transformed concentrations. The unit log-concentrations are drawn from a normal distribution with mean \(\mu = ln(\mathit{cm}_{ik}) - 1/2\sigma^2\). The coefficient of variation \(cv\) is turned into the standard deviation \(\sigma\) on the log-transformed scale with:

\[\sigma = \sqrt{ln(cv^2 + 1)}\]

The variability factor is defined as the 97.5th percentile of the concentration in the individual measurements divided by the corresponding mean concentration seen in the composite sample. A variability factor \(v\) is converted into the standard deviation \(\sigma\) as follows:

\[v = \frac{p97.5}{mean} = \frac{e^{\mu + 1.96\sigma} }{e^{\mu + 1/2 \sigma^2}} = e^{1.96\sigma - 1/2 \sigma^2}\]

with \(\mu\) and \(\sigma\) representing the mean and standard deviation of the log-transformed concentrations. So

\[ln(v) = 1.96\sigma – 1/2\sigma^2\]

Solving for \(\sigma\) gives:

\[\sigma^2 – 2 \cdot 1.96\sigma + 2log(v) = 0\]

with roots for \(\sigma\) according to:

\[\sigma = 1.96 \pm \sqrt{(1.96^2 – 2log(v))}\]

The smallest positive root is taken as an estimate for \(\sigma\).

Bernoulli distribution

The bernoulli model is a limiting case of the beta model, which can be used if no information on unit variability is available, but only the number of units in a composite sample is known (see van der Voet et al. 2001). As a worst case approach we may take the coefficient of variation \(cv\) as large as possible. When \(cv\) is equal to the maximum possible value \(\sqrt{\mathit{nu}_{k}-1}\), the (unstandardised) beta distribution simplifies to a bernoulli distribution with probability

\[\left( \mathit{nu}_{k} – 1 \right) / \mathit{nu}_{k}\]

or

\[(v_{k}-1)/ v_{k}\]

for the value 0 and probability

\[1/\mathit{nu}_{k}\]

or

\[1/v_{k}\]

for the value \(c_{max} = \mathit{nu}_{k} \cdot \mathit{cm}_{k}\).

In MCRA values 0 are actually replaced by \(\mathit{cm}_{k}\), to keep all values on the conservative side. For example, with \(\mathit{nu}_{k}\) = 5, there will be 80% probability at \(c_{ijk} = cm_{k}\) and 20% probability at \(c_{ijk} = c_{max}\). When the number of units \(\mathit{nu}_{k}\) in the composite sample is missing, the nominal unit weight \(\mathit{wu}_{k}\) is used to calculate the parameter for unit variability.