Uncertainty

Uncertainties may arise in different forms in many of the models and data of the toolbox. One may encounter uncertainty in the data values (e.g., uncertain NOAELs, uncertain RPFs, or uncertain processing factors), uncertainty due to limited data (e.g., a limited number of food samples), uncertainty due to a lack of data (e.g., missing concentration data for some foods/substances or missing processing factors), and uncertainty of the models (e.g., due to a lack of detail). In many situations it is desirable to analyse how the model outcomes vary across the different scenarios to which these uncertainties give rise. For this, the toolbox offers:

  1. for many types of data, the possibility to provide data values together with quantifications of their uncertainty,

  2. imputation methods for filling in missing data in various types of models, and

  3. a generic uncertainty analysis method that provides uncertainty estimates of the modelling results for many of the modules, based on bootstrapping, parametric resampling, and/or re-calculation of all sub-modules for which this is possible.

Uncertainty due to limited sampled data

For some types of data, e.g., processing factors, it may be possible to provide not only nominal estimates of the data values, but also quantified estimates of the uncertainties of these values. In other cases, such quantifications of uncertainty may not be available. The aim of the toolbox is to support both quantified and unquantified uncertainties: quantified uncertainties are included in a quantitative uncertainty analysis when available; otherwise, only the nominal estimates are used, perhaps in combination with an offline qualitative uncertainty analysis.

Uncertainties of the data values may be expressed in different forms, and it depends on the type of data which forms are available, suitable, and implemented in the toolbox. For some data values, uncertainty may be quantified by means of parametric distribution parameters (e.g., processing factor uncertainties, or kinetic model instance parameter uncertainties). Alternatively, uncertainty values may be provided in the form of an empirical set of uncertainty values (e.g., relative potency factor uncertainties, or points of departure uncertainties).

Whenever data include quantified uncertainties and the data module to which they belong is included as a sub-module of a calculation module, these uncertainties may be included in an uncertainty analysis of the main module. If so, the data values are resampled in each uncertainty analysis cycle based on the uncertainty quantifications.

The basic acute exposure distribution is estimated in a Monte Carlo simulation by combining dietary consumption records (person-days) with sampled residue values. The resulting distribution represents a combination of variability in consumption within the population and variability between residues in a food lot. Percentiles, e.g. the median or the 99th percentile, may be used for further quantification. Due to the limited size of the underlying data, these outcomes are uncertain. Confidence (or uncertainty) intervals reflect the uncertainty of these estimates; MCRA uses bootstrap methodology and/or, depending on the available data, parametric methods to estimate this uncertainty.
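As a minimal sketch (with hypothetical consumption and residue values; the actual MCRA simulation is considerably more elaborate), the basic Monte Carlo combination of consumption records and sampled residues could look like:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical inputs: consumption amounts (g/kg bw per person-day) and
# measured residue concentrations (mg/kg) from a limited set of samples.
consumption = np.array([0.5, 1.2, 0.0, 2.4, 0.8, 1.6])  # person-days
residues = np.array([0.01, 0.05, 0.02, 0.10])           # sampled values

n_iter = 100_000
# Each Monte Carlo iteration pairs a random consumption record with a
# randomly sampled residue value.
c = rng.choice(consumption, size=n_iter)
r = rng.choice(residues, size=n_iter)
exposure = c * r  # mg/kg bw per day

# Percentiles summarise the acute exposure distribution.
median = np.percentile(exposure, 50)
p99 = np.percentile(exposure, 99)
```

Because `consumption` and `residues` are small samples, `median` and `p99` are themselves uncertain; the bootstrap and parametric methods below quantify that uncertainty.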

Empirical method, resampling

The empirical bootstrap is an approach to estimate the accuracy of an outcome. In its simplest, non-parametric form, the bootstrap algorithm resamples a dataset of n observations to obtain a bootstrap sample or resampled set of again n observations (sampling with replacement, that is: each observation has a probability of \(1/n\) to be selected at any position in the new resampled set). By repeating this process \(B\) times, one can obtain \(B\) resampled sets, which may be considered as alternative data sets that might have been obtained during sampling from the population of interest. Any statistic that can be calculated from the original dataset (e.g. the median, the standard deviation, the 99th percentile, etc.) can also be calculated from each of the \(B\) resampled sets. This generates an uncertainty distribution for the statistic under consideration. The uncertainty distribution characterises the uncertainty of the inference due to the sampling uncertainty of the original dataset: it shows which statistics could have been obtained if random sampling from the population had generated another sample than the one actually observed [Efron, 1979], [Efron et al., 1993].
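A minimal sketch of the non-parametric bootstrap described above (using NumPy with simulated data, not toolbox data):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def bootstrap_statistic(data, statistic, b=1000, rng=rng):
    """Resample `data` with replacement B times and return the
    uncertainty distribution of `statistic`."""
    n = len(data)
    # Each resampled set has n observations; every original observation
    # has probability 1/n of being drawn at each position.
    idx = rng.integers(0, n, size=(b, n))
    return np.array([statistic(data[i]) for i in idx])

# Simulated "observed" dataset of 50 values.
data = rng.lognormal(mean=0.0, sigma=1.0, size=50)
medians = bootstrap_statistic(data, np.median, b=2000)

# The spread of the bootstrap distribution quantifies sampling
# uncertainty, e.g. a 95% uncertainty interval for the median:
lo, hi = np.percentile(medians, [2.5, 97.5])
```

The same helper works for any statistic of interest (e.g. `lambda x: np.percentile(x, 99)` for the 99th percentile).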

Parametric methods

Instead of bootstrapping the observed data, inference about parameters may be based on parametric methods. For processing factors, which are specified through a nominal and/or upper value, this is the natural choice. For concentration data, where the lognormal model is used to represent less conservative scenarios [EFSA, 2012], the parametric bootstrap may be an alternative, especially when data are limited and the empirical bootstrap fails.

According to Cochran’s theorem, the sample variance \(\hat{\sigma}_{y}^2\) follows a scaled chi-square distribution. In the parametric bootstrap for the lognormal distribution, the sample variance \(\hat{\sigma}_{y}^2\) is replaced by a random draw \(\hat{\sigma}_{y}^{*2}\) from the scaled chi-square distribution with \(n_{1}-1\) degrees of freedom, and the sample mean \(\hat{\mu}_{y}\) is replaced by a random draw \(\hat{\mu}_{y}^{*}\) from a normal distribution with mean \(\hat{\mu}_{y}\) and variance \(\hat{\sigma}_{y}^{*2} / n_{1}\), giving a new set of parameters \(\hat{\mu}_{y}^{*}\) and \(\hat{\sigma}_{y}^{*2}\). This is repeated \(B\) times.
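The scheme above could be sketched as follows (hypothetical data; \(n_{1}\) is written `n` in the code, and the scaling of the chi-square draw follows the standard sampling distribution of a sample variance):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def parametric_bootstrap_lognormal(y, b=1000, rng=rng):
    """Parametric bootstrap of lognormal parameters on the log scale.

    Returns B resampled pairs (mu_star, sigma2_star): the variance is
    drawn from a scaled chi-square with n-1 degrees of freedom, the
    mean from a normal with variance sigma2_star / n.
    """
    logy = np.log(y)
    n = len(logy)
    mu_hat = logy.mean()
    s2_hat = logy.var(ddof=1)
    # Scaled chi-square draw for the variance:
    sigma2_star = s2_hat * rng.chisquare(n - 1, size=b) / (n - 1)
    # Normal draw for the mean, conditional on the resampled variance:
    mu_star = rng.normal(mu_hat, np.sqrt(sigma2_star / n))
    return mu_star, sigma2_star

# Simulated concentration data (20 positive values).
y = rng.lognormal(mean=-2.0, sigma=0.8, size=20)
mu_star, sigma2_star = parametric_bootstrap_lognormal(y, b=500)
```

Each pair `(mu_star[i], sigma2_star[i])` defines one alternative lognormal model, from which an uncertainty cycle can resample concentrations.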

For the truncated lognormal and censored lognormal, large-sample maximum likelihood theory is used to derive new parameters \(\hat{\mu}_{y}^{*}\) and \(\hat{\sigma}_{y}^{*2}\). This is repeated \(B\) times.
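As a hedged illustration of this large-sample approach (not the toolbox implementation): fit a left-censored lognormal by maximum likelihood, then draw new parameters from the asymptotic normal approximation around the fit, here using the BFGS inverse-Hessian estimate as the covariance:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(seed=11)

# Hypothetical left-censored lognormal data: values below the limit of
# detection `lod` are reported only as non-detects.
true = rng.lognormal(mean=-1.0, sigma=0.7, size=60)
lod = 0.3
observed = true[true >= lod]
n_censored = int((true < lod).sum())

def neg_loglik(theta):
    # Parameterised as (mu, log sigma) to keep sigma positive.
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    # Exact contribution of detects, censored contribution of non-detects.
    ll = norm.logpdf(np.log(observed), mu, sigma).sum()
    ll += n_censored * norm.logcdf((np.log(lod) - mu) / sigma)
    return -ll

fit = minimize(neg_loglik, x0=[0.0, 0.0], method="BFGS")

# Large-sample theory: the MLE is approximately normal with covariance
# the inverse observed information, approximated here by BFGS hess_inv.
b = 200
theta_star = rng.multivariate_normal(fit.x, fit.hess_inv, size=b)
```

Each row of `theta_star` is one resampled \((\hat{\mu}_{y}^{*}, \log\hat{\sigma}_{y}^{*})\) pair for an uncertainty cycle.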

The binomial fraction of non-detects for the mixture lognormal and mixture truncated lognormal distributions is sampled from a beta distribution with uniform prior \(a = b = 1\) (the beta distribution being the conjugate Bayesian estimator for a binomial fraction). This is repeated \(B\) times.
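This step can be sketched in a few lines (hypothetical counts; with a uniform Beta(1, 1) prior, the posterior for the non-detect fraction after observing \(k\) non-detects in \(n\) samples is Beta(\(k+1\), \(n-k+1\))):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical data: k non-detects among n samples.
n, k = 100, 37

b = 1000
# Each uncertainty cycle draws a new non-detect fraction from the
# Beta(k + 1, n - k + 1) posterior implied by the uniform prior.
fractions = rng.beta(k + 1, n - k + 1, size=b)
```

The drawn fraction then determines, per cycle, the weight of the non-detect component in the mixture distribution.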

Uncertainty due to missing data

In some cases, data may only be available for specific (primary) entities and missing for others. For example, points of departure (such as NOAELs or BMDs) may only be available for some of the substances of interest.

Uncertainty due to modelling approach

There is also uncertainty of model outcomes that may arise from conducting different modelling approaches or applying alternative modelling assumptions.

Note

TODO