Uncertainty¶

Uncertainties arise in different forms in many of the models and data of the toolbox:

uncertainty in the data values (e.g., uncertain NOAELs, uncertain RPFs, or uncertain processing factors),

uncertainty due to limited data (e.g., a limited number of food samples),

uncertainty due to a lack of data (e.g., missing concentration data for some foods/substances or missing processing factors),

uncertainty of the models (e.g., due to a lack of detail).

The toolbox offers the following options to handle uncertainty:

for many types of data, the possibility to provide data including quantifications of uncertainty,

imputation methods for filling in missing data in various types of models, and

a generic uncertainty analysis method that provides uncertainty estimates of the modelling results for many of the modules, based on bootstrapping, parametric resampling, and/or re-calculation of all sub-modules for which this is possible.

Uncertainty due to limited sampled data¶

For some types of data, e.g., processing factors, it is possible to provide not only nominal estimates of the data values, but also quantified estimates of the uncertainties of these values. Occasionally, such quantifications are not available. The toolbox can work with both quantified and unquantified uncertainties: when quantifications are available, they are included in a quantitative uncertainty analysis; when they are not, nominal estimates are used, followed by an offline qualitative uncertainty analysis.

Uncertainties of data values are available in different forms. For some data values, uncertainty may be quantified by the parameters of a parametric distribution (e.g., processing factor uncertainties, or kinetic model instance parameter uncertainties). Alternatively, uncertainty may be provided as an empirical set of uncertainty values (e.g., relative potency factor uncertainties, or points of departure uncertainties).

For each data sub-module that has quantified uncertainties, including this source of uncertainty in the uncertainty analysis of the main module is optional. When it is included, the data values are resampled in each uncertainty analysis cycle.

The basic acute exposure distribution is estimated in a Monte Carlo simulation by combining dietary consumption records (person-days) with sampled residue values. The resulting distribution represents a combination of variability in consumption within the population and variability between residues in a food lot. Percentiles, e.g., the median or the 99th percentile, may be used for further quantification. Due to the limited size of the underlying data, these outcomes are uncertain. Confidence (or uncertainty) intervals reflect the uncertainty of these estimates. To estimate them, MCRA uses bootstrap methodology and/or, depending on the available data, parametric methods.
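The Monte Carlo step described above can be sketched as follows. This is a minimal illustration, not MCRA's implementation: the data structures, the function name `simulate_acute_exposures`, the food names, the residue values, and the scaling by body weight are all assumptions made for the example.

```python
import random

# Hypothetical person-days: body weight (kg) and consumed amount (g) per food.
person_days = [
    {"bw": 70.0, "consumption": {"apple": 150.0, "lettuce": 40.0}},
    {"bw": 22.0, "consumption": {"apple": 200.0}},
    {"bw": 85.0, "consumption": {"lettuce": 90.0}},
]

# Hypothetical residue concentrations (mg/kg) per food; zeros stand for non-detects.
residues = {
    "apple": [0.0, 0.0, 0.02, 0.05, 0.11],
    "lettuce": [0.0, 0.01, 0.03, 0.30],
}


def simulate_acute_exposures(person_days, residues, n_iterations, rng):
    """Combine randomly drawn person-days with randomly drawn residue values."""
    exposures = []
    for _ in range(n_iterations):
        day = rng.choice(person_days)  # variability in consumption
        intake_mg = 0.0
        for food, amount_g in day["consumption"].items():
            concentration = rng.choice(residues[food])  # variability in residues
            intake_mg += (amount_g / 1000.0) * concentration
        # Assumed unit: micrograms per kg body weight per day.
        exposures.append(1000.0 * intake_mg / day["bw"])
    return exposures


rng = random.Random(2024)
exposures = sorted(simulate_acute_exposures(person_days, residues, 10_000, rng))
p50 = exposures[len(exposures) // 2]          # approximate median
p99 = exposures[int(0.99 * len(exposures))]   # approximate 99th percentile
```

The percentiles `p50` and `p99` are point estimates from a single simulation run. Their uncertainty is what the bootstrap and parametric methods in the following sections aim to quantify.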

Empirical method, resampling¶

The empirical bootstrap is an approach to estimate the accuracy of an outcome. In its simplest, non-parametric form, the bootstrap algorithm resamples a dataset of \(n\) observations to obtain a bootstrap sample, or resampled set, of again \(n\) observations (sampling with replacement, that is: each observation has a probability of \(1/n\) of being selected at any position in the new resampled set). By repeating this process \(B\) times, one obtains \(B\) resampled sets, which may be considered as alternative datasets that might have been obtained when sampling from the population of interest. Any statistic that can be calculated from the original dataset (e.g., the median, the standard deviation, or the 99th percentile) can also be calculated from each of the \(B\) resampled sets. This generates an uncertainty distribution for the statistic under consideration. The uncertainty distribution characterises the uncertainty of the inference due to the sampling uncertainty of the original dataset: it shows which statistics could have been obtained if random sampling from the population had generated a different sample than the one actually observed (Efron (1979) and Efron and Tibshirani (1993)).
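As an illustration of the resampling scheme described above, the following sketch bootstraps the median of a dataset and derives a percentile-based uncertainty interval. The function names, the simulated input data, the number of resamples, and the choice of a 95% interval are assumptions made for the example, not MCRA settings.

```python
import random
import statistics


def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def bootstrap_statistic(data, statistic, n_resamples, rng):
    """Evaluate `statistic` on n_resamples sets drawn with replacement."""
    n = len(data)
    results = []
    for _ in range(n_resamples):
        # Each observation has probability 1/n of filling any position.
        resample = rng.choices(data, k=n)
        results.append(statistic(resample))
    return results


rng = random.Random(12345)
# Hypothetical observations, simulated for illustration only.
data = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

medians = bootstrap_statistic(data, statistics.median, 500, rng)
interval = (percentile(medians, 2.5), percentile(medians, 97.5))
```

Here `medians` plays the role of the uncertainty distribution of the median, and `interval` summarises it. Any other statistic, such as a high percentile, can be passed in place of `statistics.median`.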

Parametric methods¶

Instead of bootstrapping the observed data, inference about parameters can be based on parametric methods. For processing, where factors are specified through a nominal and/or upper value, this is the natural choice. For concentration data, where the lognormal model is used to represent less conservative scenarios (EFSA (2012)), the parametric bootstrap may be an alternative, especially when data are limited and the empirical bootstrap fails.

According to Cochran’s theorem, the sample variance \(\hat{\sigma}_{y}^2\) follows a scaled chi-square distribution. In the parametric bootstrap for the lognormal distribution, the sample variance \(\hat{\sigma}_{y}^2\) is replaced by \(\hat{\sigma}_{y}^{*2}\), a random draw from a scaled chi-square distribution with \(n_{1}-1\) degrees of freedom; the sample mean \(\hat{\mu}_{y}\) is replaced by \(\hat{\mu}_{y}^{*}\), a random draw from a normal distribution with mean \(\hat{\mu}_{y}\) and variance \(\hat{\sigma}_{y}^{*2} / n_{1}\). This gives a new set of parameters \(\hat{\mu}_{y}^{*}\) and \(\hat{\sigma}_{y}^{*2}\). This is repeated \(B\) times.
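A minimal sketch of this parametric bootstrap is given below. It assumes that \(n_{1}\) is the number of positive concentrations, that \(\hat{\mu}_{y}\) and \(\hat{\sigma}_{y}^2\) are the mean and sample variance of the log-transformed values, and that the variance draw is scaled as \(\hat{\sigma}_{y}^{*2} = \hat{\sigma}_{y}^2 \, \chi^2_{n_{1}-1} / (n_{1}-1)\). The scaling actually used in MCRA is not stated in the text, so this form, the function name, and the simulated data are assumptions of the example.

```python
import math
import random
import statistics


def parametric_bootstrap_lognormal(concentrations, n_resamples, rng):
    """Return n_resamples pairs (mu_star, var_star) on the log scale."""
    logs = [math.log(c) for c in concentrations]
    n1 = len(logs)
    mu_hat = statistics.mean(logs)
    var_hat = statistics.variance(logs)  # sample variance, divisor n1 - 1
    draws = []
    for _ in range(n_resamples):
        # A chi-square variate with n1 - 1 degrees of freedom is a gamma
        # variate with shape (n1 - 1) / 2 and scale 2.
        chi2 = rng.gammavariate((n1 - 1) / 2.0, 2.0)
        var_star = var_hat * chi2 / (n1 - 1)  # assumed scaling
        # The mean is drawn using the newly drawn variance.
        mu_star = rng.gauss(mu_hat, math.sqrt(var_star / n1))
        draws.append((mu_star, var_star))
    return draws


rng = random.Random(42)
# Hypothetical positive concentrations, simulated for illustration only.
positives = [rng.lognormvariate(-2.0, 1.0) for _ in range(50)]
draws = parametric_bootstrap_lognormal(positives, 1000, rng)
```

Each pair in `draws` defines one alternative lognormal distribution from which concentrations could be sampled in an uncertainty cycle.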

For the truncated lognormal and censored lognormal, large-sample maximum likelihood theory is used to derive new parameters \(\hat{\mu}_{y}^{*}\) and \(\hat{\sigma}_{y}^{*2}\). This is repeated \(B\) times.

The binomial fraction of non-detects for the mixture lognormal and mixture truncated distribution is sampled using the beta distribution with uniform priors \(a = b = 1\) (with the beta distribution as the empirical Bayes estimator for the binomial distribution). This is repeated \(B\) times.
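The sampling of the non-detect fraction can be sketched as below. The example assumes standard conjugate updating: with a beta prior with parameters \(a\) and \(b\), and \(n_{0}\) non-detects among \(n\) samples, the fraction is drawn from a beta distribution with parameters \(a + n_{0}\) and \(b + n - n_{0}\). Whether MCRA parameterises the draw in exactly this way is not stated in the text. The function name and the counts are hypothetical.

```python
import random


def sample_nondetect_fractions(n_nondetects, n_total, n_resamples, rng, a=1.0, b=1.0):
    """Draw n_resamples non-detect fractions from a beta distribution.

    a = b = 1 corresponds to the uniform prior mentioned in the text.
    """
    n_positives = n_total - n_nondetects
    return [
        rng.betavariate(a + n_nondetects, b + n_positives)
        for _ in range(n_resamples)
    ]


rng = random.Random(7)
# Hypothetical counts: 30 non-detects among 100 samples.
fractions = sample_nondetect_fractions(30, 100, 1000, rng)
```

Each value in `fractions` could serve as the non-detect fraction of the mixture distribution in one uncertainty cycle.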

Uncertainty due to missing data¶

In some cases, data are available only for specific (primary) entities and are missing for others. For example, points of departure (such as NOAELs or BMDs) may be available for only some of the substances of interest.

Uncertainty due to modelling approach¶

Model uncertainty, or uncertainty of model outcomes, arises from applying different modelling approaches or alternative model assumptions.

Note

TODO