Chronic exposure assessment, daily consumed foods

Model based usual intake

Foods are consumed on a daily basis.

For individual \(i\) on day \(j\), let \(Y_{ij}\) denote the 24-hour recall of a food \((i=1…n; j=1…n_{i})\). In most cases the within-individual random variation depends on the individual mean and has a skewed distribution. It is therefore customary to define a one-way random effects model for \(Y_{ij}\) on some transformed scale

\[Y_{ij}^*=g(Y_{ij} )= \mu_{i}+b_{i}+w_{ij}\]

with \(b_{i} \sim N(0,\sigma_{b}^2 )\) and \(w_{ij} \sim N(0, \sigma_{w}^2 )\) mutually independent.

Note that \(b_{i}\) represents variation between individuals and \(w_{ij}\) represents variation within individuals between days.

The mean \(\mu_{i}\) may depend on a set of covariates \(Z_{i}=(Z_{i1},…,Z_{ip} )\):

\[\mu_{i} = \beta_{0} + \beta_{1}^t Z_{i}\]

where \(\beta_{0}\) and \(\beta_{1}\) are regression coefficients.
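The model above can be made concrete with a small simulation sketch on the transformed scale; all parameter values below (intercept, covariate effect, variance components) are illustrative assumptions, not values from the source.

```python
import random

# Minimal simulation sketch of the one-way random effects model on the
# transformed scale: Y*_ij = mu_i + b_i + w_ij, with mu_i = beta0 + beta1*z_i.
# All parameter values are illustrative assumptions.
random.seed(1)

beta0 = 2.0      # intercept (assumed)
beta1 = 0.3      # coefficient for a single covariate z_i (assumed)
sigma_b = 0.5    # between-individual standard deviation (assumed)
sigma_w = 0.8    # within-individual (between-day) standard deviation (assumed)

def simulate_recalls(z_i, n_days):
    """Simulate transformed 24-hour recalls Y*_ij for one individual."""
    mu_i = beta0 + beta1 * z_i          # mean depends on the covariate
    b_i = random.gauss(0.0, sigma_b)    # individual random effect
    return [mu_i + b_i + random.gauss(0.0, sigma_w) for _ in range(n_days)]

recalls = simulate_recalls(z_i=1.0, n_days=2)  # two recall days per individual is typical
```

In a real application the variance components would of course be estimated from the recall data rather than assumed.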

The usual intake \(T_{i}\) for an individual \(i\) is defined as the mean consumption over a large number of days. This assumes that the untransformed intakes \(Y_{ij}\), rather than the transformed intakes \(Y_{ij}^*\), are unbiased for the true usual intake. In mathematical terms \(T_{i}\) is the expectation of the intake for this individual, where the expectation is taken over the random day effect:

\[T_{i}=E_{w} [g^{-1} (\mu_{i}+b_{i}+w_{ij} ) | b_{i} ] = F(b_{i} )\]

Model based usual intake on the transformed scale

For the model based usual intake first note that the conditional distribution

\[(\mu_{i} + b_{i} + w_{ij} | b_{i} ) \sim N (\mu_{i} + b_{i} , \sigma_{w}^2 )\]

It follows that the usual intake \(T_{i}\) is given by

\[T_{i}= E_{w} [g^{-1} (\mu_{i}+b_{i}+w_{ij} ) | b_{i} ]= \int\limits_{-\infty}^\infty g^{-1} (\mu_{i}+b_{i}+w ) \frac{1} {\sqrt{2\pi\sigma_{w}^2}} \exp\left(-\frac{w^2}{2\sigma_{w}^2 }\right) \mathrm{d}w\]
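This integral can be evaluated numerically for any back-transform \(g^{-1}\). A sketch using Gauss-Hermite quadrature (parameter values are illustrative assumptions), checked against the closed form that is available when \(g\) is the logarithm:

```python
import math
import numpy as np

# Sketch: evaluate the usual-intake integral E_w[g^{-1}(mu_i + b_i + w)]
# numerically for a given back-transform g_inv. Parameter values are
# illustrative assumptions.
mu_i, b_i = 2.0, 0.3   # fixed individual level on the transformed scale
sigma_w = 0.8          # within-individual standard deviation

def usual_intake(g_inv, n_points=40):
    """Approximate E_w[g_inv(mu_i + b_i + w)] with Gauss-Hermite quadrature.

    The substitution w = sqrt(2)*sigma_w*x turns the normal expectation
    into sum_j w_j * g_inv(mu_i + b_i + sqrt(2)*sigma_w*x_j) / sqrt(pi).
    """
    x, w = np.polynomial.hermite.hermgauss(n_points)
    vals = g_inv(mu_i + b_i + math.sqrt(2.0) * sigma_w * x)
    return float(np.sum(w * vals) / math.sqrt(math.pi))

# For g = log the integral has the closed form exp(mu_i + b_i + sigma_w^2/2):
t_num = usual_intake(np.exp)
t_exact = math.exp(mu_i + b_i + sigma_w**2 / 2.0)
```

Gauss-Hermite quadrature is essentially exact here because the integrand is smooth; 40 nodes is more than sufficient.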

Model based using a logarithmic transformation

For the logarithmic transform the usual intake \(T_{i}\) can be written in closed form using the formula for the mean of the lognormal distribution:

\[T_{i}=\exp(\mu_{i}+b_{i} + \sigma_{w}^2 / 2)\]

In this case \(T_{i}\) follows a log-normal distribution with mean \(\mu_{i} + \sigma_{w}^2 / 2\) and variance \(\sigma_{b}^2\) on the logarithmic scale. This fully specifies the usual intake distribution; for example, the mean and variance of the usual intake are given by

\[\mu_{iT} = E [ T_{i} ] = \exp(\mu_{i} + \sigma_{w}^2 / 2 + \sigma_{b}^2 / 2)\]
\[\sigma_{iT}^2 = Var [ T_{i} ]= [\exp (\sigma_{b}^2 )-1] \exp(2\mu_{i}+\sigma_{w}^2+\sigma_{b}^2 )\]
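These closed forms can be verified against a direct simulation of \(T_{i}=\exp(\mu_{i}+b_{i}+\sigma_{w}^2/2)\); the parameter values below are illustrative assumptions.

```python
import math
import random

# Sketch: check the closed-form mean and variance of the usual intake for
# the log transform against simulated random effects b_i. Parameter values
# are illustrative assumptions.
random.seed(2)
mu_i, sigma_b, sigma_w = 2.0, 0.5, 0.8

# Closed forms from the lognormal distribution:
mean_T = math.exp(mu_i + sigma_w**2 / 2.0 + sigma_b**2 / 2.0)
var_T = (math.exp(sigma_b**2) - 1.0) * math.exp(2.0 * mu_i + sigma_w**2 + sigma_b**2)

# Monte Carlo: T_i = exp(mu_i + b_i + sigma_w^2/2) over many individuals.
draws = [math.exp(mu_i + random.gauss(0.0, sigma_b) + sigma_w**2 / 2.0)
         for _ in range(200_000)]
sim_mean = sum(draws) / len(draws)
sim_var = sum((t - sim_mean) ** 2 for t in draws) / len(draws)
```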

Model based using a power transformation

For the power transformation the integral can be approximated by means of \(N\)-point Gauss-Hermite integration. This results in the following approximation of the usual intake

\[T_{i} \approx \frac{1}{\sqrt{\pi}} \sum_{j=1}^{N} w_{j} (\mu_{i}+b_{i}+\sqrt{2} \sigma_{w} x_{j} )^p\]

with \(p\) the inverse of the power used in the transformation. A similar approximation can be used for the Box-Cox transformation. Gauss-Hermite integration has one small complication: the summation term \((\mu_{i}+b_{i}+\sqrt{2} \sigma_{w} x_{j} )^p\) cannot be calculated when the factor between round brackets is negative and the power \(p\) is not an integer. This can happen when \((\mu_{i}+b_{i})\) is small relative to the between-day standard deviation \(\sigma_{w}\). In that case the corresponding term is set to zero. This is not a flaw in the numerical method but in the statistical model, since the model allows negative intakes on the transformed scale, which cannot be transformed back to the natural scale. The mean and variance of \(T_{i}\) can again be approximated using Gauss-Hermite integration:

\[\mu_{iT}= E[T_{i} ] = \frac{1}{\sqrt{\pi}} \sum_{k=1}^{N} w_{k} \frac{1}{\sqrt{\pi}} \sum_{j=1}^{N} w_{j} (\mu_{i}+\sqrt{2} \sigma_{w} x_{j} + \sqrt{2} \sigma_{b} x_{k})^p\]
\[\sigma_{iT}^2= Var[T_{i} ] = \frac{1}{\sqrt{\pi}} \sum_{k=1}^{N} w_{k} \left [ \frac{1}{\sqrt{\pi}} \sum_{j=1}^{N} w_{j} (\mu_{i}+\sqrt{2} \sigma_{w} x_{j} + \sqrt{2} \sigma_{b} x_{k})^p \right ]^2 - \mu_{iT}^2\]
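A sketch of these Gauss-Hermite approximations, with negative bases set to zero as described; the parameter values and the exponent \(p\) are illustrative assumptions.

```python
import math
import numpy as np

# Sketch: Gauss-Hermite approximation of the usual intake T_i = F(b_i) and
# of its mean and variance for a power transform with back-transform
# y -> y^p. Parameter values are illustrative assumptions.
mu_i = 1.5
sigma_b, sigma_w, p = 0.4, 0.6, 2.5
x, w = np.polynomial.hermite.hermgauss(40)   # nodes x_j and weights w_j
SQRT2, SQRTPI = math.sqrt(2.0), math.sqrt(math.pi)

def usual_intake_power(b):
    """Inner Gauss-Hermite sum over the day effect w_ij, given b_i = b."""
    base = mu_i + b + SQRT2 * sigma_w * x
    # max(base, 0)**p sets terms with a negative base to zero, as in the text
    return float(np.sum(w * np.maximum(base, 0.0) ** p) / SQRTPI)

# Outer Gauss-Hermite sum over b_i gives the mean and variance of T_i.
t_at_nodes = np.array([usual_intake_power(SQRT2 * sigma_b * xk) for xk in x])
mu_T = float(np.sum(w * t_at_nodes) / SQRTPI)
var_T = float(np.sum(w * t_at_nodes**2) / SQRTPI - mu_T**2)
```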

An alternative method for obtaining model based usual intakes for the power transformation employs a Taylor series expansion for the power, see e.g. Kipnis et al. (2009) [Kipnis et al., 2009]. This is however less accurate than Gauss-Hermite integration. For the power transformation simulation is required to derive the usual intake distribution: simulate a random effect \(b_{i}\) for many individuals and then approximate \(T_{i}\) for these individuals. The \(T_{i}\) values then form a sample from the usual intake distribution.
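The simulation step just described can be sketched as follows; the parameter values are illustrative assumptions.

```python
import math
import random
import numpy as np

# Sketch: sample from the usual-intake distribution for a power transform by
# simulating random effects b_i and evaluating T_i = F(b_i) via Gauss-Hermite
# integration. Parameter values are illustrative assumptions.
random.seed(3)
mu_i, sigma_b, sigma_w, p = 1.5, 0.4, 0.6, 2.5
x, w = np.polynomial.hermite.hermgauss(40)

def t_of_b(b_i):
    base = mu_i + b_i + math.sqrt(2.0) * sigma_w * x
    return float(np.sum(w * np.maximum(base, 0.0) ** p) / math.sqrt(math.pi))

# Each simulated individual contributes one draw from the usual-intake distribution.
usual_intakes = [t_of_b(random.gauss(0.0, sigma_b)) for _ in range(5_000)]
```

Percentiles of the usual-intake distribution can then be read off directly from the sorted sample.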

Model assisted usual intake on the transformed scale

The model assisted approach employs a prediction for the usual intakes of every individual in the study. This requires a prediction of the individual random effect \(b_{i}\) for every individual.

In the one-way random effects model the Best Linear Unbiased Prediction for \((\mu_{i}+b_{i} )\) is given by

\[\mathit{BLUP}_{i} = \mu_{i} + (\bar{Y}_{i}^* - \mu_{i} ) \left( \frac{\sigma_{b}^2}{\sigma_{b}^2 + \sigma_{w}^2 / n_{i}} \right )\]

in which \(\bar{Y}_{i}^*\) is the mean of the transformed intakes for individual \(i\). BLUPs have optimal properties for some purposes, but not for the purpose of representing the variation \(\sigma_{b}^2\) between individuals. This can be seen by noting that

\[\mathit{Var}(\bar{Y}_{i}^*) = \sigma_{b}^2 + \sigma_{w}^2 / n_{i}\]

and thus

\[\mathit{Var}(\mathit{BLUP}_{i}) = \left( \frac{\sigma_{b}^4}{\sigma_{b}^2 + \sigma_{w}^2 / n_{i}} \right )\]

which is smaller than the between individual variance \(\sigma_{b}^2\). As an alternative a modified BLUP can be defined by means of

\[\mathit{modifiedBLUP}_{i} = \mu_{i} + (\bar{Y}_{i}^* - \mu_{i}) \sqrt{\left( \frac{\sigma_{b}^2}{\sigma_{b}^2 + \sigma_{w}^2 / n_{i}} \right ) }\]

which has the correct variance \(\sigma_{b}^2\) and also the correct mean \(\mu_{i}\). However, these optimal properties are lost when modified BLUPs are directly back-transformed to the original scale.
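The shrinkage of ordinary BLUPs, and its repair by the modified BLUP, can be checked empirically; the parameter values are illustrative assumptions.

```python
import math
import random

# Sketch: empirical check that BLUPs shrink the between-individual variance
# below sigma_b^2, while modified BLUPs preserve it. Parameter values are
# illustrative assumptions.
random.seed(4)
mu_i, sigma_b, sigma_w, n_days = 2.0, 0.5, 0.8, 2
shrink = sigma_b**2 / (sigma_b**2 + sigma_w**2 / n_days)

blups, mod_blups = [], []
for _ in range(100_000):
    b = random.gauss(0.0, sigma_b)
    # Individual mean of n_days transformed recalls: Var = sigma_w^2 / n_days
    ybar = mu_i + b + random.gauss(0.0, sigma_w / math.sqrt(n_days))
    blups.append(mu_i + (ybar - mu_i) * shrink)
    mod_blups.append(mu_i + (ybar - mu_i) * math.sqrt(shrink))

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```

The sample variance of the BLUPs approaches \(\sigma_{b}^4/(\sigma_{b}^2+\sigma_{w}^2/n_{i})\), while that of the modified BLUPs approaches \(\sigma_{b}^2\).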

Model assisted using a logarithmic transformation

For the logarithmic transformation the usual intake \(T_{i}\) follows a log-normal distribution with mean \(\mu_{i}+\sigma_{w}^2/2\) and variance \(\sigma_{b}^2\) on the logarithmic scale. If we can construct a BLUP-like stochastic variable with the same mean and variance, then this variable will be an unbiased predictor with the correct variance. It is easy to see that the following variable has the same distribution as \(\log T_{i}\):

\[\mathit{modelassistedBLUP}_{i} = \mu_{i} + \frac{\sigma_{w}^2}{2} + (\bar{Y}_{i}^* - \mu_{i}) \sqrt{\left( \frac{\sigma_{b}^2}{\sigma_{b}^2 + \sigma_{w}^2 / n_{i}} \right ) }\]

So the model assisted individual intake \(\exp(\mathit{modelassistedBLUP}_{i})\) has the same distribution as the usual intake and is thus the best predictor for the usual intake.
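This property can be checked by simulation: the back-transformed model assisted BLUPs should reproduce the mean of the usual-intake distribution. The parameter values are illustrative assumptions.

```python
import math
import random

# Sketch: for the log transform, check that exp(model-assisted BLUP)
# reproduces the mean of the usual-intake distribution. Parameter values
# are illustrative assumptions.
random.seed(5)
mu_i, sigma_b, sigma_w, n_days = 2.0, 0.5, 0.8, 2
shrink = sigma_b**2 / (sigma_b**2 + sigma_w**2 / n_days)

preds = []
for _ in range(200_000):
    b = random.gauss(0.0, sigma_b)
    ybar = mu_i + b + random.gauss(0.0, sigma_w / math.sqrt(n_days))
    ma_blup = mu_i + sigma_w**2 / 2.0 + (ybar - mu_i) * math.sqrt(shrink)
    preds.append(math.exp(ma_blup))

# Closed-form mean of the usual intake for the log transform:
mean_T = math.exp(mu_i + sigma_w**2 / 2.0 + sigma_b**2 / 2.0)
sim_mean = sum(preds) / len(preds)
```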

Kipnis et al. (2009) [Kipnis et al., 2009] employ the conditional distribution of \(b_{i}\) given the observations \(Y_{i1}, \cdots , Y_{in_{i}}\) to obtain a prediction. First note that

\[(b_{i} | Y_{i1}, \cdots , Y_{in_{i}}) = (b_{i} | Y_{i1}^*, \cdots , Y_{in_{i}}^*) = (b_{i} | \bar{Y}_{i}^*)\]

Since all distributions in the one-way random effects model are normal it follows that:

\[(b_{i},\bar{Y}_{i}^*) \sim \mathit{BivariateNormal}(0, \mu_{i}, \sigma_{b}^2, \sigma_{b}^2 + \sigma_{w}^2 / n_{i}, \sigma_{b}^2)\]

where the last parameter represents the covariance between \(b_{i}\) and \(\bar{Y}_{i}^*\). It follows that the conditional distribution

\[(b_{i}|\bar{Y}_{i}^*) \sim N(\mu_{c}, \sigma_{c}^2)\]

with

\[\mu_{c} = \frac{\sigma_{b}^2}{\sigma_{b}^2 +\sigma_{w}^2 / n_{i}} (\bar{Y}_{i}^* - \mu_{i})\]

and

\[\sigma_{c}^2 = \frac{\sigma_{b}^2 \sigma_{w}^2 / n_{i}}{\sigma_{b}^2 + \sigma_{w}^2 / n_{i}}\]

A prediction for the usual intake \(T_{i}=F(b_{i} )\) is then obtained by the expectation

\[E[F(b_{i}) | \bar{Y}_{i}^*] = \int F(b) \phi(b; \mu_{c}, \sigma_{c}^2 )\mathrm{d}b\]

For the logarithmic transform \(F(b_{i}) =\exp(\mu_{i} + b_{i} + \sigma_{w}^2 / 2)\) and the expectation reduces to

\[E[F(b_{i}) | \bar{Y}_{i}^*] = \exp(\mu_{i} + \mu_{c} + \sigma_{c}^2 / 2 + \sigma_{w}^2 / 2)\]

which is a function of \(\bar{Y}_{i}^*\) through \(\mu_{c}\). To obtain the mean and variance of the prediction note that

\[\mu_{i} + \mu_{c} + \sigma_{c}^2 / 2 + \sigma_{w}^2 / 2 \sim N \left(\mu_{i} + \frac{\sigma_{b}^2 \sigma_{w}^2 / n_{i}}{2(\sigma_{b}^2 + \sigma_{w}^2 / n_{i})} + \frac{\sigma_{w}^2}{2}, \frac{\sigma_{b}^4}{\sigma_{b}^2 +\sigma_{w}^2 / n_{i}} \right)\]

It follows that the expectation of the prediction equals

\[E[E[F(b_{i}) | \bar{Y}_{i}^*]] = \exp \left( \mu_{i} + \frac{\sigma_{b}^2 \sigma_{w}^2 / n_{i}}{2(\sigma_{b}^2 + \sigma_{w}^2 / n_{i})} + \frac{\sigma_{w}^2}{2} + \frac{\sigma_{b}^4}{2(\sigma_{b}^2 +\sigma_{w}^2 / n_{i})} \right)\]
\[= \exp \left( \mu_{i} + \frac{\sigma_{b}^2}{2} + \frac{\sigma_{w}^2}{2} \right)\]

which equals the mean of the usual intake. However the variance of the prediction equals

\[\mathit{Var}[E[F(b_{i}) | \bar{Y}_{i}^* ]] = \left[ \exp \left( \frac{\sigma_{b}^4}{(\sigma_{b}^2 +\sigma_{w}^2 / n_{i})} \right) -1 \right] \exp( 2\mu_{i} + \sigma_{b}^2 + \sigma_{w}^2)\]

which is less than the variance of the usual intake. The approach of Kipnis et al. (2009) [Kipnis et al., 2009] will therefore result in too much shrinkage of the model assisted usual intakes.
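The shrinkage is easy to quantify from the two closed-form variances; the parameter values are illustrative assumptions.

```python
import math

# Sketch: compare the variance of the conditional-expectation prediction
# with the variance of the usual intake for the log transform, using the
# closed forms derived above. Parameter values are illustrative assumptions.
mu_i, sigma_b, sigma_w, n_days = 2.0, 0.5, 0.8, 2
denom = sigma_b**2 + sigma_w**2 / n_days

# Variance of the usual intake T_i:
var_usual = (math.exp(sigma_b**2) - 1.0) * math.exp(2.0 * mu_i + sigma_w**2 + sigma_b**2)
# Variance of the prediction E[F(b_i) | Ybar*_i]:
var_pred = (math.exp(sigma_b**4 / denom) - 1.0) * math.exp(2.0 * mu_i + sigma_b**2 + sigma_w**2)
```

Since \(\sigma_{b}^4/(\sigma_{b}^2+\sigma_{w}^2/n_{i}) < \sigma_{b}^2\), the prediction variance is always strictly smaller than the usual-intake variance for \(\sigma_{w}^2 > 0\).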

Model assisted using a power transformation

For the power transformation a model assisted BLUP with optimal properties, as derived above, cannot be constructed. The approach of Kipnis et al. (2009) [Kipnis et al., 2009] can however be used to obtain a prediction in the following way. First approximate \(T_{i}=F(b_{i} )\) by Gauss-Hermite integration:

\[F(b_{i}) = T_{i} \approx \frac{1}{\sqrt{\pi}} \sum_{j=1}^{N} w_{j} (\mu_{i} + b_{i} +\sqrt{2} \sigma_{w} x_{j})^p\]

Secondly, use Gauss-Hermite integration again to approximate the expectation over the conditional distribution, giving the prediction \(P_{i}\):

\[P_{i} = E[F(b_{i}) | \bar{Y}_{i}^*] = \int F(b) \phi (b;\mu_{c}, \sigma_{c}^2) \mathrm{d}b \approx \frac{1}{\pi} \sum_{k=1}^{N} w_{k} \sum_{j=1}^{N} w_{j} (\mu_{i} + \mu_{c} + \sqrt{2} \sigma_{w} x_{j} + \sqrt{2} \sigma_{c}x_{k})^p\]

which is a function of \(\bar{Y}_{i}^*\) through \(\mu_{c}\). The predictions \(P_{i}\) obtained in this way are likely to have a variance that is too small. If we knew the mean \(\mu_{iP}\) and variance \(\sigma_{iP}^2\) of the predictions, the predictions could be linearly rescaled to have the correct mean \(\mu_{iT}\) and variance \(\sigma_{iT}^2\). The mean and variance of the prediction can be calculated using Gauss-Hermite integration.

\[\mu_{iP} = \frac{1}{\sqrt{\pi}} \sum_{l=1}^{N} w_{l} \frac{1}{\pi} \sum_{k=1}^{N} w_{k} \sum_{j=1}^{N} w_{j} \left(\mu_{i} + \sqrt{2} \frac{\sigma_{b} ^ 2}{\sqrt{\sigma_{b} ^ 2 + \sigma_{w} ^ 2 / n_{i}}} x_{l}+ \sqrt{2} \sigma_{w} x _{j} + \sqrt{2} \sigma _{c} x _{k}\right)^p\]
\[\sigma_{iP} ^2 = \frac{1}{\sqrt{\pi}} \sum_{l=1}^{N} w_{l} \left[ \frac{1}{\pi} \sum_{k=1}^{N} w_{k} \sum_{j=1}^{N} w_{j} \left(\mu_{i} + \sqrt{2} \frac{\sigma_{b} ^ 2}{\sqrt{\sigma_{b} ^ 2 + \sigma_{w} ^ 2 / n_{i}}} x_{l}+ \sqrt{2} \sigma_{w} x _{j} + \sqrt{2} \sigma _{c} x _{k}\right)^p \right] ^2 - \mu_{iP}^2\]

Here the factor \(\sigma_{b}^2/\sqrt{\sigma_{b}^2+\sigma_{w}^2/n_{i}}\) is the standard deviation of \(\mu_{c}\) over individuals, since \(\mathit{Var}(\mu_{c})=\sigma_{b}^4/(\sigma_{b}^2+\sigma_{w}^2/n_{i})\).

The proposed prediction then equals

\[P_{i}^* = \mu_{iT} + \frac{\sigma_{iT}}{\sigma_{iP}} (P_{i} - \mu_{iP})\]
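The final rescaling step is a simple linear transformation; the helper function below is an illustrative sketch, with \(\mu_{iT}\), \(\sigma_{iT}\), \(\mu_{iP}\) and \(\sigma_{iP}\) assumed already computed as above.

```python
# Sketch: linear rescaling of a prediction P_i to the target mean mu_iT and
# standard deviation sigma_iT of the usual intake. The function name and
# arguments are illustrative; the inputs are assumed computed as above.
def rescale(p_i, mu_t, sigma_t, mu_p, sigma_p):
    """Return P*_i = mu_t + (sigma_t / sigma_p) * (p_i - mu_p)."""
    return mu_t + (sigma_t / sigma_p) * (p_i - mu_p)
```

A prediction equal to the prediction mean maps to the target mean, and deviations are stretched by the ratio of standard deviations.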