By Anonymous - August 2, 2017

Regression Calibration

A discussion of regression calibration in the context of AHS-2 data is available at:

Fraser, Gary E; Stram, Daniel O
Regression calibration when foods (measured with error) are the variables of interest: markedly non-Gaussian data with many zeroes.
American Journal of Epidemiology. 2012; 175:325-31
DOI: 10.1093/aje/kwr316

Abstract

Regression calibration has been described as a means of correcting effects of measurement error for normally distributed dietary variables. When foods are the items of interest, true distributions of intake are often positively skewed, may contain many zeroes, and are usually not described by well-known statistical distributions. The authors considered the validity of regression calibration assumptions where data are non-Gaussian. Such data (including many zeroes) were simulated, and use of the regression calibration algorithm was evaluated. An example used data from Adventist Health Study 2 (2002–2008). In this special situation, a linear calibration model does (as usual) at least approximately correct the parameter that captures the exposure-disease association in the “disease” model. Poor fit in the calibration model does not produce biased calibrated estimates when the “disease” model is linear, and it produces little bias in a nonlinear “disease” model if the model is approximately linear. Poor fit will adversely affect statistical power, but more complex linear calibration models can help here. The authors conclude that non-Gaussian data with many zeroes do not invalidate regression calibration. Irrespective of fit, linear regression calibration in this situation at least approximately corrects bias. More complex linear calibration equations that improve fit may increase power over that of uncalibrated regressions.
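
The idea in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' simulation and not AHS-2 data: the 40% share of zeroes, the exponential intake distribution, the error standard deviations, and the function names `ols` and `simulate` are all invented for illustration. Adding normal error can also make simulated questionnaire values negative, which real food data would not be.

```python
import random
from statistics import fmean

def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(n=20000, n_calib=2000, beta=0.5, seed=1):
    """Return (naive slope, calibrated slope) for one simulated cohort."""
    rng = random.Random(seed)
    # True intake: 40% exact zeroes, the rest positively skewed (assumed shape).
    x = [0.0 if rng.random() < 0.4 else rng.expovariate(1.0) for _ in range(n)]
    # Questionnaire value: true intake plus independent random error.
    q = [xi + rng.gauss(0.0, 1.0) for xi in x]
    # Linear "disease" model driven by the true intake.
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Calibration sub-study: an unbiased reference measure on the first
    # n_calib subjects, with error independent of the questionnaire error.
    ref = [xi + rng.gauss(0.0, 0.5) for xi in x[:n_calib]]

    # Uncalibrated analysis: regress the outcome on the questionnaire value.
    _, naive = ols(q, y)
    # Linear calibration model: regress the reference measure on the questionnaire.
    c0, c1 = ols(q[:n_calib], ref)
    # Replace each questionnaire value by its calibrated prediction.
    x_cal = [c0 + c1 * qi for qi in q]
    _, calibrated = ols(x_cal, y)
    return naive, calibrated

naive_slope, calibrated_slope = simulate()
print(f"naive: {naive_slope:.3f}  calibrated: {calibrated_slope:.3f}  true: 0.5")
```

Under these assumptions the naive slope is attenuated toward zero, while the calibrated slope should land near the true value even though the calibration line fits the zero-heavy data poorly.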

Bootstrap and BCa Confidence Intervals

The regression calibration approach described above relies on bootstrapping to better model the distribution of the regression coefficients and thus produce better confidence intervals. The procedure used for the calculation of the confidence intervals is the BCa method described in the following paper:

DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical science, 189-212.

BCa Confidence Intervals

The BCa confidence interval is $\left[\hat{\theta}^{*(\alpha_1/2)}, \hat{\theta}^{*(\alpha_2/2)}\right]$, where $\hat{}$ indicates an estimate and $*$ indicates a bootstrap quantity. $\theta$ is the parameter whose CI is being estimated, in our case usually a $\beta$. The lower bound is the value of $\theta$ below which a proportion $\alpha_1/2$ of the bootstrap estimates fall, and the upper bound is the value below which a proportion $\alpha_2/2$ fall (so a proportion $1-\alpha_2/2$ lie above it). Both bounds are read off the bootstrap distribution of $\hat{\theta}^{*}$.

The BCa values of $\alpha_1/2$ and $\alpha_2/2$ are given by
$$ \alpha_1/2 = \mathrm{pnorm}\left(Z_0 + \frac{Z_0 + Z^{\alpha/2}}{1-a(Z_0 + Z^{\alpha/2})}\right) \tag{1} $$
$$ \alpha_2/2 = \mathrm{pnorm}\left(Z_0 + \frac{Z_0 + Z^{1-\alpha/2}}{1-a(Z_0 + Z^{1-\alpha/2})}\right) \tag{2} $$

where pnorm indicates (in R) the distribution function of a standard normal variable $Z$, i.e. $\mathrm{pr}(Z \leq Z^\prime)$, where $Z^\prime$ is the quantity in parentheses that $\mathrm{pnorm}$ operates on. Here $Z_0$ is the bias-correction constant, $a$ is the acceleration constant, and $Z^{\alpha/2}$ is the $\alpha/2$ quantile of the standard normal distribution, as in DiCiccio and Efron (1996). So $\alpha_1/2$ is the probability of being $\leq Z^{\alpha_1/2}$, and the right side of the equation is $\mathrm{pr}\left(Z \leq Z^\prime\right)$. Hence,
$$Z^{\alpha_1/2} = Z^\prime = Z_0 + \frac{Z_0 + Z^{\alpha/2}}{1-a(Z_0 + Z^{\alpha/2})}$$
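
The formulas above can be sketched in code. The text does not say how $Z_0$ and $a$ are estimated, so this sketch assumes the usual choices from the bootstrap literature: $Z_0$ from the share of bootstrap estimates below $\hat{\theta}$, and $a$ from jackknife (leave-one-out) estimates. The function names, the simple nearest-order-statistic percentile rule, and the sample data are illustrative assumptions, not the procedure actually used for the AHS-2 analyses. `NormalDist().cdf` plays the role of R's `pnorm`, and `inv_cdf` the role of `qnorm`.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # cdf ~ R's pnorm, inv_cdf ~ R's qnorm

def bca_levels(z0, a, alpha=0.05):
    """Return (alpha_1/2, alpha_2/2), the adjusted percentile levels."""
    def adjust(z):
        return STD_NORMAL.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    z_lo = STD_NORMAL.inv_cdf(alpha / 2)
    z_hi = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return adjust(z_lo), adjust(z_hi)

def bias_correction(theta_hat, boot):
    """Z_0: normal quantile of the share of bootstrap estimates below theta_hat."""
    share = sum(b < theta_hat for b in boot) / len(boot)
    return STD_NORMAL.inv_cdf(share)  # raises StatisticsError if share is 0 or 1

def acceleration(jack):
    """a: skewness-type constant from leave-one-out (jackknife) estimates."""
    m = fmean(jack)
    num = sum((m - j) ** 3 for j in jack)
    den = 6 * sum((m - j) ** 2 for j in jack) ** 1.5
    return num / den

def percentile(sorted_vals, level):
    """Simple empirical quantile: nearest order statistic, clamped to range."""
    idx = min(len(sorted_vals) - 1, max(0, int(level * len(sorted_vals))))
    return sorted_vals[idx]

def bca_interval(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """BCa interval for stat(data), where data is a list of observations."""
    rng = random.Random(seed)
    theta_hat = stat(data)
    # Bootstrap distribution: recompute the statistic on resamples with replacement.
    boot = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    # Jackknife estimates: recompute the statistic leaving out one observation.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(len(data))]
    z0 = bias_correction(theta_hat, boot)
    a = acceleration(jack)
    lo_level, hi_level = bca_levels(z0, a, alpha)
    return percentile(boot, lo_level), percentile(boot, hi_level)

# Hypothetical skewed sample with several zeroes; the statistic is the mean.
sample = [0.0, 0.0, 0.0, 0.2, 0.3, 0.5, 0.5, 0.8, 1.1, 1.4, 2.0, 2.9, 4.5, 7.3]
print(bca_interval(sample, fmean))
```

When $Z_0 = 0$ and $a = 0$ the adjusted levels reduce to $\alpha/2$ and $1-\alpha/2$, i.e. the ordinary percentile interval.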

The BCa 'p' value

When $\hat{\theta} > 0$:

If we observe that a proportion $\alpha_1/2$ of the bootstrap values satisfy $\hat{\theta}^* < 0$, then we pretend that 0 is the lower bound of a BCa CI and solve for $Z^{\alpha/2}$. This sets the lower bound of the CI at $\theta = 0$ and asks what percentage CI this corresponds to. Remember that this is only one side, so the value of $\alpha/2$ that we estimate needs to be doubled to give the 'p' value.

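So we take equation (1) and solve for $Z^{\alpha/2}$, since the other quantities are known ($Z^{\alpha_1/2}$ is the standard normal quantile of the observed proportion $\alpha_1/2$). The solution is:
$$ Z^{\alpha/2} = \frac{(a\cdot Z_0 -1)(Z^{\alpha_1/2} - Z_0) + Z_0}{a(Z_0 - Z^{\alpha_1/2}) - 1}$$
Then $\alpha/2 = \mathrm{pnorm}\left(Z^{\alpha/2}\right)$ is also found.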
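
As a sketch of the algebra behind this solution, write $w = Z^{\alpha_1/2}$ and $u = Z_0 + Z^{\alpha/2}$ (shorthand introduced here only for brevity). Taking the inverse of pnorm on both sides of the equation for $\alpha_1/2$ gives $w = Z_0 + u/(1 - au)$, so
$$ (w - Z_0)(1 - au) = u \quad\Rightarrow\quad u = \frac{w - Z_0}{1 + a(w - Z_0)} $$
$$ Z^{\alpha/2} = u - Z_0 = \frac{(1 - a\cdot Z_0)(w - Z_0) - Z_0}{1 + a(w - Z_0)} $$
Multiplying the numerator and denominator by $-1$ gives the form shown above.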

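When $\hat{\theta} < 0$:

The calculation is almost identical when $\hat{\theta} < 0$, except that in equation (1) $\alpha/2$ is replaced by $(1-\alpha/2)$, and the observed proportion of bootstrap values below 0 now plays the role of $\alpha_2/2$, so that 0 is the upper bound of the CI. Note that $\alpha_2/2$ does not become $(1-\alpha_2/2)$, as it reflects pnorm, which does not change direction.

Then $$Z^{(1-\alpha/2)} = \frac{(a\cdot Z_0-1)(Z^{\alpha_2/2} - Z_0) + Z_0}{a(Z_0 - Z^{\alpha_2/2})-1}$$
Then $\alpha/2 = 1 - \mathrm{pnorm}\left(Z^{(1-\alpha/2)}\right)$ is found.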
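
Both cases can be put together in a short sketch. It assumes $Z_0$ and $a$ have already been estimated elsewhere and are passed in; the function name `bca_p_value` and the toy bootstrap values are hypothetical. `NormalDist().cdf` stands in for R's `pnorm` and `inv_cdf` for `qnorm`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cdf ~ R's pnorm, inv_cdf ~ R's qnorm

def bca_p_value(theta_hat, boot, z0, a):
    """Two-sided BCa 'p' value for theta = 0, following the steps in the text."""
    # Observed share of bootstrap estimates below zero. It is treated as
    # alpha_1/2 when theta_hat > 0 and as alpha_2/2 when theta_hat < 0.
    share = sum(b < 0 for b in boot) / len(boot)
    w = STD_NORMAL.inv_cdf(share)  # raises StatisticsError if share is 0 or 1
    # Solve the BCa equation for the unadjusted standard normal quantile.
    z = ((a * z0 - 1) * (w - z0) + z0) / (a * (z0 - w) - 1)
    tail = STD_NORMAL.cdf(z)
    # theta_hat > 0: z is Z^(alpha/2), so alpha/2 = pnorm(z).
    # theta_hat < 0: z is Z^(1 - alpha/2), so alpha/2 = 1 - pnorm(z).
    half_alpha = tail if theta_hat > 0 else 1 - tail
    # Only one side was used, so double it.
    return 2 * half_alpha

# Toy bootstrap distribution: 5 of 100 estimates fall below zero.
toy_boot = [-1.0] * 5 + [1.0] * 95
print(bca_p_value(1.0, toy_boot, z0=0.0, a=0.0))   # no correction: roughly 0.1
print(bca_p_value(1.0, toy_boot, z0=0.1, a=0.05))  # with bias and acceleration terms
```

If no bootstrap estimate falls on the far side of zero, the observed share is 0 or 1 and the normal quantile is undefined; in that case all this approach can say is that the 'p' value is smaller than the bootstrap resolution allows.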

Method of Triads

For nutritional variables that have biomarkers available, the method of triads is sometimes used. See:

Publications
Burkholder-Cooley NM, Rajaram SS, Haddad EH, Oda K, Fraser GE, Jaceldo-Siegl K. Validating polyphenol intake estimates from a food-frequency questionnaire by using repeated 24-h dietary recalls and a unique method-of-triads approach with 2 biomarkers. Am J Clin Nutr. 2017 Mar;105(3):685-694. doi: 10.3945/ajcn.116.137174. Epub 2017 Jan 25. PubMed PMID: 28122784; PubMed Central PMCID: PMC5320407.

References
Kaaks RJ. Biochemical markers as additional measurements in studies of the accuracy of dietary questionnaire measurements: conceptual issues. Am J Clin Nutr. 1997 Apr;65(4 Suppl):1232S-1239S. doi: 10.1093/ajcn/65.4.1232S. Review. PubMed PMID: 9094927.
Gormley IC, Bai Y, Brennan L. Combining biomarker and self-reported dietary intake data: A review of the state of the art and an exposition of concepts. Stat Methods Med Res. 2019 Apr 4:962280219837698. doi: 10.1177/0962280219837698. [Epub ahead of print] PubMed PMID: 30943855.