Three commands available since Stata 8 fit generalized linear models when one or more covariates are measured with error.
It is well known that such errors usually lead to attenuation of estimated effects, and the new commands adjust for that attenuation to produce correct standard errors and test statistics. These commands allow adjustments to be made in the generalized linear model framework using the following methods:
The software described here provides the first implementation of regression calibration and of SIMEX in a general-purpose statistical package. Regression calibration was suggested as a general approach by Carroll and Stefanski (1990) and Gleser (1990). SIMEX was proposed by Cook and Stefanski (1995) and further developed by Carroll, Küchenhoff, Lombard, and Stefanski (1996) and Stefanski and Cook (1995).
The software provided is written by R. J. Carroll, J. Hardin, and H. Schmiediche. The work described here was partly funded by the National Institutes of Health, National Center for Research Resources, Grant Number 5R44RR12435-03. The SIMEX method is very computationally intensive, and this new implementation of it is the fastest ever.
The new commands are named qvf, rcal, simex, and simexplot.
To load them, type the following in Stata:
. net from http://www.stata.com/merror
. net install merror
Or you can download merror.zip and, after unzipping the file to somewhere on your hard drive (e.g., C:/data/merror/), type
. net from C:/data/merror/
. net install merror
Once installed, you can type
. whelp qvf
. whelp rcal
. whelp simex
. whelp simexplot
These commands are implemented using Stata’s plug-in features, which allow code written in C to be added to Stata. This means the new commands are fast.
Because the new features are written as binary code, modules for different platforms (e.g., Windows and Unix) cannot be interchanged. Nevertheless, installation is completely automatic. When you type net install merror, Stata will install the appropriate modules for your computer. Just as with ado-files installed over the web, should you wish to uninstall these materials, you can type ado uninstall merror.
The measurement-error analysis software is available for the following platforms:
Windows (XP, 2000, NT, ME, 98)
Mac (PowerPC)
Linux x86 and x86-64
IBM RS/6000 AIX
Digital Unix
HP-UX
Sun Solaris and Sun Solaris 64-bit
If you attempt to install from a computer not on the list above, when you type net install merror, you will get the error "file http://www.stata.com/merror/qvfmex.plugin not found; could not copy http://www.stata.com/merror/qvfmex.plugin".
The generalized linear model framework is a rich collection of models that allows fitting of
and many others. Say you wish to fit such a model and include the variable X:
F(outcome) = b0 + b1*X + b2*Z2 + b3*Z3 + ...
You, however, do not have X. Let's assume that instead you have W, an error-prone version of X. Simply substituting W for X will result in estimates of b1 being biased toward 0 and estimates of b2, b3, ..., also being biased, although the bias may be toward 0 or away from it.
Correctly dealing with measurement error means fitting an equation such as the one above in a way that accounts for the error in W, so that the coefficients are unbiased and the standard errors are correct.
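As a rough illustration of the attenuation described above, the following Python sketch simulates a linear model with a single covariate and compares the slope obtained from the true X with the slope obtained from an error-prone W. It is not part of the merror package; the sample size, coefficient values, error variance, and the helper name ols_slope are all assumptions chosen for illustration. Under classical additive error (W = X + U, with U independent of X), the naive slope is expected to shrink by roughly var(X) / (var(X) + var(U)), which is 0.5 for the values used here.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (single covariate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# All values below are illustrative assumptions, not taken from the software.
rng = random.Random(12345)
n = 20000
b0, b1 = 1.0, 2.0      # true intercept and slope
sigma_u = 1.0          # std. dev. of the measurement error U

x = [rng.gauss(0, 1) for _ in range(n)]            # true covariate X
y = [b0 + b1 * xi + rng.gauss(0, 1) for xi in x]   # outcome
w = [xi + rng.gauss(0, sigma_u) for xi in x]       # error-prone W = X + U

slope_true = ols_slope(x, y)    # uses the unobservable X
slope_naive = ols_slope(w, y)   # substitutes W for X

# Expected shrinkage factor: var(X) / (var(X) + var(U)), with var(X) = 1
reliability = 1.0 / (1.0 + sigma_u ** 2)

print("slope using X:", round(slope_true, 3))
print("slope using W:", round(slope_naive, 3))
print("b1 * reliability:", b1 * reliability)
```

The naive slope lands near b1 times the reliability ratio rather than near b1, which is the bias toward 0 that the measurement-error commands are designed to correct.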
The software provided here can do that when
Using the SIMEX method, for instance, not only can you obtain unbiased estimates and correct standard errors, you can obtain a graph that shows how the amount of measurement error affects the estimated coefficients:
The above graph shows estimated coefficients (b1, b2, b3, b4, b5) for
y_i = b1*x1_i + b2*x2_i + b3*x3_i + b4*x4_i + b5 + u_i
where x3 and x4 are measured with error by w3 and w4. The graph illustrates the extrapolated point estimates for all covariates in the fitted model. With multiple covariates, the naive coefficient estimates may be biased in either direction, as illustrated.
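The simulation-extrapolation idea behind such a graph can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the simex command itself: it uses a linear model with one error-prone covariate, treats the measurement-error standard deviation sigma_u as known, and uses a quadratic extrapolation function. The function names, the grid of lambda values, and the number of replications B are all hypothetical choices. For each lambda, extra noise with variance lambda * sigma_u^2 is added to W, the naive slope is re-estimated and averaged over B replications, and the fitted trend is extrapolated back to lambda = -1, the point corresponding to no measurement error.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (single covariate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def fit_quadratic(xs, ys):
    """Least-squares coefficients [c0, c1, c2] of y = c0 + c1*x + c2*x^2."""
    # Normal equations A c = r, with A[j][k] = sum x^(j+k), r[j] = sum y*x^j
    A = [[sum(x ** (j + k) for x in xs) for k in range(3)] for j in range(3)]
    r = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, 3):
            f = A[row][col] / A[col][col]
            for k in range(col, 3):
                A[row][k] -= f * A[col][k]
            r[row] -= f * r[col]
    # Back substitution
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(A[i][k] * c[k] for k in range(i + 1, 3))
        c[i] = (r[i] - tail) / A[i][i]
    return c


def simex_slope(w, y, sigma_u, lambdas, B, rng):
    """Quadratic-extrapolation SIMEX estimate of the slope (sketch)."""
    means = []
    for lam in lambdas:
        if lam == 0:
            means.append(ols_slope(w, y))   # the naive estimate
            continue
        sd = (lam ** 0.5) * sigma_u         # extra noise: variance lam*sigma_u^2
        ests = [ols_slope([wi + rng.gauss(0, sd) for wi in w], y)
                for _ in range(B)]
        means.append(sum(ests) / B)
    c0, c1, c2 = fit_quadratic(lambdas, means)
    return c0 - c1 + c2, means              # quadratic evaluated at lambda = -1


# Illustrative data; every value here is an assumption.
rng = random.Random(2003)
n, b1, sigma_u = 5000, 2.0, 0.5
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + b1 * xi + rng.gauss(0, 1) for xi in x]
w = [xi + rng.gauss(0, sigma_u) for xi in x]

lambdas = [0.0, 0.5, 1.0, 1.5, 2.0]
simex, means = simex_slope(w, y, sigma_u, lambdas, B=20, rng=rng)
naive = means[0]
print("naive slope:", round(naive, 3))
print("SIMEX slope:", round(simex, 3))
```

Plotting the values in means against lambdas, together with the extrapolated point at lambda = -1, gives a picture of how the estimate changes with the amount of measurement error. With these assumed values, the quadratic extrapolant removes most, though not all, of the attenuation; the result depends on the choice of extrapolation function.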
At the North American Users Group meeting held March 18–19, 2003, in Boston, Massachusetts, Raymond Carroll, James Hardin, and Henrik Schmiediche presented a one-day workshop on measurement error and the use of the new software. The slides from that presentation are available in two formats:
162 slides, one per page, in pdf format
162 slides, four per page, in pdf format (suitable for printing)
Stata Journal Volume 3, Number 4 is dedicated to measurement-error issues and the use of the software:
Measurement error, GLMs, and notational conventions, by James Hardin and Raymond Carroll
Variance estimation for the instrumental variables approach to measurement error in generalized linear models, by James Hardin and Raymond Carroll
Instrumental variables, bootstrapping, and generalized linear models, by James Hardin, Henrik Schmiediche, and Raymond Carroll
The regression calibration method for fitting generalized linear models with additive measurement error, by James Hardin, Henrik Schmiediche, and Raymond Carroll
The simulation extrapolation method for fitting generalized linear models with additive measurement error, by James Hardin, Henrik Schmiediche, and Raymond Carroll
Maximum likelihood estimation of generalized linear models with covariate measurement error, by Sophia Rabe-Hesketh, Anders Skrondal, and Andrew Pickles
We also recommend the book Measurement Error in Nonlinear Models by R. J. Carroll, D. Ruppert, and L. A. Stefanski, published by Chapman & Hall, 1995.
Raymond Carroll is a Distinguished Professor, a Professor of Statistics, and a Professor of Nutrition and Toxicology at Texas A&M University. He is also Director of Biostatistics Research at the Center for Environmental and Rural Health (NIEHS) and Director of the Training Program in Biology, Bioinformatics, and Nutrition for the National Cancer Institute, both at Texas A&M University.
Dr. Carroll is the author of three books and over 200 professional papers, including papers on measurement error modeling, regression variance functions and transformations, nutrition, toxicology, and bioinformatics. Dr. Carroll received his Ph.D. in Statistics from Purdue University in 1974.
James Hardin is Lecturer and Assistant Research Scientist at Texas A&M University and previously was a Senior Statistician at StataCorp, where he developed Stata's cross-sectional time-series capabilities. He is also the author of Stata's current GLM command. He is the author of two books and ten refereed papers, and he has recently been working with Henrik Schmiediche developing the Stata software for fitting measurement-error models. Dr. Hardin received his Ph.D. in Statistics from Texas A&M University in 1992.
Henrik Schmiediche is a Senior Lecturer and Senior Systems Analyst at the Department of Statistics of Texas A&M University. He holds a B.S. degree in Computer Science and a Ph.D. in Statistics. He has enjoyed programming since his high school days, when RAM was scarce and CPUs were slow. Over the last decade, he has had several occasions to work on implementing and coding aspects of estimating measurement-error models. The culmination of this effort is the software in Stata, written in collaboration with Dr. Hardin.
The project described above was supported by Grant Number R44 RR12435 from the National Institutes of Health, National Center for Research Resources. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Center for Research Resources.