This page is for archival purposes only. The content on it is from June 2003, and will not be updated.

Stata software for generalized linear measurement error models

June 4, 2003

Three commands available since Stata 8 fit generalized linear models when one or more covariates are measured with error.

It is well known that such errors usually lead to attenuation of estimated effects. The new commands adjust for that attenuation and produce corrected standard errors and test statistics. They allow these adjustments to be made in the generalized linear model framework using the following methods:

  • Instrumental variables
  • Regression calibration
  • Simulation/extrapolation (SIMEX)

The software described here provides the first implementation of regression calibration and of SIMEX in a general-purpose statistical package. Regression calibration was suggested as a general approach by Carroll and Stefanski (1990) and Gleser (1990). SIMEX was proposed by Cook and Stefanski (1995) and further developed by Carroll, Küchenhoff, Lombard, and Stefanski (1996) and Stefanski and Cook (1995).

The software provided was written by R. J. Carroll, J. Hardin, and H. Schmiediche. The work described here was partly funded by the National Institutes of Health, National Center for Research Resources, Grant Number 5R44RR12435-03. The SIMEX method is very computationally intensive, and this new implementation of it is the fastest to date.

The discussion below is presented under the headings

  1. Downloading and installing the new commands
  2. Background and introduction
  3. Obtaining more information
  4. About the authors
  5. References

1. Downloading and installing the new commands

The new commands are named qvf, rcal, simex, and simexplot. The first three are the estimation commands; simexplot graphs the results of simex.

To load them, type the following in Stata:

        . net from http://www.stata.com/merror
        . net install merror

Or you can download merror.zip and, after unzipping the file to somewhere on your hard drive (e.g., C:/data/merror/), type

        . net from C:/data/merror/
        . net install merror

Once installed, you can type

        . whelp qvf
        . whelp rcal
        . whelp simex
        . whelp simexplot

These commands are implemented using Stata’s plug-in features, which allow code written in C to be added to Stata. This means the new commands are fast.

Because the new features are written as binary code, modules for different platforms (e.g., Windows and Unix) cannot be interchanged. Nevertheless, installation is completely automatic. When you type net install merror, Stata will install the appropriate modules for your computer. Just as with ado-files installed over the web, should you wish to uninstall these materials, you can type ado uninstall merror.

The measurement-error analysis software is available for the following platforms:

Windows (XP, 2000, NT, ME, 98)
Mac (PowerPC)
Linux x86 and x86-64
IBM RS/6000 AIX
Digital Unix
HP-UX
Sun Solaris and Sun Solaris 64-bit

If you attempt to install on a platform that is not on the list above, then when you type net install merror, you will get the error "file http://www.stata.com/merror/qvfmex.plugin not found; could not copy http://www.stata.com/merror/qvfmex.plugin".

2. Background and introduction

The generalized linear model framework is a rich collection of models that allows fitting of

  • linear regression models
  • logistic and probit regression models
  • Poisson and negative binomial regression models

and many others. Say you wish to fit such a model and include the variable X:

F(outcome) = b0 + b1*X + b2*Z2 + b3*Z3 + ...

You, however, do not have X. Let's assume that instead you have W, an error-prone version of X. Simply substituting W for X will result in estimates of b1 being biased toward 0 and estimates of b2, b3, ..., also being biased, although the bias may be toward 0 or away from it.
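
The bias toward 0 can be seen in a small simulation. The sketch below is written in Python and does not use the merror commands. It assumes the simplest case: a linear model with a single covariate, normally distributed X, and additive error that is independent of X. In that case, the naive slope is shrunk by roughly the factor V(X) / (V(X) + s2). The sample size, seed, true coefficients, and variable names are all illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2003)   # fixed seed so the run is repeatable
n = 20000                   # assumed sample size
b0, b1 = 1.0, 2.0           # assumed true coefficients
s2 = 1.0                    # assumed measurement-error variance

x = [rng.gauss(0.0, 1.0) for _ in range(n)]            # true covariate X, V(X) = 1
w = [xi + rng.gauss(0.0, s2 ** 0.5) for xi in x]       # W = X + u, V(u) = s2
y = [b0 + b1 * xi + rng.gauss(0.0, 1.0) for xi in x]   # outcome depends on X, not W

slope_true = ols_slope(x, y)    # regression on the (normally unobserved) X
slope_naive = ols_slope(w, y)   # regression that substitutes W for X

print(f"slope using X: {slope_true:.3f}")
print(f"slope using W: {slope_naive:.3f}")
print(f"b1 * V(X) / (V(X) + s2): {b1 * 1.0 / (1.0 + s2):.3f}")
```

With V(X) = 1 and s2 = 1, the attenuation factor is one half, so the slope on W should come out near half the slope on X.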

Correctly dealing with measurement error requires fitting an equation such as the one above in a way that accounts for the error in W, so that the coefficients are unbiased and the standard errors are correct.

The software provided here can do that when

  1. You have one variable W that measures X with error and a value for s2, the variance of that error:
    W = X + u, E(u)=0, V(u) = s2
  2. You have two or more replicates W1, W2, ..., which each measure X with error, and optionally you also have a value for s2, their common error variance:
    W1 = X + u1,
    W2 = X + u2,
    ...
    E(ui)=0 and V(ui) = s2
  3. You have a set of exogenous variables Z correlated with X from which you can derive an instrument T:
    W = a1*Z + e; T = â1*Z
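
To make the second situation concrete, here is a minimal Python sketch of the idea behind regression calibration with two replicates and no other covariates. It is a textbook-style illustration under stated assumptions, not the algorithm used by rcal. It estimates s2 from the differences W1 - W2, uses that to estimate how much of the variation in the replicate mean is due to X, replaces the replicate mean with an estimate of E(X | W), and then fits the ordinary regression. All names, the seed, and the simulated data are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def variance(v):
    """Sample variance with the usual n - 1 denominator."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(8)   # fixed seed so the run is repeatable
n = 20000                # assumed sample size
b1 = 2.0                 # assumed true slope
s2 = 1.0                 # assumed error variance, treated as unknown below

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
w1 = [xi + rng.gauss(0.0, s2 ** 0.5) for xi in x]   # W1 = X + u1
w2 = [xi + rng.gauss(0.0, s2 ** 0.5) for xi in x]   # W2 = X + u2
y = [1.0 + b1 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Step 1: estimate s2 from the replicates, using V(W1 - W2) = 2 * s2.
s2_hat = sum((a - b) ** 2 for a, b in zip(w1, w2)) / (2 * n)

# Step 2: the mean of two replicates has error variance s2 / 2.
wbar = [(a + b) / 2 for a, b in zip(w1, w2)]
var_x_hat = variance(wbar) - s2_hat / 2
reliability = var_x_hat / variance(wbar)

# Step 3: replace the replicate mean with the estimated E(X | W).
m = mean(wbar)
x_cal = [m + reliability * (a - m) for a in wbar]

slope_naive = ols_slope(wbar, y)   # attenuated
slope_rcal = ols_slope(x_cal, y)   # calibrated

print(f"estimated s2:     {s2_hat:.3f}")
print(f"naive slope:      {slope_naive:.3f}")
print(f"calibrated slope: {slope_rcal:.3f}")
```

In this linear, single-covariate setting, the calibrated slope is simply the naive slope divided by the estimated reliability. The sketch covers point estimates only; obtaining correct standard errors needs additional work that it does not attempt.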

Using the SIMEX method, for instance, not only can you obtain unbiased estimates and correct standard errors, you can obtain a graph that shows how the amount of measurement error affects the estimated coefficients:

[Figure: SIMEX plot of the estimated coefficients against the amount of measurement error]

The above graph shows estimated coefficients (b1, b2, b3, b4, b5) for

y_i = b1*x1_i + b2*x2_i + b3*x3_i + b4*x4_i + b5 + u_i

where x3 and x4 are measured with error by w3 and w4. The graph illustrates the extrapolated point estimates for all covariates in the fitted model. With multiple covariates, the naive coefficient estimates may be biased in either direction, as illustrated.
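
The simulation and extrapolation steps behind such a graph can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the simex command: one error-prone covariate, a linear model, a known s2, and a quadratic extrapolant. Extra noise with variance lambda*s2 is added to W, the naive slope is averaged over B simulated data sets for each lambda, a quadratic in lambda is fitted to those averages, and the curve is extrapolated to lambda = -1, which corresponds to no measurement error. The grid, B, the seed, and all names are hypothetical choices.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quadratic_fit(ts, vs):
    """Least-squares coefficients (c0, c1, c2) of v = c0 + c1*t + c2*t^2."""
    powers = [[1.0, t, t * t] for t in ts]
    ata = [[sum(p[i] * p[j] for p in powers) for j in range(3)] for i in range(3)]
    atv = [sum(p[i] * v for p, v in zip(powers, vs)) for i in range(3)]
    return solve(ata, atv)


rng = random.Random(1995)   # fixed seed so the run is repeatable
n = 5000                    # assumed sample size
b1 = 2.0                    # assumed true slope
s2 = 0.5                    # assumed known measurement-error variance

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = [xi + rng.gauss(0.0, s2 ** 0.5) for xi in x]        # observed W = X + u
y = [1.0 + b1 * xi + rng.gauss(0.0, 1.0) for xi in x]

lambdas = [0.0, 0.5, 1.0, 1.5, 2.0]   # total error variance is (1 + lambda) * s2
B = 20                                # simulated data sets per lambda

avg_slopes = []
for lam in lambdas:
    extra_sd = (lam * s2) ** 0.5
    fits = []
    for _ in range(B):
        w_noisy = [wi + rng.gauss(0.0, extra_sd) for wi in w]   # simulation step
        fits.append(ols_slope(w_noisy, y))
    avg_slopes.append(sum(fits) / B)

c0, c1, c2 = quadratic_fit(lambdas, avg_slopes)   # extrapolation step
slope_naive = avg_slopes[0]                       # lambda = 0: W as observed
slope_simex = c0 - c1 + c2                        # fitted curve at lambda = -1

print(f"naive slope: {slope_naive:.3f}")
print(f"SIMEX slope: {slope_simex:.3f}")
```

Plotting avg_slopes against lambdas, together with the fitted curve extended back to -1, gives a one-coefficient version of the graph described above. In this setup the quadratic extrapolant should move the estimate most of the way toward the true slope, though not necessarily all the way, because the true relationship between the naive slope and lambda is not exactly quadratic.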

3. Obtaining more information

At the North American Users Group meeting held March 18–19, 2003, in Boston, Massachusetts, Raymond Carroll, James Hardin, and Henrik Schmiediche presented a one-day workshop on measurement error and the use of the new software. The slides from that presentation are available in two formats:

162 slides, one per page, in pdf format

162 slides, four per page, in pdf format (suitable for printing)

Stata Journal Volume 3, Number 4 is dedicated to measurement-error issues and the use of the software:

Measurement error, GLMs, and notational conventions, by James Hardin and Raymond Carroll

Variance estimation for the instrumental variables approach to measurement error in generalized linear models, by James Hardin and Raymond Carroll

Instrumental variables, bootstrapping, and generalized linear models, by James Hardin, Henrik Schmiediche, and Raymond Carroll

The regression calibration method for fitting generalized linear models with additive measurement error, by James Hardin, Henrik Schmiediche, and Raymond Carroll

The simulation extrapolation method for fitting generalized linear models with additive measurement error, by James Hardin, Henrik Schmiediche, and Raymond Carroll

Maximum likelihood estimation of generalized linear models with covariate measurement error, by Sophia Rabe-Hesketh, Anders Skrondal, and Andrew Pickles

We also recommend the book Measurement Error in Nonlinear Models by R. J. Carroll, D. Ruppert, and L. A. Stefanski, published by Chapman & Hall, 1995.

4. About the authors

Raymond Carroll is a Distinguished Professor, a Professor of Statistics, and a Professor of Nutrition and Toxicology at Texas A&M University. He is also Director of Biostatistics Research at the Center for Environmental and Rural Health (NIEHS) and Director of the Training Program in Biology, Bioinformatics, and Nutrition for the National Cancer Institute, both at Texas A&M University.

Dr. Carroll is the author of three books and over 200 professional papers, including papers on measurement error modeling, regression variance functions and transformations, nutrition, toxicology, and bioinformatics. Dr. Carroll received his Ph.D. in Statistics from Purdue University in 1974.

James Hardin is Lecturer and Assistant Research Scientist at Texas A&M University and previously was a Senior Statistician at StataCorp, where he developed Stata's cross-sectional time-series capabilities. He is also the author of Stata's current GLM command. He is the author of two books and ten refereed papers, and he has recently been working with Henrik Schmiediche developing the Stata software for fitting measurement-error models. Dr. Hardin received his Ph.D. in Statistics from Texas A&M University in 1992.

Henrik Schmiediche is a Senior Lecturer and Senior Systems Analyst at the Department of Statistics of Texas A&M University. He holds a B.S. degree in Computer Science and a Ph.D. in Statistics. He has enjoyed programming since his high school days, when RAM was scarce and CPUs were slow. Over the last decade, he has had several occasions to work on implementing and coding aspects of estimating measurement error models. The culmination of this effort is the software in Stata, written in collaboration with Dr. Hardin.

5. References

Carroll, R. J., D. Ruppert, and L. A. Stefanski. 1995.
Measurement Error in Nonlinear Models. London: Chapman & Hall/CRC.
Carroll, R. J., H. Küchenhoff, F. Lombard, and L. A. Stefanski. 1996.
Asymptotics for the SIMEX estimator in structural measurement error models. Journal of the American Statistical Association, vol. 91, no. 433, pp. 242–250.
Carroll, R. J. and L. A. Stefanski. 1990.
Approximate quasilikelihood estimation in models with surrogate predictors. Journal of the American Statistical Association, vol. 85, pp. 652–663.
Cook, J. and L. A. Stefanski. 1995.
A simulation extrapolation method for parametric measurement error models. Journal of the American Statistical Association, vol. 89, pp. 1314–1328.
Gleser, L. J. 1990.
Improvements of the naive approach to estimation in nonlinear errors-in-variables regression models. In Statistical Analysis of Measurement Error Models and Applications, P. J. Brown and W. A. Fuller, eds. Providence: American Mathematical Society.
Stefanski, L. A. and J. Cook. 1995.
Simulation extrapolation: The measurement error jackknife. Journal of the American Statistical Association, vol. 90, no. 432, pp. 1247–1256.

The project described above was supported by Grant Number R44 RR12435 from the National Institutes of Health, National Center for Research Resources. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Center for Research Resources.