I agree with most of what Maarten says. The -robust- option is indeed
robust to heteroskedasticity and/or overdispersion and/or
underdispersion, at the price of being less robust to small sample
sizes. However, the -robust- option of -glm- does not, by default,
assume that the mean is correctly specified. If the
user-specified model is linear, and the true relationship is a little
bit curved, then the robust 95% confidence interval for the slope of the
best-fit straight line for the sample should asymptotically, in 95% of
samples, contain the slope of the best-fit straight line for the
population. This population best-fit slope may be a useful thing to
know, even if the true relationship is not exactly linear. This is
because (as George Box said) all models are wrong, but some are not too
far wrong to be useful. This issue is discussed in Subsection 3.6 of
Hardin and Hilbe (2001).
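To make "the slope of the best-fit straight line for the population" concrete, here is a small sketch (in Python rather than Stata, purely for illustration; the function name and data are mine): even when the true relationship is quadratic, the population least-squares slope is a well-defined quantity.

```python
def lsq_slope(x, y):
    """Least-squares slope of y on x (the 'best-fit straight line')."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

# Population: a curved (quadratic) truth over a fixed design
x = list(range(10))
y = [xi ** 2 for xi in x]   # the true relationship is not linear
print(lsq_slope(x, y))      # population best-fit slope -> 9.0
```

The robust confidence interval is asymptotically calibrated for this population best-fit slope, not for any supposed "true" linear slope.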
It is not necessarily true that the Huber confidence interval is always
narrower than the maximum-likelihood confidence interval. It may be
either wider or narrower. In the simple case of the unequal-variance
t-test (which uses a Huber standard error), two good definitive
references are Moser et al. (1989) and Moser and Stevens (1992). In this
case, the confidence interval from the equal-variance t-test is too
narrow if the smaller sub-sample is from the more variable
sub-population, and is too wide if the smaller sub-sample is from the
less variable sub-population. This is because the equal-variance t-test
estimates the population variability of the smaller of the two samples using
the sample variability of the larger sample, while the unequal-variance
t-test estimates the population variability of the smaller sample using
the sample variability of the smaller sample. Note, however, that, in
the case of the unequal-variance t-test, we calculate the confidence
interval using the t-distribution, with the degrees of freedom given by
Satterthwaite's formula. This is in contrast to -glm-, which calculates
the confidence interval using the Normal distribution, and therefore,
with small samples, probably gives too-narrow confidence intervals more
often than too-wide confidence intervals. Work seems to be in progress
to find a generalized Satterthwaite degrees-of-freedom formula for the
general Huber variance (e.g. Lipsitz and Ibrahim, 1999).
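For readers who want to see the contrast concretely, here is a minimal pure-Python sketch (names mine, for illustration only) of the unequal-variance t statistic with Satterthwaite's degrees-of-freedom formula: each sub-sample's population variability is estimated from that sub-sample alone.

```python
import math

def welch_satterthwaite(x, y):
    """Unequal-variance t statistic and Satterthwaite degrees of freedom.

    Each sub-sample's population variance is estimated from that
    sub-sample alone, unlike the equal-variance (pooled) t-test.
    """
    nx, ny = len(x), len(y)
    mx = sum(x) / nx
    my = sum(y) / ny
    vx = sum((xi - mx) ** 2 for xi in x) / (nx - 1)  # sample variance of x
    vy = sum((yi - my) ** 2 for yi in y) / (ny - 1)  # sample variance of y
    se2 = vx / nx + vy / ny          # squared Huber-type standard error
    t = (mx - my) / math.sqrt(se2)   # unequal-variance t statistic
    # Satterthwaite's approximate degrees of freedom
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df
```

The confidence interval then uses the t distribution with these degrees of freedom, which in small samples is wider than the Normal-based interval that -glm- with -robust- would give.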
I hope this helps.
Best wishes
Roger
References
Hardin J, Hilbe J. Generalized Linear Models and Extensions. College
Station, TX: Stata Press; 2001.
Lipsitz SR, Ibrahim JG. A degrees-of-freedom approximation for a
t-statistic with heterogeneous variance. The Statistician 1999; 48(Part
A): 495-505.
Moser BK, Stevens GR, Watts CL. The two-sample t-test versus
Satterthwaite's approximate F-test. Communications in Statistics -
Theory and Methods 1989; 18(11): 3963-3975.
Moser BK, Stevens GR. Homogeneity of variance in the
two-sample means test. The American Statistician 1992; 46(1): 19-21.
Roger Newson
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
www.imperial.ac.uk/nhli/r.newson/
Opinions expressed are those of the author, not of the institution.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Maarten buis
Sent: 31 January 2007 07:54
To: [email protected]
Subject: Re: st: option -robust- for -glm- and -poisson-
--- Joan Holand <[email protected]> wrote:
> I'm running a loglinear model (categorical data; Stata V.9, Windows
> XP) using the -glm- (option: fam(pois)) and the -poisson- command.
>
> I have a question about the option -robust-: When is it reasonable to
> use this option? I have read that it makes the confidence interval
> narrower. Are there other reasons for / benefits of using this
> option?
When you are doing some type of regression modeling you are usually
making a model for both the mean and the variance. For example with the
poisson family you are assuming that the variance of the dependent
variable (the count) is the same as the mean. Another example: with
-regress- or the normal family, you are assuming that the variance
remains constant (homoskedasticity). If you think you modeled the mean
correctly but are not sure whether you modeled the variance correctly
then specifying the robust option corrects your standard errors for the
possible misspecification of your variance. This correction can lead
to both larger and smaller standard errors.
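Here is a minimal pure-Python sketch of that idea (function name and data mine, not Stata's implementation) for the slope of a simple linear regression: the classical standard error pools one residual variance over all points, while the HC0 sandwich keeps each observation's own squared residual.

```python
import math

def slope_std_errors(x, y):
    """Classical (model-based) vs HC0 sandwich (robust) standard error
    for the slope of a simple linear regression of y on x.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # classical: one pooled residual variance spread over all points
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 sandwich: each point keeps its own squared residual
    se_robust = math.sqrt(sum((xi - xbar) ** 2 * e ** 2
                              for xi, e in zip(x, resid)) / sxx ** 2)
    return b, se_classical, se_robust
```

On a toy dataset such as x = [-2, -1, 0, 1, 2], y = [-4, -1, 0, 1, 4], the robust standard error comes out smaller than the classical one, echoing the point that the correction can move the interval in either direction.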
Robust is not a magic option though. It is based on an asymptotic
argument so I would not put too much faith in it in small samples.
Furthermore, it still assumes that you've modeled the mean correctly,
so you still need to check your model for that (i.e. look at the
residuals). Freedman (2006) is a (too) critical and readable piece on
the robust option.
Hope this helps,
Maarten
David A. Freedman (2006) `On The So-Called "Huber Sandwich Estimator"
and "Robust Standard Errors"', The American Statistician, 60(4), pp.
299-302.
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/