Ali and Nicola--
Note that -vwls- and the -robust- option address different issues--the
first tries to improve efficiency of point estimates using some
untenable assumptions and the second tries to improve standard errors.
Also note I replied to the original post on Sep 21 to the effect:
"
[R] vwls gives
Grizzle J E, Starmer C F & Koch G G. 1969. "Analysis of categorical
data by linear models." Biometrics 25:489-504.
as the reference.
I believe this model is a special case of meta-analytic regression (FE
metareg) and also a special case of GMM (with a potentially suboptimal
weighting matrix). I would recommend you read the GMM exposition in
BSS2007:
http://econpapers.repec.org/paper/bocbocoec/667.htm
and use -ivreg2- instead.
"
I will add that -vwls- assumes the relevant variances are known, not
estimated. The help file says "vwls treats the estimated variance as
if it were the true variance when it computes standard errors of the
coefficients." The approach outlined, calculating the SD of the outcome
variable (not of the residuals, hence not like the -robust- option) and
then using it to construct what amount to analytic weights for -vwls-,
therefore takes no account of the sampling variance introduced by first
estimating the SDs of the outcome variable Y (birthwt) at different
levels of X (gest age), and need not have any of the desirable
properties of OLS. -vwls- is essentially calculating SDs of the outcome
variable and including a transformation of these as analytic weights in
a -regress- command, which seems less than optimal. It will also drop
observations if categories are too small to produce an estimate of the
SD, or if the estimated SD is zero (in which case that value of X would
have infinite weight in the regression), so you may lose sample size.
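To make the dropping rule concrete, here is a minimal sketch in Python
(not Stata, and not what -vwls- runs internally) of the logic as
described above: a value of X gets a weight of 1/SD^2 only if the SD of
Y at that value can be computed and is nonzero. The function name
usable_groups and the toy birthweights are made up for illustration.

```python
from statistics import stdev


def usable_groups(y_by_x):
    """Return {x: 1/SD^2} for groups whose SD is defined and nonzero.

    Groups with fewer than two observations (SD undefined) or with
    SD equal to zero (infinite weight) are left out, i.e. dropped.
    """
    weights = {}
    for x, ys in y_by_x.items():
        if len(ys) < 2:
            continue  # SD cannot be estimated from one observation
        s = stdev(ys)
        if s == 0:
            continue  # zero SD would imply infinite weight
        weights[x] = 1.0 / s ** 2
    return weights


# Hypothetical birthweights (grams) by gestational age (weeks)
data = {
    27: [1000.0],                  # single obs: SD undefined
    28: [1200.0, 1200.0],          # no variation: SD is zero
    29: [1300.0, 1500.0],
    30: [1400.0, 1450.0, 1600.0],
}
weights = usable_groups(data)
print(sorted(weights))  # the X values that survive
```

Only the groups with at least two observations and some variation in Y
contribute to the fit, so the estimation sample can be much smaller
than the data you started with.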
As an example of the strangeness of this model, if Y varied not at all
for X=27 and X=28 in your data but had variance=10 for X=29 and X=30,
-vwls- would drop all the obs with X=27 and X=28 and estimate only
using obs with X=29 and X=30. If the variance of Y at X=27 and X=28
were instead 1e-15, i.e. very close to zero but nonzero, then -vwls-
would essentially drop all the obs with X=29 and X=30 and estimate
only off obs with X=27 and X=28, i.e. use an entirely different
subsample.
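The arithmetic behind that second case can be sketched as follows, again
in Python rather than Stata and only as an illustration. It assumes each
observation gets weight 1/variance and, as a made-up detail, that every
value of X has 10 observations; weight_shares is a hypothetical helper
name.

```python
def weight_shares(var_by_x, n_by_x):
    """Share of total inverse-variance weight carried by each value of X."""
    raw = {x: n_by_x[x] / var_by_x[x] for x in var_by_x}
    total = sum(raw.values())
    return {x: w / total for x, w in raw.items()}


# Variances from the example: near zero at X=27,28 and 10 at X=29,30
variances = {27: 1e-15, 28: 1e-15, 29: 10.0, 30: 10.0}
counts = {x: 10 for x in variances}  # assumed equal group sizes

shares = weight_shares(variances, counts)
low_var_share = shares[27] + shares[28]
high_var_share = shares[29] + shares[30]
print(low_var_share, high_var_share)
```

With these numbers the obs at X=29 and X=30 carry a share of the total
weight on the order of 1e-16, which is why they are dropped in all but
name.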
It's not clear to me why the variable SD of Y is a problem (unless you
know measurement error varies with X, in which case you should be
using a different model in any case). It is a mean of the error term
that varies with X that is a problem for inference. You would be much
better off using the -robust- option or perhaps -cluster(gest_age)- or,
if you can think of some excluded instruments, -ivreg2- with the -gmm-
option.
Compare:
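* Example 1: weights built from the SD of mpg within each rep78 category
sysuse auto, clear
bys rep: egen s=sd(mpg)
gen w=s^(-2)
* OLS with analytic weights, then -vwls-, then OLS with robust SEs
xi: reg mpg i.rep [aw=w]
xi: vwls mpg i.rep
xi: reg mpg i.rep, r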
(there are not enough categories in rep78 to use the -cluster- option
with any confidence) or:
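* Example 2: same idea with weight as X; any value of weight that
* occurs only once gets a missing sd and so a missing weight
bys wei: egen sd=sd(mpg)
gen wt=sd^(-2)
reg mpg wei [aw=wt], nohe
vwls mpg wei
* Alternatives: robust SEs, clustered SEs, and GMM via -ivreg2-
* (-ivreg2- is user-written; install with: ssc install ivreg2)
reg mpg wei, r nohe
reg mpg wei, cl(wei) nohe
ivreg2 mpg (wei=len), gmm nohe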
On 9/26/07, [email protected] <[email protected]> wrote:
> I have no statistical references, sorry, and I don't have access to the Stata manuals. But if the problem is heteroskedasticity, can't you simply use the robust option? Does your variance have some clear trend (e.g. increasing with time)?
> Can -wls0- from http://www.ats.ucla.edu/stat/stata/ado/analysis help? It cites Greene's "Econometric Analysis", fourth edition.
> Nicola
>
> At 12.25 24/09/2007 +0100, you wrote:
> >Thanks Nicola,
> >
> >I used vwls for a birthweight model with gestational age as an explanatory
> >variable. One of the problems of regress was the constant variance assumption.
> >The variance was different for different gestational age weeks. I calculated
> >the standard deviation for each gestational age week and included it in the
> >model as suggested by stata manuals.
> >
> >I have references for weighted least squares but was wondering whether there was
> >any references that clearly talk about vwls apart from the two references in
> >stata manual.
> >
> >Many thanks
> >Ali
> >
> >
> >Quoting [email protected]:
> >
> >>I have a lot of empirical papers using weighted least squares to account for heteroskedasticity. But it sounds more like -regress- with analytic weights than -vwls- (the latter is useful only in particularly rare situations, I believe, in which you'd never think about OLS), and I am not sure about what you are referring to.
> >>Nicola
> >>
> >>At 02.33 21/09/2007 -0400, Ali Khashan wrote:
> >>>Dear all,
> >>>
> >>>I have used variance weighted least squares (vwls) instead of ordinary least
> >>>squares regression in my study. I need a reference for this method but I can't
> >>>find any papers which clearly mention it.
> >>>
> >>>Are there any papers/books which clearly recommend the use of vwls in some
> >>>cases?
> >>>
> >>>your help is highly appreciated
> >>>
> >>>Many Thanks
> >>>Ali
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>