Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Constructing a variable from standard deviations
From: "M.P.J. van Zaal" <[email protected]>
To: [email protected]
Subject: Re: st: Constructing a variable from standard deviations
Date: Mon, 22 Nov 2010 16:28:06 +0100
Thanks for your points of view.
First of all, my sample size is about 6,000, but because not all of my
control variables are observed for all 6,000 cases, the estimation
sample in my regression is about 3,500.
I am inclined to agree with Mr Kolenikov, because the study I am
(partly) replicating in my thesis uses OLS.
They use robust standard errors in the Mincerian wage regression.
In the second step (a regression of the standard deviations on individual
characteristics) they use robust standard errors and clustering.
Does their use of robust standard errors and clustering indicate that I
should use Mr Buis's method, or can I still stick with OLS?
What do you guys think?
P.S. The study I am referring to is:
Bonin et al. (2007). Cross-sectional earnings risk and occupational
sorting: The role of risk attitudes. Labour Economics 14, 926–937.
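On the question of robust standard errors and clustering: these options change how the standard errors are computed, not the OLS coefficients themselves. A minimal sketch in Python (used here instead of Stata only to make the arithmetic explicit), assuming a single regressor; the function names ols_simple and slope_ses are hypothetical. The robust variant uses the HC1 scaling n/(n-k), which is the small-sample factor Stata applies with -robust-; cluster-robust errors use a different, cluster-level formula that is not shown here.

```python
# Hypothetical illustration: classical vs. heteroskedasticity-robust (HC1)
# standard errors for a simple regression y = a + b*x + e.

def ols_simple(x, y):
    """Return intercept, slope, and residuals of an OLS fit of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def slope_ses(x, resid):
    """Classical and HC1-robust standard errors of the OLS slope."""
    n, k = len(x), 2  # k = number of coefficients (intercept + slope)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Classical: one common error variance s^2 for every observation
    s2 = sum(e ** 2 for e in resid) / (n - k)
    se_classical = (s2 / sxx) ** 0.5
    # Robust: each squared residual stands in for its own error variance
    meat = sum((xi - xbar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_hc1 = ((n / (n - k)) * meat / sxx ** 2) ** 0.5
    return se_classical, se_hc1
```

With x = [0, 1, 2, 3] and y = [1, 2, 5, 4], the slope is 1.2 whichever standard error is reported; only the standard error changes (about 0.53 classical vs. about 0.41 robust).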
----- Original Message -----
From: Stas Kolenikov <[email protected]>
Date: Monday, November 22, 2010 4:13 pm
Subject: Re: st: Constructing a variable from standard deviations
To: [email protected]
> On Mon, Nov 22, 2010 at 6:29 AM, Maarten Buis
> <[email protected]> wrote:
> > --- On Mon, 22/11/10, M.P.J. van Zaal wrote:
> >> You state that the residual variance is assumed to be
> >> constant. This is actually not the case. I have 106
> >> different residual standard deviations. I obtained them by
> >> using -predict nameocc if dummyoccupation==1, resid-
> >> to predict the residuals. Now I have 106 different
> >> residual variables, and when I check them with -tabstat-,
> >> their standard deviations are quite different (ranging
> >> from 0.18 to 0.8).
> >
> > If you use -regress-, then you assume that the residual
> > variance is constant. The fact that you find differences
> > in the residual variance across groups just means that
> > you estimated a misspecified model. Normally I would be
> > pretty relaxed about this heteroskedasticity, but not so
> > in your case, because here the residual variance is a
> > key parameter of substantive interest. If you estimate the
> > model I proposed, you solve that problem.
>
> I disagree. Mathijs can run any regression he likes, can't he? It is
> just a matter of doing the inference right, if he needs to. If he
> does need inference from this regression, then of course without the
> -robust- or -cluster(occupation)- option the standard errors and tests
> may be meaningless. Maarten is right: a basic assumption of the
> classical OLS model is that error variances are constant (and Mathijs
> cannot argue with that; he can report the finding that in his actual
> data this assumption does not hold, but this does not change the
> underlying assumption of the model). But if all Mathijs needs out of
> this regression is a reasonable line to take deviations from, then
> OLS is pretty much as good as a line from any more sophisticated method.
>
> Maarten's solution will give asymptotically efficient estimates in the
> presence of heteroskedasticity, i.e., it will be slightly more accurate
> in large samples when heteroskedasticity is indeed present. I
> personally don't believe you can gain much from modeling
> heteroskedasticity unless the differences in variances are huge, like
> a factor of 20 or so, although I cannot ground my belief in anything
> beyond common statistical sense. In small samples, however,
> excessive modeling of difficult-to-identify phenomena (like the
> heteroskedasticity here) usually leads to notable small-sample biases,
> so in the end the estimates from the solution that Maarten suggested
> may not be much more accurate unless the sample sizes are well
> into the thousands (Mathijs did not give his original sample size for
> us to make a judgement).
>
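To make the efficiency argument concrete, here is a minimal Python sketch of one generic way to model groupwise heteroskedasticity: two-step feasible GLS, i.e. OLS first, then weighted least squares with each observation weighted by the inverse of its occupation's residual variance. This is an assumed stand-in chosen for illustration; it is not necessarily the model Maarten proposed, which is not spelled out in this thread, and the names wls_simple and groupwise_fgls are hypothetical. A single regressor is assumed.

```python
from collections import defaultdict


def wls_simple(x, y, w):
    """Weighted least squares of y on x (with intercept); returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def groupwise_fgls(x, y, group):
    """Two-step FGLS assuming one error variance per group (hypothetical)."""
    # Step 1: ordinary least squares (all weights equal to 1)
    a, b = wls_simple(x, y, [1.0] * len(x))
    # Step 2: estimate each group's residual variance (divisor n_g)
    squares = defaultdict(list)
    for xi, yi, g in zip(x, y, group):
        squares[g].append((yi - a - b * xi) ** 2)
    var = {g: sum(v) / len(v) for g, v in squares.items()}
    # Step 3: re-fit, down-weighting high-variance groups
    weights = [1.0 / var[g] for g in group]
    return wls_simple(x, y, weights)
```

In this toy setup, if every group is spread evenly over the same x values, the reweighting leaves the line unchanged; the weights only matter when high- and low-variance groups sit at different x values.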
> So I would stick with a simple solution:
>
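> * pooled OLS across all occupations
> regress depvar whatever
> * -residuals- is needed: plain -predict- after -regress- gives fitted values
> predict res, residuals
> * standard deviation of the residuals within each occupation
> egen sd_by_occup = sd( res ), by( occupation )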
>
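For readers without Stata at hand, here is a minimal Python sketch of what the three commands above compute, assuming a single regressor x and one group label per observation (the name residual_sd_by_group is hypothetical). It fits one pooled OLS line, takes the residuals from that line, and computes the sample standard deviation of the residuals within each group.

```python
from collections import defaultdict
from statistics import stdev


def residual_sd_by_group(x, y, group):
    """Pooled OLS of y on x, then the SD of the residuals within each group."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # One regression line for everybody (the -regress- step)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    # Residuals from that common line, collected per group (the -predict- step)
    by_group = defaultdict(list)
    for xi, yi, g in zip(x, y, group):
        by_group[g].append(yi - (a + b * xi))
    # Sample SD (n - 1 denominator) per group (the -egen, sd() by()- step)
    return {g: stdev(r) for g, r in by_group.items()}
```

Groups with only one observation make statistics.stdev raise StatisticsError, so they would need special handling in this sketch.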
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>