Re: st: Constructing a variable from standard deviations

From	"M.P.J. van Zaal" <[email protected]>
To	[email protected]
Subject	Re: st: Constructing a variable from standard deviations
Date	Mon, 22 Nov 2010 16:28:06 +0100

Thanks for your point of view.

First of all, my sample size is about 6000, but because not all of my 
control variables have 6000 observations, the sample size used in my 
regression is about 3500.
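
A minimal sketch of how one might check which observations actually enter
the regression and which variables account for the loss. It uses the
nlsw88 example dataset that ships with Stata as a stand-in; lnwage,
insample, and the choice of regressors are assumptions for illustration,
not the variables from the thesis.

```stata
* Stand-in data; replace with the actual dataset and variables
sysuse nlsw88, clear
generate lnwage = ln(wage)

* -regress- drops every observation with a missing value in any variable
regress lnwage grade ttl_exp tenure union

* e(sample) flags the observations that were used in the estimation
generate byte insample = e(sample)
count if insample

* Count missing values per variable to see where observations are lost
misstable summarize lnwage grade ttl_exp tenure union
```

Restricting later steps to insample == 1 keeps the residuals and the
regression on the same set of observations.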

I am inclined to agree with Mr. Kolenikov, because the study I am 
(partly) replicating in my thesis does use OLS estimates.  

They do use robust standard errors in the Mincerian wage regression. 
In the second step (regression of standard deviations on individual 
characteristics) they use robust standard errors and clustering. 
Do robust standard errors and clustering indicate that I should use 
Mr. Buis's method, or can I still stick with OLS?

What do you guys think?
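
For reference, the two-step procedure described above might be sketched
as follows. This is only an illustration on the nlsw88 example dataset
shipped with Stata, not the specification used in the study; lnwage, res,
sd_by_occup, and all regressors are hypothetical stand-ins. Note that
vce(robust) and vce(cluster ...) change only the standard errors; the
coefficients, and therefore the residuals, are the same as in plain OLS.

```stata
* Stand-in data; replace with the actual dataset and variables
sysuse nlsw88, clear
generate lnwage = ln(wage)

* Step 1: Mincer-type wage regression with robust standard errors
regress lnwage grade ttl_exp tenure, vce(robust)

* Residuals for the estimation sample only
predict double res if e(sample), residuals

* Residual standard deviation within each occupation
egen sd_by_occup = sd(res), by(occupation)

* Step 2: regress the occupation-level standard deviation on individual
* characteristics, with standard errors clustered on occupation
regress sd_by_occup age married collgrad if !missing(res), ///
    vce(cluster occupation)
```

Clustering on occupation in the second step allows for the fact that
sd_by_occup takes the same value for everyone within an occupation.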

P.S. The study I am referring to is: 
Bonin et al. (2007). Cross-sectional earnings risk and occupational 
sorting: The role of risk attitudes. Labour Economics 14, 926–937.




----- Original Message -----
From: Stas Kolenikov <[email protected]>
Date: Monday, November 22, 2010 4:13 pm
Subject: Re: st: Constructing a variable from standard deviations
To: [email protected]

> On Mon, Nov 22, 2010 at 6:29 AM, Maarten buis 
> <[email protected]> wrote:
> > --- On Mon, 22/11/10, M.P.J. van Zaal wrote:
> >> You state that the residual variance is assumed to be
> >> constant. This is actually not the case. I have 106
> >> different residual standard deviations. I achieved this by
> >> using "predict "nameocc" if "dummyoccupation"==1, resid"
> >> to predict the residuals. Now I have 106 different
> >> residuals, and when I check with tabstat their standard
> >> deviations are quite different (varying from 0.18 to 0.8).
> >
> > If you use -regress- then you assume that the residual
> > variance is constant. The fact that you find differences
> > in the residual variance across groups just means that
> > you estimated a misspecified model. Normally I would be
> > pretty relaxed about this heteroskedasticity, but not so
> > in your case, because now this residual variance is a
> > key parameter of substantive interest. If you estimate the
> > model I proposed you solve that problem.
> 
> I disagree. Mathijs can run any regression he likes, can't he? It is
> just a matter of doing the inference right, if he needs to. If he
> needed to do inference with this regression, then of course without
> -robust- or -cluster(occupation)- option the results may be
> meaningless. Maarten is right: the basic assumption of OLS is that
> error variances are constant (and Mathijs cannot argue with that; he
> can report the finding that in his actual data this assumption does
> not hold, but this does not change the underlying assumption of the
> model). But if all Mathijs needs out of this regression is a
> reasonable line to take deviations from, then OLS is pretty much as
> good as a line by any other sophisticated method.
> 
> Maarten's solution will give asymptotically efficient estimates in
> presence of heteroskedasticity, i.e., will be slightly more accurate
> in large samples when heteroskedasticity is indeed present. I
> personally don't believe you can gain much from modeling
> heteroskedasticity unless the differences in variances are huge, like
> a factor of 20 or so, although I cannot ground my belief in anything
> outside the common statistical sense. In small samples, however,
> excessive modeling of difficult-to-identify phenomena (like
> heteroskedasticity here) usually leads to notable small sample biases,
> so in the end the estimates from the solution that Maarten suggested
> may not be of much greater accuracy unless the sample sizes are well
> into thousands (Mathijs did not give his original sample size for us
> to make a judgement).
> 
> So I would stick with a simple solution:
> 
> regress depvar whatever
> * -predict- defaults to fitted values, so ask for residuals explicitly
> predict res, residuals
> egen sd_by_occup = sd( res ), by( occupation )
> 
> -- 
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
