Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
re: Re: st: Constructing a variable from standard deviations
From
Maarten buis <[email protected]>
To
[email protected]
Subject
re: Re: st: Constructing a variable from standard deviations
Date
Mon, 22 Nov 2010 19:19:38 +0000 (GMT)
So now Mathijs has 3 solutions:
[Stas] use one -regres-, -predict- the residuals, compute the standard
error of these residuals for each occupation separately.
[me] use maximum likelihood to estimate a model with different
residual standard errors for each occupation.
[Kit] estimate a separate regression for each occupation, predict the
residuals, and compute the standard error of each occupation.
The solutions by me and Kit have the advantage that the model explicitly
allows for differences in the residual variance, Stas' solution works, but
is substantively awkward as in it you use a model that assumes that the
residual variance is constant acros occupations to estimate differences in
the residual variance across occupations.
However, when comparing these models in the simulation below, all three
seem to perform reasonably well. The bias is larger than one would have
expected given the randomness inherit in Monte Carlo simulations (compare
the biases with the MCse), but the biases are so small that based on this
simulation I would be happy to use any of the three methods.
The example below requires Ian White's -simsum-, which you can download
by typing in Stata: -ssc install simsum-, and you can read more about it
in Ian White (2010) "simsum: Analyses of simulation studies including
Monte Carlo error" The Stata Journal, 10(3): 369--385.
http://www.stata-journal.com/article.html?article=st0200
*--------------------------- begin simulation ------------------------
program drop _all
program define mynormal_lf
version 11
args lnfj mu ln_sigma
quietly replace `lnfj' = ///
ln(normalden($ML_y1, `mu', exp(`ln_sigma')))
end
program define sim, rclass
// create data
drop _all
set obs 500
gen occ = ceil(4*runiform())
gen x = rnormal()
gen sd = cond(occ==1,.2, ///
cond(occ==2,.6, ///
cond(occ==3,.4, .8)))
gen y = 1 + 2*(occ==2) + .5*(occ==3) + ///
3*(occ==4) + sd*rnormal()
// Stas
reg y x i.occ
predict double resid, resid
sum resid if occ == 2
return scalar stas = r(sd)
// Maarten
tempname b0 rmse
matrix `b0' = e(b)
scalar `rmse' = ln(e(rmse))
ml model lf mynormal_lf ///
(mu: y = x i.occ) ///
(ln_sigma: i.occ)
ml init `b0'
ml init ln_sigma:_cons = `= `rmse' '
ml maximize
return scalar maarten = exp( [ln_sigma]_b[2.occ] + ///
[ln_sigma]_b[_cons] )
// Kit
reg y x if occ==2
qui predict double eps if e(sample), resid
sum eps if occ == 2
return scalar kit = r(sd)
end
simulate stas=r(stas) maarten=r(maarten) kit=r(kit), reps(10000) : sim
simsum stas maarten kit, true(.6) mcse bias
twoway kdensity stas || kdensity maarten || kdensity kit
*--------------------------- end simulation ----------------------------
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/