Dear Steven and Stas,
thank you very much for your extremely helpful advice - I will get back to you
once I have tried!
Kind regards,
Johannes

Steven Samuels <ssamuels@albany.edu>
Sent by: owner-statalist@hsphsun2.harvard.edu
05/25/2007 06:10 PM
Please respond to statalist@hsphsun2.harvard.edu

To: [email protected]
Subject: Re: st: How to test for equality of variance in data with sampling weights

Johannes wrote to me privately that he has a household indicator, say
hh_id; that the household is the only PSU he can identify in the data
set; and that in a single urban setting, there is no stratum
variable. In that case he would set up his analysis with:
svyset hh_id [pweight=finalwgt]
As Stas indicated in his recent email, one can compute an SD as a
function of expectations and use either -testnl- or (my choice)
-nlcom-, because:
SD(income) = sqrt( E(income^2) - (E(income))^2 )
However, -nlcom- will produce an erroneous standard error unless the
variable is first centered by subtracting off the mean,
inc = income - mean(income), and then squared, inc2 = inc*inc.
Then E(inc2) = var(income), and sqrt(E(inc2)) estimates the SD of income.
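To spell out why the centering works, here is a short derivation. The symbol \mu for E(income) is my own notation, introduced only for this illustration:

```latex
\begin{aligned}
E\big[(\text{income}-\mu)^2\big]
  &= E(\text{income}^2) - 2\mu\,E(\text{income}) + \mu^2 \\
  &= E(\text{income}^2) - \mu^2 \\
  &= \operatorname{var}(\text{income}), \qquad \mu = E(\text{income}).
\end{aligned}
```

So after centering, the variance is a single estimated mean, E(inc2), rather than a difference of two estimated quantities. Plugging in the estimated mean for \mu should not change the linearization to first order, because the derivative of E[(income - \mu)^2] with respect to \mu is -2E(income - \mu), which is zero at the true mean.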
As the SD is apt to have an asymmetric sampling distribution, I suggest
that Johannes estimate the SE for log(SD) and then convert back to
the SD scale.
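In symbols, the log-scale interval can be sketched as follows. The notation is assumed for illustration only: \hat V is the estimate of E(inc2), and t_{df, 0.975} is the 97.5th percentile of the t distribution on the design degrees of freedom.

```latex
\begin{aligned}
\hat\theta &= \log \widehat{SD} = \tfrac12 \log \hat V, \\
\widehat{SE}(\hat\theta) &\approx \frac{\widehat{SE}(\hat V)}{2\hat V}
  \quad\text{(delta method)}, \\
\text{95\% CI for } SD &:\;
  \exp\!\big(\hat\theta \pm t_{df,\,0.975}\,\widehat{SE}(\hat\theta)\big).
\end{aligned}
```

The resulting interval is always positive and is asymmetric around the estimated SD, which is the point of working on the log scale.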
Johannes actually wants to compare SDs in two groups, taken to be
male and female here for illustration. In that case, I
recommend that he compute a CI for the ratio, rather than for the
difference, and that he do this on the log scale and then convert
back to the ratio scale.
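For the ratio, the same idea can be written out as below. Again the notation is assumed for illustration: \hat V_M and \hat V_F are the estimated variances of income for males and females, R is the ratio of the two SDs, and t_{df, 0.975} is the 97.5th percentile of the t distribution on the design degrees of freedom.

```latex
\begin{aligned}
\log \hat R &= \tfrac12\big(\log \hat V_M - \log \hat V_F\big), \\
\widehat{\operatorname{var}}(\log \hat R) &\approx \tfrac14\left[
  \frac{\widehat{\operatorname{var}}(\hat V_M)}{\hat V_M^2}
  + \frac{\widehat{\operatorname{var}}(\hat V_F)}{\hat V_F^2}
  - \frac{2\,\widehat{\operatorname{cov}}(\hat V_M,\hat V_F)}{\hat V_M \hat V_F}
\right], \\
\text{95\% CI for } R &:\;
  \exp\!\Big(\log\hat R \pm t_{df,\,0.975}
  \sqrt{\widehat{\operatorname{var}}(\log \hat R)}\Big).
\end{aligned}
```

The covariance term matters under a complex design, because the two groups can share PSUs (here, households), so the two variance estimates are not independent.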
Below is code that should work. This utilizes the linearization
method. Possibly Johannes might wish to try a jackknife estimate of
the variance-covariance matrix.
Steve
/*************************** CODE FOLLOWS ***************************/
capture program drop _all
/* First a little program to back transform calculations done on the
log scale after -nlcom- */
program antilog
local lparm el(r(b),1,1)
local se sqrt(el(r(V),1,1))
local bound invttail(e(df_r),.025)*`se' //For 95% CI's
local parm exp(`lparm')
local ll exp(`lparm' - `bound')
local ul exp( `lparm' + `bound')
di "parm =" `parm' " ll = " `ll' " ul = " `ul'
end
/* Get Estimate of the Mean for each Group */
svy: mean income, over(gender)
/* If gender has value labels (e.g. 1=Male 2=Female) use the
following syntax */
gen inc=income-[income]Male if gender==1
replace inc=income-[income]Female if gender==2
/* Use this syntax INSTEAD if gender has no value label, but values
1 & 2 as above (run only one of the two versions, since inc can be
generated only once) */
gen inc=income-[income]1 if gender==1
replace inc=income-[income]2 if gender==2
/* Now compute the square term */
gen inc2=inc*inc
svy: mean inc2, over(gender) // estimate for inc2 is the estimated variance of income
/* Individual SD's. Log Scale */
nlcom .5*log([inc2]Male)
antilog
nlcom .5*log([inc2]Female)
antilog
/* CI for the ratio of SD's--No Log */
nlcom sqrt([inc2]Male/[inc2]Female)
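/* CI for ratio of SD's after Log Transformation. The factor .5 is the
square root on the log scale: log(A^.5)-log(B^.5) = .5*(log(A)-log(B)).
It does not change the t-statistic, but it is needed so that -antilog-
returns the ratio of SD's rather than the ratio of variances.
The t-statistic is apt to be very different from that of the no-log
version above */
nlcom .5*log([inc2]Male/[inc2]Female)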
antilog
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/