st: Re: Correction: How to test for equality of variance in data with sampling weights
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: st: Re: Correction: How to test for equality of variance in data with sampling weights
Date: Fri, 25 May 2007 22:25:44 -0400
The code in my previous message had an error; the corrected version is below.
Yohannes wrote to me privately that he has a household indicator, say
hh_id; that household is the only PSU he can identify in the data
set; and that in a single urban setting, there is no stratum
variable. In that case he would set up his analysis with:
"svyset hh_id [pweight=finalwgt]"
As Stas indicated in his recent email, one can compute an SD as a
function of expectations and use either -testnl- or (my choice)
-nlcom-, because:
SD(income) = sqrt( E(income^2) - (E(income))^2 )
However, -nlcom- will produce an erroneous standard error unless the
variable is first centered by subtracting off the mean,
inc = income - mean(income), and then squared, inc2 = inc*inc.
Then E(inc2) = var(income), and sqrt(E(inc2)) estimates the SD of income.
As the SD is apt to have an asymmetric distribution, I suggest that
Johannes estimate the SE for log(SD) and then convert back to
the SD scale.
Johannes actually wants to compare SDs in two groups, assumed here to
be Male and Female for illustration. In that case, I
recommend that he compute a CI for the ratio, rather than for the
difference, and that he do this on the log scale and then convert
back to the ratio scale.
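The log-scale intervals described above can be written out as follows. This is only a sketch of the standard delta-method reasoning; the symbols (V-hat for the estimated mean of inc2 in a group, theta-hat for the log-scale quantity, t for the Student-t quantile on the design degrees of freedom) are notation introduced here for illustration, not taken from the original message.

```latex
% Single group: work with the log of the SD, then back-transform.
\hat\theta = \tfrac{1}{2}\log \hat V ,
\qquad
\widehat{\mathrm{SE}}(\hat\theta) \approx \frac{\widehat{\mathrm{SE}}(\hat V)}{2\,\hat V}
\quad\text{(delta method)}

\text{95\% CI for the SD:}\quad
\exp\!\bigl(\hat\theta \pm t_{0.975,\,\mathrm{df}}\;\widehat{\mathrm{SE}}(\hat\theta)\bigr)

% Two groups: the same idea applied to the ratio of SDs.
\hat\theta_R = \tfrac{1}{2}\log\!\frac{\hat V_{M}}{\hat V_{F}} ,
\qquad
\text{95\% CI for } \frac{\mathrm{SD}_M}{\mathrm{SD}_F}:\quad
\exp\!\bigl(\hat\theta_R \pm t_{0.975,\,\mathrm{df}}\;\widehat{\mathrm{SE}}(\hat\theta_R)\bigr)
```

Because the endpoints are exponentiated, the interval is always positive and is asymmetric around the point estimate, which is the reason for working on the log scale.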
Below is code that should work. It uses the linearization
method. Johannes might also wish to try a jackknife estimate of
the variance-covariance matrix.
Steve
/*************************** CODE FOLLOWS ***************************/
capture program drop _all
/* First a little program to back transform calculations done on the
log scale after -nlcom- */
program antilog
local lparm el(r(b),1,1)
local se sqrt(el(r(V),1,1))
local bound invttail(e(df_r),.025)*`se' //For 95% CI's
local parm exp(`lparm')
local ll exp(`lparm' - `bound')
local ul exp( `lparm' + `bound')
di "parm =" `parm' " ll = " `ll' " ul = " `ul'
end
/* Get Estimate of the Mean for each Group */
svy: mean income, over(gender)
/* If gender has value labels (e.g. 1=Male 2=Female) use the
following syntax */
gen inc=income-[income]Male if gender==1
replace inc=income-[income]Female if gender==2
/* Use this syntax INSTEAD if gender has no value label, but values
1 & 2 as above. Run only one of the two versions: -gen inc- fails
if inc already exists */
gen inc=income-[income]1 if gender==1
replace inc=income-[income]2 if gender==2
/* Now compute the square term */
gen inc2=inc*inc
/* The estimate for inc2 is the estimated variance of income */
svy: mean inc2, over(gender)
/* Individual SD's. Log Scale */
nlcom .5*log([inc2]Male)
antilog
nlcom .5*log([inc2]Female)
antilog
/* CI for the ratio of SD's--No Log */
nlcom sqrt([inc2]Male/[inc2]Female)
/* CI for ratio of SD's after Log Transformation. Omit 0.5 for ratio
of Variances
The t-statistic is apt to be very different from that of the no-log
version above*/
nlcom 0.5*log([inc2]Male/[inc2]Female)
antilog
/*--------END CODE-----------*/
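The jackknife alternative mentioned before the code could be sketched as below. This is an untested illustration, not part of the original code: it assumes that inc2 has already been created as above, that the data have been -svyset- with hh_id as the PSU, that gender carries the Male/Female value labels, and that the -antilog- program is still defined.

```stata
/* Sketch only: jackknife standard errors in place of linearization.
   Assumes inc2, the svyset declaration, and -antilog- from above. */
svy jackknife: mean inc2, over(gender)

/* Ratio of SD's on the log scale, then back-transformed */
nlcom 0.5*log([inc2]Male/[inc2]Female)
antilog
```

With every household its own PSU and no strata, the jackknife drops one household per replicate, so this may be slow in a large sample. Note also that inc2 stays centered at the full-sample group means; the sketch does not recompute the means within each replicate.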