Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: -svy- commands with a pps sample vs. a simple random sample
From: Mike Lacy <[email protected]>
To: [email protected]
Subject: st: -svy- commands with a pps sample vs. a simple random sample
Date: Fri, 13 May 2011 15:58:58 -0600
Greetings,
I'm getting standard errors for means and regression coefficients
from the -svy- commands that surprise me enough to make me wonder
whether I am using them correctly. I find that SE(mean) and SE(b) are
smaller with a simple random sample than with a sample drawn with
probability proportional to size (pps), even though the pps sample is
constructed using a size variable that correlates about 0.9 with the
outcome of interest. Below is some code with simulated data that
shows what I am doing.
Background: I'm simulating data for an electrical utility usage
reduction experiment. I've made the simulated distribution of kwh
usage look like the real distribution. I assume that the percent of
kwh usage saved (savepct) following an experiment with the users is of
the form y = b0 + b1*x + b2*sqrt(x), where x is kwh usage, and that
this is the function of interest to be estimated.
// Create the simulated data
clear
set obs 25000
local sampleN = 500
set seed 83573
gen kwh = exp(rnormal(6.4, 0.65)) // kwh usage
gen savepct = -0.61 - 0.00014*kwh + 0.14 * sqrt(kwh) // looks realistic to me
replace savepct = savepct + rnormal(0,0.5) // gives r = 0.9 with kwh
// Population regression relationship
gen sqrtk = sqrt(kwh)
regress savepct kwh sqrtk // The true population relationship
//
// Sample the data, pps, and run a regression model
quiet summ kwh, detail
gen pps = `sampleN' * kwh/r(sum) // sampling prob to get pps and n = 500
// User written -gsample- , see -findit gsample-
gsample `sampleN' [aw = pps], gen(picked_pps) wor
gen pwt = 1/pps
svyset _n [pweight = pwt]
svy: mean savepct if picked_pps
svy: regress savepct kwh sqrtk if picked_pps
//
// Repeat analysis with simple random sampling
svyset, clear
gsample `sampleN', gen(picked_psrs) wor
gen psrs = `sampleN'/`=_N' // sampling prob
replace pwt = 1/psrs
svyset _n [pweight = pwt]
svy: mean savepct if picked_psrs
svy: regress savepct kwh sqrtk if picked_psrs
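//
// A few sanity checks on the two designs may help in reading the
// results above. This is only a sketch: it assumes the code above has
// just been run in the same do-file, so that pps, picked_pps, and
// picked_psrs exist. The names sum_pps, max_pps, and w_pps are
// introduced here purely for illustration.

```stata
// Sketch only: design sanity checks, not part of the analysis itself
quietly summarize pps
local sum_pps = r(sum)
local max_pps = r(max)
display "Sum of pps selection probabilities: " `sum_pps'  // should be close to the sample size
assert `max_pps' < 1          // no unit is selected with certainty
quietly count if picked_pps
display "Units picked, pps: " r(N)
quietly count if picked_psrs
display "Units picked, srs: " r(N)
// The pps weights, summed over the pps sample, estimate the population size
generate double w_pps = 1/pps
quietly summarize w_pps if picked_pps
display "Sum of pps weights: " r(sum) "   population size: " _N
```

// If the sum of the pps weights is far from the population size, or
// the assert fails, the weights rather than the -svy- calls would be
// the first place to look.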
Thanks,
=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lacy, Assoc. Prof.
Soc. Dept., Colo. State. Univ.
Fort Collins CO 80523 USA
(970)-491-6721