st: why results after expanding data for probability weight is only close to the svy estimation
From: Amanda Fu <[email protected]>
To: [email protected]
Subject: st: why results after expanding data for probability weight is only close to the svy estimation
Date: Sat, 25 Dec 2010 13:57:53 -0500
Hi all,
I am trying to work out how to expand a data set according to the
survey sampling weights, so that an ordinary (non-svy) estimation on
the expanded data reproduces the svy estimation. Let's use OLS as an
example.
In the data set below, I first expand the data by the sampling weight
itself. The estimated coefficients are close, but not equal, to the
survey estimates. When I instead expand the data by sampling
weight*100, the coefficients are closer still. My question is: is the
difference between the svy estimation and the regular estimation on
the expanded data set caused by the change in sample size?
Thank you for your time!
Sincerely,
Amanda
---------------------------------start here----------------
. clear all
***********READ DATA*************************
. input id prob sweight y x
id prob sweight y x
1. 1 0.2 5 79 1200
2. 2 0.2 5 10 2700
3. 3 0.3 3.33 15 2500
4. 4 0.1 10 21 2800
5. 5 0.2 5 16 2480
6. end
. lab var prob "selection probability"
. lab var sweight "sampling weight,=1/prob"
. svyset [pw=sweight]
      pweight: sweight
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>
***********************************************************************************
* SVY ESTIMATION RESULT (1)
*
************************************************************************************
. svy: reg y x
(running regress on estimation sample)
Survey: Linear regression
Number of strata   =         1                  Number of obs      =         5
Number of PSUs     =         5                  Population size    =     28.33
                                                Design df          =         4
                                                F(   1,      4)    =     64.60
                                                Prob > F           =    0.0013
                                                R-squared          =    0.8968
------------------------------------------------------------------------------
             |             Linearized
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397343   .0049435    -8.04   0.001    -.0534598   -.0260088
       _cons |   123.3965   9.286147    13.29   0.000     97.61402     149.179
------------------------------------------------------------------------------
**************************EXPAND DATA SET BY SWEIGHT*************
. expand ceil(sweight)
(24 observations created)
****** I also tried . expand round(sweight); the difference from
result (1) is larger.
. list
id prob sweight y x
1 .2 5 79 1200
1 .2 5 79 1200
1 .2 5 79 1200
1 .2 5 79 1200
1 .2 5 79 1200
2 .2 5 10 2700
2 .2 5 10 2700
2 .2 5 10 2700
2 .2 5 10 2700
2 .2 5 10 2700
3 .3 3.33 15 2500
3 .3 3.33 15 2500
3 .3 3.33 15 2500
3 .3 3.33 15 2500
4 .1 10 21 2800
4 .1 10 21 2800
4 .1 10 21 2800
4 .1 10 21 2800
4 .1 10 21 2800
4 .1 10 21 2800
4 .1 10 21 2800
4 .1 10 21 2800
4 .1 10 21 2800
4 .1 10 21 2800
5 .2 5 16 2480
5 .2 5 16 2480
5 .2 5 16 2480
5 .2 5 16 2480
5 .2 5 16 2480
*********************************************************************************************
* EXPAND DATA SET BY SWEIGHT RESULT (2)
*
*********************************************************************************************
. reg y x
      Source |       SS       df       MS              Number of obs =      29
-------------+------------------------------           F(  1,    27) =  228.33
       Model |   14756.097     1   14756.097           Prob > F      =  0.0000
    Residual |   1744.9375    27  64.6273149           R-squared     =  0.8943
-------------+------------------------------           Adj R-squared =  0.8903
       Total |  16501.0345    28   589.32266           Root MSE      =  8.0391
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397927   .0026335   -15.11   0.000    -.0451961   -.0343893
       _cons |   123.3279   6.520714    18.91   0.000     109.9485    136.7073
------------------------------------------------------------------------------
*********************************************************************************************
* EXPAND DATA SET BY SWEIGHT*100 RESULT (3)
*
*********************************************************************************************
. expand ceil(sweight*100)
(2828 observations created)
. reg y x
      Source |       SS       df       MS              Number of obs =    2833
-------------+------------------------------           F(  1,  2831) =24613.57
       Model |  1470410.91     1  1470410.91           Prob > F      =  0.0000
    Residual |  169123.508  2831  59.7398474           R-squared     =  0.8968
-------------+------------------------------           Adj R-squared =  0.8968
       Total |  1639534.42  2832  578.931644           Root MSE      =  7.7292
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397343   .0002533  -156.89   0.000    -.0402309   -.0392377
       _cons |   123.3965   .6269718   196.81   0.000     122.1671    124.6259
------------------------------------------------------------------------------
----------------------------------------end-------------------------------------------------------------------
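One possible way to check where the difference comes from, without
physically expanding the data, is sketched below. It relies on the
fact that running regress on data expanded by an integer count is
equivalent to running it with that count as a frequency weight. The
variable names n1 and n100 are hypothetical and not part of the
original log. OLS point estimates depend only on the relative weights,
so the comparison suggests that rounding 3.33 up to 4 (relative weight
4/5 instead of 3.33/5) may explain the gap in result (2), rather than
the sample size itself. The standard errors from the frequency-weighted
runs are not comparable to the svy ones, because they treat the
replicated rows as independent observations.

```stata
clear
input id sweight y x
1 5    79 1200
2 5    10 2700
3 3.33 15 2500
4 10   21 2800
5 5    16 2480
end

* Probability-weighted regression on the original 5 observations
regress y x [pweight = sweight]

* Replication counts corresponding to the two expansions in the post
* (n1 and n100 are illustrative names, assumed for this sketch)
generate long n1   = ceil(sweight)       // 3.33 is rounded up to 4
generate long n100 = ceil(sweight*100)   // 3.33 becomes 333

* Frequency weights stand in for physically expanding the data
regress y x [fweight = n1]
regress y x [fweight = n100]

* Weight of observation 3 relative to observation 1 under each scheme
display 3.33/5
display 4/5
display 333/500
```

If the coefficients from the n100 run agree with the pweight run while
those from the n1 run do not, that would point to the rounding of the
weights, not the number of observations, as the source of the
discrepancy.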