Re: st: why results after expanding data for probability weight is only close to the svy estimation
From
Amanda Fu <[email protected]>
To
[email protected]
Subject
Re: st: why results after expanding data for probability weight is only close to the svy estimation
Date
Sun, 26 Dec 2010 16:36:38 -0500
Dear Mr. Waldo,
Thank you very much for helping me with my question!
I am sorry that I did not make my question very clear. To my surprise,
you gave me exactly the answer I was looking for. Yes, I meant
to ask why (2) is different from (1), and whether the estimation
results (just the coefficients) change if I rescale sweight by a
constant such as 10 or 100.
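
The rescaling point can be checked with a minimal sketch on the five observations from the quoted example below. It assumes that `regress` with `[pweight=]` gives the same point estimates as `svy: reg` after `svyset [pw=sweight]`. The names `w100`, `wceil`, `b_orig`, `b_scaled` and `b_ceil` are made up for illustration.

```stata
clear
input id prob sweight y x
1 0.2 5    79 1200
2 0.2 5    10 2700
3 0.3 3.33 15 2500
4 0.1 10   21 2800
5 0.2 5    16 2480
end

* Original sampling weights
regress y x [pweight = sweight]
scalar b_orig = _b[x]

* Multiplying every weight by the same constant keeps the relative
* weights, so the coefficients should not change
generate double w100 = sweight*100
regress y x [pweight = w100]
scalar b_scaled = _b[x]

* Rounding up changes the relative weights (3.33 becomes 4); this is
* what expand ceil(sweight) does implicitly
generate wceil = ceil(sweight)
regress y x [pweight = wceil]
scalar b_ceil = _b[x]

display b_orig " " b_scaled " " b_ceil
```

If this runs as expected, the first two slopes agree and the third differs slightly, which would match the gap between (1) and (2). Multiplying by 100 before rounding up turns 3.33 into exactly 333, so the relative weights survive the expansion.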
Thank you also for reminding me that the variance, unlike the
coefficients, is a different story once the data are expanded.
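
A hedged sketch of the variance point: frequency weights make `regress` behave as if each row were repeated that many times, which is what `expand` does, while probability weights keep the five sampled observations. The variable `f` (built here as `round(sweight*100)`) and the scalar names are assumptions for illustration only.

```stata
clear
input id prob sweight y x
1 0.2 5    79 1200
2 0.2 5    10 2700
3 0.3 3.33 15 2500
4 0.1 10   21 2800
5 0.2 5    16 2480
end

* Integer frequencies, as fweights require
generate long f = round(sweight*100)

* Frequency weights: same as running regress on the expanded data
regress y x [fweight = f]
scalar b_f  = _b[x]
scalar se_f = _se[x]

* Probability weights: same coefficients, but robust standard
* errors computed from the 5 sampled observations
regress y x [pweight = f]
scalar b_p  = _b[x]
scalar se_p = _se[x]

display "coef:      " b_f "  " b_p
display "std. err.: " se_f "  " se_p
```

The coefficients should match while the frequency-weighted standard error is far smaller, because the expanded rows are counted as independent observations.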
Thank you for your help! I appreciate it.
Amanda
On Sat, Dec 25, 2010 at 1:57 PM, Amanda Fu <[email protected]> wrote:
> Hi all,
>
> I am trying to figure out how to expand a data set according to the
> sampling weights of a survey, so that a regular estimation on the
> expanded data gives the same result as svy estimation. Let's use OLS
> as an example. In the following data set, I first expand the data
> simply according to the sampling weight. The results (estimated
> coefficients) are close but not equal to the survey estimates. Then I
> expand the data set by sampling weight*100, and the results are
> closer. My question is: is the difference between the svy estimation
> and the regular estimation on the expanded data set caused by the
> change in sample size?
>
> Thank you for your time!
>
> Sincerely,
> Amanda
>
> ---------------------------------start here----------------
> . clear all
> ***********READ DATA*************************
> . input id prob sweight y x
> id prob sweight y x
> 1. 1 0.2 5 79 1200
> 2. 2 0.2 5 10 2700
> 3. 3 0.3 3.33 15 2500
> 4. 4 0.1 10 21 2800
> 5. 5 0.2 5 16 2480
> 6. end
>
> . lab var prob "selection probability"
> . lab var sweight "sampling weight,=1/prob"
> . svyset [pw=sweight]
>
>       pweight: sweight
>           VCE: linearized
>   Single unit: missing
>      Strata 1: <one>
>          SU 1: <observations>
>         FPC 1: <zero>
>
> ***********************************************************************************
> * SVY ESTIMATION RESULT (1)
> *
> ************************************************************************************
>
> . svy: reg y x
> (running regress on estimation sample)
> Survey: Linear regression
>
> Number of strata   =         1                  Number of obs      =         5
> Number of PSUs     =         5                  Population size    =     28.33
>                                                 Design df          =         4
>                                                 F(   1,      4)    =     64.60
>                                                 Prob > F           =    0.0013
>                                                 R-squared          =    0.8968
>
> ------------------------------------------------------------------------------
>              |             Linearized
>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>            x |  -.0397343   .0049435    -8.04   0.001    -.0534598   -.0260088
>        _cons |   123.3965   9.286147    13.29   0.000     97.61402     149.179
> ------------------------------------------------------------------------------
> **************************EXPAND DATA SET BY SWEIGHT*************
> . expand ceil(sweight)
> (24 observations created)
> ****** I also tried expand round(sweight); the difference from
> result (1) was larger.
> . list
> id prob sweight y x
> 1 .2 5 79 1200
> 1 .2 5 79 1200
> 1 .2 5 79 1200
> 1 .2 5 79 1200
> 1 .2 5 79 1200
> 2 .2 5 10 2700
> 2 .2 5 10 2700
> 2 .2 5 10 2700
> 2 .2 5 10 2700
> 2 .2 5 10 2700
> 3 .3 3.33 15 2500
> 3 .3 3.33 15 2500
> 3 .3 3.33 15 2500
> 3 .3 3.33 15 2500
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 4 .1 10 21 2800
> 5 .2 5 16 2480
> 5 .2 5 16 2480
> 5 .2 5 16 2480
> 5 .2 5 16 2480
> 5 .2 5 16 2480
> *********************************************************************************************
> * EXPAND DATA SET BY SWEIGHT RESULT (2)
> *
> *********************************************************************************************
> . reg y x
>       Source |       SS       df       MS              Number of obs =      29
> -------------+------------------------------           F(  1,    27) =  228.33
>        Model |   14756.097     1   14756.097           Prob > F      =  0.0000
>     Residual |   1744.9375    27  64.6273149           R-squared     =  0.8943
> -------------+------------------------------           Adj R-squared =  0.8903
>        Total |  16501.0345    28   589.32266           Root MSE      =  8.0391
>
> ------------------------------------------------------------------------------
>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>            x |  -.0397927   .0026335   -15.11   0.000    -.0451961   -.0343893
>        _cons |   123.3279   6.520714    18.91   0.000     109.9485    136.7073
> ------------------------------------------------------------------------------
> *********************************************************************************************
> * EXPAND DATA SET BY SWEIGHT*100 RESULT (3)
> *
> *********************************************************************************************
> . expand ceil(sweight*100)
> (2828 observations created)
>
> . reg y x
>       Source |       SS       df       MS              Number of obs =    2833
> -------------+------------------------------           F(  1,  2831) =24613.57
>        Model |  1470410.91     1  1470410.91           Prob > F      =  0.0000
>     Residual |  169123.508  2831  59.7398474           R-squared     =  0.8968
> -------------+------------------------------           Adj R-squared =  0.8968
>        Total |  1639534.42  2832  578.931644           Root MSE      =  7.7292
>
> ------------------------------------------------------------------------------
>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>            x |  -.0397343   .0002533  -156.89   0.000    -.0402309   -.0392377
>        _cons |   123.3965   .6269718   196.81   0.000     122.1671    124.6259
> ------------------------------------------------------------------------------
> ----------------------------------------end-------------------------------------------------------------------
>