Dear List Members
I am encountering an odd problem when using pweights in regressions in Stata 10. The problem, which I've now come across when using data from two different epidemiological studies (NHANES and the Health and Retirement Study), is that applying pweights seems to alter the number of observations reported as included in the analysis.
An example is pasted below. When I saved the dataset in Stata 9 format and ran the same commands in Stata 9, the problem did not occur: all of the output was identical *except* for the number of observations. Since this has happened with two different datasets, and in Stata 10 but not in Stata 9, I'm guessing that it is something specific to Stata 10. A colleague has encountered something similar. Has anyone else come across this problem? I'm not too worried, since the estimates etc. are the same as in Stata 9, but if anyone else has encountered this I'd be interested to hear about it.
With regards
Iain
Example output:
. reg hrs06cog35 age gender
      Source |       SS       df       MS              Number of obs =   10478
-------------+------------------------------           F(  2, 10475) =  206.84
       Model |  10680.3419     2  5340.17097           Prob > F      =  0.0000
    Residual |  270441.234 10475  25.8177789           R-squared     =  0.0380
-------------+------------------------------           Adj R-squared =  0.0378
       Total |  281121.576 10477  26.8322588           Root MSE      =  5.0811

------------------------------------------------------------------------------
  hrs06cog35 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.121624   .0061161   -19.89   0.000    -.1336128   -.1096353
      gender |   .5013875   .1005095     4.99   0.000     .3043697    .6984053
       _cons |   29.77283   .4344999    68.52   0.000     28.92113    30.62454
------------------------------------------------------------------------------
. svyset [pweight=EWGTR]
      pweight: EWGTR
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>
. svy: reg hrs06cog35 age gender
(running regress on estimation sample)
Survey: Linear regression
Number of strata   =         1                  Number of obs      =     27821
Number of PSUs     =     27821                  Population size    =  15217686
                                                Design df          =     27820
                                                F(   2,  27819)    =     51.63
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0203

------------------------------------------------------------------------------
             |             Linearized
  hrs06cog35 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.1533668   .0207634    -7.39   0.000     -.194064   -.1126695
      gender |   .8502326   .1253616     6.78   0.000     .6045176    1.095948
       _cons |   32.94872   1.376334    23.94   0.000     30.25104    35.64641
------------------------------------------------------------------------------