Re: st: Svy regress using subpop and incorrect number of obs
From: [email protected] (Jeff Pitblado, StataCorp LP)
To: [email protected]
Subject: Re: st: Svy regress using subpop and incorrect number of obs
Date: Tue, 22 Jul 2008 10:31:47 -0500
Heather Ridolfo <[email protected]> is using -svy: regress- with the -subpop()-
option, and noticed that the reported sample size is smaller than the number
of observations in her dataset:
> I am using survey regression with the subpop command, see below.
>
> svyset psuscid [pweight = gswgt1], strata(region)
> svy, subpop(allsp): regress esteem white3 hisp child_sex
>
> However the number of observations in the output does not match the
> total number of cases in my dataset. I have 18924 cases in the original
> dataset, here the number of observations is only 18768.
> However, the subpopulation number of observations does appear correct
> (10224).
>
> Survey: Linear regression
>
> Number of strata   =       4      Number of obs      =     18768
> Number of PSUs     =     132      Population size    =  22000302
>                                   Subpop. no. of obs =     10224
>                                   Subpop. size       =  12582072
>                                   Design df          =       128
>                                   F(   3,    126)    =     47.41
>                                   Prob > F           =    0.0000
>                                   R-squared          =    0.0331
>
>
> In another output not only are the number of observations incorrect
> (should be 10244) but the PSUs are also lower.
>
> svyset psuscid [pweight = gswgt1], strata(region)
> svy, subpop(if bhsp == 1): regress esteem racebh
>
>
> Number of strata   =       4      Number of obs      =      6973
> Number of PSUs     =     126      Population size    = 5955176.3
>                                   Subpop. no. of obs =      4017
>                                   Subpop. size       = 3346350.8
>                                   Design df          =       122
>                                   F(   1,    122)    =     33.36
>                                   Prob > F           =    0.0000
>                                   R-squared          =    0.0253
>
> There are missing cases in some of the variables in my regression. Is
> stata dropping these cases from the number of original observations? I
> do specify in my subpop command to not include cases with missing data.
> If STATA is dropping observations from my original dataset due to
> incomplete data, is the survey design information from these
> observations retained in the calculation of the standard errors?
> Every example I have found of stata output using survey regression with
> the subpop command the number of observations matches the total number
> of cases in the dataset.
Heather should check that her Stata is fully up-to-date.  On 02apr2008, we
posted an ado-file update that fixed a problem similar to what Heather is
describing above.  Here is the corresponding entry from -help whatsnew-:

    5.  svy's linearized variance estimator was marking out observations that
        had missing values in the independent variables for observations
        outside the subpopulation.  This affects the estimated variance values
        when the primary sampling units were the individual observations and
        could decrease the design degrees of freedom.  Both of these effects
        are very slight and inversely related to the sample size.  This has
        been fixed.

Note that, prior to this update, entire PSUs could be dropped if every
observation within the PSU contained a missing value in one of the variables
in the model.  With an updated Stata, only observations within the subpop
that contain missing values are dropped.
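
One way to see which situation applies is to count the incomplete
observations inside and outside the subpopulation.  The following is only a
sketch that reuses the variable names from Heather's first command; -nmiss-
and -anymiss- are hypothetical helper names introduced here for illustration,
and it has not been run on her data.  Her first output is short by
18924 - 18768 = 156 observations.

```stata
* Check whether this copy of Stata is current
update query

* Count missing values across the model variables for each observation
* (nmiss and anymiss are hypothetical helper names, not from the original post)
egen nmiss = rowmiss(esteem white3 hisp child_sex)
generate byte anymiss = nmiss > 0

* Cross-tabulate incomplete observations against subpopulation membership
tabulate anymiss allsp, missing
```

If the incomplete observations sit in the cell with anymiss = 1 and
allsp = 0, and their count is close to the shortfall in "Number of obs", that
would be consistent with the pre-update behavior described above.
Observations with missing values in the design variables are not covered by
this check.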
Here is a simple experiment, using the auto data:
. sysuse auto

. gen sub = for & !missing(rep78)

. tab rep78 sub

    Repair |
    Record |          sub
      1978 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2
         2 |         8          0 |         8
         3 |        27          3 |        30
         4 |         9          9 |        18
         5 |         2          9 |        11
         . |         5          0 |         5
-----------+----------------------+----------
     Total |        53         21 |        74

. svyset _n

. svy, subpop(sub): regress mpg rep78
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =       1      Number of obs      =        74
Number of PSUs     =      74      Population size    =        74
                                  Subpop. no. of obs =        21
                                  Subpop. size       =        21
                                  Design df          =        73
                                  F(   1,     73)    =      0.60
                                  Prob > F           =    0.4409
                                  R-squared          =    0.0285

------------------------------------------------------------------------------
             |             Linearized
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |   1.486111   1.917962     0.77   0.441    -2.336381    5.308604
       _cons |   18.91667   7.106727     2.66   0.010      4.75298    33.08035
------------------------------------------------------------------------------
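
In the output above, the 5 observations with a missing rep78 all lie outside
the subpopulation, and the number of observations stays at 74.  The same check
can be scripted from the results that -svy- leaves in e().  This is a minimal
sketch; the -assert- lines encode the counts shown above, and the first one
would be expected to fail on a Stata that predates the 02apr2008 update.

```stata
sysuse auto, clear
generate byte sub = foreign & !missing(rep78)
svyset _n
svy, subpop(sub): regress mpg rep78

* Observations with missing rep78 outside the subpopulation (5 in the table above)
count if missing(rep78) & !sub

* Every observation in the dataset should remain in the estimation sample
assert e(N) == _N

* The subpopulation should hold the 21 foreign cars with nonmissing rep78
assert e(N_sub) == 21
```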
--Jeff
[email protected]