Arnold,
I cannot tell why the SEs are so different. The subpopulation n's and
the outcome means for "smkskul" are identical in all three analyses,
so that is not the problem. I do see some issues.
1. In SAS, the variables in the CLUSTER statement should identify
only the PSUs, the first-stage units. Including CLASSID as well
should, however, lead to smaller, rather than larger, standard errors.
2. Stata thinks that there are 46 strata in the entire sample, but SAS
thinks that there are 27. SUDAAN and SAS also disagree about the full
sample: by 778 in the number of observations (20,434 vs. 21,212) and
by about 1,000 in the weighted total (97,843 vs. 98,864).
3. The subpopulation seems confined to one PSU (one value of "skulid")
and one stratum, but Stata says that there are nine PSUs with
observations in the subpopulation. Perhaps Stata considers the
second-stage units, classrooms, as PSUs in this case, and the
others do not. If so, this could account for some of the discrepancy:
between-classroom variation could be small if there are 16
individuals in nine classrooms.
4. The outcome, according to SUDAAN, is missing for 93% of the
subpopulation sample.
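The 93% in item 4 can be reproduced from the SUDAAN listing, which
reports 226 observations in the subpopulation but a sample size of 16
for smkskul. The lines below are only a convenience: they use numbers
typed in by hand from the output you sent, need no data, and should be
run from a do-file so that the // comments are accepted.

```stata
* Numbers typed in from the quoted SUDAAN, SAS, and Stata output; no data needed
display %4.1f 100*(226 - 16)/226   // percent of the SUDAAN subpopulation missing smkskul
display %4.1f .104178/.014649      // SAS SE relative to the Stata SE
display %4.1f .09/.014649          // SUDAAN SE (as printed, rounded) relative to the Stata SE
```

The first line comes out just under 93, and the two ratios put the SAS
and SUDAAN standard errors at roughly six to seven times Stata's.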
I suggest that you make sure that variables and observations are
identical in the data sets (I notice two different weight variables);
make sure that the cluster, classroom, and stratum counts agree in SAS
and Stata. Rerun your analyses on this outcome and on one with no
missing values and submit your findings to the group with a copy to
Jeff Pitblado at Stata.
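On the Stata side, the checks might look something like the sketch
below. I am assuming the variable names from your svyset and subpop()
lines and that the same file is in memory; the tag variables
(one_strat, one_skul, one_class) are names I made up. The last two
commands replace your current svyset declaration, so treat them purely
as a diagnostic and reissue your original svyset afterwards.

```stata
* Sketch only: assumes strat, skulid, classid, fpc, w2f2f3, year, and
* smkskul exist in the data set currently in memory.

* Count strata, schools, and classrooms to compare with the SAS Data Summary
egen byte one_strat = tag(strat)            // hypothetical helper variable
egen byte one_skul  = tag(skulid)           // hypothetical helper variable
egen byte one_class = tag(skulid classid)   // hypothetical helper variable
count if one_strat
count if one_skul
count if one_class

* Count observations and missing outcome values inside the subpopulation
count if year==2008 & skulid==80001
count if year==2008 & skulid==80001 & missing(smkskul)

* Show strata and PSU counts under the design currently declared by svyset
svydescribe

* Diagnostic: declare schools as the only sampling stage and re-estimate
svyset skulid [pw=w2f2f3], strata(strat) fpc(fpc)
svy, subpop(if year==2008 & skulid==80001): mean smkskul
```

If the classroom count here matches the 1282 clusters that SAS
reports, that would suggest SAS is treating school-by-classroom
combinations as its clusters.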
Good luck!
Steve
On Tue, Jun 30, 2009 at 3:50 PM, Levinson,
Arnold<[email protected]> wrote:
> Steve,
> Sorry for overlooking the obvious. Here are the commands and output. (I note as usual the wonderful output efficiency of Stata over the others.)
> arnold
> _____________________
> *Stata*
> svyset skulid [pw=w2f2f3], strata(strat) fpc(fpc) || classid
>
> pweight: w2f2f3
> VCE: linearized
> Strata 1: strat
> SU 1: skulid
> FPC 1: fpc
> Strata 2: <one>
> SU 2: classid
> FPC 2: <zero>
>
> . svy, subpop(if year==2008 & skulid==80001): mean smkskul
> (running mean on estimation sample)
>
> Survey: Mean estimation
>
> Number of strata = 1 Number of obs = 131
> Number of PSUs = 9 Population size = 783.698
> Subpop. no. obs = 16
> Subpop. size = 120.542
> Design df = 8
>
> --------------------------------------------------------------
> | Linearized
> | Mean Std. Err. [95% Conf. Interval]
> -------------+------------------------------------------------
> smkskul | .5806258 .014649 .5468452 .6144064
> --------------------------------------------------------------
> Note: 45 strata omitted because they contain no subpopulation members
>
> ___________________
> SAS:
> PROC SURVEYMEANS DATA = ytabstest RATE = FPC;
> VAR SMKSKUL;
> STRATA STRAT;
> CLUSTER SKULID CLASSID;
> WEIGHT SKULWT;
> DOMAIN skulstrat;
> RUN;
>
> The SAS System 08:13 Tuesday, June 30, 2009 315
>
> The SURVEYMEANS Procedure
>
> Data Summary
> Number of Strata 27
> Number of Clusters 1282
> Number of Observations 21212
> Sum of Weights 98864
>
> Statistics
>                                      Std Error
> Variable  Label         N      Mean    of Mean  95% CL for Mean
> ----------------------------------------------------------------------
> SMKSKUL   SMKSKUL    1706  0.488438   0.015833  0.45735470  0.51952078
> ----------------------------------------------------------------------
>
> Domain Analysis: skulstrat
>
>                                                 Std Error
> skulstrat  Variable  Label         N      Mean    of Mean  95% CL for Mean
> ---------------------------------------------------------------------------------
> 0          SMKSKUL   SMKSKUL    1690  0.487015   0.016001  0.45560287  0.51842627
> 1          SMKSKUL   SMKSKUL      16  0.580626   0.104178  0.37624423  0.78500743
> ---------------------------------------------------------------------------------
>
> The SAS System 08:13 Tuesday, June 30, 2009 316
>
>
> PROC DESCRIPT DATA = ytabstest DESIGN = WOR;
> NEST STRAT SKULID CLASSID / MISSUNIT;
> TOTCNT TOTSAMP _MINUS1_ _MINUS1_;
> VAR SMKSKUL;
> CLASS SMKSKUL;
> WEIGHT SKULWT;
> SUBPOPN skulstrat = 1;
> RUN;
>
> S U D A A N
> Software for the Statistical Analysis of Correlated Data
> Copyright Research Triangle Institute August 2008
> Release 10.0
>
>
> DESIGN SUMMARY: Variances will be computed using the Taylor Linearization Method, Assuming a
> Without Replacement (WOR) Design
> Sample Weight: SKULWT
> Stage 1 Stratification Variable: STRAT
> Stage 1 Population Count Variable: TOTSAMP
> Stage 2 NEST Variable: SKULID (stage type is data dependent)
> Stage 2 Population Count Variable: _MINUS1_
> Stage 3 With Replacement Sampling Variable: CLASSID
> Stage 3 Population Count Variable: _MINUS1_
>
>
> Number of observations read : 20434 Weighted count : 97843
> Observations in subpopulation : 226 Weighted count : 1650
> Denominator degrees of freedom : 128
> Date: 06-30-2009 SUDAAN Page: 1
> Time: 13:38:12 Table: 1
>
> Frequencies and Values for CLASS Variables
> by: SMKSKUL.
>
> ----------------------------------
> SMKSKUL Frequency Value
> ----------------------------------
> Ordered
> Position:
> 1 6 0
> Ordered
> Position:
> 2 10 1
> ----------------------------------
>
>
> Date: 06-30-2009 SUDAAN Page: 2
> Time: 13:38:12 Table: 1
>
> Variance Estimation Method: Taylor Series (WOR)
> For Subpopulation: SKULSTRAT = 1
> by: Variable, SUDAAN Reserved Variable One.
>
> --------------------------------------------------------------------
> | | | SUDAAN Reserved Variable |
> | Variable | | One |
> | | |-----------------------------|
> | | | Total | 1 |
> --------------------------------------------------------------------
> | | | | |
> | SMKSKUL | Sample Size | 16 | 16 |
> | | Weighted Size | 120.54 | 120.54 |
> | | Total | 69.99 | 69.99 |
> | | Lower 95% Limit | | |
> | | Total | -39.85 | -39.85 |
> | | Upper 95% Limit | | |
> | | Total | 179.83 | 179.83 |
> | | Mean | 0.58063 | 0.58063 |
> | | SE Mean | 0.09 | 0.09 |
> | | Lower 95% Limit | | |
> | | Mean | 0.39690 | 0.39690 |
> | | Upper 95% Limit | | |
> | | Mean | 0.76435 | 0.76435 |
> --------------------------------------------------------------------
>
>> On Tue, Jun 30, 2009 at 12:41 PM, Levinson,
>> Arnold<[email protected]> wrote:
>>> Survey analysis experts:
>>> I have data from a stratified two-stage school survey. The first stage sampled schools within strata, the second sampled classrooms within selected schools.
>>>
>>> When estimating variables of interest at the school level, I get hugely different variance estimates running Stata vs. SAS or SUDAAN. Stata's estimates are generally a lot smaller than SAS's or SUDAAN's, and the latter two are similar or identical to each other.
--
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
845-246-0774