Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Maria E. Montez Rath" <maria.rath@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: svylogitgof after logistic using subpop option |
Date | Sun, 6 Mar 2011 16:21:12 -0800 |
Hi!

I'm using the NIS, which follows a complex survey design, to obtain the odds of dying for patients with acute kidney disease in a subpopulation. I'll be using 10 years of data, which will make the dataset too big. Since I'm interested in a subpopulation, I found out that, to obtain correct standard errors, my dataset only needs to include the subpopulation plus one record for each PSU that would otherwise be dropped when creating the subpopulation dataset. This way I can still use svy, subpop(): logistic, because Stata can still compute the total number of hospitals sampled.

While testing this theory, I found that Stata gives me the same results whether I use the entire sample or my augmented subpopulation data, but the goodness-of-fit test from svylogitgof is very different. I also found that svylogitgof reports the number of observations in the total sample and not the number of observations in the subpopulation. Does this have any implication for the actual test?

Below you can see the results from my test: first the output using the entire dataset, and second the output using my augmented subpopulation dataset. The output from svy: logistic is identical, the only difference being the population size reported, which is wrong for my augmented dataset, as it should be. All the other results (ORs, SEs, t, ...) are equal.

The output for the goodness-of-fit test, however, is very different. As you can see, the number of observations reported is the total number of observations in the data, even though I'm doing a subpopulation analysis. The number of groups used is also different: using the entire dataset, the test rejects the hypothesis that the model is a good fit, but using my augmented dataset we do not reject it. But they are the same model, so how can I get such different results? I have read the paper on the test and I don't see where the number of observations comes into play.
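For reference, the augmented dataset described above could be built along the following lines. This is only a minimal sketch: the variable names hospid (PSU), nis_stratum (stratum), and discwt (sampling weight) are assumed for illustration and are not taken from the output below, and hospid is assumed to identify each PSU uniquely. pah is the 0/1 subpopulation indicator used in the models.

```stata
* Assumed (hypothetical) design variables: hospid, nis_stratum, discwt.
* pah is the 0/1 subpopulation indicator.

* 1 if the PSU contains at least one subpopulation member, 0 otherwise
bysort hospid: egen byte has_sub = max(pah == 1)

* tag exactly one record within each PSU
egen byte one_per_psu = tag(hospid)

* keep the subpopulation, plus one placeholder record for every PSU
* that would otherwise vanish from the data
keep if pah == 1 | (has_sub == 0 & one_per_psu == 1)

* declare the design and fit the subpopulation model as before
svyset hospid [pweight = discwt], strata(nis_stratum)
svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
```

The placeholder records carry no subpopulation information; they are kept only so that the PSU count stays the same as in the full sample.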
Also, in the paper it was assumed that the number of groups used was 10 (generating deciles of risk). In the new svylogitgof update, this was changed so that the number of groups can vary.

Can anyone help me? I don't know what to make of these results, and I surely cannot use them, since I don't think the test applied to the entire dataset is correct either.

Thank you,
Maria

Using ALL data:

. svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem

Survey: Logistic regression

Number of strata   =        58                 Number of obs      =    8104197
Number of PSUs     =      1027                 Population size    =   39615465
                                               Subpop. no. of obs =       1971
                                               Subpop. size       =  9686.4649
                                               Design df          =        969
                                               F(   4,    966)    =      27.18
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
        dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
  1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
  1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
       1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
------------------------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation members.

. svylogitgof

Number of observations    = 8104197
F-adjusted test statistic = F(3,967) = 7865.271
Prob > F                  = 0.000

Using AUGMENTED subpopulation data:

. svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem

Survey: Logistic regression

Number of strata   =        58                 Number of obs      =       2565
Number of PSUs     =      1027                 Population size    =  12682.585
                                               Subpop. no. of obs =       1971
                                               Subpop. size       =  9686.4649
                                               Design df          =        969
                                               F(   4,    966)    =      27.18
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
        died | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
  1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
  1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
    1.female |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
------------------------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation members.

. svylogitgof

Number of observations    = 2565
F-adjusted test statistic = F(5,965) = 1.096
Prob > F                  = 0.361

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/