Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: svylogitgof after logistic using subpop option
From: "Maria E. Montez Rath" <[email protected]>
To: [email protected]
Subject: Re: st: svylogitgof after logistic using subpop option
Date: Tue, 8 Mar 2011 16:05:49 -0800
Steve,
I got it now. I suppose that for the goodness-of-fit test all we need
are the predictions, so it doesn't matter that we are fitting a
conditional model.
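A minimal sketch of why the predictions are all that matter, using simulated data rather than the NIS: the point estimates from the -subpop()- fit and from the -if- fit coincide, so the predicted probabilities that -estat gof- groups are the same. The design and the variable names (stratum, psu, wt, sub, x, z, y) are made up for illustration.

```stata
* Simulated stratified cluster design (hypothetical names; not the NIS data)
clear
set seed 2011
set obs 2000
generate stratum  = ceil(_n/200)          // 10 strata of 200 observations
generate psu      = ceil(_n/20)           // 100 PSUs, 10 per stratum
generate wt       = 5                     // constant sampling weight
generate byte sub = runiform() < 0.3      // subpopulation indicator
generate byte x   = runiform() < 0.5
generate z        = rnormal()
generate byte y   = runiform() < invlogit(-1 + 0.8*x + 0.5*z)
svyset psu [pweight = wt], strata(stratum)

* Unconditional (subpopulation) fit: the one to report for standard errors
svy, subpop(sub): logistic y i.x z
matrix b_subpop = e(b)

* Conditional fit on the subpopulation only: same point estimates
svy: logistic y i.x z if sub == 1
matrix b_if = e(b)
display "max relative difference in coefficients: " mreldif(b_subpop, b_if)

* Goodness-of-fit test after the conditional fit
estat gof
```

Only the standard errors and the design degrees of freedom should differ between the two fits, not the fitted probabilities.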
Below are the results for the entire dataset. I don't know what to
make of the p-value of 1.0000, but I suppose that's another story.
Also, a lot of PSUs got dropped in the test step (433 instead of 1027),
so I don't know whether we are really testing the same model, although
the estimates are the same.
I get the same results using both data sets.
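For reference, the augmented data set (subpopulation records plus one record for each PSU that has no subpopulation members) could be built along the following lines. This is only a sketch on simulated data; the variable names and the design are assumptions, not the NIS layout.

```stata
* Simulated design (hypothetical names; not the NIS data)
clear
set seed 2011
set obs 2000
generate stratum  = ceil(_n/200)          // 10 strata
generate psu      = ceil(_n/20)           // 100 PSUs, 10 per stratum
generate wt       = 5
* Every third PSU contributes no subpopulation members at all
generate byte sub = mod(psu, 3) != 0 & runiform() < 0.4
generate byte x   = runiform() < 0.5
generate z        = rnormal()
generate byte y   = runiform() < invlogit(-1 + 0.8*x + 0.5*z)
svyset psu [pweight = wt], strata(stratum)

* Reference fit on the full data
svy, subpop(sub): logistic y i.x z
matrix V_full  = e(V)
scalar df_full = e(df_r)

* Augmented data: subpopulation rows plus one row per PSU with no members
bysort stratum psu (sub): generate byte has_sub = sub[_N]
by stratum psu: generate byte first = (_n == 1)
keep if sub == 1 | (has_sub == 0 & first)

* Same command on the augmented data
svy, subpop(sub): logistic y i.x z
matrix V_aug = e(V)
display "design df: " df_full " vs " e(df_r)
display "max relative difference in VCE: " mreldif(V_full, V_aug)
```

Because every sampled PSU is still represented, the PSU count per stratum, and hence the design df and the linearized variance, should be unchanged.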
Thanks for all your help.
Maria
. svy, subpop(if newpah==1): logistic dead i.aki2 i.diabetes i.mec_vent i.fem
(running logistic on estimation sample)
Survey: Logistic regression

Number of strata   =        58          Number of obs      =   8104197
Number of PSUs     =      1027          Population size    =  39615465
                                        Subpop. no. of obs =      1971
                                        Subpop. size       = 9686.4649
                                        Design df          =       969
                                        F(   4,    966)    =     27.18
                                        Prob > F           =    0.0000

------------------------------------------------------------------------------
             |             Linearized
        dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
  1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
  1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
       1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
------------------------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation members.
. // For the goodness of fit test, run :
. svy: logistic dead i.aki2 i.diabetes i.mec_vent i.fem if newpah==1
(running logistic on estimation sample)
Survey: Logistic regression

Number of strata   =        58          Number of obs      =      1971
Number of PSUs     =       433          Population size    = 9686.4649
                                        Design df          =       375
                                        F(   0,    375)    =         .
                                        Prob > F           =         .

------------------------------------------------------------------------------
             |             Linearized
        dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.aki2 |   3.511044          .        .       .            .           .
  1.diabetes |   .4748568          .        .       .            .           .
  1.mec_vent |   9.576589          .        .       .            .           .
       1.fem |    1.88229          .        .       .            .           .
------------------------------------------------------------------------------
Note: missing standard errors because of stratum with single sampling unit.
. estat gof
Logistic model for dead, goodness-of-fit test
F(9,367) = 0.00
Prob > F = 1.0000
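
The denominator degrees of freedom reported by -estat gof- can be reproduced from the design degrees of freedom. A sketch, assuming the F-adjusted mean residual test of Archer and Lemeshow (2006) is referred to F(g-1, f-g+2), with f the design df (PSUs minus strata) and g = 10 groups; the scalar names are made up here, and the counts are copied from the output above.

```stata
* Design df = number of PSUs minus number of strata
scalar df_subpop = 1027 - 58              // -subpop()- fit: 969
scalar df_if     = 433 - 58               // -if- fit: 375

* Assumed reference distribution: F(g - 1, f - g + 2) with g = 10 groups
scalar n_groups   = 10
scalar num_df     = scalar(n_groups) - 1
scalar den_subpop = scalar(df_subpop) - scalar(n_groups) + 2
scalar den_if     = scalar(df_if) - scalar(n_groups) + 2

display "subpop() fit: F(" num_df ", " den_subpop ")"
display "-if- fit:     F(" num_df ", " den_if ")"
```

Under that assumption the -if- fit gives F(9,367), as above, and the -subpop()- fit gives F(9,961), as in the earlier run quoted below, so the two tests appear to differ in the design df they inherit from the fitted model.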
On Tue, Mar 8, 2011 at 3:27 PM, Steven Samuels <[email protected]> wrote:
>
>
> Maria-
>
>
>
> You must use the -if- clause in the -svy logistic- statement. -estat gof-, even with an -if- clause, takes its degrees of freedom from the original logistic regression with the -subpop- statement.
>
> ***************************
> // For standard errors and tests, run:
> svy, subpop(if newpah==1): logistic dead i.aki2 i.diabetes i.mec_vent i.fem
> // For the goodness of fit test, run :
> svy: logistic dead i.aki2 i.diabetes i.mec_vent i.fem if newpah==1
> estat gof
> // estat gof if newpah==1 also works
> ***************************
>
> Note: you seem to have switched subpopulations in your code, with "subpop(pah)" in the -svy: logistic- statement and "if newpah==1" in the -estat gof- statement. This might have led to unexpected results.
>
>
> Steve
>
> Steven J. Samuels
> Consulting Statistician
> 18 Cantine's Island
> Saugerties, NY 12477 USA
> Voice: 845-246-0774
> Fax: 206-202-4783
> [email protected]
>
>
>
>
> On Mar 8, 2011, at 4:45 PM, Maria E. Montez Rath wrote:
>
> Steve,
>
> thanks for pointing me to -estat gof-.
>
> I just found out that the -estat- Stata manual had been updated and
> now includes the goodness of fit test for binary data. I believe that
> -estat gof- is reporting the F-adjusted mean residual test according
> to Archer and Lemeshow (2006).
>
> Reference
> Archer, K. J., and S. Lemeshow. 2006. Goodness-of-fit test for a
> logistic regression model fitted using survey sample data. Stata
> Journal 6: 97–105.
>
> But I still have a problem. I have 10 years of data, so I created a
> smaller dataset that includes my subpopulation augmented by one record
> for each PSU dropped when selecting the subpopulation. In theory this
> should work, because the problem with selecting the subpopulation
> directly and doing a conditional analysis is that the program has no
> way to know how many PSUs were sampled. By augmenting my dataset with
> the dropped PSUs, Stata can still compute n (the total number of PSUs
> sampled). I tested this by comparing the results from -svy: logistic-
> with the -subpop()- option using 1) the complete one year of data and
> 2) my augmented data for that same year.
>
> The results from -svy: logistic- are identical using both methods
> (point estimates and SEs are equal), but the results from -estat gof-
> are very different: using the entire data the test indicates a lack
> of fit, while using my augmented data it indicates good fit.
>
> So I'm still wondering how -estat gof- uses the results from
> -svy: logistic- with the subpopulation option.
>
> Thank you,
>
> Maria
>
> Using ALL data:
>
> . use pah08
> . svy, subpop(pah): logistic dead i.aki2 i.diabetes i.mec_vent i.fem
> . estat gof if newpah==1
>
> Logistic model for dead, goodness-of-fit test
>
> F(9,961) = 3126.59
> Prob > F = 0.0000
>
> Using AUGMENTED data:
>
> . use pahsubpop08, clear
> . svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
> . estat gof
>
> Logistic model for died, goodness-of-fit test
>
> F(9,961) = 0.66
> Prob > F = 0.7500
>
> On Sun, Mar 6, 2011 at 5:41 PM, Steven Samuels <[email protected]> wrote:
>>
>> -svylogitgof- is not an official command. The Statalist FAQ requests that you identify non-official commands as such and say where you got them. -svylogitgof- is not "subpopulation-aware". To use it for a subpopulation, you will first have to run -svy: logistic- with an -if- clause, not the subpop() option.
>>
>>
>> Steve
>> [email protected]
>>
>> On Mar 6, 2011, at 7:21 PM, Maria E. Montez Rath wrote:
>>
>> Hi!
>>
>> I'm using the NIS which follows a complex survey design to obtain the
>> odds of dying for patients with acute kidney disease in a
>> subpopulation. I'll be using 10 years of data which will make the
>> dataset too big. Since I'm interested in a subpopulation I found out
>> that in order to obtain correct standard errors, my dataset only needs
>> to include the subpopulation plus one record for each PSU that would
>> be dropped when creating the subpopulation dataset. This way, I can
>> still use the svy, subpop(): logistic command because Stata can still
>> compute the total number of hospitals sampled.
>>
>> While testing this theory I found that Stata will give me the same
>> results whether I use the entire sample or my augmented subpopulation
>> data but the goodness of fit test using svylogitgof is very different.
>> I also found that svylogitgof is reporting the number of observations
>> in the total sample and not the subpopulation number of observations.
>> Does this have any implication in the actual test?
>>
>> Below you can see the results from my test. First, is the output using
>> the entire dataset and second using my augmented subpopulation
>> dataset.
>>
>> The output from svy logistic is identical except for the reported
>> population size, which is wrong for my augmented dataset, as
>> expected. All the other results (ORs, SEs, t, ...) are equal.
>>
>> The output for the goodness of fit test is very different. As you can
>> see, the number of observations reported is the total number of
>> observations in the data, even though I'm doing a subpopulation
>> analysis. The number of groups used is also different. Using the
>> entire dataset, the test rejects the hypothesis that the model is a
>> good fit; using my augmented dataset, it does not. But they are the
>> same model, so how can the two analyses differ so much?
>>
>> I have read the paper on the test and I don't see where the number of
>> observations comes into play. Also, the paper assumed that the number
>> of groups was 10 (generating deciles of risk). In the new svylogitgof
>> update, the number of groups can vary.
>>
>> Can anyone help me? I don't know what to make of these results, and I
>> surely cannot use them, as I don't think the test applied to the
>> entire dataset is correct either.
>>
>> Thank you,
>>
>> Maria
>>
>> Using ALL data:
>>
>> . svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem
>> Survey: Logistic regression
>>
>> Number of strata   =        58          Number of obs      =   8104197
>> Number of PSUs     =      1027          Population size    =  39615465
>>                                         Subpop. no. of obs =      1971
>>                                         Subpop. size       = 9686.4649
>>                                         Design df          =       969
>>                                         F(   4,    966)    =     27.18
>>                                         Prob > F           =    0.0000
>>
>> ------------------------------------------------------------------------------
>>              |             Linearized
>>         dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>>        1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
>> ------------------------------------------------------------------------------
>> Note: 2 strata omitted because they contain no subpopulation members.
>>
>> . svylogitgof
>> Number of observations = 8104197
>> F-adjusted test statistic = F(3,967) = 7865.271
>> Prob > F = 0.000
>>
>>
>> Using AUGMENTED subpopulation data:
>>
>> . svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
>> Survey: Logistic regression
>>
>> Number of strata   =        58          Number of obs      =      2565
>> Number of PSUs     =      1027          Population size    = 12682.585
>>                                         Subpop. no. of obs =      1971
>>                                         Subpop. size       = 9686.4649
>>                                         Design df          =       969
>>                                         F(   4,    966)    =     27.18
>>                                         Prob > F           =    0.0000
>>
>> ------------------------------------------------------------------------------
>>              |             Linearized
>>         died | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>>     1.female |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
>> ------------------------------------------------------------------------------
>> Note: 2 strata omitted because they contain no subpopulation members.
>>
>> . svylogitgof
>> Number of observations = 2565
>> F-adjusted test statistic = F(5,965) = 1.096
>> Prob > F = 0.361
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/