Statalist



Re: st: predicted proportions greater than 1 using -adjust- after GLM family(binomial) link (logit)


From   Austin Nichols <[email protected]>
To   [email protected]
Subject   Re: st: predicted proportions greater than 1 using -adjust- after GLM family(binomial) link (logit)
Date   Tue, 24 Feb 2009 22:04:12 -0500

I mean:

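sysuse nlsw88, clear
* Faked proportion outcome for illustration: share married within race
egen m=mean(married), by(race)
* Make -xi- omit the most prevalent category (grade 12) as the base
char _dta[omit] "prevalent"
qui xi: glm m i.age i.grade, family(bin) link(logit) robust
* Build "_Igrade_1=0 _Igrade_2=0 ..." so -adjust- predicts at the base grade
unab v: _Igrade*
loc s
foreach i of loc v {
 loc s "`s' `i'=0"
 }
preserve
* Replace the data with one row per age holding xb and its standard error stdp
qui adjust `s' if e(sample), by(age) se replace
* Map the linear prediction (log odds) back to a proportion
g p=invlogit(xb)
* 95% CI built on the log-odds scale and then transformed, so it stays in (0,1)
g ub=invlogit(xb+1.96*stdp)
g lb=invlogit(xb-1.96*stdp)
di as res "Adjusted for grade; predictions at grade=12"
li age p lb ub, noo sep(0)
restore
di as res "Unadjusted means by age"
mean m, over(age) nohe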

I would also add an alternative calculation that predicts over the whole
sample, replacing age with each observed value of age in turn, but it
would make no difference in this silly example using faked data.
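A minimal sketch of that alternative calculation, assuming Stata 11 or later (factor-variable notation instead of -xi-, which lets -predict- rebuild the age indicators after age is overwritten). The loop sets every estimation-sample observation's age to each observed value in turn, keeps each observation's own grade, and averages the predicted proportions. The names age_obs, mu_tmp and pm are hypothetical helpers introduced only for this illustration; -margins age- after the same -glm- should report the same averages in one step.

```stata
* Sketch only: assumes Stata 11+ (factor variables, -margins-)
sysuse nlsw88, clear
egen m = mean(married), by(race)
glm m i.age i.grade, family(binomial) link(logit) vce(robust)
* Keep the observed ages so they can be restored afterwards
clonevar age_obs = age
levelsof age if e(sample), local(ages)
matrix pm = J(1, `: word count `ages'', .)
local j = 0
foreach a of local ages {
    local ++j
    * Counterfactual: every observation is given age `a', keeping its own grade
    quietly replace age = `a'
    quietly predict double mu_tmp if e(sample), mu
    quietly summarize mu_tmp
    matrix pm[1, `j'] = r(mean)
    display as res "age " `a' ": mean predicted proportion " %6.4f r(mean)
    drop mu_tmp
}
* Restore the observed ages
quietly replace age = age_obs
drop age_obs
* The same averages in one step
margins age
```

Overwriting age and restoring it keeps the sketch close to the description above; -margins- does the same averaging internally and also reports delta-method standard errors.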

In any case, better to go down the demand system route:
http://www.stata-journal.com/sjpdf.html?articlenum=st0029
and see also
http://www.stata.com/meeting/snasug08/nelson_snasug08.pdf
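A worked version of the point in the quoted reply below, that exp(xb) is not the predicted proportion under a logit link. With a logit link, exp(xb) is the odds p/(1-p), which exceeds 1 whenever p > 0.5, so an aggregate share averaging about 0.58 will naturally produce values above 1. Assuming Gina's figures of 1.09 to 1.24 are the exp(xb) output of -adjust, exp- (as the command she shows suggests), converting them back gives proportions of roughly 0.52 to 0.55:

```latex
\begin{align*}
\ln\frac{p}{1-p} = x\beta
  &\;\Longrightarrow\; \exp(x\beta) = \frac{p}{1-p} \quad\text{(the odds)},\\
p &= \operatorname{invlogit}(x\beta) = \frac{\exp(x\beta)}{1+\exp(x\beta)},\\
\exp(x\beta) = 1.09 &\;\Longrightarrow\; p = \tfrac{1.09}{2.09} \approx 0.52,\\
\exp(x\beta) = 1.24 &\;\Longrightarrow\; p = \tfrac{1.24}{2.24} \approx 0.55.
\end{align*}
```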

On Tue, Feb 24, 2009 at 9:51 PM, Austin Nichols <[email protected]> wrote:
> Gina Bilenkij <[email protected]>:
> You should read
> http://www.stata-journal.com/sjpdf.html?articlenum=st0029
> for a more standard method of modeling expenditure shares.
>
> That said, it's not clear to me what you hope to do with your -adjust-
> call, but I am fairly certain what you have shown is not the actual
> code you are using. You should show an example of what you hope to
> achieve using data that is accessible to all (start with -sysuse- or
> -webuse-) and code that runs without error. One problem that is
> evident without data is that the exponentiated linear prediction exp(xb)
> is not the predicted proportion with a logit link; invlogit(xb) is.
> To get predicted proportions and a rough idea of CI from -adjust-
> after -xi: glm- you could do something like:
>
> sysuse nlsw88, clear
> egen m=mean(married), by(race)
> char _dta[omit] "prevalent"
> qui xi: glm m i.age i.grade, family(bin) link(logit) robust
> unab v: _Igrade*
> loc s
> foreach i of loc v {
>  loc s "`s' `i'=0"
>  }
> di as res "Unadjusted means by age"
> mean m, over(age) nohe
> preserve
> qui adjust `s' if e(sample), by(age) se replace
> g p=invlogit(xb)
> g ub=invlogit(xb+1.96*stdp)
> g lb=invlogit(xb-1.96*stdp)
> di as res "Adjusted for grade; predictions at grade=12"
> li age p lb ub, noo sep(0)
> restore
>
> [You should also give the URL for an old thread if it is relevant--but
> I don't see how the threads at
> http://www.stata.com/statalist/archive/2007-08/msg00098.html
> http://www.stata.com/statalist/archive/2007-08/msg00108.html
> are relevant here]
>
> On Tue, Feb 24, 2009 at 7:34 PM, Gina Bilenkij
> <[email protected]> wrote:
>> This is my first posting to statalist- will do my best to be clear. I am a public health PhD student, so I am still learning the basics of statistics and Stata.
>>
>> I am running an analysis of some expenditure data using several (14) dependent variables, which are the expenditures on different items, each as a proportion of total expenditure (range 0-1). I am interested in the association between income and the patterns of expenditure for these items.
>>
>> I am using -GLM family(binomial) link (logit)- (as suggested by the FAQ "How do you fit a model when the dependent variable is a proportion?" and by Stata tip 63, Baum 2008).
>>
>> The dependent variables are of 2 different types (individual items and aggregates):
>> A1, A2, A3, A4, A5, A_total
>> B1, B2, B3, B4, B5, B6, B7, B_total
>>
>> The independent variable of interest is income (quintiles), and I am adjusting for 4 other categorical covariates.
>>
>> I am running a glm to look at between group differences, then hoping to convert the coefficients back to adjusted proportions to aid interpretation.
>>
>> The code I am using is:
>>
>>        xi: glm A1 i.income i.x2 i.x3 i.x4 i.x5, family(binomial) link (logit) robust
>>
>>        adjust if x2==ref & x3==ref & x4==ref & x5==ref, by (income) exp ci
>>
>>
>> Everything is working well, except that the adjusted results for the aggregate proportion (A_total) are larger than 1, ranging between 1.09 and 1.24 across the quintiles. The mean of this variable is by far the largest; before running the GLM it is around 0.58. The other aggregate (B_total) is about 0.43 when adjusted (mean before running the GLM about 0.28). The adjusted proportions for all of the individual items (A1, A2, etc.) seem to be fairly close to their (quite low) pre-GLM means.
>>
>> I have plotted the deviance residuals and they appear to be roughly normally distributed around 0, so from what I have read, my model fit should be OK.
>>
>> Is there something strange going on that I am missing, or is it reasonable for adjusted proportions to go above 1 as the data is extrapolated from the GLM model?
>>
>> I have searched Statalist and found the thread "st:Binomial regression" from August 2007 that seems to be on a similar topic, but the glm link functions discussed there are different, and with my elementary knowledge it is all a little over my head.
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Gina Bilenkij
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2025 StataCorp LLC