Re: st: Ocratio gives neither AIC nor BIC
From: Marcos Almeida <[email protected]>
To: [email protected]
Subject: Re: st: Ocratio gives neither AIC nor BIC
Date: Sun, 29 Dec 2013 17:33:12 -0200
Dear David and dear Statalisters,
Thank you for your reply and for the tips.
Regarding the details of the data set of around 1800 adults, the
independent variables are age_group, defined by age span (5
groups: 40-49; 50-59; 60-69; 70-79; > 80 years), gender, diabetes,
hypertension and dyslipidemia (all four binary variables) and the
continuous bmi (body mass index).
The dependent variable is a time-domain measure of heart rate
variability called pnn50. Pnn50 is the result of (validated)
computerized measurements made over a 24-hour electrocardiogram and
conveys the parasympathetic flow: the higher the values, the higher
the parasympathetic flow.
In this dataset, pnn50 has mean = 9; SD = 15; min = 0.01; max = 213.
Pnn50 was not normally distributed, even after log-transformation
(Shapiro-Francia test). It was also positively skewed.
The scatterplot of age versus pnn50 showed a trend of increasing
pnn50 with ageing, as expected. But there was too much variance and
many extreme values (as we see, from 0.01 to 213).
Also, the residuals were not normally distributed, even after
log-transforming. By the way, I chose the log transformation after
examining the output of the "ladder" command: it seemed to be the
best option, graphically and numerically, compared with the other
transformations.
Regarding the glm, since the dependent variable was continuous, I
chose the Gaussian family. Indeed, as you rightly noticed, the link must
be log, which is what I actually specified: I just mistyped “logit” in
the last message.
This was the command:
. glm pnn50 gender bmi diabetes hyper dyslipidemia group, family(gaussian) link(log)
Afterwards, I obtained the residuals (with “predict”) and inspected
their histogram and scatterplots. For example:
. twoway (scatter resid bmi)
That showed many atypical observations, I mean, a significant percentage
of them were widely scattered. The scatterplots against the other
independent variables were not very promising either, due to the same
problem: too much variation.
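A minimal sketch of the residual checks described above, assuming the glm shown earlier has just been fit. These are not necessarily the exact commands that were run; the names mu_hat and resid are illustrative, and deviance residuals are only one of the residual types that predict offers after glm.

```stata
* Assumes the glm command above is the most recent estimation.
* mu_hat and resid are illustrative variable names.
predict double mu_hat, mu          // fitted mean of pnn50
predict double resid, deviance     // deviance residuals

* Distribution of the residuals, with a normal density overlaid
histogram resid, normal

* Residuals against fitted values and against a continuous predictor
twoway (scatter resid mu_hat)
twoway (scatter resid bmi)
```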
The issues regarding the violation of assumptions in the model are the
same ones I wrote about in the first message (they are described below).
I gather my awkward mistyping (logit, when I wanted to say log) made
the message confusing. I hope it is clearer now.
That said, I gathered that the extreme variance of the dependent variable
could be somewhat “adjusted” if we categorized it according to the
quartiles. So I did.
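As an illustration of the categorization into quartiles, one possible way is xtile. The name pnn50_q is hypothetical and this may not be how the variable was actually created.

```stata
* pnn50_q is an illustrative name for the 4-category outcome.
* nquantiles(4) cuts pnn50 at its 25th, 50th and 75th percentiles.
xtile pnn50_q = pnn50, nquantiles(4)

* Check that the four categories have roughly equal sizes
tabulate pnn50_q
```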
That was the reason I tried to test multinomial, ordered and
generalized ordered models. And I used the AICs and BICs so as to
spot the best model. The generalized ordered model with the partial
proportional odds fitted best.
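A sketch of how such a comparison might be set up, assuming the quartile outcome is stored in a variable called pnn50_q (a hypothetical name) and the predictors are named as in the glm command above. gologit2 is user-written and must be installed first; the stored-estimate names are illustrative.

```stata
* pnn50_q and the m_* names are illustrative assumptions.
quietly mlogit pnn50_q gender bmi diabetes hyper dyslipidemia group
estimates store m_mlogit

quietly ologit pnn50_q gender bmi diabetes hyper dyslipidemia group
estimates store m_ologit

* gologit2 is user-written (ssc install gologit2)
quietly gologit2 pnn50_q gender bmi diabetes hyper dyslipidemia group, autofit
estimates store m_gologit

* One table with log likelihood, df, AIC and BIC for each stored model
estimates stats m_mlogit m_ologit m_gologit
```

Running estat ic after each model gives the same statistics one model at a time.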
But then, when I employed the user-written ocratio (for
continuation-ratio models), I could get neither AIC nor BIC. And I
really don’t know the reason. What is more, I wish to get them so as
to be able to compare this model with the others.
The problem was this: after ocratio, Stata 13 didn't present the
AIC and BIC. I understand they can be displayed by typing "aic". I also
typed "estat ic", and even tried the user-written fitstat. But I got
neither AIC nor BIC, only the red message: "estimates not found". That's
what puzzled me.
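For reference, these are the definitions that estat ic uses, where ln L is the maximized log likelihood, k the number of estimated parameters and N the number of observations. If ocratio reports a log likelihood in its output, the two criteria could in principle be computed by hand from it; whether that log likelihood is directly comparable with those of the other models is an assumption that would need checking.

```latex
\mathrm{AIC} = -2\ln L + 2k, \qquad
\mathrm{BIC} = -2\ln L + k\ln N
```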
That said, if this kind of model is considered inappropriate for the
task, please let me know. Indeed, it's the first time I have delved into
polytomous models.
I thank you, David, for your considerations and I still hope to get
further advice and suggestions from you and the fellows from
Statalist!
Best regards,
Marcos Almeida
Associate Professor of Medicine
UNIT
Brazil
>Date: Sat, 28 Dec 2013 21:14:43 -0500
>From: David Hoaglin <[email protected]>
>Subject: Re: st: Ocratio gives neither AIC nor BIC
> Dear Marcos,
>I have not seen any replies so far to your posting. Perhaps I can
>make a start, though I have more questions than answers.
>I did not see any information on your dependent variable, other than
>that it is continuous and very positively skewed. It would help me
>(and it may help others) to know more about the nature of that
>variable.
>Skewness of the dependent variable, considered alone, does not
>necessarily prevent you from using it in a regression model. It is
>more important to examine (e.g., in scatterplots) the relation between
>the dependent variable and each of the predictor variables. Some of
>those relations may account for the apparent skewness. After fitting
>an initial model, you should examine the residuals. The various plots
>may suggest that you transform the dependent variable (e.g., to a
>logarithmic scale).
>A GLM is a common alternative to using a transformation, but I don't
>understand why you chose the logit link. With a continuous dependent
>variable, I would have expected a log link.
>I will stop here. The rest of your analysis goes in a direction whose
>logic I do not understand.
>David Hoaglin
>On Fri, Dec 27, 2013 at 5:02 PM, Marcos Almeida <[email protected]> wrote:
>> Hello, Statalisters,
>> I have a dataset whose continuous dependent variable is very
> positively skewed. I decided to eschew regression analysis, even after
> log-transforming it, because I gather a generalized linear model gives
> better adjustments for this "situation".
>> After testing with glm family(gaussian) link(logit), it still
> presented signs of needing a better-fitting model. Then, I took the
> decision to create a new variable, that is, I categorized the
> dependent variable into quartiles. After that, I got 4 categories (up to
> the 25th percentile; from the 25th to the median; from the median to
> the 75th percentile; from the 75th up to the highest value).
>> And now comes my question.
>> I compared several models: the multinomial logit (mlogit), the ordinal
> logit (ologit), the generalized ordered model (gologit2, a user-written
> command) and, finally, gologit2 with partial proportional odds (autofit
> option), which pleased me most. I say so because the multinomial logit didn't
> comply with the IIA assumption (the much debated Hausman test), the
> ologit didn't comply with the proportional-odds assumption and the
> gologit2 with the autofit option dutifully adjusted for the partial
> proportional-odds.
>> After fitting each model, I calculated the AIC and BIC without any
> trouble. However, just for a last try, I decided to fit a
> continuation-ratio model. At first, I found it a reasonable option,
> theoretically speaking.
>> After installing the user-written ocratio, I did the estimations and
> all seemed to be just fine. But I noticed something wrong: the report
> didn't show the AIC statistic. That came as a surprise. I really don't
> understand what might have happened. I tried (almost) everything I
> knew in terms of commands: estat ic, for example. Also, I
> installed user-written commands, like fitstat, unfortunately to no
> avail. By the way, I carefully read a book (Generalized Linear Models
> and Extensions, Hardin and Hilbe, Stata Press, page 343), and, lo and
> behold, there ocratio gave the AIC after typing "aic". With much hope,
> I typed this command, again to no avail. Sadly
> enough, all I got was the message in red: estimates not found.
>> I checked the FAQs on the matter as well as potential queries on the
> Web, but nothing was found related to this. And I'm still perplexed.
>> My software is a weekly updated Stata 13 IC. I wonder if you could give
> me some advice.
>> Finally, I heartily thank you for your consideration.
>> Best regards,
>> Marcos Almeida,