Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: correcting skewness of an indep variables
From: David Hoaglin <[email protected]>
To: [email protected]
Subject: Re: st: correcting skewness of an indep variables
Date: Sun, 21 Jul 2013 15:20:32 -0400
Dimitrie,
I understand that natural disasters may be regarded as random events,
but that would not rule out some sort of time trend in the number of
articles. Also, I suppose you have made the amounts of aid comparable
over time.
Skewness in a predictor would not affect the validity of its p-value.
I would not expect that aspect of the data ever to be a "primary
concern."
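David's point can be checked with a small Monte Carlo sketch (written in Python rather than Stata, purely for illustration). It assumes a simple one-predictor OLS model with normally distributed, constant-variance errors and a true slope of zero. The strongly right-skewed, count-like predictor is a hypothetical lognormal stand-in for a variable such as the number of articles. Under these assumptions the OLS t-test is exact whatever the distribution of x, so the share of simulations in which |t| exceeds the 5% two-sided cutoff should come out close to 0.05.

```python
import math
import random

N = 100          # observations per simulated data set
T_CRIT = 1.9845  # approx. 97.5th percentile of t with N - 2 = 98 df


def slope_t_stat(x, y):
    """Return the OLS t statistic for the slope b in y = a + b*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    return b / math.sqrt(s2 / sxx)


def null_rejection_rate(reps=2000, seed=2013):
    """Share of null simulations in which |t| exceeds the 5% two-sided cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Hypothetical skewed count predictor: about 20% zeros and a long
        # right tail (floor of a lognormal draw).
        x = [math.floor(rng.lognormvariate(1.0, 1.2)) for _ in range(N)]
        # Outcome unrelated to x: true slope is 0, errors are normal.
        y = [rng.gauss(0.0, 1.0) for _ in range(N)]
        if abs(slope_t_stat(x, y)) > T_CRIT:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(f"Rejection rate under the null: {null_rejection_rate():.3f}")
```

The sketch isolates skewness in the predictor. It does not address heavy-tailed or heteroskedastic errors or individually influential observations, which are separate questions from the shape of x's distribution.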
Setting aside outliers may be all right if those natural disasters are
recognizably special in some way.
David Hoaglin
On Sun, Jul 21, 2013 at 1:22 PM, Mihes, Dimitrie
<[email protected]> wrote:
> David,
>
> With regard to your question, time is not a predictor in my model, as natural disasters are triggered naturally and at random. To be more precise, the unit of analysis is each natural disaster to which the US contributed between 1992 and 2004.
>
> Going back to the issue of a linear relationship between the predictor and the outcome: I regressed the amount of aid (logged) on the number of articles about each event (a count) and then ran -cprplot no_of_articles, lowess lsopts(bwidth(1))-, both with and without the zero values. The relationship seemed non-linear, as indicated by -ovtest- (p = 0.0083). Even so, the bivariate relationship between aid and the number of articles was significant at p < 0.001. However, after I removed some of the outliers in the predictor and ran the same tests, again with and without the zero values, the relationship appeared linear, as indicated by the graph and by -ovtest- (p = 0.9669).
>
> Nevertheless, my primary concern was that the skewness would affect the validity of the p-value in the full regression model, since "number of articles" is almost always significant (p < 0.001), even when I use clustered or robust standard errors, or remove the outliers and the zero values.