Re: Re: st: about residuals and coefficients
From: Yuval Arbel <[email protected]>
To: statalist <[email protected]>
Subject: Re: Re: st: about residuals and coefficients
Date: Thu, 5 Sep 2013 22:54:06 -0700
David,
I see it as part of our job to educate our students (and the general
public) in how to interpret statistical information correctly, and
with some caution. There are many common mistaken beliefs about
statistical information among the general public, and they lead to
wrong conclusions.
A famous example that comes to mind (from a book by the mathematician
Haim Shapira) is a statement made in the O.J. Simpson trial ("among
all persons who hit their wives, only a fraction actually murdered
them"), whereas the correct question is: "among all the murdered
women, in what percentage of cases was the murderer the battering
husband?"
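To make the distinction concrete, here is a minimal sketch in Stata,
with purely hypothetical numbers chosen only to show the logic (they
are not taken from the actual case):

* illustrative annual rates, invented for the example:
scalar p_batterer = 1/2500    // battered woman murdered by her batterer
scalar p_other    = 1/20000   // battered woman murdered by someone else
display "P(murder | battering) = " p_batterer
display "P(batterer | battered and murdered) = " p_batterer/(p_batterer+p_other)

The first probability is tiny, but the second - the one that matters
in court - comes to about 89 percent under these made-up rates.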
I treat statistical inference the way Winston Churchill treated
democracy: a bad system, but the best among all the existing
alternatives. In my opinion, one should view regression analysis as a
very rough approximation. Recall that regression analysis "works" only
under a very long list of assumptions, of which collinearity is just
one problem - and we have not even started talking about functional
form, model specification, simultaneity, etc.
Yet, among the alternatives, when I make comparisons across groups I
would rather make them using the projections of a regression analysis
than a simple comparison of descriptive statistics.
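For example, with Stata's auto dataset (just to illustrate the
mechanics, not any substantive claim):

sysuse auto, clear
tabstat price, by(foreign)            // raw descriptive comparison
regress price i.foreign weight mpg    // comparison adjusted for covariates
margins foreign                       // adjusted group predictions

The raw gap from -tabstat- and the adjusted gap from -regress- and
-margins- can differ considerably once the covariates are taken into
account.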
An interesting discussion in this context is given by the Nobel Prize
winner Daniel Kahneman. He concludes that projected values produced by
a regression analysis are better than intuitive predictions (following
the principle that "the prediction should be weaker than the
information it relies on").
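That "weaker prediction" principle is visible directly in Stata: the
fitted values of a regression always have a smaller standard deviation
than the outcome itself, smaller by the factor sqrt(R-squared). A
quick illustration with the auto dataset:

sysuse auto, clear
regress price weight
predict price_hat, xb
summarize price price_hat   // SD(price_hat) = sqrt(R2) * SD(price)

The regression shrinks its predictions toward the mean, which is the
sense in which the prediction is "weaker" than the information it
relies on.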
On Thu, Sep 5, 2013 at 8:48 PM, David Hoaglin <[email protected]> wrote:
> Yuval,
>
> Part of your comment illustrates the practice that I am criticizing.
> In general, regression analysis, desirable or actual, estimates the
> effect of each predictor after adjusting for (not "controlling for")
> the contributions of the other predictors. One does not have equal
> conditions or ceteris paribus unless the collection of the data was
> designed to produce such structure.
>
> For sophisticated users of regression analysis, the distinction
> between "adjusting for" and "controlling for" may be largely semantic.
> For less-sophisticated users or consumers of the results, language
> such as "controlling for" gives the misleading impression that
> something is being held constant. For observational data, that is
> usually an overstatement.
>
> Many patterns of correlation among predictors are not substantial
> enough to qualify as "collinearity."
>
> I am not familiar with the example of repair expenditures on a Toyota
> car, but the negative coefficient on one of the predictors is
> implausible only if one tries to interpret it in the same way as the
> coefficient in the corresponding simple regression. In the model that
> uses both mileage and age as predictors, the coefficient of age
> summarizes the change in repair expenditures per unit increase in age
> after adjusting for simultaneous linear change in mileage. For a
> more-detailed understanding, one would have to look at the structure
> of the data (e.g., cross-sectional or longitudinal, the particular
> cars involved). If the two-predictor model is not an appreciably
> better fit than the one-predictor models, it would be appropriate to
> remove one of the predictors.
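>
> To see what "adjusting for" means mechanically, here is a small
> sketch in Stata (hypothetical variable names repair, mileage, and
> age; this is just the Frisch-Waugh partialling-out result, not
> anyone's actual data):
>
> regress repair mileage age
> regress age mileage
> predict age_resid, residuals
> regress repair age_resid   // same age coefficient as the full model
>
> The age coefficient in the two-predictor model equals the coefficient
> from regressing repair on the part of age that mileage does not
> explain; the standard errors differ, but the point estimate is
> identical.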
>
> David Hoaglin
>
> On Thu, Sep 5, 2013 at 5:00 PM, Yuval Arbel <[email protected]> wrote:
>> David,
>>
>> I believe there are two levels to regression analysis: 1) what is
>> desirable; 2) what is possible to achieve.
>>
>> In terms of desirability, the objective of regression analysis is to
>> isolate the effect of each covariate after controlling for other
>> factors (what we call "under equal conditions" or "ceteris paribus").
>>
>> In terms of what can actually be achieved, the degree of success
>> depends (among other things) on the degree of collinearity.
>>
>> High and low collinearity are dealt with in every econometrics
>> textbook that I am familiar with.
>>
>> Moreover, the example of repair expenditures on a Toyota car as a
>> linear function of mileage and age of the car is very well known: it
>> yields a negative coefficient on one of the explanatory variables
>> (implying the implausible outcome that as the age of the car goes up,
>> repair expenditures go down). This problem is resolved when one of
>> these variables is omitted.
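>>
>> One can reproduce the flavor of this in Stata with simulated data
>> (purely artificial numbers, set up so that age adds nothing beyond
>> mileage):
>>
>> clear
>> set obs 200
>> set seed 12345
>> generate age = runiform()*10
>> generate mileage = 12000*age + rnormal(0, 2000)  // nearly collinear
>> generate repair = 0.05*mileage + rnormal(0, 500)
>> regress repair mileage age   // age coefficient is unstable in sign
>> regress repair age           // alone, age has the expected positive sign
>>
>> Because mileage and age are nearly collinear, the age coefficient in
>> the joint model is estimated very imprecisely and can come out
>> negative.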
>>
>> In terms of correct practice - if you get an implausible outcome, the
>> first thing you should rule out is high collinearity.
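>>
>> In Stata, the standard quick check after -regress- is -estat vif-
>> (a sketch using the hypothetical Toyota variables again):
>>
>> regress repair mileage age
>> estat vif               // VIF above roughly 10 is a common, if
>>                         // crude, warning sign of collinearity
>> correlate mileage age   // pairwise correlation of the predictors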
>>
>> At least the textbooks I know reflect this insight.
>>
>> P.S. There is a methodology to remedy collinearity called ORR
>> (ordinary ridge regression) - and I believe it is also available in
>> Stata. Economists do not like this methodology very much, because you
>> introduce bias into the model in order to decrease collinearity.
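>>
>> For instance, the user-written command -ridgereg- (Emad Shehata,
>> available via -ssc install ridgereg-) fits ordinary ridge regression;
>> if I remember the syntax correctly, a sketch with the hypothetical
>> Toyota variables and an arbitrary ridge parameter would be:
>>
>> ssc install ridgereg
>> ridgereg repair mileage age, model(orr) kr(0.5)
>>
>> The kr() value controls how much bias is traded for a reduction in
>> variance, which is precisely the trade-off economists object to.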
--
Dr. Yuval Arbel
School of Business
Carmel Academic Center
4 Shaar Palmer Street,
Haifa 33031, Israel
e-mail1: [email protected]
e-mail2: [email protected]
You can access my latest paper on SSRN at: http://ssrn.com/abstract=2263398
You can access previous papers on SSRN at: http://ssrn.com/author=1313670