Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: about residuals and coefficients
From: Lucas <[email protected]>
To: [email protected]
Subject: Re: st: about residuals and coefficients
Date: Wed, 18 Sep 2013 17:20:59 -0700
Dear David,
Hmm. At the point of learning that "As a predictor variable,
however, education is usually expressed as a set of categories,
corresponding to the major steps in the education system," I guess I
give up and stop reading. You win.
Sam
On Wed, Sep 18, 2013 at 12:04 PM, David Hoaglin <[email protected]> wrote:
> Dear Sam,
>
> Your comments reflect a number of misunderstandings.
>
> For any set of data, the phrasing "per unit increase" accurately
> reflects the underlying mathematics. Thus, it cannot be a disservice.
> As I have mentioned earlier, "other things constant" does not reflect
> the way that multiple regression actually works.
>
> Your example of regressing earnings on age and years of education is
> puzzling. In cross-sectional data the comparison would be between
> persons 30 years of age with 12 years of schooling and persons 30
> years of age with 12 + 1 years of schooling. As a predictor variable,
> however, education is usually expressed as a set of categories,
> corresponding to the major steps in the education system. The
> coefficients for those categories would be differences from the chosen
> reference category, adjusted for the contribution of age in the data.
>
> For the model
>
> $ = b_0 + b_1 Yrs Ed + b_2 Age + e
>
> the usual plot would put earnings on the z-axis, Yrs Ed on the x-axis,
> and Age on the y-axis, and the fitted equation would describe a plane
> (not two planes). You may be intersecting that plane with planes that
> are perpendicular to the y-axis and the x-axis, respectively. That
> picture does not alter the interpretation of b_1 and b_2.
>
> In the model
>
> $ = b_0 + b_1 Yrs Ed + b_2 Age + b_3 Age^2 + e
>
> the coefficients b_0, b_1, and b_2 have different definitions from the
> b_0, b_1, and b_2 in the previous model. The definition of each
> coefficient in a multiple regression includes the set of other
> predictors in the model. Now b_1 is the slope of Earnings against Yrs
> Ed, after adjusting for the contributions of Age and Age^2. Thus, the
> interpretation of b_1 in this model differs from the interpretation of
> b_1 in the previous model.
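
[Editorial illustration] The claim that b_1 is "the slope of Earnings against Yrs Ed, after adjusting for the contributions of Age and Age^2" can be checked numerically. The sketch below is only an illustration under stated assumptions: the eight observations are invented, earnings are in arbitrary units, and the helper names (solve, ols, residuals) are hypothetical and not from the thread. It fits both models, then removes the fitted contribution of Age and Age^2 from both earnings and Yrs Ed and takes the simple slope of what is left.

```python
# Illustrative sketch; all data values are invented for this example.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictors, y):
    """Least-squares coefficients of y on a constant plus the predictors."""
    cols = [[1.0] * len(y)] + predictors
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def residuals(predictors, y):
    """Residuals of y after regressing it on a constant plus the predictors."""
    coef = ols(predictors, y)
    cols = [[1.0] * len(y)] + predictors
    return [yi - sum(b * col[i] for b, col in zip(coef, cols))
            for i, yi in enumerate(y)]


# Hypothetical cross-section of eight persons.
yrs_ed = [16.0, 12.0, 12.0, 14.0, 12.0, 16.0, 12.0, 18.0]
age = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0]
age_sq = [a * a for a in age]
noise = [0.5, -0.4, 0.3, -0.2, 0.1, -0.3, 0.4, -0.4]
# Earnings built from an assumed rule that is quadratic in age.
earnings = [5 + 2 * s + 3 * a - 0.03 * a * a + e
            for s, a, e in zip(yrs_ed, age, noise)]

b_first = ols([yrs_ed, age], earnings)           # b_0, b_1, b_2
b_second = ols([yrs_ed, age, age_sq], earnings)  # b_0, b_1, b_2, b_3

# "After adjusting for Age and Age^2": residualize both earnings and
# Yrs Ed on the other predictors, then take the simple slope.
r_earn = residuals([age, age_sq], earnings)
r_ed = residuals([age, age_sq], yrs_ed)
adjusted_slope = (sum(u * v for u, v in zip(r_ed, r_earn))
                  / sum(u * u for u in r_ed))

print("b_1, first model: ", round(b_first[1], 4))
print("b_1, second model:", round(b_second[1], 4))
print("slope of adjusted earnings on adjusted Yrs Ed:",
      round(adjusted_slope, 4))
```

With these invented data the two b_1 values differ, and the slope computed from the adjusted variables agrees with b_1 from the second model up to rounding error. That agreement is a general property of least squares; the size of the difference between the two b_1 values depends entirely on the data.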
>
> The difference between "per unit change" and "per unit difference" is
> only semantic. I said "per unit increase" because that is how slopes
> are defined. The meaning should always be consistent with the context
> of the data.
>
> The geometric representations of those two models are sometimes
> useful, but least-squares fitting in a multiple regression involves a
> different geometry. If the data consist of n observations, y is a
> vector in n-dimensional space, and the fitted regression is the
> projection of the y-vector onto the subspace spanned by the constant
> vector, the Yrs Ed vector, and the Age vector in the first model and
> onto the subspace spanned by the constant vector, the Yrs Ed vector,
> the Age vector, and the Age^2 vector in the second model.
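
[Editorial illustration] The projection described above can be written compactly. This is a sketch in notation introduced here, not used in the thread: X is assumed to be the n-by-3 matrix whose columns are the constant vector, the Yrs Ed vector, and the Age vector (taken to be linearly independent), P is the projection matrix, \hat{y} is the vector of fitted values, and \hat{e} is the vector of residuals.

```latex
\hat{y} = P y, \qquad
P = X (X^{\top} X)^{-1} X^{\top}, \qquad
\hat{e} = y - \hat{y}, \qquad
X^{\top} \hat{e} = 0 .
```

The last equation says that the residual vector is orthogonal to every column of X. In the second model X gains a fourth column, the Age^2 vector, so the subspace, the projection, and in general every coefficient change.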
>
> David Hoaglin
>
> On Wed, Sep 18, 2013 at 11:08 AM, Lucas <[email protected]> wrote:
>> Dear David,
>>
>> This is why I do not understand why you prefer the "per unit increase"
>> phrasing. Many (probably most) analyses use cross-sectional data.
>> Thus, nothing is increasing or decreasing. The coefficients describe
>> the relationships, but there is no reason to suspect -- just on the
>> basis of cross-sectional data -- that change in an X will lead to the
>> slope's change in Y.
>>
>> For example, if I regress earnings on yrs of education and age, that
>> doesn't mean that a 30 year old with 12 years of schooling will be
>> expected to increase their earnings by the increment of the slope for
>> years of education by going to college for 1 year.
>>
>> It seems to me of the two potential disservices we can do to students,
>> teaching them "per unit increase" is far more misleading than teaching
>> them "other things constant" because at least the latter is an
>> accurate representation of what the cross-sectional data can allow.
>>
>> Think about it like this. If my model is:
>>
>> $ = b_0 + b_1 Yrs Ed + b_2 Age + e
>>
>> then the model summarizes two planes. The plane for YrsEd has a
>> constant slope, i.e., the slope of the plane for Yrs Ed does not vary
>> regardless of where you are on the plane for Age. And, vice versa. If
>> for theoretical, prior research, or other reasons I estimate:
>>
>> $ = b_0 + b_1 Yrs Ed + b_2 Age + b_3 Age^2 + e
>>
>> then the "plane" for Age has become a curved surface which means its
>> slope varies for values of Age. Still, the slope for YrsEd is
>> constant. So, the interpretation of the YrsEd slope seems unchanged.
>> And so on.
>>
>> Of course, observational data does not usually fix the values of the
>> independent variables, and experimenters can (up to a point). But
>> there are other ways of addressing this than changing the
>> interpretation so that it is either inaccurate or unduly confusing.
>>
>> Anyway, if we want to be as faithful as possible to what the data can
>> say, we should avoid "per unit change" in favor of "per unit
>> difference" because for cross-sectional data -- i.e., what is usually
>> used -- change is obviously beyond the ability of the data to support.
>>
>> Other issues (e.g., being on the support vs. extrapolating off the
>> support) obviously come in as well.
>>
>> Sam
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/