
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: about residuals and coefficients

From	Lucas <[email protected]>
To	[email protected]
Subject	Re: st: about residuals and coefficients
Date	Wed, 18 Sep 2013 17:20:59 -0700

Dear David,

Hmm. At the point of learning that "As a predictor variable,
however, education is usually expressed as a set of categories,
corresponding to the major steps in the education system." I guess I
give up and stop reading.  You win.

Sam

On Wed, Sep 18, 2013 at 12:04 PM, David Hoaglin <[email protected]> wrote:
> Dear Sam,
>
> Your comments reflect a number of misunderstandings.
>
> For any set of data, the phrasing "per unit increase" accurately
> reflects the underlying mathematics.  Thus, it cannot be a disservice.
>  As I have mentioned earlier, "other things constant" does not reflect
> the way that multiple regression actually works.
>
> Your example of regressing earnings on age and years of education is
> puzzling.  In cross-sectional data the comparison would be between
> persons 30 years of age with 12 years of schooling and persons 30
> years of age with 12 + 1 years of schooling.  As a predictor variable,
> however, education is usually expressed as a set of categories,
> corresponding to the major steps in the education system.  The
> coefficients for those categories would be differences from the chosen
> reference category, adjusted for the contribution of age in the data.
>
> For the model
>
> $ = b_0 + b_1 Yrs Ed + b_2 Age + e
>
> the usual plot would put earnings on the z-axis, Yrs Ed on the x-axis,
> and Age on the y-axis, and the fitted equation would describe a plane
> (not two planes).  You may be intersecting that plane with planes that
> are perpendicular to the y-axis and the x-axis, respectively.  That
> picture does not alter the interpretation of b_1 and b_2.
>
> In the model
>
> $ = b_0 + b_1 Yrs Ed + b_2 Age + b_3 Age^2 + e
>
> the coefficients b_0, b_1, and b_2 have different definitions from the
> b_0, b_1, and b_2 in the previous model.  The definition of each
> coefficient in a multiple regression includes the set of other
> predictors in the model.  Now b_1 is the slope of Earnings against Yrs
> Ed, after adjusting for the contributions of Age and Age^2.  Thus, the
> interpretation of b_1 in this model differs from the interpretation of
> b_1 in the previous model.
>
> The difference between "per unit change" and "per unit difference" is
> only semantic.  I said "per unit increase" because that is how slopes
> are defined.  The meaning should always be consistent with the context
> of the data.
>
> The geometric representations of those two models are sometimes
> useful, but least-squares fitting in a multiple regression involves a
> different geometry.  If the data consist of n observations, y is a
> vector in n-dimensional space, and the fitted regression is the
> projection of the y-vector onto the subspace spanned by the constant
> vector, the Yrs Ed vector, and the Age vector in the first model and
> onto the subspace spanned by the constant vector, the Yrs Ed vector,
> the Age vector, and the Age^2 vector in the second model.
>
> David Hoaglin
>
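The projection description above can be checked numerically. What follows is only a minimal sketch, not anything posted in the thread: the data are synthetic, and the variable names (`earnings`, `yrs_ed`, `age`), the sample size, and the generating coefficients are all assumptions made for illustration. It fits both models by least squares and confirms that the residual vector is orthogonal to every vector spanning the subspace, which is what makes the fit a projection of the y-vector.

```python
import numpy as np

# Synthetic data, invented purely for illustration (not from the thread).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(25, 60, size=n)
yrs_ed = rng.integers(8, 21, size=n).astype(float)
earnings = 5.0 + 2.0 * yrs_ed + 0.5 * age + rng.normal(0.0, 3.0, size=n)


def project(y, X):
    """Least-squares fit of y on the columns of X.

    Returns the fitted vector (the orthogonal projection of y onto the
    column space of X) and the coefficient vector.
    """
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef, coef


def cosines_with_columns(r, X):
    """Cosine of the angle between r and each column of X."""
    return (X.T @ r) / (np.linalg.norm(X, axis=0) * np.linalg.norm(r))


ones = np.ones(n)
X1 = np.column_stack([ones, yrs_ed, age])           # first model
X2 = np.column_stack([ones, yrs_ed, age, age**2])   # second model

fit1, b_first = project(earnings, X1)
fit2, b_second = project(earnings, X2)
resid1 = earnings - fit1
resid2 = earnings - fit2

# Orthogonality: every cosine should be zero up to rounding error.
cos1 = cosines_with_columns(resid1, X1)
cos2 = cosines_with_columns(resid2, X2)

# Projecting an already-projected vector leaves it unchanged.
refit1, _ = project(fit1, X1)

print("cosines, first model: ", cos1)
print("cosines, second model:", cos2)
print("coefficients, first model: ", b_first)
print("coefficients, second model:", b_second)
```

The second subspace contains the first, so the residual sum of squares in the second model can never exceed that of the first.
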
> On Wed, Sep 18, 2013 at 11:08 AM, Lucas <[email protected]> wrote:
>> Dear David,
>>
>> This is why I do not understand why you prefer the "per unit increase"
>> phrasing. Many (probably most) analyses use cross-sectional data.
>> Thus, nothing is increasing or decreasing. The coefficients describe
>> the relationships, but there is no reason to suspect -- just on the
>> basis of cross-sectional data -- that change in an X will lead to the
>> slope's change in Y.
>>
>> For example, if I regress earnings on yrs of education and age, that
>> doesn't mean that a 30 year old with 12 years of schooling will be
>> expected to increase their earnings by the increment of the slope for
>> years of education by going to college for 1 year.
>>
>> It seems to me of the two potential disservices we can do to students,
>> teaching them "per unit increase" is far more misleading than teaching
>> them "other things constant" because at least the latter is an
>> accurate representation of what the cross-sectional data can allow.
>>
>> Think about it like this.  If my model is:
>>
>> $ = b_0 + b_1 Yrs Ed + b_2 Age + e
>>
>> then the model summarizes two planes.  The plane for YrsEd has a
>> constant slope, i.e., the slope of the plane for Yrs Ed does not vary
>> regardless of where you are on the plane for Age.  And, vice versa. If
>> for theoretical, prior research, or other reasons I estimate:
>>
>> $ = b_0 + b_1 Yrs Ed + b_2 Age + b_3 Age^2 + e
>>
>> then the "plane" for Age has become a curved surface which means its
>> slope varies for values of Age. Still, the slope for YrsEd  is
>> constant.  So, the interpretation of the YrsEd slope seems unchanged.
>> And so on.
>>
>> Of course, observational data does not usually fix the values of the
>> independent variables, and experimenters can (up to a point). But
>> there are other ways of addressing this than changing the
>> interpretation so that it is either inaccurate or unduly confusing.
>>
>> Anyway, if we want to be as faithful as possible to what the data can
>> say, we should avoid "per unit change" in favor of "per unit
>> difference" because for cross-sectional data -- i.e., what is usually
>> used -- change is obviously beyond the ability of the data to support.
>>
>> Other issues (e.g., being on the support vs. extrapolating off the
>> support) obviously come in as well.
>>
>> Sam
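
Whether the Yrs Ed coefficient means the same thing in the two models discussed above can also be examined numerically. The sketch below rests on a standard result (the Frisch-Waugh-Lovell theorem): a coefficient in a multiple regression equals the simple slope of the residualized response on the residualized predictor, where both have been adjusted for the other predictors in that model. Everything concrete here is an assumption for illustration: the data are synthetic, the helper names `residualize` and `adjusted_slope` are hypothetical, and Yrs Ed is deliberately generated to depend on Age in a curved way so that the two adjustments differ.

```python
import numpy as np

# Synthetic data, invented purely for illustration (not from the thread).
rng = np.random.default_rng(1)
n = 500
age = rng.uniform(25, 60, size=n)
# Assumed curved dependence of schooling on age, so Age^2 matters.
yrs_ed = 16.0 - 0.005 * (age - 40.0) ** 2 + rng.normal(0.0, 1.0, size=n)
earnings = (10.0 + 2.0 * yrs_ed + 1.5 * age - 0.02 * age**2
            + rng.normal(0.0, 2.0, size=n))


def residualize(v, Z):
    """Residuals of v after least-squares regression on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


def adjusted_slope(y, x, Z):
    """Slope of y on x after adjusting both for the columns of Z."""
    ry = residualize(y, Z)
    rx = residualize(x, Z)
    return (rx @ ry) / (rx @ rx)


ones = np.ones(n)
Z1 = np.column_stack([ones, age])            # other predictors, first model
Z2 = np.column_stack([ones, age, age**2])    # other predictors, second model

# Coefficient on Yrs Ed from each full regression.
full1, *_ = np.linalg.lstsq(np.column_stack([ones, yrs_ed, age]),
                            earnings, rcond=None)
full2, *_ = np.linalg.lstsq(np.column_stack([ones, yrs_ed, age, age**2]),
                            earnings, rcond=None)
b1_first, b1_second = full1[1], full2[1]

# The same coefficients obtained as adjusted slopes.
slope_first = adjusted_slope(earnings, yrs_ed, Z1)
slope_second = adjusted_slope(earnings, yrs_ed, Z2)

print("b_1, first model: ", b1_first, " adjusted slope:", slope_first)
print("b_1, second model:", b1_second, " adjusted slope:", slope_second)
```

In both models the Yrs Ed slope is constant across Age, yet the two values of b_1 differ because each is defined relative to a different set of adjusting variables. The comparison is still between observations, so "per unit difference" remains the safer wording for cross-sectional data.
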
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

