Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Incomplete results of linear regression with interaction variable
From: David Hoaglin <[email protected]>
To: [email protected]
Subject: Re: st: Incomplete results of linear regression with interaction variable
Date: Wed, 20 Mar 2013 19:30:37 -0400
Jean-Baptiste,
The table that lists the content of your database shows that n > 1100
in each of the four "cells" in the cross-classification by race and
quality. If you fit the interaction model to the detailed data (all
1159 + 1188 + 1159 + 1169 = 4675 observations), you will have plenty
of degrees of freedom to support standard errors for the four
coefficients.
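
To make the degrees-of-freedom point concrete, here is a minimal sketch (in Python rather than Stata, only so that it runs directly from the four posted summary rows). It assumes that sd_call is the usual sample standard deviation with an n - 1 divisor and that the error variance is the same in all four cells, as ordinary least squares assumes. Under those assumptions, fitting the saturated interaction model to the detailed data would reproduce the four coefficients already reported, and the pooled within-cell variance would supply the standard errors that the 4-row fit cannot. The dictionary and variable names are illustrative, not part of the original post.

```python
import math

# Cell summaries copied from the posted listing: (race, quality) -> (mean, sd, n).
cells = {
    (0, 0): (0.0854185, 0.279624, 1159),
    (0, 1): (0.1069024, 0.3091192, 1188),
    (1, 0): (0.0569456, 0.2318388, 1159),
    (1, 1): (0.0675791, 0.2511297, 1169),
}

n_params = 4                                   # _cons, race, quality, r_q
n_total = sum(n for _, _, n in cells.values())
df_summary = len(cells) - n_params             # 4 rows - 4 parameters = 0
df_resid = n_total - n_params                  # residual df on the detailed data

# Assumption: sd_call uses the n - 1 divisor, so (n - 1) * sd^2 is the
# within-cell sum of squares; pooling these gives the residual SS.
ss_resid = sum((n - 1) * sd ** 2 for _, sd, n in cells.values())
root_mse = math.sqrt(ss_resid / df_resid)

m = {key: mean for key, (mean, _, _) in cells.items()}
inv_n = {key: 1.0 / n for key, (_, _, n) in cells.items()}

# In a saturated model the coefficients are contrasts of the cell means.
coef = {
    "_cons": m[0, 0],
    "race": m[1, 0] - m[0, 0],
    "quality": m[0, 1] - m[0, 0],
    "r_q": m[1, 1] - m[1, 0] - m[0, 1] + m[0, 0],
}

# Each standard error combines 1/n over the cells entering the contrast.
se = {
    "_cons": root_mse * math.sqrt(inv_n[0, 0]),
    "race": root_mse * math.sqrt(inv_n[0, 0] + inv_n[1, 0]),
    "quality": root_mse * math.sqrt(inv_n[0, 0] + inv_n[0, 1]),
    "r_q": root_mse * math.sqrt(sum(inv_n.values())),
}

print("residual df, 4-row fit:   ", df_summary)
print("residual df, detailed fit:", df_resid)
for name in coef:
    print(f"{name:>8}  coef = {coef[name]: .7f}  se = {se[name]:.7f}")
```

The 4-row regression has zero residual degrees of freedom, which is why Stata printed "." for every standard error. If the n - 1 assumption about sd_call does not hold, the standard errors from this sketch would be slightly off, and running the regression on the detailed data remains the direct route.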
In each cell, sd_call is substantially larger than mean_call. You did
not describe the nature of the data, but that pattern suggests that
the data may have substantial skewness, especially if the individual
observations cannot be negative. You may want to consider a
transformation that renders the distributions within the cells roughly
symmetric. Alternatively, you may be able to use a generalized linear
model with a random component that models the behavior of the data.
If the data are counts, a Poisson or negative binomial model may be
appropriate.
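
One quick check on the nature of the data can be made from the summary table alone. If every observation is 0 or 1, the sample standard deviation is fully determined by the mean p and the cell size n: sd = sqrt(p(1 - p) n/(n - 1)). The sketch below (Python, standard library only) compares that implied value with the posted sd_call in each cell. It assumes sd_call was computed with the n - 1 divisor; the names are illustrative.

```python
import math

# (mean_call, sd_call, n) for the four cells, copied from the posted listing.
cells = [
    (0.0854185, 0.279624, 1159),
    (0.1069024, 0.3091192, 1188),
    (0.0569456, 0.2318388, 1159),
    (0.0675791, 0.2511297, 1169),
]

gaps = []
for mean, sd, n in cells:
    # SD that a 0/1 variable with this mean and n would have (n - 1 divisor).
    implied_sd = math.sqrt(mean * (1.0 - mean) * n / (n - 1))
    gaps.append(abs(implied_sd - sd))
    print(f"mean = {mean:.7f}  reported sd = {sd:.7f}  implied sd = {implied_sd:.7f}")

print("largest gap:", max(gaps))
```

A gap near zero in every cell is consistent with a binary outcome, though it does not prove it. In that case the generalized linear model to consider would be a binomial one (for example, logistic regression) rather than a Poisson or negative binomial model, and no transformation would make a two-point distribution symmetric.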
David Hoaglin
On Wed, Mar 20, 2013 at 5:56 PM, Jean-Baptiste Peraldi
<[email protected]> wrote:
> Hi Statalisters,
>
> I want to run two linear regressions with dichotomous independent variables, one of which includes an interaction variable.
> It appears that the regression with the interaction variable gives results only for the coefficients.
>
> Here is the content of my database:
> ***
> . list
>      +----------------------------------------------------+
>      | race   quality   mean_call    sd_call      n   r_q |
>      |----------------------------------------------------|
>   1. |    0         0    .0854185    .279624   1159     0 |
>   2. |    0         1    .1069024   .3091192   1188     0 |
>   3. |    1         0    .0569456   .2318388   1159     0 |
>   4. |    1         1    .0675791   .2511297   1169     1 |
>      +----------------------------------------------------+
> ***
>
>
> The first regression is :
> " mean_call = cst + beta1*race "
> where "race" is a dichotomous (0 or 1) variable.
>
> The second regression contains an interaction variable:
> " mean_call = cst + beta1*race + beta2*quality + beta3*race*quality "
> where both "race" and "quality" are dichotomous (0 or 1) variables.
>
> When running the first regression, I get full results:
> ***
> . reg mean_call race
>
>       Source |       SS       df       MS              Number of obs =       4
> -------------+------------------------------           F(  1,     2) =    8.00
>        Model |  .001149076     1  .001149076           Prob > F      =  0.1056
>     Residual |  .000287314     2  .000143657           R-squared     =  0.8000
> -------------+------------------------------           Adj R-squared =  0.7000
>        Total |   .00143639     3  .000478797           Root MSE      =  .01199
>
> ------------------------------------------------------------------------------
>    mean_call |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         race |   -.033898   .0119857    -2.83   0.106    -.0854683    .0176723
>        _cons |   .0961604   .0084752    11.35   0.008     .0596947    .1326261
> ------------------------------------------------------------------------------
> ***
>
> For the second regression, I create the interaction variable and run the regression
> ***
> . gen r_q = race*quality
> . reg mean_call race quality r_q
>
>       Source |       SS       df       MS              Number of obs =       4
> -------------+------------------------------           F(  3,     0) =       .
>        Model |   .00143639     3  .000478797           Prob > F      =       .
>     Residual |           0     0           .           R-squared     =  1.0000
> -------------+------------------------------           Adj R-squared =       .
>        Total |   .00143639     3  .000478797           Root MSE      =       0
>
> ------------------------------------------------------------------------------
>    mean_call |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         race |  -.0284728          .        .       .            .           .
>      quality |   .0214839          .        .       .            .           .
>          r_q |  -.0108504          .        .       .            .           .
>        _cons |   .0854185          .        .       .            .           .
> ------------------------------------------------------------------------------
> ***
> Here we can see that we get results for the coefficients only, which is quite weird. I will be glad if you can help me solve this problem.
> Thanks for your consideration.
>
> Jean-Baptiste P.