Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Why do I get two different results from the same specification and the same dataset?
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Why do I get two different results from the same specification and the same dataset?
Date: Sun, 6 Nov 2011 14:28:05 +0000
The same point can be made differently. Yuval's story depends (a) on data that we cannot see and (b) in part on code that is not shown to us. There can be no objection to people working with their own data, naturally, but it doesn't make remote analysis any easier. However, any bug in Stata, if it exists, can be demonstrated with data accessible to all. More crucially, we have no way of checking Yuval's own creation of interactions.
So, the best hypothesis on this evidence is that Yuval did something differently in the code not shown to us.
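A minimal sketch of such a check, assuming the auto dataset that ships with Stata in place of Yuval's data (the names weight_int and mpg_int are hypothetical, chosen only to mirror his *_int variables). If the hand-made interactions are built correctly from a 0/1 indicator, the two regressions should report the same coefficients and standard errors:

```stata
* Sketch only: public data, not Yuval's; variable names are illustrative.
sysuse auto, clear

* Interactions via factor-variable notation
regress price weight mpg foreign c.weight#i.foreign c.mpg#i.foreign
estimates store fv

* Interactions created by hand (stored as double to avoid float rounding)
generate double weight_int = weight * foreign
generate double mpg_int    = mpg * foreign

* Confirm that each hand-made variable really is the intended product
assert weight_int == weight * foreign
assert mpg_int    == mpg * foreign

regress price weight mpg foreign weight_int mpg_int
estimates store manual

* Side-by-side comparison of coefficients and standard errors
estimates table fv manual, b(%12.4f) se
```

Running the same kind of assert on the original *_int variables would be one way to find an interaction that was built from the wrong variable.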
Nick
[email protected]
Joerg Luedicke
You must have made a mistake when creating your interaction terms
"directly". I cannot think of any other explanation.
On Sun, Nov 6, 2011 at 7:24 AM, Yuval Arbel <[email protected]> wrote:
> When I run the following regression
>
> reg bid_win dev_cost bid_num year area units min min_price
> c.dev_cost#i.min c.bid_num#i.min c.year#i.min c.area#i.min
> c.units#i.min
>
> I get the following output:
>
>
>       Source |       SS       df       MS              Number of obs =    6802
> -------------+------------------------------           F( 12,  6789) = 2891.19
>        Model |  7.0107e+17    12  5.8423e+16           Prob > F      =  0.0000
>     Residual |  1.3719e+17  6789  2.0207e+13           R-squared     =  0.8363
> -------------+------------------------------           Adj R-squared =  0.8361
>        Total |  8.3826e+17  6801  1.2326e+14           Root MSE      =  4.5e+06
>
> ------------------------------------------------------------------------------
>      bid_win |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>     dev_cost |  -.0782451   .1319286    -0.59   0.553    -.3368666    .1803764
>      bid_num |   53637.98   18087.12     2.97   0.003     18181.54    89094.41
>         year |   61991.65   44544.85     1.39   0.164    -25330.21    149313.5
>         area |   204.3705   105.0742     1.95   0.052    -1.607834    410.3488
>        units |   52691.04   11756.39     4.48   0.000     29644.82    75737.26
>          min |   1.04e+08   9.74e+07     1.06   0.288    -8.73e+07    2.95e+08
>    min_price |   3.956053   .0241168   164.04   0.000     3.908777     4.00333
>              |
>          min#|
>   c.dev_cost |
>           1  |   -.460682   .1391518    -3.31   0.001    -.7334631    -.187901
>              |
>          min#|
>    c.bid_num |
>           1  |  -43194.68   19552.09    -2.21   0.027     -81522.9   -4866.457
>              |
>   min#c.year |
>           1  |  -51639.35   48543.57    -1.06   0.287      -146800    43521.27
>              |
>   min#c.area |
>           1  |   186.6343   110.1857     1.69   0.090    -29.36416    402.6327
>              |
>  min#c.units |
>           1  |  -128569.4   12642.23   -10.17   0.000    -153352.2   -103786.7
>              |
>        _cons |  -1.25e+08   8.94e+07    -1.39   0.163    -3.00e+08    5.05e+07
> ------------------------------------------------------------------------------
>
>
> But when I define the interaction variables directly and run the
> regression, I get different results:
>
> . reg bid_win dev_cost bid_num year area units min min_price
> dev_cost_int bid_num_int year_int area_int units_int
>
>       Source |       SS       df       MS              Number of obs =    6802
> -------------+------------------------------           F( 12,  6789) = 2840.90
>        Model |  6.9905e+17    12  5.8254e+16           Prob > F      =  0.0000
>     Residual |  1.3921e+17  6789  2.0505e+13           R-squared     =  0.8339
> -------------+------------------------------           Adj R-squared =  0.8336
>        Total |  8.3826e+17  6801  1.2326e+14           Root MSE      =  4.5e+06
>
> ------------------------------------------------------------------------------
>      bid_win |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>     dev_cost |   .3458744   .1259233     2.75   0.006     .0990254    .5927235
>      bid_num |   50612.77   18218.04     2.78   0.005     14899.71    86325.84
>         year |    26138.3   44731.32     0.58   0.559     -61549.1    113825.7
>         area |    796.392   88.98841     8.95   0.000     621.9468    970.8371
>        units |  -56322.78   4522.886   -12.45   0.000    -65189.06   -47456.51
>          min |   2.11e+07   9.78e+07     0.22   0.829    -1.71e+08    2.13e+08
>    min_price |   3.914549   .0241269   162.25   0.000     3.867252    3.961845
> dev_cost_int |  -.9575921   .1316807    -7.27   0.000    -1.215728   -.6994567
>  bid_num_int |  -40191.13   19694.51    -2.04   0.041    -78798.54   -1583.728
>     year_int |  -10450.87   48755.43    -0.21   0.830    -106026.8    85125.05
>     area_int |  -443.5883   91.15801    -4.87   0.000    -622.2866     -264.89
>    units_int |  -.2338972    .131622    -1.78   0.076    -.4919176    .0241233
>        _cons |  -5.29e+07   8.98e+07    -0.59   0.556    -2.29e+08    1.23e+08
> ------------------------------------------------------------------------------
>
> My question is: why do I get two different results from the same specification?
> To give one example, note that the coefficient of "dev_cost" has
> changed sign and become significant.
>