Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: overfitting


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: overfitting
Date   Wed, 14 Aug 2013 14:47:36 +0100

Please use full real names on Statalist. This is explained in the FAQ.

Your output is, shall we say, not rich in clues about what you are
doing, given your variable names. Substantive background on data and
aims can be as valuable as Stata results for giving advice.

For example, -x4- could be a really important variable that should be
included in your model, if only to test a hypothesis. Or your
hypothesis could be that -x1- to -x5- jointly are important, in which
case you need to test that.
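
A minimal sketch of such joint tests, assuming the variable names from the
quoted output. Which coefficients to test together depends on the hypothesis;
the subset below is only an illustration.

```stata
* Refit the full model from the quoted output
regress y x1 x2 x3 x4 x5

* Joint Wald test that all five slope coefficients are zero
* (equivalent to the overall F statistic reported in the regress header)
test x1 x2 x3 x4 x5

* Joint test of a subset, here the three predictors with
* coefficients closest to zero (an illustrative choice)
test x3 x4 x5
```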

All that said, coefficients near 0 don't look scientifically (or even
economically) credible here, regardless of P-values. If this were my
output, I'd move straight to a model with -x2- alone. I'd then want to
plot the residuals from that against the other -x-'s to get an idea of
what structure, if any, was being missed.
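
One way this check might look in Stata, as a sketch only. The names -res- and
res_x1, res_x3, ... are made up for illustration and are not in the original
post.

```stata
* Fit the model with -x2- alone
regress y x2

* Save the residuals from that model (variable name is an assumption)
predict double res, residuals

* Plot the residuals against each predictor left out of the model;
* visible structure suggests that the simple model is missing something
foreach v of varlist x1 x3 x4 x5 {
    scatter res `v', yline(0) name(res_`v', replace)
}
```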

Even more important, I'd look at the raw data again. Even on a log
scale, it is not out of the question that some outlier is warping your
results with a factor unknown even on Star Trek.
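
A sketch of some first checks on the data, assuming the variable names from
the quoted output. Cook's distance is one common influence measure, added here
as an illustration and not something prescribed above; -cooksd- is a made-up
variable name.

```stata
* Extremes and percentiles of each variable
summarize y x1 x2 x3 x4 x5, detail

* Scatter plot matrix to spot isolated points
graph matrix y x1 x2 x3 x4 x5, half

* Influence of each observation on the full model
regress y x1 x2 x3 x4 x5
predict double cooksd, cooksd

* List the five most influential observations
* (note: gsort reorders the data in memory)
gsort -cooksd
list y x1 x2 x3 x4 x5 cooksd in 1/5
```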
Nick
[email protected]


On 14 August 2013 14:18,  <[email protected]> wrote:
> Hi StataList,
>
> I have the problem of overfitting. All the values are in log forms. What
> should I do?
> Thanks in advance.
>
> reg y x1 x2 x3 x4 x5
>
>       Source |       SS       df       MS              Number of obs =     160
> -------------+------------------------------           F(  5,   154) =22942.30
>        Model |  .214791334     5  .042958267           Prob > F      =  0.0000
>     Residual |  .000288357   154  1.8724e-06           R-squared     =  0.9987
> -------------+------------------------------           Adj R-squared =  0.9986
>        Total |  .215079691   159  .001352702           Root MSE      =  .00137
>
> ------------------------------------------------------------------------------
>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>           x1 |   -.039897   .0008289   -48.13   0.000    -.0415345   -.0382595
>           x2 |   1.063405   .0033848   314.17   0.000     1.056719    1.070092
>           x3 |  -.0008994   .0005395    -1.67   0.098    -.0019652    .0001664
>           x4 |   .0023422   .0024396     0.96   0.339    -.0024772    .0071615
>           x5 |   .0025709    .000884     2.91   0.004     .0008246    .0043172
>        _cons |  -.1434952   .0099081   -14.48   0.000    -.1630685   -.1239219
> ------------------------------------------------------------------------------
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

