Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: overfitting
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: overfitting
Date: Wed, 14 Aug 2013 14:47:36 +0100
Please use full real names on Statalist. This is explained in the FAQ.
Your output is, shall we say, not rich in clues about what you are
doing, given your variable names. Substantive background on data and
aims can be as valuable as Stata results for giving advice.
For example, -x4- could be a really important variable that should be
included in your model, if only to test a hypothesis. Or your
hypothesis could be that -x1- to -x5- are jointly important, in which
case you need to test that.
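As a minimal sketch of such joint tests, assuming the poster's model and variable names exactly as quoted below (which subset to test is a substantive choice, not something the output can settle):

```stata
* Sketch only: assumes the variables y and x1-x5 from the quoted post
regress y x1 x2 x3 x4 x5

* Joint test that the coefficients on x3, x4 and x5 are all zero
test x3 x4 x5

* Joint test of all five predictors; this reproduces the overall F
* statistic reported in the regression header
testparm x1 x2 x3 x4 x5
```

-test- and -testparm- here are Wald tests based on the most recent estimation results, so they must follow the -regress- call directly.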
All that said, coefficients near 0 don't look scientifically (or even
economically) credible here, regardless of P-values. If this were my
output, I'd move straight to a model with -x2- alone. I'd then want to
plot the residuals from that against the other -x-'s to get an idea of
what structure, if any, was being missed.
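One way to sketch that check, assuming the same variable names; the residual variable -res- and the graph names are hypothetical, chosen only for illustration:

```stata
* Sketch only: simple model with x2 alone, as suggested above
regress y x2

* Residuals from that model (res is an illustrative name)
predict double res, residuals

* Plot the residuals against each predictor left out of the model
foreach v of varlist x1 x3 x4 x5 {
    scatter res `v', yline(0) name(res_`v', replace)
}

* Put the four plots on one page for comparison
graph combine res_x1 res_x3 res_x4 res_x5
```

Any systematic pattern in one of these plots would suggest that the omitted variable carries structure the simple model is missing.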
Even more important, I'd look at the raw data again. Even on a log
scale, it is not out of the question that some outlier is warping your
results with a factor unknown even on Star Trek.
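A rough sketch of a first look for such outliers, again assuming the poster's variable names; the variable -cooksd- is a hypothetical name and the 4/n cutoff is only a common rule of thumb, not a formal test:

```stata
* Sketch only: eyeball the raw (logged) data for isolated points
graph matrix y x1 x2 x3 x4 x5, half

* Refit the full model, then plot leverage against
* normalized squared residuals
regress y x1 x2 x3 x4 x5
lvr2plot

* Cook's distance for each observation (cooksd is an illustrative name)
predict double cooksd, cooksd

* List observations above the conventional 4/n threshold
list y x1 x2 x3 x4 x5 cooksd if cooksd > 4/e(N) & !missing(cooksd)
```

Any observations flagged this way are candidates for checking against the original data source, not for automatic deletion.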
Nick
[email protected]
On 14 August 2013 14:18, <[email protected]> wrote:
> Hi StataList,
>
> I have the problem of overfitting. All the values are in log forms. What
> should I do?
> Thanks in advance.
>
> reg y x1 x2 x3 x4 x5
>
>       Source |       SS       df       MS              Number of obs =     160
> -------------+------------------------------           F(  5,   154) =22942.30
>        Model |  .214791334     5  .042958267           Prob > F      =  0.0000
>     Residual |  .000288357   154  1.8724e-06           R-squared     =  0.9987
> -------------+------------------------------           Adj R-squared =  0.9986
>        Total |  .215079691   159  .001352702           Root MSE      =  .00137
>
> ------------------------------------------------------------------------------
>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>           x1 |   -.039897   .0008289   -48.13   0.000    -.0415345   -.0382595
>           x2 |   1.063405   .0033848   314.17   0.000     1.056719    1.070092
>           x3 |  -.0008994   .0005395    -1.67   0.098    -.0019652    .0001664
>           x4 |   .0023422   .0024396     0.96   0.339    -.0024772    .0071615
>           x5 |   .0025709    .000884     2.91   0.004     .0008246    .0043172
>        _cons |  -.1434952   .0099081   -14.48   0.000    -.1630685   -.1239219
> ------------------------------------------------------------------------------
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/