Paulo - The difference is that for nlpark1, the R2 is calculated as if there
were no constant ,so the model sum of squares is raw, not centered. This
inflates the "R^2". In the second case, where there are linear terms, -nl-
apparently calculates as if there is a constant term. This is analogous to
fitting linear regression models one with a constant, the other without. For
example the following 2 regressions give almost identical fits (root MSE =
1.84), yet the "R2"'s are different because in the second case, the constant
term is assumed known ( = 6).
. clear
. range x 0 1 101
obs was 0, now 101
. gen y=6+10*x+2*invnorm(uniform())
. reg y x
Source | SS df MS Number of obs =
101
-------------+------------------------------ F( 1, 99) =
226.42
Model | 770.936359 1 770.936359 Prob > F =
0.0000
Residual | 337.078803 99 3.4048364 R-squared =
0.6958
-------------+------------------------------ Adj R-squared =
0.6927
Total | 1108.01516 100 11.0801516 Root MSE =
1.8452
----------------------------------------------------------------------------
--
y | Coef. Std. Err. t P>|t| [95% Conf.
Interval]
-------------+--------------------------------------------------------------
--
x | 9.476307 .6297642 15.05 0.000 8.226718
10.7259
_cons | 6.251979 .3645024 17.15 0.000 5.528728
6.975231
----------------------------------------------------------------------------
--
. gen ym6=y-6
. reg ym6 x,noc
Source | SS df MS Number of obs =
101
-------------+------------------------------ F( 1, 100) =
969.68
Model | 3284.35327 1 3284.35327 Prob > F =
0.0000
Residual | 338.705947 100 3.38705947 R-squared =
0.9065
-------------+------------------------------ Adj R-squared =
0.9056
Total | 3623.05921 101 35.8718734 Root MSE =
1.8404
----------------------------------------------------------------------------
--
ym6 | Coef. Std. Err. t P>|t| [95% Conf.
Interval]
-------------+--------------------------------------------------------------
--
x | 9.852396 .3163941 31.14 0.000 9.224679
10.48011
----------------------------------------------------------------------------
--
In the case of -nl-, it's not obvious to me how -nl- decides what to use for
the "constant" term in it's output. I've wondered about this myself.
Al Feiveson
-----Original Message-----
From: Paulo Guimaraes [mailto:[email protected]]
Sent: Wednesday, June 18, 2003 10:29 AM
To: [email protected]
Subject: st: -nl- command bug?
Dear All,
While using the -nl- command I have come across a
rather puzzling result. Two different specifications
that fit the data equally well (the correlation from
their predicted values is 1) show an r-squared of
0.1 and 0.7 respectively.
It seems that in one of the models stata is
incorrectly calculating the sum of squares from the
model (I could replicate the value for one of the
models but not for the other).
Any hint to what is happening? It seems to me like a
bug but I may be overlooking something. I also dont
understand why the degrees of freedom is 3 in one
model and 4 in the other.
Here is the code and results:
**************************************************
. program define nlpark1
1. version 7
2. if "`1'"=="?" {
3. global S_1 "b2 b3 b4 b5"
4. global b2=0
5. global b3=.01
6. global b4=0
7. global b5=10
8. exit
9. }
10. replace `1'
=exp($b2)*exp($b3*time)+exp($b4)*exp(-$b5*time)
11. end
.
. program define nlpark2
1. version 7
2. if "`1'"=="?" {
3. global S_1 "b2 b3 b4 b5"
4. global b2=0
5. global b3=.01
6. global b4=0
7. global b5=10
8. exit
9. }
10. replace `1'
=$b2+$b3*time+exp($b4)*exp(-$b5*time)
11. end
.
.
. use final
. drop if total==.|total==0
(10 observations deleted)
. gen time=days0/365.25
. drop if time==.
(0 observations deleted)
. sum total if days0==0
Variable | Obs Mean Std. Dev.
Min Max
-------------+-----------------------------------------------------
total | 150 31.09333 12.82777
6.5 73
. local avdep=r(mean)
. local startv=log(`avdep'/2)
. nl park1 total, eps(1e-9) init(b2=
`startv',b4=`startv')
(obs = 1576)
Iteration 0: residual SS = 204943.1
...
Iteration 6: residual SS = 196084.1
Source | SS df MS
Number of obs = 1576
-------------+------------------------------
F( 4, 1572) = 1342.48
Model | 669821.354 4 167455.339
Prob > F = 0.0000
Residual | 196084.146 1572 124.735462
R-squared = 0.7736
-------------+------------------------------
Adj R-squared = 0.7730
Total | 865905.5 1576 549.432424
Root MSE = 11.1685
Res. dev. = 12074.57
(park1)
----------------------------------------------------------------------------
--
total | Coef. Std. Err. t P>|t|
[95% Conf. Interval]
-------------+--------------------------------------------------------------
--
b2 | 2.805342 .0511239 54.87 0.000
2.705063 2.90562
b3 | .0977053 .0379304 2.58 0.010
.0233058 .1721048
b4 | 2.673872 .0815743 32.78 0.000
2.513866 2.833877
b5 | 10.69553 2.007598 5.33 0.000
6.757673 14.63338
----------------------------------------------------------------------------
--
(SE's, P values, CI's, and correlations are
asymptotic approximations)
. predict tm1
(option yhat assumed; fitted values)
. nl park2 total, eps(1e-9) init(b2=
`startv',b4=`startv')
(obs = 1576)
Iteration 0: residual SS = 547186.4
...
Iteration 5: residual SS = 196080.5
Source | SS df MS
Number of obs = 1576
-------------+------------------------------
F( 3, 1572) = 60.60
Model | 22675.1691 3 7558.38969
Prob > F = 0.0000
Residual | 196080.524 1572 124.733158
R-squared = 0.1037
-------------+------------------------------
Adj R-squared = 0.1019
Total | 218755.693 1575 138.892503
Root MSE = 11.1684
Res. dev. = 12074.54
(park2)
----------------------------------------------------------------------------
--
total | Coef. Std. Err. t P>|t|
[95% Conf. Interval]
-------------+--------------------------------------------------------------
--
b2 | 16.42274 .9216482 17.82 0.000
14.61495 18.23053
b3 | 1.824629 .7069261 2.58 0.010
.438012 3.211246
b4 | 2.681091 .0842907 31.81 0.000
2.515757 2.846425
b5 | 10.5638 2.023805 5.22 0.000
6.594161 14.53344
----------------------------------------------------------------------------
--
* Parameter b2 taken as constant term in model & ANOVA
table
(SE's, P values, CI's, and correlations are
asymptotic approximations)
. predict tm2
(option yhat assumed; fitted values)
. corr tm1 tm2
(obs=1576)
| tm1 tm2
-------------+------------------
tm1 | 1.0000
tm2 | 1.0000 1.0000
**************************************************
thanks
Paulo Guimaraes
__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/