Re: st: mvrs: out-of-sample prediction/definition of the splines
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: mvrs: out-of-sample prediction/definition of the splines
Date: Wed, 5 Dec 2012 17:47:02 +0000
This refers to -mvrs- by Patrick Royston.
SJ-7-1 st0120. Multivariable modeling with cubic regression splines: A principled approach
(help mvrs, uvrs, splinegen if installed) P. Royston and W. Sauerbrei
Q1/07 SJ 7(1):45-70
discusses how to limit instability and provide sensible regression models
when using spline functions in a multivariable setting
mvrs from http://www.homepages.ucl.ac.uk/~ucakjpr/stata
mvrs. Package for univariate and multivariable regression spline modelling
/ Programs by Patrick Royston. / Distribution-Date: 20121205 / version:
2.0.0 (uvrs), 2.0.0 (mvrs), 1.2.2 (splinegen) / Please direct queries to
Patrick Royston ([email protected])
Please remember to explain _where_ user-written packages you refer to
come from.
Patrick Royston is not a member of Statalist, but I forwarded this to him.
He writes
% begin Patrick R
In fact, -mvrs- creates the spline basis variables correctly when -all-
is specified. It automatically orthogonalizes them so that each has
mean 0 and variance 1 and they are mutually uncorrelated. With the
-all- option, knots are determined only from the estimation sample
(training==1 in your example) and are then applied to all observations
when the basis functions are calculated. The orthogonalization is
effectively just a linear transformation, so it should not affect the
out-of-sample predicted values from a regression on the basis
functions.
However, if you are sceptical, you can include the -noorthog- option
in -mvrs- (the default is -orthog-). This option was previously
undocumented, but below are some details as an addition to the help
file. Please try out your example both ways, with and without
orthogonalization.
-orthog- creates orthogonalized spline basis functions. After
orthogonalization, all the basis functions are uncorrelated and have
mean 0 and SD 1. The default is to create orthogonalized basis
functions. -noorthog- produces non-orthogonalized basis functions.
They are typically highly correlated, possibly resulting in numerical
instability when fitting the model.
Updated help files mvrs.sthlp and splinegen.sthlp are now on my UCL webpage.
% end Patrick R
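The two technical claims in the reply above, that a linear orthogonalization cannot change out-of-sample predictions from a model with a constant, and that non-orthogonalized basis functions tend to be highly correlated, can be checked numerically. What follows is a minimal Python/NumPy sketch, not the -mvrs- or -splinegen- code: the truncated-power cubic basis, the quantile knot positions, the Cholesky "whitening" used as the orthogonalization, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only (assumption): a truncated-power cubic basis and a
# Cholesky whitening step stand in for whatever basis and orthogonalization
# routine mvrs/splinegen actually use.

def cubic_basis(x, knots):
    """Truncated-power cubic spline basis: x, x^2, x^3 and (x - k)^3_+ per knot."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_orthogonalizer(B_train):
    """Learn a linear map from training rows only: centre, then whiten."""
    mu = B_train.mean(axis=0)
    L = np.linalg.cholesky(np.cov(B_train, rowvar=False))
    W = np.linalg.inv(L).T  # (B_train - mu) @ W has identity covariance
    return mu, W

def ols_predict(B_fit, y_fit, B_new):
    """Least-squares fit of y on a constant plus the basis; predict for B_new."""
    X = np.column_stack([np.ones(len(B_fit)), B_fit])
    beta, *_ = np.linalg.lstsq(X, y_fit, rcond=None)
    return np.column_stack([np.ones(len(B_new)), B_new]) @ beta

# Simulated data (hypothetical): one continuous predictor and a noisy outcome.
rng = np.random.default_rng(2012)
n = 300
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
train = rng.random(n) < 0.7  # plays the role of train==1

knots = np.quantile(x[train], [1 / 3, 2 / 3])  # knots from training rows only
B = cubic_basis(x, knots)                      # raw basis for every row
mu, W = fit_orthogonalizer(B[train])           # transform learned on training rows
Z = (B - mu) @ W                               # same transform applied to every row

# Claim 1: test-sample predictions agree up to rounding error.
pred_raw = ols_predict(B[train], y[train], B[~train])
pred_orth = ols_predict(Z[train], y[train], Z[~train])
max_pred_diff = np.max(np.abs(pred_raw - pred_orth))

# Claim 2: raw basis columns are highly correlated and the design is worse
# conditioned than with the orthogonalized columns.
R = np.corrcoef(B[train], rowvar=False)
max_raw_corr = np.max(np.abs(R[~np.eye(R.shape[0], dtype=bool)]))
ones = np.ones(int(train.sum()))
cond_raw = np.linalg.cond(np.column_stack([ones, B[train]]))
cond_orth = np.linalg.cond(np.column_stack([ones, Z[train]]))

print(f"max |difference| in test-sample predictions: {max_pred_diff:.2e}")
print(f"largest correlation between raw basis columns: {max_raw_corr:.3f}")
print(f"condition number, raw vs orthogonalized: {cond_raw:.1f} vs {cond_orth:.2f}")
```

Because mu and W are computed from the training rows only, the test-sample columns of Z are only approximately mean 0 and SD 1; that is expected. Any other invertible linear map (Gram-Schmidt, say) should give the same predictions, since the constant absorbs the centring and the span of the regressors is unchanged.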
Patrick Miller <[email protected]> wrote:
> I want to use an -mvrs- model for out-of-sample prediction, but
> unfortunately I have some trouble with the option "all".
>
> I have divided my data into training and test samples using a binary
> variable train. To build a model (without option "all") I use:
>
> (1) mvrs regress y x1 x2 x3 if train==1, degree(3)
>
> Suppose x1 is continuous and a spline transformation with two knots is
> applied. Hence new variables x1_0, x1_1 and x1_2 are generated.
>
> If I use the same model but with option "all":
>
> (2) mvrs regress y x1 x2 x3 if train==1, all degree(3)
>
> x1_0, x1_1 and x1_2 are generated as well, but the stored values for
> the training sample differ from the ones generated by model (1).
>
> My interpretation is that the transformation for the test sample does
> not use only the information provided by the training sample; in
> fact, both training and test data are used for the transformation. In
> my opinion this is not a correct way to do out-of-sample testing.
>
> Is there any way to generate x1_0, x1_1 and x1_2 for the test sample
> based only on the information from the training sample?