Thanks for the detailed explanation! This seems to work. However, if I use
the 'condition' option, I still get the same problem (matsize being too
small) when I try to make out-of-sample forecasts. 'diffuse' works OK, if
slowly, as advised.
C.
--- In [email protected], vwiggins@s... (Vince Wiggins,
StataCorp) wrote:
> Clarence Tam <Clarence.Tam@l...> asks whether he needs to have Stata/SE
> to estimate an arima model with an MA term at the 52nd lag,
>
> > [...] Model diagnostics suggest that there's a residual seasonal
> > correlation (at week 52) both in the ACF and PACF. My next step was
> > going to be to include an additional AR or MA term to account for
> > this, but I'm not sure how to do it. I've tried:
> >
> > . arima DS52.lnreps, ar(1) ma(1 52) noconstant
> >
> > but Stata says that the matsize is too small, even though it's set
> > at the maximum of 800 (I'm using Intercooled Stata 8.0).
> > Does anyone have any suggestions on how to get round this problem
> > (preferably ones that don't involve upgrading to Stata SE...)?
>
>
> Answer
> ------
>
> Clarence does not need to upgrade to SE.
>
> The message he received after his -arima- command should have been,
>
> matsize too small, must be max(AR, MA+1)^2
> use -diffuse- option or type -help matsize-
>
> In this case, with the maximum MA being 52, the message implies that a matrix
> size of 53^2=2809 is required, and that would indeed require Stata/SE. The
> first suggestion in the message, however, will let him use Intercooled Stata
> to estimate the model. If Clarence types,
>
> . arima DS52.lnreps, ar(1) ma(1 52) noconstant diffuse
>                                                ^^^^^^^
> he should be able to estimate the model.
>
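The arithmetic behind the error message can be checked with a few lines. This is only an illustrative sketch in Python, not anything Stata runs: the function names are made up for the example, and the limit of 800 is simply the Intercooled Stata 8.0 maximum quoted earlier in the thread.

```python
# Illustrative only: reproduces the max(AR, MA+1)^2 rule from the error message.
# Function names are hypothetical; 800 is the Intercooled limit quoted in the thread.

INTERCOOLED_MAX_MATSIZE = 800


def state_vector_length(max_ar_lag: int, max_ma_lag: int) -> int:
    """Length of the state vector, max(AR, MA+1)."""
    return max(max_ar_lag, max_ma_lag + 1)


def required_matsize(max_ar_lag: int, max_ma_lag: int) -> int:
    """Matrix size named in the error message, max(AR, MA+1)^2."""
    return state_vector_length(max_ar_lag, max_ma_lag) ** 2


# ar(1) ma(1 52): largest AR lag is 1, largest MA lag is 52.
needed = required_matsize(1, 52)
print(needed, needed > INTERCOOLED_MAX_MATSIZE)
```

The single MA term at lag 52 is what drives the requirement; the number of estimated parameters does not enter the formula at all.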
>
> Explanation
> -----------
>
> By default -arima- uses a Kalman filter to produce unconditional maximum
> likelihood estimates of the specified model. To obtain the unconditional
> estimates the Kalman filter must be initialized with the expected value of the
> initial state vector and the MSE of this vector. These initial values depend
> on the current parameter estimates and in computing the MSE we must invert a
> square matrix the size of the state vector -- max(AR, MA+1)^2. Thus, the need
> for such a large matrix. These are the most efficient estimates for the model
> because the initial state vector and its MSE are forced to conform to the
> current parameter estimates.
>
> We can, however, obtain slightly less efficient estimates by assuming that the
> initial state vector is zero and its variance is unknown and effectively
> infinite. This is what the -diffuse- option specifies. This assumption
> essentially down-weights the initial observations until the data itself can be
> used to develop a state vector and its MSE.
>
> With large datasets, the two estimates tend to be close.
>
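To see the down-weighting in the simplest possible case, here is a small Python sketch for a zero-mean AR(1), where the state is just the series itself. It is an assumption-laden toy, not Stata's algorithm: the names are hypothetical, and "effectively infinite" is stood in for by a large multiple (1e6) of sigma^2. The weight an observation's squared error gets in the Gaussian log likelihood is 1 over its prediction variance.

```python
import math

# Toy illustration for a zero-mean AR(1): y_t = phi * y_{t-1} + e_t.
# Names and the 1e6 "effectively infinite" factor are assumptions for this sketch.


def initial_variance(phi, sigma2, diffuse=False, big=1e6):
    """Prediction variance of the first observation."""
    if diffuse:
        return big * sigma2  # diffuse start: variance treated as huge
    if abs(phi) >= 1:
        raise ValueError("the unconditional variance needs |phi| < 1")
    return sigma2 / (1.0 - phi * phi)  # stationary variance implied by phi


def ar1_loglik(y, phi, sigma2, diffuse=False):
    """Gaussian log likelihood built from one-step prediction errors."""
    f1 = initial_variance(phi, sigma2, diffuse)
    terms = [(y[0], f1)]
    terms += [(y[t] - phi * y[t - 1], sigma2) for t in range(1, len(y))]
    return sum(-0.5 * (math.log(2 * math.pi * f) + e * e / f) for e, f in terms)


phi, sigma2 = 0.8, 1.0
w_unconditional = 1.0 / initial_variance(phi, sigma2)
w_diffuse = 1.0 / initial_variance(phi, sigma2, diffuse=True)
w_later = 1.0 / sigma2  # every observation after the first
print(w_unconditional, w_diffuse, w_later)
```

With the stationary start the first observation gets somewhat less weight than later ones; with the diffuse start it gets almost none. Only the first term differs between the two likelihoods, which is why the estimates tend to agree in long series.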
>
> Suggestion
> ----------
>
> Even though this model has only 4 parameters, including sigma, the Kalman
> filter iterations may be somewhat slow because the filter must maintain a
> state vector whose length is max(AR, MA+1) and will thus be
> flopping around some pretty large matrices to compute the likelihood at each
> observation. For this reason, I would recommend that Clarence use the
> -condition- option to estimate the model,
>
> . arima DS52.lnreps, ar(1) ma(1 52) noconstant condition
>                                                ^^^^^^^^^
>
> The -condition- option specifies conditional maximum likelihood estimates,
> rather than unconditional. These estimates do not require maintaining a state
> vector. Specifically, all pre-sample values of the white noise, e_t, and
> autocorrelated, u_t, disturbances are taken to be 0 and the MSE of e_t is
> taken to be constant over the entire sample. Effectively this means that the
> initial observations in the sample get just as much weight as the middle or
> end observations even though we know less about them. We know less because
> the process is autocorrelated and this implies that knowing the past
> observations tells us something about the current observation, and because
> nothing is known about the pre-sample observations.
>
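A rough Python sketch of what "pre-sample values taken to be 0" means for this model may help. It is not Stata's implementation: the function names, the dict of MA lags, and the toy series are all assumptions for illustration. With -noconstant- the series is the disturbance itself, u_t = rho*u_{t-1} + theta_1*e_{t-1} + theta_52*e_{t-52} + e_t, so the residuals can be built by a plain recursion with no state vector.

```python
import math

# Sketch of a conditional likelihood: pre-sample u_t and e_t are set to 0 and
# the variance of e_t is held constant. All names here are hypothetical.


def conditional_residuals(u, rho, ma):
    """Recursively recover e_t; `ma` maps MA lag -> coefficient."""
    e = []
    for t in range(len(u)):
        pred = rho * u[t - 1] if t >= 1 else 0.0  # pre-sample u is 0
        for lag, theta in ma.items():
            if t - lag >= 0:  # pre-sample e is 0, so earlier terms drop out
                pred += theta * e[t - lag]
        e.append(u[t] - pred)
    return e


def conditional_loglik(u, rho, ma, sigma2):
    """Gaussian log likelihood with the same variance for every residual."""
    e = conditional_residuals(u, rho, ma)
    n = len(e)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sum(x * x for x in e) / (2 * sigma2)


# Toy series; in the thread this would be the seasonally differenced lnreps.
u = [1.0, 0.5, -0.3, 0.8]
resid = conditional_residuals(u, rho=0.5, ma={1: 0.2, 52: 0.3})
print(resid)
print(conditional_loglik(u, rho=0.5, ma={1: 0.2, 52: 0.3}, sigma2=1.0))
```

Every residual enters with the same variance sigma2, which is the equal weighting described above. Note too that the lag-52 term contributes nothing until 52 residuals have accumulated, so the first year of weekly data is fit as if that term were absent.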
> What unconditional maximum likelihood effectively does is use the current
> estimates to imply information about the pre-sample while optimally
> down-weighting this information so that the initial observations get a little
> less weight than the remaining observations.
>
> What the -diffuse- option effectively does is to say we know nothing about the
> pre-sample and accordingly down-weights the initial observations in the sample
> even more.
>
> What conditional maximum likelihood effectively does is assume that the
> pre-sample values are their long-run expected value of zero, that we know this
> just as well as we know later values, and accordingly weights the initial
> observations equally with the remaining observations.
>
> With large datasets, it generally does not matter which method we use because
> the contribution of the initial observations is dominated by the remaining
> data. Note, however, that "large" must be used carefully when the process has
> large autocorrelation terms.
>
>
>
> -- Vince
> vwiggins@s...
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/