Clarence Tam asks whether he needs Stata/SE to estimate an -arima- model with
an MA term at the 52nd lag:
> [...] Model diagnostics suggest that there's a residual seasonal
> correlation (at week 52) both in the ACF and PACF. My next step was
> going to be to include an additional AR or MA term to account for
> this, but I'm not sure how to do it. I've tried:
>
> . arima DS52.lnreps, ar(1) ma(1 52) noconstant
>
> but Stata says that the matsize is too small, even though it's set
> at the maximum of 800 (I'm using Intercooled Stata 8.0).
> Does anyone have any suggestions on how to get round this problem
> (preferably ones that don't involve upgrading to Stata SE...)?
Answer
------
Clarence does not need to upgrade to SE.
The message he received after his -arima- command should have been,
matsize too small, must be max(AR, MA+1)^2
use -diffuse- option or type -help matsize-
In this case, with the largest MA lag being 52, the message implies that a
matsize of 53^2 = 2809 is required, and that would indeed require Stata/SE.
The first suggestion in the message, however, will let him use Intercooled
Stata to estimate the model. If Clarence types,
. arima DS52.lnreps, ar(1) ma(1 52) noconstant diffuse
                                               ^^^^^^^
he should be able to estimate the model.
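The arithmetic behind the error message can be checked with a tiny helper.
The function name state_dims is hypothetical; it simply assumes, as the
message says, that the state vector has length r = max(AR, MA+1), with AR and
MA the largest lags, and that -arima- needs a matsize of r^2.

```python
def state_dims(ar_lags, ma_lags):
    """Hypothetical helper: state-vector length r = max(AR, MA+1) and the
    matsize r^2 that the error message asks for."""
    p = max(ar_lags, default=0)   # largest AR lag
    q = max(ma_lags, default=0)   # largest MA lag
    r = max(p, q + 1)
    return r, r * r


r, n = state_dims([1], [1, 52])   # ar(1) ma(1 52)
print(r, n)      # 53 2809
print(n > 800)   # True: beyond the Intercooled maximum matsize
```

For ar(1) ma(1 52) this gives r = 53 and 2809, well above the Intercooled
limit of 800, which is why only the -diffuse- route works without Stata/SE.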
Explanation
-----------
By default -arima- uses a Kalman filter to produce unconditional maximum
likelihood estimates of the specified model. To obtain the unconditional
estimates, the Kalman filter must be initialized with the expected value of
the initial state vector and the MSE of this vector. These initial values
depend on the current parameter estimates. The state vector has length
max(AR, MA+1), and computing its MSE requires inverting a square matrix whose
dimension is the square of that length, max(AR, MA+1)^2; hence the need for
such a large matsize. These unconditional estimates are the most efficient
for the model because the initial state vector and its MSE are forced to
conform to the current parameter estimates.
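The initialization step can be sketched in a few lines of Python. This is not
Stata's code: it assumes a Hamilton-style state-space form (AR coefficients in
the first row of the transition matrix F, ones on the subdiagonal, the MA
coefficients entering only the observation equation, and a disturbance
covariance Q with sigma^2 in the top-left corner), and the helper names kron,
solve and initial_mse are hypothetical. The point is the size of the linear
system: for a state vector of length r, vec(P) = (I - F kron F)^-1 vec(Q) is
an r^2 x r^2 system, so r = 53 means 2809 unknowns.

```python
def kron(a, b):
    """Kronecker product of square matrices stored as lists of rows."""
    n, m = len(a), len(b)
    return [[a[i][j] * b[k][l] for j in range(n) for l in range(m)]
            for i in range(n) for k in range(m)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for k in range(c, n + 1):
                m[i][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def initial_mse(phi, sigma2, r):
    """Unconditional MSE P of the state vector, from
    vec(P) = (I - F kron F)^-1 vec(Q), for AR coefficients phi (len <= r)."""
    F = [[0.0] * r for _ in range(r)]
    for j, c in enumerate(phi):
        F[0][j] = c                                    # AR terms in row 1
    for i in range(1, r):
        F[i][i - 1] = 1.0                              # shift the state down
    K = kron(F, F)                                     # (r*r) x (r*r)
    n = r * r
    A = [[(1.0 if i == j else 0.0) - K[i][j] for j in range(n)]
         for i in range(n)]
    vec_q = [sigma2 if i == j == 0 else 0.0            # column-major vec(Q)
             for j in range(r) for i in range(r)]
    vec_p = solve(A, vec_q)
    return [[vec_p[j * r + i] for j in range(r)] for i in range(r)]
```

For an ARMA(1,1) with phi = 0.5 and sigma^2 = 1, initial_mse([0.5], 1.0, 2)
solves a 4 x 4 system and should return approximately
1/(1 - 0.5^2) * [[1, 0.5], [0.5, 1]], the covariance of two consecutive values
of the AR(1) process. With r = 53 the same system has 2809 rows.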
We can, however, obtain slightly less efficient estimates by assuming that the
initial state vector is zero and its variance is unknown and effectively
infinite. This is what the -diffuse- option specifies. This assumption
essentially down-weights the initial observations until the data themselves
can be used to develop a state vector and its MSE.
With large datasets, the two estimates tend to be close.
Suggestion
----------
Even though this model has only 4 parameters, including sigma, the Kalman
filter iterations may be somewhat slow because the filter must maintain a
state vector of length max(AR, MA+1), here 53, and will thus be flopping
around some pretty large matrices to compute the likelihood at each
observation. For this reason, I would recommend that Clarence use the
-condition- option to estimate the model,
. arima DS52.lnreps, ar(1) ma(1 52) noconstant condition
                                               ^^^^^^^^^
The -condition- option specifies conditional maximum likelihood estimates
rather than unconditional ones. These estimates do not require maintaining a
state vector. Specifically, all pre-sample values of the white noise, e_t, and
autocorrelated, u_t, disturbances are taken to be 0, and the MSE of e_t is
taken to be constant over the entire sample. Effectively, this means that the
initial observations in the sample get just as much weight as the middle or
end observations, even though we know less about them. We know less because
the process is autocorrelated, so past observations tell us something about
the current observation, and nothing is known about the pre-sample
observations.
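A minimal sketch of the conditional approach for this particular model,
assuming the ARMA disturbance is written u_t = phi*u_{t-1} + e_t +
theta_1*e_{t-1} + theta_52*e_{t-52}, with u_t the already differenced series
(no constant): every pre-sample u and e is set to 0, and one sigma^2 is used
for every observation. The function names and the dict of MA lags are
hypothetical, and this is an illustration of the idea rather than Stata's
code. Note that no state vector appears, only a running list of residuals.

```python
import math


def conditional_residuals(u, phi, theta):
    """Recover e_t from u_t = phi*u_{t-1} + e_t + sum_k theta[k]*e_{t-k},
    treating every pre-sample u and e as 0.

    theta maps each MA lag to its coefficient, e.g. {1: 0.3, 52: -0.4}.
    """
    e = []
    for t, ut in enumerate(u):
        et = ut - phi * (u[t - 1] if t >= 1 else 0.0)  # pre-sample u is 0
        for lag, th in theta.items():
            if t >= lag:                               # pre-sample e's are 0
                et -= th * e[t - lag]
        e.append(et)
    return e


def conditional_loglik(u, phi, theta, sigma2):
    """Gaussian log likelihood using the same variance sigma2 for every e_t."""
    e = conditional_residuals(u, phi, theta)
    n = len(e)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma2)
            - sum(x * x for x in e) / (2.0 * sigma2))
```

Maximizing conditional_loglik over phi, the theta values and sigma2 (for
example with a numerical optimizer) gives conditional ML estimates; for fixed
coefficients the maximizing sigma2 is just the mean of the squared residuals.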
What unconditional maximum likelihood effectively does is use the current
estimates to imply information about the pre-sample while optimally
down-weighting this information so that the initial observations get a little
less weight than the remaining observations.
What the -diffuse- option effectively does is to say we know nothing about the
pre-sample and accordingly down-weights the initial observations in the sample
even more.
What conditional maximum likelihood effectively does is assume that the
pre-sample values equal their long-run expected value of zero, that we know
this just as well as we know later values, and accordingly weight the initial
observations equally with the remaining observations.
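The three weighting schemes can be compared in the simplest possible case.
The sketch below is a hypothetical illustration, not Stata's code: it assumes
a pure AR(1) observed without error, where every observation after the first
has prediction-error variance sigma^2, and it treats the prior variance of the
pre-sample value, p00, as the only thing that differs between the methods
(sigma^2/(1 - phi^2) for unconditional, an arbitrary large stand-in of 1e9 for
diffuse, and 0 for conditional).

```python
def first_obs_weight(phi, sigma2, p00):
    """Weight on the first squared prediction error, relative to later
    observations, in an AR(1) likelihood; p00 is the variance assumed for
    the pre-sample value."""
    f1 = phi * phi * p00 + sigma2   # prediction-error variance of obs 1
    return sigma2 / f1              # later observations all have variance sigma2


phi, sigma2 = 0.8, 1.0
print(first_obs_weight(phi, sigma2, sigma2 / (1 - phi * phi)))  # unconditional
print(first_obs_weight(phi, sigma2, 1e9))                       # diffuse
print(first_obs_weight(phi, sigma2, 0.0))                       # conditional
```

With phi = 0.8 the first observation gets a weight of about 0.36 (that is,
1 - phi^2) under the unconditional likelihood, essentially 0 under the diffuse
assumption, and exactly 1 under conditional ML, matching the ordering
described above.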
With large datasets, it generally does not matter which method we use because
the contribution of the initial observations is dominated by the remaining
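data. Note, however, that "large" must be used carefully when the process has
strong or long-lag autocorrelation, such as the lag-52 term here, because the
influence of the initial conditions then persists further into the sample.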
-- Vince