On 6/27/08, Andrea Bennett <[email protected]> wrote:
> Dear Statalisters,
>
> I'm pretty new to panel data (I'm trying to cope with it by reading
> "Multilevel and Longitudinal Modeling Using Stata",
> http://www.stata.com/bookstore/mlmus2.html). I have two
> data sets; in both of them I use one binary variable (yes/no) and one
> ordered categorical variable (0, 1, 2) as dependent variables. Now, I've
> been playing around with probit/xtprobit/gllamm and oprobit/gllamm
> estimations for the former and the latter case. Both data sets contain
> information based on random samples (but the people filling in the
> questionnaire are different for each year). For example, in the first data
> set I have a consistent survey structure related to the very same topic,
> with the survey performed in four different years. In the second data set I
> also have a consistent survey structure, but there is more than one topic
> per year, and the topics differ from year to year (it's a survey
> asking people about their opinion on political issues). Given this, from
> what I understand it is not really appropriate to use reg/xtreg
> in these cases.
>
> What I am not sure about is how I should perform the estimation. From a
> theoretical standpoint, I would assume fixed effects in the first data set
> while for the second data set it is still an open question (I could group
> by main political topics, for example). Now, in OLS I would simply include
> year dummies for fixed effects. But as far as I know, I should not use year
> dummies in gllamm (and usually not in probit/oprobit) for estimating fixed
> effects.
"Fixed effects", to my view, is a rather unfortunate name. A more
appropriate name would be "conditional" models, as the -xtreg, fe- and
-xtlogit- condition on some sufficient statistics of the data. There
aren't any sufficient statistics in ordered models, so there are no
feasible fixed effects estimators for those. The black Wooldridge
gives a nice treatment of those -- he does highlight that fixed
effects and random effects are just two estimators of the same model,
rather than two totally different models.
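To make the distinction concrete: for a binary outcome the conditional
estimator is available directly. A minimal sketch, with hypothetical
variable names (y, x1, x2, panel identifier id) -- though note that with
repeated cross-sections like yours there is no panel structure to exploit:

    * conditional ("fixed-effects") logit: conditions on the sum of y within id
    xtset id year
    xtlogit y x1 x2, fe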
That's a technical point, though. The more substantive point is that as
long as you don't have a real panel structure (the same respondents
observed repeatedly), you have no way to identify the individual effects
(and those are what you try to squeeze out of panel data). Hence you can
just analyze these models with plain -ologit-/-oprobit-, modeling time as
you see fit: with year dummies, or better yet with interactions of time
with your major variables if you feel, from substantive knowledge, that
the phenomenon is changing over time. With any pooled estimation of this
kind, however, you need to think carefully about the assumptions on the
measurement process. If you pool the data from different years, you
implicitly assume (i) that the regression coefficients stayed the same
over time (unless you model them explicitly as interactions with time);
(ii) that the error variances remained the same (and that one is hard to
relax, as setting Var[epsilon]=1 is the scale identification condition in
[ordered] probit models); and (iii) that the thresholds of the ordinal
variable are the same (which can be modeled with -gllamm-).
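A minimal sketch of that pooled approach, with hypothetical variable
names (y, x1, x2; -xi- expands the dummies and interactions):

    * pooled ordered probit with year dummies
    xi: oprobit y x1 x2 i.year
    * or let the effect of x1 drift over time
    xi: oprobit y i.year*x1 x2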
Now, given the above argument, the only reason you would want to use
-gllamm- is to model your ordinal and binary variables jointly, if they
stem from the same idiosyncratic individual preferences (political
attitudes, it appears, in your case). If the topics are really different,
then -gllamm- would not buy you much: you would get two sets of estimates
that are probably approximately independent of one another. You would see
this in their covariance matrix, e(V): if it has a block structure with
(near-)zero entries between the blocks of coefficients, you can just as
well run simpler models for each block.
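A quick, schematic way to eyeball that after the joint -gllamm- fit:

    * display the estimated covariance matrix of the coefficients
    matrix V = e(V)
    matlist V
    * near-zero entries between the two blocks of coefficients suggest
    * that separate, simpler models would give much the same answers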
> So, a) how would I include a simple fixed-effects estimation in gllamm (as I
> understand it, using i(year) applies random effects), and b) how would I deal
> with year fixed effects and random effects for the different topics?
> Additionally, which gllamm options are best suited for this kind of
> estimation? Should I use -adapt-, and should I set a specific value for
> -nip-? Using only the -adapt- option (which I read results in a better
> estimation) already leads to a 15 min estimation time for a sub-sample of
> about 3000 observations, while the full data set contains 25,000
> observations. I really fear this will take forever (besides that my
> computer dies from overheating!).
Wow, you've got a fast computer. Or -gllamm- has gotten faster in the
last few months.
An earlier comment on the importance of starting values is a good one! Run
the basic -[o]probit-, save the results (-mat bb = e(b)-), and figure out
how to feed them to -gllamm- (via -from(bb, copy)-, maybe?). Note that
adding random effects of any kind changes the aforementioned
variance-scaling condition for the [ordered] probit models, and hence
your coefficients will go up in absolute value. The ratios
beta/sqrt(1 + random-effect variance) should stay about the same, though,
as that is the familiar beta/sigma, which is the only identifiable
combination in limited dependent variable models.
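In code, something along these lines, with hypothetical names again (id as
the grouping variable; check -help gllamm- for the exact layout the
starting-values vector needs and whether -from()- wants -copy- inside or
outside the parentheses):

    * the pooled fit supplies the starting values
    oprobit y x1 x2
    mat bb = e(b)
    * random-intercept ordered probit with adaptive quadrature
    gllamm y x1 x2, i(id) link(oprobit) family(binomial) adapt from(bb) copy
    * with random-effect variance s2 from the output, the rescaled
    * coefficients beta/sqrt(1+s2) should be near the pooled oprobit ones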
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: Please do not reply to my Gmail address as I don't check
it regularly.