All,
I've only just joined this community after having been freed from the
shackles of SPSS: so a warm *watcha!* to you all! :-)
I have some queries regarding -xtgee-, -xtlogit- and -xtprobit- that some
of you may be able to help on, so here goes. I'm currently analysing a
panel dataset of 3615 respondents surveyed across eight waves between
1997 and 2001 as part of the British Election Panel Study (BEPS).
In waves 3, 4 and 5, the panel was asked how they voted in certain
mid-term elections which those waves immediately followed. I'm
interested in how much impact their mid-term electoral behaviour had on
their voting in the British general election of 2001. Needless to say,
studying such dynamics requires the -xt*- commands. But since the
response variable is binary (eg, 0=didn't vote Labour in 2001; 1=voted
Labour in 2001), this throws up a number of Qs to which I'd like to find
the As:
(1) I cannot successfully fit an -xtgee- model (ie, xtgee [depvar]
[varlist], family(binomial) link(probit) i(id) t(waves38)). All I get is
this error message: "estimates diverging (absolute correlation > 1)
r(430);". Uhh? And how can this be got round?
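In case it helps pin things down, these are the variants I was thinking of
trying next (vote01 and the x's below are just stand-ins for my actual
variable names):

  xtgee vote01 x1 x2 x3, family(binomial) link(probit) i(id) t(waves38) corr(independent)
  xtgee vote01 x1 x2 x3, family(binomial) link(logit) i(id) t(waves38) corr(exchangeable)

Would swapping the working correlation structure or the link like this be
a sensible way round the divergence, or is the problem likely to lie
elsewhere?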
(2) An expert in pooled analysis taught us in the summer that if you're
fitting fixed-effects models, you *must* choose logit, whereas if a
random-effects model is chosen, then *probit* must be used. Can anybody
tell me why this is, especially since you can, eg, fit random-effects
logit models in Stata?
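To make the question concrete, my (possibly mistaken) understanding of
what Stata will and won't fit is roughly the following, again with
placeholder variable names:

  xtlogit vote01 x1 x2 x3, i(id) fe
  xtlogit vote01 x1 x2 x3, i(id) re
  xtprobit vote01 x1 x2 x3, i(id) re

with, as far as I can see, no -fe- option to -xtprobit- at all. So if
random-effects logit is sitting right there in the software, why the
insistence on probit for random effects?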
(3) In running any of the binary -xt*- models, are serial correlation and
heteroscedasticity corrected for? (I'm assuming not.) Since using -xtgls-,
-xtprais- and -xtpcse- is out of the question for binary pooled/panel
regression models (or is it??), how does one instruct Stata 8 to test for
and correct them?
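Or is the answer simply to stay within -xtgee- and do something along the
lines of the following (placeholders again), with an AR(1) working
correlation to soak up the serial dependence and the -robust- option for
the standard errors?

  xtgee vote01 x1 x2 x3, family(binomial) link(logit) i(id) t(waves38) corr(ar 1) robust

I genuinely don't know whether that is even the right way to think about
the problem, so any pointers would be welcome.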
Finally, a number of people have been talking about the supposed follies
of R-squared (R^2) as a measure of model fit. I agree with most of what
has been said, but I'd like to make some further comments:
(a) Whatever is judged to be the 'best' measure of R^2, one *must* keep in
mind that (i) high levels of intercorrelation among the X-variables can
inflate R^2 to artificially high levels; and (ii) models using
aggregate-level data with large spatial units of analysis inevitably push
R^2 upwards, regardless of how it is measured;
(b) Why should *anybody* attempt to build a regression model that hopes to
produce an R^2 of 100%? Anybody with half a brain on these matters will
tell you that if your model has yielded a 'perfect' R^2, something is
wrong (most likely an X-variable that is all but a restatement of Y, or
some other specification error). When
will people learn to love *low* levels of R^2? Low levels mean there is
more to explain, and thus stretch our academic imaginations by challenging
us to work out what the missing key factors might be.
If only social scientists, psychologists and economists alike focused on
the theoretical and empirical validity and reliability of their variables
and modelled social reality as accurately as possible in order to test
theories about human behaviour, that would tell us far more than R^2 ever
will about *anything!* :-)
Yours,
CLIVE NICHOLAS,
Politics Building,
School of Geography, Politics and Sociology,
University of Newcastle-upon-Tyne,
Newcastle-upon-Tyne,
NE1 7RU,
United Kingdom.