Kit Baum wrote:
> Without looking carefully at the formulas used in xtreg, fe, I think
> conceptually the issue is that xtreg, fe considers the explanation of
> \sum{(y_{it} - \bar{y}_i)^2}: that is, after demeaning the data by
> the individual means, how much of the remaining variation is explained
> by your regressors? In areg, I suspect that in absorbing the factor
> pano, the amount of variation absorbed is also included in r^2, just as
> it would be if you included the dummies explicitly. That is, a one-way
> ANOVA of your depvar on pano would explain some amount of the
> variation. Do an ANCOVA including pano and a bunch of regressors, and
> you explain more. But the xtreg, fe model considers that the only thing
> to be explained is y net of individual mean y.
After running a couple of -anova- tests, I see what you mean, Kit: PANO on
its own explains 31.03% of the variation in EDCONCH, nearly half of the
total in the full model (66.78%: both adj R^2s). Of course, section 1 of
ch.14 in Wooldridge (2003) goes into the mechanics in a bit more detail,
inter alia.
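For the archives, Kit's distinction can be sketched numerically outside Stata. This is a toy Python illustration (all variable names and data invented, not my actual dataset): the "within" R^2 that xtreg, fe reports only credits the regressor with explaining y net of the unit means, while the LSDV/areg-style R^2 also credits the absorbed dummies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy balanced panel: 50 units, 10 periods each (everything here is invented).
n_units, n_t = 50, 10
unit = np.repeat(np.arange(n_units), n_t)
alpha = rng.normal(0, 3, n_units)[unit]      # large unit fixed effects
x = rng.normal(size=unit.size)
y = alpha + 0.5 * x + rng.normal(size=unit.size)

def demean_by(v, g):
    """Subtract group means: the 'within' transformation."""
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

yd, xd = demean_by(y, unit), demean_by(x, unit)

# Within R^2 (xtreg, fe): how much of y-net-of-unit-means does x explain?
b = (xd @ yd) / (xd @ xd)
r2_within = 1 - ((yd - b * xd) ** 2).sum() / (yd ** 2).sum()

# LSDV R^2 (areg / explicit dummies): the dummies' share is credited too.
X = np.column_stack([x, np.eye(n_units)[unit]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2_lsdv = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

print(r2_within, r2_lsdv)   # LSDV R^2 is much larger when the fixed effects matter
```

Same slope estimate either way; only the denominator of "explained" changes.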
In fitting that model, I overlooked something quite basic: I should have
fitted time dummies. The result was dramatic:
. areg edconch ed2-ed13 edpollch lagconch laglabch lagldmch clmargin
cdmargin conplace edenp class if edmarker==1 [pw=weight], absorb(pano)
cluster(pano)
Regression with robust standard errors Number of obs = 1875
F( 18, 1552) = 84.75
Prob > F = 0.0000
R-squared = 0.7618
Adj R-squared = 0.7124
Root MSE = 6.1587
(standard errors adjusted for clustering on pano)
----------------------------------------------------------------------------
| Robust
edconch | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
ed2 | (dropped)
ed3 | (dropped)
ed4 | 1.706714 3.270239 0.52 0.602 -4.707839 8.121267
ed5 | 4.584752 3.005362 1.53 0.127 -1.310248 10.47975
ed6 | 4.106024 2.795745 1.47 0.142 -1.377812 9.589861
ed7 | 2.216951 2.656637 0.83 0.404 -2.994025 7.427927
ed8 | -1.576794 2.641104 -0.60 0.551 -6.757303 3.603714
ed9 | 1.673161 2.288051 0.73 0.465 -2.814837 6.161158
ed10 | 2.835353 2.825758 1.00 0.316 -2.707352 8.378059
ed11 | -6.985576 2.506125 -2.79 0.005 -11.90132 -2.069828
ed12 | -.8350404 2.366703 -0.35 0.724 -5.477313 3.807233
ed13 | 6.787137 2.954843 2.30 0.022 .9912307 12.58304
edpollch | .1638447 .0581991 2.82 0.005 .0496876 .2780018
lagconch | -.1384667 .1818321 -0.76 0.446 -.4951292 .2181958
laglabch | -.0874243 .178278 -0.49 0.624 -.4371156 .262267
lagldmch | -.0721165 .1699159 -0.42 0.671 -.4054054 .2611724
clmargin | -.2637521 .0463728 -5.69 0.000 -.354712 -.1727921
cdmargin | -.2226632 .0477287 -4.67 0.000 -.3162827 -.1290438
conplace | 1.289732 .9699538 1.33 0.184 -.6128259 3.19229
edenp | -9.368422 .974361 -9.61 0.000 -11.27962 -7.457219
class | .0221002 .0536338 0.41 0.680 -.0831022 .1273025
_cons | 28.84836 4.044046 7.13 0.000 20.91599 36.78073
-----------+----------------------------------------------------------------
pano | absorbed (304 categories)
EDCONCH = change (%) in Con ED vote from the previous general election.
Note that I've taken out the four time/cyclical variables that were in
there originally (including the time trend), and CDMARGIN has replaced
LDMARGIN (though this has no effect whatsoever on the fit, which improves by
5%). Best of all (from my anorak point of view), I've now got a sensible
value for the constant term! Leaving the time trend variable in produces
exactly the same model, except that two more time dummies become significant,
the time trend itself is significant, and the constant takes on a
ridiculous value (-1004.363, in this case). It would be nice to understand
why/how the time dummies produce this effect (if that is really what is
going on), but I'm very pleased with this.
However a couple of issues remain. Note that the first two time dummies
have been dropped. I don't understand why, since:
. mrtab ed1-ed13
| Pct. of Pct. of
| Freq. responses cases
-------------------------+-----------------------------------
ed1 edyear== 1976.0000 | 303 11.55 11.55
ed2 edyear== 1978.0000 | 55 2.10 2.10
ed3 edyear== 1979.0000 | 297 11.32 11.32
ed4 edyear== 1980.0000 | 125 4.77 4.77
ed5 edyear== 1982.0000 | 128 4.88 4.88
ed6 edyear== 1983.0000 | 305 11.63 11.63
ed7 edyear== 1984.0000 | 167 6.37 6.37
ed8 edyear== 1986.0000 | 168 6.40 6.40
ed9 edyear== 1987.0000 | 305 11.63 11.63
ed10 edyear== 1988.0000 | 155 5.91 5.91
ed11 edyear== 1990.0000 | 157 5.99 5.99
ed12 edyear== 1991.0000 | 305 11.63 11.63
ed13 edyear== 1992.0000 | 153 5.83 5.83
-------------------------+-----------------------------------
Total | 2623 100.00 100.00
(Thank the heavens for Ben Jann, incidentally.) OK, ED2 may be an outlier
as far as Ns are concerned, but ED3 isn't, so why does my LSDV model drop
it too?
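One mechanism that could produce this (a guess, sketched on invented data, so it may not be what's happening in my sample): a time dummy that is constant within every absorbed pano group is wiped out by the within transformation, and a constant plus a full set of year dummies is rank-deficient on its own, so a second dummy has to go.

```python
import numpy as np

# Invented toy panel: unit 3 appears only in 1979, so after demeaning by
# unit, its 1979 dummy is identically zero -> areg would drop it as collinear.
units = np.array([1, 1, 2, 2, 3])
years = np.array([1978, 1979, 1978, 1979, 1979])

def within(col, g):
    """The within transformation: demean a column by group."""
    out = col.astype(float).copy()
    for u in np.unique(g):
        out[g == u] -= out[g == u].mean()
    return out

d79 = (years == 1979).astype(float)
d79_w = within(d79, units)
print(d79_w)                     # last entry is 0: unit 3's dummy is absorbed

# Separately, a constant plus the full dummy set is rank-deficient,
# which forces one more dummy out:
D = np.column_stack([np.ones(5), years == 1978, years == 1979]).astype(float)
print(np.linalg.matrix_rank(D))  # 2, not 3
```

Whether ED2 and ED3 are dropping for exactly this reason in my data (given the if edmarker==1 restriction) is the open question.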
The second problem is that I wish to include a lagged term on the depvar.
I know how to do this (gen lag = L.edconch), but there are two snags.
First, there are gaps in the time series. Now I could use -tsfill-, but I
want the lag to latch onto the correct EDYEAR, rather than incorrectly
onto a year in which there were no elections. Second, adding the lagged
term knocks out a lot of the time dummies. Is there a way round this?
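What I'm after with the lag is something like the following, sketched in Python/pandas on made-up data rather than Stata: lag on the previous observed election within each panel (row order), so the gaps in EDYEAR don't matter, instead of L.'s calendar-based lag which returns missing across a gap.

```python
import pandas as pd

# Invented data: note the gaps in edyear (no elections in the missing years).
df = pd.DataFrame({
    "pano":    [1, 1, 1, 2, 2],
    "edyear":  [1976, 1979, 1983, 1976, 1983],
    "edconch": [2.0, -1.5, 3.0, 0.5, 1.2],
})

# Lag on the previous *election* within each panel, whatever the year gap:
df = df.sort_values(["pano", "edyear"])
df["lagconch"] = df.groupby("pano")["edconch"].shift(1)
print(df)   # first election in each pano gets a missing lag; the rest latch
            # onto the previous election's value
```

The first observation per panel is necessarily missing either way, of course.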
Naturally, anyone is invited to take a stab at these posers. Ta.
CLIVE NICHOLAS |t: 0(044)191 222 5969
Politics |e: [email protected]
Newcastle University |http://www.ncl.ac.uk/geps
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/