Cross-sectional time-series regression
Stata fits fixed-effects (within), between-effects, and random-effects
(mixed) models on balanced and unbalanced data. We use the notation
y[i,t] = X[i,t]*b + u[i] + v[i,t]
Here u[i] is the fixed or random effect, and v[i,t] is the pure
residual.
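To make the model concrete, here is a small Python sketch. It is not Stata code and not how xtreg is implemented. It simulates data from y[i,t] = X[i,t]*b + u[i] + v[i,t] with a single regressor, and it makes the person effect u[i] deliberately correlated with x. The helper names simulate_panel, within_slope, and pooled_slope, and all parameter values, are hypothetical. Subtracting each person's means before running OLS (the within transformation behind a fixed-effects fit) removes u[i] and recovers b. Pooled OLS, which ignores u[i], does not.

```python
import random


def simulate_panel(n_people, n_years, b, seed=0):
    """Draw y[i,t] = b*x[i,t] + u[i] + v[i,t], with x[i,t] correlated with u[i]."""
    rng = random.Random(seed)
    rows = []  # one (i, x, y) tuple per person-year
    for i in range(n_people):
        u = rng.gauss(0.0, 1.0)  # person effect u[i]
        for _ in range(n_years):
            x = u + rng.gauss(0.0, 1.0)  # regressor correlated with u[i]
            v = rng.gauss(0.0, 0.1)  # pure residual v[i,t]
            rows.append((i, x, b * x + u + v))
    return rows


def within_slope(rows):
    """Slope from OLS on person-demeaned x and y (the within estimator)."""
    sums = {}
    for i, x, y in rows:
        s = sums.setdefault(i, [0.0, 0.0, 0])
        s[0] += x
        s[1] += y
        s[2] += 1
    num = den = 0.0
    for i, x, y in rows:
        sx, sy, n = sums[i]
        dx, dy = x - sx / n, y - sy / n
        num += dx * dy
        den += dx * dx
    return num / den


def pooled_slope(rows):
    """Slope from pooled OLS with an intercept, ignoring u[i]."""
    n = len(rows)
    mx = sum(x for _, x, _ in rows) / n
    my = sum(y for _, _, y in rows) / n
    num = sum((x - mx) * (y - my) for _, x, y in rows)
    den = sum((x - mx) ** 2 for _, x, _ in rows)
    return num / den


rows = simulate_panel(n_people=500, n_years=6, b=0.5, seed=1)
print(round(within_slope(rows), 3), round(pooled_slope(rows), 3))
```

With b = 0.5, the within slope lands very close to 0.5. The pooled slope is pulled well above it by the correlation between x and u[i].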
xtreg is Stata’s cross-sectional time-series regression
command. xtreg, fe estimates the parameters of fixed-effects models:
. xtreg ln_w grade age* ttl_exp* tenure* black not_smsa south, fe
Fixed-effects (within) regression               Number of obs      =     28091
Group variable (i): idcode                      Number of groups   =      4697

R-sq:  within  = 0.1727                         Obs per group: min =         1
       between = 0.3505                                        avg =       6.0
       overall = 0.2625                                        max =        15

                                                F(8,23386)         =    610.12
corr(u_i, Xb)  = 0.1936                         Prob > F           =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |  (dropped)
         age |   .0359987   .0033864    10.63   0.000     .0293611    .0426362
        age2 |   -.000723   .0000533   -13.58   0.000    -.0008274   -.0006186
     ttl_exp |   .0334668   .0029653    11.29   0.000     .0276545     .039279
    ttl_exp2 |   .0002163   .0001277     1.69   0.090    -.0000341    .0004666
      tenure |   .0357539   .0018487    19.34   0.000     .0321303    .0393775
     tenure2 |  -.0019701    .000125   -15.76   0.000    -.0022151   -.0017251
       black |  (dropped)
    not_smsa |  -.0890108   .0095316    -9.34   0.000    -.1076933   -.0703282
       south |  -.0606309   .0109319    -5.55   0.000    -.0820582   -.0392036
       _cons |    1.03732   .0485546    21.36   0.000     .9421497     1.13249
-------------+----------------------------------------------------------------
     sigma_u |  .35562203
     sigma_e |  .29068923
         rho |  .59946283   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(4696, 23386) =     5.13         Prob > F = 0.0000
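The summary rows can be checked by hand. rho, labeled as the fraction of variance due to u_i, is sigma_u^2 / (sigma_u^2 + sigma_e^2). The residual degrees of freedom in F(8,23386) equal the observations minus the groups minus the 8 estimated slopes. The F test that all u_i=0 has one numerator degree of freedom fewer than the number of groups. The Python lines below do only that arithmetic on the numbers printed above. They show where the figures come from and are not a re-implementation of xtreg.

```python
# Numbers copied from the xtreg, fe output above.
sigma_u = 0.35562203
sigma_e = 0.29068923
n_obs, n_groups, k = 28091, 4697, 8  # k = slopes left after grade and black were dropped

rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)  # fraction of variance due to u_i
resid_df = n_obs - n_groups - k  # denominator df of both F tests
u_test_df = n_groups - 1  # numerator df of the test that all u_i = 0

print(round(rho, 4), resid_df, u_test_df)  # 0.5995 23386 4696
```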
The syntax of all estimation commands is the same: the name of the
dependent variable is followed by the names of the independent variables.
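Here the dependent variable ln_w (log of wage; Stata lets us abbreviate
ln_wage) is modeled as a function of several explanatory variables. The
wildcards age*, ttl_exp*, and tenure* pick up age and age2, ttl_exp and
ttl_exp2, and tenure and tenure2. grade and black were dropped from the
model because they do not vary within person, so their effects cannot be
separated from the fixed effect u[i].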
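A few hypothetical numbers show why time-invariant variables drop out. The Python sketch below subtracts each person's mean from a variable, which is the within transformation behind a fixed-effects fit. demean_within and the small grade and age columns are made up for illustration. A variable that is constant for every person, like grade here, becomes a column of zeros and carries no information beyond what the fixed effects already absorb.

```python
def demean_within(ids, values):
    """Subtract each person's mean from that person's observations."""
    totals, counts = {}, {}
    for i, v in zip(ids, values):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for i, v in zip(ids, values)]


ids = [1, 1, 1, 2, 2, 3]
grade = [12, 12, 12, 16, 16, 10]  # hypothetical: fixed for each person
age = [30, 31, 32, 40, 41, 25]  # hypothetical: changes over time

print(demean_within(ids, grade))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(demean_within(ids, age))  # [-1.0, 0.0, 1.0, -0.5, 0.5, 0.0]
```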
Our estimation sample contains 28,091 “observations”: 4,697 people, each
observed in 6.0 different years on average. An observation in our data is
a person in a given year. The variable idcode identifies the persons; it
supplies the i index in y[i,t] and X[i,t].
Before fitting the model, we typed iis idcode to tell Stata this.
Told once, Stata remembers.
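The panel bookkeeping in the output header (Number of groups, and the min, avg, and max Obs per group) follows directly from the person identifier. As a sketch, the Python below computes the same four numbers from a list of identifiers. panel_summary and the sample ids are hypothetical. The last line checks the reported average: 28,091 observations over 4,697 people is about 5.98, which the output rounds to 6.0.

```python
from collections import Counter


def panel_summary(ids):
    """Return (number of groups, min, average, max observations per group)."""
    per_group = Counter(ids)  # observations per person
    sizes = list(per_group.values())
    return len(sizes), min(sizes), sum(sizes) / len(sizes), max(sizes)


ids = [101, 101, 101, 102, 102, 103]  # hypothetical idcode values
print(panel_summary(ids))  # (3, 1, 2.0, 3)
print(round(28091 / 4697, 2))  # 5.98
```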
To fit the corresponding random-effects model, we use the same command but
change the fe option to re.
. xtreg ln_w grade age* ttl_exp* tenure* black not_smsa south, re
Random-effects GLS regression                   Number of obs      =     28091
Group variable (i): idcode                      Number of groups   =      4697

R-sq:  within  = 0.1715                         Obs per group: min =         1
       between = 0.4784                                        avg =       6.0
       overall = 0.3708                                        max =        15

Random effects u_i ~ Gaussian                   Wald chi2(10)      =   9244.87
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0646499   .0017811    36.30   0.000     .0611589    .0681408
         age |    .036806   .0031195    11.80   0.000     .0306918    .0429201
        age2 |  -.0007133     .00005   -14.27   0.000    -.0008113   -.0006153
     ttl_exp |   .0290207   .0024219    11.98   0.000     .0242737    .0337676
    ttl_exp2 |   .0003049   .0001162     2.62   0.009      .000077    .0005327
      tenure |    .039252   .0017555    22.36   0.000     .0358114    .0426927
     tenure2 |  -.0020035   .0001193   -16.80   0.000    -.0022373   -.0017697
       black |  -.0530532   .0099924    -5.31   0.000    -.0726379   -.0334685
    not_smsa |  -.1308263   .0071751   -18.23   0.000    -.1448891   -.1167634
       south |  -.0868927   .0073031   -11.90   0.000    -.1012066   -.0725788
       _cons |   .2387209   .0494688     4.83   0.000     .1417639     .335678
-------------+----------------------------------------------------------------
     sigma_u |  .25790313
     sigma_e |  .29069544
         rho |  .44043812   (fraction of variance due to u_i)
------------------------------------------------------------------------------
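Textbook treatments of random-effects GLS offer one way to see how the random-effects fit sits between pooled OLS and the fixed-effects fit. Each variable has theta[i] times its person mean subtracted, with theta[i] = 1 - sqrt(sigma_e^2 / (T[i]*sigma_u^2 + sigma_e^2)), where T[i] is the number of years person i is observed. theta = 0 gives pooled OLS and theta = 1 gives the within estimator. The Python sketch below evaluates that formula with the sigma_u and sigma_e printed above. It assumes this textbook form and does not reproduce how Stata estimates the variance components.

```python
import math

# Variance components copied from the xtreg, re output above.
sigma_u = 0.25790313
sigma_e = 0.29069544


def theta(T, sigma_u=sigma_u, sigma_e=sigma_e):
    """Share of the person mean removed for someone observed T times (textbook RE-GLS form)."""
    return 1.0 - math.sqrt(sigma_e**2 / (T * sigma_u**2 + sigma_e**2))


for T in (1, 6, 15):  # min, average, and max Obs per group in the output
    print(T, round(theta(T), 3))
```

People observed for more years get a theta closer to 1, so the random-effects fit leans more on their within-person variation.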
We can also perform the Hausman specification test, which compares the
consistent fixed-effects model with the efficient random-effects model. To
do that, we must first store the results from our random-effects model,
refit the fixed-effects model to make those results current, and then
perform the test.
. estimates store random_effects
. quietly xtreg ln_w grade age* ttl_exp* tenure* black not_smsa south, fe
. hausman . random_effects
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       .       random_eff~s    Difference          S.E.
-------------+----------------------------------------------------------------
         age |    .0359987      .036806       -.0008073        .0013177
        age2 |    -.000723    -.0007133       -9.68e-06        .0000184
     ttl_exp |    .0334668     .0290207        .0044461         .001711
    ttl_exp2 |    .0002163     .0003049       -.0000886         .000053
      tenure |    .0357539      .039252       -.0034981        .0005797
     tenure2 |   -.0019701    -.0020035        .0000334        .0000373
    not_smsa |   -.0890108    -.1308263        .0418155        .0062745
       south |   -.0606309    -.0868927        .0262618        .0081346
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(8) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =      149.44
                Prob>chi2 =      0.0000
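The statistic is the quadratic form printed in the output, chi2 = (b-B)'[(V_b-V_B)^(-1)](b-B). Its degrees of freedom equal the number of coefficients compared (8 here). It needs the full covariance matrices V_b and V_B, not just the S.E. column shown in the table. The Python sketch below computes it for small, hypothetical inputs. hausman and solve are illustrative helpers, and solve is plain Gaussian elimination rather than anything Stata is known to use. The sketch does not compute the p-value.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [list(row) + [y[r]] for r, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def hausman(b, B, V_b, V_B):
    """Return (chi2, df) for chi2 = (b-B)' (V_b - V_B)^(-1) (b-B)."""
    k = len(b)
    d = [b[j] - B[j] for j in range(k)]
    V = [[V_b[r][c] - V_B[r][c] for c in range(k)] for r in range(k)]
    w = solve(V, d)  # w = (V_b - V_B)^(-1) (b - B) without forming the inverse
    return sum(d[j] * w[j] for j in range(k)), k


# Hypothetical two-coefficient example (not the wage data).
b = [1.0, 2.0]
B = [0.0, 1.0]
V_b = [[3.0, 1.0], [1.0, 3.0]]
V_B = [[1.0, 0.0], [0.0, 1.0]]
print(hausman(b, B, V_b, V_B))
```

Solving the linear system instead of inverting V_b - V_B is the usual numerically safer route to the same quadratic form.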
Stata can also perform the Breusch and Pagan Lagrange multiplier (LM) test
for random effects and can calculate various predictions, including the
random effect, based on the estimates.
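For reference, the Breusch and Pagan LM statistic has a simple closed form when every person is observed in the same T periods: LM = nT / (2(T-1)) * [ sum_i (sum_t e[i,t])^2 / sum_i sum_t e[i,t]^2 - 1 ]^2. Here e[i,t] are pooled OLS residuals and n is the number of people. Under the null of no random effect, LM is approximately chi-squared with 1 degree of freedom. The Python sketch below implements only that balanced-panel textbook form, and the residuals are made up. Stata's own test must handle unbalanced panels like ours and is not reproduced here.

```python
def breusch_pagan_lm(resid):
    """Balanced-panel Breusch-Pagan LM statistic for random effects.

    resid[i][t] is the pooled-OLS residual for person i in period t; every
    person must have the same number of periods T >= 2.
    """
    n, T = len(resid), len(resid[0])
    sum_sq_person_totals = sum(sum(row) ** 2 for row in resid)
    sum_sq_resid = sum(e * e for row in resid for e in row)
    ratio = sum_sq_person_totals / sum_sq_resid
    return n * T / (2 * (T - 1)) * (ratio - 1) ** 2


# Hypothetical residuals: each person's residuals share a sign, hinting at u[i].
print(breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]]))  # 2.0
print(breusch_pagan_lm([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
```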
Just as important as its ability to fit statistical models with
cross-sectional time-series data is Stata's ability to provide
meaningful summary statistics.
xtsum reports means and standard deviations decomposed into overall,
between-person, and within-person variation:
. xtsum hours
Variable         |      Mean  Std. Dev.        Min        Max |    Observations
-----------------+--------------------------------------------+----------------
hours    overall |  36.55956   9.869623          1        168 |     N =   28467
         between |             7.846585          1       83.5 |     n =    4710
         within  |             7.520712  -2.154726   130.0596 | T-bar = 6.04395
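The negative minimum for hours within is not a mistake. The within figures
measure each observation's deviation from that person's own mean, shifted
by the global mean 36.55956: hours[i,t] minus person i's mean, plus
36.55956. A year in which a person who usually works long hours reports
very few hours can therefore yield a negative within value.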
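The three rows of xtsum can be sketched as follows. overall uses every observation, between uses the n person means, and within uses hours[i,t] minus person i's mean plus the global mean. That last definition is how a within value can fall below zero. The Python below is a hypothetical illustration (xtsum_like and the two-person data are made up) and is not Stata's code. It also checks that the reported T-bar is simply N/n.

```python
from statistics import mean, stdev


def xtsum_like(ids, x):
    """Return the mean plus overall, between, and within SDs and the within min/max."""
    by_person = {}
    for i, v in zip(ids, x):
        by_person.setdefault(i, []).append(v)
    person_mean = {i: mean(vs) for i, vs in by_person.items()}
    grand = mean(x)
    within = [v - person_mean[i] + grand for i, v in zip(ids, x)]
    return {
        "mean": grand,
        "overall_sd": stdev(x),
        "between_sd": stdev(list(person_mean.values())),
        "within_sd": stdev(within),
        "within_min": min(within),
        "within_max": max(within),
    }


ids = [1, 1, 2, 2]
hours = [2, 98, 4, 6]  # hypothetical: person 1 usually works long hours
stats = xtsum_like(ids, hours)
print(stats["within_min"])  # -20.5
print(round(28467 / 4710, 5))  # 6.04395, the T-bar reported above
```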
xttab does the same for one-way tabulations:
. xttab msp
                   Overall              Between          Within
      msp |    Freq.   Percent     Freq.   Percent       Percent
----------+-----------------------------------------------------
        0 |    11324     39.71      3113     66.08         55.06
        1 |    17194     60.29      3643     77.33         71.90
----------+-----------------------------------------------------
    Total |    28518    100.00      6756    143.41         64.14
                               (n = 4711)
msp is a variable that takes on the value 1 if the surveyed woman is
married and her spouse is present in the household. Overall, some 60% of
our person-year observations are msp. Taking women individually, 77% of the
women are msp at some point, and 66% are not msp at some point; thus some
women are msp in some years and not in others, which is also why the
between percentages total more than 100. Taking women one at a time, if a
woman is ever msp, 72% of her observations are msp observations. If a woman
is ever not msp, 55% of her observations are not msp. (If marital status
never varied in our data, the within percentages would all be 100.)
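The percentages can be rebuilt from person-level data. In the Python sketch below, xttab_like and the small data set are hypothetical. overall counts person-years, and between counts women who ever take a value. within averages, over those women, the share of their own observations with that value. That per-woman averaging is our assumption about Stata's exact weighting. The last check uses only the table above: weighting the two within percentages by the between frequencies gives the Total within figure of 64.14.

```python
def xttab_like(ids, values):
    """Map each value to (overall %, between %, within %)."""
    by_person = {}
    for i, v in zip(ids, values):
        by_person.setdefault(i, []).append(v)
    n_obs, n_people = len(values), len(by_person)
    table = {}
    for v in sorted(set(values)):
        ever = [vs for vs in by_person.values() if v in vs]  # people who ever take v
        overall = 100 * values.count(v) / n_obs
        between = 100 * len(ever) / n_people
        within = 100 * sum(vs.count(v) / len(vs) for vs in ever) / len(ever)
        table[v] = (overall, between, within)
    return table


ids = [1, 1, 1, 2, 2, 3]
msp = [0, 1, 1, 1, 1, 0]  # hypothetical marital-status histories
for v, (overall, between, within) in xttab_like(ids, msp).items():
    print(v, round(overall, 2), round(between, 2), round(within, 2))

# Total within in the real table: within percentages weighted by between freq.
print(round((3113 * 55.06 + 3643 * 71.90) / (3113 + 3643), 2))  # 64.14
```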
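xttrans reports the transition matrix: for women with a given msp status
in one year (rows), the percentage with each status in the next year
(columns):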
. xttrans msp
      1 if |   1 if married, spouse
  married, |         present
    spouse |
   present |         0          1 |     Total
-----------+----------------------+----------
         0 |     80.49      19.51 |    100.00
         1 |      7.96      92.04 |    100.00
-----------+----------------------+----------
     Total |     37.11      62.89 |    100.00
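A transition table like this one counts each person's moves from one year's msp status to the next year's. The Python sketch below builds row percentages that way. transition_percentages and the sample data are hypothetical. The real command has to know the time ordering of each person's observations, whereas this sketch simply assumes the rows are sorted by person and year with no missing years.

```python
def transition_percentages(ids, values):
    """Row percentages of moves from a status at t to the status at t+1.

    Assumes rows are sorted by person and year with no missing years, so
    each pair of consecutive rows for the same person is one transition.
    """
    counts = {}
    prev_id = prev_v = None
    for i, v in zip(ids, values):
        if i == prev_id:
            counts[(prev_v, v)] = counts.get((prev_v, v), 0) + 1
        prev_id, prev_v = i, v
    states = sorted(set(values))
    table = {}
    for a in states:
        row_total = sum(counts.get((a, b), 0) for b in states)
        if row_total:  # skip statuses never followed by another year
            table[a] = {b: 100 * counts.get((a, b), 0) / row_total for b in states}
    return table


ids = [1, 1, 1, 1, 2, 2, 2]
msp = [0, 1, 1, 1, 0, 0, 0]  # hypothetical yearly statuses
print(transition_percentages(ids, msp))
```

Each row sums to 100, matching the layout of the xttrans output.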
See New in Stata 11 for more about what was added in Stata Release 11.