Hello, Statalist!
This is a modified (simplified, I think) version of an unanswered question.
In brief, are -xt- commands appropriate for repeated cross-sectional
data in which different units are observed over time? For the data
below, I considered the following:
/* First error - repeated time values within panel */
. xtset state year
repeated time values within panel
/* Attempt to correct first error, followed by second error - weight
must be constant within panel variable */
. bysort state year: generate type = _n
. egen newid = group(state type)
. xtset newid year
. xtreg y x1 x2 [fw=n], fe
weight must be constant within newid
/* Attempt to correct second error, followed by third "error" */
. egen newerid = group(state year type)
. xtset newerid year
. xtreg y x1 x2 [fw=n], fe
The third "error" is that I get something like the following for output:
             y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
---------------+-----------------------------------------------------------------
            x1 |  (dropped)
            x2 |  (dropped)
         _cons |   2.979312    2.12e-19   1.4e+19   0.000     2.979312    2.979312
This last "error" seems to be a consequence of no "cross-sectional" unit
being "sampled" more than once: each newerid contains exactly one
observation, so the within (fixed-effects) transformation removes all
variation and every regressor is dropped.
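If units need not actually be followed over time, one workaround is to drop -xtreg- and fit a pooled regression with year indicators, clustering on state. This is only a sketch of an alternative specification (the modeling choice is my assumption, not something established above); it uses the variable names from the data listing below:

```stata
* Sketch, assuming pooled cross sections are acceptable: with one
* observation per newerid, xtreg's within transformation leaves nothing
* to estimate, but a pooled model with year dummies still uses the
* frequency weights. (Stata 10.1 syntax via -xi-; in Stata 11,
* i.year works directly in -regress-.)
xi: regress y x1 x2 i.year [fw=n], vce(cluster state)
```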
Using -expand- or -expandcl- results in a dataset that is too large
for my memory constraints, even with -set memory- raised as high as 100g.
What is the appropriate way of analyzing repeated cross-sectional data
in which different units are observed in each period (in general) and,
in particular, grouped cross-sectional data of the variety that I have
(where the same units are probably observed, but they are not uniquely
identified and cannot be followed in the sense of panel data with a
unique identifier)?
Ultimately, the problem seems to be that newerid at time t never equals
newerid at time t+k for any k, so no panel identifier repeats across
periods.
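That conjecture can be checked directly; this is just a diagnostic sketch using the newerid variable constructed above:

```stata
* If no unit is observed in more than one period, each newerid should
* appear exactly once; -assert- fails loudly otherwise.
bysort newerid: assert _N == 1
* Equivalently, every panel is a singleton, which is why -xtreg, fe-
* has no within-panel variation to work with.
```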
My data look like the following:
State Year Population y n x1 x2
1 1990 25261069 2.57 1070121 -1.33 11.4
1 1990 25261069 1.19 1810912 -0.57 9.98
1 1990 25261069 1.8 4748773 0.16 8.44
1 1990 25261069 4.07 3289300 -0.08 7.66
1 1990 25261069 5.53 4125362 1.85 7.84
1 1990 25261069 4.03 10216601 -0.46 6.26
… … … … … … …
50 1990 11092381 4.74 332842 -1.41 13.43
50 1990 11092381 2.9 1233123 0.96 12.2
50 1990 11092381 4.56 1922374 1.75 13.41
50 1990 11092381 5.17 1218358 -0.26 9.6
50 1990 11092381 2.18 423648 -2.09 10.48
50 1990 11092381 2.97 5962036 -0.51 6.52
… … … … … … …
1 2000 27787176 3.56 1769078 0.4 9.84
1 2000 27787176 2.04 2083925 0.32 9.93
1 2000 27787176 4.01 3338879 -0.1 8.4
1 2000 27787176 2.83 5401349 -1.28 11.65
1 2000 27787176 6.81 3204418 1.04 9.27
1 2000 27787176 2.33 11989527 0.15 10.4
… … … … … … …
50 2000 12201619 6.52 701923 0.39 12.31
50 2000 12201619 5.02 2224842 -1.62 7.55
50 2000 12201619 4.6 713768 0.02 11.61
50 2000 12201619 2.75 1172416 -0.43 12.94
50 2000 12201619 6.95 858296 1 10.48
50 2000 12201619 4.27 6530374 -2.14 11.58
Thank you for your time and attention.
Two thoughts that came to mind (without any firm basis) are (1) using
seemingly unrelated regressions (the dependent variable is the same,
but the year of observation differs across equations) or (2) using a
meta-analysis of year-by-year estimates.
Misha
Using Stata 10.1 but with access to Stata 11
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/