Dear Statalisters:
I have 3 waves of panel data with an increasing sample size. For each
wave I have sampling weights, and identifiers for strata and clusters
(I describe the data in more detail below). I'd like to use this
dataset in two ways:
1) pool the three waves (e.g., to estimate an ordered logit model);
2) use the panel structure (e.g., to estimate a standard linear
fixed-effects model).
I'm having some trouble deciding how to account for the survey design
in the best possible way. I've read the FAQs and previous threads on
this topic but I still can't make up my mind.
For the pooled data I'd like to use the svy commands (e.g., svy:
ologit), but I'm not sure how to construct the weights, or whether
that would have any implications for the cluster and strata
identifiers. (The identifiers are such that individuals observed more
than once always belong to the same stratum and cluster.) In the
archives I found the suggestion to "weight the weights"
(http://www.stata.com/statalist/archive/2004-12/msg00655.html), but I'm
not sure whether this is OK when the same individual is observed over
time.
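To make the pooled setup concrete, this is roughly what I have in
mind. It is only a sketch: the variable names (psu_id, stratum_id, wt,
wave, y, x1, x2) are placeholders rather than the names in my data,
and wt is simply each wave's cross-sectional weight, so the question
of how to rescale the weights is left open.

```stata
* Placeholder names throughout. Data are in long form, one row per
* person-wave, with wave coded 1, 2, 3.
svyset psu_id [pweight = wt], strata(stratum_id)
svydescribe

* Wave dummies built by hand (wave_1 is the omitted category).
tabulate wave, generate(wave_)
svy: ologit y x1 x2 wave_2 wave_3
```

Since repeated observations on an individual fall in the same cluster,
I would expect the cluster-level variance estimator to pick up the
within-person correlation, but I'd welcome confirmation.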
I have a similar problem for the panel models: I don't know how to
construct proper weights that remain constant within each panel. In
addition, since xtreg does not work with the svy prefix, I can use the
cluster option but I won't be able to account for the effects of
stratification. Is this correct, or am I missing something that would
allow me to do it? Maybe I could include dummies for the strata?
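For the panel case, the following is a minimal sketch of what I could
run, again with placeholder variable names (id, wave, psu_id, wt, y,
x1, x2). Taking the weight from the first wave in which a person
appears is just one arbitrary way of getting a weight that is constant
within person; I am not claiming it is the right one.

```stata
* Placeholder names: id (person), wave, psu_id, wt, y, x1, x2.

* Flag people whose cross-sectional weight changes across waves.
bysort id (wt): generate byte wt_varies = (wt[1] != wt[_N])
tabulate wt_varies

* One arbitrary constant-within-person weight: the weight from the
* first wave in which the person is observed. Illustrative only.
bysort id (wave): generate double wt_first = wt[1]

* Fixed effects with a constant pweight and cluster-robust SEs.
xtset id wave
xtreg y x1 x2 [pweight = wt_first], fe cluster(psu_id)
```

On the strata dummies: because each individual stays in the same
stratum, I suspect they would be collinear with the fixed effects and
dropped, so they may not help in the fixed-effects model.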
I'm using Stata/SE 10.1 on Windows XP.
I'd really appreciate any help on these issues.
Best,
Francisco.
Data description:
The first wave has a sample size of 1100, 184 clusters, 8 strata, and
represents a population of almost 8 million people.
The second wave has a sample size of 1500, 250 clusters, 10 strata,
and represents a population of almost 11 million people.
The third wave has a sample size of 2500, 420 clusters, 10 strata, and
represents a population of almost 12 million people.
Strata are defined by two criteria: region and socioeconomic category.
For wave 1, there are two regions and four socioeconomic categories,
which results in the 8 strata mentioned above. For waves 2 and 3,
there are 2 regions but 5 socioeconomic categories (the original 4
plus a new one not included in wave 1), resulting in 10 strata.
The increase in the sample size from wave 1 to wave 2 (400
observations) has two sources: 300 observations come from the two new
strata (150 from each); the other 100 come from enlarging the samples
of two old strata by 50 each (same socioeconomic category, different
region).
The increase in the sample size from wave 2 to wave 3 comes from an
increase in the sample size of region 2, evenly distributed across the
5 socioeconomic categories.
Some individuals are observed only once, others are observed twice,
and others are observed three times.
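In case it is useful, the participation pattern could be tabulated
along these lines. This is a sketch with placeholder names (id for the
person identifier, wave coded 1, 2, 3), not output from my data.

```stata
* Placeholder names: id (person), wave (1, 2, 3).
xtset id wave
xtdescribe

* Number of waves in which each person appears, counted once per person.
bysort id: generate byte n_waves = _N
egen byte one_per_id = tag(id)
tabulate n_waves if one_per_id
```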
My understanding of the data documentation is that the weights
provided are suitable for cross-sectional analysis of each wave
separately, but that no weights were constructed specifically for use
with the panel structure of the data.