Re: st: Which weight
On Oct 8, 2007, at 11:44 AM, Sergiy Radyakin wrote:
Dear Steven,
your answer to Nikolaos' question sounds perfectly reasonable. But is
there also a straightforward answer in the following situations?
Situation A: N households were chosen at time t0 and followed as a
panel. M households were added to the panel at t1 (t0<t1). At any
point t I have cross-sectional weights and a probability of staying in
the sample for each household. At any moment t an event of interest can
happen in the household (e.g. a child leaves the household). In this
case I add the characteristics of the household at time t [and maybe
t-1] to "my" sample. Which weights should I use? I obviously can't
choose weights at t0 because not all households were in the sample at
t0 and I can't use the weights at t1 because some of the households,
for which the event of interest has occurred before t1, have dropped
out before t1 due to panel mortality.
Situation B: N households are chosen at time t0 and M households are
chosen at t1. pweights are given. However, during the time between t0
and t1, the population (e.g. the population of a country), from which
the samples were drawn has changed (e.g. doubled in size). I am
working with a pooled sample of households (N+M). Which weights can I
use?
If I am working with one subpopulation only (e.g. men) and the
proportion of these cases has changed, can I still pool observations?
(E.g. women/men=50/50 in t0 but women/men=60/40 in t1). If yes, what
interpretation do I give to the estimates then? [this is not a panel
case]
Is there any good online guide on longitudinal weights? Preferably
with plain examples of how to deal with different situations such as
those outlined above?
Sergiy, there is no easy answer to your questions. I do not know a
good text for panel study weighting. I learned much of what I know by
studying the documentation for some of the large panel studies.
Situation A
Recall that a sample weight is, roughly, the number of population
members 'represented' by the observation; the sum of sample weights
should equal the total number of population members.
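As a rough illustration of that definition, here is a minimal Python sketch. The four households, their weights, and the `child_left` flag are all made-up values, not taken from any real survey.

```python
# Hypothetical sample of four households; every number here is made up.
# Each weight is the number of population households the record stands for.
sample = [
    {"weight": 100.0, "child_left": 1},
    {"weight": 100.0, "child_left": 0},
    {"weight": 50.0, "child_left": 1},
    {"weight": 20.0, "child_left": 0},
]

# The weights add up to the estimated number of households in the population.
population_total = sum(h["weight"] for h in sample)

# A weighted count estimates how many population households had the event.
event_total = sum(h["weight"] * h["child_left"] for h in sample)

print(population_total, event_total)
```

Here the four records stand for 270 households, of which an estimated 150 experienced the event.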
If you need data only at calendar time 't', you would use the cross-
sectional weights for 't'. The data set authors will have carefully
calibrated them. For example, panels may rotate every six months,
but the weights may be appropriate to the entire calendar year
population. The documentation should make this clear.
However, you want to use information from period 't-1' to predict
outcomes in 't'. So, you require observations with data both at 't-1'
and at 't'; these are a subset of those with data at 't' or 't-1'.
Within this subset, neither the 't' nor the 't-1' weights will add to
the population totals at those periods; so neither is a proper weight. Yet if the
sampling protocol did not drastically change over your period of
interest, then these weights should be approximately proportional to
the proper weights. I recommend that you use the 't' weights.
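The point about proportionality can be sketched in Python under a strong assumption: that the proper weights for the subset are exactly a constant multiple of the 't' weights. All records, weights, and names below are hypothetical. The sketch shows that the subset's weights fall short of the population total, but that a weighted proportion does not change when the weights are rescaled.

```python
# Hypothetical households: (weight at t, observed at t-1?, event indicator).
households = [
    (120.0, True, 1),
    (80.0, True, 0),
    (100.0, False, 0),  # no data at t-1, so it drops out of the analysis
    (100.0, True, 1),
]

def weighted_mean(pairs):
    """Weighted mean of the second element, using the first as the weight."""
    return sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)

# The full set of 't' weights adds to the population total at t.
total_t = sum(w for w, _, _ in households)

# Keep only households observed at both t-1 and t.
both = [(w, event) for w, seen, event in households if seen]
total_both = sum(w for w, _ in both)  # smaller than total_t

# Assumed "proper" weights: the 't' weights scaled up to the population total.
rescaled = [(w * total_t / total_both, event) for w, event in both]

share = weighted_mean(both)
share_rescaled = weighted_mean(rescaled)
print(total_t, total_both, share, share_rescaled)
```

Estimated totals do depend on the scale of the weights, so this argument covers means, proportions, and similar ratio-type statistics only.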
Situation B
An answer will depend on the purpose of your analysis. Is it
'descriptive', meaning that you are interested only in descriptive
statistics, or 'analytic', meaning that the focus is on models and
hypothesis tests? The fact that you are interested in pooling
suggests the purpose is analytic.
Note that time period will be a part of the stratum identification.
Purpose Descriptive:
Use the original weights supplied with the data. The pooled sample
represents the experience of the population during the two survey
periods. Sometimes this is a legitimate target for descriptive
statistics: if the study was done in adjacent years, then the pooled
sample represents the population experience over the two-year
period. Any descriptive statistic, such as the ratio of men to
women, will be a weighted average of the ratios from the two periods. If the
two periods are adjacent, then these kinds of averages might be
valuable.
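To make the averaging concrete, here is a small Python sketch using the figures from the question: women/men = 50/50 at t0, 60/40 at t1, and a population that doubles. The individual records and weights are invented to match those figures.

```python
# Hypothetical pooled sample: (period, sex, weight).
# t0 represents 1000 people (50/50); t1 represents 2000 people (60/40 women/men).
records = (
    [("t0", "m", 250.0)] * 2 + [("t0", "f", 250.0)] * 2
    + [("t1", "m", 400.0)] * 2 + [("t1", "f", 400.0)] * 3
)

def weight_total(sex, period=None):
    """Sum of weights for one sex, optionally restricted to one period."""
    return sum(w for p, s, w in records
               if s == sex and (period is None or p == period))

# Period-specific ratios of men to women.
ratio_t0 = weight_total("m", "t0") / weight_total("f", "t0")
ratio_t1 = weight_total("m", "t1") / weight_total("f", "t1")

# Ratio from the pooled sample with the original weights.
pooled_ratio = weight_total("m") / weight_total("f")

# The same number as an average of the period ratios,
# each weighted by the weighted number of women in that period.
women_t0 = weight_total("f", "t0")
women_t1 = weight_total("f", "t1")
averaged = (women_t0 * ratio_t0 + women_t1 * ratio_t1) / (women_t0 + women_t1)

print(ratio_t0, ratio_t1, pooled_ratio, averaged)
```

The pooled ratio falls between the two period ratios and sits closer to the t1 value, because t1 carries more total weight.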
The interpretation is the same if the weights have been post-
stratified or raked.
Purpose Analytic:
If you are interested in modeling outcomes as functions of
predictors, then pooling is a way of increasing sample size. When
you present descriptive statistics, I suggest that you present
unweighted statistics. Your readers will want to see the actual
numbers, not the weighted population numbers.
In the analysis, most investigators would use the original supplied
weights. If regression coefficients changed between periods
(a factor x period interaction), then the coefficients estimated
while ignoring the interaction will be a weighted average of the
period-specific coefficients.
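The weighted-average claim can be checked in the simplest case I can think of: a weighted regression through the origin with one predictor, where the identity is exact. The data, weights, and function names below are hypothetical. With an intercept or several predictors the averaging is matrix-weighted rather than this simple scalar form.

```python
# Hypothetical data: (x, y, weight). The slope is 1 in period 0 and 3 in period 1.
period0 = [(1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (3.0, 3.0, 1.0)]
period1 = [(1.0, 3.0, 2.0), (2.0, 6.0, 2.0), (3.0, 9.0, 2.0)]

def wxx(rows):
    """Weighted sum of squares of x."""
    return sum(w * x * x for x, _, w in rows)

def slope(rows):
    """Weighted least-squares slope for a no-intercept model y = b * x."""
    return sum(w * x * y for x, y, w in rows) / wxx(rows)

b0 = slope(period0)
b1 = slope(period1)

# Pooled fit that ignores the period interaction.
b_pooled = slope(period0 + period1)

# The same value as an average of the period slopes, weighted by wxx.
s0, s1 = wxx(period0), wxx(period1)
b_average = (s0 * b0 + s1 * b1) / (s0 + s1)

print(b0, b1, b_pooled, b_average)
```

Because the second period's weights are twice as large, the pooled slope is pulled toward the second period's coefficient.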
Most investigators would weight the analysis of the pooled data. I
might not. Suppose that the population doubled in size between the
two time periods, but that the sample sizes were similar. In the
weighted data, observations in the second period will have twice the
weight of observations in the first period. I would consider this
undesirable and would