Re: st: pweight or fweight?
On Sep 8, 2008, at 5:16 AM, Andrea Bennett wrote:
Dear all,
I'm getting a little confused with the weight options. I've
constructed the following weight: A / B=weight, with A==share in
true population and B==share in sample. These weights refer to
U.S. states while the observations are single individuals.
Which weight option should I use in the regress command? I think
it should be -pweight-, but since I've never worked with
weights before, I thought I'd ask!
I should give Andrea more detail.
According to Stata's help:
1. fweights, or frequency weights, are weights that indicate
the number of duplicated observations.
2. pweights, or sampling weights, are weights that denote the
inverse of the probability that the observation is included
because of the sampling design.
Now, Andrea's weights are certainly not frequency weights. Are they
pweights? They do not meet the technical definition, but they can
function as pweights:
Take the following example:
Suppose Andrea has a sample of 100 individuals, 5 from Alabama and
8 from California. The sample "shares" for Alabama and California
are B = 5% and 8%, respectively.
The US Population is approximately 300,000,000 people; Alabama has
about 4.6 x 10^6 and California has about 36.6 x 10^6, for
percentages of about A= 1.53% and 12.2%.
Andrea's weights are wt_1 = A/B = 0.306 for Alabama and
wt_1 = A/B = 1.525 for California.
Andrea says nothing about how the sample was drawn. But in an
informal sense, the five sampled Alabamans represent 4.6 x 10^6 true
Alabamans, so each represents wt_2 = 920,000 Alabamans.
Similarly, each of the eight Californians represents wt_2 = 4.58 x
10^6 Californians. These look like probability weights but are not,
because pweights refer to sampling probabilities and Andrea says
nothing about sampling. However, the essence of a "weight" is the
number of population members represented by each sampled unit; in
this sense, these are post-stratification weights.
Andrea's weights look nothing like these. But consider the ratios
of the two kinds of weights:
Alabama:    wt_1/wt_2 = 1/(3,000,000)
California: wt_1/wt_2 = 1/(3,000,000)
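The arithmetic above can be checked with a short script. This is only a sketch in Python, not Stata code. It assumes a sample size of n = 100, which is what the stated shares of 5% and 8% imply for 5 and 8 sampled people, and it uses the rounded population figures quoted above. Because it computes A from the unrounded ratio rather than from 1.53%, the third decimal of wt_1 may differ slightly from the figures in the text.

```python
# Figures from the example; n = 100 is an assumption implied by the
# stated shares (5 of 100 = 5%, 8 of 100 = 8%).
N = 300_000_000   # approximate US population
n = 100           # assumed sample size

states = {
    # name: (state population, number of people sampled)
    "Alabama": (4.6e6, 5),
    "California": (36.6e6, 8),
}

ratios = {}
for name, (pop, sampled) in states.items():
    A = pop / N            # share in the true population
    B = sampled / n        # share in the sample
    wt_1 = A / B           # Andrea's relative weight
    wt_2 = pop / sampled   # people represented by each sampled person
    ratios[name] = wt_1 / wt_2
    print(f"{name}: wt_1 = {wt_1:.3f}, wt_2 = {wt_2:,.0f}, "
          f"wt_1/wt_2 = 1/{1 / ratios[name]:,.0f}")
```

Both states give the same ratio, n/N, which is the point of the comparison.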
In Stata's survey commands, only estimation of population totals
requires absolute weights, that is, the absolute number of people
represented by each sample member. Estimation of means and
regression coefficients requires only weights that are proportional
to the absolute weights. Andrea's weights are proportional, as the
example shows. In the general case, the constant of proportionality
is n/N, where n is the sample size and N is the population size.
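To see why proportional weights are enough for means and regression coefficients, here is a small sketch in Python (again, not Stata code). The data are made up purely for illustration, and the helper names weighted_mean, weighted_slope, and weighted_total are hypothetical. It compares point estimates under absolute weights and under the same weights multiplied by n/N; it says nothing about standard errors.

```python
def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def weighted_slope(x, y, w):
    """Slope from a weighted least-squares fit of y on x with an intercept."""
    xbar = weighted_mean(x, w)
    ybar = weighted_mean(y, w)
    num = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return num / den

def weighted_total(y, w):
    """Estimated population total: sum of weight times value."""
    return sum(wi * yi for wi, yi in zip(w, y))

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
absolute = [920_000.0, 920_000.0, 4_575_000.0, 4_575_000.0, 4_575_000.0]

c = 100 / 300_000_000                 # n/N, assuming n = 100
relative = [c * w for w in absolute]  # weights on Andrea's scale

mean_abs, mean_rel = weighted_mean(y, absolute), weighted_mean(y, relative)
slope_abs, slope_rel = weighted_slope(x, y, absolute), weighted_slope(x, y, relative)
total_abs, total_rel = weighted_total(y, absolute), weighted_total(y, relative)

print("means agree: ", abs(mean_abs - mean_rel) < 1e-9)
print("slopes agree:", abs(slope_abs - slope_rel) < 1e-9)
print("total shrinks by n/N:", abs(total_rel / total_abs / c - 1) < 1e-9)
```

The constant c cancels from the numerator and denominator of the mean and the slope, but not from the total, which is why only totals need absolute weights.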
So, Andrea can use the pweight specification.
However, I have doubts about the utility of this whole effort. Perhaps
Andrea will tell us more about the sample and the study, and about the
reasons for choosing these particular weights. Why weight for state
population, for example, but not for age, gender, or other
characteristics?
-Steve
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/