st: Re: sampling problem
...
It isn't clear to me whether you define the 1985 health distribution from the
raw health1985 variable or use wgt1985 to produce a weighted distribution.
Here's an approach that should work for the simpler, unweighted case; it can be
modified to use the wgt1985 weights to define the target population fractions:
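* wt_h85: share of health1985 observations equal to each health2007 value
gen wt_h85=.
* number of non-missing health1985 observations
count if health1985<.
local popcount85=r(N)
* levelsof replaces the older levels command (Stata 9 and later)
qui levelsof health2007, local(hcats)
foreach h of local hcats {
    qui count if health1985==`h'
    replace wt_h85=r(N)/(`popcount85') if health2007==`h'
}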
The wt_h85 variable will now hold, for each value of health2007, the proportion
of health1985 observations with that value. Summed over the 2007 observations,
these weights will not equal one, since each observation carries the observed
health1985 proportion for its whole category. You could normalize them for the
2007 data to use them directly as sampling fractions, or you could use them to
generate weighted point estimates based on the full dataset, analogous to
survey raking or post-stratification.
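A minimal sketch of the weighted variant and of the point-estimate route, using
the ten example rows from the original message. The names target85, share07 and
rw are made up for illustration. The 2007 shares are left unweighted because no
2007 weight variable appears in the posted data, so treat this as an outline
under those assumptions rather than a finished solution:

```stata
clear
input id income2007 health2007 health1985 wgt1985
 1  10 1 1   65.38
 2  10 1 1  153.91
 3  20 1 1  458.34
 4  20 1 1  484.2
 5  40 2 1  906.1
 6  40 2 4  943.96
 7  60 4 5 1176.87
 8  60 4 5 1389.91
 9 100 5 5 1716.93
10 100 5 5 4067.68
end

* total 1985 weight over observations with non-missing health1985
egen double tot85 = total(wgt1985 * (health1985 < .))

* number of observations with non-missing health2007
quietly count if health2007 < .
local n07 = r(N)

* target85: weighted 1985 share of each health level (hypothetical name)
* share07:  unweighted 2007 share of each health level (hypothetical name)
generate double target85 = .
generate double share07 = .
quietly levelsof health2007, local(hcats)
foreach h of local hcats {
    egen double tot_h = total(wgt1985 * (health1985 == `h'))
    quietly replace target85 = tot_h / tot85 if health2007 == `h'
    drop tot_h
    quietly count if health2007 == `h'
    quietly replace share07 = r(N) / `n07' if health2007 == `h'
}

* ratio weight: scales each 2007 health level up or down to its 1985 share
generate double rw = target85 / share07

* counterfactual summary of 2007 income under the 1985 health distribution
summarize income2007 [aweight = rw]
```

Health levels that occur in 1985 but not in 2007 are never visited by the loop,
so in that case the target shares would cover less than the whole 1985
population. If 2007 sampling weights are available, share07 would presumably be
built from them in the same way target85 is built from wgt1985.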
Michael Blasnik
----- Original Message -----
From: "join allfish" <[email protected]>
To: <[email protected]>
Sent: Wednesday, June 13, 2007 6:17 AM
Subject: st: sampling problem
I want to sample data on the basis of counterfactuals: what would the
distribution of income in 2007 look like if individuals had the 1985
distribution of health?
So imagine I have the following data
id  income2007  health2007  health1985  wgt1985
 1          10           1           1    65.38
 2          10           1           1   153.91
 3          20           1           1   458.34
 4          20           1           1   484.2
 5          40           2           1   906.1
 6          40           2           4   943.96
 7          60           4           5  1176.87
 8          60           4           5  1389.91
 9         100           5           5  1716.93
10         100           5           5  4067.68
where wgt1985 holds the sampling weights for the 1985 data (I also have sampling
weights for the 2007 data). The order of the 1985 data has no relation to the
2007 data; it is just pasted in to obtain the health distribution.
What I want to do is sample from the 2007 data so that the distribution of
health in 2007 looks like that in 1985. That is, I want to draw individuals from
2007 according to the 1985 distribution of health: health=1 with prob=0.4,
health=2 with prob=0, health=4 with prob=0.1 and health=5 with prob=0.5 (where
the probabilities come from the health1985 distribution). This should give me a
hypothetical distribution of income in 2007 if the distribution of health were
as in 1985.
I cannot see how to do this with the bsample command. I am also not sure how to
incorporate the sampling weights to ensure that my samples correctly represent
the population distributions.
Any help would be much appreciated.
Yours,
John