Focusing on this (typos corrected)
I want to draw individuals from 2007 according to the distribution
of health in 1985 so I draw individuals
with health=1 with prob=0.4,
health=2 with prob=0,
health=4 with prob=0.1
and health=5 with prob=0.5
(where the probabilities come from the health1985 distribution).
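If the 1985 shares are meant to reflect the 1985 sampling weights, one way to obtain them might be a weighted tabulation. This is only a sketch: it assumes the variable names health1985 and wgt1985 from the example data and the file name mydata used in this post, and treating the weights as aweights is an assumption about how they should enter.

```stata
* Sketch only: variable names taken from the example data in this thread.
* Using aweights is an assumption about how the 1985 weights should enter.
use mydata, clear
tabulate health1985 [aweight = wgt1985]
```

The percent column of the table would then supply the probabilities for each health category.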
You can work out the subsample sizes you need from your desired
sample size. Suppose you want a sample of 1000; then you need
400, 100 and 500 observations with health 1, 4 and 5 respectively:
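* health == 1: 40% of 1000
use mydata, clear
bsample 400 if health == 1
save cfsample, replace
* health == 4: 10% of 1000
use mydata, clear
bsample 100 if health == 4
append using cfsample
save cfsample, replace
* health == 5: 50% of 1000
use mydata, clear
bsample 500 if health == 5
append using cfsample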
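The same steps could be written as a loop, which may be easier to adapt to other probabilities. This is a minimal sketch, not a tested solution: the locals n, levels and probs, the seed, and the tempfile name are all made up for illustration, and health stands for the 2007 health variable. Each pass saves the accumulated file so that no subsample is lost. As far as I recall, bsample needs each subsample size to be no larger than the number of observations satisfying the if condition, so this assumes the real dataset is large enough.

```stata
* Sketch only: n, levels, probs and the seed are illustrative assumptions.
set seed 12345
local n 1000                 // desired total sample size
local levels 1 4 5           // health categories with nonzero probability
local probs 0.4 0.1 0.5      // matching probabilities from the 1985 distribution
tempfile cfsample
forvalues i = 1/3 {
    local h : word `i' of `levels'
    local p : word `i' of `probs'
    local k = round(`n' * `p')       // subsample size for this category
    use mydata, clear
    bsample `k' if health == `h'     // draw with replacement within category
    if `i' > 1 {
        append using `cfsample'      // add the subsamples drawn so far
    }
    save `cfsample', replace         // keep the accumulated sample
}
tabulate health                      // check the subsample sizes
```

After the loop the combined sample is in memory, so income can be summarized directly.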
I would be happy to learn of a smarter solution. Naturally
you need do nothing about outcomes that are not to be included
in your sample (here health == 2, with probability 0). I can't
comment on the status of samples like this. Bootstrap experts
may be able to help further.
Nick
join allfish (a.k.a. John)
> I want to sample data on the basis of counterfactuals - so what would the
> distribution of income in 2007 look like if individuals had the distribution
> of health of 1985.
>
> So imagine I have the following data
>
> id   income2007   health2007   health1985   wgt1985
> 1    10           1            1            65.38
> 2    10           1            1            153.91
> 3    20           1            1            458.34
> 4    20           1            1            484.2
> 5    40           2            1            906.1
> 6    40           2            4            943.96
> 7    60           4            5            1176.87
> 8    60           4            5            1389.91
> 9    100          5            5            1716.93
> 10   100          5            5            4067.68
>
> where weight is the sampling weights for the 1985 data (I also have sampling
> weights for the 2007 data). The order of the 1985 data makes no difference
> to the 2007 data it is just pasted in to obtain the health distribution.
> What I want to do is sample from the 2007 data to make the distribution of
> health in 2007 look like that in 1985. So I want to draw individuals from
> 2007 according to the distribution of health in 1985 so I draw individuals
> with health=1 with prob=0.4, health=2 with prob=0, health=4 with prob=0.1
> and health=5 with prob=5 (where the probabilities comes from the health1985
> distribution). This should give me a hypothetical distribution of income in
> 2007 if the distribution of health was as in 1985.
> I cannot see how to do this with the bsample command. Further I am not sure
> then how to incorporate the sampling weights to ensure that my samples
> correctly represent the population distributions.