Do you really need sampling for this? My suggestion would be to work
with weights. Maybe have a look at:
DiNardo, John E., Nicole Fortin, and Thomas Lemieux (1996). Labor
Market Institutions and the Distribution of Wages, 1973-1992: A
Semiparametric Approach. Econometrica 64(5): 1001-1044.
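
For a single categorical variable such as health, the reweighting idea can
be sketched as follows. This is only a sketch under assumptions: the file
name mydata and the variable layout follow the example in the quoted
message, wgt2007 is a hypothetical name for the 2007 sampling weight, and
merge m:1 needs Stata 11 or later.

```stata
* Assumed variables: income2007, health2007, health1985, wgt1985,
* plus wgt2007 (hypothetical name for the 2007 sampling weight).
use mydata, clear

* Weighted share of each health category in 1985
preserve
collapse (sum) w85 = wgt1985 if !missing(health1985), by(health1985)
egen double tot85 = total(w85)
generate double share85 = w85 / tot85
rename health1985 health
keep health share85
tempfile s85
save `s85'
restore

* Weighted share of each health category in 2007
generate health = health2007
egen double w07 = total(wgt2007), by(health)
egen double tot07 = total(wgt2007)
generate double share07 = w07 / tot07

* Attach the 1985 shares; categories absent in 1985 get share 0
merge m:1 health using `s85', keep(master match) nogenerate
replace share85 = 0 if missing(share85)

* Counterfactual weight: 2007 weight scaled by the ratio of shares
generate double cfwgt = wgt2007 * share85 / share07

* 2007 income distribution under the 1985 health distribution
summarize income2007 [aweight = cfwgt], detail
```

No resampling is involved: each 2007 observation keeps its income and only
its weight changes, so a category such as health=2 that does not occur in
1985 simply ends up with weight zero.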
ben
On 6/13/07, join allfish <[email protected]> wrote:
Dear Nick,
Thanks for this suggestion - I did think of doing this. The problem is that I
also want to build counterfactuals from other variables, which are far more
complicated and take many more values. I was hoping that there might be a
program which could help - or at least some shortcut I could use.
Thanks,
John
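
With several variables, or variables taking many values, the reweighting
approach ben points to estimates the weights from a binary model instead of
cell shares. A minimal sketch, assuming the two years are stacked in one
hypothetical file called pooled with variables year, income, wgt, health and
one further covariate x2 (all of these names are assumptions, not taken from
the thread), and assuming the weights of the two surveys are comparable:

```stata
* Hypothetical stacked data: one row per person-year
use pooled, clear

* Indicator for belonging to the 1985 sample
generate byte y85 = (year == 1985)

* Model P(year = 1985 | covariates); factor variables need Stata 11+.
* Categories seen in only one year are perfectly predicted and would
* need separate handling (for example, a weight of zero).
logit y85 i.health x2 [pweight = wgt]
predict double p85, pr

* Unconditional weighted share of 1985 observations
summarize y85 [aweight = wgt], meanonly
local pbar = r(mean)

* Reweighting factor: [P(85|x)/P(07|x)] * [P(07)/P(85)]
generate double cfwgt = wgt * (p85/(1 - p85)) * ((1 - `pbar')/`pbar') ///
    if year == 2007

* 2007 income distribution under the 1985 covariate distribution
summarize income [aweight = cfwgt] if year == 2007, detail
```

Adding more covariates only lengthens the logit variable list; the rest of
the sketch stays the same.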
>From: "Nick Cox" <[email protected]>
>Reply-To: [email protected]
>To: <[email protected]>
>Subject: st: RE: sampling problem
>Date: Wed, 13 Jun 2007 11:50:03 +0100
>
>Focusing on this (typos corrected)
>
>I want to draw individuals from 2007 according to the distribution
>of health in 1985 so I draw individuals
>with health=1 with prob=0.4,
>health=2 with prob=0,
>health=4 with prob=0.1
>and health=5 with prob=0.5
>(where the probabilities come from the health1985 distribution).
>
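>you can work out from your desired sample size the subsample
>sizes you desire. Suppose you want a sample of 1000:
>
>* 400 draws with replacement from health == 1
>use mydata, clear
>bsample 400 if health == 1
>save cfsample, replace
>
>* 100 draws from health == 4; save so the next append keeps them
>use mydata, clear
>bsample 100 if health == 4
>append using cfsample
>save cfsample, replace
>
>* 500 draws from health == 5, giving 1000 observations in memory
>use mydata, clear
>bsample 500 if health == 5
>append using cfsample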
>
>I would be happy to learn of a smarter solution. Naturally
>you need do nothing about outcomes not to be included
>in your sample. I can't comment on the status of samples
>like this. Bootstrap experts may be able to help further.
>
>Nick
>[email protected]
>
>join allfish (a.k.a. John)
>
> > I want to sample data on the basis of counterfactuals - so what would the
> > distribution of income in 2007 look like if individuals had the distribution
> > of health of 1985.
> >
> > So imagine I have the following data
> >
> > id  income2007  health2007  health1985  wgt1985
> >  1          10           1           1    65.38
> >  2          10           1           1   153.91
> >  3          20           1           1   458.34
> >  4          20           1           1   484.2
> >  5          40           2           1   906.1
> >  6          40           2           4   943.96
> >  7          60           4           5  1176.87
> >  8          60           4           5  1389.91
> >  9         100           5           5  1716.93
> > 10         100           5           5  4067.68
> >
> > where wgt1985 holds the sampling weights for the 1985 data (I also have
> > sampling weights for the 2007 data). The order of the 1985 data makes no
> > difference to the 2007 data; it is just pasted in to obtain the health
> > distribution.
> > What I want to do is sample from the 2007 data to make the distribution of
> > health in 2007 look like that in 1985. So I want to draw individuals from
> > 2007 according to the distribution of health in 1985, drawing individuals
> > with health=1 with prob=0.4, health=2 with prob=0, health=4 with prob=0.1
> > and health=5 with prob=0.5 (where the probabilities come from the health1985
> > distribution). This should give me a hypothetical distribution of income in
> > 2007 if the distribution of health were as in 1985.
> > I cannot see how to do this with the bsample command. Further, I am not sure
> > how to incorporate the sampling weights to ensure that my samples
> > correctly represent the population distributions.