
st: RE: sampling problem

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: sampling problem
Date	Wed, 13 Jun 2007 11:50:03 +0100

Focusing on this (typos corrected) 

I want to draw individuals from 2007 according to the distribution 
of health in 1985 so I draw individuals 
with health=1 with prob=0.4, 
health=2 with prob=0, 
health=4 with prob=0.1 
and health=5 with prob=0.5 
(where the probabilities come from the health1985 distribution).
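If the target probabilities are meant to come straight from the weighted 1985 distribution, the subsample sizes can be computed rather than typed in. What follows is a minimal sketch, not a tested recipe. It assumes that summing wgt1985 estimates 1985 population totals, so a category's share is its sum of wgt1985 divided by the overall sum. It enters the example data from the question with input; for real data you would replace that part with use mydata, clear. The total of 1000 is illustrative.

```stata
* Sketch: subsample sizes from the weighted 1985 health distribution.
* Example data from the question; replace with -use mydata, clear-.
clear
input id income2007 health2007 health1985 wgt1985
 1  10 1 1   65.38
 2  10 1 1  153.91
 3  20 1 1  458.34
 4  20 1 1  484.2
 5  40 2 1  906.1
 6  40 2 4  943.96
 7  60 4 5 1176.87
 8  60 4 5 1389.91
 9 100 5 5 1716.93
10 100 5 5 4067.68
end

* Desired total size of the counterfactual sample (illustrative)
local N = 1000

* Estimated 1985 population total: sum of the 1985 sampling weights
quietly summarize wgt1985 if !missing(health1985)
local total = r(sum)

* For each 1985 health category: weighted share times N, rounded
quietly levelsof health1985, local(levels)
foreach h of local levels {
    quietly summarize wgt1985 if health1985 == `h'
    local n`h' = round(`N' * r(sum) / `total')
    display "health2007 == `h': draw `n`h'' observations"
}
```

Categories that do not occur in 1985 (health == 2 here) get no draws, which matches prob=0. Because of rounding, the sizes need not add up exactly to N.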

you can work out the subsample sizes from your desired total sample
size and these probabilities. Suppose you want a sample of 1000, so
400, 100 and 500 draws from health categories 1, 4 and 5:

use mydata, clear
bsample 400 if health2007 == 1
save cfsample, replace

use mydata, clear
bsample 100 if health2007 == 4
append using cfsample
save cfsample, replace

use mydata, clear
bsample 500 if health2007 == 5
append using cfsample
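The three bsample blocks can be folded into one loop that accumulates the draws in a temporary file. This is a minimal sketch under stated assumptions. It assumes mydata contains health2007 and has at least 400, 100 and 500 observations with health2007 equal to 1, 4 and 5, because bsample, as far as I know, will not draw more observations than are available. The category and size lists are just the values from the example. Note the use if ... using form, which reads only the matching observations from the file.

```stata
* Sketch: one bsample draw per health2007 category, appended in a tempfile.
* Assumes mydata holds health2007 with enough observations per category.
* health2007 categories to draw from
local cats  1 4 5
* number of draws per category (0.4, 0.1 and 0.5 of 1000)
local sizes 400 100 500
tempfile cf
local k = 0
foreach h of local cats {
    local ++k
    local n : word `k' of `sizes'
    * read only the observations in this category
    use if health2007 == `h' using mydata, clear
    * draw `n' observations with replacement
    bsample `n'
    * from the second category on, add the draws made so far
    if `k' > 1 append using `cf'
    save `cf', replace
}
* the counterfactual sample is now in memory
```

Only the two lists need editing to change categories or sizes. Unlike the step-by-step version, nothing is left behind on disk apart from Stata's own temporary file.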

I would be happy to learn of a smarter solution. Naturally
you need do nothing about categories that are not to be included
in your sample (health == 2 here). I can't comment on the statistical
status of samples like this. Bootstrap experts may be able to help further.

Nick 
[email protected] 

join allfish (a.k.a. John) 
 
> I want to sample data on the basis of counterfactuals: what would the
> distribution of income in 2007 look like if individuals had the
> distribution of health of 1985?
> 
> So imagine I have the following data
> 
> id  income2007  health2007  health1985  wgt1985
>  1          10           1           1    65.38
>  2          10           1           1   153.91
>  3          20           1           1   458.34
>  4          20           1           1    484.2
>  5          40           2           1    906.1
>  6          40           2           4   943.96
>  7          60           4           5  1176.87
>  8          60           4           5  1389.91
>  9         100           5           5  1716.93
> 10         100           5           5  4067.68
> 
> where wgt1985 is the sampling weight for the 1985 data (I also have
> sampling weights for the 2007 data). The order of the 1985 data makes no
> difference to the 2007 data; it is just pasted in to obtain the health
> distribution.
> What I want to do is sample from the 2007 data to make the distribution of
> health in 2007 look like that in 1985. So I want to draw individuals from
> 2007 according to the distribution of health in 1985, drawing individuals
> with health=1 with prob=0.4, health=2 with prob=0, health=4 with prob=0.1
> and health=5 with prob=0.5 (where the probabilities come from the
> health1985 distribution). This should give me a hypothetical distribution
> of income in 2007 if the distribution of health were as in 1985.
> I cannot see how to do this with the bsample command. Further, I am not
> sure how I would then incorporate the sampling weights to ensure that my
> samples correctly represent the population distributions.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- RE: st: RE: sampling problem
  - From: "join allfish" <[email protected]>

References:
- st: sampling problem
  - From: "join allfish" <[email protected]>
