Re: st: RE: sampling problem

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: RE: sampling problem
Date	Wed, 13 Jun 2007 10:21:58 -0400

I agree with Ben. Raking is a standard method to create sample weights so that the weighted sample agrees with an external population on the distribution of several categorical variables simultaneously. In a recent program evaluation, I raked the weights of the intervention area so that it matched the age-gender and racial distributions of the control area. In other words, I created a counterfactual distribution.

Raking is set in a larger context by RJ Little and M-M Wu (1991). Models for Contingency Tables with Known Margins When Target and Sampled Populations Differ. J. Amer. Statist. Assoc. 86:413, pp 87-95.

A good reference for practice is: Michael P. Battaglia, David Izrael, David C. Hoaglin, and Martin R. Frankel (2004). "Practical Considerations in Raking Survey Data", available from: http://www.abtassociates.com/Page.cfm?PageID=40005&PBL=1

In Stata, use -survwgt- by Nick Winter, available from SSC.
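
As a rough illustration of what raking does, the iteration can be written with official commands alone. This is only a sketch and not a substitute for -survwgt-, whose own syntax is documented in its help file. All variable names are hypothetical: wgt is the starting weight, agesex and race are the categorical margins, and agesex_tot and race_tot hold, for each observation, the target population total of its category. The two sets of targets are assumed to sum to the same grand total, and the 25 passes are an arbitrary choice with no convergence check.

```stata
* install the user-written package mentioned above, then read its help
ssc install survwgt
help survwgt

* hand-rolled raking sketch using only official commands
* (wgt, agesex, race, agesex_tot, race_tot are hypothetical names)
generate double rakewt = wgt
forvalues iter = 1/25 {
    foreach m in agesex race {
        * current weighted total in each category of margin `m'
        egen double cur = total(rakewt), by(`m')
        * scale the weights so each category total hits its target
        replace rakewt = rakewt * `m'_tot / cur
        drop cur
    }
}

* check the margins after the last pass
tabulate agesex [aweight = rakewt]
tabulate race [aweight = rakewt]
```

Each pass matches one margin exactly and slightly disturbs the other; repeating the passes usually brings both margins close to their targets.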

Steve

On Jun 13, 2007, at 8:08 AM, Ben Jann wrote:

Do you really need sampling for this? My suggestion would be to work
with weights. Maybe have a look at:

DiNardo, John E., Nicole Fortin, and Thomas Lemieux (1996). Labour
Market Institutions and the Distribution of Wages, 1973-1992: A
Semiparametric Approach. Econometrica 64(5): 1001-1046.
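
A minimal sketch of the weighting idea for the health example in this thread. It assumes a 2007 sampling weight named wgt2007 (a hypothetical name; the original post only says such a weight exists) and uses the target shares 0.4, 0, 0.1 and 0.5 quoted below. With a single categorical variable the reweighting factor is just the ratio of the target share to the current weighted share; with many or continuous variables the factor would have to be estimated instead, which this sketch does not attempt.

```stata
use mydata, clear

* target (1985) share of each health category, as quoted in the thread
generate double target = .
replace target = 0.4 if health2007 == 1
replace target = 0   if health2007 == 2
replace target = 0.1 if health2007 == 4
replace target = 0.5 if health2007 == 5

* current weighted share of each health category in 2007
* (wgt2007 is a hypothetical name for the 2007 sampling weight)
egen double cellwgt = total(wgt2007), by(health2007)
egen double totwgt  = total(wgt2007)

* counterfactual weight: 2007 weight times target share over current share
generate double cfwgt = wgt2007 * target / (cellwgt / totwgt)

* income distribution in 2007 under the 1985 health distribution
summarize income2007 [aweight = cfwgt], detail
```

Observations in a category with target share 0 receive weight 0 and drop out of the weighted summary, so no resampling step is needed.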

ben

On 6/13/07, join allfish <[email protected]> wrote:
Dear Nick,
Thanks for this suggestion - I did think of doing this. The problem is I
have other variables, which are far more complicated and have many more
values, which I want to use for the counterfactuals as well. I was hoping
that there may be a program which could help - or at least some short cut I
could use.
Thanks,
John

>From: "Nick Cox" <[email protected]>
>Reply-To: [email protected]
>To: <[email protected]>
>Subject: st: RE: sampling problem
>Date: Wed, 13 Jun 2007 11:50:03 +0100
>
>Focusing on this (typos corrected)
>
>I want to draw individuals from 2007 according to the distribution
>of health in 1985 so I draw individuals
>with health=1 with prob=0.4,
>health=2 with prob=0,
>health=4 with prob=0.1
>and health=5 with prob=0.5
>(where the probabilities come from the health1985 distribution).
>
>you can work out from your desired sample size the subsample
>sizes you desire. Suppose you want a sample of 1000
>
>use mydata
>bsample 400 if health == 1
>save cfsample
>
>use mydata, clear
>bsample 100 if health == 4
>append using cfsample
>save cfsample, replace
>
>use mydata, clear
>bsample 500 if health == 5
>append using cfsample
>
>I would be happy to learn of a smarter solution. Naturally
>you need do nothing about outcomes not to be included
>in your sample. I can't comment on the status of samples
>like this. Bootstrap experts may be able to help further.
>
>Nick
>[email protected]
>
>join allfish (a.k.a. John)
>
> > I want to sample data on the basis of counterfactuals - so
> > what would the
> > distribution of income in 2007 look like if individuals had
> > the distribution
> > of health of 1985.
> >
> > So imagine I have the following data
> >
> > id   income2007   health2007   health1985   wgt1985
> >  1       10           1            1          65.38
> >  2       10           1            1         153.91
> >  3       20           1            1         458.34
> >  4       20           1            1         484.2
> >  5       40           2            1         906.1
> >  6       40           2            4         943.96
> >  7       60           4            5        1176.87
> >  8       60           4            5        1389.91
> >  9      100           5            5        1716.93
> > 10      100           5            5        4067.68
> >
> > where weight is the sampling weights for the 1985 data (I
> > also have sampling
> > weights for the 2007 data). The order of the 1985 data makes
> > no difference
> > to the 2007 data it is just pasted in to obtain the health
> > distribution.
> > What I want to do is sample from the 2007 data to make the
> > distribution of
> > health in 2007 look like that in 1985. So I want to draw
> > individuals from
> > 2007 according to the distribution of health in 1985 so I
> > draw individuals
> > with health=1 with prob=0.4, health=2 with prob=0, health=4
> > with prob=0.1
> > and health=5 with prob=0.5 (where the probabilities come from
> > the health1985
> > distribution). This should give me a hypothetical
> > distribution of income in
> > 2007 if the distribution of health was as in 1985.
> > I cannot see how to do this with the bsample command. Further
> > I am not sure
> > then how to incorporate the sampling weights to ensure that
> > my samples
> > correctly represent the population distributions.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


References:
- st: RE: sampling problem
  - From: "Nick Cox" <[email protected]>
- RE: st: RE: sampling problem
  - From: "join allfish" <[email protected]>
- Re: st: RE: sampling problem
  - From: "Ben Jann" <[email protected]>
