
Re: SV: S: SV: st: Survey - raking - calibration - post stratification - calculating weights

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: SV: S: SV: st: Survey - raking - calibration - post stratification - calculating weights
Date	Mon, 8 Dec 2008 12:13:14 -0500

On Dec 8, 2008, at 2:55 AM, Kristian Wraae wrote:

Ok, thanks.

Now I understand how to do the raking procedure.

I have one question though.
Since I have a two-step inclusion procedure, wouldn't it be more accurate to
rake in two steps?

Example:
I know the distribution of medication amongst the 3745 men.
But the 3745 men differ from the 4975 men by being slightly younger, and we know that the older you get, the more medicine you get. That also goes for
physical activity and smoking.
So if I calculate the expected prevalences amongst the 4975 (in order to rake the 600) from the 3750, I risk making a mistake (underestimating the prevalences in the background population). I guess I should be calculating
all the prevalences from the 4975, but I don't have those data.

So wouldn't it be more correct to:

1. Rake the 3750 so they match the 4975 on age and geography.

2. Calculate all the expected prevalences on age, medication, smoking,
physical activity etc. from the now raked 3750 (as we would expect them to be
had we had a 100% response rate).

3. Use these prevalences to rake the 600 as you showed me?

Your concern is a good one, Kristian. However, the solution you propose is ad hoc, with no real theoretical justification. I've tried some complicated raking in the past, but I have never seen a reference to the method you propose. You have much questionnaire information on many informative variables; raking can use only a small part of it. There is a standard approach to this problem: model the probability of participating in the phone interview. I suggest you consult the text "Statistical Analysis with Missing Data" by Little & Rubin, especially Chapters 3 & 13. In the parlance of that book, you must assume that data are "Missing at Random". This means that the probability of having a phone interview depends only on characteristics known from the mail questionnaire or the census.
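To make "model the probability of participating" concrete, here is a minimal sketch in Python rather than Stata. Everything in it is an assumption for illustration: the toy data are invented, there is a single hypothetical covariate (an age-group index), and the helper name fit_logistic is made up. A real analysis would use Stata's own logistic commands with all the relevant covariates.

```python
import math

def fit_logistic(x, y, iters=25):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * delta = score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Invented toy data: x = age-group index, y = 1 if the man took the phone interview
x = [0] * 10 + [1] * 10 + [2] * 10
y = [1] * 4 + [0] * 6 + [1] * 2 + [0] * 8 + [1] * 1 + [0] * 9

a, b = fit_logistic(x, y)

# Predicted participation probability for every man in the toy sample
p_r = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
```

Under "Missing at Random", these predicted probabilities are all that is needed to correct for who ended up in the phone interview; the covariates in the model carry the whole burden of that assumption.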


Here are the steps:

1. Estimate weight1 = N_i/n_i  as before for the 15 age groups.

2. You can use this weight on the second phase sample of 3,750 to estimate various properties of the population, such as proportions in categories of medication, physical activity, and smoking. These may be of interest in themselves.

3. Instead of raking, use -logistic- or -logit- (not the survey versions) on the 3,750 men to predict who participated in the telephone interview. Consider as covariates: age, geography, medication, physical activity, smoking and any others that might be of use.

4. Generate the predicted probability of participating in the telephone interview. Call this p_r. Your goal is to get a good prediction, so compute ROC curves, if possible. (I don't recall if Stata 8 has the -lroc- command.)

5. For the 600 men in the telephone survey, compute: weight2 = (weight1) x (1/p_r).

6. Rake weight2 back to the age categories & geographic categories of the 5,000 men. Call the result "weight3".

7. Finally rake weight3 to the Danish Census age/geographical breakdowns. Call it "weight4".


8. Use weight4 as your final analysis weight for -svymean-.
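The weight arithmetic in the steps above can be sketched as follows. This is Python, not Stata, and only a rough illustration: all numbers are invented, there are two age groups instead of 15, N_i and n_i are assumed to be the population and sample counts in age group i, and rake is a hypothetical helper that does plain iterative proportional fitting.

```python
def rake(weights, margins, targets, iters=50):
    """Iterative proportional fitting: rescale the weights so that the
    weighted total in each category matches its target, one margin at a time."""
    w = list(weights)
    for _ in range(iters):
        for groups, target in zip(margins, targets):
            totals = {}
            for wi, g in zip(w, groups):
                totals[g] = totals.get(g, 0.0) + wi
            w = [wi * target[g] / totals[g] for wi, g in zip(w, groups)]
    return w

# Invented toy phone-interview respondents
age    = ["60-64", "60-64", "60-64", "65-69", "65-69", "65-69"]
region = ["urban", "rural", "urban", "urban", "rural", "rural"]
p_r    = [0.25, 0.20, 0.30, 0.15, 0.10, 0.20]   # predicted participation probabilities

# Assumed meaning: N_i = population count, n_i = sample count in age group i
N_i = {"60-64": 3000, "65-69": 2000}
n_i = {"60-64": 2400, "65-69": 1350}

weight1 = [N_i[g] / n_i[g] for g in age]              # weight1 = N_i / n_i
weight2 = [w1 / p for w1, p in zip(weight1, p_r)]     # weight2 = weight1 x (1/p_r)

# Rake weight2 to assumed age and region totals to get weight3
age_totals    = {"60-64": 3000, "65-69": 2000}
region_totals = {"urban": 2800, "rural": 2200}
weight3 = rake(weight2, [age, region], [age_totals, region_totals])
```

The final raking to census breakdowns would reuse the same helper, with weight3 as the starting weights and the census totals as targets.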

You are a long way from the simplicity of Stas's earlier suggestion to use "weight1" on your data. Standard errors that you compute will be underestimated, because they do not account for the uncertainty in estimating "weight3", and you must state this in your report. If you wish to compute the proper standard errors, you must, I think, bootstrap the process starting no later than Step 3. This is the price for using the complex sampling design.
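A rough sketch of what bootstrapping the whole process means, again in Python with invented data. It makes several simplifying assumptions: the men are resampled with replacement as if from a simple random sample, the response probability is estimated by the response rate within a cell as a stand-in for the logistic model, and the raking steps are left out. The point is only that the weights are re-estimated inside every replicate, so their uncertainty shows up in the standard error.

```python
import random
import statistics

def weighted_mean(sample):
    """Re-estimate the response rate in each cell, weight every respondent
    by its inverse, and return the weighted mean of the outcome."""
    n, r = {}, {}
    for cell, responded, _ in sample:
        n[cell] = n.get(cell, 0) + 1
        r[cell] = r.get(cell, 0) + responded
    num = den = 0.0
    for cell, responded, outcome in sample:
        if responded:
            w = n[cell] / r[cell]      # 1 / estimated response probability
            num += w * outcome
            den += w
    return num / den if den > 0 else None   # None if nobody responded

# Invented toy records: (cell, responded to phone interview, outcome)
records = []
for i in range(120):
    cell = "younger" if i < 70 else "older"
    responded = 1 if i % (3 if cell == "younger" else 5) == 0 else 0
    outcome = 1 if i % 2 == 0 else 0
    records.append((cell, responded, outcome))

point_estimate = weighted_mean(records)

rng = random.Random(2008)      # fixed seed so the sketch is reproducible
replicates = []
for _ in range(200):
    resample = rng.choices(records, k=len(records))
    est = weighted_mean(resample)          # weights are rebuilt in each replicate
    if est is not None:
        replicates.append(est)

bootstrap_se = statistics.stdev(replicates)
```

A standard error computed once from fixed weights would ignore the replicate-to-replicate variation in n[cell] / r[cell]; that is the part the bootstrap picks up.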


-Steve
