Re: st: creating sample weights
On Aug 10, 2007, at 10:53 AM, Janelle Knox wrote:
I am trying to create a sample weight for a dataset that will
correct for variations in gender, age, etc., from population means.
Does anyone know how to do this, or where I can find information on
setting up a sample weight.... pweight=?
Thanks,
Jane
Jane, you cannot do it to match population means. However, you can do
it to match population percentages in different categories. The
technique for this is known as "raking". In Stata it is available
in Nick Winter's program -survwgt rake-; type "ssc install
survwgt" to install it. By the way, "pweight" is a Stata reserved word.
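For readers who want to see what raking actually does, here is a minimal Python sketch of the idea (iterative proportional fitting). It is not the code behind -survwgt rake-: the function `rake` and its arguments are hypothetical, and it assumes that the targets for every characteristic are category totals adding up to the same overall total. Each pass scales the weights so that the weighted count in every category of one characteristic matches its target, then moves on to the next characteristic, repeating until a full pass barely changes the weights.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-10):
    """Rake starting weights so weighted category totals match population targets.

    weights:    starting (design) weight for each respondent, e.g. all 1.0
    categories: {characteristic: [category code of each respondent]}
    targets:    {characteristic: {category code: target total}}
    Assumes every category present in the sample has a target and a
    positive weighted count.
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        largest_change = 0.0
        for var, codes in categories.items():
            # Weighted count currently in each category of this characteristic
            current = {}
            for wi, c in zip(w, codes):
                current[c] = current.get(c, 0.0) + wi
            # Ratio adjustment for each category: target total / current total
            factors = {c: targets[var][c] / current[c] for c in current}
            for i, c in enumerate(codes):
                w[i] *= factors[c]
            largest_change = max(largest_change,
                                 max(abs(f - 1.0) for f in factors.values()))
        # Stop once a full pass over all characteristics barely changes any weight
        if largest_change < tol:
            break
    return w


# Hypothetical example: 6 respondents, equal starting weights (old_wt = 1)
old_wt = [1, 1, 1, 1, 1, 1]
cats = {"gender": [1, 1, 1, 1, 2, 2], "age_gp": [1, 1, 2, 2, 1, 2]}
tots = {"gender": {1: 3, 2: 3}, "age_gp": {1: 2, 2: 4}}
new_wt = rake(old_wt, cats, tots)
print([round(x, 6) for x in new_wt])  # [0.5, 0.5, 1.0, 1.0, 1.0, 2.0]
```

Note that raking matches each characteristic's totals separately (the margins), not the totals for combinations such as gender by age group.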
A good reference for practice is: http://www.abtassociates.com/presentations/raking_survey_data_2_JOS.pdf
Warning: If you are not experienced with weighting, you can run into
many problems. Raking will not fix, and might even worsen, certain
kinds of sample deficiencies. If you have followed recent
discussions on Statalist, you will be aware that not everyone
recommends weighting before doing regressions.
You don't say whether there is an existing "design weight". If so, I
assume that its name is "old_wt". Otherwise, create one with
"gen old_wt = 1" before running the survwgt program.
1. Create grouped versions of the variables you wish to match in your
original data set.
2. Now create a separate data set for each characteristic that you
wish to match; each will contain the adjusted totals for that
characteristic. Suppose your sample size is n=1,252. Multiply n by
each population percentage to get the category totals. If you round
these to whole numbers, they may no longer add up to n; in that case I
arbitrarily add or subtract 1 from the category numbered 1 for each
characteristic so that the adjusted totals add up to the actual total.
Below is an example that creates a data set "age_dat.dta" containing
the age group totals.
3. Merge these into your original data, matching on the grouped
variable (for example, age_gp).
4. The rake instructions are then (for example):
survwgt rake old_wt, by(race gender age_gp) totvars(race_tot
gender_tot age_tot) generate(new_wt)
5. "new_wt" is your new weight variable. It will probably contain
fractional values, but these are not a problem when you use it as a
probability weight in your regressions, e.g. [pweight=new_wt].
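Steps 2 and 3 amount to simple arithmetic and a lookup. The Python sketch below assumes the population proportions are given per category, rounds n times each proportion to a whole number, and puts whatever difference remains into category 1 (a generalization of the add-or-subtract-1 rule in step 2). The helper names `adjusted_totals` and `attach_totals` are hypothetical, and the lookup only stands in for the Stata merge.

```python
def adjusted_totals(pop_pct, n):
    """Turn population proportions into whole-number category totals summing to n.

    pop_pct: {category code: population proportion}; category 1 must exist.
    Any rounding shortfall or excess is absorbed by category 1.
    """
    totals = {cat: round(n * p) for cat, p in pop_pct.items()}
    totals[1] += n - sum(totals.values())
    return totals


def attach_totals(codes, totals):
    """Give each respondent the total for its category (what the merge achieves)."""
    return [totals[c] for c in codes]


# Age groups from the example: 10%, 20%, 50%, 20% of n = 1,252
totals = adjusted_totals({1: 0.1, 2: 0.2, 3: 0.5, 4: 0.2}, 1252)
print(totals)  # {1: 126, 2: 250, 3: 626, 4: 250}
print(attach_totals([1, 3, 3, 2], totals))  # [126, 626, 626, 250]
```

Python's round() sends exact halves to the nearest even number, so individual totals can differ by 1 from other rounding rules; the category-1 adjustment keeps the sum equal to n either way.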
-Steve
/*CREATE AGE DATA SET WITH ADJUSTED TOTALS SO THAT SAMPLE & POP
PERCENTS MATCH */
local ssize=1252
clear
/* Age Data Set: group 1 = 10%, 2 = 20%, 3 = 50%, 4 = 20% */
input age_gp pop_pct
1 .1
2 .2
3 .5
4 .2
end
gen age_tot=`ssize'*pop_pct
list
table age_gp , c(sum age_tot) row
/* Sort on the key variable used for the merge */
sort age_gp
save age_dat, replace
/*****************CODE ENDS ***************************/
Steven Joel Hirsch Samuels
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441