Jean-Gael:
A probability weight is the number of people in the population that a
sample member represents. Your weights look nothing like numbers of
people. In your first sample, the HH probability weights (before
non-response adjustments) should be 10.0, because you took a 10%
sample of HH. If you interviewed every adult in the HH, they retain
the HH weight. If you interviewed 1/K of the adults in a household,
the person weight is the HH weight x K.
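
The weight arithmetic above can be sketched in Stata as follows. This is
only an illustration on made-up data: the variable names hhid, n_adults
and n_interviewed are assumptions, not names from the original dataset,
and K is taken to be eligible adults divided by adults interviewed.

```stata
* Toy data; hhid, n_adults and n_interviewed are hypothetical names
clear
input hhid n_adults n_interviewed
1 3 1
2 2 2
3 4 1
end

* HH weight = 1 / sampling fraction (10% sample of households)
generate double hhwt = 1 / 0.10

* K = eligible adults / adults interviewed in the household
generate double personwt = hhwt * n_adults / n_interviewed

list hhid hhwt personwt
```

A household where every adult was interviewed keeps the HH weight of 10;
a household where 1 of 3 adults was interviewed gets a person weight of 30.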
It's not clear whether your frame of tourism workers (sample 2) was
of HH or people. If people, then you should be interviewing only
people who work in tourism, not their HH members, as HH members would
not have been in the frame. Since I don't know your sampling scheme,
I don't know how to compute the sampling weight.
When you have 2 samples, as you did here, treat each one as coming
from a different stratum. Transfer the people in sample 1 who work in
tourism to the 2nd stratum, and retain their original sampling
weight.
If villages are strata, then you have 2x10 = 20 sampling strata.
However, it sounds like the 10 villages are themselves a convenience
sample. If so, then keep the two samples as strata. Your PSU should
probably be HH. However, if you interviewed only one person per HH,
then the PSU can be the person.
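
A minimal sketch of this design declaration, on toy data. The names hhid
and personwt are hypothetical, tourind is assumed to be coded 1 for
tourism workers, and the sample-2 weights are placeholders, since the
sampling scheme for that sample is not known.

```stata
* Toy data; all values are illustrative, not from the actual survey
clear
input hhid sample tourind personwt
1 1 0 10
2 1 0 10
3 1 1 10
4 2 1 2
5 2 1 2
6 2 1 2
end

* Start with one stratum per sample
generate byte stratum = sample

* Move sample-1 tourism workers to stratum 2; their weight is unchanged
replace stratum = 2 if tourind == 1

* Households as PSUs, the two samples as strata
svyset hhid [pweight = personwt], strata(stratum)
svydescribe
```

If only one person per household was interviewed, the PSU variable could
instead identify the person.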
After computing the sampling weights, you can, as Michael states, use
the post-stratification options of -svyset- in Stata (-poststrata()-
and -postweight()-) to reproduce the tourism counts. Your
post-stratification totals (tourism workers, non-tourism workers)
should add to the estimated population total in the 10 villages;
0.84% should be tourism workers, and 99.16% should be non-tourism
workers. If you want separate estimates of impact in each village,
then you can also use the villages to define your post-strata: 10
villages x 2 tourism-worker-status strata.
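
As a rough sketch of the post-stratification step, the example below uses
the poststrata() and postweight() options of -svyset-. The data are made
up, hhid, stratum and personwt are hypothetical names, and the population
total of 10,000 is a placeholder, not a figure from this thread; only the
0.84% share comes from the discussion.

```stata
* Toy data; all values are illustrative, not from the actual survey
clear
input hhid stratum tourind personwt
1 1 0 10
2 1 0 10
3 2 1 10
4 2 1 2
5 2 1 2
6 2 1 2
end

* Placeholder population total: replace with the real estimate
scalar Npop = 10000

* Post-stratum population sizes: 0.84% tourism, the remainder non-tourism
generate double postwt = cond(tourind == 1, 0.0084 * Npop, (1 - 0.0084) * Npop)

* Declare the design with post-stratification on tourism-worker status
svyset hhid [pweight = personwt], strata(stratum) ///
    poststrata(tourind) postweight(postwt)

* Weighted shares should now match the post-stratum totals
svy: tabulate tourind, percent
```

To post-stratify by village as well, the post-stratum variable could be
built from village and tourism status together, for example with
-egen ... = group(village tourind)-, where village is again an assumed
variable name.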
Finally, unless one goal is to compare tourism and non-tourism
workers, it was not necessary to enhance your sample with tourism
workers. Tourism workers are obviously greatly affected by tourism,
compared to non-tourism workers. However, they constitute only 0.84%
of the population, so they contribute minimally to the overall
effects of tourism on the population.
If you need further assistance, the University of Florida has a
number of faculty with experience in survey sampling.
-Steve
On Sat, May 9, 2009 at 5:13 PM, Jean-Gael Collomb wrote:
Hello all,
I have a question about using post-stratification weights with
Stata's survey commands. After setting the weights, I do not get the
proportions I expected.
My overall research question is whether tourism (TOURIND) influences
quality of life in several communities in a rural province of
Namibia. My aim was to conduct individual interviews in a sample of
10% of all households in each community. I obtained household census
counts from key informants within the community and from my own
double checks during field work.
This random sample yielded 395 interviews, of which only 9 (2.3%)
were conducted with individuals working in tourism. Given this very
low number of respondents who worked in tourism and my interest in
trying to understand the impact of tourism, I established a sampling
frame restricted to individuals working in tourism and interviewed 72
individuals. [Two of those interviews were conducted with individuals
not employed in tourism but living in a household where someone was.]
In total, I thus interviewed 467 people, among which 79 worked in
tourism. My full sample oversampled tourism employees, and I think it
would be wrong to derive from it that 17% (79/467*100) of the
population works in tourism. I think post-stratification weights
should be assigned to my data set to correct for the oversampling.
In fact, the percentage of the population working in tourism varies
by community, and thus different weights should be calculated for
different communities. I used existing reports documenting total
numbers of community residents employed by local tourism operators
and total population size as a basis to calculate the "true"
distribution of tourism employees (weight2). The weights were
calculated by dividing the “true” percentage by the “oversampled”
percentage.
The problem is that when I apply the weights in Stata, I do not get
the proportion I expected. Specifically, I expected that after
svyset _n [pweight = samplewt2] and svy: tab tourind, I would find
that 0.84% of the population could be labeled TOURIND, but Stata
returns a value of 3.25% (with similar discrepancies for each
community).
I am not sure whether I am doing something wrong in calculating the
weights, assigning the weights to my dataset, or entering the tab
commands in svy mode. I’d greatly appreciate your help in moving past
this and taking advantage of the survey commands in Stata.
Thank you very much if you have time to give me some feedback or
point me towards the best information source (textbook?).
Cheers,
Jean-Gael Collomb, [email protected]
(PS. I run Stata 10 in Mac OSX)
Stata code entered:
*ASSIGNING POST STRATIFICATION WEIGHTS
*-------------------------------------
gen samplewt2=0
label var samplewt2 "Post Stratification sample weight 2"
replace samplewt2=0.99975204562360500 if conservancy==1 & sample==1
replace samplewt2=0.04357333333333330 if conservancy==2 & sample==2
replace samplewt2=1.39197814207650000 if conservancy==2 & sample==1
replace samplewt2=0.10144078144078100 if conservancy==3 & sample==2
replace samplewt2=1.18320139407518000 if conservancy==3 & sample==1
replace samplewt2=0.05683908045977010 if conservancy==4 & sample==2
replace samplewt2=1.47985380116959000 if conservancy==4 & sample==1
replace samplewt2=0.01906976744186050 if conservancy==5 & sample==2
replace samplewt2=1.05030411449016000 if conservancy==5 & sample==1
tab tourind
bysort conservancy: tab tourind
*applying weight2 (those derived from IRDNC data)
svyset _n [pweight = samplewt2]
svy: tab tourind, percent
Jean-Gael "JG" Collomb
PhD candidate
School of Natural Resources and Environment /
School of Forest Resources and Conservation
University of Florida
[email protected]
[email protected]
+1 (352) 870 6696