---
I meant: "A probability weight is the number of people represented by
a sample member."
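
As a rough illustration of the weight arithmetic in the quoted message below (a 10% household sample, and 1 of K adults interviewed per household), here is a minimal Stata sketch. The toy data and the variable names hh_adults, n_interviewed, hh_wt and person_wt are assumptions for illustration, not taken from the actual data set, and non-response adjustments are ignored.

```stata
* Sketch only: toy data, one row per sampled household.
* hh_adults and n_interviewed are hypothetical variable names.
clear
input hh_id hh_adults n_interviewed
1 1 1
2 3 1
3 4 4
end

* 10% sample of households: each sampled HH represents 1/0.10 = 10 HH
generate double hh_wt = 1/0.10

* Interviewing 1 of K adults: person weight = HH weight x K.
* Interviewing every adult: the ratio is 1, so the HH weight is retained.
generate double person_wt = hh_wt * hh_adults / n_interviewed

list hh_id hh_adults n_interviewed hh_wt person_wt
```

Households in which every adult was interviewed keep a person weight of 10; the household with 3 adults and 1 interview gets a person weight of 30.
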
On Sat, May 9, 2009 at 8:20 PM, <[email protected]> wrote:
> Jean-Gael:
>
>
> A probability weight is the number of people represented by those in a
> sample member. Your weights look nothing like numbers of people. In
> your first sample, the HH probability weights (before non-response
> adjustments) should be 10.0, because you took a 10% sample of HH. If
> you interviewed every adult in the HH, they retain the HH weight. If
> you interviewed 1/K in a household, the person weight is the HH weight
> x K.
>
> It's not clear whether your frame of tourist workers (sample 2) was
> of HH or people. If people, then you should be interviewing only
> people who work in tourism, not their HH members--as HH members would
> not have been in the frame. Since I don't know your sampling scheme,
> I don't know how to compute the sampling weight.
>
> When you have 2 samples, as you did here, treat each one as coming
> from a different stratum. Transfer the people in sample 1 who work in
> tourism to the 2nd stratum, and retain their original sampling weight.
>
> If villages are strata, then you have 2x10 = 20 sampling strata.
> However it sounds like 10 villages are themselves a convenience
> sample. If so, then keep the two samples as strata. Your PSU should
> probably be HH. However if you interviewed only one person per HH,
> then PSU can be person.
>
> After computing the sampling weights, you can, as Michael states, use
> the poststrata() and postweight() options of -svyset- in Stata to
> reproduce the tourism counts.
> Your post-stratification totals (tourism workers, non-tourism workers)
> should add to the estimated population totals in the 10 villages;
> 0.84% should be tourism workers, and 99.16% should be non-tourism
> workers. If you want separate estimates of impact in each village,
> then you can also use the villages to define your post-strata: 10
> villages x 2 tourist-worker-status strata.
>
> Finally, unless one goal is to compare tourism and non-tourism
> workers, it was not necessary to enhance your sample with tourism
> workers. Tourism workers are obviously greatly affected by tourism,
> compared to non-tourism workers. However, they constitute only 0.84%
> of the population, so contribute minimally to the overall effects of
> tourism on the population.
>
> If you need further assistance, the University of Florida has a number
> of faculty with experience in survey sampling.
>
> -Steve
>
>
>
> On Sat, May 9, 2009 at 5:13 PM, Jean-Gael Collomb <[email protected]> wrote:
>> Hello all,
>>
>> I have a question about using post stratification weights and using Stata's
>> survey commands. After setting the weights, I do not get the proportions I
>> expected.
>>
>> My overall research question is to see if tourism (TOURIND) influences
>> quality of life in several communities in a rural province of Namibia. My
>> aim was to conduct individual interviews in a sample of 10% of all
>> households in each community. I obtained household census counts from key
>> informants within the community and my own double checks during field work.
>> This random sample yielded 395 interviews, of which only
>> 9 (2.3%) were conducted with individuals working in tourism. Given this very
>> low number of respondents who worked in tourism and my interest in trying to
>> understand the impact of tourism, I established a sampling frame restricted
>> to individuals working in tourism and interviewed 72 individuals. [Two of
>> those interviews were conducted with individuals not employed in tourism but
>> living in a household where someone was]. In total, I thus interviewed 467
>> people, among which 79 worked in tourism. My full sample oversampled tourism
>> employees and I think it would be wrong to derive from it that 17%
>> (79/467*100) of the population works in tourism. I think post-stratification
>> weights should be assigned to my data set to correct for the oversampling.
>> In fact, the percentage of the population working in tourism varies by
>> community and thus different weights should be calculated for different
>> communities. I used existing reports documenting total numbers of community
>> residents employed by local tourism operators and total population size as a
>> basis to calculate the "true" distribution of tourism employees (weight2).
>> The weights were calculated by dividing the “true” percentage by the
>> “oversampled” percentage.
>>
>> The problem is that when I apply the weights in Stata, I do not get the
>> proportion I expected. Specifically, I expected that after svyset _n
>> [pweight = samplewt2] and svy: tab tourind, I would find that 0.84% of the
>> population could be labeled TOURIND, but Stata returns a value of 3.25% (and
>> similar discrepancies for each community).
>>
>> I am not sure I am doing something wrong in calculating the weights,
>> assigning the weights to my dataset, or entering the tab commands in svy
>> mode. I’d greatly appreciate your help in moving past this and taking
>> advantage of survey commands in Stata.
>>
>> Thank you very much if you have time to give me some feedback or point me
>> towards the best information source (textbook?).
>>
>> Cheers,
>>
>> Jean-Gael Collomb, [email protected]
>>
>> (PS. I run Stata 10 in Mac OSX)
>>
>>
>>
>> Stata code entered:
>>
>> *ASSIGNING POST STRATIFICATION WEIGHTS
>> *-------------------------------------
>> gen samplewt2=0
>> label var samplewt2 "Post Stratification sample weight 2"
>> replace samplewt2=0.99975204562360500 if conservancy==1 & sample==1
>> replace samplewt2=0.04357333333333330 if conservancy==2 & sample==2
>> replace samplewt2=1.39197814207650000 if conservancy==2 & sample==1
>> replace samplewt2=0.10144078144078100 if conservancy==3 & sample==2
>> replace samplewt2=1.18320139407518000 if conservancy==3 & sample==1
>> replace samplewt2=0.05683908045977010 if conservancy==4 & sample==2
>> replace samplewt2=1.47985380116959000 if conservancy==4 & sample==1
>> replace samplewt2=0.01906976744186050 if conservancy==5 & sample==2
>> replace samplewt2=1.05030411449016000 if conservancy==5 & sample==1
>>
>> tab tourind
>> bysort conservancy: tab tourind
>>
>> *applying weight2 (those derived from IRDNC data)
>> svyset _n [pweight = samplewt2]
>> svy: tab tourind, percent
>>
>>
>>
>> Jean-Gael "JG" Collomb
>>
>> PhD candidate
>>
>> School of Natural Resources and Environment / School of Forest Resources and
>> Conservation
>>
>> University of Florida
>>
>> [email protected]
>>
>> [email protected]
>>
>> +1 (352) 870 6696
>>
>>
>>
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
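
The design setup described in my reply quoted above (two samples treated as strata, household as PSU, tourism status as post-strata) could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy data, the variables hh_id, person_wt, stratum and post_total, and the population total of 10000 are hypothetical placeholders; sample and tourind follow the original post, with tourind assumed to be coded 0/1.

```stata
* Sketch only: toy data with hypothetical weights and population total.
clear
input hh_id sample tourind person_wt
1 1 0 10
2 1 0 30
3 1 1 10
4 2 1 2
5 2 1 2
6 1 0 20
end

* Two strata, one per sample; tourism workers found in sample 1
* move to stratum 2 and keep their original sampling weight.
generate stratum = sample
replace stratum = 2 if tourind == 1

* Post-stratum totals must add up to the population total:
* 0.84% tourism workers, the remaining 99.16% non-tourism workers.
local N_pop = 10000
generate double post_total = cond(tourind == 1, 0.0084, 0.9916) * `N_pop'

* Household as PSU, samples as strata, tourism status as post-strata.
svyset hh_id [pweight = person_wt], strata(stratum) poststrata(tourind) postweight(post_total)

svy: proportion tourind
```

With post-stratification on tourind, the estimated share of tourism workers is tied to the supplied totals rather than to the raw sample mix. To get village-level post-strata, the post-stratum variable would instead be a village by tourism-status cross-classification, with one total per cell.
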