Re: st:appropriate test
From
Rajaram Subramanian Potty <[email protected]>
To
[email protected]
Subject
Re: st:appropriate test
Date
Wed, 13 Oct 2010 10:37:58 +0530
Dear Samuels,
Thank you very much for your suggestions. I have the manuals other
than the survey manual, and only for an older version of Stata. But the
help for svyset describes the fpc() option as follows:
fpc(varname) requests a finite population correction for the variance
estimates. If varname has values less than or equal to 1, it is
interpreted as a stratum sampling rate f_h = n_h/N_h, where n_h =
number of units sampled from stratum h and N_h = total number of
units in the population belonging to stratum h. If varname has
values greater than or equal to n_h, it is interpreted as containing
N_h. It is an error for varname to have values between 1 and n_h or
to have a mixture of sampling rates and stratum sizes.
So I may use the sampling rate described above.
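
A minimal sketch of building such a sampling-rate variable for the first stage. The sampled counts (10 villages, 20 urban blocks) come from the survey description quoted below. The stratum totals (250 villages, 400 urban blocks), the coding of place (1 = rural, 2 = urban), the name fpc1, and the toy data are assumptions for illustration only; the real N_h would have to come from the Census.

```stata
* Toy data standing in for the survey file (all values hypothetical)
clear
input byte place int psu double wt
1 1 25
1 2 25
2 1 20
2 2 20
end

* First-stage sampling rate f_h = n_h/N_h
* n_h: 10 villages, 20 urban blocks (from the survey description)
* N_h: 250 and 400 are made-up totals, not Census figures
generate double fpc1 = cond(place == 1, 10/250, 20/400)

* Values must not exceed 1 if svyset is to read them as sampling rates
assert fpc1 > 0 & fpc1 <= 1

svyset psu [pweight = wt], strata(place) fpc(fpc1)
```

Going by the help text above, fpc1 could instead hold the stratum totals themselves (250 and 400 in this toy setup), since values greater than or equal to n_h are read as N_h. The two forms must not be mixed in one variable.
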
Thanks and regards,
RAJARAM. S
On Tue, Oct 12, 2010 at 8:43 PM, Steve Samuels <[email protected]> wrote:
> Rajaram responded to me privately with the following:
>
>>I just want to know why we should consider round as a second stage
>>strata as per your suggestion of survey setting.
>>
>>
>>I would like to inform you that the sample frame used in the second
>>stage is not households, it is the number of adult members in the
>>households enumerated in the Census. We have listed all the members in
>>the household and we used this list of members to select the
>>respondents.
>>
>>In the data set the finite population correction (FPC) was not
>>included. So, I just want to know how I should calculate the FPC at
>>the first stage. Please, inform.
>>
>>I would like to inform you that we may first do a descriptive analysis
>>and then we may like to do a multivariate analysis. We also want to
>>know the OR, so we are doing the logistic regression as discussed in
>>my earlier mail.
>>
>
> By definition, strata are groups from which samples are drawn
> independently. For the PSUs that was true of the urban/rural place
> strata. Within existing PSUs, independent samples of adults were taken
> in each round. Therefore, stratification by round applies at that stage.
> This specification _might_ reduce standard errors somewhat.
>
> The fpc will be one of two numbers: the number of villages in the
> rural stratum in the district or the number of urban blocks in the
> urban stratum. Those numbers should be available from the Census that
> was used to plan the survey.
>
>
> Steve
>
> Steven J. Samuels
> [email protected]
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax: 206-202-4783
>
>
> On Fri, Oct 8, 2010 at 2:26 PM, Steve Samuels <[email protected]> wrote:
>> --
>>
>> Rajaram Subramanian Potty
>>
>> I recommend that you add all the sampling stages to your design.
>> Include fpcs, especially in the first stage, because you need all the
>> help that you can get in reducing standard errors.
>>
>> something like:
>> svyset psu [pweight=], strata(place) fpc() || _n, strata(round) fpc()
>>
>> One thing is unclear: the sampling frame you used to select males and
>> females. If your sampling frame consisted of households, for example,
>> then replace "_n" in the -svyset- statement above with the household
>> id variable.
>>
>> Which analysis?
>>
>> As you describe your analysis, it is descriptive (or "enumerative"):
>> you want to estimate prevalence rates in one district in 2002 and 2008,
>> and their difference.
>>
>> For a descriptive analysis, significance testing is inappropriate.
>> Why? If you had tested every adult in the district, you would never
>> expect the 2002 and 2008 prevalence rates to be _exactly_ the same.
>> (WG Cochran, (1977). Sampling techniques (3rd ed.). New York: Wiley.,
>> p.39; WE Deming. (1966). Some theory of sampling. New York: Dover
>> Publications, Chapter 7, p 247, "Distinction between enumerative and
>> analytic studies").
>>
>> (There are descriptive studies where hypothesis testing is important,
>> e.g. quality assurance sampling ( P Levy and S Lemeshow, Sampling of
>> Populations, Wiley, 2008; p. 429), but your study doesn't seem to be
>> one of them. )
>>
>> The question is therefore not "Are rates in the two years different?",
>> but "How different are they?" Confidence intervals provide the answer.
>> From a public health point of view, I consider 95% confidence to be
>> too stringent. I'd recommend 90% or even 80%.
>>
>> -svy tab- will provide a direct answer to the question: " What are
>> the rates, and how different are they." I don't find the odds ratios
>> from -svy: logistic- to be informative unless transformed to rate
>> differences; -svy: tab- is based on the logit transform, and does it
>> for you.
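
A rough sketch of that approach with a 90% interval, using the nhanes2 example data as a stand-in for the district data (sex plays the role of round, diabetes the role of syphilis). It assumes, as in the nhanes2 example quoted later in this thread, that after -svy: tab ..., row- the coefficients p12 and p22 hold the row proportions.

```stata
* Stand-in data: nhanes2 ships already svyset
webuse nhanes2, clear

* Row proportions with standard errors and 90% confidence intervals
svy: tabulate sex diabetes, row se ci level(90)

* Difference in diabetes rates between the two rows, with a 90% CI
lincom _b[p22] - _b[p12], level(90)
```

With the district data, the same two commands would be run on round and syphilis after -svyset-.
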
>>
>> One other point: If you took equal numbers of people in each village
>> and equal numbers in each urban block, your sample should be
>> self-weighting, and your weighted prevalence rates and observed rates
>> should be very similar. If so, it would simplify your tables to report
>> the observed numerators, denominators, and rates, with the CIs from
>> the weighted analysis.
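
One way to check this is to put the weighted and observed rates side by side. The sketch below again uses nhanes2 as a stand-in (it is not a self-weighting design, so the two rates need not agree there); with the district data, diabetes would be replaced by syphilis.

```stata
* Stand-in data: nhanes2 ships already svyset
webuse nhanes2, clear

* Weighted prevalence, using the survey design
svy: mean diabetes
scalar p_weighted = _b[diabetes]

* Observed (unweighted) prevalence
mean diabetes
scalar p_observed = _b[diabetes]

display "weighted = " %6.4f p_weighted "   observed = " %6.4f p_observed
```

If the two figures are close, the observed numerators and denominators can be tabulated directly, as suggested above.
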
>>
>> --Steve
>>
>>
>> On Fri, Oct 8, 2010 at 1:25 AM, Rajaram Subramanian Potty
>> <rajara999@gmail. com> wrote:
>>> I apologise that I did not give much information. In the year 2002,
>>> a cross-sectional study was conducted to estimate the STI
>>> prevalence in one of the districts. We have two strata, rural and
>>> urban. From the rural areas, 10 villages (PSUs) were selected
>>> systematically using PPS. 20 urban blocks (PSUs) were selected from
>>> the list of urban blocks in the district using systematic selection.
>>> We conducted a complete census in these selected areas and prepared
>>> a sampling frame for selecting the adult males and females aged 15-49.
>>> The targeted sample of around 6600 was selected from this sampling
>>> frame. We have calculated the sample weights.
>>>
>>> In the year 2008, we repeated the survey in the same areas. We
>>> conducted the census and selected the required number of 6600 adult
>>> males and females in the same way as in the year 2002. So
>>> the respondents were selected independently and it is not a follow-up
>>> study.
>>>
>>> We wanted to test whether the overall difference in STI prevalence
>>> between 2002 and 2008 is significant. We also want to
>>> examine the difference in particular groups defined by place of
>>> residence (rural/urban), sex, age, etc. We are not interested in the
>>> difference in prevalence by PSU.
>>>
>>> Presently I am using this simple survey-setting syntax:
>>>
>>> svyset psu [pweight=wt], strata(place)
>>>
>>> svy: logistic syphilis round
>>>
>>> The variable round indicates whether the survey was in the year 2002 or
>>> 2008, and the p-value from the logistic regression is used to check
>>> whether there is any significant difference.
>>>
>>> Thanks and regards,
>>>
>>> RAJARAM. S
>>>
>>>
>>> On Thu, Oct 7, 2010 at 8:48 PM, Steve Samuels <sjsamuels@gmail. com> wrote:
>>>> I agree with Ronan that more information is necessary: are you
>>>> interested in estimating rates and changes just for the sampled PSUs,
>>>> or for the population from which they are sampled? If you are
>>>> interested in rates just for those PSUs, then create a combo PSU-round
>>>> stratum variable, e.g. with:
>>>>
>>>> *********
>>>> egen cstratum = group(area round)
>>>> ********
>>>>
>>>> Then -svyset- a psu variable equal to the second stage sampling unit
>>>> (ssu2) in the survey:
>>>>
>>>> ****************
>>>> svyset ssu2 [pweight= ], strata(cstratum)..
>>>> ****************
>>>>
>>>> If you want to estimate for the population from which the areas were sampled:
>>>>
>>>> ******************
>>>> svyset area [pweight=], strata(original stratum) || ssu2, strata(round)
>>>> ********************
>>>>
>>>> For descriptive estimates of prevalence rates and their differences, I
>>>> recommend -svy: tab-, which uses a logit transformation for
>>>> proportions to avoid CIs that extend below zero. You can add finite
>>>> population corrections if these would make a difference.
>>>> ************************************
>>>> webuse nhanes2
>>>> svy: tab sex diabetes, row ci se llwald
>>>> matrix list e(b)
>>>> lincom _b[p22] - _b[p12]
>>>> *************************************
>>>>
>>>> But you have not given us enough details about the purpose of your
>>>> study that I can be confident of these specifications: for example,
>>>> whether you are confining your estimates to particular
>>>> sub-populations.
>>>>
>>>> I don't agree with Ronan's recommendation of an event-time model. You
>>>> have cross-sectional prevalence data, not a cohort. So you would need
>>>> a "current status" (or "status quo") model: the information for
>>>> each individual is their current age and whether or not they have the
>>>> disease of interest; in other words, every individual is right-censored
>>>> or left-censored. From this information it is possible to reconstruct
>>>> a survival curve analogous to a current life table. I'd recommend a
>>>> logistic model, instead. For such regression analyses, don't use the
>>>> fpc's.
>>>>
>>>> Steve
>>>>
>>>>
>>>>
>>>> On Wed, Oct 6, 2010 at 4:59 AM, Ronan Conroy <rconroy@rcsi. ie> wrote:
>>>>> On 6 Oct 2010, at 07:42, Rajaram Subramanian Potty wrote:
>>>>>
>>>>>> I have data from two rounds of a survey conducted in the same areas
>>>>>> (PSUs). But the individuals were selected independently in both
>>>>>> rounds from these areas using the same statistical approach. What
>>>>>> would be the appropriate analysis to test
>>>>>> the difference in some of the indicators between the two periods? For
>>>>>> example, I want to test the difference in HIV prevalence between the
>>>>>> two rounds. Is it appropriate to use the survey commands, treating
>>>>>> the PSUs as the same in both rounds and setting the survey design
>>>>>> according to our study, and then to fit svy: logistic to examine
>>>>>> the difference between the two rounds? Is this a correct way of
>>>>>> testing the difference? Kindly suggest.
>>>>>
>>>>>
>>>>> My first reaction would be that the most important thing needed here is a
>>>>> sample weighting scheme that allows you to extrapolate from the sample to
>>>>> the underlying population.
>>>>>
>>>>> Are the areas PSUs or strata? In other words, were the areas selected at
>>>>> random or deliberately chosen? This affects your analysis.
>>>>>
>>>>> If you have presumed age of infection, you could consider using an
>>>>> event-time model approach, using age as the time variable. This would allow
>>>>> you to look at the shape of the hazard function. Even if you don't, the
>>>>> hazard curve will show the cumulative prevalence by age (rather than the
>>>>> incidence) but may still be of interest.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Ronán Conroy
>>>>> Associate Professor
>>>>> Division of Population Health Sciences
>>>>> =================================
>>>>>
>>>>> rconroy@rcsi. ie
>>>>> Royal College of Surgeons in Ireland
>>>>> Epidemiology Department,
>>>>> Beaux Lane House, Dublin 2, Ireland
>>>>> +353 (0)1 402 2431
>>>>> +353 (0)87 799 97 95
>>>>> +353 (0)1 402 2764 (Fax - remember them?)
>>>>> http://rcsi. academia. edu/RonanConroy
>>>>>
>>>>
>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>