Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: McNemar test for survey data
From: Ankit Sakhuja <[email protected]>
To: [email protected]
Subject: Re: st: McNemar test for survey data
Date: Sun, 5 Jan 2014 20:54:32 -0600
Thanks Steve. Using your code I ran the program below (after -svyset-):
set seed 2000
gen u=uniform()
sort u
svy: tab testresult1 testresult2
and I get the following output:
(running tabulate on estimation sample)
Number of strata = 14 Number of obs = 4885
Number of PSUs = 31 Population size = 207888271
Design df = 17
----------------------------------------
Testresult1 |        Testresult2
            |      0        1      Total
------------+---------------------------
          0 |  .8335        0      .8335
          1 |  .0382    .1283      .1665
            |
      Total |  .8717    .1283          1
----------------------------------------
Key: cell proportions
Pearson:
Uncorrected chi2(1) = 3598.5695
Design-based F(1, 17) = 3475.6666 P = 0.0000
Here p12 has a value of 0, as shown below, because no one has a
positive result on test 2 but a negative result on test 1 (everyone
with a positive result on test 2 is also positive on test 1):
. mat list e(b)
e(b)[1,4]
            p11         p12         p21         p22
y1    .83352672           0   .03821686   .12825641
and _b[p12] and _b[p22] are significantly different, as shown below:
. test _b[p12]=_b[p22]
Adjusted Wald test
( 1) p12 - p22 = 0
F( 1, 17) = 333.13
Prob > F = 0.0000
But since everyone with a positive result on test 2 is also positive
on test 1, shouldn't the comparison instead be between 0.1283 (p22)
and the 0.1665 row total in the 2x2 table (to the right of 0.1283),
rather than between p21 and p12?
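For reference, here is a minimal sketch (assuming the same -svyset- data
and the p11..p22 labels from e(b) above) of how the difference between
the two marginal positive proportions relates to the cell proportions:

* marginal proportions positive: P(testresult1==1) = p21 + p22 and P(testresult2==1) = p12 + p22
* their difference is (p21 + p22) - (p12 + p22) = p21 - p12, the McNemar contrast
svy: tab testresult1 testresult2
lincom _b[p21] - _b[p12]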
Thanks
Ankit
On Sun, Jan 5, 2014 at 8:39 PM, Steve Samuels <[email protected]> wrote:
> Unfortunately, "test1" inherited the value labels of -foreign-;
> they have been eliminated here.
>
> SS
>
> Here is the example in
> (http://www.stata.com/statalist/archive/2010-03/msg00937.html),
> specialized to your nomenclature. Roger's approach
> requires an id variable for the reshape, but this does
> not.
>
> ******************CODE BEGINS***********
> sysuse auto, clear
>
> gen test1 = foreign
>
> svyset _n [pw = turn], strata(rep78)
>
> set seed 2000
> gen u=uniform()
> sort u
> gen test2 = _n<39
> svy: tab test1 test2
> lincom _b[p12] - _b[p21]
> *******************CODE ENDS*************
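> A minimal sketch of an equivalent check (not part of the code above):
> the same null can also be framed as an adjusted Wald test, which reports
> a design-based F statistic instead of an estimate and confidence interval:
>
> * sketch: adjusted Wald test of the McNemar null H0: p12 = p21
> test _b[p12] = _b[p21]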
>
> As I stated in the post, the hypothesis _b[p12] = _b[p21] is exactly
> the hypothesis tested in McNemar's test. And, it is equivalent to
> the more useful formulation that the proportions positive are
> the same for test 1 and test 2.
>
> Steve
> [email protected]
>
>
>
> On Jan 5, 2014, at 2:05 PM, Ankit Sakhuja wrote:
>
> Thanks for the input. The survey sample that I am working on is a
> stratified sample using probability weights. It is probably naivety
> and ignorance on my part, but I am still not sure how to create the
> variable -testid-, as all observations underwent both tests. To
> give an example, my dataset looks like this:
>
> Observation No    Result of Test 1    Result of Test 2
>              1                   1                   1
>              2                   1                   0
>              3                   1                   1
>              4                   1                   0
>              5                   1                   1
>              6                   1                   0
>              7                   1                   1
>              8                   0                   0
>              9                   1                   1
>             10                   0                   0
>
> So in the above example the positive proportion for test 1 is 80% and
> for test 2 is 50%, but all 10 observations received both tests.
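> For concreteness, a sketch that enters this toy example and confirms
> those two percentages (purely illustrative):
>
> clear
> input test1 test2
> 1 1
> 1 0
> 1 1
> 1 0
> 1 1
> 1 0
> 1 1
> 0 0
> 1 1
> 0 0
> end
> tab test1   // 8 of 10 positive = 80%
> tab test2   // 5 of 10 positive = 50%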
>
> Or, as a different example, 10 patients were given medication A for
> asthma and, after a washout period, medication B for the same
> condition. Then say 80% had a response with the first medication and
> 50% had a response with the second. So all observations got both
> medications (or tests), and therefore I am not sure whether the
> variable -testid- or -cat- (as in Samuels's example) can be created.
> Thanks again
> Ankit
>
> On Sun, Jan 5, 2014 at 11:39 AM, Roger B. Newson
> <[email protected]> wrote:
>> This problem can probably be solved using -somersd-, -regpar-, -binreg-,
>> -glm-, or some other package that can estimate differences between 2
>> proportions for clustered data. The first step would be to reshape your data
>> (using either -reshape- or -expgen-) to have 1 observation per study subject
>> per binary test (and therefore 2 observations per study subject, as there are
>> 2 binary tests). The binary outcome, in this dataset, would be the test
>> result. For each study subject, it would be the outcome of the first binary
>> test in the first observation for that subject, and the outcome of the
>> second binary test in the second observation. And the dataset would contain a
>> variable, maybe called -testid-, with the value 1 in observations
>> representing the first test, and 2 in observations representing the second
>> test. The confidence interval to be calculated would be for the difference
>> between 2 proportions, namely the proportion of positive outcomes where
>> -testid- is 2 and the proportion of positive results where -testid- is 1.
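>> A minimal sketch of this reshape, assuming the two results sit in
>> variables -test1- and -test2- (illustrative names), with one observation
>> per subject in the original (wide) dataset:
>>
>> * sketch only: build a subject id, then go from wide to long
>> gen long subjectid = _n
>> rename test1 result1
>> rename test2 result2
>> reshape long result, i(subjectid) j(testid)
>> * now 2 observations per subject; -testid- is 1 or 2 and -result- is the test result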
>>
>> You do not say what the sampling design is for your complex survey data.
>> However, if this design has clusters, then they will be the clusters to use
>> when estimating your difference between proportions. And, if this design
>> does not have clusters, then the clusters used, when estimating your
>> difference between proportions, will be the study subjects themselves.
>> Either way, your final estimate will be clustered.
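>> As a rough sketch of that last case (no survey clusters, so the subjects
>> themselves are the clusters), one of the packages named above, -glm-, can
>> estimate the difference between the 2 proportions directly via an identity link:
>>
>> * sketch: risk-difference model of the result on the test indicator,
>> * with cluster-robust standard errors at the subject level
>> glm result i.testid, family(binomial) link(identity) vce(cluster subjectid)
>> * the coefficient on 2.testid is the test 2 minus test 1 difference in proportions
>> * with a complex design, -svyset- the long dataset and use the -svy:- prefix instead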
>>
>> I hope this helps. Let me know if you have any further queries.
>>
>> Best wishes
>>
>> Roger
>>
>> Roger B Newson BSc MSc DPhil
>> Lecturer in Medical Statistics
>> Respiratory Epidemiology, Occupational Medicine
>> and Public Health Group
>> National Heart and Lung Institute
>> Imperial College London
>> Royal Brompton Campus
>> Room 33, Emmanuel Kaye Building
>> 1B Manresa Road
>> London SW3 6LR
>> UNITED KINGDOM
>> Tel: +44 (0)20 7352 8121 ext 3381
>> Fax: +44 (0)20 7351 8322
>> Email: [email protected]
>> Web page: http://www.imperial.ac.uk/nhli/r.newson/
>> Departmental Web page:
>> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
>>
>> Opinions expressed are those of the author, not of the institution.
>>
>>
>> On 05/01/2014 16:55, Ankit Sakhuja wrote:
>>>
>>> Dear Members,
>>> I am trying to compare two categorical variables that are not
>>> mutually exclusive, such that participants with a positive result in
>>> one group (using method 1) also have a positive result in the second
>>> group (using method 2). Now say 30% have a positive result by method 1
>>> and 20% by method 2; how can I tell whether these results are in fact
>>> similar or different? I could potentially use McNemar's test, but this
>>> is complex survey data and I am not sure how to go about it. I have seen
>>> discussions about using -somersd- but am not sure how exactly to use it
>>> with these data. I would really appreciate any help.
>>> Ankit
>
>
>
> --
> Ankit
--
Ankit
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/