I am guessing that the order of the tests doesn't matter? If that is
the case, I would do the following. Get your data into long format,
then sort the N's and P's within each person so that every record
falls into one of a fixed set of patterns. Below, ^ is only a
placeholder for a missing observation, and the patterns are written
out for five test rounds:
^^^^^
^^^^N
^^^^P
^^^NN
^^^NP
^^^PP
^^NNN
^^NNP
^^NPP
^^PPP
^NNNN
^NNNP
^NNPP
^NPPP
^PPPP
NNNNN
NNNNP
NNNPP
NNPPP
NPPPP
PPPPP
Then use -egen- with the -group()- function to give each pattern its own code.
Code to use:
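* go long: one row per person per test round
reshape long HIV, i(studyid) j(incident)
drop incident
* sort the results within person and renumber the rounds in sorted order
bys studyid (HIV): gen incident=_n
reshape wide HIV, i(studyid) j(incident)
* one code per sorted pattern; -missing- keeps records with empty ("") results
egen hiv_type=group(HIV1 HIV2 HIV3 HIV4 HIV5 HIV6), missing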
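Before running the reshape, a cleaning step along these lines may help. It is only a sketch, assuming the results are str1 variables named HIV1 to HIV6 with "." stored as a literal string for an untested round; row 6, with an indeterminate "I", is made up for illustration. It recodes both "." and "I" to the empty string, which is what Stata treats as missing for string variables.

```stata
clear
input studyid str1 HIV1 str1 HIV2 str1 HIV3 str1 HIV4 str1 HIV5 str1 HIV6
1 "." "N" "." "." "N" "P"
2 "." "." "N" "N" "N" "."
3 "P" "P" "." "." "P" "."
4 "N" "P" "." "P" "P" "P"
5 "." "." "." "P" "P" "P"
6 "N" "I" "." "." "N" "."
end

* recode untested (".") and indeterminate ("I") results to string missing
foreach v of varlist HIV1 HIV2 HIV3 HIV4 HIV5 HIV6 {
    replace `v' = "" if inlist(`v', ".", "I")
}
```

Comparing against quoted strings like this also avoids the type mismatch r(109), which comes from testing a string variable against the numeric missing value (hiv1==.). Once the empty strings are in place, -egen, group()- needs its -missing- option, otherwise anyone with an untested round gets a missing group code.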
After that you can collapse the values of hiv_type into your three
categories: incident seroconverters, prevalent positives, and
consistently seronegative.
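One way the final classification might look, sketched on the five example rows from the original question. It works directly on the HIV variables rather than on hiv_type, and it assumes that test order is ignored, so anyone with both an N and a P is counted as an incident seroconverter. The names anyN, anyP and hiv_class are made up here.

```stata
clear
input studyid str1 HIV1 str1 HIV2 str1 HIV3 str1 HIV4 str1 HIV5 str1 HIV6
1 "." "N" "." "." "N" "P"
2 "." "." "N" "N" "N" "."
3 "P" "P" "." "." "P" "."
4 "N" "P" "." "P" "P" "P"
5 "." "." "." "P" "P" "P"
end

* flag whether each person ever tested negative / ever tested positive
gen byte anyN = 0
gen byte anyP = 0
foreach v of varlist HIV1 HIV2 HIV3 HIV4 HIV5 HIV6 {
    replace anyN = 1 if `v' == "N"
    replace anyP = 1 if `v' == "P"
}

* 1 = incident seroconverter, 2 = prevalent positive, 3 = consistently seronegative
gen byte hiv_class = .
replace hiv_class = 1 if anyN & anyP
replace hiv_class = 2 if !anyN & anyP
replace hiv_class = 3 if anyN & !anyP
label define hiv_class 1 "incident seroconverter" 2 "prevalent positive" 3 "consistently seronegative"
label values hiv_class hiv_class
list studyid hiv_class
```

Anyone with neither an N nor a P is left with hiv_class missing. Because order is ignored, a positive test followed by a negative one would also land in group 1, so it is worth checking whether such records exist in the data.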
On Tue, Nov 25, 2008 at 12:45 AM, Polis, Chelsea B. <[email protected]> wrote:
> Dear Statalisters,
>
> I am trying to figure out a way to code individuals as either having incident HIV seroconversion (had at least one negative HIV test, followed by one positive HIV test while under surveillance), prevalent HIV (had one or more positive HIV tests while under surveillance), or HIV-negative (had all HIV-negative tests while under surveillance).
>
> My dataset is set up as such, where N =negative, P=positive, .=not tested at that round, and I "indeterminate". I want to ignore any indeterminate tests, so I haven't included them here in the examples since I assume I will simply need to replace all "I"s with "."s, but help on figuring out a more elegant way to tweak the code to incorporate this fact would also be most appreciated!
>
> Study_id HIV1 HIV2 HIV3 HIV4 HIV5 HIV6
> 1 . N . . N P
> 2 . . N N N .
> 3 P P . . P .
> 4 N P . P P P
> 5 . . . P P P
>
> I also have a variable that shows these patterns in one variable, i.e.
> Study_id HIV
> 1 .N..NP (I would want this to be coded as incident seroconverter)
> 2 ..NNN. (I would want this to be coded as consistently seronegative)
> 3 PP..P. (I would want this to be coded as prevalent positive)
> 4 NP.PPP (I would want this to be coded as incident seroconverter)
> 5 ...PPP (I would want this to be coded as prevalent positive)
>
> These are string variables. Is there a simple formula to use to categorize these women as incident seroconverters, prevalent positives, or consistently seronegative?
>
> I tried something along the lines of:
> gen prevpos=0
> replace prevpos=1 if hiv1==.|hiv1=="P" & hiv2==.|hiv2=="P" & hiv3==.|hiv3=="P" & hiv4==.|hiv4=="P" & hiv5==.|hiv5=="P" & hiv6==.|hiv6=="P"
>
> But I am receiving type mismatch r(109);
>
> Your suggestions would be most appreciated!
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>