James Hesford answered the question posed by Paul O'Brien for the simple case
where only one OC occurred for each observation. If it is possible for more
than one to exist, then things get a bit trickier, more so if we don't know how
many drugs (or items) might be listed for each subject. Here is a more general
solution. It involves separating the drug names with spaces, not commas; making
new variables to accommodate each item in the original list; -reshape- ing the
data to long form. (This is Stata8, I _think_ it's OK in Stata7).
gen byte id=_n // make an identifier for each observation
replace drug = subinstr(drug, "," , " " , .) // substitute spaces for commas
gen byte wc=wordcount(drug) // how many drugs does each subject get?
qui summ wc // r(max) will hold the most drugs used (in these data, 5)
forvalues i = 1/`r(max)' { // manufacture r(max) = 5 new variables
gen drug`i' = "" // initialize variables to missing for each subj
}
order id //cosmetic, I just like ids to be first!
drop wc // don't need it anymore
forvalues i = 1/`r(max)' { // peek names from original variable
replace drug`i' = word(drug,`i') // and poke them into new variables
}
drop drug // don't really need it anymore; if you do, then make a copy
reshape long drug, i(id) j(drug_order) string
li in 1/10
+-------------------------+
| id drug_o~r drug |
|-------------------------|
1. | 1 1 NONE |
2. | 1 2 |
3. | 1 3 |
4. | 1 4 |
5. | 1 5 |
|-------------------------|
6. | 2 1 NONE |
7. | 2 2 NONE |
8. | 2 3 NONE |
9. | 2 4 MATES6 |
10. | 2 5 NATURAL |
+-------------------------+
You can now edit this long data set in any way you see fit, for example, -drop-
ping observations that don't meet your criteria for OC. Of course, with 41,000
original observations, you end up with 41000 x 5 ~ 200000 observations, which
might be a problem depending on your system.....
Phil
Quoting Paul O'Brien <[email protected]>:
> Database of 41,000 clinic visits with 26 variables for each visit. One
> variable is a string (stk) of contraceptives prescribed, containing 1
> to 4 types, separated by commas.
>
> stk
> NONE,OVYS
> NONE,NONE,NONE,MATES6,NATURAL
> NONE,MICROG
> NONE,NONE
> NONE,FEMODE,NATURAL
> NONE,NONE
> LOGED
> NONE,NONE,MICROG,MATES6,NATURAL
> NONE
> NONE,LEVONEL,MICROG
> MICROG
> NONE,NONE,LOG
> NONE,NONE,NONE
> NONE,MATES6,NATURAL
> NONE,NONE
> MICROG,ULTRA
> NATURAL
> NONE,MICROG,NATURAL
>
> Some are oral contraceptives:OVYS MICROG FEMODE, LOGED
>
> The position of the oral contraceptives in the string varies.
>
> I want all the oral contraceptives in one column:
>
> OVYS
> blank
> MATES6
> MICROG
> blank
> FEMO
> blank
> LOGED
>
> How?
>
> Help gratefull received.
>
> Paul
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Philip Ryan
Associate Professor
Department of Public Health
University of Adelaide
5005 South Australia
AUSTRALIA
CRICOS Provider Number 00123M
-----------------------------------------------------------
This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright. If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/