Philip Ryan responding to Paul O'Brien f
> If it is possible for more
> than one to exist, then things get a bit trickier, more so if
> we don't know how
> many drugs (or items) might be listed for each subject. Here
> is a more general
> solution. It involves separating the drug names with spaces,
> not commas; making
> new variables to accommodate each item in the original list;
> -reshape- ing the
> data to long form. (This is Stata8, I _think_ it's OK in Stata7).
>
>
> gen byte id=_n // make an identifier for each observation
>
> replace drug = subinstr(drug, "," , " " , .) // substitute
> spaces for commas
>
> gen byte wc=wordcount(drug) // how many drugs does each subject get?
>
> qui summ wc // r(max) will hold the most drugs used (in
> these data, 5)
>
> forvalues i = 1/`r(max)' { // manufacture r(max) = 5 new variables
> gen drug`i' = "" // initialize variables to
> missing for each subj
> }
>
> order id //cosmetic, I just like ids to be first!
>
> drop wc // don't need it anymore
>
> forvalues i = 1/`r(max)' { // peek names from
> original variable
> replace drug`i' = word(drug,`i') // and poke them into
> new variables
> }
>
> drop drug // don't really need it anymore; if you do, then
> make a copy
>
> reshape long drug, i(id) j(drug_order) string
>
> li in 1/10
>
> +-------------------------+
> | id drug_o~r drug |
> |-------------------------|
> 1. | 1 1 NONE |
> 2. | 1 2 |
> 3. | 1 3 |
> 4. | 1 4 |
> 5. | 1 5 |
> |-------------------------|
> 6. | 2 1 NONE |
> 7. | 2 2 NONE |
> 8. | 2 3 NONE |
> 9. | 2 4 MATES6 |
> 10. | 2 5 NATURAL |
> +-------------------------+
>
> You can now edit this long data set in any way you see fit,
> for example, -drop-
> ping observations that don't meet your criteria for OC. Of
> course, with 41,000
> original observations, you end up with 41000 x 5 ~ 200000
> observations, which
> might be a problem depending on your system.....
For Stata 8, an alternative is available using -split-.
I'd not use a byte identifier with your dataset size.
gen id = _n
split drug, p(,)
keep id `r(varlist)'
reshape long drug, i(id) j(drug_order)
For Stata 7, you could use the unofficial -split-
from SSC:
ssc inst split
gen id == _n
split drug, p(,)
drop drug
keep id drug*
reshape long drug, i(id) j(drug_order)
For wider discussions see
http://www.stata.com/support/faqs/data/splitstr.html
http://www.stata.com/support/faqs/data/multresp.html
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/