Re: st: Working with complex strings
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Working with complex strings
Date: Wed, 30 Nov 2011 08:36:51 +0000
-split- by default parses on spaces, which is clearly no good here:
medications can have compound names, and the dosages would not be
discarded. Steve was evidently pointing to the -parse()- option, not
suggesting that parsing on spaces was the answer.
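To illustrate with the example data: a call with no -parse()- option
cuts at every space, so each word, dose included, lands in its own
variable. This is only a sketch; the stub word in -generate()- is a name
chosen here for illustration and is not part of the original session.
. split medication, generate(word)
For "metoprolol tatrate 150mg bid" this should give word1 = "metoprolol",
word2 = "tatrate", word3 = "150mg" and word4 = "bid", so the compound
name is broken up and the dose is still there.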
If we assume that (a) the dose always starts with a number, (b) the
dose, when specified, always follows the name of the medication, and
(c) names never contain numeric characters, then -split- can be used to
parse on numeric characters. Here I used 1-9, but 0 should be added if
it is ever the first digit of a dose:
. split medication, parse(1 2 3 4 5 6 7 8 9) limit(1)
variable created as string:
medication1
. replace medication1 = trim(medication1)
(5 real changes made)
. l
     +---------------------------------------------------+
     |                   medication          medication1 |
     |---------------------------------------------------|
  1. |       metoprolol 100 mg qday           metoprolol |
  2. | metoprolol tatrate 150mg bid   metoprolol tatrate |
  3. |         atenelol 150 mg qday             atenelol |
  4. |              hctz 25 mg qday                 hctz |
  5. |               PEG interferon       PEG interferon |
     |---------------------------------------------------|
  6. |            cimzia 50 mg qday               cimzia |
     +---------------------------------------------------+
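If a dose could ever begin with 0 (say, 0.5 mg), the same call can list
all ten digits. A sketch under the same assumptions (a)-(c); the stub
med in -generate()- is an arbitrary name chosen here so that the new
variable does not clash with medication1:
. split medication, parse(0 1 2 3 4 5 6 7 8 9) limit(1) generate(med)
. replace med1 = trim(med1)
With -limit(1)- only the text before the first digit is kept, and
-trim()- removes the space left between the name and the dose.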
Another approach is to use -moss- (SSC):
. moss medication, match("(.+) [1-9]+") regex
. drop _count _pos1
. rename _match1 medication2
With this regular expression, -moss- misses names without dosages,
which can just be copied across.
. replace medication2 = medication if missing(medication2)
(1 real change made)
. l
     +------------------------------------------------------------------------+
     |                   medication          medication1          medication2 |
     |------------------------------------------------------------------------|
  1. |       metoprolol 100 mg qday           metoprolol           metoprolol |
  2. | metoprolol tatrate 150mg bid   metoprolol tatrate   metoprolol tatrate |
  3. |         atenelol 150 mg qday             atenelol             atenelol |
  4. |              hctz 25 mg qday                 hctz                 hctz |
  5. |               PEG interferon       PEG interferon       PEG interferon |
     |------------------------------------------------------------------------|
  6. |            cimzia 50 mg qday               cimzia               cimzia |
     +------------------------------------------------------------------------+
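For anyone who prefers not to install anything from SSC, much the same
idea can probably be expressed with the built-in functions -regexm()-
and -regexs()-. This is a sketch, not part of the session above, and
medication3 is a hypothetical variable name. It relies on the same
assumptions (a)-(c) and takes everything before the first digit, 0
included:
. generate medication3 = trim(regexs(1)) if regexm(medication, "^([^0-9]+)")
A name with no dosage contains no digits, so the pattern matches the
whole string and no separate copying step is needed. A value that
starts with a digit would not match and would be left empty.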
Nick
On Wed, Nov 30, 2011 at 5:43 AM, Dudekula, Anwar <[email protected]> wrote:
> Thank you very much
>
> I will work on it. Would the parse() option split metoprolol tatrate 150mg bid as
>
> metoprolol tatrate and 150mg bid
>
> Or
>
> metoprolol & tatrate & 150mg & bid
>
> Thank you
> Anwar
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Steve Nakoneshny
> Sent: Wednesday, November 30, 2011 12:38 AM
> To: [email protected]
> Subject: Re: st: Working with complex strings
>
> - help split - would have answered this question.
>
> - split medication, parse( ) -
>
> should do what you want.
On Nov 29, 2011, at 9:54 PM, "Dudekula, Anwar" <[email protected]> wrote:
>> I am working with a deidentified hospital database with patient names (as a string variable) and medications (as a string variable), as follows:
>>
>> Patients_name   medication
>> ---------------------------------------------
>> Patient-1       metoprolol 100 mg qday
>> Patient-1       metoprolol tatrate 150mg bid
>> Patient-1       atenelol 150 mg qday
>> Patient-2       hctz 25 mg qday
>> Patient-2       PEG interferon
>> Patient-3       cimzia 50 mg qday
>>
>> Question: I am interested in the name of the medication only, not the dosage. Is it possible to split the medication string after the name, i.e.,
>>
>> 1) split metoprolol tatrate 150mg bid into metoprolol tatrate & 150mg bid
>> 2) split metoprolol 100 mg qday into metoprolol & 100 mg qday
>>