Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Working with complex strings
From: Steve Nakoneshny <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Working with complex strings
Date: Wed, 30 Nov 2011 08:14:22 -0700
Nick,
I hadn't known about the -p()- option (short for -punct()-) of -egen, concat()-. That will help me solve an unrelated problem I'm working on, thanks.
Steve
On 2011-11-30, at 2:08 AM, Nick Cox wrote:
> Parsing on spaces can be more helpful than stated here. We just need
> to reject "words" once we have found the first "word" that starts with
> a numeric digit. That can be done in a loop. It also copes with the
> possibility that numeric characters might be found within medication
> names, but _not_ with the possibility that medication names start with
> numeric characters.
>
> . split medication
> variables created as string:
> medication1 medication2 medication3 medication4
>
> . gen found = 0
>
> The 4 in the loop below is empirical for this example: use however many variables -split- created.
>
> . qui forval j = 1/4 {
> 2. replace found = 1 if inrange(substr(medication`j', 1, 1), "0", "9")
> 3. replace medication`j' = "" if found
> 4. }
>
> . l
>
>      +--------------------------------------------------------------------------------------+
>      |                   medication   medicati~1   medicati~2   medica~3   medica~4   found |
>      |--------------------------------------------------------------------------------------|
>   1. |       metoprolol 100 mg qday   metoprolol                                          1 |
>   2. | metoprolol tatrate 150mg bid   metoprolol      tatrate                             1 |
>   3. |         atenelol 150 mg qday     atenelol                                          1 |
>   4. |              hctz 25 mg qday         hctz                                          1 |
>   5. |               PEG interferon          PEG   interferon                             0 |
>      |--------------------------------------------------------------------------------------|
>   6. |            cimzia 50 mg qday       cimzia                                          1 |
>      +--------------------------------------------------------------------------------------+
>
>
> Then we put the words back together again:
>
> . egen medname = concat(medication?), p(" ")
>
> . l medication medname
>
>      +---------------------------------------------------+
>      |                   medication              medname |
>      |---------------------------------------------------|
>   1. |       metoprolol 100 mg qday           metoprolol |
>   2. | metoprolol tatrate 150mg bid   metoprolol tatrate |
>   3. |         atenelol 150 mg qday             atenelol |
>   4. |              hctz 25 mg qday                 hctz |
>   5. |               PEG interferon       PEG interferon |
>      |---------------------------------------------------|
>   6. |            cimzia 50 mg qday               cimzia |
>      +---------------------------------------------------+
>
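A minimal sketch of the same word-by-word approach, written so that the loop bound is not hard-coded. It assumes the six example rows from the original question; the stub name word and the new variable medname are illustrative choices, and it relies on -split- leaving the number of variables it created in r(nvars).

```stata
clear
input str28 medication
"metoprolol 100 mg qday"
"metoprolol tatrate 150mg bid"
"atenelol 150 mg qday"
"hctz 25 mg qday"
"PEG interferon"
"cimzia 50 mg qday"
end

* split on spaces; -split- returns the number of new variables in r(nvars)
split medication, generate(word)
local nwords = r(nvars)

generate byte found = 0
forvalues j = 1/`nwords' {
    * once a word starts with a digit, blank it and every later word
    replace found = 1 if inrange(substr(word`j', 1, 1), "0", "9")
    replace word`j' = "" if found
}

* glue the surviving words back together, separated by single spaces
egen medname = concat(word*), punct(" ")
replace medname = trim(medname)
drop word* found
list medication medname
```

The final trim() is only a precaution in case the blanked trailing words leave padding behind. As with the loop above, this does not cope with medication names that themselves start with a digit.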
>
> On Wed, Nov 30, 2011 at 8:36 AM, Nick Cox <[email protected]> wrote:
>> -split- by default parses on spaces, which clearly is no good here
>> given that medications can have compound names and dosages will not be
>> discarded. Steve was evidently pointing to the -parse()- option, not
>> suggesting that parsing on spaces was the answer.
>>
>> If we assume that (a) dose always starts with a number and (b) dose
>> when specified always follows name of medication and (c) names never
>> have numeric characters, then -split- can be used to parse on numeric
>> characters. Here I used 1-9 but 0 should be added if it's ever the
>> first numeric digit:
>>
>> . split medication, parse(1 2 3 4 5 6 7 8 9) limit(1)
>> variable created as string:
>> medication1
>>
>> . replace medication1 = trim(medication1)
>> (5 real changes made)
>>
>> . l
>>
>>      +---------------------------------------------------+
>>      |                   medication          medication1 |
>>      |---------------------------------------------------|
>>   1. |       metoprolol 100 mg qday           metoprolol |
>>   2. | metoprolol tatrate 150mg bid   metoprolol tatrate |
>>   3. |         atenelol 150 mg qday             atenelol |
>>   4. |              hctz 25 mg qday                 hctz |
>>   5. |               PEG interferon       PEG interferon |
>>      |---------------------------------------------------|
>>   6. |            cimzia 50 mg qday               cimzia |
>>      +---------------------------------------------------+
>>
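A short sketch of the same -split- call with 0 added to the parse strings, for data where a dose might begin with 0 (for example "0.5 mg"). It assumes the six example rows from the original question and still depends on assumptions (a) to (c) above; the stub name is an illustrative choice.

```stata
clear
input str28 medication
"metoprolol 100 mg qday"
"metoprolol tatrate 150mg bid"
"atenelol 150 mg qday"
"hctz 25 mg qday"
"PEG interferon"
"cimzia 50 mg qday"
end

* every digit is a parse string; limit(1) keeps only the text before the first one
split medication, parse(0 1 2 3 4 5 6 7 8 9) limit(1) generate(name)

* remove the space left between the name and the dose
replace name1 = trim(name1)
list medication name1
```

Rows without any digit, such as "PEG interferon", come through unchanged because there is nothing to split on.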
>> Another approach is to use -moss- (SSC):
>>
>> . moss medication, match("(.+) [1-9]+") regex
>>
>> . drop _count _pos1
>>
>> . rename _match1 medication2
>>
>> With this regular expression, -moss- misses names without dosages,
>> which can just be copied across.
>>
>> . replace medication2 = medication if missing(medication2)
>> (1 real change made)
>>
>> . l
>>
>>      +------------------------------------------------------------------------+
>>      |                   medication          medication1          medication2 |
>>      |------------------------------------------------------------------------|
>>   1. |       metoprolol 100 mg qday           metoprolol           metoprolol |
>>   2. | metoprolol tatrate 150mg bid   metoprolol tatrate   metoprolol tatrate |
>>   3. |         atenelol 150 mg qday             atenelol             atenelol |
>>   4. |              hctz 25 mg qday                 hctz                 hctz |
>>   5. |               PEG interferon       PEG interferon       PEG interferon |
>>      |------------------------------------------------------------------------|
>>   6. |            cimzia 50 mg qday               cimzia               cimzia |
>>      +------------------------------------------------------------------------+
>>
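For anyone without -moss- installed, something similar can probably be done with the built-in regexm() and regexs() functions. This is only a sketch under assumption (c) above, that names never contain numeric characters. It uses a different pattern from the -moss- call: it keeps the leading run of non-digit characters rather than matching up to a space followed by a digit. The variable name medname is illustrative.

```stata
clear
input str28 medication
"metoprolol 100 mg qday"
"metoprolol tatrate 150mg bid"
"atenelol 150 mg qday"
"hctz 25 mg qday"
"PEG interferon"
"cimzia 50 mg qday"
end

* keep the leading run of non-digit characters, then trim the trailing space
generate medname = trim(regexs(1)) if regexm(medication, "^([^0-9]+)")
list medication medname
```

Because the pattern also matches a string with no digits at all, names without dosages need no separate copying step. A string that starts with a digit does not match and is left missing.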
>> Nick
>>
>> On Wed, Nov 30, 2011 at 5:43 AM, Dudekula, Anwar <[email protected]> wrote:
>>> Thank you very much
>>>
>>> I will work on it. Would the parse() option split metoprolol tatrate 150mg bid as
>>>
>>> metoprolol tatrate and 150mg bid
>>>
>>> Or
>>>
>>> metoprolol & tatrate & 150mg & bid
>>>
>>> Thank you
>>> Anwar
>>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of Steve Nakoneshny
>>> Sent: Wednesday, November 30, 2011 12:38 AM
>>> To: [email protected]
>>> Subject: Re: st: Working with complex strings
>>>
>>> - help split - would have answered this question.
>>>
>>> - split medication, parse( ) -
>>>
>>> should do what you want.
>>
>>
>> On Nov 29, 2011, at 9:54 PM, "Dudekula, Anwar" <[email protected]> wrote:
>>
>>>> I am working with a deidentified hospital database with patient names (as a string variable) and medications (as a string variable), as follows:
>>>>
>>>> Patients_name   medication
>>>> ------------------------------------------
>>>> Patient-1       metoprolol 100 mg qday
>>>> Patient-1       metoprolol tatrate 150mg bid
>>>> Patient-1       atenelol 150 mg qday
>>>> Patient-2       hctz 25 mg qday
>>>> Patient-2       PEG interferon
>>>> Patient-3       cimzia 50 mg qday
>>>>
>>>> Question: I am interested in the name of the medication only, not the dosage. Is it possible to split the medication string after the name, i.e.,
>>>>
>>>> 1) split metoprolol tatrate 150mg bid into metoprolol tatrate & 150mg bid
>>>> 2) split metoprolol 100 mg qday into metoprolol & 100 mg qday
>>>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/