Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: RE: extracting a specific portion of a string
From
"Travis Coan" <[email protected]>
To
<[email protected]>
Subject
RE: st: RE: extracting a specific portion of a string
Date
Thu, 17 Mar 2011 10:10:00 -0400
This is actually Jorge's string variable, not mine. Though, I have
certainly enjoyed your and Nick's posts.
Cheers,
Travis
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Eric Booth
Sent: Thursday, March 17, 2011 9:57 AM
To: <[email protected]>
Subject: Re: st: RE: extracting a specific portion of a string
<>
On Mar 17, 2011, at 5:42 AM, <[email protected]>
wrote:
> Wouldn't it be simpler to rename "MOTHER'SBLOOD" "BLOOD,MATERNAL"?
>
> reg
>
I doubt it. It's probably a safe bet that Travis's string variable
takes more than just the 9 values he showed us in the example.
If there were hundreds or thousands of values where "BLOOD" or "SERUM"
were located later in the string, would you really want to write some
form of -replace v1 = "BLOOD,MATERNAL" if v1 == "MOTHER'SBLOOD"- for every
possible instance? Also, Travis may be interested in extracting more
than just "BLOOD" and "SERUM" (such as "LIPEMIC", "1ST", "SPECIMEN",
etc.) which could become problematic if you start jumbling up the
variable just to get "BLOOD" to the front of it. Better to leave the
string variable in place and use string functions to flag observations
that contain some substring of interest or extract substrings into other
variables.
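
As a minimal sketch of that approach (the three-row example data, the
has_* indicator names, and the variable name type are made up here for
illustration and are not from the thread), something like this flags
several substrings and extracts the specimen type while leaving v1
untouched:

```stata
* Sketch only: example data and new variable names are hypothetical
clear
input str20 v1
"BLOOD(LIPEMIC)"
"MOTHER'SBLOOD"
"SERUM,1STSPECIMEN"
end

* Flag observations that contain each substring, wherever it occurs in v1
foreach s in BLOOD SERUM LIPEMIC {
    generate byte has_`s' = strpos(v1, "`s'") > 0
}

* Record the first matching specimen type in a separate variable
generate str5 type = ""
foreach s in BLOOD SERUM {
    replace type = "`s'" if has_`s' & type == ""
}
list
```

Adding another substring of interest then only means adding a word to
the first -foreach- list, rather than rewriting values of v1.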
On Mar 17, 2011, at 4:04 AM, Nick Cox wrote:
> <snip>
> On a detail that might confuse: Eric used -index()- and -strpos()-. In
> essence, -index()- is the old name that still works, while -strpos()-
> is the new name. It's the same function underneath the names.
This is (at least) the second time Nick has been kind enough to remind
me that strpos() is the modern version of index() -- old habits die
hard. (http://www.stata.com/statalist/archive/2011-02/msg01111.html)
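
A quick way to check the equivalence Nick describes, assuming your Stata
release still accepts the old -index()- name (the example rows and the
variable names pos_new and pos_old are illustrative only):

```stata
* Sketch only: compares the old and new function names on example data
clear
input str20 v1
"BLOOD,1STSPECIMEN"
"MOTHER'SBLOOD"
"SERUM,2NDSPECIMEN"
end

generate pos_new = strpos(v1, "BLOOD")   // current name
generate pos_old = index(v1, "BLOOD")    // older name for the same function

* Both return the position of the first match, or 0 if there is none
assert pos_new == pos_old
list
```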
- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
>
> -----Original Message-----
>> From: Eric Booth <[email protected]>
>> Sent: Mar 17, 2011 12:15 AM
>> To: "<[email protected]>"
<[email protected]>
>> Subject: Re: st: RE: extracting a specific portion of a string
>>
>> <>
>>
>> On Mar 16, 2011, at 10:43 PM, Travis Coan wrote:
>>>
>>> I would take a look at the -substr- function -- typing 'help substr'
>>> should get you there.
>>>
>>
>> You should probably look at all the functions available in
>> -help string_functions-.
>> Note that -substr- alone wouldn't return the desired result in this
>> example, e.g.:
>>
>> **********************!
>> clear
>> inp str20(v1)
>> "BLOOD"
>> "BLOOD(LIPEMIC)"
>> "BLOOD(MODERATELYLY"
>> "BLOOD, 2ND SPECIMEN"
>> "BLOOD,1STSPECIMEN"
>> "BLOOD,2NDSPECIMEN"
>> "MOTHER'SBLOOD"
>> "SERUM,1STSPECIMEN"
>> "SERUM,2NDSPECIMEN"
>> end
>>
>> g v2 = substr(v1, 1, 5)
>> **note obs 7: substr() returns "MOTHE" because "BLOOD" is not at the start
>>
>> //using strpos and substr string functions//
>> g str10 v4 = ""
>> foreach x in "BLOOD" "SERUM" {
>> g v`x' = strpos(v1, "`x'")
>> replace v4 = substr(v1, v`x' , 5) if v`x'>0
>> }
>>
>> //using index//
>> g ind = 0
>> replace ind = 1 if index(v1, "BLOOD")
>> replace ind = 2 if index(v1, "SERUM")
>> la def ii 1 "Blood" 2 "Serum", modify
>> lab val ind ii
>> li
>> **********************!
>>
>> - Eric
>> __
>> Eric A. Booth
>> Public Policy Research Institute
>> Texas A&M University
>> [email protected]
>>
>>
>>>
>>>
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Mendoza
>>> Aldana, Dr Jorge Antonio (WPRO)
>>> Sent: Wednesday, March 16, 2011 10:36 PM
>>> To: [email protected]
>>> Subject: st: extracting a specific portion of a string
>>>
>>> Dear all,
>>> My dataset has a string variable, from which I need a specific
>>> portion. The content of the variable is like:
>>>
>>> BLOOD
>>> BLOOD(LIPEMIC)
>>> BLOOD(MODERATELYLY
>>> BLOOD, 2ND SPECIMEN
>>> BLOOD,1STSPECIMEN
>>> BLOOD,2NDSPECIMEN
>>> MOTHER'SBLOOD
>>> SERUM,1STSPECIMEN
>>> SERUM,2NDSPECIMEN
>>>
>>> and I need to generate a new variable containing either "BLOOD" or
>>> "SERUM".
>>> I would appreciate very much if you can give me some hints on
>>> solving this.
>>> I'm using Stata 11.1 on a Windows XP machine
>>> Kind regards,
>>> Jorge
>>>
>>>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/