Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: splitting strings
From: Daniel Henriksen <[email protected]>
To: [email protected]
Subject: Re: st: splitting strings
Date: Mon, 14 Feb 2011 09:47:00 +0100
Hello Nick, Scott and Eric (and everyone else)
Thank you all for your input, the suggested solutions, and for
pointing out the differences! Nick's solution is the closest to what I
need, but I think I know where I can use Scott's way (in another
context).
@ Nick: I'm glad I didn't miss something when searching the archives
for this problem :-)
Cheers
Daniel
2011/2/10 Eric Booth <[email protected]>:
>
> Scott's solution doesn't check V2 for the word that should parse V1 (within the same observation), but instead checks whether V1 contains some text found anywhere in V2.
> This could be a problem if V1 contains segments that can be found in several levels of V2, see the 3rd observation I added below:
>
> ***!
> clear
> input str39 v1 str10 v2
> "hello John Smith how are you?" "John Smith"
> "I'm fine Jane, how about you?" "Jane,"
> "I'm Jane, but you're John Smith, right?" ", ri"
> end
> levelsof v2
> return list
> split v1, parse(`=r(levels)')
> l
> ***!
>
> Nick's solution would split the 3rd obs on ", ri" (its own V2), but Scott's would split it on "Jane," from the 2nd obs.
>
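The difference Eric describes can be made visible with a small check. The following is only a sketch, not anyone's posted solution: the names own_pos, n_hits and seps are made up for illustration, and v1 is declared str39 so that the 39-character third line is not truncated. It compares the position of each observation's own v2 inside v1 with the number of distinct v2 values that occur anywhere in that v1.

```stata
clear
input str39 v1 str10 v2
"hello John Smith how are you?"           "John Smith"
"I'm fine Jane, how about you?"           "Jane,"
"I'm Jane, but you're John Smith, right?" ", ri"
end

* Position of the observation's own v2 inside v1 (0 means not found)
generate own_pos = strpos(v1, v2)

* Number of distinct v2 values that occur somewhere in each v1
* (n_hits and seps are hypothetical names used only in this sketch)
generate n_hits = 0
levelsof v2, local(seps)
foreach s of local seps {
    replace n_hits = n_hits + (strpos(v1, `"`s'"') > 0)
}

list v1 v2 own_pos n_hits
```

Any observation with n_hits above 1 is one where splitting on all levels of v2 at once can cut the string at text that belongs to a different observation. In this data that should be the third observation, which contains all three values.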
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> [email protected]
> Office: +979.845.6754
>
>
>
> On Feb 10, 2011, at 7:20 AM, Scott Merryman wrote:
>
>> If the data set is small then this should work:
>>
>> clear
>> input str29 v1 str10 v2
>> "hello John Smith how are you?" "John Smith"
>> "I'm fine Jane, how about you?" "Jane,"
>> end
>> levelsof v2
>> return list
>> split v1, parse(`=r(levels)')
>> l
>>
>> Scott
>>
>>
>> On Thu, Feb 10, 2011 at 7:08 AM, Daniel Henriksen
>> <[email protected]> wrote:
>>> I'm sorry, the example is unreadable.
>>> Observation one:
>>> V1: hello John Smith how are you?
>>> V2: John Smith
>>> V1_1: hello
>>> V1_2: how are you?
>>>
>>> Observation two:
>>> V1: I'm fine Jane, how about you?
>>> V2: Jane,
>>> V1_1: I'm fine
>>> V1_2: how about you?
>>>
>>> So the splitting of V1 varies from observation to observation,
>>> depending on the string text in V2.
>>> Hope this makes sense.
>>>
>>> /Daniel
>>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>
>
>
2011/2/10 Nick Cox <[email protected]>:
> What a nice question. I'm credited as the original author of -split-, following earlier joint work with Michael Blasnik, and I don't think I thought of this rather natural question when writing it.
>
> I checked to see whether I had solved it by accident and I hadn't. Whatever you specify as argument to -parse()- is taken literally and not checked to see if it is a variable name.
>
> However, there is a work-around.
>
> clonevar V1_2 = V1
> replace V1_2 = subinstr(V1_2, V2, "&", .)
> split V1_2, parse(&)
>
> The essential point is to use as the new separator ("&" in the example) something that does not otherwise occur in V1. You can test any potential separator with, e.g.,
>
> assert strpos(V1, "&") == 0
>
> Nick
> [email protected]
>
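Put together with the example data from the thread, the workaround might look like the sketch below. It assumes that "&" never occurs in v1, which the assert checks before anything is changed. The variable name work, the stub part, and the final trim() step are additions for illustration and are not part of Nick's code.

```stata
clear
input str29 v1 str10 v2
"hello John Smith how are you?" "John Smith"
"I'm fine Jane, how about you?" "Jane,"
end

* The placeholder must not already occur in v1 (assumption: "&" is unused)
assert strpos(v1, "&") == 0

* Replace each observation's own v2 text with the placeholder, then split on it
clonevar work = v1
replace work = subinstr(work, v2, "&", .)
split work, parse(&) generate(part)

* Optional tidy-up: remove any blanks left next to the separator
foreach v of varlist part* {
    replace `v' = trim(`v')
}

list v1 v2 part*
```

Because subinstr() is evaluated observation by observation, each v1 is cut only at its own v2, which is the behaviour asked for in the original question.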
> Daniel Henriksen
>
> I have a question regarding splitting up strings.
> Is it possible to split up a string using a string from another
> variable defined in the same observation? I'm thinking of using the
> "split" command.
> Here's an example, where V1 is the string I'd like to split and V2 is
> where I'd like to split (different for each observation). V1_1 and
> V1_2 are the results of the splitting.
> V1                             V2          V1_1      V1_2
> hello John Smith how are you?  John Smith  hello     how are you?
> I'm fine Jane, how about you?  Jane,       I'm fine  how about you?
>
>
--
Daniel Henriksen
PhD student, physician
Department of Infectious Diseases Q / Emergency Department
Odense Universitetshospital
Bygning 2, 1. sal
Sdr. Boulevard 29
5000 Odense C