Statalist The Stata Listserver



Re: st: RE: separating string variable into components


From   "Radu Ban" <[email protected]>
To   [email protected]
Subject   Re: st: RE: separating string variable into components
Date   Mon, 14 Aug 2006 08:44:07 -0400

Brilliant! This works very nicely. Thanks.

-Radu

2006/8/14, Nick Cox <[email protected]>:
As the original author of -split-, I can see a fairly painless
way to use that command. The trick is to see that you definitely
do _not_ want to parse on spaces, as not only is there a variable
number of spaces between substrings, but also spaces separate
elements within substrings. But each substring ends with
a right parenthesis. Now

split mystring, p(")")

parses on right parentheses and so removes them from the new
variables, but that is trivial to fix. You may want to put them
back, for which

foreach v in `r(varlist)' {
       replace `v' = `v' + ")" if `v' != ""
}

-- or you may prefer to take out the left parentheses, for which

foreach v in `r(varlist)' {
       replace `v' = subinstr(`v', "(","",.)
}
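
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the two example observations from the original question and applies the approach above. The storage type str60 and the extra trim() step are my own additions, not part of the original suggestion: the pieces produced by -split- keep whatever blanks preceded each left parenthesis, so trimming tidies them, and the if qualifier keeps a lone ")" out of values that are empty.

```stata
* Rebuild the two example observations from the original question
clear
input str60 mystring
"   (1 2 3) (1 2 2)  (7 8 9)    (1 3 4)"
" (2 3 4)    (1 2 3) (10 11 12)"
end

* Parse on ")"; by default the new variables are mystring1, mystring2, ...
split mystring, p(")")

* Tidy each piece: trim stray blanks, then restore ")" only where
* there is something to attach it to
foreach v in `r(varlist)' {
    replace `v' = trim(`v')
    replace `v' = `v' + ")" if `v' != ""
}

list mystring1-mystring4
```

With this data the second observation has only three substrings, so mystring4 stays empty there.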

Note here that r(varlist) is left behind by -split- as a list
of the names of the variables it creates, but will be zapped by
the next r-class command. You can do it directly by naming those
variables if you prefer.
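
One way to guard against r(varlist) disappearing is to copy it into a local macro immediately after -split-. The sketch below assumes a small made-up dataset and a macro name, pieces, chosen purely for illustration; -count- stands in for whatever r-class command might run in between. Remember that a local macro survives only within the same do-file or interactive session.

```stata
* Small made-up dataset in the same shape as the original question
clear
input str40 mystring
"(1 2 3) (1 2 2)"
"(2 3 4)"
end

split mystring, p(")")

* Copy the list now, before another r-class command replaces r()
local pieces `r(varlist)'

* count is r-class, so r(varlist) is no longer available after this
quietly count

* The local macro is unaffected and can still drive the loop
foreach v of local pieces {
    replace `v' = trim(subinstr(`v', "(", "", .))
}

list
```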

Nick
[email protected]

P.S. I follow the terminology that () are parentheses, []
brackets and {} are braces. Using brackets in the wide
sense either creates ambiguity or commits you to needing
to say round, square and curly to disambiguate.

Radu Ban

> I have a string variable that looks like this:
>
> mystring
>    (1 2 3) (1 2 2)  (7 8 9)    (1 3 4)
>  (2 3 4)    (1 2 3) (10 11 12)
>
> etc. The numbers inside the brackets are made up. The problem is that
> the number of spaces between brackets is not constant. Also the number
> of brackets is not constant across observations. I want to split this
> variable so that each bracket is contained in its own variable, i.e.
>
> split1   split2    split3         split4
> (1 2 3)  (1 2 2)  (7 8 9)        (1 3 4)
> (2 3 4)  (1 2 3)  (10 11 12)   <blank>
>
> I've tried the -split- command with various numbers of spaces as the
> parse string, but that doesn't work: it doesn't split if I specify
> too many blanks, and it creates blank observations if I specify too
> few blanks.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
