Re: st: How do I split a string variable without spaces by capital letters?
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Date
Mon, 19 Aug 2013 17:33:23 +0100
Along these lines, you could prefix every upper-case letter with a space and then -split- on spaces.
clonevar v2 = v1
qui foreach L in `c(ALPHA)' {
    replace v2 = subinstr(v2, "`L'", " `L'", .)
}
split v2
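For c(ALPHA), which holds the upper-case letters A to Z, see the results of -creturn list-.
This approach doesn't presuppose just two substrings: -split- creates as many new variables as there are pieces.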
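
For reference, here is a self-contained sketch of the same idea, run on the example data from the original question. It assumes that -split- is left at its defaults, so that it parses on spaces, ignores the leading space that the loop puts before the first capital, and names the new variables v21, v22, and so on after v2.

```stata
* Example data from the original question
clear
input str13 v1
"TestOne"
"ThisistestTwo"
"AndThree"
end

* Work on a copy so that v1 is left untouched
clonevar v2 = v1

* Put a space in front of every upper-case letter (A to Z)
quietly foreach L in `c(ALPHA)' {
    replace v2 = subinstr(v2, "`L'", " `L'", .)
}

* Default -split- parses on spaces; new variables are v21, v22, ...
split v2
list v1 v21 v22
```

With these data every value has exactly two capitals, so only v21 and v22 should be created. A value with three capitals would simply give a third variable, v23.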
Nick
[email protected]
On 19 August 2013 16:36, Eric A. Booth <[email protected]> wrote:
> <>
> Agreed, -moss- is great for this, but also you can do this using
> built-in string functions if you are interested, example:
>
> *****************!
> clear all
> inp str13(v1)
> "TestOne"
> "ThisistestTwo"
> "AndThree"
> end
>
> g v2 = reverse(v1)
> g pos = .
> g l = length(v1)
> foreach x in `c(ALPHA)' {
>     replace pos = strpos(v2, "`x'") if inlist(pos, ., 0, l)
> }
> drop v2
> g first = substr(v1, 1, l-pos)
> g second = substr(v1, l-pos+1, l)
> list
> *****************!
> EAB
>
>
>
> On Mon, Aug 19, 2013 at 10:31 AM, Robert Picard <[email protected]> wrote:
>> You can use -moss- (available from SSC) to handle this problem. The
>> following works with your example:
>>
>> moss v1, match("([A-Z][^A-Z]*)") regex
>>
>> The pattern indicates that you are looking for substrings that start
>> with a capital letter (i.e. [A-Z]) followed by zero or more non-capital
>> letters (i.e. [^A-Z]*).
>>
>> On Mon, Aug 19, 2013 at 10:06 AM, Andrew Dickens <[email protected]> wrote:
>>> Hi all,
>>>
>>> I'm currently running Stata 10, and I'm having a problem splitting a string
>>> variable by capital letters. Elena Vidal posted something under a similar
>>> title, http://www.stata.com/statalist/archive/2011-11/msg01195.html, but
>>> her problem is somewhat different from mine and I was unable to
>>> troubleshoot.
>>>
>>> An example of my data is as follows:
>>>
>>> clear all
>>> inp str13(v1)
>>> "TestOne"
>>> "ThisistestTwo"
>>> "AndThree"
>>> end
>>>
>>> The problem is the capital letter I wish to split each cell by is not
>>> consistently placed.
>>>
>>> I tried splitting using this code:
>>>
>>> split v1, p(upper(a-z))
>>> or
>>> split v1, p(upper(.))
>>>
>>> but this just generates an identical variable to v1.
>>>
>>> What I would like to do is create two new variables, so the first
>>> observation of my example would have "Test" in the first new variable and
>>> "One" in the second new variable. Suggestions would be greatly appreciated.
>>>