Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: How do I split a string variable without spaces by capital letters?
From
Haluk Vahaboglu <[email protected]>
To
"[email protected]" <[email protected]>
Subject
RE: st: How do I split a string variable without spaces by capital letters?
Date
Tue, 20 Aug 2013 00:06:35 +0000
Nick, thank you again for your immediate reply, which I had been eagerly awaiting.
Thank you also for the advice "keep the original till it's safe to drop it".
I now understand that while modifying the loop I deleted the comma and did not notice this, even after the error report.
Haluk Vahaboğlu
----------------------------------------
> Date: Tue, 20 Aug 2013 00:50:31 +0100
> Subject: Re: st: How do I split a string variable without spaces by capital letters?
> From: [email protected]
> To: [email protected]
>
> My line was
>
> replace v2 = subinstr(v2, "`L'", " `L'", .)
>
> but you left out the first comma. You had
>
> replace v1= subinstr(v1 "`L'", " `L'", .)
>
> and Stata complained that it could make no sense of
>
> v1 "A"
>
> Not using
>
> clonevar v2 = v1
>
> was not an error here. But why did I clone -v1- as -v2-? The
> advice is always to leave the original of such a string variable
> in your dataset until you are sure that you have no further need for
> it. Suppose you work on -v1- and then mess it up. That's no good, as you
> would have to read it in again.
>
> Nick
> [email protected]
>
>
> On 20 August 2013 00:37, Haluk Vahaboglu <[email protected]> wrote:
>> Nick, may I ask a simple question (surely not simple to me)?
>> I am trying to learn the secrets of Stata. For this purpose, I test code posted to this list on my Stata 12.1 (64-bit Ubuntu) system, because it might be useful in my future studies.
>> In this context, I ran your loop with a small modification, as shown below:
>>
>> clear all
>> inp str13(v1)
>> "TestOne"
>> "ThisistestTwo"
>> "AndThree"
>> end
>> foreach L in `c(ALPHA)' {
>> replace v1= subinstr(v1 "`L'", " `L'", .)
>> }
>>
>> To my surprise, this did not work. It returned the error:
>> v1"A" invalid name
>> r(198);
>>
>> It works in the form you posted:
>> clonevar v2 = v1
>> qui foreach L in `c(ALPHA)' {
>> replace v2 = subinstr(v2, "`L'", " `L'", .)
>> }
>>
>> I wonder why this loop fails without "clonevar v2 = v1". I guess there is a very easy answer to this that I cannot see.
>
>>> Date: Mon, 19 Aug 2013 17:33:23 +0100
>>> Subject: Re: st: How do I split a string variable without spaces by capital letters?
>>> From: [email protected]
>>> To: [email protected]
>>>
>>> Along these lines you could prefix every upper-case letter with a space.
>>>
>>> clonevar v2 = v1
>>>
>>> qui foreach L in `c(ALPHA)' {
>>> replace v2 = subinstr(v2, "`L'", " `L'", .)
>>> }
>>>
>>> split v2
>>>
>>> For c(ALPHA), see the results of -creturn list-.
>>>
>>> That doesn't presuppose just two substrings.
>>>
>>> Nick
>>> [email protected]
>>>
>>>
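The prefix-then-split idea above can be sketched outside Stata, here in Python, to show the logic step by step. This is a minimal illustration, not code posted in the thread: `split_on_capitals` is a hypothetical name, the loop over `string.ascii_uppercase` stands in for `c(ALPHA)`, and `str.split()` stands in for -split- with its default blank separator.

```python
import string


def split_on_capitals(text):
    """Prefix every upper-case ASCII letter with a space, then split on blanks.

    Hypothetical helper mirroring the Stata loop over c(ALPHA) plus -split-.
    """
    spaced = text
    for letter in string.ascii_uppercase:  # A..Z, like `c(ALPHA)'
        spaced = spaced.replace(letter, " " + letter)
    # str.split() with no argument ignores the leading blank and empty pieces
    return spaced.split()


for value in ["TestOne", "ThisistestTwo", "AndThree"]:
    print(value, split_on_capitals(value))
```

Because the split is on blanks, the number of pieces follows the number of capital letters, which is why this approach does not presuppose exactly two substrings.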
>>> On 19 August 2013 16:36, Eric A. Booth <[email protected]> wrote:
>>>> Agreed, -moss- is great for this, but you can also do this using
>>>> built-in string functions if you are interested. For example:
>>>>
>>>> *****************!
>>>> clear all
>>>> inp str13(v1)
>>>> "TestOne"
>>>> "ThisistestTwo"
>>>> "AndThree"
>>>> end
>>>>
>>>> g v2 = reverse(v1)
>>>> g pos = .
>>>> g l = length(v1)
>>>> foreach x in `c(ALPHA)' {
>>>> replace pos = strpos(v2, "`x'") if inlist(pos, ., 0, l)
>>>> }
>>>> drop v2
>>>> g first = substr(v1, 1, l-pos)
>>>> g second = substr(v1, l-pos+1, l)
>>>> list
>>>> *****************!
>>>> EAB
>>>>
>>>>
>>>>
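A rough Python analogue of the reverse-and-search approach above, assuming only upper-case ASCII letters mark a split and that exactly one split point is wanted. `split_at_last_capital` is a hypothetical helper, not part of the thread. It takes the first capital found in the reversed string (the last capital in the original) and ignores a capital in the leading position, so it reproduces the Stata output for the three example values; it may pick a different split point for values with other patterns of capitals, since the Stata loop visits letters in alphabetical order.

```python
import string


def split_at_last_capital(text):
    """Split text into two parts at its last interior upper-case letter.

    Hypothetical helper: like the Stata example, it searches the reversed
    string and skips a capital that is the original's first character.
    """
    n = len(text)
    reversed_text = text[::-1]  # analogue of reverse(v1)
    pos = 0  # 1-based position in the reversed string; 0 means "not found"
    for i, ch in enumerate(reversed_text, start=1):
        if ch in string.ascii_uppercase and i != n:
            pos = i
            break
    if pos == 0:
        return text, ""  # no interior capital, so nothing to split off
    cut = n - pos  # length of the first part, as in substr(v1, 1, l-pos)
    return text[:cut], text[cut:]


for value in ["TestOne", "ThisistestTwo", "AndThree"]:
    print(value, split_at_last_capital(value))
```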
>>>> On Mon, Aug 19, 2013 at 10:31 AM, Robert Picard <[email protected]> wrote:
>>>>> You can use -moss- (available from SSC) to handle this problem. The
>>>>> following works with your example:
>>>>>
>>>>> moss v1, match("([A-Z][^A-Z]*)") regex
>>>>>
>>>>> The pattern indicates that you are looking for substrings that start
>>>>> with a capital letter (i.e., [A-Z]) followed by zero or more characters
>>>>> that are not capital letters (i.e., [^A-Z]*).
>>>>>
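The same regular expression can be tried with Python's `re` module as a quick check of what it matches. This illustrates the pattern only, not -moss- itself, and `capital_runs` is a hypothetical name. Characters before the first capital letter are not part of any match, so a lowercase prefix would be dropped.

```python
import re

# Same pattern as in the -moss- call: a capital letter followed by
# zero or more characters that are not capital letters.
PATTERN = re.compile(r"([A-Z][^A-Z]*)")


def capital_runs(text):
    """Return every substring that starts with a capital letter."""
    # With one capture group, findall returns the group's text for each match
    return PATTERN.findall(text)


for value in ["TestOne", "ThisistestTwo", "AndThree"]:
    print(value, capital_runs(value))
```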
>>>>> On Mon, Aug 19, 2013 at 10:06 AM, Andrew Dickens <[email protected]> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I'm currently running Stata 10, and I'm having a problem splitting a string
>>>>>> variable by capital letters. Elena Vidal posted something under a similar
>>>>>> title, http://www.stata.com/statalist/archive/2011-11/msg01195.html, but
>>>>>> her problem is somewhat different from mine and I was unable to
>>>>>> troubleshoot it.
>>>>>>
>>>>>> An example of my data is as follows:
>>>>>>
>>>>>> clear all
>>>>>> inp str13(v1)
>>>>>> "TestOne"
>>>>>> "ThisistestTwo"
>>>>>> "AndThree"
>>>>>> end
>>>>>>
>>>>>> The problem is that the capital letter at which I wish to split each value
>>>>>> is not consistently placed.
>>>>>>
>>>>>> I tried splitting using this code:
>>>>>>
>>>>>> split v1, p(upper(a-z))
>>>>>> or
>>>>>> split v1, p(upper(.))
>>>>>>
>>>>>> but this just generates a variable identical to v1.
>>>>>>
>>>>>> What I would like to do is create two new variables, so the first
>>>>>> observation of my example would have "Test" in the first new variable and
>>>>>> "One" in the second new variable. Suggestions would be greatly appreciated.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/