Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Robert Picard <picard@netbox.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: How do I split a string variable without spaces by capital letters?
Date: Mon, 19 Aug 2013 11:31:21 -0400

You can use -moss- (available from SSC) to handle this problem. The
following works with your example:

    moss v1, match("([A-Z][^A-Z]*)") regex

The pattern indicates that you are looking for substrings that start
with a capital letter (i.e. [A-Z]) followed by zero or more
non-capital letters (i.e. [^A-Z]*).

On Mon, Aug 19, 2013 at 10:06 AM, Andrew Dickens <adickens@econ.yorku.ca> wrote:
> Hi all,
>
> I'm currently running Stata 10, and I'm having a problem splitting a string
> variable by capital letters. Elena Vidal posted something under a similar
> title, http://www.stata.com/statalist/archive/2011-11/msg01195.html, but
> her problem is somewhat different than mine and I was unable to
> troubleshoot.
>
> An example of my data is as follows:
>
> clear all
> inp str13(v1)
> "TestOne"
> "ThisistestTwo"
> "AndThree"
> end
>
> The problem is the capital letter I wish to split each cell by is not
> consistently placed.
>
> I tried splitting using this code:
>
> split v1, p(upper(a-z))
> or
> split v1, p(upper(.))
>
> but this just generates an identical variable to v1.
>
> What I would like to do is create two new variables, so the first
> observation of my example would have "Test" in the first new variable and
> "One" in the second new variable. Suggestions would be greatly appreciated.
>
> Thank you for your consideration.
>
> Andrew
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
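
For anyone who wants to try the -moss- suggestion from start to finish, here is a minimal sketch using the example data from the question. It assumes -moss- has been installed from SSC and that no prefix() option is given, so the results land in the default variables (_count, _match1, _match2, ...). The -list- step is only an illustration and is not part of the original reply.

```stata
* Assumption: -moss- is installed; if not, run: ssc install moss
clear
input str13 v1
"TestOne"
"ThisistestTwo"
"AndThree"
end

* Each match is one capital letter plus the non-capital letters after it
moss v1, match("([A-Z][^A-Z]*)") regex

* Assumption: default naming, so _count holds the number of matches
* and _match1, _match2 hold the matched substrings in order
list v1 _count _match1 _match2
```

With this pattern, "TestOne" should give "Test" in _match1 and "One" in _match2. A string with more than two capital letters would produce further _match variables, so check _count before assuming exactly two pieces.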