Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Re: problem with split command |
Date | Wed, 29 Feb 2012 09:20:35 +0000 |
Joseph is naturally right. In addition,

1. The help for -split- gives an example in which parsing is on ")" but it is desired to keep the ")", and the answer is simply that if you use -split- in this way you must put them back yourself. This is similar to your problem.

2. The main point is that -split- is not designed directly for this kind of problem, because when it was introduced there were already several ways to use existing string functions [N.B., not commands] to solve that kind of problem easily. Joseph has mentioned one. Here's another:

    gen numeral = real(substr(state_name, -4, 4))
    gen state = substr(state_name, 1, length(state_name) - 4)

Once -numeral- exists,

    gen state = subinstr(state_name, string(numeral), "", .)

is another way to do it. Here's another:

    gen numeral = substr(state_name, strpos(state_name, "2"), .)
    gen state = substr(state_name, 1, strpos(state_name, "2") - 1)

Nick

On Wed, Feb 29, 2012 at 3:49 AM, Joseph Coveney <jcoveney@bigplanet.com> wrote:
> Forgot to mention: for this year's survey and afterward, try the alternative below. You can use Stata's regular expressions, too.
>
> . input str30 state_name
>
>        state_name
>   1. "Andhra2012"
>   2. "Arunachal2012"
>   3. "Assam2012"
>   4. "Bihar2012"
>   5. "UttarPradesh2012"
>   6. end
>
> .
> . generate byte first_numeral = indexnot(state_name, "`c(alpha)'`c(ALPHA)'")
>
> . generate long year = real(substr(state_name, first_numeral, .))
>
> . replace state_name = substr(state_name, 1, first_numeral - 1)
> (5 real changes made)
>
> .
> . list, noobs separator(0) abbreviate(20)
>
>   +-------------------------------------+
>   |   state_name   first_numeral   year |
>   |-------------------------------------|
>   |       Andhra               7   2012 |
>   |    Arunachal              10   2012 |
>   |        Assam               6   2012 |
>   |        Bihar               6   2012 |
>   | UttarPradesh              13   2012 |
>   +-------------------------------------+
>
> .
> . exit
>
> end of do-file

Joseph Coveney

You're almost there: finish the job by concatenating "2" and statename2:

    generate int year = real("2" + statename2)

Prakash Singh wrote:

I need help using the -split- command. I am working with Stata 10.

I am working with survey data on Indian states. In the survey data, the variable state_name is joined with the year in which the state was surveyed, in this case 2005 to 2009. So the state_name variable looks like

    Andhra2006
    Arunachal2005
    Assam2006
    Bihar2007
    UttarPradesh2009

and so on. Now I would like to create two separate variables out of it, i.e. state_name and year_survey. I have used the following command:

    split state_name, parse(2) gen(statename)

But the problem I am facing is that the statename2 variable, which is actually the year variable, comes out without the 2, i.e. 005, 006, etc. Please advise; I have read the -split- help and the Statalist postings on -split- but could not work it out.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
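
Joseph's reply mentions that Stata's regular expressions would also work, but the thread does not show them. What follows is a minimal sketch of that route, not anything posted by the participants. It assumes every value of state_name is a run of letters followed by a run of digits, as in the examples in the question; the new variable names state and year_survey are illustrative choices.

```stata
* Sketch only: assumes values look like "Assam2006" (letters, then digits)
clear
input str30 state_name
"Andhra2006"
"Arunachal2005"
"Assam2006"
"Bihar2007"
"UttarPradesh2009"
end

* regexm() returns 1 when the pattern matches;
* regexs(n) then returns the nth parenthesised group of that match
generate str30 state = regexs(1) if regexm(state_name, "^([A-Za-z]+)([0-9]+)$")
generate int year_survey = real(regexs(2)) if regexm(state_name, "^([A-Za-z]+)([0-9]+)$")

list, noobs separator(0)
```

Observations that do not fit the pattern are left missing in both new variables, which makes irregular values easy to spot. Unlike parsing on "2", this does not depend on every year starting with the same digit.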