Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Prakash Singh <prakashbhu@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Re: problem with split command |
Date | Wed, 29 Feb 2012 15:30:38 +0530 |
Thanks again.
This is what I did after Joseph's suggestion.

split state_name, p(2) gen(statename)
drop statename2
gen year = substr(state_name, -4, 4)

Prakash

On Wed, Feb 29, 2012 at 2:50 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Joseph is naturally right. In addition,
>
> 1. The help for -split- gives an example in which parsing is on ")"
> but it is desired to keep the ")" and the answer is simply that if you
> use -split- in this way you must put them back yourself. This is
> similar to your problem.
>
> 2. The main point is that -split- is not designed directly for this
> kind of problem because when it was introduced there were already
> several ways to use existing string functions [N.B., not commands] to
> solve that kind of problem easily. Joseph has mentioned one. Here's
> another
>
> gen numeral = real(substr(state_name, -4, 4))
> gen state = substr(state_name, 1, length(state_name) - 4)
>
> Once -numeral- exists,
>
> gen state = subinstr(state_name, string(numeral), "", .)
>
> is another way to do it.
>
> Here's another
>
> gen numeral = substr(state_name, strpos(state_name, "2"), .)
> gen state = substr(state_name, 1, strpos(state_name, "2") - 1)
>
> Nick
>
> On Wed, Feb 29, 2012 at 3:49 AM, Joseph Coveney <jcoveney@bigplanet.com> wrote:
>
>> Forgot to mention: for this year's survey and afterward, try the alternative below. You can use Stata's regular expressions, too.
>>
>>
>> . input str30 state_name
>>
>>                          state_name
>>   1. "Andhra2012"
>>   2. "Arunachal2012"
>>   3. "Assam2012"
>>   4. "Bihar2012"
>>   5. "UttarPradesh2012"
>>   6. end
>>
>> .
>> . generate byte first_numeral = indexnot(state_name, "`c(alpha)'`c(ALPHA)'")
>>
>> . generate long year = real(substr(state_name, first_numeral, .))
>>
>> . replace state_name = substr(state_name, 1, first_numeral - 1)
>> (5 real changes made)
>>
>> .
>> . list, noobs separator(0) abbreviate(20)
>>
>>   +-------------------------------------+
>>   |   state_name   first_numeral   year |
>>   |-------------------------------------|
>>   |       Andhra               7   2012 |
>>   |    Arunachal              10   2012 |
>>   |        Assam               6   2012 |
>>   |        Bihar               6   2012 |
>>   | UttarPradesh              13   2012 |
>>   +-------------------------------------+
>>
>> .
>> . exit
>>
>> end of do-file
>
> Joseph Coveney
>
> You're almost there: finish the job by concatenating "2" and statename2:
>
> generate int year = real("2" + statename2)
>
>
> Prakash Singh wrote:
>
> I need help using the -split- command. I am working with Stata 10.
> I am working with survey data on Indian states. In the survey data, the
> variable state_name is combined with the year in which the state was
> surveyed, in this case 2005 to 2009. So the state_name variable looks
> like...
> Andhra2006
> Arunachal2005
> Assam2006
> Bihar2007
> UttarPradesh2009
>
> and so on.
> Now I would like to create two separate variables out of it, i.e.
> state_name and year_survey.
>
> I have used the following command
> split state_name, parse(2) gen(statename)
>
> But the problem I am facing is that the statename2 variable, which is
> actually the year variable, comes out without the 2, i.e. 005, 006 etc.
>
> Please advise, as I have read the -split- help and Statalist postings
> on -split- but could not work it out.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/