Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Re: problem with split command
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Re: problem with split command
Date: Wed, 29 Feb 2012 09:20:35 +0000
Joseph is naturally right. In addition,
1. The help for -split- gives an example in which parsing is on ")"
but the ")" is to be kept. The answer is simply that -split- discards
the parse string, so if you use -split- in this way you must put the
")" back yourself. Your problem is similar.
2. The main point is that -split- was not designed for this kind of
problem: when it was introduced, existing string functions [N.B., not
commands] already offered several easy ways to solve it. Joseph has
mentioned one. Here's another:
gen numeral = real(substr(state_name, -4, 4))
gen state = substr(state_name, 1, length(state_name) - 4)
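Once -numeral- exists,
gen state = subinstr(state_name, string(numeral), "", .)
is another way to create -state-. Note that -numeral- is numeric here,
so it must be converted with string() before it is passed to subinstr().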
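The fixed-width approach above can be tried on the sample values from the original post. This is a minimal sketch, assuming every value ends in a four-digit year; -state2- is a hypothetical name used only to compare the -subinstr()- variant with the -substr()- one, and -numeral- is converted with string() because it is numeric here.

```stata
* Sketch only: sample values copied from the original post
clear
input str30 state_name
"Andhra2006"
"Arunachal2005"
"Assam2006"
"Bihar2007"
"UttarPradesh2009"
end

* The last four characters hold the year; real() makes it numeric
gen numeral = real(substr(state_name, -4, 4))

* Everything before the last four characters is the state
gen state = substr(state_name, 1, length(state_name) - 4)

* Variant: remove the year text; numeral is numeric, hence string()
gen state2 = subinstr(state_name, string(numeral), "", .)

* Both routes should agree on every observation
assert state2 == state
list state_name state numeral, noobs
```

Because the year is taken from the end of the string, this route does not care which digits the year contains.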
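Here's another, which assumes that the first "2" in -state_name-
starts the year and which leaves -numeral- as a string: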
gen numeral = substr(state_name, strpos(state_name, "2"), .)
gen state = substr(state_name, 1, strpos(state_name, "2") - 1)
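The -strpos()- approach can be sketched the same way. This assumes that no state name itself contains a "2"; -pos- and -year- are hypothetical names introduced for illustration, and the -assert- guards against values with no "2" at all, for which strpos() returns 0.

```stata
* Sketch only: sample values copied from the original post
clear
input str30 state_name
"Andhra2006"
"Arunachal2005"
"Assam2006"
"Bihar2007"
"UttarPradesh2009"
end

* Position of the first "2"; 0 would mean that no "2" was found
gen byte pos = strpos(state_name, "2")
assert pos > 0

* From the first "2" to the end is the year, still as a string
gen numeral = substr(state_name, pos, .)

* Everything before the first "2" is the state
gen state = substr(state_name, 1, pos - 1)

* Convert the string year to a number if a numeric variable is wanted
gen int year = real(numeral)
list state_name state numeral year, noobs
```

Storing the position once avoids calling strpos() twice and makes the "no match" case easy to check.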
Nick
On Wed, Feb 29, 2012 at 3:49 AM, Joseph Coveney <[email protected]> wrote:
> Forgot to mention: for this year's survey and afterward, try the alternative below. You can use Stata's regular expressions, too.
>
>
> . input str30 state_name
>
> state_name
> 1. "Andhra2012"
> 2. "Arunachal2012"
> 3. "Assam2012"
> 4. "Bihar2012"
> 5. "UttarPradesh2012"
> 6. end
>
> .
> . generate byte first_numeral = indexnot(state_name, "`c(alpha)'`c(ALPHA)'")
>
> . generate long year = real(substr(state_name, first_numeral, .))
>
> . replace state_name = substr(state_name, 1, first_numeral - 1)
> (5 real changes made)
>
> .
> . list, noobs separator(0) abbreviate(20)
>
>   +-------------------------------------+
>   |   state_name   first_numeral   year |
>   |-------------------------------------|
>   |       Andhra               7   2012 |
>   |    Arunachal              10   2012 |
>   |        Assam               6   2012 |
>   |        Bihar               6   2012 |
>   | UttarPradesh              13   2012 |
>   +-------------------------------------+
>
> .
> . exit
>
> end of do-file
Joseph Coveney
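Joseph's message mentions regular expressions without showing them. One possible sketch follows, assuming each value is a run of letters followed by a run of digits and nothing else; the pattern and the variable names -state- and -year- are illustrative choices, not taken from the thread.

```stata
* Sketch only: sample values copied from Joseph's example
clear
input str30 state_name
"Andhra2012"
"Arunachal2012"
"Assam2012"
"Bihar2012"
"UttarPradesh2012"
end

* Group 1 captures the letters, group 2 the trailing digits
gen state = regexs(1) if regexm(state_name, "^([A-Za-z]+)([0-9]+)$")
gen int year = real(regexs(2)) if regexm(state_name, "^([A-Za-z]+)([0-9]+)$")

* Values that do not fit the pattern would be left missing
assert !missing(state, year)
list state year, noobs
```

Like the -indexnot()- solution, this does not depend on the year starting with "2" or containing only one "2".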
You're almost there: finish the job by concatenating "2" and statename2:
generate int year = real("2" + statename2)
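Put together with the original -split- call, the repair might look like the sketch below. It assumes, as in the original post, that years run from 2005 to 2009 and that no state name contains a "2", so each value holds exactly one "2"; a year such as 2012 would be cut at both of its "2"s.

```stata
* Sketch only: sample values copied from the original post
clear
input str30 state_name
"Andhra2006"
"Arunachal2005"
"Assam2006"
"Bihar2007"
"UttarPradesh2009"
end

* -split- discards the parse string, so statename2 holds "006", "005", ...
split state_name, parse(2) gen(statename)

* Put the leading "2" back and convert the result to a number
generate int year = real("2" + statename2)

list statename1 statename2 year, noobs
```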
Prakash Singh wrote:
I need help with the -split- command. I am working with Stata 10.
I am working with survey data on Indian states. In the survey data, the
variable state_name combines the state name with the year in which the
state was surveyed, in this case 2005 to 2009. So the state_name
variable looks like...
Andhra2006
Arunachal2005
Assam2006
Bihar2007
UttarPradesh2009
and so on.
Now I would like to create two separate variables out of it, i.e.
state_name and year_survey.
I have used the following command:
split state_name, parse(2) gen(statename)
But the problem I am facing is that the statename2 variable, which is
actually the year variable, comes out without the 2, i.e. 005, 006 etc.
Please advise, as I have read the -split- help and Statalist postings
on -split- but could not work it out.