Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: copy part of a string
From: ChrisAnsen <[email protected]>
To: [email protected]
Subject: st: copy part of a string
Date: Sat, 15 Oct 2011 18:21:39 +0200
Dear all
I ran into an issue with Stata today.
I have a dataset with over 1000 strings of the following form:
1. "M200B + M201 + B001"
2. "M200B + M201"
3. "M200 + M300"
4. ...
5. and so on.
I want to read the first part of the string, for example "M200B", and
put it in a new variable, and then do the same for the second and third
parts where they exist.
I am doing this by using the command:
gen code1_1 = regexs(1) if regexm(code1, "(([a-zA-Z]+[0-9]+[0-9]+[0-9][a-zA-Z])|([a-zA-Z]+[0-9]+[0-9]+[0-9])")
This gets me what I want: the part before the first + sign.
Next I want the part after the + sign, and I am doing that with the
following command:
gen code1_2 = regexs(2) if regexm(code1, "(([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z]))")
This returns a value when the second part has the form "M200B". To cover
the other form as well, I added an OR, changing the command to:
gen code1_2 = regexs(2) if regexm(code1, "(([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z])|([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9])")
With this I get an error saying that something is out of range, or
something similar.
Can someone tell me where I am making the mistake, or whether there is
another way to do it?
I thought of using a dummy variable as an intermediate step, but I do not
like the idea: later, when I have six parts ("M2008 + .......+M20"), it
will get messy, and I would rather do it the "correct" way from the start.
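For the "other way" asked about above, one possibility is Stata's split command, which cuts a string variable at a separator and creates one new variable per part, however many parts there are. The following is only a minimal sketch: it assumes the parts are always separated by a + sign, the three example rows are typed in by hand purely for illustration, and the stub code1_ is chosen to match the variable names used in this post.

```stata
* Sketch only: assumes the parts of code1 are always separated by "+"
clear
input str30 code1
"M200B + M201 + B001"
"M200B + M201"
"M200 + M300"
end

* Cut code1 at every "+"; creates code1_1, code1_2, ... as needed
split code1, parse("+") generate(code1_)

* Strip the blanks left around each part
foreach v of varlist code1_* {
    replace `v' = trim(`v')
}

list
```

Rows with fewer parts simply get empty strings in the later variables, so the same two commands would apply unchanged to strings with six parts.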
Also, I know the expressions could be tidied up, for example in how
[0-9] is used, so please do not offer that kind of advice :)
Thank you all in advance
Best regards
Christina Christiansen, DK