Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | ChrisAnsen <lakridstina@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: copy part of a string |
Date | Sat, 15 Oct 2011 18:21:39 +0200 |
Dear all,

I ran into an issue with Stata today. I have a dataset with over 1000 string variables of the following type:

1. "M200B + M201 + B001"
2. "M200B + M201"
3. "M200 + M300"
4. ...
5. and so on.

Now I want to read the first part of the string, for example "M200B", and insert it in a new column, and then read the second and the third parts if applicable. I am doing this with the command:

    gen code1_1 = regexs(1) if regexm(code1, "(([a-zA-Z]+[0-9]+[0-9]+[0-9][a-zA-Z])|([a-zA-Z]+[0-9]+[0-9]+[0-9])")

This gets me what I want, namely what is before the + sign. Now I want what is after the + sign, and I am doing it with the following command:

    gen code1_2 = regexs(2) if regexm(code1, "(([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z]))")

This gives the value if it is in the form "M200B", but when I add an OR and transform it to:

    gen code1_2 = regexs(2) if regexm(code1, "(([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z])|([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9])")

I get an error saying that it is outside range, or something similar. Can someone tell me where I am making the mistake, or whether there is another way to do it?

I thought of using a dummy variable as an intermediate step, but I do not like the idea because later, when I have six variables ("M2008 + .......+M20"), it will be messy, and it should be doable the "correct" way. I also know how to tidy the expressions up, for example by using [0-9], so please do not offer that kind of advice :)

Thank you all in advance.

Best regards,
Christina Christiansen, DK
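One possible alternative to the regular expressions above, offered as a minimal sketch rather than a definitive answer: if the parts are always separated by a "+" sign (an assumption based on the three examples shown), Stata's split command can cut the string at every "+" in one pass, however many parts there are. The stub name code1_ is chosen here only for illustration, and split will stop with an error if variables such as code1_1 already exist, so any earlier attempts would need to be dropped first.

```stata
* Sketch only: assumes every part in code1 is separated by "+",
* with optional blanks around it, e.g. "M200B + M201 + B001".
* Creates code1_1, code1_2, ... (as many as the longest string needs).
split code1, parse("+") generate(code1_)

* The pieces keep the blanks that surrounded each "+", so trim them.
foreach v of varlist code1_* {
    replace `v' = trim(`v')
}
```

Observations with fewer parts simply get empty strings in the later variables, so no intermediate dummy variable is needed.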