Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Dimitriy V. Masterov" <dvmaster@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: copy part of a string |
Date | Sat, 15 Oct 2011 13:28:19 -0400 |
Chris,

Suppose your variable was called x. Then the command would be

    split x, parse(" + ")

You can loop over your variables with -split- if you want to do all of them. This will not work if you have cases without spaces, like "M200+M300". In that case you may need to use

    split x, parse("+")

and then use -trim- to get rid of the leading and trailing blanks if they exist.

DVM

On Sat, Oct 15, 2011 at 12:21 PM, ChrisAnsen <lakridstina@gmail.com> wrote:
> Dear all
>
> I ran into an issue with Stata today.
>
> I have a data list with over 1000 string variables of the following type:
>
> 1. "M200B + M201 + B001"
> 2. "M200B + M201"
> 3. "M200 + M300"
> 4. ...
> 5. and so on.
>
> Now I want to read the first part of the string, for example "M200B", and
> insert it in a new column, and then read the second and the third part of
> it if applicable.
>
> I am doing this by using the command:
>
> gen code1_1 = regexs(1) if regexm(code1,
> "(([a-zA-Z]+[0-9]+[0-9]+[0-9][a-zA-Z])|([a-zA-Z]+[0-9]+[0-9]+[0-9])")
>
> Now this gets me what I want, which is what comes before the + sign.
>
> Now I want what is after the + sign, and I am doing it by using the
> following command:
>
> gen code1_2 = regexs(2) if regexm(code1,
> "(([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z]))")
>
> This gives the value if it is in the form of "M200B", and by adding an OR
> and transforming it to:
>
> gen code1_2 = regexs(2) if regexm(code1,
> "(([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9]+[a-zA-Z])|([+ ]+[a-zA-Z]+[0-9]+[0-9]+[0-9])")
>
> I am getting an error that it is outside range, or something similar.
>
> Can someone tell me where I am making the mistake, or if there is
> another way to do it?
>
> I thought of using a dummy variable as a mid-step, but I do not like the
> idea because later, when I have six variables "M2008 + .......+M20", it
> will be messy, and it should be durable on the "correct" way.
>
> Also I know how to make it tidier by using [0-9], for example, so
> please do not mention any of that advice :)
>
> Thank you all in advance
>
> Best regards
> Christina Christiansen, DK
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
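
The -split- and -trim- suggestion in the reply can be sketched as follows. This is only an illustration under stated assumptions: the three observations are toy data made up from the examples in the question, the variable name code1 is taken from the question, and the stub code1_ passed to generate() is a choice made here so that the new variables come out as code1_1, code1_2, and so on.

```stata
* Toy data (hypothetical), modelled on the strings quoted in the question
clear
input str30 code1
"M200B + M201 + B001"
"M200B + M201"
"M200+M300"
end

* Split on the bare plus sign so that entries with and without
* surrounding spaces are treated the same way.
* generate(code1_) names the pieces code1_1, code1_2, code1_3.
split code1, parse("+") generate(code1_)

* Remove the leading and trailing blanks left over from " + "
foreach v of varlist code1_* {
    replace `v' = trim(`v')
}

list
```

To process many string variables, the same two steps could be wrapped in an outer foreach loop over the variable list, with a separate stub for each variable.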