Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <n.j.cox@durham.ac.uk> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | st: RE: RE: puzzling string conversion |
Date | Thu, 10 Feb 2011 15:56:24 +0000 |
Code closer to Dimitri's original is

    gen id = mystring
    count if missing(real(id)) & (id != "")
    qui while r(N) {
        replace id = regexr(id,"[^0-9]","")
        count if missing(real(id)) & (id != "")
    }
    destring id, gen(numid)
    format numid %30.0f

Here r(N) is emitted by -count- and is non-zero (positive) while there's work still to do.

Nick
n.j.cox@durham.ac.uk

Nick Cox

Your -while- condition will be interpreted as referring to -id[1]- regardless. It does not itself loop over the data. The -replace- statement would be sufficient in itself if the regexp is what you want.

There are various solutions to extracting numeric characters only from a string. Here is another, more pedestrian in style.

    gen id = ""
    gen char = ""
    local length = substr("`: type mystring'",4,.)
    qui forval i = 1/`length' {
        replace char = substr(mystring, `i', 1)
        replace id = id + char if inrange(real(char), 0, 9)
    }

Dimitri Szerman

I got this puzzling result. I have a string variable, mystring, which has both numeric and non-numeric characters. I'd like to extract only the numeric ones, and form a numeric variable with this (in fact, it's going to be an id). I'm using regular expressions, and this is what I'm doing

    input str30 mystring
    "111.aaa.22.2/33-33"
    "011.xyz.22.2/33-33"
    "101.abc.22.2/33-33"
    "222.foo.22.2/33-33"
    "111.bla.22.2/33-33"
    end
    gen id = mystring
    while regexm(id, "[^0-9]" ) {
        replace id = regexr(id,"[^0-9]","")
    }
    destring id, gen(numid)

And it works fine. However, if mystring has an observation which contains very few (when compared to the other observations) non-numeric characters, this seems to break down:

    clear
    input str30 mystring
    "A"
    "011.xyz.22.2/33-33"
    "101.abc.22.2/33-33"
    "222.foo.22.2/33-33"
    "111.bla.22.2/33-33"
    end
    gen id = mystring
    while regexm(id, "[^0-9]" ) {
        replace id = regexr(id,"[^0-9]","")
    }
    destring id, gen(numid)

Am I missing something? Why doesn't this work? Any suggestions?
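
For anyone wanting to see the -id[1]- point from the reply in action, here is a minimal sketch (not from the original posts) that runs one pass of the loop body by hand on a two-observation version of the second data set. It uses only -regexm()-, -regexr()- and -count-, all of which appear in the thread; the cut-down data set is an assumption made purely for illustration.

```stata
clear
input str30 mystring
"A"
"011.xyz.22.2/33-33"
end
gen id = mystring

* The -while- condition is evaluated on the first observation only.
* id[1] is "A", which contains a non-digit, so the loop would be entered.
display regexm(id[1], "[^0-9]")

* One pass of the loop body: -regexr()- removes only the first match
* in each observation.
replace id = regexr(id, "[^0-9]", "")

* id[1] is now empty, so the condition is false and the loop would stop ...
display regexm(id[1], "[^0-9]")

* ... even though the second observation still contains non-digits.
count if regexm(id, "[^0-9]")
```

The final -count- is what the r(N)-based loop in the reply relies on: it looks at every observation rather than just the first.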