Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: RE: RE: puzzling string conversion
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: RE: RE: puzzling string conversion
Date: Thu, 10 Feb 2011 16:14:32 +0000
But there is no need to proceed character by character.
replace id = regexr(id,"[^0-9]+","")
removes a whole run of non-numeric characters on each pass, which should speed things up a bit.
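A minimal sketch of the difference between removing one character and removing a run per call, using one of the example strings from this thread. The variable names one and run are made up for illustration. Note the "+" rather than "*": a pattern such as "[^0-9]*" can match the empty string at the start of a value that begins with a digit, in which case -regexr()- would change nothing.

```stata
clear
input str30 mystring
"111.aaa.22.2/33-33"
end

* removes only the first non-numeric character: "111aaa.22.2/33-33"
gen one = regexr(mystring, "[^0-9]", "")

* removes the first run of non-numeric characters: "11122.2/33-33"
gen run = regexr(mystring, "[^0-9]+", "")

list
```

Either way -regexr()- replaces only the first match, so the call still has to be repeated until nothing non-numeric is left; the run version just needs fewer passes.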
Nick
[email protected]
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 10 February 2011 15:56
To: '[email protected]'
Subject: st: RE: RE: puzzling string conversion
Code closer to Dimitri's original is
gen id = mystring
count if missing(real(id)) & (id != "")
qui while r(N) {
    replace id = regexr(id,"[^0-9]","")
    count if missing(real(id)) & (id != "")
}
destring id, gen(numid)
format numid %30.0f
Here r(N) is emitted by -count- and is non-zero (positive) while there's work still to do.
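One caveat with the stopping rule above: -real()- accepts strings such as "22.2" or "1e5" as valid numbers, so a stray "." or "e" could survive if an observation happened to reach that form. A hedged variant, not from the original posts, that tests directly for remaining non-numeric characters instead:

```stata
clear
input str30 mystring
"A"
"011.xyz.22.2/33-33"
"22.2"
end

gen id = mystring

* count observations that still contain a non-numeric character
count if regexm(id, "[^0-9]")
qui while r(N) {
    * strip the first run of non-numeric characters in every observation
    replace id = regexr(id, "[^0-9]+", "")
    count if regexm(id, "[^0-9]")
}

* empty strings (such as the former "A") become numeric missing
destring id, gen(numid)
format numid %30.0f
list
```

The loop ends only when no observation contains anything outside 0-9, regardless of whether the intermediate strings happen to look like numbers.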
Nick
[email protected]
Nick Cox wrote:
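Your -while- condition will be interpreted as referring to -id[1]-, the value in the first observation, regardless of what the other observations contain.
-while- does not itself loop over the data. The -replace- statement does act on every observation, but -regexr()- replaces only the first match, so it must be repeated until no observation has non-numeric characters left.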
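A small sketch of why the loop stops early, using a cut-down version of the second example dataset. Outside a command that works observation by observation, a bare variable name in an expression means its value in observation 1:

```stata
clear
input str30 mystring
"A"
"011.xyz.22.2/33-33"
end

gen id = mystring

* both lines show 1: -id- here is read as -id[1]-, which is "A"
display regexm(id, "[^0-9]")
display regexm(id[1], "[^0-9]")

* one pass removes the first non-numeric character in each observation
replace id = regexr(id, "[^0-9]", "")

* id[1] is now empty, so the -while- condition is 0 and the loop would end,
* even though observation 2 still holds "011xyz.22.2/33-33"
display regexm(id, "[^0-9]")
list
```

In the first example dataset every observation has the same number of non-numeric characters, so observation 1 finishes at the same time as all the others and the problem stays hidden.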
There are various solutions to extracting numeric characters only from a string. Here is another, more pedestrian in style.
gen id = ""
gen char = ""
local length = substr("`: type mystring'",4,.)
qui forval i = 1/`length' {
    replace char = substr(mystring, `i', 1)
    replace id = id + char if inrange(real(char), 0, 9)
}
Dimitri Szerman wrote:
I got this puzzling result. I have a string variable, mystring, which
has both numeric and non-numeric characters. I'd like to extract only
the numeric ones and form a numeric variable from them (in fact, it's
going to be an id). I'm using regular expressions, and this is what
I'm doing:
input str30 mystring
"111.aaa.22.2/33-33"
"011.xyz.22.2/33-33"
"101.abc.22.2/33-33"
"222.foo.22.2/33-33"
"111.bla.22.2/33-33"
end
gen id = mystring
while regexm(id, "[^0-9]" ) {
    replace id = regexr(id,"[^0-9]","")
}
destring id, gen(numid)
And it works fine. However, if mystring has an observation which
contains very few (when compared to the other observations)
non-numeric characters, this seems to break down:
clear
input str30 mystring
"A"
"011.xyz.22.2/33-33"
"101.abc.22.2/33-33"
"222.foo.22.2/33-33"
"111.bla.22.2/33-33"
end
gen id = mystring
while regexm(id, "[^0-9]" ) {
    replace id = regexr(id,"[^0-9]","")
}
destring id, gen(numid)
Am I missing something? Why doesn't this work? Any suggestions?