st: destring ignores more than what specified in ignore()
From: "Impavido, Gregorio" <[email protected]>
To: "'STATALIST ([email protected])'" <[email protected]>
Subject: st: destring ignores more than what specified in ignore()
Date: Sun, 20 Nov 2011 20:51:55 -0500
I looked at the many FAQs on destring but could not find an answer to my problem, hence this post; hopefully it is not a duplicate.
I have a dataset with a number of string variables (unknown ex ante) containing entries of the following three types: (i) "###.###"; (ii) "n.a."; and (iii) "n.s.".
These variables should be numeric, and I would like to destring them with the following code:
foreach var of varlist * {
    capture confirm numeric variable `var'
    if _rc {
        destring `var', replace ignore("n.a." "n.s.")
    }
}
This does not work because destring, for some reason inexplicable to me, treats "." as a separate nonnumeric character rather than as part of "n.a." or "n.s.".
It therefore also drops the "." in entries like "###.###", turning them into the double-precision number ######. The same happens if the option is specified as ignore("n.a" "n.s") (i.e., without the final ".").
First question (of two): why is destring ignoring more than what is specified in the option ignore()?
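A minimal sketch for reproducing the behaviour on toy data. The variable names x and x_num and the three values are made up for illustration. It assumes that ignore() is read as a set of individual characters to strip rather than as whole strings, which is my reading of the option's documentation and would explain why the decimal point disappears along with the letters.

```stata
* Toy data (hypothetical): one number with a decimal point, two "missing" codes
clear
input str8 x
"123.456"
"n.a."
"n.s."
end

* Same option as in the loop above; generate() keeps the original for comparison
destring x, generate(x_num) ignore("n.a." "n.s.")

* Compare the original strings with the converted values
list x x_num
```

Listing the two variables side by side should make it easy to see whether the "." in the first observation survived the conversion.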
I found two ways around this odd behaviour of destring.
The first uses one extra line of code:
foreach var of varlist * {
    capture confirm numeric variable `var'
    if _rc {
        replace `var' = "na" if inlist(`var', "n.a.", "n.s.") // this gets rid of the "."
        destring `var', replace ignore("na") // no "." here!!!
    }
}
This preserves both the order and the variable labels of my original string variables (which I need in subsequent code), but it still relies on the dreaded destring command (after seeing how it treats "n.a.", I don't trust it anymore).
The second option uses generate with the real() function, at the cost of more lines of code, since replace cannot turn a string variable into a numeric one:
foreach var of varlist * {
    capture confirm numeric variable `var'
    if _rc {
        replace `var' = "." if inlist(`var', "n.a.", "n.s.")
        local lbl : variable label `var'   // save the label before dropping
        gen `var'r = real(`var')           // numeric copy with suffix "r"
        label var `var'r `"`lbl'"'
        order `var'r, after(`var')         // keep the original position
        drop `var'
    }
}
Both workarounds seem to end up with numeric-only variables in the same order and with the same variable labels as in the original dataset. My second question is: should we use real() instead of destring when possible, on the grounds that it is more "foolproof"? (My third loop is also much faster than the other two.)
Finally, is there a more efficient way to get where I want without writing all this code (especially the last loop)?
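One possibility not tried above, offered only as a sketch: destring has a force option that converts any entry it cannot read as a number to missing, so no ignore() list is needed and the decimal point is left alone. The toy data, the variable names x and id, and the label text are assumptions for illustration. The catch is that force would also silently turn any unexpected nonnumeric entry into missing, so the sketch counts such entries first.

```stata
* Toy data (hypothetical): one string variable to convert, one already numeric
clear
input str8 x id
"123.456" 1
"n.a."    2
"n.s."    3
end
label variable x "Example label"

foreach var of varlist * {
    capture confirm numeric variable `var'
    if _rc {
        * Count nonnumeric entries other than the expected codes and blanks
        count if missing(real(`var')) & !inlist(`var', "n.a.", "n.s.", "")
        * Convert in place; nonnumeric entries become missing
        destring `var', replace force
    }
}
```

Because the conversion happens in place with replace, the variable keeps its name, position, and label, so the generate, label, order, and drop steps are not needed.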
Thanks in advance for suggestions
Gregorio