Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: destring ignores more than what specified in ignore()
From: Nick Cox <[email protected]>
To: [email protected]
Date: Mon, 21 Nov 2011 08:21:58 +0000
-destring- ignores characters, not substrings. The problem is at most
that this is not clear to you when you read the help. -destring- did
what you told it to do, which was, among other things, to remove ".".
You need to fix your "n.a." and "n.s." first, e.g. within a loop:
replace `var' = subinstr(`var', "n.a.", ".", .)
replace `var' = subinstr(`var', "n.s.", ".", .)
or as you did it.
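
A minimal sketch of how that fix might sit inside the loop from the original post, assuming every string variable in the dataset holds only numbers, "n.a." or "n.s.". The check with -confirm string variable- is my choice here, not something taken from the original code.

```stata
* Sketch only: assumes all string variables should become numeric
foreach var of varlist * {
    capture confirm string variable `var'
    if !_rc {
        * pass the variable itself to subinstr(), not its name in quotes
        replace `var' = subinstr(`var', "n.a.", ".", .)
        replace `var' = subinstr(`var', "n.s.", ".", .)
        * no ignore() needed once the text codes are gone
        destring `var', replace
    }
}
```

Because -destring, replace- works in place, variable order and variable labels are kept.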
-destring- is just a wrapper for -real()-, so -real()- is not really
an alternative except in so far as -destring- is not understood. Your
code is shorter and more efficient than -destring- as it can be
tailored to your problem. In fact, your last code segment can be
shortened, because -real("n.a.")-, for example, already returns
numeric missing.
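
As a sketch of that shortening, here is the last loop from the original post with the -replace- line dropped, on the assumption that anything -real()- cannot read as a number should end up missing. Storing the result as double is my assumption, made so that long numbers are not rounded to float precision.

```stata
* Sketch only: real() returns missing for "n.a." and "n.s.",
* so there is no need to recode them first
foreach var of varlist * {
    capture confirm numeric variable `var'
    if _rc {
        local lbl : variable label `var'
        generate double `var'r = real(`var')
        label variable `var'r `"`lbl'"'
        order `var'r, after(`var')
        drop `var'
    }
}
```

Note that any other non-numeric entry would also become missing without warning, which is the price of skipping -destring-.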
Nick
On Mon, Nov 21, 2011 at 1:51 AM, Impavido, Gregorio <[email protected]> wrote:
> I looked at the many FAQs on destring but could not find an answer to my problem. Hence the post; hopefully it is not a duplicate.
>
> I have a dataset with an unknown (ex ante) number of string variables containing entries of the following three types: (i) "###.###"; (ii) "n.a."; and (iii) "n.s.".
>
> These variables should be numeric and I would like to destring them by coding:
>
> foreach var of varlist * {
>     capture confirm numeric variable `var'
>     if _rc {
>         destring `var', replace ignore("n.a." "n.s.")
>     }
> }
>
> This does not work as destring, for some inexplicable (to me) reason, treats "." as a separate non numeric character from "n.a." or "n.s.".
>
> Therefore, it drops the "." in entries like "###.###", changing them into double numeric ######. The same happens if the option is specified as ignore("n.a" "n.s") (i.e., without the final ".").
>
>
> First question (of two): Why is destring ignoring more than what is specified in the option ignore()?
>
> I found two ways around this odd behaviour of destring.
>
> The first option uses an extra line of code and it is:
>
> foreach var of varlist * {
>     capture confirm numeric variable `var'
>     if _rc {
>         replace `var' = "na" if inlist(`var', "n.a.", "n.s.") // this gets rid of the "."
>         destring `var', replace ignore("na") // no "." here!!!
>     }
> }
>
> This preserves both the order and the variable labels of my original string variables (which I need in subsequent code) but it uses again the dreaded destring command (after seeing how it treats "n.a.", I don't "trust" it anymore).
>
> The second option uses generate with the real() function but also more lines of code as real() does not work with replace.
>
> foreach var of varlist * {
>     capture confirm numeric variable `var'
>     if _rc {
>         replace `var' = "." if inlist(`var', "n.a.", "n.s.")
>         local lbl : variable label `var'
>         gen `var'r = real(`var')
>         label var `var'r `"`lbl'"'
>         order `var'r, after(`var')
>         drop `var'
>     }
> }
>
> Both loops seem to end up with numeric only variables in the same order and with the same variable labels as the original dataset. My second question is: should we use real() instead of destring when possible, which is more "fool proof" (my third loop is much faster than the other two)?
>
> Finally, is there a more efficient way to get where I want without writing all this code (especially the last loop)?
>