Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: destring ignores more than what specified in ignore()
From: "Impavido, Gregorio" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: destring ignores more than what specified in ignore()
Date: Mon, 21 Nov 2011 10:27:39 -0500
Thank you, Nick. It indeed wasn't clear to me that destring works with characters and not substrings (I should have looked at the ado file first...). It is now clear that destring creates a local macro for each individual character specified in ignore() (lines 51-59 of destring.ado) and replaces each of them with "" in lines 229-230 before applying real(). This means (if I understood correctly) that your last suggestion:
destring <varlist>, replace ignore("nas")
does not work: starting from "n.a." or "n.s.", I am still left with ".." after the substitution. However, after adding
| `temp'==".."
in line 238 of destring.ado, your suggestion works like a charm. This is (I believe) equivalent to using the force option, as you also suggest.
All your other suggestions work perfectly. So thank you again.
Gregorio
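
A minimal sketch of the behaviour described above, for readers who want to reproduce it. It only imitates character-by-character removal with a macro; it is not the code in destring.ado. The toy variable x and the new variable x_num are hypothetical names chosen for illustration.

```stata
* Imitate what ignore("nas") does to the single value "n.a.":
* each listed character is removed on its own, one at a time.
local s "n.a."
foreach c in n a s {
    local s : subinstr local s "`c'" "", all
}
display "`s'"           // the two periods are all that is left
display real("`s'")     // real() cannot read this, so the result is missing

* Hypothetical toy data: one numeric-looking entry and the two codes.
clear
input str8 x
"12.5"
"n.a."
"n.s."
end

* The force option sends anything real() cannot read to missing,
* while leaving the decimal point in "12.5" alone.
destring x, generate(x_num) force
list
```

Using generate() rather than replace keeps the original strings available for comparison while checking the result.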
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Monday, November 21, 2011 5:36 AM
To: '[email protected]'
Subject: RE: st: destring ignores more than what specified in ignore()
On the information here
destring <varlist>, replace ignore("nas")
or
destring <varlist>, replace force
should work. Note that you don't need to set up your own loop or a prior filter of numeric variables; -destring- will do both for you.
Nick
[email protected]
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 21 November 2011 08:22
To: [email protected]
Subject: Re: st: destring ignores more than what specified in ignore()
-destring- ignores characters, not substrings. The problem is at most
that this is not clear to you when you read the help. -destring- did
what you told it to do, which was, among other things, to remove ".".
You need to fix your "n.a." and "n.s." first, e.g. within a loop
replace `var' = subinstr(`var', "n.a.", ".", .)
replace `var' = subinstr(`var', "n.s.", ".", .)
or as you did it.
-destring- is just a wrapper for -real()-, so -real()- is not really
an alternative except in so far as -destring- is not understood. Your
code is shorter and more efficient than -destring- as it can be
tailored to your problem. In fact your last code segment can be
shortened as -real("n.a.")- for example results in numeric missing.
Nick
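
One possible reading of the shortening suggested above, sketched on hypothetical toy data (the variables price and qty and the label are made up). Since real() already returns missing for "n.a." and "n.s.", the preliminary replace step can be dropped. The final rename, which restores the original variable name, is an addition not present in the original loop.

```stata
* Hypothetical toy data with the three kinds of entries.
clear
input str8 price str8 qty
"1.5"  "n.a."
"n.s." "20"
end
label variable price "Price"

foreach var of varlist * {
    capture confirm string variable `var'
    if !_rc {
        * Keep the label, convert with real(), and put the new
        * variable where the old one was.
        local lbl : variable label `var'
        generate double `var'r = real(`var')
        label variable `var'r `"`lbl'"'
        order `var'r, after(`var')
        drop `var'
        rename `var'r `var'    // added here: restore the original name
    }
}
describe
list
```

Note that real() silently maps every unreadable string to missing, so this is only safe when "n.a." and "n.s." are known to be the only nonnumeric entries.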
On Mon, Nov 21, 2011 at 1:51 AM, Impavido, Gregorio <[email protected]> wrote:
> I looked at the many FAQ on destring but could not find an answer for my problem. Hence, the post and hopefully, it is not a duplicate.
>
> I have a dataset with an unknown (ex ante) number of string variables containing entries of the following three types: (i) "###.###"; (ii) "n.a."; and (iii) "n.s.".
>
> These variables should be numeric and I would like to destring them by coding:
>
> foreach var of varlist * {
> capture confirm numeric variable `var'
> if _rc {
> destring `var', replace ignore("n.a." "n.s.")
> }
> }
>
> This does not work because destring, for some inexplicable (to me) reason, treats "." as a separate nonnumeric character rather than as part of "n.a." or "n.s.".
>
> Therefore, it drops the "." in entries like "###.###", turning them into the double-precision numbers ######. The same happens if the option is specified as ignore("n.a" "n.s") (i.e., without the final ".").
>
>
> First question (of two): Why is destring ignoring more things than what is specified in the option ignore()?
>
> I found two ways around this odd behaviour of destring.
>
> The first option uses an extra line of code and it is:
>
> foreach var of varlist * {
> capture confirm numeric variable `var'
> if _rc {
> replace `var' = "na" if inlist(`var', "n.a.", "n.s.") // this gets rid of the "."
> destring `var', replace ignore("na") // no "." here!!!
> }
> }
>
> This preserves both the order and the variable labels of my original string variables (which I need in subsequent code), but it again uses the dreaded destring command (after seeing how it treats "n.a.", I don't "trust" it anymore).
>
> The second option uses generate with the real() function but also more lines of code as real() does not work with replace.
>
> foreach var of varlist * {
> capture confirm numeric variable `var'
> if _rc {
> replace `var' = "." if inlist(`var', "n.a.", "n.s.")
> local lbl : variable label `var'
> gen `var'r = real(`var')
> label var `var'r `"`lbl'"'
> order `var'r, after(`var')
> drop `var'
> }
> }
>
> Both loops seem to end up with numeric-only variables in the same order and with the same variable labels as the original dataset. My second question is: should we use real() instead of destring when possible, as the more "foolproof" choice (my third loop is much faster than the other two)?
>
> Finally, is there a more efficient way to get where I want without writing all this code (especially the last loop)?
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*