Re: st: regexr and missing values
Howie:
I don't have any special insight into the problem you document, but I
can suggest two potential work-arounds to try. (Note: these are
untested!)
1) Use wildcards to match additional characters and thus (hopefully)
avoid blanks, i.e.:
gen test = lfpatfin
replace test = regexs(1)+"E" if regexm(lfpatfin,"^([A-Z]*)U$")
2) Wrap your -regexr()- command in a -cond()- statement, i.e.:
gen test = cond(missing(lfpatfin), "", regexr(lfpatfin,"U$", "E"))
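Since both work-arounds are untested, one way to check them before touching the real data is a small toy dataset. The sketch below is only that: the four values are made up for illustration (the empty string stands in for a "blank"), and test1 and test2 are hypothetical variable names.

```stata
* Toy data: made-up values, not the poster's dataset.
clear
input str5 lfpatfin
"FRU"
"U"
""
"FE"
end

* Work-around 1: capture everything before a final "U" and append "E".
generate test1 = lfpatfin
replace test1 = regexs(1) + "E" if regexm(lfpatfin, "^([A-Z]*)U$")

* Work-around 2: only call regexr() when lfpatfin is not missing.
generate test2 = cond(missing(lfpatfin), "", regexr(lfpatfin, "U$", "E"))

* Both approaches should agree, and blanks should stay blank.
assert test1 == test2
assert test1 == "" if missing(lfpatfin)
list
```

If either -assert- fails on the real data, listing the offending observations should show which work-around still lets the blanks through.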
Two other things to investigate, in light of your inability to
reproduce the problem with other data:
1. It may be the case that what appears as a "blank" is really some
"funny" character (or characters) that appears blank when listed to
the screen, but is not seen as blank by the -regex- functions.
Perhaps -tab lfpatfin, missing- might turn up multiple "blanks".
Otherwise, you could write the data series in question to a file,
then examine that with -hexdump-. (Or examine it with a high-end
text editor -- though you might not have one at your disposal.) I
think this explanation is unlikely, however, as it would have to fool
the -missing()- function as well as -list-, but not -regex?-.
2. I notice that both you and Yun Liu seem to run into this problem
at observation numbers above 20000. Is it possible that some
internal limit is inadvertently triggering this problem? One
potential way to test that theory is to -sort- your data so that
"problem blanks" appear as lower observations, while "non-problem
blanks" now end up above 20000, and repeat your command. (Obviously,
use a copy of your dataset!) If the problem is invariant to sort
order, you can likely eliminate the observation number as a
contributor to this problem. (Unless your generic dataset for
testing was sufficiently large, you might not have triggered this
source of error should it exist.)
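Both checks could be sketched roughly as follows. This is untested and makes several assumptions: the observation numbers 20074 and 24067 come from your -list- output, while orig_obs, grp, test2, and the file name lfpatfin_check.txt are hypothetical names introduced here. Whether the "non-problem blanks" actually land above observation 20000 after sorting depends on how many observations and blanks the dataset has, so check that before drawing conclusions.

```stata
* Check 1: is a "blank" really empty? A true blank has length 0.
display length(lfpatfin[20074])
display length(lfpatfin[24067])

* Optionally write one suspect value to a file and inspect its bytes.
outfile lfpatfin using lfpatfin_check.txt in 20074, noquote replace
hexdump lfpatfin_check.txt, analyze

* Check 2: does the problem follow the observation number?
* preserve/restore leaves the data in memory unchanged afterwards.
preserve
generate long orig_obs = _n
generate byte grp = 1
replace grp = 2 if missing(lfpatfin)
replace grp = 0 if inlist(orig_obs, 20074, 24067)

* Problem blanks move to the top; other blanks move to the bottom.
sort grp orig_obs
generate test2 = regexr(lfpatfin, "U$", "E")
list orig_obs lfpatfin test2 if missing(lfpatfin) & !missing(test2)
restore
```

If the same two original observations are listed again even though they now sit at the top, observation number is probably not the cause. If different blanks are affected after sorting, position in the dataset looks more suspicious.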
HTH,
Mike
On Oct 17, 2008, at 4:34 PM, Howard Lempel wrote:
Hello all,
I'm using Stata 10 (last updated 10/10/07) and am having a bit of
trouble with the -regexr- function. I can't tell if I've stumbled
on a bug or if I'm doing something wrong.
I am trying to use -regexr- to transform a string variable called
lfpatfin. I'd like to take every observation where the last letter
in lfpatfin is "U" and substitute an "E" for the "U". The code
appears to work except that two observations where lfpatfin was
missing have been replaced with an "E". This appears to be similar
to a problem Yun Liu had with -regexm- on July 16 in this thread:
http://www.stata.com/statalist/archive/2008-07/msg00596.html, but I
can't tell if Yun's issue was ever resolved. I have been unable to
reproduce the problem using the auto dataset or a generic
dataset I created. My code and some output follow. I did nothing
to test in between generating it and the -list- command.
gen test = regexr(lfpatfin,"U$", "E")
list lfpatfin test in 1/1000 if lfpatfin != test
+-----------------+
| lfpatfin test |
|-----------------|
70. | FRU FRE |
105. | RFU RFE |
148. | U E |
161. | U E |
554. | FU FE |
|-----------------|
861. | FU FE |
914. | U E |
+-----------------+
list lfpatfin test if missing(lfpatfin) & !missing(test)
+-----------------+
| lfpatfin test |
|-----------------|
20074. | E |
24067. | E |
+-----------------+
. list lfpatfin test in 16000/16200 if missing(lfpatfin)
+-----------------+
| lfpatfin test |
|-----------------|
16156. | |
16162. | |
16166. | |
16170. | |
16175. | |
|-----------------|
16176. | |
16179. | |
16180. | |
16183. | |
16186. | |
|-----------------|
16197. | |
+-----------------+
For what it's worth, I make similar changes to lfpatfin
(substituting "B"s for final "D"s) later in my code and have the
same problem.
I'd appreciate it a lot if anyone has any explanation. I also do
not know how to see what Stata has updated since my last update,
but I would be grateful if anyone knows where to go for that - I'd
like to check whether the -regex- functions have been changed.
Unfortunately, I don't have the admin rights to update my version
of Stata.
Thanks for your consideration.
Howie
Howie Lempel
Research Assistant
The Brookings Institution | Economic Studies
1775 Massachusetts Ave NW | Washington DC 20036
[email protected] | p: (202) 238-3576
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/