In many regular expression engines, the symbol \b denotes a word
boundary. For instance, on Unix, the following use of '\b' selects
only those lines in a file that contain the letter 's' standing
alone as a word, not next to any other letter.
UNIX> cat z
dogs and cats
sss
s, he said
george's crown
UNIX> egrep 's' z
dogs and cats
sss
s, he said
george's crown
UNIX> egrep '\bs\b' z
s, he said
george's crown
UNIX>
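For reference, the same selection can be written without '\b' by spelling out the boundary with character classes. This is only a sketch: it assumes a POSIX-style grep that accepts -E (the equivalent of egrep) and treats letters, digits, and underscore as word characters, which is how GNU grep defines '\b'. It recreates the sample file z from the transcript above.

```shell
# Recreate the sample file z shown in the transcript (assumption: the
# current directory is writable and an existing z may be overwritten).
printf '%s\n' 'dogs and cats' 'sss' 's, he said' "george's crown" > z

# Spell out the word boundary by hand: the 's' must be preceded by the
# start of the line or a non-word character, and followed by a non-word
# character or the end of the line.
out=$(grep -E '(^|[^[:alnum:]_])s([^[:alnum:]_]|$)' z)

# Print the matching lines.
printf '%s\n' "$out"
```

On the sample file this selects the same two lines as the '\b' version. Because the pattern uses only alternation, anchors, and bracket expressions, it may carry over to engines that lack '\b', though that is untested here.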
Is there a way to do this in Stata? The following attempts with regexm()
did not work: the two '\b' patterns at the end list no observations.
. list
     +-----------------------+
     | var1             var2 |
     |-----------------------|
  1. |    1    dogs and cats |
  2. |    2              sss |
  3. |    3       s, he said |
  4. |    4   george's crown |
     +-----------------------+
. list if regexm(var2, "s")
     +-----------------------+
     | var1             var2 |
     |-----------------------|
  1. |    1    dogs and cats |
  2. |    2              sss |
  3. |    3       s, he said |
  4. |    4   george's crown |
     +-----------------------+
. list if regexm(var2, "\bs\b")
. list if regexm(var2, "\\bs\\b")
Thanks for any info.
Jake Wegelin