Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: String function headache.
From
Scott Talkington <[email protected]>
To
[email protected]
Subject
Re: st: String function headache.
Date
Mon, 25 Apr 2011 09:08:56 -0400
That is very helpful, thanks. I wasn't sure whether the "#" character
was an operator of some kind, and that was the reason I was getting odd
results. Apparently it's not, in this case, but it often is. The other
thing that always confuses me about these string functions combined with
foreach is that I'm never sure where to place the quotes, especially if
operators are involved.
--Scott
On 4/25/2011 5:47 AM, Nick Cox wrote:
To expand on this, with problem-solving hints.
Learning software from definitions is like learning mathematics from
definitions. If you know the concept already, or are super-smart, you
can see immediately what is implied. The rest of us need examples.
In my class learning mathematics in secondary [high] school, there was
one guy who always seemed to understand each new mathematical idea
immediately. (He became a mountaineer, but that is a different story:
http://en.wikipedia.org/wiki/Alan_Rouse ). Almost all the rest of us
needed examples. (In fact I now guess that he sometimes played small
psychological games with us, as usually he had read ahead on his own.)
I don't think I've ever used -strmatch()- before answering this
question. I've always used -strpos()- for finding literal matches or
turned to -regex*()-. That just means what it says, but I too had to find
out quite how -strmatch()- works.
In my experience, as in Scott's example, the real problem involves a
dataset I care about with variables. But when I don't understand, I
fire up -display- and play with very simple examples.
I found this.
In looking for a literal character, a pattern matches itself,
. di strmatch("2", "2")
1
but matching means matching, not inclusion:
. di strmatch("42", "2")
0
You need the pattern to be big enough
. di strmatch("42", "?2")
1
. di strmatch("42", "*2")
1
. di strmatch("42", "*2*")
1
A silly analogy: will a shirt fit you? If it's too small, the answer
is just a No. If it fits exactly, or it's bigger than you are, the
answer is a Yes, and you then have to decide whether too big is a
problem or not. (No for formal wear, possibly OK if you want something
really loose.) Similarly with -strmatch()- the pattern can be bigger
than you need, but the answer will still be a Yes.
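To tie this back to the "#" case in the thread, here is a small sketch along the same lines, meant to be run from a do-file. The strings are made up for illustration and are not from anyone's data. It relies only on the two wildcards that -strmatch()- documents: * for zero or more characters and ? for exactly one character.

```stata
* Made-up example strings; only the pattern changes from line to line.
display strmatch("#", "#")           // 1: the string is exactly the pattern
display strmatch("APT #4", "#")      // 0: the pattern covers one character, the string has six
display strmatch("APT #4", "*#*")    // 1: anything, then #, then anything
display strmatch("APT #4", "*#?")    // 1: # followed by exactly one character
display strmatch("APT #42", "*#?")   // 0: two characters follow the #
display strmatch("7", "?")           // 1: ? on its own matches any single character
```

The last line may explain why a bare "?" in the pattern appeared to pick up every one-character string: ? stands for exactly one character of any kind, not for "zero or one".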
On Mon, Apr 25, 2011 at 9:28 AM, Nick Cox <[email protected]> wrote:
If you want to check for occurrence, just use -strpos()- instead. I
often see people on this list struggling with the regex functions or
-strmatch()- when a simpler function will do the job. I have offered a
talk on functions for the London users' meeting and this point is
already one of the slides.
* flag observations in which the character occurs anywhere in the string
foreach y in # {
    forvalues x = 1/6 {
        replace mynumber`x' = strpos(mystring`x', "`y'") > 0
    }
}
Otherwise, my understanding is this: a pattern that is just a literal
character will be matched only by strings that are exactly that
character; for almost all matching problems, you must specify * and/or
?. You seem to be expecting -strmatch()- to behave more like
-regexm()-, but they have different jobs.
But as said -strpos()- is easier to figure out.
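A quick -display- sketch of the difference, using made-up strings and intended for a do-file: -strpos()- returns the position of the first occurrence, or 0 if there is none, so comparing the result with 0 gives a 0/1 flag directly.

```stata
* Made-up example strings, for illustration only.
display strpos("APT #4", "#")        // 5: position of the first "#"
display strpos("APT 4", "#")         // 0: not found
display strpos("APT #4", "#") > 0    // 1: usable directly as a flag
display strmatch("APT #4", "#")      // 0: strmatch() needs the whole string to fit the pattern
```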
Nick
On Mon, Apr 25, 2011 at 4:45 AM, Scott Talkington <[email protected]> wrote:
I just can't seem to make this work. What I want to do is search for any
occurrence of the "#" character in a string variable and set a flag for that
observation. I'm searching 6 different strings labeled something like
mystring1, mystring2, etc., and the flags are mynumber1, mynumber2, etc.
So my do file:
forvalues x = 1/6 {
    foreach y in # {
        replace mynumber`x' = strmatch(mystring`x', "`y'")
    }
}
I just listed one character in the y list above, but in reality I'm not
having a problem with normal strings like "APT" but with wildcards and with
the number sign character itself.
I assumed that placing a "?" character in the search string (s2) would
match zero or one characters + the "#" but it seems to be matching all
strings with one character that are either a number or a letter. Huh?
If I include the wildcard (either the asterisk or the question mark)
*anywhere* (either in the "foreach" part of the do file or in the "replace"
command) it just doesn't work the way I expect it to. There's a difference
between what I get depending on how many quotes I use and where as well,
but I'm just not getting anything that does what I want it to. I've even
tried using the backslash character to indicate that I don't want the "#" to
be read as an operator, but I'm not even sure where to put the backslash or
how to arrange the quotation marks. It's driving me nuts. There's some
rule here that I'm just not getting.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/