David Elliott <[email protected]> writes,
> I want to tokenize groups of numbers separated by the "|" character:
> e.g.: 1 2 3 | 4 5 6| 7 8 | 9 so that I have each group in a positional
> macro _1 = 1 2 3, _2 = 4 5 6 ... However, I have found that tokenize
> does not behave as I expected.
> [...]
This is a perfect problem for Mata. We can write a subroutine so that,
in our ado-file, we can code
        program ...
                ...
                mata: mytokenize("input")
                ...
        end
and, if local macro input contains "1 2 3 | 4 5 6 | 7 8 | 9", after
-mata: mytokenize("input")- runs, local macros _1, _2, _3, _4, and _5
will be defined to be
        _1 = "1 2 3"
        _2 = "4 5 6"
        _3 = "7 8"
        _4 = "9"
        _5 = ""
In the above, note that I am passing the NAME of the local macro to
-mytokenize()-. I could just as easily write -mytokenize()- to accept
the contents of the local macro, so that, rather than coding
        mata: mytokenize("input")
I would code
        mata: mytokenize("`input'")
Actually, writing -mytokenize()- to accept input the second way would be
easier, but I'm opting for the first because, if input contains a long,
long string, our ado-file will run a little faster.
Anyway, here's the full solution:
------------------------------------ myfile.ado --- BEGIN ---
*! version ...
program myfile
        ...
        mata: mytokenize("input")
        ...
end

mata:
void mytokenize(string scalar macname)
{
        string scalar   s
        real scalar     i, l

        s = strtrim(st_local(macname))
        i = 1
        while (l = strpos(s, "|")) {
                if (l>1) {
                        st_local(strofreal(i++),
                                strtrim(substr(s, 1, l-1)))
                        s = strtrim(substr(s, l+1, .))
                }
                else    s = strtrim(substr(s, 2, .))
        }
        if (s != "") st_local(strofreal(i++), s)
        st_local(strofreal(i), "")
}
end
------------------------------------ myfile.ado ----- END ---
That is the full solution, and I showed it to make clear mechanically
where everything goes in the final ado-file.  What I actually did to
write -mytokenize()-, however, was create a do-file where I could
easily test it, planning to convert it to the final ado-file later:
------------------------------------- testit.do --- BEGIN ---
clear

mata:
void mytokenize(string scalar macname)
{
        string scalar   s
        real scalar     i, l

        s = strtrim(st_local(macname))
        i = 1
        while (l = strpos(s, "|")) {
                if (l>1) {
                        st_local(strofreal(i++),
                                strtrim(substr(s, 1, l-1)))
                        s = strtrim(substr(s, l+1, .))
                }
                else    s = strtrim(substr(s, 2, .))
        }
        if (s != "") st_local(strofreal(i++), s)
        st_local(strofreal(i), "")
}
end

local test "1 2 3 | 4 5 6 | 7 8"
mata: mytokenize("test")
mac list

local test "1 2 3 | 4 5 6 |"
mata: mytokenize("test")
mac list
------------------------------------- testit.do ----- END ---
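The do-file above relies on reading the -mac list- output by eye. As a sketch of how the same two test cases could check themselves, the lines below use -assert- on the numbered macros. They assume -mytokenize()- has already been defined earlier in the same do-file, exactly as in testit.do, and the expected values follow from tracing the function on each input.

```stata
* assumes -mytokenize()- was defined earlier in this do-file
local test "1 2 3 | 4 5 6 | 7 8"
mata: mytokenize("test")
assert "`1'" == "1 2 3"
assert "`2'" == "4 5 6"
assert "`3'" == "7 8"
assert "`4'" == ""

* trailing "|": macro 3 held "7 8" from the run above and must now be empty
local test "1 2 3 | 4 5 6 |"
mata: mytokenize("test")
assert "`1'" == "1 2 3"
assert "`2'" == "4 5 6"
assert "`3'" == ""
```

If any -assert- fails, the do-file stops with an error at that line, which makes a regression easy to spot when the function is later moved into the ado-file.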
Concerning -mytokenize()-,
   1.  The declarations at the top are all optional.  That is, rather than
       code
               void mytokenize(string scalar macname)
               {
                       string scalar   s
                       real scalar     i, l

                       s = strtrim(st_local(macname))
                       ...
       I could just code,
               void mytokenize(macname)
               {
                       s = strtrim(st_local(macname))
                       ...
       I include the declarations because (a) that's my style (it helps me
       to avoid mistakes), and because (b) they make it a little easier for
       others to understand my code (because I have told the reader how I
       intend to use s, i, and l).
   2.  The guts of the program is the while loop:
               while (l = strpos(s, "|")) {
                       ...
               }
       The use of the single equal sign is tricky.  The -while- statement
       does *NOT* say, "while l is equal to strpos(...)".  If I wanted
       that, I would have coded -while (l==strpos(...))-.
       The -while- statement says, "assign to l the value of strpos(...);
       while l is not equal to zero".
       -strpos(...)- tells me the position of the next "|" in s.
       I save that in l.  -strpos(...)- returns 0 if there is no "|" in
       s.  I continue doing the loop as long as there is another "|".
   3.  Inside the loop, I have separate code for l==1 and l>1.  I assume
       l==1 should never happen, but I wanted to cover all the contingencies.
       If l==1, then the input was something like:
               1 2 3 | 4 5 6 || 7 8
               1 2 3 | 4 5 6 | | 7 8
       A trailing "|", as in
               1 2 3 | 4 5 6 | 7 8 |
       does not produce l==1; it is handled by the -if (s != "")- test
       after the loop.  I treat all three as if the input were
               1 2 3 | 4 5 6 | 7 8
       Perhaps David wants to do something different in those cases.
   4.  Note that when I'm all done (last line of program), I code
               st_local(strofreal(i), "")
       I set the last `#' macro to the empty string just in case it
       already exists.
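As one illustration of "something different" for point 3, here is a sketch of a variant that keeps empty groups as empty macros instead of skipping them. The function name -mytokenize_keep()- and the local macro name ngroups are hypothetical, not part of the solution above. Because an empty macro can no longer mark the end of the list, this sketch stores the number of groups in ngroups instead. The loop is also written with the assignment separated from the test, which is the long-hand equivalent of the single-equal-sign -while- described in point 2.

```stata
mata:
// Hypothetical variant: empty groups become empty macros;
// the group count is stored in local macro ngroups
void mytokenize_keep(string scalar macname)
{
        string scalar   s
        real scalar     i, l

        s = st_local(macname)
        i = 0
        l = strpos(s, "|")              // position of next "|", 0 if none
        while (l != 0) {
                // everything before the "|" is one group (possibly empty)
                st_local(strofreal(++i), strtrim(substr(s, 1, l-1)))
                s = substr(s, l+1, .)
                l = strpos(s, "|")
        }
        st_local(strofreal(++i), strtrim(s))    // group after the last "|"
        st_local("ngroups", strofreal(i))
}
end
```

With input "1 2 3 | 4 5 6 || 7 8", this defines four groups, the third of which is empty. Numbered macros beyond ngroups are not cleared, so the caller should loop from 1 to `ngroups' rather than look for an empty macro.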
I have a postscript: Mata is a great string-processing language.  When you do
not find what you need in Stata, think about writing your own Mata function to
provide exactly what you need.  I find that easier than working around, in the
ado-language, the limitations of whatever is more conveniently at hand.
Mata string functions have a second advantage: they work with long, long
strings.  Strings as long as a macro, or longer.
-- Bill
[email protected]