The other issue is whether you need a regex at all. Would an ordinary
string-matching expression do the trick?
replace v1=1 if strpos(v1,"[]")>0
A regex call invokes the regex engine each time it is evaluated,
which is far more computationally expensive than a simple string
match. With a large dataset, the difference can be substantial.
By way of example:
x---------begin code--------x
// start with no data in memory
clear
// -set memory- is only needed in Stata 11 or earlier; later versions ignore it
set memory 500m
sysuse auto
keep make
expand 250000
des
set rmsg on
gen byte test1 = (strpos(make,"AMC"))
gen byte test2 = (strmatch(make,"AMC*"))
gen byte test3 = (regexm(make,"AMC.*"))
// and just to show we get the same result
tab2 test*
set rmsg off
x---------end code--------x
On my system I get the following timings:
. gen byte test1 = (strpos(make,"AMC"))
r; t=6.24 16:12:39
. gen byte test2 = (strmatch(make,"AMC*"))
r; t=5.69 16:12:45
. gen byte test3 = (regexm(make,"AMC.*"))
r; t=22.89 16:13:07
giving roughly a fourfold difference in processing time between
strmatch and regexm (5.69 vs. 22.89 seconds).
In the example you gave, there appears to be no need for the
sophistication of a regex, or for the debugging problems that often
come with one.
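To illustrate the debugging point, here is a minimal sketch using the
"[]" search string from above. The sample values and the names
has_brackets and has_brackets_re are made up for illustration. With
strpos the search string is taken literally, whereas in a regex the
square brackets are operators and, as far as I know, each must be
escaped with a backslash to be matched literally.
x---------begin code--------x
// hypothetical data: v1 is a string variable, values invented
clear
input str12 v1
"abc[]"
"plain"
"[]x"
end
// plain substring test: strpos() returns 0 when "[]" is absent
gen byte has_brackets = strpos(v1,"[]") > 0
// regex version: both brackets escaped so they are read literally
gen byte has_brackets_re = regexm(v1,"\[\]")
list
x---------end code--------x
The two indicators should agree. The strpos version leaves nothing to
escape and so nothing to get wrong.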
DCElliott