Statalist



Re: st: regex-syntax error: "regexp: unmatched []"; no possibility to stop the do-file


From   David Elliott <[email protected]>
To   [email protected]
Subject   Re: st: regex-syntax error: "regexp: unmatched []"; no possibility to stop the do-file
Date   Mon, 26 Jan 2009 16:19:43 -0400

The other issue is: do you need a regex at all, or would a normal string
matching expression do the trick? For example:
replace v1=1 if strpos(v1,"[]")>0
A regex call invokes the regular expression engine each time it is
evaluated, which is far more computationally expensive than a simple
match expression.  With a large dataset, the difference can be
non-trivial.
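
A minimal sketch of the two approaches, assuming a string variable v1
and two flag variables (has_lit and has_re are hypothetical names, not
from the original question). As far as I understand it, a bare "[]" in
a regex opens a character class that is never closed, which would
explain the "unmatched []" message; escaping each bracket should avoid
that, while strpos() takes the text literally and needs no escaping.

```stata
clear
input str10 v1
"a[]b"
"abc"
"[]"
"a[b]c"
end

// literal substring search: no special characters to worry about
gen byte has_lit = strpos(v1, "[]") > 0

// regex version: each bracket escaped so it is matched literally
gen byte has_re = regexm(v1, "\[\]")

// both flags should agree on every observation
assert has_lit == has_re
list
```

If the regex is not buying anything beyond a literal match, the
strpos() form is the one with fewer ways to go wrong.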

By way of example:
x---------begin code--------x
// you must -clear- before running
set memory 500m
sysuse auto
keep make
expand 250000
des
set rmsg on
gen byte test1 = (strpos(make,"AMC"))
gen byte test2 = (strmatch(make,"AMC*"))
gen byte test3 = (regexm(make,"AMC.*"))
// and just to show we get the same result
tab2 test*
set rmsg off
x---------end code--------x
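
One caveat, sketched on made-up data: the three tests agree on the auto
data only because "AMC" happens to sit at the start of make. strpos()
returns the position of the match (hence the > 0 below to get a 0/1
flag) and finds the text anywhere, strmatch() with "AMC*" matches a
prefix only, and regexm() matches anywhere unless anchored with ^. The
values of make and the variable names here are hypothetical.

```stata
clear
input str20 make
"AMC Concord"
"Test AMC Car"
"Buick Century"
end

gen byte anywhere  = strpos(make, "AMC") > 0    // substring anywhere
gen byte prefix    = strmatch(make, "AMC*")     // must start with AMC
gen byte re_any    = regexm(make, "AMC")        // regex, unanchored
gen byte re_prefix = regexm(make, "^AMC")       // regex, anchored at start

list

// the unanchored tests agree with each other, as do the prefix tests
assert anywhere == re_any
assert prefix == re_prefix
```

So the fair regex counterpart of strmatch(make,"AMC*") would be the
anchored pattern, not the unanchored one.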

On my system I get the following timings:

. gen byte test1 = (strpos(make,"AMC"))
r; t=6.24 16:12:39

. gen byte test2 = (strmatch(make,"AMC*"))
r; t=5.69 16:12:45

. gen byte test3 = (regexm(make,"AMC.*"))
r; t=22.89 16:13:07

giving a roughly fourfold difference in processing time between
strmatch and regexm (22.89 versus 5.69 seconds).
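
For anyone who wants to repeat the comparison, here is a smaller sketch
using -timer- instead of -set rmsg-. The expansion factor of 10000 and
the variable names t1-t3 are my own choices for illustration, and the
timings will of course differ from machine to machine.

```stata
sysuse auto, clear
keep make
expand 10000

timer clear

timer on 1
gen byte t1 = strpos(make, "AMC") > 0
timer off 1

timer on 2
gen byte t2 = strmatch(make, "AMC*")
timer off 2

timer on 3
gen byte t3 = regexm(make, "AMC")
timer off 3

// elapsed seconds for timers 1-3
timer list

// confirm that all three approaches flag the same observations
assert t1 == t2 & t2 == t3
```

-timer list- also leaves the elapsed times in r(t1), r(t2) and r(t3),
which is handy if you want to compute the ratio directly.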

In the example you gave, there appears to be no need for the
sophistication of a regex, or for the debugging problems often
associated with one.

DCElliott