Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: problem with regexm leading to "regexp: unterminated ()" error for all observations
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: RE: problem with regexm leading to "regexp: unterminated ()" error for all observations
Date: Fri, 3 Jun 2011 10:06:35 -0400
Jamie
I get no error when I cut and paste from the Wikipedia page, but I get no matches either. I wouldn't expect matches, because Stata's regular expression parser doesn't recognize the {} repeat function. (I do get matches with BBEdit's regular expression parser.) So you'll have to implement this outside of Stata.
Steve
[email protected]
On Jun 3, 2011, at 9:35 AM, Nick Cox wrote:
I guess there are small problems at least on various levels here.
First, the regular expression may well be long for Stata; Mata doesn't seem to have the same limits.
Second, I don't think the syntax {2} is supported by Stata.
I'd see if you can make progress by breaking it down into steps. Declare postcodes invalid and then change your mind each time they satisfy one of the possible patterns.
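A minimal sketch of that step-by-step approach, using the sample data from the original post. The two outward-code patterns are simplified assumptions about the general shape of a postcode (one or two letters, then one or two digits, or a digit and a letter); they do not reproduce the area-letter lists in the Wikipedia expression. The local macro name inward is hypothetical. Because {2} does not appear to be supported, the final character class is simply written out twice.

```stata
* Sketch only: simplified shape checks, not the full Wikipedia pattern
clear
input str15 postcode
"E1 4NS"
"EI 4NS"
"SW8 2XR"
"SW8 ZXR"
end

* Inward part: one digit, then two letters from the permitted set.
* The class is repeated by hand instead of using {2}.
local inward "[0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]"

* Step 1: declare every postcode invalid
generate byte postcodevalid = 0

* Step 2: change your mind whenever one of the patterns is satisfied
replace postcodevalid = 1 if postcode == "GIR 0AA"
replace postcodevalid = 1 if regexm(postcode, "^[A-Z][A-Z]?[0-9][0-9]? `inward'$")
replace postcodevalid = 1 if regexm(postcode, "^[A-Z][A-Z]?[0-9][A-Z] `inward'$")

list postcode postcodevalid
```

With these simplified patterns, E1 4NS and SW8 2XR should end up flagged as valid and the other two should not. Each short expression stays well below any length limit, and further -replace- lines can be added for stricter rules such as the permitted area letters.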
My own postcode is DH1 2NJ. Just a coincidence, but I like it.
Nick
[email protected]
Jamie Fagg
I've a problem with the function -regexm-. I get the following message:
regexp: unterminated ()
Frederico Belotti raised this in 2009
(http://www.stata.com/statalist/archive/2009-04/msg00573.html) and
Martin Weiss suggested contacting Tech support
(http://www.stata.com/statalist/archive/2009-04/msg00575.html), but as
far as I can see there is no other comment referring to the error.
My aim: to find out which of a list of 22,907 postcodes conform to the
UK standard syntax.
I've never used regular expressions before, and I started trying to
build the regular expression myself yesterday and ran a few options
with some (limited) success before a colleague pointed me to a
pre-written regular expression on Wikipedia
(http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom).
As this seems highly complex, has been done, and I really only want to
do this once, it would be very helpful to be able to simply use it
within Stata.
I have run the regular expression through a JavaScript regular
expression checker (http://regexpal.com/) and it seemed to work
correctly, picking out the valid versions (E1 4NS, SW8 2XR) of the
postcodes in the example below.
This is an example of the code I used plus sample data if users want to
see if they can reproduce the error.
I would very much appreciate any feedback about this,
Best wishes,
Jamie
******start of example*********
input str15 postcode
"E1 4NS"
"EI 4NS"
"SW8 2XR"
"SW8 ZXR"
end
#delimit ;
//regular expression to define whether postcode is syntactically correct
ge postcodevalid = 1 if regexm(postcode,"(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]
?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]
|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]
|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]
|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9]
[0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})")==1;
*****end of example*******
******My Stata specs********
Stata/SE 11.1 for Windows (32-bit)
Stata executable
folder: C:\Program Files\Stata11\
name of file: StataSE.exe
currently installed: 04 Nov 2010
Ado-file updates
folder: C:\Program Files\Stata11\ado\updates\
names of files: (various)
currently installed: 04 Jan 2011
Utilities updates
folder: C:\Program Files\Stata11\utilities
names of files: (various)
currently installed: 01 Sep 2010
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/