Statalist


Re: st: AW: regular expression matching


From   [email protected]
To   [email protected]
Subject   Re: st: AW: regular expression matching
Date   Mon, 11 Jan 2010 14:33:13 -0500

Just to finish this off, perhaps. Joe was looking for a regular
expression to match everything preceding the first comma. The
following works in both BBEdit and Stata; the part to be extracted
is the subexpression inside the parentheses.

"^([^\,]+)\,.+$"

If the last ".+$" is omitted, the expression will work in Stata but
not in BBEdit.

The following expression works in BBEdit but not in Stata:
"^(.+?)\,.+$"

Stata's regular-expression engine apparently does not support the
non-greedy (lazy) matching that "?" requests when it follows a
quantifier such as "+".
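
For anyone who wants to try this on Joe's example data, here is a
minimal sketch. It drops the backslashes before the commas (a comma
needs no escaping) and omits the trailing ".+$", which the note above
says Stata does not need. The last two lines are an assumption on my
part: they rely on -ustrregexm()- and -ustrregexs()-, which exist only
in Stata 14 or later and, as far as I know, accept the non-greedy "?".

```stata
clear
input str60 address
"4905 Lakeway Drive, College Station, Texas 77845 USA"
"673 Jasmine Street, Los Angeles, CA 90024"
"2376 First street, San Diego, CA 90126"
"6 West Central St, Tempe AZ 80068"
"1234 Main St. Cambridge, MA 01238-1234"
end

* [^,]+ cannot cross a comma, so group 1 ends just before the first comma
gen str60 first = regexs(1) if regexm(address, "^([^,]+),")

* Assumption: Stata 14+ only; the Unicode regex functions allow lazy .+?
gen str60 first_lazy = ustrregexs(1) if ustrregexm(address, "^(.+?),")

list address first first_lazy, noobs
```

Both new variables should hold the text before the first comma, that
is, the First column Joe asked for. The negated character class is the
safer choice when the code has to run on older versions of Stata.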

-Steve

On Mon, Jan 11, 2010 at 10:27 AM,  <[email protected]> wrote:
> "regexm(address,"^([0-9a-zA-Z\.\-\' ]+)\,")"
> does.
>
> Steve
>>>> <>
>>>>
>>>> If you do insist on using -string- functions (see [D], p. 224):
>>>>
>>>>
>>>> *************
>>>> clear
>>>> input str60 address
>>>> "4905 Lakeway Drive, College Station, Texas 77845 USA"
>>>> "673 Jasmine Street, Los Angeles, CA 90024"
>>>> "2376 First street, San Diego, CA 90126"
>>>> "6 West Central St, Tempe AZ 80068"
>>>> "1234 Main St. Cambridge, MA 01238-1234"
>>>> end
>>>>
>>>> compress
>>>>
>>>> gen str25 first=substr(address, 1, strpos(address, ",")-1)
>>>> l address first, noo
>>>> *************
>>>>
>>>>
>>>>
>>>> HTH
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On behalf of joe j
>>>> Sent: Monday, 11 January 2010 14:23
>>>> To: [email protected]
>>>> Subject: Re: st: AW: regular expression matching
>>>>
>>>> fantastic! thanks much Martin.
>>>>
>>>> On Mon, Jan 11, 2010 at 2:07 PM, Martin Weiss <[email protected]> wrote:
>>>>>
>>>>> <>
>>>>>
>>>>>
>>>>>
>>>>> *************
>>>>> clear
>>>>> input str60 address
>>>>> "4905 Lakeway Drive, College Station, Texas 77845 USA"
>>>>> "673 Jasmine Street, Los Angeles, CA 90024"
>>>>> "2376 First street, San Diego, CA 90126"
>>>>> "6 West Central St, Tempe AZ 80068"
>>>>> "1234 Main St. Cambridge, MA 01238-1234"
>>>>> end
>>>>>
>>>>> split address, parse(,)
>>>>> ren address1 first
>>>>>
>>>>> l address first, noo
>>>>> *************
>>>>>
>>>>>
>>>>>
>>>>> HTH
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On behalf of joe j
>>>>> Sent: Monday, 11 January 2010 14:03
>>>>> To: [email protected]
>>>>> Subject: st: regular expression matching
>>>>>
>>>>> From a string address variable I want to extract the portion of the
>>>>> text preceding the 'first' comma.
>>>>>
>>>>> Let me illustrate this with the following example:
>>>>>
>>>>> clear
>>>>> input str60 address
>>>>> "4905 Lakeway Drive, College Station, Texas 77845 USA"
>>>>> "673 Jasmine Street, Los Angeles, CA 90024"
>>>>> "2376 First street, San Diego, CA 90126"
>>>>> "6 West Central St, Tempe AZ 80068"
>>>>> "1234 Main St. Cambridge, MA 01238-1234"
>>>>> end
>>>>>
>>>>> From the address column, I want to create a column named First:
>>>>>
>>>>> 4905 Lakeway Drive
>>>>> 673 Jasmine Street
>>>>> 2376 First street
>>>>> 6 West Central St
>>>>> 1234 Main St. Cambridge
>>>>>
>>>>> I tried the following:
>>>>> gen first = regexs(1) if (regexm(address, "(.*)[,]"))
>>>>>
>>>>> This however extracts everything in address preceding the last comma,
>>>>> not the first comma.
>>>>>
>>>>> Any pointers would be appreciated.
>>>>> JJ
>>>>> *
>>>>> *   For searches and help try:
>>>>> *   http://www.stata.com/help.cgi?search
>>>>> *   http://www.stata.com/support/statalist/faq
>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> --
>> Steven Samuels
>> [email protected]
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> 845-246-0774
>>






© Copyright 1996–2025 StataCorp LLC