Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: RE: regular expressions has too many literals
From
Kieran McCaul <[email protected]>
To
"[email protected]" <[email protected]>
Subject
RE: st: RE: regular expressions has too many literals
Date
Tue, 26 Feb 2013 12:41:12 +0800
...
OK, how about this:
In your existing dataset:
use main, clear
gen byte flag=0
save main, replace
Then use the names dataset, which has one variable -team- that contains the team names, and loop over its observations:
use names, clear
forvalues i = 1/`=_N' {
    * team[`i'] is the team name in observation i of the names dataset
    local name = team[`i']
    preserve
    use main, clear
    replace flag = 1 if regexm(string,"`name'")
    save main, replace
    restore
}
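A possible variation on the loop above, offered as a sketch rather than as the method from this thread: read the team names into a local macro once with -levelsof-, then loop over them with the main data in memory, so that the main dataset is not reloaded and saved on every pass. The toy data, the macro name teams, and the string widths are assumptions made for illustration; in practice the two -input- blocks would be replaced by -use names, clear- and -use main, clear-.

```stata
* Toy stand-in for the names dataset (hypothetical data, illustration only)
clear
input str40 team
"Buffalo Bills"
"Dallas Cowboys"
"New York Jets"
end

* Store the distinct team names in the local macro teams (assumed name)
levelsof team, local(teams)

* Toy stand-in for the main dataset; the search variable is called string,
* as in the thread
clear
input str60 string
"Buffalo Bills 24 and some extra text"
"Unrelated row about baseball"
"Trade news for the New York Jets"
end

* Flag every observation whose text contains any of the team names
gen byte flag = 0
foreach t of local teams {
    replace flag = 1 if regexm(string, "`t'")
}

* Keep only the flagged observations
keep if flag
```

Because regexm() treats "." as a wildcard, a name such as "St. Louis Rams" would also match any other character in that position. If only literal substrings are needed, strpos(string, "`t'") > 0 is a possible substitute for the regexm() call.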
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dimitriy V. Masterov
Sent: Tuesday, 26 February 2013 12:20 PM
To: Statalist
Subject: Re: st: RE: regular expressions has too many literals
Unfortunately, regular expressions are required here because the string contains additional idiosyncratic text, so an exact merge on the team name will not find the matches. I should have made that explicit.
DVM
On Mon, Feb 25, 2013 at 7:52 PM, Kieran McCaul <[email protected]> wrote:
> ...
>
>
> Put the team names in a new dataset with a variable name that is the same as the string variable in the existing dataset that you are searching.
>
> Now merge the two datasets on that variable name and _merge==3 will indicate the matches.
>
>
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dimitriy V.
> Masterov
> Sent: Tuesday, 26 February 2013 11:41 AM
> To: Statalist
> Subject: st: regular expressions has too many literals
>
> I would like to do something like this:
>
> keep if regexm(string,"Buffalo Bills") | regexm(string,"Dallas Cowboys") ///
> | regexm(string,"Miami Dolphins") | regexm(string,"New York Giants") ///
> | regexm(string,"New England Patriots") | regexm(string,"Philadelphia Eagles") ///
> | regexm(string,"New York Jets") | regexm(string,"Washington Redskins") ///
> | regexm(string,"Baltimore Ravens") | regexm(string,"Chicago Bears") ///
> | regexm(string,"Cincinnati Bengals") | regexm(string,"Detroit Lions") ///
> | regexm(string,"Cleveland Browns") | regexm(string,"Green Bay Packers") ///
> | regexm(string,"Pittsburgh Steelers") | regexm(string,"Minnesota Vikings") ///
> | regexm(string,"Houston Texans") | regexm(string,"Atlanta Falcons") ///
> | regexm(string,"Indianapolis Colts") | regexm(string,"Carolina Panthers") ///
> | regexm(string,"Jacksonville Jaguars") | regexm(string,"New Orleans Saints") ///
> | regexm(string,"Tennessee Titans") | regexm(string,"Tampa Bay Buccaneers") ///
> | regexm(string,"Denver Broncos") | regexm(string,"Arizona Cardinals") ///
> | regexm(string,"Kansas City Chiefs") | regexm(string,"San Francisco 49ers") ///
> | regexm(string,"Oakland Raiders") | regexm(string,"Seattle Seahawks") ///
> | regexm(string,"San Diego Chargers") | regexm(string,"St. Louis Rams")
>
> Just looking at this, you know the expression is too long for Stata to evaluate. Is the only way around this to loop over the 32 team names like this:
>
> gen keepers = .
> foreach team in "Buffalo Bills" "Dallas Cowboys" "Miami Dolphins" "New York Giants" "New England Patriots" ///
> "Philadelphia Eagles" "New York Jets" "Washington Redskins" "Baltimore Ravens" "Chicago Bears" ///
> "Cincinnati Bengals" "Detroit Lions" "Cleveland Browns" "Green Bay Packers" "Pittsburgh Steelers" ///
> "Minnesota Vikings" "Houston Texans" "Atlanta Falcons" "Indianapolis Colts" "Carolina Panthers" ///
> "Jacksonville Jaguars" "New Orleans Saints" "Tennessee Titans" "Tampa Bay Buccaneers" "Denver Broncos" ///
> "Arizona Cardinals" "Kansas City Chiefs" "San Francisco 49ers" "Oakland Raiders" "Seattle Seahawks" ///
> "San Diego Chargers" "St. Louis Rams" {
> replace keepers = 1 if regexm(string,"`team'")
> }
> keep if keepers==1
>
> Or is there a more clever way?
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>