Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Re: finding a word within a string variable in Stata 12
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Re: finding a word within a string variable in Stata 12
Date: Thu, 22 Mar 2012 06:56:20 +0000
For completeness, note that
gen benev = strpos(orgname, "Benev") & strpos(orgname, "Assoc")
gets you there too (and that, as above, your two statements could be
collapsed into one). I am not dogmatic against regex machinery. For
examples, see
Cox, N. J. 2011. Speaking Stata: MMXI and all that: Handling Roman numerals within Stata. Stata Journal 11(1): 126-142.
Abstract. The problem of handling Roman numerals in Stata is used to
illustrate issues arising in the handling of classification codes in
character string form and their numeric equivalents. The solutions
include Stata programs and Mata functions for conversion from numeric
to string and from string to numeric. Defining acceptable input and
trapping and flagging incorrect or unmanageable inputs are key
concerns in good practice. Regular expressions are especially valuable
for this problem.
and -moss- from SSC by Robert Picard and myself. I just find myself
pointing out how easy -strpos()- is to use in many problems.
Nick
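A minimal do-file sketch comparing the two approaches, assuming the three organization names from Michael's message quoted below are typed in by hand (the variable name benev_re is made up for the comparison). -strpos()- returns the position of the substring, or 0 if it is absent, and & treats any nonzero value as true and returns 1 or 0, so both versions should give 1, 1, 0. Both conditions are needed because obs3 contains "Assoc" (inside "Association") but not "Benev".

```stata
* Hypothetical re-creation of the three names quoted below
clear
input str60 orgname
"Seattle Brotherhood of Whatever Benevolent Association"
"Memphis Big Capital Employees Benevolent Assoc"
"Peoria Association of Dairy Farmers"
end

* strpos() gives the position of the substring, or 0 if absent;
* & treats any nonzero value as true and yields 1 or 0
gen benev = strpos(orgname, "Benev") & strpos(orgname, "Assoc")

* The regexm() version collapsed into one statement (benev_re is a made-up name)
gen benev_re = regexm(orgname, "Benev") & regexm(orgname, "Assoc")

list orgname benev benev_re
```

Both -strpos()- and -regexm()- are case-sensitive here; if capitalization varies in the real data, searching lower(orgname) for "benev" and "assoc" is one option.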
On Thu, Mar 22, 2012 at 1:28 AM, Michael Mulcahy
<[email protected]> wrote:
> I have been using regexm way too much recently. I'm categorizing non-profit organizations based on strings of organizational names, such as:
>
> obs1: orgname == "Seattle Brotherhood of Whatever Benevolent Association"
> obs2: orgname == "Memphis Big Capital Employees Benevolent Assoc"
> obs3: orgname == "Peoria Association of Dairy Farmers"
> My clunky approach is:
>
> gen benev = 0
> replace benev = regexm(orgname, "Benev") & regexm(orgname, "Assoc")
>
> This codes obs1 & obs2 as "1", and leaves obs3 as "0"
Nick Cox <[email protected]> wrote:
> I haven't tried to see what doesn't work with the regex machinery
> because this problem seems to call only for
>
> gen construction = strpos(sic, "construction") > 0
On Wed, Mar 21, 2012 at 7:28 PM, Navarro Paniagua, Maria wrote:
>> I am trying to find a word (for instance, "construction") within a string
>> variable (sic). The variable takes values such as "construction 1" and
>> "b construction".
>>
>> Could you please help me with this?
>>
>> gen construction = regexs(1) if regexm(sic, "[construction]+")
>>
>> g one = 1 if strmatch(sic, "*constr*")
>>
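A minimal sketch of why the first attempt quoted above misfires. The values "construction 1" and "b construction" come from Maria's message; the value "retail trade" and the variable names construction_sm, construction_re and wrong are hypothetical. In a regular expression, [construction] is a bracket expression that matches any single one of the letters c, o, n, s, t, r, u, i, so "[construction]+" is true for almost any text, whereas the task calls for the literal word.

```stata
* Hypothetical data: two values from the question plus one non-match
clear
input str20 sic
"construction 1"
"b construction"
"retail trade"
end

* Substring search: 1 if "construction" occurs anywhere, else 0
gen construction = strpos(sic, "construction") > 0

* Equivalent checks: a wildcard pattern and a literal regular expression
gen construction_sm = strmatch(sic, "*construction*")
gen construction_re = regexm(sic, "construction")

* The bracket expression matches any run of the letters c o n s t r u i,
* so it flags "retail trade" too (regexs(1) would also need a
* parenthesized subexpression, which this pattern lacks)
gen wrong = regexm(sic, "[construction]+")

list sic construction construction_sm construction_re wrong
```

The -strmatch()- attempt is a valid test, but g one = 1 if ... leaves non-matching observations missing rather than 0; generating the -strmatch()- result directly, as in the sketch, gives a 0/1 indicator.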