But that is open to the same comment. Your result from -egen-
counts how many observations in total satisfy the stated criteria.
In data cleaning, knowing which observations they are is the key issue,
at least in my experience.
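To make the distinction concrete, here is a minimal sketch using a few made-up values of codeks. It assumes codeks is a string variable, and the variable names found and ntotal are hypothetical. The flag marks which observations match. In contrast, -egen- with total() puts the same grand total into every observation:

```stata
* Made-up example values; codeks is assumed to be a string variable
clear
input str5 codeks
"101"
"01A"
"102"
"0AX"
"01X"
end

* Per-observation flag: 1 if codeks contains A or X, 0 otherwise
generate byte found = regexm(codeks, "[AX]")

* -egen- total() stores one grand total in every observation
egen ntotal = total(found)

* found says which observations match; ntotal only says how many
list codeks found ntotal
count if found
```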
Anders Alexandersson
Ah, thanks, Nick. In my haste I forgot how to create regular sums. I
meant
egen found = total( regexm(codeks, "A") + regexm(codeks, "X") ) >= 1
Having now read the FAQ for regular expressions at
http://www.stata.com/support/faqs/data/regex.html
it seems that regexm() uses the pipe character | for logical or
(alternation) inside the pattern string, so I also suggest this solution:
gen found = regexm(codeks, "A|X")
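In a regular expression, the | goes inside the pattern string. The pattern "A|X" matches either letter, so it also matches "AX". A character class such as "[AX]" is an equivalent way to write the same test. The following minimal sketch uses made-up values and hypothetical variable names. It checks that the two forms agree and then drops the flagged observations, since the original aim was to eliminate them:

```stata
* Made-up example values; codeks is assumed to be a string variable
clear
input str5 codeks
"101"
"01A"
"01X"
"0AX"
"112"
end

* Alternation: | separates the alternatives inside the pattern string
generate byte found_alt = regexm(codeks, "A|X")

* Character class: matches any single character listed in brackets
generate byte found_cls = regexm(codeks, "[AX]")

* Both flags should agree; then drop the observations that match
assert found_alt == found_cls
drop if found_alt
```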
Nick Cox wrote:
> Anders' solution makes use of -sum()-. That would cumulate
> from observation to observation. It sounds to me as if
> Thaddee wants to look at each observation separately.
>
> See also my solution suggested earlier.
>
> (Stata had an -index()- function, but from Stata 10 it is available
> only under version control. -strpos()- is now the equivalent.)
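The same flag can be built with -strpos()- (Stata 10 or later) and no regular expressions. -strpos()- returns the position of a substring in a string, or 0 if the substring is absent. This is a minimal sketch that assumes codeks is a string variable. The values and the name found are made up for illustration:

```stata
* Made-up example values; codeks is assumed to be a string variable
clear
input str5 codeks
"101"
"01A"
"01X"
"0AX"
end

* strpos() returns 0 when the letter is absent, so > 0 means "contains"
generate byte found = strpos(codeks, "A") > 0 | strpos(codeks, "X") > 0
list
```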
>
> Thaddee Badibanga wrote:
>
> > I'd like to create an index from a
> > variable which is pseudo-numeric, or a string (numeric
> > as well as character). This index will allow me to
> > eliminate some observations in the dataset. To give
> > you an idea, the variable I termed codeks is as
> > follows:
> > codeks:101 102 01A 01X 0AX ...103 ... 111 112 ...11111
> >
> > I'd like to create an index that assigns 1 if codeks
> > includes A or X or AX and 0 otherwise. I have done
> > this in other programs. In one program for instance,
> > this can be done as:
> > found=indexc(codeks,"A","X")
> >
> > I would really appreciate your help. I have spent more
> > than 3 hours without success.
>
> I am not aware of a similar function in Stata. But the regexm() string
> function combined with a Boolean expression should work. This FAQ
> explains Boolean expressions in Stata:
> http://www.stata.com/support/faqs/data/trueorfalse.html
> For example, regexm(codeks, "A") would evaluate to 1 if codeks has the
> string A, and to 0 otherwise.
>
> I have not tried the following, but I think it will work as you
> intended:
> gen found = sum( regexm(codeks, "A") + regexm(codeks, "X") ) >= 1
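As pointed out in the thread, sum() inside -generate- is a running sum. Once the first match occurs, every later observation is flagged too. This minimal sketch uses made-up values and hypothetical variable names. It contrasts that behaviour with a per-observation logical or:

```stata
* Made-up example values; codeks is assumed to be a string variable
clear
input str5 codeks
"01A"
"101"
"102"
"0AX"
end

* sum() in -generate- cumulates from observation to observation
generate running = sum(regexm(codeks, "A") + regexm(codeks, "X"))
generate byte flag_sum = running >= 1

* Per-observation version: no sum(), just a logical or of the two tests
generate byte flag_obs = regexm(codeks, "A") | regexm(codeks, "X")
list
```

Here flag_sum is 1 for "101" and "102" as well, because the running total never falls back to 0. flag_obs marks only "01A" and "0AX".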
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*