Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Manipulation of string variable using -regexm-
From: Federico Belotti <[email protected]>
To: [email protected]
Subject: Re: st: Manipulation of string variable using -regexm-
Date: Fri, 11 Oct 2013 22:59:12 +0200
Dear Herve,
My suggestion is to use -screening-, a user-written Stata command for exploring and recoding string variables.
You can find and install it with
findit screening
Once it is installed, the following syntax creates a new numeric variable equal to 0 if there is no star, 1 if only *, 2 if *- and 3 if *+:
screening, source(CurrRtg, upper) key(end "\*" end "\*-" end "\*\+" end "[A-Z]") new(mark, numeric) recode(1 "1" 2 "2" 3 "3" 4 "0")
where
1) the option -source()- specifies the source variable to be recoded (note the suboption -upper-, which makes the match case-insensitive by working in uppercase);
2) the option -key()- specifies the keywords you are looking for (in this case, regular expressions);
3) the option -new()- specifies the name of the new variable to be created (here I called it "mark"; the suboption -numeric- makes the new variable numeric);
4) the option -recode()- specifies the user-defined coding scheme, following the order of the keywords.
See -help screening- for more details.
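If you prefer to stay with the built-in functions, the same coding can also be sketched with -regexm- and -regexs- alone. This is only a sketch, not tested on your data: the toy observations and the names "mark" and "star" are illustrative, and it assumes that the values have no trailing blanks (otherwise wrap CurrRtg in -trim()-).

```stata
* Toy data mirroring the examples in the question (illustrative values only)
clear
input str6 CurrRtg
"A"
"A *"
"A *-"
"A *+"
"B"
"B *"
end

* Numeric coding: 0 = no star, 1 = *, 2 = *-, 3 = *+
* "\*" and "\+" match a literal * and +; "$" anchors the match at the end
generate byte mark = 0
replace mark = 1 if regexm(CurrRtg, "\*$")
replace mark = 2 if regexm(CurrRtg, "\*-$")
replace mark = 3 if regexm(CurrRtg, "\*\+$")

* String version: keep the matched suffix itself ("*", "*-" or "*+")
generate str2 star = regexs(0) if regexm(CurrRtg, "\*[-+]?$")

list, clean
```

Each pattern is a single string. Anchoring with "$" keeps a plain * from matching values that end in *- or *+, and the optional character class "[-+]?" lets one pattern capture all three suffixes for -regexs(0)-.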
Hope this helps.
Federico
On Oct 11, 2013, at 6:40 PM, STOLOWY, Herve wrote:
> Dear Statalisters:
>
> Using Stata 12.1, I want to extract a portion of a string variable using
> regular expressions, i.e. -regexs- and -regexm-.
>
> My string variable has different possible values. Example:
>
> A
> A *
> A *-
> A *+
> B
> B *
> B *-
> B *+
> etc.
>
> I would like to get a variable with the content filled with the * or *- or
> *+ or with this type of coding:
>
> 0 if not star
> 1 if only *
> 2 if *-
> 3 if *+
>
> The * or *- or *+ always appear at the end on the value.
>
> I tried the following syntax:
>
> gen var_star = regexs(0) if(regexm(CurrRtg, "\*" "\*+" "\*-"))
>
> Unfortunately, I get a * in all cases there is a * included in the value,
> but I do not get the *- or *+.
>
> I have difficulties with the syntax of -regexm-.
>
> There is maybe another way to get the same result.
>
> Best regards
>
> Herve Stolowy
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
--
Federico Belotti, PhD
Research Fellow
Centre for Economics and International Studies
University of Rome Tor Vergata
tel/fax: +39 06 7259 5627
e-mail: [email protected]
web: http://www.econometrics.it