Re: st: routine for matching of a str-variable
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: routine for matching of a str-variable
Date: Wed, 11 May 2011 18:19:33 +0100
The solution depends on the problem.
The third law of string operations says "Never use regex machinery if
you don't need it."
Thus the distinct values of your string variable will be given by
-levelsof- and looping over those levels will give you as many
indicators as you need.
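
A minimal sketch of that loop, assuming the codes are held in a string variable called atc_code (a hypothetical name; a Stata variable name cannot contain a hyphen) and that every distinct code is itself a legal variable name, as ATC codes such as N06DX01 are:

```stata
* Sketch only: atc_code is an assumed variable name, not taken from the data.
* Collect the distinct non-missing values of the string variable.
levelsof atc_code, local(codes)

* One 0/1 indicator per distinct code, named after the code itself.
foreach c of local codes {
    generate byte `c' = (atc_code == "`c'")
}
```

Each indicator is 1 on observations whose code matches exactly and 0 otherwise; no regex is involved.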
Even better, -tab, generate()- will do it for you in one line!
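
As a sketch of that one-liner, with atc_code as a hypothetical name for the string variable and atc_ as an arbitrary stub:

```stata
* Sketch only: atc_code and the stub atc_ are assumed names.
* Creates atc_1, atc_2, ..., one 0/1 indicator per distinct value of atc_code.
tabulate atc_code, generate(atc_)
```

The new variables are numbered rather than named after the codes; each variable label records which value it stands for.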
Regex machinery would presumably be needed only if you suspected
spelling mistakes, and then you would still need to decide how much
latitude to allow.
Nick
On Wed, May 11, 2011 at 5:52 PM, Thomas Zimmermann <[email protected]> wrote:
> I want to check the prevalence of 200+ pharmaceutical agents in a dataset of
> 14000+ ATC-codes in 3327 patients. The table with the pharmaceutical agents
> is organised this way:
>
> "pharmaceutical agent (str)" "atc-code (str)"
> i=1
> 2 Memantine N06DX01
> 3 Estron G03CA07
> 4 Promestrien G03CA09
> i=200
>
>
> I'm looking for a routine that first creates a new variable named after the
> ATC code. This variable should store 1 if the ATC code is matched and 0 if
> it is not.
>
> My workaround (if it deserves that name) until now has been to copy the code
> below 200+ times and then change the ATC code in each copy by hand :-(
>
> gen byte N06DX01 = regexm(atc-code, "^[N]+[0]+[6]+[D]+[X]+[0]+[1]+")
> label var N06DX01 "Memantine"
> tabulate N06DX01
>