Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: routine for matching of a str-variable
Date: Wed, 11 May 2011 18:19:33 +0100

The solution depends on the problem. The third law of string operations says "Never use regex machinery if you don't need it." Thus the distinct values of your string variable will be given by -levelsof-, and looping over those levels will give you as many indicators as you need. Even better, -tab, generate()- will do it for you in one line!

Regex machinery would presumably only be needed if you suspected spelling mistakes, and then you would still need to think about how much latitude to allow.

Nick

On Wed, May 11, 2011 at 5:52 PM, Thomas Zimmermann <t.zimmermann@uke.de> wrote:
> I want to check the prevalence of 200+ pharmaceutical agents in a dataset of
> 14000+ ATC-codes in 3327 patients. The table with the pharmaceutical agents
> is organised this way:
>
> "pharmaceutical agent (str)" "atc-code (str)"
> i=1
> 2 Memantine N06DX01
> 3 Estron G03CA07
> 4 Promestrien G03CA09
> i=200
>
>
> I'm looking for a routine that first creates a new variable "atc-code". this
> var should store the information (1), if the atc-code is matched, (0) if
> it's not.
>
> my workaround (if it deserves that name) til now is to copy 200+times, then
> re"submit" the different atc-code by hand, :-(.
>
> gen byte N06DX01 = regexm(atc-code, "^[N]+[0]+[6]+[D]+[X]+[0]+[1]+")
> label var N06DX01 "Memantine"
> tabulate N06DX01
>
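
For concreteness, here is a minimal sketch of the two regex-free routes named in the reply, -tab, generate()- and a loop over -levelsof-. It is not code from the thread. It assumes the codes sit in a string variable called atc (a hypothetical name, chosen because "atc-code" contains a hyphen and so is not a legal Stata variable name), and that every distinct code is itself a legal variable name, as N06DX01 is.

```stata
* Sketch only. Assumption: the ATC codes are stored in a string
* variable named atc (hypothetical name, not from the original post).

* Route 1: one indicator per distinct code, in one line.
* Creates atc_d1, atc_d2, ..., each labelled with the code it flags.
tabulate atc, generate(atc_d)

* Route 2: loop over the distinct values and name each indicator
* after its code. Assumes every code is a legal Stata variable name.
levelsof atc, local(codes)
foreach c of local codes {
    generate byte `c' = (atc == "`c'")
}

* A single agent by plain equality instead of regexm().
* capture drop removes the copy that Route 2 already created.
capture drop N06DX01
generate byte N06DX01 = (atc == "N06DX01")
label variable N06DX01 "Memantine"
tabulate N06DX01
```

Note that the equality test flags exact matches only, whereas the regexm() pattern in the question also matches longer strings that merely begin with the code. If only the 200+ agents in the lookup table are of interest, the loop could run over that list of codes rather than over everything -levelsof- returns.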