Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: RE: range of a stringvariable
Date: Wed, 28 Apr 2010 18:19:55 +0100

I don't see that, even with the conditional here.

. di inrange("E30B", "E300", "E499")
1

And clearly the last character is not an "A".

Of course, if you are telling me that ICD-9 codes are in some order that
is not completely consistent with Stata's, then (a) I didn't know that
and (b) the code may need adjustment (but not the principles).

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: Richard Goldstein [mailto:richgold@ix.netcom.com]
Sent: 28 April 2010 17:53
To: statalist@hsphsun2.harvard.edu
Cc: Nick Cox
Subject: Re: st: RE: range of a stringvariable

depending on what the data actually look like, Nick's code will not give
the correct answer; e.g., "E30B" will meet his condition but not the
OP's condition

Rich

On 4/28/10 12:41 PM, Nick Cox wrote:
> Some simpler ways of approaching this have not quite come to the
> surface in this thread.
>
> Four key points:
>
> 1. You are not obliged to create lots of little variables.
>
> 2. You are not obliged to convert any bits and pieces to real unless
> you genuinely want those results for other purposes.
>
> 3. Inequalities apply to strings as well as to numbers. The order
> concerned is just alphanumeric order, precisely that used by Stata to
> -sort- string variables.
>
> 4. -substr()- understands negative indexes as counted from the end of
> a string.
>
> Thus
>
> if inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A"
>
> is a complete answer to the first question. Similarly
>
> if substr(code,-1,1) == "A"
>
> is a complete answer to the second question.
>
> It's the driest of dry reading but the functions section of the
> documentation is an eye-opener in terms of the toolkit offered.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Tomas Lind
>
> Choose individuals based on a string variable with a range of values
>
> I am working with ICD-10 codes (codes for different types of
> diseases). The codes start with a letter A - Z followed by 2 or 3
> digits. In some cases they might end with the letter A. Say that I
> have a dataset with 5 subjects (id=1 to 5) with these ICD-10 codes
> (fake data, in reality I have millions of subjects):
>
> I460 E343 I46 C764 E438
>
> How can I choose individuals with ICD-10 codes in the range E300 to
> E499 (not including codes that end with A)? What if I want to include
> codes that end with an A? (There is a convenient command for ICD-9
> codes, but not for ICD-10 codes.)

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
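
For anyone who wants to try the two conditions from this thread, here is a minimal sketch in Stata. It assumes the five fake codes from the original question are held in a string variable named code; the names in_range_noA and ends_in_A are made up for illustration and do not appear in the thread.

```stata
* Sketch only: rebuild the poster's five fake codes as a small dataset
clear
input byte id str5 code
1 "I460"
2 "E343"
3 "I46"
4 "C764"
5 "E438"
end

* First condition from the thread: first four characters between
* "E300" and "E499" in Stata's string sort order, last character not "A"
* (in_range_noA is a hypothetical variable name)
generate byte in_range_noA = inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A"

* Second condition from the thread: last character is "A"
* (ends_in_A is a hypothetical variable name)
generate byte ends_in_A = substr(code, -1, 1) == "A"

list id code in_range_noA ends_in_A
```

With these five codes, only ids 2 and 5 (E343 and E438) are flagged by the first condition, and none ends in "A". Whether plain string order is enough depends on what the real codes look like: as the exchange above shows, a value such as "E30B" also falls between "E300" and "E499".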