Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: range of a stringvariable

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	Re: st: RE: range of a stringvariable
Date	Wed, 28 Apr 2010 18:19:55 +0100

I don't see that, even with the conditional here. 

. di inrange("E30B", "E300", "E499")
1

And clearly the last character is not an "A". 

Of course, if you are telling me that ICD-10 codes are in some order that
is not completely consistent with Stata's, then (a) I didn't know that
and (b) the code may need adjustment (but not the principles). 
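
The disagreement above can be checked on a toy dataset. What follows is only a sketch: the first five codes are the fake data from the original question, while "E30B" and "E343A" are hypothetical values added here to exercise the edge cases. The variable names in_range and strict are made up for illustration, and the regexm() pattern is one assumed reading of "a letter followed by 3 digits", not something proposed in the thread.

```stata
* Sketch only: toy data from the original question, plus two hypothetical
* codes ("E30B", "E343A") added to exercise the edge cases.
clear
input str6 code
"I460"
"E343"
"I46"
"C764"
"E438"
"E30B"
"E343A"
end

* Condition from the thread: string range on the first four characters,
* excluding codes whose last character is "A".
gen byte in_range = inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A"

* Stricter variant (an assumption, not from the thread): additionally
* require the form "one capital letter + three digits".
gen byte strict = in_range & regexm(code, "^[A-Z][0-9][0-9][0-9]$")

list code in_range strict, noobs
```

If real codes can contain characters other than digits in positions 2 to 4, the two indicators will differ for those observations (as for "E30B" here); if they cannot, the plain string range is enough.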

Nick 
[email protected] 


-----Original Message-----
From: Richard Goldstein [mailto:[email protected]] 
Sent: 28 April 2010 17:53
To: [email protected]
Cc: Nick Cox
Subject: Re: st: RE: range of a stringvariable


depending on what the data actually look like, Nick's code will not give
the correct answer; e.g., "E30B" will meet his condition but not the
OP's condition

Rich

On 4/28/10 12:41 PM, Nick Cox wrote:
> Some simpler ways of approaching this have not quite come to the surface
> in this thread. 
> 
> Four key points: 
> 
> 1. You are not obliged to create lots of little variables. 
> 
> 2. You are not obliged to convert any bits and pieces to real unless you
> genuinely want those results for other purposes. 
> 
> 3. Inequalities apply to strings as well as to numbers. The order
> concerned is just alphanumeric order, precisely that used by Stata to
> -sort- string variables. 
> 
> 4. -substr()- understands negative indexes as counted from the end of a
> string. 
> 
> Thus 
> 
> if inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A" 
> 
> is a complete answer to the first question. Similarly 
> 
> if substr(code,-1,1) == "A" 
> 
> is a complete answer to the second question.
> 
> It's the driest of dry reading but the functions section of the
> documentation is an eye-opener in terms of the toolkit offered. 
> 
> Nick 
> [email protected] 
> 
> Tomas Lind
> 
> Choose individuals based on a string variable with a range of values
> 
> I am working with ICD-10 codes (codes for different types of diseases). The
> codes start with a letter A - Z followed by 2 or 3 digits. In some cases
> they might end with the letter A. Say that I have a dataset with 5 subjects
> (id=1 to 5) with these ICD-10 codes (fake data, in reality I have millions
> of subjects):
> 
> I460  E343  I46  C764  E438
> 
> How can I choose individuals with ICD-10 codes in the range E300 to E499
> (not including codes that end with A)? What about if I want to include
> codes that end with an A? (There is a convenient command for ICD-9 codes,
> but not for ICD-10 codes.) 
