Statalist



Re: st: using the command substr in stata


From   Svend Juul <[email protected]>
To   <[email protected]>
Subject   Re: st: using the command substr in stata
Date   Wed, 30 Jul 2008 09:44:37 +0200

Carmen wrote:

I have a string variable called disease_ICD (oldvar) which has 
the values of "International Statistical Classification of 
Diseases and Related Health Problems - ICD 9 and ICD 10"

I need to create a new variable disease_ICDgroup (newvar) 
containing grouped values of disease_ICD (oldvar). The 
equivalent in EPI INFO is:

. Define newvar TEXTINPUT

. IF(substring(oldvar,1,3)>="001"AND substring(oldvar,1,3)<"799"THEN
ASSIGN newvar ="1" END

. IF(substring(oldvar,1,3)>="A00"AND substring(oldvar,1,3)<"U99"THEN
ASSIGN newvar ="1" END

. IF(substring(oldvar,1,3)>="800"AND substring(oldvar,1,3)<"999"THEN
ASSIGN newvar ="2" END

. IF(substring(oldvar,1,3)>="V00"AND substring(oldvar,1,3)<"Y99"THEN
ASSIGN newvar ="2" END

Note: The first three characters from oldvar are the same 
in all banks (more than 20 banks) which allowed me to create 
ranges and commands that can be used in all banks. 

How do I do this in STATA?

===============================================================

First, I would generate a helper variable -old3-, since the first
three characters of -oldvar- are used repeatedly:

   generate str old3 = substr(oldvar,1,3)
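
As a quick check of what substr() returns: substr(s, n1, n2) takes n2 
characters of s starting at position n1. The code "A001" below is made 
up for illustration and is not taken from the data.

```stata
* first three characters of a made-up code; displays A00
display substr("A001", 1, 3)
```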

Next, assign the groups:

   generate str newvar = ""
   replace newvar = "1" if old3>="001" & old3<"799"
   replace newvar = "1" if old3>="A00" & old3<"U99"
   replace newvar = "2" if old3>="800" & old3<"999"
   replace newvar = "2" if old3>="V00" & old3<"Y99"

I wonder, however, whether you want -newvar- to be a string; numeric 
variables are handier:

   generate newvar = .
   replace newvar = 1 if old3>="001" & old3<"799"
   replace newvar = 1 if old3>="A00" & old3<"U99"
   replace newvar = 2 if old3>="800" & old3<"999"
   replace newvar = 2 if old3>="V00" & old3<"Y99"
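
To check the grouping before running it on the real data, a small 
made-up dataset may help. The sketch below assumes the codes are 
stored without a decimal point; the six values are invented for 
illustration. Because the upper bounds use < rather than <=, codes 
beginning with exactly 799, U99, 999, or Y99 are left missing, just 
as in the EPI INFO commands.

```stata
clear
input str6 oldvar
"0010"
"7990"
"A000"
"8001"
"V010"
"Y990"
end

generate str3 old3 = substr(oldvar,1,3)
generate newvar = .
replace newvar = 1 if old3>="001" & old3<"799"
replace newvar = 1 if old3>="A00" & old3<"U99"
replace newvar = 2 if old3>="800" & old3<"999"
replace newvar = 2 if old3>="V00" & old3<"Y99"

* show each code with its group, then count the unclassified codes
list oldvar old3 newvar
count if missing(newvar)
```

With these six values, "7990" and "Y990" should remain missing. If 
such boundary codes ought to be grouped, change < to <= in the 
relevant line.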

Note that you may use the relational operators > and < with 
strings. Strings sort in dictionary sequence, except that all 
uppercase letters come before lowercase letters, digits come 
before letters, and spaces or blanks come before anything else. 
So:

   " " < "12" < "2" < "A" < "AA" < "Z" < "a"
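
The ordering can be confirmed directly in Stata. This is a quick 
sketch using -assert-, which is silent when the expression is true 
and stops with an error when it is false:

```stata
* each comparison is one link in the chain above
assert " "  < "12"
assert "12" < "2"
assert "2"  < "A"
assert "A"  < "AA"
assert "AA" < "Z"
assert "Z"  < "a"
```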

You could have found information about the substr() function by:

   findit substring

Hope this helps
Svend
__________________________________________

Svend Juul
Institut for Folkesundhed, Afdeling for Epidemiologi
(Institute of Public Health, Department of Epidemiology)
Vennelyst Boulevard 6
DK-8000  Aarhus C, Denmark
Phone:  +45 8942 6090
Home:   +45 8693 7796
Email:  [email protected]
__________________________________________ 
