Title: Labeling ICD codes with their descriptions
Author: Rebecca Pope, StataCorp
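ICD-9-CM and ICD-10 codes are stored as strings, and Stata value labels can be attached only to numeric variables, so you cannot label the codes directly. You can, however, still display their descriptions. There are two options: add a string variable that contains the description of each code, or encode the descriptions into a numeric variable that carries them as value labels.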
Suppose you have data containing patient record IDs and ICD-9-CM diagnosis codes that look like this:
     recid     dx
    150781   9110
    150913   4241
    151088   4254
    151125   9033
    151154  78650
    151165   8028
    151207  51881
    151344   3051
    151415   4321
    151487   V140
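Option 1: Stata's icd9 generate, icd9p generate, and icd10 generate commands, used with the description option, create a new string variable that holds the description of each code. Because dx contains ICD-9-CM diagnosis codes, we use icd9 generate (icd9p is for ICD-9-CM procedure codes).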
. icd9 generate descr = dx, description

. list, clean noobs

     recid     dx   descr
    150781   9110   abrasion trunk
    150913   4241   aortic valve disorder
    151088   4254   prim cardiomyopathy nec
    151125   9033   injury ulnar vessels
    151154  78650   chest pain nos
    151165   8028   fx facial bone nec-close
    151207  51881   acute respiratry failure
    151344   3051   tobacco use disorder
    151415   4321   subdural hemorrhage
    151487   V140   hx-penicillin allergy

. describe

Contains data from icd9exdata.dta
  obs:            10
 vars:             3                          20 Oct 2015 18:02
 size:           330                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
recid           float   %9.0g                 Patient record ID
dx              str5    %9s                   Diagnosis
descr           str24   %24s                  label for dx
-------------------------------------------------------------------------------
Sorted by: recid
     Note: Dataset has changed since last saved.
With the descriptions added, the size of the dataset is 330 bytes. We may be able to reduce the size of the dataset using encode.
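The 330-byte figure can be checked by hand. It matches the number of observations (10) multiplied by the width of one observation, where the width is the sum of the storage sizes that describe lists: 4 bytes for the float recid, 5 for the str5 dx, and 24 for the str24 descr.

```latex
\text{size} = N \times \text{width} = 10 \times (4 + 5 + 24) = 10 \times 33 = 330 \text{ bytes}
```

Almost three quarters of that total comes from the 24-byte description repeated in every observation, which is the part encode can shrink.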
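Option 2: To attach the descriptions as value labels on a numeric variable, start again from the original data. First create a string variable with the diagnosis description, and then use encode to turn it into a labeled numeric variable. Here the long option of icd9 generate prepends the code to each description, so the labels show both.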
. icd9 generate descr = dx, description long

. encode descr, generate(dxlabeled) label(descrip)
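encode stores the new variable as a long by default. Because there are only 10 distinct descriptions, compress can store it as a byte instead, the smallest numeric type that holds the values, saving 3 bytes in each of the 10 observations.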
. compress
  variable dxlabeled was long now byte
  (30 bytes saved)
Finally, drop the string variable. It is no longer needed because the descriptions are now stored in the value label descrip.
. drop descr
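For reference, the option 2 steps can be collected into one do-file. This is a sketch under two assumptions: the ten example records are typed in with input rather than read from icd9exdata.dta, and icd9 generate accepts the description long syntax used in this FAQ. The final label list shows the integer-to-description mapping that encode stored in the value label.

```stata
* Sketch of option 2 in one do-file (assumption: data typed in with input
* instead of loaded from icd9exdata.dta)
clear
input float recid str5 dx
150781 "9110"
150913 "4241"
151088 "4254"
151125 "9033"
151154 "78650"
151165 "8028"
151207 "51881"
151344 "3051"
151415 "4321"
151487 "V140"
end

* string variable holding "code description", e.g. "911.0 abrasion trunk"
icd9 generate descr = dx, description long

* numeric variable whose value label descrip carries the description text
encode descr, generate(dxlabeled) label(descrip)

* encode creates a long; compress demotes it to the smallest type that fits
compress dxlabeled

* the descriptions now live in the value label, so the string can go
drop descr

* display the integer-to-description mapping and the final data
label list descrip
list, clean noobs
```

Because the codes are prepended, the sorted descriptions are ordered by code, which makes the label list easy to scan.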
While you could also remove the original, unencoded diagnosis variable, you should keep it if you plan to do data manipulation based on the codes or if you might need to combine your dataset with new data in the future. Our dataset now looks like this:
. list, clean noobs

     recid     dx   dxlabeled
    150781   9110   911.0 abrasion trunk
    150913   4241   424.1 aortic valve disorder
    151088   4254   425.4 prim cardiomyopathy nec
    151125   9033   903.3 injury ulnar vessels
    151154  78650   786.50 chest pain nos
    151165   8028   802.8 fx facial bone nec-close
    151207  51881   518.81 acute respiratry failure
    151344   3051   305.1 tobacco use disorder
    151415   4321   432.1 subdural hemorrhage
    151487   V140   V14.0 hx-penicillin allergy
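The integers behind dxlabeled are assigned by encode in the sort order of the description strings, so they carry no ICD meaning; selections by code range need the original dx. The hypothetical example below (variable names dxcat and injury are illustrative, and the data are four of the example records typed in with input) flags diagnoses in the ICD-9-CM injury and poisoning chapter, categories 800 to 999.

```stata
* Hypothetical illustration: select diagnoses by code range using dx
* (assumes ICD-9-CM codes stored without the decimal point, as here)
clear
input float recid str5 dx
150781 "9110"
151154 "78650"
151165 "8028"
151487 "V140"
end

* first three characters = ICD-9-CM category; V and E codes start with a letter
generate str3 dxcat = substr(dx, 1, 3)

* string comparison works for numeric categories because they keep leading
* zeros; letters sort after digits, so V and E codes fall outside the range
generate byte injury = dxcat >= "800" & dxcat <= "999"
list, clean noobs
```

Here 9110 (category 911) and 8028 (category 802) are flagged, while 78650 and V140 are not.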
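In general, using encode results in a smaller dataset than adding a string variable with the descriptions: each observation stores only a small integer, and the text of each distinct description is stored just once, in the value label.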
. describe

Contains data from icd9exdata.dta
  obs:            10
 vars:             3                          20 Oct 2015 18:02
 size:           100                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
recid           float   %9.0g                 Patient record ID
dx              str5    %9s                   Diagnosis
dxlabeled       byte    %32.0g     descrip    label for dx
-------------------------------------------------------------------------------
Sorted by: recid
     Note: Dataset has changed since last saved.
After using encode, the dataset is only 100 bytes, compared with 330 bytes when the descriptions were stored in a string variable.
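The same hand check accounts for the smaller figure: dxlabeled is stored as a byte (1 byte) in place of the str24 descr (24 bytes), while recid and dx are unchanged. Since the total comes out to exactly 100 bytes, the description text held in the value label descrip is not included in it.

```latex
\begin{aligned}
\text{size}_{\text{encode}} &= 10 \times (4 + 5 + 1) = 10 \times 10 = 100 \text{ bytes} \\
\text{size}_{\text{string}} - \text{size}_{\text{encode}} &= 10 \times (24 - 1) = 230 \text{ bytes}
\end{aligned}
```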