ICD-10 codes are the standard for reporting international morbidity and mortality figures. They are also used by many countries to code diagnosis information for healthcare encounters such as a visit to a doctor or an admission to the hospital. The codes can be found in many administrative datasets such as death certificates, hospital discharge records, and medical billing forms.
When data are gathered from multiple sources, they may not be fully standardized, and there can be reporting errors. The sheer number of codes also means that analyzing the data in a meaningful way is often impossible without summarizing the information. icd10 is designed to address these common challenges with secondary data. Whether you want to add text to codes or create indicator variables, icd10 makes working with ICD-10 diagnosis codes easy.
We have 2010 mortality data for the United States—more than 2.4 million deaths.
. use female agerc cause place using vital10.dta, clear
(US mortality data, 2010 -- CDC Vital Statistics)

. describe

Contains data from vital10.dta
  obs:     2,472,542                          US mortality data, 2010 -- CDC Vital Statistics
 vars:             4                          31 Mar 2015 13:46
 size:    32,143,046
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
female          float   %9.0g      female     Decedent is female, female=1,
                                                male=0
place           byte    %8.0g      pod        Place of death and status
cause           str4    %9s                   Cause of death (ICD-10 code)
agerc           float   %14.0g     agerc      Age, Census recode
-------------------------------------------------------------------------------
Sorted by:
We want to identify all deaths due to respiratory illnesses. Any of 275 codes can currently be used to define a respiratory illness, far more than we would ever want to type! A plausible alternative is to use a lookup table, but definitions are often provided in terms of a range of codes, leaving you to type the codes at least once to create the lookup table anyway.
icd10 provides a straightforward and fast alternative. All respiratory diagnoses fall in the range of J10 to J98.9, so the only thing we need to do is type
. icd10 generate resp = cause, range(J10/J989)
You do not need to provide separate ranges for category (3-character) and subcategory (4-character) codes because the range() option of icd10 treats category codes as the lowest value in a range.
We may wish to further examine deaths from pneumonia. We want to add an indicator for pneumonia as the cause of death, but only for decedents who we already know have a respiratory diagnosis.
. icd10 gen pneumonia = cause if resp==1, range(J12/J189)

. tabulate pneumonia

  pneumonia |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    187,594       79.07       79.07
          1 |     49,660       20.93      100.00
------------+-----------------------------------
      Total |    237,254      100.00
We see that about 21% of all deaths from respiratory illnesses in the US in 2010 were from pneumonia.
Of course, we can use icd10 for many other tasks, such as checking that codes are defined, adding WHO's official descriptions of the codes to our dataset, standardizing formats, and more.
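As a rough sketch of those other tasks, the commands below check, standardize, and describe the codes in cause. The new variable names (invalid, cause_std, cause_desc) are hypothetical, and the exact option names should be confirmed against [D] icd10 for your version of Stata.

```stata
* Sketch only: assumes the vital10 data from above are in memory.
* invalid, cause_std, and cause_desc are hypothetical variable names.

* Check that codes are defined; the new variable is 0 for defined codes
* and nonzero otherwise
icd10 check cause, generate(invalid)

* Standardize the format of the codes into a new variable
icd10 clean cause, generate(cause_std)

* Add WHO's description of each code, with the code itself included
icd10 generate cause_desc = cause, description long
```

Tabulating invalid afterward would show how many records, if any, carry a code that icd10 does not recognize.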
For now, though, let's look at a few of the many cases when data management with icd10 is useful.
It is useful if you need to identify populations with diseases or causes of death for reports. For example, we could create a summary dataset, export it using export excel, and then further customize the format of the resulting table using Excel cell formatting.
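One way such an export might look is sketched here, using the pneumonia indicator created earlier. The filename pneumonia_by_group.xlsx and the variable name pneudeaths are assumptions chosen for illustration.

```stata
* Sketch only: pneumonia_by_group.xlsx is a hypothetical filename.
preserve

* One row per sex and age group, holding the number of pneumonia deaths
contract female agerc if pneumonia==1, freq(pneudeaths)

* Put variable names in the first row; overwrite the file if it exists
export excel using pneumonia_by_group.xlsx, firstrow(variables) replace

* Return to the full mortality dataset
restore
```

Wrapping the steps in preserve and restore keeps the original records in memory, because contract replaces the data with the summary.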
It is useful if you want to create nicely labeled frequency plots. Using icd10 generate with the description and long options, combined with Stata's commands to create and graph summary data, we can create such plots directly from the codes.
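A minimal sketch of one such plot follows, restricted to the pneumonia deaths identified earlier so that the number of bars stays manageable. The variable names causedesc and deaths are hypothetical, and the graph options are only one reasonable choice.

```stata
* Sketch only: causedesc and deaths are hypothetical variable names.
preserve
keep if pneumonia==1

* Label each code with its description, keeping the code in the text
icd10 generate causedesc = cause, description long

* Collapse to one row per cause with its frequency
contract causedesc, freq(deaths)

* Horizontal bars, ordered from most to least frequent cause
graph hbar (asis) deaths, over(causedesc, sort(1) descending) ytitle("Deaths")
restore
```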
It is useful if you are calculating basic epidemiological statistics. For example, we could create a summary dataset and add population information to calculate the number of deaths due to pneumonia by age and sex and then compare age-standardized rates by sex.
. contract female agerc if pneumonia==1, freq(pneudeaths)

. describe using as2010, short

Contains data                                 Census 2010 population (by age and sex)
 (output omitted)

. describe using 2010, short

Contains data                                 Census 2010 population by age
 (output omitted)

. merge 1:1 female agerc using as2010, nogenerate
 (output omitted)

. dstdize pneudeaths pop agerc, by(female) using(2010)
(2 observations excluded because of missing values)

-------------------------------------------------------------
-> female= 0
                 -----Unadjusted-----            Std.
                                Pop.  Stratum    Pop.
Stratum         Pop.   Cases   Dist.  Rate[s]  Dst[P]     s*P
-------------------------------------------------------------
0-4 yrs     10319427     143   0.068   0.0000   0.065  0.0000
5-14 yrs    20969500      27   0.138   0.0000   0.133  0.0000
15-24 yr    22317842      89   0.147   0.0000   0.141  0.0000
25-34 yr    20632091     206   0.136   0.0000   0.133  0.0000
35-44 yr    20435999     408   0.135   0.0000   0.133  0.0000
45-54 yr    22142359    1038   0.146   0.0000   0.146  0.0000
55-64 yr    17601148    2084   0.116   0.0001   0.118  0.0000
65-74 yr    10096519    3416   0.067   0.0003   0.070  0.0000
75-84 yr     5476762    6809   0.036   0.0012   0.042  0.0001
85+ yrs      1789679    9186   0.012   0.0051   0.018  0.0001
-------------------------------------------------------------
Totals:    151781326   23406    Adjusted Cases:       29474.7
                                    Crude Rate:        0.0002
                                 Adjusted Rate:        0.0002
                            95% Conf. Interval: [0.0002, 0.0002]

-------------------------------------------------------------
-> female= 1
                 -----Unadjusted-----            Std.
                                Pop.  Stratum    Pop.
Stratum         Pop.   Cases   Dist.  Rate[s]  Dst[P]     s*P
-------------------------------------------------------------
0-4 yrs      9881935     114   0.063   0.0000   0.065  0.0000
5-14 yrs    20056351      32   0.128   0.0000   0.133  0.0000
15-24 yr    21308500      61   0.136   0.0000   0.141  0.0000
25-34 yr    20431857     135   0.130   0.0000   0.133  0.0000
35-44 yr    20634607     313   0.131   0.0000   0.133  0.0000
45-54 yr    22864357     789   0.146   0.0000   0.146  0.0000
55-64 yr    18881581    1473   0.120   0.0001   0.118  0.0000
65-74 yr    11616910    2623   0.074   0.0002   0.070  0.0000
75-84 yr     7584360    6534   0.048   0.0009   0.042  0.0000
85+ yrs      3703754   14178   0.024   0.0038   0.018  0.0001
-------------------------------------------------------------
Totals:    156964212   26252    Adjusted Cases:       21813.8
                                    Crude Rate:        0.0002
                                 Adjusted Rate:        0.0001
                            95% Conf. Interval: [0.0001, 0.0001]

Summary of Study Populations:
  female           N      Crude   Adj_Rate       Confidence Interval
 -------------------------------------------------------------------
       0   151781326   0.000154   0.000194    [  0.000192,  0.000197]
       1   156964212   0.000167   0.000139    [  0.000137,  0.000141]
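To make the output easier to read, the crude and adjusted rates can be written out as follows. The symbols are introduced here for illustration only: $C_s$ and $N_s$ are the cases and population in stratum $s$ of a study group, and $P_s$ is the standard population's share in that stratum (the Dst[P] column).

```latex
\begin{align*}
\text{Crude rate} &= \frac{\sum_s C_s}{\sum_s N_s},
  & \text{males: } \frac{23406}{151781326} &\approx 0.000154 \\[4pt]
\text{Adjusted rate} &= \sum_s P_s \,\frac{C_s}{N_s},
  & \text{males: } \frac{29474.7}{151781326} &\approx 0.000194
\end{align*}
```

The second male figure uses the fact that the reported adjusted cases equal the adjusted rate multiplied by the group's total population. The same arithmetic for females gives 26252/156964212, about 0.000167, and 21813.8/156964212, about 0.000139. So although the crude rate is slightly higher for females, the age-standardized rate for males is roughly 1.4 times that for females.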
Finally, it is useful if you want to create an indicator variable for analysis. For example, we may want to calculate and plot the predictive margins of age group for the probability of pneumonia as the cause of death, after controlling for whether the decedent is female.
. logit pneumonia i.female i.agerc
 (output omitted)

. quietly margins agerc

. marginsplot, title("Predictive Margins of Age with 95% CIs") xtitle(Age) xlabel(, angle(45)) ytitle("Pr(Pneumonia Death)")
In short, whether you simply want to verify that your data are valid or are using the codes as a step in a larger project, icd10 provides valuable tools for reporting and research.
The ICD-10 codes used in Stata are copyrighted by WHO. To see information about the copyright and updates to the codes, type

. icd10 query

ICD-10 Version and Change Log

  License agreement

    ICD-10 codes used by permission of the World Health Organization (WHO), from:
    International Statistical Classification of Diseases and Related Health
    Problems, Tenth Revision (ICD-10) 2010 Edition. Vols. 1-3. Geneva, World
    Health Organization, 2011.

    See copyright icd10 for the ICD-10 copyright notification.

  Edition 2010, 2015 update

    Per the license agreement with WHO, "Official WHO Updates combined 1996-2012
    Volume 1" was reviewed for potential changes scheduled for implementation on
    January 1, 2015.

    Between 2014 and 2015: 0 codes added, 0 codes deleted, 0 code descriptions
    changed.

 (output omitted)
You can read more about ICD coding, including tips for working with records with multiple diagnosis codes, in [D] icd and more about icd10 in [D] icd10.
The icd9 and icd9p commands for ICD-9-CM diagnosis and procedure codes have also been improved. See [D] icd9.
For more about the other commands used above, see their entries in the Stata documentation.