Works with:
- NCHS ICD-10-CM diagnosis codes (healthcare encounter and claims data)
- CMS ICD-10-PCS procedure codes (healthcare claims data)

Data-management commands let you:
- Generate new variables based on codes
  - Indicators for different conditions
  - Short descriptions
  - Category codes from billable codes
  - And more
- Verify that variables contain valid codes and flag invalid codes
- Standardize format of codes

Interactive utilities let you:
- Look up descriptions for codes
- Search for codes from keywords
- Specify the version of codes that your dataset contains
The U.S. now uses ICD-10-CM and ICD-10-PCS to encode diagnoses and procedures in administrative healthcare data, such as claims for medical services.
The icd10cm and icd10pcs commands support these systems, just as Stata has supported previous ICD releases.
These commands make research and reporting easier. When administrative data are gathered from multiple sources, the codes may not be formatted consistently, or they may contain reporting errors. Furthermore, the sheer number of codes in these systems can make it difficult to analyze the data in a meaningful way.
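To make the formatting problem concrete, here is a minimal Python sketch of what standardizing ICD-10-CM codes can involve: dropping stray whitespace, uppercasing letters, and placing a single period after the third character. Stata's own tool for this job is icd10cm clean; the function normalize_cm below is hypothetical and is not its implementation. It assumes ICD-10-CM codes only, where the period, if present, follows the third character.

```python
import re

def normalize_cm(code: str) -> str:
    """Hypothetical cleaner for ICD-10-CM codes: drop whitespace and periods,
    uppercase the letters, then put a single period after the 3rd character."""
    body = re.sub(r"[\s.]", "", code).upper()
    if len(body) <= 3:
        return body  # a 3-character category code such as O80 has no period
    return body[:3] + "." + body[3:]

# The same code recorded three different ways, plus two other codes
raw = [" t8351xa", "T83.51XA", "T8351XA ", "o80", "Z37.0"]
print([normalize_cm(c) for c in raw])
```

After normalization, all three spellings of T83.51XA compare equal, which is what makes later tabulations and merges reliable.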
icd10cm and icd10pcs let you easily verify that codes are valid and add variables such as code descriptions and indicators for whether patients have a particular diagnosis or procedure. They also let you interactively look up descriptions of codes.
Let's imagine that we want to compare costs across the different types of Cesarean section and delivery. We are conducting our study using hospital discharge records from the last six months of 2016. A new version of the ICD-10-CM/PCS codes takes effect every October 1, so our data are a mix of 2016 and 2017 codes. The commands make processing such data easy.
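The rule for deciding which code version applies to a record can be sketched in a few lines of Python. The helper version_for_discharge is hypothetical; it assumes, as described above, that each version takes effect on October 1, so a version is labeled by the calendar year that follows that date.

```python
from datetime import date

def version_for_discharge(d: date) -> int:
    """Hypothetical helper: the ICD-10-CM/PCS version in effect on date d,
    assuming each new version takes effect on October 1."""
    return d.year + 1 if d.month >= 10 else d.year

for d in (date(2016, 7, 1), date(2016, 9, 30), date(2016, 10, 1), date(2016, 12, 31)):
    print(d, version_for_discharge(d))
```

This is the same split the Stata examples make with the if dmonth <= tm(2016m9) and if dmonth >= tm(2016m10) qualifiers.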
First, let's check that the codes are valid. For discharges before October 1, 2016, we specify the version(2016) option. We type
    . use discharges16
    (Discharges, 2016 Q3-Q4)

    . icd10cm check diag1 if dmonth <= tm(2016m9), version(2016)
    (diag1 contains defined codes; no missing values)
Now, we can check discharges between October 2016 and December 2016 by specifying version(2017).
    . icd10cm check diag1 if dmonth >= tm(2016m10), version(2017)
    (diag1 contains no missing values)

    diag1 contains undefined codes:

      1.  Invalid placement of period                      0
      2.  Too many periods                                 0
      3.  Code too short                                   0
      4.  Code too long                                    0
      5.  Invalid 1st char (not A-Z)                       0
      6.  Invalid 2nd char (not 0-9)                       0
      7.  Invalid 3rd char (not 0-9 A or B)                0
      8.  Invalid 4th char (not 0-9 or A-Z)                0
      9.  Invalid 5th char (not 0-9 or A-Z)                0
     10.  Invalid 6th char (not 0-9 or A-Z)                0
     11.  Invalid 7th char (not 0-9 or A-Z)                0
     77.  Valid only for previous versions                 3
     88.  Valid only for later versions                    0
     99.  Code not defined                                 0
                                                 -----------
          Total                                            3
We have three problems in the last quarter of 2016.
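Problems 1 through 11 in the output above are pure format checks that need no code tables. The Python function format_problem below is a hypothetical sketch of those rules as they are listed; the order in which Stata applies them is an assumption, and the sketch expects already-uppercased input (lowercase letters are flagged as problem 5).

```python
import string

UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)
ALNUM = UPPER | DIGITS

def format_problem(code: str) -> int:
    """Return 0 if code passes the format rules, else the problem number 1-11
    from the check table. Definition checks (77, 88, 99) are not done here."""
    n_periods = code.count(".")
    if n_periods > 1:
        return 2                      # too many periods
    if n_periods == 1 and code.index(".") != 3:
        return 1                      # period must follow the 3rd character
    body = code.replace(".", "")
    if len(body) < 3:
        return 3                      # code too short
    if len(body) > 7:
        return 4                      # code too long
    if body[0] not in UPPER:
        return 5                      # 1st char must be A-Z
    if body[1] not in DIGITS:
        return 6                      # 2nd char must be 0-9
    if body[2] not in DIGITS | {"A", "B"}:
        return 7                      # 3rd char must be 0-9, A, or B
    for pos in range(3, len(body)):   # 4th..7th chars must be 0-9 or A-Z
        if body[pos] not in ALNUM:
            return 8 + (pos - 3)
    return 0

print({c: format_problem(c) for c in ["T83.51XA", "T835.1XA", "T8C"]})
```

Note that T8351XA passes every format rule; it is flagged in the output above only because it is not defined in the 2017 code set.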
We probably want to know which codes are causing the problems. To find out, we can add the summary option, which shows the frequency of each problematic code and the reason it was flagged.
    . icd10cm check diag1 if dmonth >= tm(2016m10), version(2017) summary
    (diag1 contains no missing values)
      (output omitted)

    Summary of invalid and undefined codes

      diag1      Count   Problem
      -------------------------------------------------
      T8351XA        3   Valid only for previous versions
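Problems 77, 88, and 99 depend on the code tables for each version. The Python sketch below shows the idea with tiny toy code sets (illustrative stand-ins, not the official tables) and a hypothetical summarize_problems function that counts each flagged code and attaches its reason, much like the summary option does.

```python
from collections import Counter

# Toy code sets, one per version. Illustrative only; NOT the official tables.
TOY_CODESETS = {
    2016: {"O80", "T8351XA", "Z370"},
    2017: {"O80", "Z370"},
}

def version_problem(code, version, codesets=TOY_CODESETS):
    """0 = defined in `version`; 77 = only in earlier versions;
    88 = in a later version; 99 = not defined in any known version."""
    if code in codesets.get(version, set()):
        return 0
    found_in = [v for v, codes in codesets.items() if code in codes]
    if not found_in:
        return 99
    # Simplification: if the code appears in any later version, call it 88.
    return 88 if max(found_in) > version else 77

def summarize_problems(codes, version, codesets=TOY_CODESETS):
    """Map each problematic code to (count, problem number)."""
    counts = Counter(c for c in codes if version_problem(c, version, codesets) != 0)
    return {c: (n, version_problem(c, version, codesets)) for c, n in counts.items()}

records = ["O80", "T8351XA", "Z370", "T8351XA", "O80", "T8351XA"]
print(summarize_problems(records, 2017))
```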
It appears that the hospital used the obsolete code T83.51XA. Usually, we would want to fix problems like this, ideally by consulting the original medical record. With only administrative data at hand, we could instead start by finding out what T83.51XA describes.
    . icd10cm lookup T8351XA, version(2016)
        T83.51XA  Infect/inflm reaction due to indwell urinary catheter, init
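Conceptually, lookup and keyword search both work against a table that pairs codes with descriptions. The Python sketch below assumes a toy two-entry table (the T8351XA text is the description shown above; the O80 entry is illustrative) and hypothetical lookup and search functions. By default, search requires every keyword to appear; match_any=True relaxes that to any keyword. Stata's actual matching rules may differ.

```python
# Toy code -> description table; illustrative, not the official code table.
DESCRIPTIONS = {
    "T8351XA": "Infect/inflm reaction due to indwell urinary catheter, init",
    "O80": "Encounter for full-term uncomplicated delivery",
}

def lookup(code, table=DESCRIPTIONS):
    """Return the description for code (period optional), or None."""
    return table.get(code.replace(".", "").strip().upper())

def search(*keywords, table=DESCRIPTIONS, match_any=False):
    """Codes whose descriptions contain all (or, with match_any, any) keywords,
    compared case-insensitively."""
    words = [k.lower() for k in keywords]
    test = any if match_any else all
    return sorted(code for code, desc in table.items()
                  if test(w in desc.lower() for w in words))

print(lookup("T83.51XA"))
print(search("urinary", "catheter"))
```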
We would then search for a suitable alternative using icd10cm search with keywords from the description of T83.51XA, and replace the obsolete code with the alternative we found by typing
    . replace diag1 = .... if dmonth >= tm(2016m10) & diag1 == "T8351XA"
Before doing that, we might want to check in which months the three problems occurred by typing
    . icd10cm check diag1 if dmonth >= tm(2016m10), version(2017) generate(probtype)
      (output omitted)

    . tabulate probtype dmonth

      result of check for |         Discharge month
                    diag1 |   2016m10    2016m11    2016m12 |     Total
    ----------------------+---------------------------------+----------
             Defined code |       333        323        336 |       992
    Valid only for previo |         2          1          0 |         3
    ----------------------+---------------------------------+----------
                    Total |       335        324        336 |       995
In this case, however, we will skip these steps and ignore the problem, because T83.51XA has nothing to do with pregnancy or delivery and cannot affect our study.
Having satisfied ourselves that there are no errors in our data that will affect our study, we are ready to begin in earnest. Let's first create a variable that marks the portion of the data in which we are interested.
Deliveries and Cesarean sections can be identified by one of four ICD-10-PCS codes: 10D.00Z0, 10D.00Z1, 10D.00Z2, and 10E.0XZZ. We want to flag all records that have one of these codes in proc1, the primary procedure code, as eligible for our study.
    . icd10pcs generate insample = proc1, range(10D.00* 10E.0XZZ)
We could abbreviate the three codes beginning with 10D.00 as 10D.00* because they are the only codes in that group.
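The effect of a code list with a trailing wildcard can be sketched in Python with the standard fnmatch module. The function in_codelist is hypothetical: it ignores periods on both sides and treats * as "any trailing characters". Note that fnmatchcase also gives ? and [...] wildcard meaning, which Stata's range() may not; only exact codes and trailing * are assumed here.

```python
from fnmatch import fnmatchcase

def in_codelist(code, patterns):
    """Hypothetical membership test: True if code matches any pattern.
    Periods are ignored on both sides; '*' matches any trailing characters."""
    c = code.replace(".", "").strip().upper()
    return any(fnmatchcase(c, p.replace(".", "").upper()) for p in patterns)

study_codes = ["10D.00*", "10E.0XZZ"]
proc1 = ["10D00Z0", "10D00Z1", "10E0XZZ", "0UT90ZZ"]  # last: outside the range
insample = [int(in_codelist(p, study_codes)) for p in proc1]
print(insample)
```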
Now that we have insample, we can add the qualifier if insample==1 to Stata commands to restrict our analyses to the relevant records.
We can then create a variable that has the code and description for just the study-eligible records.
    . icd10pcs generate delivery = proc1 if insample==1, description addcode(begin)
Let's look at the four codes of interest:
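    . tabulate delivery

                                   delivery |      Freq.     Percent        Cum.
    ----------------------------------------+-----------------------------------
    10D.00Z0 Extraction of Products of Co.. |          6        2.17        2.17
    10D.00Z1 Extraction of Products of Co.. |         93       33.70       35.87
    10E.0XZZ Delivery of Products of Conc.. |        177       64.13      100.00
    ----------------------------------------+-----------------------------------
                                      Total |        276      100.00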
The first thing we discover is that only three of the four codes appear in the data. That does not bother us; the missing code, 10D.00Z2, is rarely used.
Before we can fit our regression of cost (variable billed) on the procedure codes and length of stay (variable los), we must create a new numeric variable, which we will name dtype, containing the values 1, 2, and 3 for the three codes. We type
    . encode proc1 if insample==1, generate(dtype)
and then we fit our regression:
    . regress billed i.dtype los

          Source |       SS           df       MS      Number of obs   =       276
    -------------+----------------------------------   F(3, 272)       =    340.64
           Model |  80247.8212         3  26749.2737   Prob > F        =    0.0000
        Residual |  21359.4251       272  78.5272982   R-squared       =    0.7898
    -------------+----------------------------------   Adj R-squared   =    0.7875
           Total |  101607.246       275  369.480896   Root MSE        =    8.8616

    ------------------------------------------------------------------------------
          billed | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           dtype |
         10D00Z1 |  -5.635262   4.717685    -1.19   0.233    -14.92308    3.652558
         10E0XZZ |    -15.019    4.80023    -3.13   0.002    -24.46933   -5.568676
                 |
             los |   3.519866   .1659896    21.21   0.000     3.193078    3.846654
           _cons |   22.25525   4.966573     4.48   0.000     12.47744    32.03306
    ------------------------------------------------------------------------------
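Two details behind the regression above can be sketched in Python. By default, encode assigns the integers 1, 2, 3 to the distinct strings in sorted (alphabetical) order, and i.dtype enters the model as 0/1 indicators for every level except the base level, by default the lowest value. That is why 10D00Z0 has no coefficient of its own: the other coefficients are differences from it. The functions encode and indicators below are hypothetical illustrations, not Stata internals.

```python
def encode(values):
    """Map each distinct string to 1..k in sorted order; return the integer
    codes and a value-label dictionary (a sketch of encode's default)."""
    levels = sorted(set(values))
    to_int = {v: i for i, v in enumerate(levels, start=1)}
    labels = {i: v for v, i in to_int.items()}
    return [to_int[v] for v in values], labels

def indicators(codes, base=None):
    """0/1 indicator columns for every level except the base level,
    which defaults to the smallest value, as with i.varname."""
    levels = sorted(set(codes))
    if base is None:
        base = levels[0]
    return {lvl: [int(c == lvl) for c in codes] for lvl in levels if lvl != base}

proc1 = ["10E0XZZ", "10D00Z0", "10D00Z1", "10E0XZZ"]
dtype, labels = encode(proc1)
print(dtype, labels)
print(indicators(dtype))
```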
We want the average cost for each code after controlling for length of stay. Stata's margins command estimates these averages, known as predictive margins.
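    . margins dtype

    Predictive margins                                         Number of obs = 276
    Model VCE: OLS

    Expression: Linear prediction, predict()

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           dtype |
         10D00Z0 |   31.85836   4.667969     6.82   0.000     22.66842     41.0483
         10D00Z1 |    26.2231    .921179    28.47   0.000     24.40955    28.03665
         10E0XZZ |   16.83936   .6794237    24.78   0.000     15.50176    18.17696
    ------------------------------------------------------------------------------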
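Because this linear model has no interactions, each predictive margin is just _cons + (coefficient for the code) + (coefficient on los) × (sample mean of los), so the margins differ from one another by exactly the dtype coefficients. The Python sketch below checks that arithmetic using only the numbers printed above; the mean of los is not reported, so the sketch backs it out from the base-level margin (an assumption that holds only because the output is rounded consistently). It also converts the margins from thousands of dollars to dollars.

```python
# Estimates copied from the regress and margins output (billed in $1,000s).
COEF = {"_cons": 22.25525, "los": 3.519866,
        "10D00Z0": 0.0,                      # base level of i.dtype
        "10D00Z1": -5.635262, "10E0XZZ": -15.019}
MARGINS = {"10D00Z0": 31.85836, "10D00Z1": 26.2231, "10E0XZZ": 16.83936}

def implied_mean_los():
    """Back out the sample mean of los from the base-level margin."""
    return (MARGINS["10D00Z0"] - COEF["_cons"]) / COEF["los"]

def predictive_margin(code, mean_los):
    """Linear model without interactions: prediction at the mean of los."""
    return COEF["_cons"] + COEF[code] + COEF["los"] * mean_los

mean_los = implied_mean_los()
for code, reported in MARGINS.items():
    print(code, round(predictive_margin(code, mean_los), 4), reported,
          f"${round(reported * 1000):,}")
```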
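Because billed is recorded in thousands of dollars, the average billed amounts for 10D.00Z0, 10D.00Z1, and 10E.0XZZ are about $31,858, $26,223, and $16,839, respectively. For a visual comparison, we can create a bar chart: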
    . marginsplot, recast(bar) title("Predictive Margins and 95% Confidence Intervals")
    > subtitle("Billed Amount in $1,000s") ytitle("Predictive Margins of Billed Amount")
    > ylabel(0(10)40) xtitle("Delivery Procedure Code")

    Variables that uniquely identify margins: dtype
You can read more about ICD coding, including tips for working with records with multiple diagnosis codes, in the Introduction to ICD commands.
Also see worked examples for the individual coding systems:
The ICD-10-CM coding system is a licensed adaptation of the World Health Organization's ICD-10. Copyright information for ICD-10 can be found in the ICD-10 copyright notification.