Highlights
Estimation
Heterogeneity over cohort and time
Panel data
Repeated cross-sectional data
Four estimators
  Regression adjustment (RA)
  Inverse-probability weighting (IPW)
  Augmented inverse-probability weighting (AIPW)
  Two-way fixed-effects regression (TWFE)
Plots of treatment-effects heterogeneity
Test of parallel pretreatment trends
Aggregation of ATETs over:
  Cohort
  Time
  Exposure to treatment
Simultaneous confidence intervals
When average treatment effects vary over time and over cohort, you can now use the new hdidregress and xthdidregress commands to estimate heterogeneous average treatment effects on the treated (ATETs). Use hdidregress with repeated cross-sectional data and xthdidregress with panel data. Choose one of four estimators, including regression adjustment and inverse-probability weighting. Plot the time profile of ATETs for each cohort with estat atetplot. Aggregate the ATETs within cohort, time, and exposure to treatment with estat aggregation.
Treatment effects measure the causal effect of a treatment on an outcome. A treatment could be a new drug regimen, a surgical procedure, a training program, or even an ad campaign intended to affect an outcome such as blood pressure, mobility, employment, or sales. Often the quantity of interest is the ATET, the average effect among those who actually receive the treatment.
The standard difference-in-differences (DID) estimator, implemented in the existing commands didregress and xtdidregress, estimates a single ATET that is common to all groups across time. When groups are treated at different points in time, the assumption of a constant ATET may be violated. The new commands implement estimation methods that account for heterogeneity of the ATET and provide cohort-specific and time-specific ATET estimates.
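For comparison, here is a minimal sketch of the standard DID fit. It borrows the variable names from the Healthy Habits example described next (bmi, medu, girl, sports, hhabit, schools, year) and assumes that dataset is already in memory. It reports one ATET rather than cohort-specific and time-specific ones.

```stata
* Standard DID: a single ATET assumed common to all cohorts and years.
* Assumes the Healthy Habits example data are in memory.
didregress (bmi medu i.girl i.sports) (hhabit), group(schools) time(year)
```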
We would like to know if a school-district-level program, Healthy Habits, reduces students' body mass index (BMI) in the school district. We have fictional data on the Healthy Habits program. This program incorporates more exercise time and augments the intake of fruits and vegetables. Our data are at the school district level and include information on whether a school participates in the program, hhabit, and the BMI of students in the district, bmi. We have repeated samples of students ages 11 to 14 from 40 school districts from 2013 to 2021.
For the outcome model, we believe that the mother's education, medu, is a good predictor of the health habits of children. We also believe that participation in sports, sports, affects bmi. Finally, we control for whether the student is a girl to account for behavioral differences and differences in body types of boys and girls at this age.
For the treatment model, we use the number of parks in the district (parksd) to model hhabit. We conjecture that school districts with more parks consider exercise spaces more important in their urban planning than those with fewer parks. These districts are therefore more amenable to the Healthy Habits program.
We use the aipw estimator to model both the outcome and the treatment. The aipw estimator has a double-robustness property, implying that only one of the outcome model or the treatment model needs to be correctly specified to obtain consistent estimates.
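The other three estimators use the same two-part syntax; what changes is which model takes covariates. The sketch below reflects our reading of the hdidregress syntax (ra and twfe take covariates only in the outcome model, ipw only in the treatment model) and should be checked against [CAUSAL] hdidregress before use.

```stata
* Regression adjustment: covariates enter the outcome model only
hdidregress ra (bmi medu i.girl i.sports) (hhabit), group(schools) time(year)

* Inverse-probability weighting: covariates enter the treatment model only
hdidregress ipw (bmi) (hhabit parksd), group(schools) time(year)

* Two-way fixed effects: covariates enter the outcome model only
hdidregress twfe (bmi medu i.girl i.sports) (hhabit), group(schools) time(year)
```

Unlike aipw, each of these relies on the model it does specify being correct.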
We fit the following model:
. hdidregress aipw (bmi medu i.girl i.sports) (hhabit parksd), group(schools) time(year)
note: variable _did_cohort, containing cohort indicators formed by treatment variable hhabit and group variable schools, was added to the dataset.

Computing ATET for each cohort and time:
Cohort 2015 (8): ........ done
Cohort 2017 (8): ........ done
Cohort 2019 (8): ........ done

Treatment and time information

Time variable: year
Time interval: 2013 to 2021
Control:       _did_cohort = 0
Treatment:     _did_cohort > 0

                  | _did_cohort
------------------+------------
Number of cohorts |           4
------------------+------------
Number of obs     |
    Never treated |       11355
             2015 |        1231
             2017 |        2097
             2019 |        2042

             |                Robust
      Cohort |       ATET  std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
2015         |
        year |
        2014 |   .6544681   .5946048     1.10   0.271    -.5109359    1.819872
        2015 |  -1.226451    .379168    -3.23   0.001    -1.969607   -.4832957
        2016 |  -2.491842   .4169657    -5.98   0.000     -3.30908   -1.674605
        2017 |   -2.72486   .2363878   -11.53   0.000    -3.188171   -2.261548
        2018 |  -2.786634   .6672867    -4.18   0.000    -4.094492   -1.478776
        2019 |  -3.980456   .2993279   -13.30   0.000    -4.567127   -3.393784
        2020 |   -.604415   .5929199    -1.02   0.308    -1.766517    .5576866
        2021 |  -.6522272   .3640416    -1.79   0.073    -1.365736    .0612812
-------------+----------------------------------------------------------------
2017         |
        year |
        2014 |   .6635794   .3089663     2.15   0.032     .0580167    1.269142
        2015 |    -1.3933   .3871204    -3.60   0.000    -2.152042   -.6345582
        2016 |   .5947865   .4065947     1.46   0.144    -.2021245    1.391697
        2017 |   -1.71427   .4565384    -3.75   0.000    -2.609069   -.8194714
        2018 |  -3.170542   .5221368    -6.07   0.000    -4.193912   -2.147173
        2019 |  -2.967701   .4247053    -6.99   0.000    -3.800108   -2.135294
        2020 |   .0360098   .6868764     0.05   0.958    -1.310243    1.382263
        2021 |   -.957117   .3510986    -2.73   0.006    -1.645258   -.2689763
-------------+----------------------------------------------------------------
2019         |
        year |
        2014 |  -1.434451   .5163232    -2.78   0.005    -2.446426    -.422476
        2015 |   1.010288   .4808165     2.10   0.036      .067905    1.952671
        2016 |  -.3809733   .4336764    -0.88   0.380    -1.230963    .4690169
        2017 |   .5199519   .4849723     1.07   0.284    -.4305763     1.47048
        2018 |  -.0315794   .5863875    -0.05   0.957    -1.180878    1.117719
        2019 |  -3.602114   .3498692   -10.30   0.000    -4.287845   -2.916383
        2020 |  -1.388906   .6765493    -2.05   0.040    -2.714919   -.0628943
        2021 |  -.6222491   .5510466    -1.13   0.259     -1.70228    .4577824
We specified the outcome model in the first set of parentheses and the treatment model in the second. Option group(schools) indicates that treatment is assigned at the level of the schools variable and identifies schools as the clustering variable. Finally, option time(year) specifies the time variable.
The note below the command indicates that the categorical variable _did_cohort was generated with cohort information. Units in the same cohort start treatment at the same time. There are three treated cohorts in our data: 2015, 2017, and 2019. The output reports four cohorts because the never-treated group (_did_cohort = 0), with 11,355 observations, is counted as well. The time variable year ranges from 2013 to 2021.
The estimation table reports the ATET for each cohort in each year. For example, for the 2015 cohort in 2016, the ATET estimate is about -2.5, which implies that the Healthy Habits program, on average, reduces BMI by about 2.5 for students in a district of the 2015 cohort in 2016 relative to the scenario where the district does not participate. The other estimates can be interpreted similarly.
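Each row's z statistic and confidence interval follow from the estimate and its standard error. As a quick check, the sketch below recomputes them for the 2015 cohort in 2016 from the displayed (rounded) values. It assumes the reported interval is the usual normal-based one; the scalar names are ours.

```stata
* Values copied from the 2015 cohort, year 2016 row of the table
scalar atet_hat = -2.491842
scalar se_hat   = .4169657

* z statistic: estimate divided by its standard error
display "z = " %6.2f (atet_hat/se_hat)

* Normal-based 95% confidence limits
display "lower = " %9.6f (atet_hat - invnormal(0.975)*se_hat)
display "upper = " %9.6f (atet_hat + invnormal(0.975)*se_hat)
```

The results should agree with the table row up to rounding of the inputs.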
It is difficult to see trends in the ATETs just by scanning a table of estimates. We can use estat atetplot to visualize the time profile of ATETs for each cohort. We specify option sci to show simultaneous confidence bands, which jointly cover the true ATETs across all cohorts and times with a prespecified probability.
. estat atetplot, sci
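The pretreatment estimates can also be examined formally. A minimal sketch, assuming estat ptrends is the postestimation command behind the test of parallel pretreatment trends listed in the highlights; confirm the name and its output in [CAUSAL] hdidregress postestimation.

```stata
* Test of parallel pretreatment trends after hdidregress
* (command name assumed; output not shown here)
estat ptrends
```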
After fitting the model, we can use estat aggregation to aggregate the ATETs within cohort, time, or exposure to treatment. Each aggregation summarizes a different aspect of the ATETs. For example, we use estat aggregation, cohort to summarize each cohort's posttreatment ATETs across time. We also specify option graph to obtain a graph of the aggregations in addition to the tabular output.
. estat aggregation, cohort graph

ATET over cohort                                   Number of obs = 16,725
(Std. err. adjusted for 40 clusters in schools)

             |                Robust
      Cohort |       ATET  std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        2015 |  -2.065755   .1999412   -10.33   0.000    -2.457633   -1.673877
        2017 |    -1.7781   .4013978    -4.43   0.000    -2.564825   -.9913744
        2019 |  -1.869405   .4650349    -4.02   0.000    -2.780857   -.9579538
If we want to summarize ATETs within time, we specify option time with estat aggregation.
. estat aggregation, time graph

ATET over time                                     Number of obs = 16,725
(Std. err. adjusted for 40 clusters in schools)

             |                Robust
        Time |       ATET  std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        2015 |  -1.226451    .379168    -3.23   0.001    -1.969607   -.4832957
        2016 |  -2.491842   .4169657    -5.98   0.000     -3.30908   -1.674605
        2017 |  -2.111619   .3654785    -5.78   0.000    -2.827943   -1.395294
        2018 |  -3.028686   .4278557    -7.08   0.000    -3.867268   -2.190104
        2019 |  -3.449829   .2670184   -12.92   0.000    -3.973176   -2.926483
        2020 |  -.6624494     .44865    -1.48   0.140    -1.541787    .2168884
        2021 |  -.7575068   .2816374    -2.69   0.007    -1.309506   -.2055078
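Finally, if we want to summarize ATETs over different lengths of exposure to treatment, we specify option dynamic. Exposure is the number of years since a cohort's treatment began (the year minus the cohort's treatment year), so negative values correspond to pretreatment periods.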
. estat aggregation, dynamic graph

Duration of exposure ATET                          Number of obs = 16,725
(Std. err. adjusted for 40 clusters in schools)

             |                Robust
    Exposure |       ATET  std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
          -5 |  -1.434451   .5163232    -2.78   0.005    -2.446426    -.422476
          -4 |   1.010288   .4808165     2.10   0.036      .067905    1.952671
          -3 |   .1338267   .3091619     0.43   0.665    -.4721195     .739773
          -2 |  -.4256324   .4292553    -0.99   0.321    -1.266957    .4156925
          -1 |   .3727141   .3197563     1.17   0.244    -.2539967     .999425
           0 |  -2.285098   .3827362    -5.97   0.000    -3.035248   -1.534949
           1 |  -2.344265   .3829047    -6.12   0.000    -3.094744   -1.593785
           2 |  -2.045521   .3911543    -5.23   0.000     -2.81217   -1.278873
           3 |  -1.045601   .6840119    -1.53   0.126     -2.38624    .2950372
           4 |  -2.145004   .5952525    -3.60   0.000    -3.311678    -.978331
           5 |   -.604415   .5929199    -1.02   0.308    -1.766517    .5576866
           6 |  -.6522272   .3640416    -1.79   0.073    -1.365736    .0612812
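With panel data, the analogous command is xthdidregress. The sketch below is illustrative only: the variable names (id, year, y, x1, x2, treat, z1, district) are hypothetical, and it assumes the data are declared as a panel with xtset, which supplies the time variable, so no time() option is given.

```stata
* Hypothetical panel dataset: one row per unit (id) per year
xtset id year

* AIPW with outcome covariates x1 and x2 and treatment covariate z1;
* group() names the level at which treatment is assigned (assumed: district)
xthdidregress aipw (y x1 x2) (treat z1), group(district)

* The same postestimation tools used above after hdidregress
estat atetplot, sci
estat aggregation, dynamic graph
```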
Read more in the Stata Causal Inference and Treatment-Effects Estimation Reference Manual; see [CAUSAL] hdidregress and [CAUSAL] xthdidregress.
View all the new features in Stata 18 and, in particular, New in linear models.