This page announced the new features in Stata 17. Please see our Stata 18 page for the new features in Stata 18.

Treatment-effects estimation using lasso

Highlights

  • Estimate treatment effects with high-dimensional controls

    • High-dimensional controls in the outcome model
    • High-dimensional controls in the treatment model
  • Flexible model specification

    • Outcome model can be linear, logit, probit, or poisson
    • Treatment assignment model can be logit or probit
  • Different measures of treatment effects

    • ATE: average treatment effect
    • ATET: average treatment effect on the treated
    • POM: potential-outcome mean
  • Robust estimation

    • Double robustness: only one of the models needs to be correctly specified
    • Neyman orthogonality: guard against model-selection mistakes made by lasso
  • Double machine learning

    • Cross-fitting and resampling

You use treatment-effects estimators to draw causal inferences from observational data. Perhaps you want to estimate the effect of a drug regimen on blood pressure, the effect of a surgical procedure on mobility, the effect of a training program on employment, or the effect of an ad campaign on sales.

You use lasso inferential estimators when you are interested in inference on a few covariates while controlling for many other potential covariates. (And when we say many, we mean hundreds, thousands, or more!)

You can now use these estimators simultaneously. With the new telasso command, you can estimate treatment effects while controlling for many potential covariates.

For example, you can type

. telasso (y1 x1-x100) (treat w1-w100)

to estimate the effect of the binary treatment treat on the continuous outcome y1 while controlling for predictors x1 through x100 in the outcome model and for w1 through w100 in the treatment model. The obtained estimates benefit from robustness properties of both the treatment-effects estimators and lasso.
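
The robustness mentioned above can be made concrete with a formula. As a sketch, assuming the estimator takes the augmented inverse-probability-weighted (AIPW) form described in [TE] telasso, and using notation introduced here purely for illustration (t_i is the treatment indicator, y_i the outcome, \hat\mu_0 and \hat\mu_1 the outcome-model predictions at each treatment level, and \hat p the predicted treatment probability), the ATE estimate is the sample average

\[
\hat\tau_{\mathrm{ATE}} = \frac{1}{N}\sum_{i=1}^{N}\left[\hat\mu_1(x_i) - \hat\mu_0(x_i) + \frac{t_i\,\{y_i - \hat\mu_1(x_i)\}}{\hat p(w_i)} - \frac{(1 - t_i)\,\{y_i - \hat\mu_0(x_i)\}}{1 - \hat p(w_i)}\right]
\]

If the outcome model is correct, the two weighted residual terms average to roughly zero. If instead the treatment model is correct, those terms offset the bias in the outcome predictions. Either condition is enough, which is what double robustness means.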

With telasso, you get everything you expect from treatment effects and from lasso. You can estimate the average treatment effect, the average treatment effect on the treated, and the potential-outcome means. You can model continuous, binary, and count outcomes and choose between a logit and a probit treatment model. And for selection of controls, you can choose between lasso and square-root lasso estimation and choose from several selection methods, such as BIC and cross-validation.
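
The variations described above might be typed as follows. This is a sketch that reuses the hypothetical variables from the first example; y2 is an assumed count outcome that does not appear elsewhere on this page, and the option names are taken from [TE] telasso, so check that entry for the exact rules on combining them.

. * ATET instead of the default ATE
. telasso (y1 x1-x100) (treat w1-w100), atet

. * probit treatment model, with controls selected by BIC
. telasso (y1 x1-x100) (treat w1-w100, probit), selection(bic)

. * square-root lasso instead of lasso
. telasso (y1 x1-x100) (treat w1-w100), sqrtlasso

. * count outcome (hypothetical y2) with a Poisson outcome model; report POMs
. telasso (y2 x1-x100, poisson) (treat w1-w100), pomeans

Each command changes one aspect at a time so that differences in the results can be traced to a single modeling choice.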


Let's see it work

We would like to compare two types of lung transplants: bilateral lung transplant (BLT) and single lung transplant (SLT). BLT is usually associated with a higher death rate in the short term after the operation but with a more significant improvement in the quality of life than SLT. As a result, for patients who need to decide between these two treatment options, knowing the effect of BLT (versus SLT) on life quality is essential. Therefore, we want to estimate the effect of the treatment transtype on the outcome fev1p. This outcome represents the percentage of forced expiratory volume in one second (FEV1) that the patient has relative to a healthy person.

Our data include 29 variables recording characteristics of the patients and donors. We use these variables and the interactions between them as controls in our model. It would be tedious to type these variable names one by one and mark each as continuous or categorical. vl is a suite of commands that simplifies this process.

The following code creates the control variable list and stores it in the global macro $allvars.

. quietly vl set

. vl create cvars = vlcontinuous - (fev1p)
note: $cvars initialized with 12 variables.

. vl create fvars = vlcategorical - (transtype)
note: $fvars initialized with 17 variables.

. vl sub allvars = c.cvars i.fvars c.cvars#i.fvars
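
Before passing $allvars to an estimator, it can help to check what the lists contain. A minimal sketch, assuming the vl lists created above are still defined; what gets printed depends on your data, so no output is shown.

. * show the variables assigned to each user-defined list
. vl list cvars
. vl list fvars

. * show the factor-variable specification stored in the global macro
. macro list allvars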

Now we are ready to use telasso to estimate the average treatment effect (ATE). We assume a linear outcome model and a logit treatment model, the defaults. We type

. telasso (fev1p $allvars) (transtype $allvars)

Estimating lasso for outcome fev1p if tran~e = 0 using plugin method ...
Estimating lasso for outcome fev1p if tran~e = 1 using plugin method ...
Estimating lasso for treatment tran~e using plugin method ...
Estimating ATE ...

Treatment-effects lasso estimation    Number of observations      =        937
Outcome model:   linear               Number of controls          =        454
Treatment model: logit                Number of selected controls =          8

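------------------------------------------------------------------------------
             |               Robust
       fev1p | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATE          |
   transtype |
(BLT vs SLT) |   37.51841   .1606703   233.51   0.000     37.20351    37.83332
-------------+----------------------------------------------------------------
POmean       |
   transtype |
         SLT |    46.4938   .2021582   229.99   0.000     46.09757    46.89002
------------------------------------------------------------------------------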

If all the patients were to choose a BLT, the average FEV1% is expected to be about 38 percentage points higher than the average of 46% expected if all patients were to choose an SLT. Of the 454 candidate control variables, telasso selects only 8.
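
Two quick follow-ups to this output, offered as a sketch. The first is plain arithmetic on the reported estimates: the implied potential-outcome mean under BLT is the SLT mean plus the ATE. The second assumes the data and $allvars are still in memory and refits the model with the pomeans option, which reports both potential-outcome means directly with their own standard errors.

. * implied POM under BLT = SLT POM + ATE (46.4938 + 37.51841 = 84.01221)
. display 46.4938 + 37.51841

. * estimate both potential-outcome means directly
. telasso (fev1p $allvars) (transtype $allvars), pomeans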

It is also common to estimate the average treatment effect on the treated (ATET), which measures the effect among those who actually received the treatment. To estimate this value, we add the atet option.

. telasso (fev1p $allvars) (transtype $allvars), atet

Estimating lasso for outcome fev1p if tran~e = 0 using plugin method ...
Estimating lasso for outcome fev1p if tran~e = 1 using plugin method ...
Estimating lasso for treatment tran~e using plugin method ...
Estimating ATET ...

Treatment-effects lasso estimation    Number of observations      =        937
Outcome model:   linear               Number of controls          =        454
Treatment model: logit                Number of selected controls =          8

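------------------------------------------------------------------------------
             |               Robust
       fev1p | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
   transtype |
(BLT vs SLT) |   35.78157   .1831478   195.37   0.000     35.42261    36.14053
-------------+----------------------------------------------------------------
POmean       |
   transtype |
         SLT |   43.35214   1.268976    34.16   0.000     40.86499    45.83929
------------------------------------------------------------------------------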

For the patients who have a BLT, we expect the average FEV1% to be about 36 percentage points higher than it would be if all of them had chosen an SLT instead.

The estimates that we obtained above relied on a key assumption of lasso, the sparsity assumption, which requires that only a small number of the potential covariates are in the "true" model. We can use a double machine learning technique to allow for more covariates in the true model. To do this, we add the xfolds(5) option to split the sample into five groups and perform cross-fitting, and we add the resample(3) option to repeat the cross-fitting procedure over three resamples.

To guarantee that we can later reproduce the estimation results, we also set the random-number seed. We type

. set seed 12345671

. telasso (fev1p $allvars) (transtype $allvars), xfolds(5) resample(3) nolog

Treatment-effects lasso estimation    Number of observations       =       937
                                      Number of controls           =       454
                                      Number of selected controls  =        16
Outcome model:   linear               Number of folds in cross-fit =         5
Treatment model: logit                Number of resamples          =         3

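------------------------------------------------------------------------------
             |               Robust
       fev1p | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATE          |
   transtype |
(BLT vs SLT) |   37.52837   .1683194   222.96   0.000     37.19847    37.85827
-------------+----------------------------------------------------------------
POmean       |
   transtype |
         SLT |    46.4941   .2040454   227.86   0.000     46.09418    46.89402
------------------------------------------------------------------------------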

The estimated treatment effect is very similar to the one reported by the first telasso command, but the selected model included 16 controls instead of 8. The similarity of the estimates across the different specifications suggests that our first model did not suffer from a violation of the sparsity assumption.
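
To try cross-fitting without the transplant data, the following sketch simulates a dataset in which the true ATE is 2 by construction and fits telasso with and without cross-fitting. Everything in it is an illustrative assumption: the variable names (y, treat, x1 through x30), the sample size, the seed, and the data-generating process are not part of the example above.

. clear
. set seed 2021
. set obs 2000

. * 30 independent standard-normal candidate controls
. drawnorm x1-x30

. * treatment depends on x1 and x2; outcome depends on treat, x1, and x3
. generate byte treat = runiform() < invlogit(0.5*x1 - 0.5*x2)
. generate y = 1 + 2*treat + x1 + x3 + rnormal()

. * default plugin lasso, first without and then with cross-fitting
. telasso (y x1-x30) (treat x1-x30), nolog
. telasso (y x1-x30) (treat x1-x30), xfolds(5) resample(3) nolog

Because only a few of the 30 candidates matter in this design, sparsity holds, and both ATE estimates should land close to 2. A noticeable gap between the two in real data would be a reason to look more closely at the sparsity assumption.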


Additional resources

See more examples and information on telasso in [TE] telasso.

Learn more about treatment effects in the Stata Treatment Effects Reference Manual.

Learn more about lasso in the Stata Lasso Reference Manual.