Title:  Why is there no intercept in lasso inferential commands? Is it possible to get an intercept?
Author: Miguel Dorta, StataCorp
The lasso inferential commands implement three lasso-based methods for estimating the coefficients and standard errors of specified variables of interest and for selecting from potential control covariates to be included in the model. The methods are double selection, partialing-out, and cross-fit partialing-out. For each of them, there are commands for linear, logistic, and Poisson regression models. Also, for both partialing-out and cross-fit partialing-out, there is a command for instrumental-variable linear regression.
All the implemented methods perform a lasso stage, where multiple lasso models are fit to select controls, and a final estimation stage, where the coefficients and standard errors for the variables of interest are computed. The intercept is regarded as one of the controls (treated as always included); therefore, if an intercept is also added as a variable of interest, it will be perfectly collinear with the intercept in the controls. We do not report the selected controls because their standard errors would not be valid. Therefore, we do not report the intercept.
For the double-selection lasso regression commands (dsregress, dslogit, and dspoisson), a point estimate of the intercept can be computed in the final estimation stage.
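To see why the intercept is recoverable in the final estimation stage, note that this stage is just an ordinary regression of the outcome on the variables of interest plus the lasso-selected controls, so refitting that regression with a constant term yields an intercept estimate. The sketch below illustrates this idea outside Stata, using numpy and simulated data; the variable names and the true coefficient values are illustrative assumptions, not part of the FAQ's examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
d = rng.normal(size=(n, 2))        # variables of interest (illustrative)
x_sel = rng.normal(size=(n, 3))    # controls the lasso stage selected (illustrative)
# Simulated outcome with a true intercept of 2.0 (an assumption for this sketch).
y = (2.0 + d @ np.array([1.5, -0.7])
     + x_sel @ np.array([0.3, 0.0, -0.2])
     + rng.normal(size=n))

# Final estimation stage: OLS of y on [variables of interest, selected controls,
# constant]. The last coefficient is the point estimate of the intercept.
X = np.column_stack([d, x_sel, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept = beta[-1]
```

With enough observations, `intercept` lands near the true value used in the simulation; this mirrors what refitting the final-stage regression with `regress`, `logit`, or `poisson` does in the examples below.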
In the examples below, we show you how to compute a point estimate of the intercept. We begin by loading one of the datasets used in the documentation.
. use https://www.stata-press.com/data/r17/breathe, clear
(Nitrogen dioxide and attention)
Next we create a global macro for a list of potential control covariates using factor-variable syntax, which implies 41 potential control covariates.
. global controlvars i.(sex grade overweight feducation msmoke)##c.(sev_home age)
In the examples below, we will be using the same dataset and the global macro controlvars.
Example 1: dsregress
We fit a double-selection linear model for the react variable, specifying no2_class and no2_home as variables of interest and the controls from the global macro controlvars.
. dsregress react no2_class no2_home, controls($controlvars)

Estimating lasso for react using plugin
Estimating lasso for no2_class using plugin
Estimating lasso for no2_home using plugin

Double-selection linear model         Number of obs               =  1,053
                                      Number of controls          =     41
                                      Number of selected controls =      7
                                      Wald chi2(2)                =  20.99
                                      Prob > chi2                 = 0.0000

------------------------------------------------------------------------------
             |               Robust
       react | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   no2_class |    1.94622   .4248716     4.58   0.000     1.113487    2.778953
    no2_home |  -.3717156   .2445907    -1.52   0.129    -.8511047    .1076734
------------------------------------------------------------------------------
The three "Estimating lasso" messages indicate that dsregress performed the lasso stage. The corresponding list of selected controls is stored in the macro e(controls_sel). This allows us to use regress to reproduce the point estimates of the final estimation stage, where the intercept takes on the value 904.10475.
. quietly regress react no2_class no2_home `e(controls_sel)' if e(sample), vce(robust)

. estimates store rep_dsreg

. etable, estimates(dsregress rep_dsreg) column(estimates) keep(no2_class no2_home _cons) cstat(_r_b, nformat(%9.5f)) novarlab
-------------------------------------------------
                        dsregress       rep_dsreg
-------------------------------------------------
no2_class                 1.94622         1.94622
no2_home                 -0.37172        -0.37172
_cons                                   904.10475
Number of observations       1053            1053
-------------------------------------------------
Example 2: dslogit
Here we fit a double-selection logit model for the lbweight variable, specifying indicators for meducation as variables of interest, and we use the same potential controls from Example 1.
. dslogit lbweight i.meducation, controls($controlvars)

Estimating lasso for lbweight using plugin
Estimating lasso for 2bn.meducation using plugin
Estimating lasso for 3bn.meducation using plugin
Estimating lasso for 4bn.meducation using plugin

Double-selection logit model          Number of obs               =  1,058
                                      Number of controls          =     41
                                      Number of selected controls =      6
                                      Wald chi2(3)                =   1.70
                                      Prob > chi2                 = 0.6361

------------------------------------------------------------------------------
             |               Robust
    lbweight | Odds ratio   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  meducation |
    Primary  |   .3385649   .4093585    -0.90   0.370     .0316559    3.621004
  Secondary  |   .2286818   .2718619    -1.24   0.215     .0222487     2.35049
 University  |   .2514901   .3000166    -1.16   0.247     .0242703    2.605953
------------------------------------------------------------------------------
Similarly, dslogit performed a corresponding lasso stage and stored the selected controls in the macro e(controls_sel). So we can use logit to reproduce the point estimates of the final estimation stage, where the value for the implicit intercept is 0.09494.
. quietly logit lbweight i.meducation `e(controls_sel)' if e(sample), or vce(robust)

. estimates store rep_dslog

. etable, estimates(dslogit rep_dslog) column(estimates) keep(meducation _cons) cstat(_r_b, nformat(%9.5f)) novarlab
-------------------------------------------------
                          dslogit       rep_dslog
-------------------------------------------------
meducation
  Primary                 0.33856         0.33856
  Secondary               0.22868         0.22868
  University              0.25149         0.25149
_cons                                     0.09494
Number of observations       1058            1058
-------------------------------------------------
Example 3: dspoisson
Now we fit a double-selection Poisson model for the correct variable, specifying no2_class and no2_home as variables of interest, and we specify the same potential controls as in the previous examples.
. dspoisson correct no2_class no2_home, controls($controlvars)

Estimating lasso for correct using plugin
Estimating lasso for no2_class using plugin
Estimating lasso for no2_home using plugin

Double-selection Poisson model        Number of obs               =  1,053
                                      Number of controls          =     41
                                      Number of selected controls =      3
                                      Wald chi2(2)                =   9.36
                                      Prob > chi2                 = 0.0093

------------------------------------------------------------------------------
             |               Robust
     correct |        IRR   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   no2_class |   .9993293   .0002192    -3.06   0.002     .9988997    .9997591
    no2_home |   1.000062   .0000966     0.64   0.521     .9998728    1.000251
------------------------------------------------------------------------------
Analogously, dspoisson performed a corresponding lasso stage and stored the selected controls in the macro e(controls_sel). We can then use poisson to reproduce the point estimates of the final estimation stage, where the value of the implicit intercept is 111.52364.
. quietly poisson correct no2_class no2_home `e(controls_sel)' if e(sample), irr vce(robust)

. estimates store rep_dspoi

. etable, estimates(dspoisson rep_dspoi) column(estimates) keep(no2_class no2_home _cons) cstat(_r_b, nformat(%9.5f)) novarlabel
-------------------------------------------------
                        dspoisson       rep_dspoi
-------------------------------------------------
no2_class                 0.99933         0.99933
no2_home                  1.00006         1.00006
_cons                                   111.52364
Number of observations       1053            1053
-------------------------------------------------
The partialing-out commands are poregress, pologit, popoisson, and poivregress. The cross-fit partialing-out commands are xporegress, xpologit, xpopoisson, and xpoivregress. For all of these commands, the final estimation stage is performed on partial-covariate variables (zero-mean residuals). Therefore, if an intercept were included in the final estimation stage, its value would be zero, as the next example shows. Consequently, an intercept in terms of the original covariates cannot be computed from those commands.
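The zero-intercept claim follows from the Frisch-Waugh-Lovell logic: residuals from a regression that includes a constant have mean zero, and a regression of one mean-zero variable on another passes through the origin. The numpy sketch below demonstrates this with simulated data; the data-generating process and coefficient values are assumptions made for illustration, not the breathe dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=(n, 3))                                # selected controls (illustrative)
d = x @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)    # variable of interest
y = 3.0 + 1.5 * d + x @ np.array([0.4, 0.0, -0.3]) + rng.normal(size=n)

def residualize(v, X):
    """Residuals of v after OLS on X plus a constant; mean zero by construction."""
    Z = np.column_stack([X, np.ones(len(v))])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

# Partialing-out stage: residualize the outcome and the variable of interest.
res_y = residualize(y, x)
res_d = residualize(d, x)

# Final stage: regress residualized y on residualized d plus a constant.
# The slope recovers the coefficient on d; the constant is numerically zero.
F = np.column_stack([res_d, np.ones(n)])
coef, *_ = np.linalg.lstsq(F, res_y, rcond=None)
slope, intercept = coef
```

Because both residual series have mean zero, the fitted constant equals mean(res_y) - slope * mean(res_d) = 0 up to floating-point error, which is exactly the behavior the poregress example below exhibits.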
Example 4: poregress
The code below reproduces the point estimates of poregress, demonstrating that, if an intercept were included, its value would be virtually zero.
poregress react no2_class no2_home, controls($controlvars)
estimates store poregress
mark touse if e(sample)
local sel1 `e(lasso_selected_1)'
local sel2 `e(lasso_selected_2)'
local sel3 `e(lasso_selected_3)'
quietly {
    regress react `sel1' if touse
    predict double res_react, residual
    regress no2_class `sel2' if touse
    predict double res_no2_class, residual
    regress no2_home `sel3' if touse
    predict double res_no2_home, residual
    regress res_react res_no2_class res_no2_home if touse, vce(robust)
    estimates store rep_poreg
}
etable, estimates(poregress rep_poreg) column(estimates) cstat(_r_b, nformat(%9.5f)) novarlab
And this is the output of the last command:
. etable, estimates(poregress rep_poreg) column(estimates) cstat(_r_b, nformat(%9.5f)) novarlab
-------------------------------------------------
                        poregress       rep_poreg
-------------------------------------------------
no2_class                 1.91259
no2_home                 -0.35376
res_no2_class                             1.91259
res_no2_home                             -0.35376
_cons                                     0.00000
Number of observations       1053            1053
-------------------------------------------------
For the other po and xpo commands, manually reproducing the final estimation stage is not as straightforward as it is for poregress. That said, partial-covariate variables are used in a similar way, so there can be no intercept.