
This page announced the new features in Stata 17. Please see our Stata 19 page for the new features in Stata 19.

BIC for lasso penalty selection

Highlights

BIC penalty parameter selection with lasso for prediction

  Lasso
  Square-root lasso
  Elastic net

BIC penalty parameter selection with lasso for inference

  Partialing-out estimators
  Cross-fit partialing-out estimators
  Double-selection estimators

BIC penalty parameter with treatment-effect estimation with lasso

Plot the BIC function

Selection of the penalty parameter is fundamental to lasso analysis. Choose a small penalty parameter, and you risk including too many variables in your model. Choose a large one, and you might exclude important variables.
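
To make the role of the penalty parameter concrete, here is a sketch of the objective that lasso minimizes for a linear model, assuming the standard textbook formulation. The symbols N (number of observations), p (number of candidate covariates), and λ (the penalty parameter) are our notation rather than this page's, and software implementations may add per-coefficient penalty weights that are omitted here.

```latex
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta_0,\,\beta}
\left\{ \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i \beta' \right)^2
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

A larger λ makes nonzero coefficients more costly, so more of them are set exactly to zero; a smaller λ lets more covariates enter the model.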

Now, we can use the Bayesian information criterion (BIC) to select the penalty parameters in lasso-related commands for both prediction and inference.

For prediction, we can choose the penalty parameters by minimizing BIC in lasso, elasticnet, and sqrtlasso. For inference, we can also choose penalty parameters by minimizing BIC in dsregress, dslogit, dspoisson, poregress, pologit, popoisson, poivregress, xporegress, xpologit, xpopoisson, xpoivregress, and telasso.

After lasso with BIC penalty parameter selection, we can plot the BIC function, which shows the value of BIC at each penalty parameter in the grid. The plot also marks the point at which BIC is minimized, which corresponds to the selected penalty parameter.
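
As a sketch of what "minimizing BIC" means here, assume the usual definition of BIC applied to the lasso fit at each grid point. In the notation below (ours, not this page's), L is the likelihood evaluated at the lasso coefficient estimates for a given λ, df(λ) is the number of nonzero coefficients at that λ, and N is the number of observations; see the Stata Lasso Reference Manual for the exact formula that each command uses.

```latex
\mathrm{BIC}(\lambda) \;=\; -2 \log L\{\hat{\beta}(\lambda)\} \;+\; df(\lambda)\,\log N,
\qquad
\lambda^{*} \;=\; \arg\min_{\lambda \,\in\, \text{grid}} \mathrm{BIC}(\lambda)
```

The first term rewards fit and the second penalizes model size, so the selected λ balances the two.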

To choose the penalty parameters based on BIC, just specify option selection(bic).

For a linear model for y, with candidate covariates x1-x100, to use BIC for selection, we type

. lasso linear y x1-x100, selection(bic)

To look at the fitted BIC function plot, we type

. bicplot

Using double selection to estimate and test the effect of d1 on y, with control variables x1 to x100, is equally simple; we type

. dsregress y d1, controls(x1-x100) selection(bic)

Again, we may use bicplot afterward.

Let's see it work

Using BIC in lasso for prediction

Datasets used with lasso typically have many variables. To get started, we use the variable management tool vl to save ourselves from typing many variable names manually.

. use https://www.stata-press.com/data/r17/fakesurvey_vl
(Fictitious survey data with vl)

. vl rebuild
Rebuilding vl macros ...



                                    Macro's contents                          
             
Macro           # Vars   Description                                          

System                                                                        
  $vldummy              98   0/1 variables                                    
  $vlcategorical        16   categorical variables                            
  $vlcontinuous         29   continuous variables                             
  $vluncertain          16   perhaps continuous, perhaps categorical variables
  $vlother              12   all missing or constant variables                
User                                                                          
  $demographics          4   variables                                        
  $factors             110   variables                                        
  $idemographics             factor-variable list                             
  $ifactors                  factor-variable list

vl created a set of global macros, each one with a set of variables that we can use during estimation. vl makes life easier when you are dealing with large sets of covariates.

Next, we use splitsample to split the data into training data and testing data. The training data will be used to fit the lasso model, and the testing data will be used to evaluate the fitted model's prediction performance.

. set seed 12345671
. splitsample, generate(sample) nsplit(2)
. label define svalues 1 "Training" 2 "Testing"
. label values sample svalues
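
For readers who want to see the idea behind a two-way random split outside Stata, here is a minimal Python sketch. It assumes the split assigns observations at random to groups of (nearly) equal size. The function name split_sample is hypothetical, and Python's random number generator will not reproduce the assignment that Stata makes with the same seed.

```python
import random

def split_sample(n_obs, n_split=2, seed=12345671):
    """Assign each of n_obs observations to one of n_split groups at random.

    Group sizes differ by at most one. Labels run from 1 to n_split,
    mirroring the 1 = Training, 2 = Testing labels used in the text.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    labels = [(i % n_split) + 1 for i in range(n_obs)]  # balanced labels
    rng.shuffle(labels)                # random assignment to observations
    return labels

sample = split_sample(10)
print(sample.count(1), "training observations,", sample.count(2), "testing observations")
```

Building a balanced list of labels first and then shuffling it guarantees equal group sizes, which drawing each label independently would not.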

Now, we are ready to fit a lasso model on the training data (sample == 1), using BIC to select the penalty parameter. To do that, we specify the selection(bic) option.

. lasso linear q104 ($idemographics) $ifactors $vlcontinuous
> if sample == 1, selection(bic)
Evaluating up to 100 lambdas in grid ...
Grid value 1:     lambda = 1.059075   no. of nonzero coef. =       4
                  BIC =  2653.83
Grid value 2:     lambda =   .96499   no. of nonzero coef. =       5
                  BIC = 2654.907
        ...(output omitted)...
Grid value 17:    lambda = .2390354   no. of nonzero coef. =      44
                  BIC = 2663.639
... selection BIC complete ... minimum found

Lasso linear model                          No. of obs        =        458
                                            No. of covariates =        273
Selection: Bayesian information criterion



                                          No. of                          
                                         nonzero    Out-of-sample         
      ID       Description      lambda     coef.    R-squared          BIC

       1      first lambda    1.059075         4       0.0339      2653.83
      10     lambda before    .4584484        17       0.2552     2614.289
    * 11   selected lambda    .4177211        18       0.2806     2604.524
      12      lambda after    .3806119        21       0.3066     2606.103
      17       last lambda    .2390354        44       0.4220     2663.639

 * lambda selected by Bayesian information criterion

The penalty parameter selected by minimizing BIC was 0.42 (ID 11), which leaves 18 nonzero coefficients.

We can look at the fitted BIC function plot by typing bicplot.

. bicplot

The BIC function decreases quickly before the minimum at λ=0.42.

Using BIC in dsregress for inference

Suppose we are interested in knowing the effect of air pollution (no2_class) on children's reaction time (react), controlling for covariates. However, we are uncertain about which control variables to include in the model. We can use dsregress to consistently estimate the coefficient on no2_class while using lasso to select control variables.

We specify the selection(bic) option to use BIC to select the penalty parameter in each lasso performed by dsregress. We include a set of 32 controls stored in the global macros cc and fc.

. dsregress react no2_class, controls($cc i.($fc)) selection(bic)

Estimating lasso for react using BIC
Estimating lasso for no2_class using BIC

Double-selection linear model         Number of obs               =      1,036
                                      Number of controls          =         32
                                      Number of selected controls =         11
                                      Wald chi2(1)                =      22.18
                                      Prob > chi2                 =     0.0000

                            Robust
       react   Coefficient  std. err.      z    P>|z|     [95% conf. interval]

   no2_class      2.315295   .4916547     4.71   0.000      1.35167    3.278921

We see that 11 of 32 controls are selected. Our point estimate for the effect of nitrogen dioxide on reaction time is 2.3, meaning that we expect reaction time to go up by 2.3 milliseconds for each microgram per cubic meter increase in nitrogen dioxide. This estimate is statistically significantly different from 0 (z = 4.71, p < 0.001).
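
The test statistics in the output can be reproduced from the coefficient and its robust standard error. The Python sketch below assumes a normal-based (z) test and confidence interval, as the column labels in the output suggest; the variable names are our own.

```python
from statistics import NormalDist

coef = 2.315295       # coefficient on no2_class, from the output above
std_err = 0.4916547   # robust standard error, from the output above

z = coef / std_err                             # z statistic
wald_chi2 = z ** 2                             # Wald chi2(1) for a single coefficient
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
crit = NormalDist().inv_cdf(0.975)             # about 1.96 for a 95% interval
ci_low = coef - crit * std_err
ci_high = coef + crit * std_err

print(f"z = {z:.2f}, Wald chi2(1) = {wald_chi2:.2f}, p = {p_value:.4f}")
print(f"95% conf. interval: [{ci_low:.5f}, {ci_high:.6f}]")
```

With only one coefficient of interest, the Wald chi-squared statistic is the square of z, which is why the header and the coefficient table tell the same story.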

dsregress actually ran two lassos, one for react and one for no2_class. We can plot the BIC function for both lassos by typing

. bicplot, for(react)

and

. bicplot, for(no2_class)
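
To see why there are two lassos, here is a sketch of the double-selection procedure as it is usually described. The set names and coefficient symbols are our notation, and the exact computations are documented in the Stata Lasso Reference Manual.

```latex
\begin{aligned}
\hat{S}_y &= \{\, j : \text{the lasso of } \texttt{react} \text{ on the controls selects } x_j \,\} \\
\hat{S}_d &= \{\, j : \text{the lasso of } \texttt{no2\_class} \text{ on the controls selects } x_j \,\} \\
\texttt{react}_i &= \gamma_0 + \alpha\,\texttt{no2\_class}_i
  + \sum_{j \,\in\, \hat{S}_y \cup \hat{S}_d} \gamma_j\, x_{ij} + \varepsilon_i
\end{aligned}
```

The final step is an ordinary regression on the variable of interest and the union of the two selected sets, and α is the coefficient reported for no2_class. Under this reading, the 11 selected controls reported above are the controls in that union.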

Additional resources

Learn more about Stata's lasso features.

Read more about lasso in the Stata Lasso Reference Manual.

See [LASSO] bicplot for more examples and information on BIC for lasso.
