Machine learning via H2O: Ensemble decision trees


Stata's integration of H2O machine learning provides a scalable framework for applying modern machine learning techniques. Interact with an H2O cluster from within Stata to train and evaluate predictive models while using Stata's extensive data management capabilities. Use the suite of h2oml commands, or let the Control Panel interface guide you through your end-to-end data-analysis process.

Learn about H2O machine learning in Stata.

Watch Random forest and gradient boosting machine via H2O.

Ensemble decision trees: Gradient boosting machine (GBM) and random forest

GBM for regression for continuous and count responses
GBM for binary classification
GBM for multiclass classification
Random forest for regression
Random forest for binary classification
Random forest for multiclass classification
Many loss functions for GBM models
Many encoding schemes for categorical variables
Monotonicity constraints on predictors in GBM models
Model selection using cross-validation
Early stopping

Hyperparameter tuning

Select best-performing model by tuning

Number of trees
Learning rate of each tree in GBM models
Learning rate decay in GBM models
Maximum depth of each tree
Minimum number of observations for splitting a leaf node
Sampling rate for selecting predictor subset per tree in GBM
Sampling value for selecting the number of predictors in random forest
Sampling rate for selecting observations per tree
Minimum node-split threshold
Number of histogram bins for continuous and categorical predictors
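
To illustrate how the learning rate and its decay interact in a GBM, here is a minimal Python sketch. It assumes the decay is a constant factor that multiplies the learning rate after each tree, which is the convention H2O documents for its annealing parameter. The function and argument names are hypothetical and are not h2oml options.

```python
def effective_learning_rates(learn_rate, decay, ntrees):
    """Return the learning rate applied to each successive tree.

    Assumption: tree t (counting from 0) uses learn_rate * decay**t.
    """
    return [learn_rate * decay ** t for t in range(ntrees)]

# Illustrative values only: a rate of 0.1 that shrinks by 1% per tree.
rates = effective_learning_rates(learn_rate=0.1, decay=0.99, ntrees=5)
for t, r in enumerate(rates):
    print(f"tree {t}: rate {r:.5f}")
```

With a decay below 1, later trees contribute less, so a model with decay typically needs more trees to reach the same fit.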

Many tuning metrics for regression and classification analysis
Two grid-search methods: Cartesian and random
Different early-stopping methods for random grid search
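
The difference between the two grid-search methods can be sketched in Python. This is a conceptual illustration, not Stata's or H2O's implementation: the search space and its key names are hypothetical, and a cap on the number of models stands in for just one possible stopping rule for random search.

```python
import itertools
import random

# Hypothetical search space; the keys are illustrative, not h2oml option names.
space = {
    "ntrees": [50, 100, 200],
    "max_depth": [3, 5],
    "learn_rate": [0.05, 0.1],
}

def cartesian_grid(space):
    """Cartesian search: every combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]

def random_grid(space, max_models, seed=0):
    """Random search: sample distinct combinations, stopping at max_models."""
    grid = cartesian_grid(space)
    rng = random.Random(seed)
    return rng.sample(grid, min(max_models, len(grid)))

full = cartesian_grid(space)
subset = random_grid(space, max_models=4)
print(len(full), "candidate models in the Cartesian grid")
print(len(subset), "candidate models in the random grid")
```

Cartesian search grows multiplicatively with each added hyperparameter, which is why random search with a stopping rule is often preferred for large spaces.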

Tuning and estimation summaries

Display various model performance metrics
Summarize cross-validation results
Summarize results from hyperparameter grid search
Select the best model after performing a grid search
Explore alternative models after grid search
Compare goodness of fit for machine learning models
Plot score history

Model performance evaluation

Binary classification

Display a confusion matrix
Display threshold-based metrics
Produce receiver operating characteristic (ROC) curve plot
Produce precision–recall curve plot
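
As a rough illustration of what a confusion matrix and threshold-based metrics summarize, the following Python sketch classifies observations by comparing predicted probabilities with a threshold. The data are made up, and the function names are hypothetical rather than part of h2oml.

```python
def confusion_counts(y_true, p_hat, threshold=0.5):
    """Count true/false positives and negatives at a probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def threshold_metrics(y_true, p_hat, threshold=0.5):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, p_hat, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up outcomes and predicted probabilities for eight observations.
y = [1, 0, 1, 1, 0, 0, 1, 0]
p = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
print(confusion_counts(y, p))
print(threshold_metrics(y, p))
```

Sweeping the threshold from 0 to 1 and recomputing these counts yields the points that ROC and precision–recall curves plot.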

Multiclass classification

Display a confusion matrix
Display area under the curve (AUC) and area under the precision–recall curve (AUCPR)
Display hit-ratio tables
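
A hit-ratio table reports, for each k, the share of observations whose true class is among the k classes with the highest predicted probability. The Python sketch below computes it on made-up data. The function name is hypothetical, and ties are broken by dictionary order, which is an assumption of this sketch.

```python
def hit_ratios(y_true, probs, max_k):
    """Return the top-k hit ratio for k = 1..max_k.

    probs is a list of dicts mapping class label -> predicted probability.
    """
    n = len(y_true)
    ratios = []
    for k in range(1, max_k + 1):
        hits = 0
        for y, row in zip(y_true, probs):
            # Rank classes from most to least probable for this observation.
            ranked = sorted(row, key=row.get, reverse=True)
            if y in ranked[:k]:
                hits += 1
        ratios.append(hits / n)
    return ratios

# Made-up true classes and predicted probabilities for four observations.
y = ["a", "b", "c", "b"]
probs = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.5, "b": 0.3, "c": 0.2},
    {"a": 0.6, "b": 0.3, "c": 0.1},
    {"a": 0.1, "b": 0.8, "c": 0.1},
]
for k, r in enumerate(hit_ratios(y, probs, max_k=3), start=1):
    print(f"k={k}: hit ratio {r:.2f}")
```

The k=1 ratio equals ordinary classification accuracy, and the ratio reaches 1 once k equals the number of classes.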

Postestimation frame and estimation results

Define frame for postestimation analysis
Store and restore model estimation results

Prediction

Predictions of fitted values after regression
Class predictions after classification
Predicted probabilities for outcome levels after classification

Machine learning explainability

Shapley additive explanation (SHAP) value plots for interpretability

SHAP beeswarm plots

Partial dependence plots (PDPs)
Individual conditional expectation (ICE) plots
Variable importance plots

Decision tree analysis

Save decision tree structures as DOT files and display rule sets

Control panel

Additional resources

Machine Learning in Stata Using H2O Reference Manual

See New in Stata 19 to learn about what was added in Stata 19.

