Start, connect, and query an H2O cluster

Syntax

Start a new H2O cluster or connect to an existing H2O cluster

    h2o init [, init_options]

Connect to an existing H2O cluster

    h2o connect [, connect_options]

Close the connection to an existing H2O cluster

    h2o disconnect

Shut down the H2O cluster

    h2o shutdown [, force]

Query current H2O cluster information

    h2o query [, detail]

Query current H2O cluster credentials information

    h2o credentials query

Clear H2O cluster credentials information

    h2o credentials clear

Enable or disable the H2O job progress display

    h2o set progress { on | off }

Set the time zone on the H2O cluster

    h2o set timezone tz

List the time zones accepted by the H2O cluster

    h2o list timezones [pattern]

pattern is one of the following: *, _all, *name*, *name, or name*. Specifying nothing, _all, or * lists all results. Specifying *name* lists all results containing name. Specifying *name lists all results ending with name. Specifying name* lists all results starting with name.
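For illustration, the three pattern forms might be used as follows. The patterns are hypothetical examples, and the zones returned depend on the time zone names the H2O cluster actually reports.

 Time zones whose names start with America
     . h2o list timezones America*

 Time zones whose names end with York
     . h2o list timezones *York

 Time zones whose names contain Pacific
     . h2o list timezones *Pacific*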

 init_options                              Description
 -----------------------------------------------------------------------------------
 port(#)                                   port number the cluster listens on
 nthreads(#)                               number of threads to use
 -----------------------------------------------------------------------------------
 
 connect_options                           Description
 -----------------------------------------------------------------------------------
 url(string)                               full URL of the cluster to connect to
 ip(string)                                IP address where the cluster is running
 port(#)                                   port number the cluster listens on
 username(string)                          username used to connect
 password(string)                          password used to connect
 -----------------------------------------------------------------------------------

Description

h2o provides utilities for accessing H2O from within Stata. H2O is a scalable and distributed open-source machine learning and predictive analytics platform. With these utilities, users can start or connect to an H2O cluster to access H2O’s capabilities. See H2O intro for more discussion about H2O clusters.

h2o init attempts to connect to a local H2O cluster by default. If none is found, it starts a new local H2O cluster and connects to it. The first time a local H2O cluster is created using h2o init, random credentials are generated automatically to secure the cluster with HTTP Basic Authentication. These credentials are saved automatically with Stata’s preferences and are reused by h2o init in future H2O sessions. See the Technical note below for details about credentials.

h2o connect connects to an existing local or remote H2O cluster. The cluster is specified either by an IP address and port number or by a URL.

h2o disconnect closes the connection to the H2O cluster. The cluster keeps running, and you can reconnect to it using h2o connect. See Close and disconnect from the H2O cluster for more information.

h2o shutdown shuts down the H2O cluster from within Stata. The cluster is completely destroyed and all the resources within it are discarded. See Close and disconnect from the H2O cluster for more information.

h2o query lists the current H2O cluster information.

h2o credentials query lists the credentials currently in use. See the Technical note below for more details.

h2o credentials clear clears the stored credentials. See the Technical note below for more details.

h2o set progress specifies whether to display the progress of H2O jobs. Progress is displayed as a percentage of completion.
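For example, the progress display could be turned off for a quieter log and turned back on afterward (a sketch; the job run in between is omitted):

 Turn off the progress display
     . h2o set progress off

 Turn it back on
     . h2o set progress on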

h2o set timezone sets the time zone on the H2O cluster.
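For example, assuming America/New_York appears among the names listed by h2o list timezones, the cluster time zone could be set and then checked through the stored result r(timezone) of h2o query:

 Set the cluster time zone
     . h2o set timezone America/New_York

 Confirm the change
     . h2o query
     . display "`r(timezone)'"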

h2o list timezones lists all time zones accepted by the H2O cluster, along with their aliases, or only those that match pattern.

Options

Options for h2o init

port(#) specifies the port number the H2O cluster listens on. The default is 54321. It must be an integer between 1 and 65535.

nthreads(#) specifies the maximum number of parallel threads to use when launching the H2O cluster. This option is used only when Stata starts a local H2O cluster.
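A sketch of launching a local cluster with both options. The values are illustrative assumptions: port 54322 is assumed to be free on the local machine, and 4 threads is an arbitrary choice.

 Start a local cluster on port 54322 using at most 4 threads
     . h2o init, port(54322) nthreads(4)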

Options for h2o connect

url(string) specifies the full URL of the H2O cluster to connect to. There are two ways to connect to an existing H2O cluster: by specifying a full URL of the form ip:port with url(), or by specifying the IP address and port number with ip() and port(). If none of these options is specified, h2o connect checks whether an H2O cluster is running at localhost (IP address 127.0.0.1) on port 54321. If so, h2o connect connects to it; otherwise, an error is issued. url() may not be specified with ip() or port().

ip(string) specifies the IP address where the H2O cluster is running. The address is specified as a string of the form #.#.#.#. ip() may not be specified with url().

If ip() is specified and an H2O cluster is running at that address, h2o connect tries to connect to it. If the connection is not successful, h2o connect issues an error.

port(#) specifies the port number the H2O cluster listens on. The default is 54321. It must be an integer between 1 and 65535. port() may not be specified with url().
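For example, to connect to a remote cluster by IP address and port number. The address 192.168.1.10 is hypothetical; replace it with the address of your own cluster.

 Connect to a cluster running at 192.168.1.10 on port 54321
     . h2o connect, ip(192.168.1.10) port(54321)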

username(string) specifies the username to be used to connect to the H2O cluster. This option is used when the H2O cluster is secured with HTTP Basic Authentication. See the Technical note below for more details.

password(string) specifies the password to be used to connect to the H2O cluster. This option is used when the H2O cluster is secured with HTTP Basic Authentication. See the Technical note below for more details.
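For a cluster secured with HTTP Basic Authentication, the credentials are passed along with the address. The address, username, and password below are placeholders, not real values.

 Connect to a secured remote cluster
     . h2o connect, ip(192.168.1.10) port(54321) username(myuser) password(mypassword)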

Options for h2o shutdown

force forces the H2O cluster to shut down from Stata.

Without force, h2o shutdown fails and issues the warning “…Shutting it down will discard all resources within the cluster…”. This is because shutting down the H2O cluster destroys the H2O process that runs the cluster, not just the connection to it. Specifying force shuts down the H2O cluster and discards everything within it.
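The difference can be seen by issuing the command with and without force (a sketch; the first command is expected to fail and leave the cluster running):

 Without force, a warning is issued and the cluster keeps running
     . h2o shutdown

 With force, the cluster and everything in it are discarded
     . h2o shutdown, force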

See Close and disconnect from the H2O cluster for more discussion.

Options for h2o query

detail specifies that summary information about each node in the cluster be displayed in addition to the H2O cluster information.

Technical note

The first time a local H2O cluster is created using h2o init, random credentials are generated automatically to secure the cluster with HTTP Basic Authentication. These credentials are saved automatically with Stata’s preferences and are reused by h2o init in future H2O sessions. When connecting to a cluster created by h2o init, h2o connect also uses these credentials by default unless you specify other credentials with the username() and password() options.

If a cluster created by h2o init must be accessed from outside Stata, you will need these credentials. Type h2o credentials query to display the current credentials. To clear the credentials, type h2o credentials clear; new credentials are then generated automatically the next time h2o init is called.
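A sketch of inspecting and then regenerating the stored credentials. Shutting down the old cluster before clearing the credentials is an assumption of this sketch, made so that h2o init starts a new local cluster rather than connecting to the one already running.

 Display the current credentials
     . h2o credentials query
     . display "`r(username)'"

 Shut down the cluster, clear the credentials, and start a new cluster with new credentials
     . h2o shutdown, force
     . h2o credentials clear
     . h2o init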

If you need a new H2O session without credentials or with different credentials, start the H2O cluster from H2O’s command line and then use h2o connect in Stata to connect to it.
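For example, if an H2O cluster has been started from H2O’s command line on the local machine and listens on port 54323 (a hypothetical port), it could be reached from Stata as follows:

 Connect to a cluster started outside Stata
     . h2o connect, ip(127.0.0.1) port(54323)

If that cluster was started with its own credentials, add the username() and password() options.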

Examples

 Launch a local H2O cluster
     . h2o init

 Query the H2O cluster information
     . h2o query

 Same as above, but also display each node's information
     . h2o query, detail

 List all available time zones on the H2O cluster
     . h2o list timezones

 Same as above, but list all US time zones
     . h2o list timezones US*

 Close the connection to the H2O cluster and reconnect to it
     . h2o disconnect
     . h2o connect

 Shut down the H2O cluster
     . h2o shutdown, force

Stored results

 h2o query stores the following in r():

 Scalars
   r(nodes)            number of nodes in the H2O cluster
   r(total_cores)      total cores on the H2O cluster
   r(allowed_cores)    number of cores the client is allowed to use

 Macros
   r(url)              H2O connection URL
   r(version)          H2O version
   r(datatimezone)     H2O cluster data parsing time zone
   r(timezone)         H2O cluster time zone
   r(freemem)          H2O cluster free memory

 h2o credentials query stores the following in r():

 Macros
   r(username)         username currently used by the H2O cluster
   r(password)         password currently used by the H2O cluster
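Stored results can be used in subsequent commands. A minimal sketch, assuming an H2O cluster is already running and connected:

 Display all stored results, then one scalar and one macro
     . h2o query
     . return list
     . display r(nodes)
     . display "`r(version)'"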