Start, connect, and query an H2O cluster

Syntax

Start a new H2O cluster or connect to an existing H2O cluster

    h2o init [, init_options]

Connect to an existing H2O cluster

    h2o connect [, connect_options]

Shut down the H2O cluster

    h2o shutdown [, force]

Query current H2O cluster information

    h2o query [, detail]

Open H2O Flow UI

    h2o flow

Enable/disable H2O job progress bar

    h2o set progress { on | off }

Set the time zone on the H2O cluster

    h2o set timezone tz

List all time zones accepted by the H2O cluster

    h2o list timezones [pattern]

pattern is one of the following: *, _all, *name*, *name, or name*. Specifying nothing, _all, or * lists all results. Specifying *name* lists all results containing name. Specifying *name lists all results ending with name. Specifying name* lists all results starting with name.
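
As a sketch of the pattern forms above, the commands below use the names Europe, York, and Pacific purely as illustrative assumptions; which time zones actually match depends on what your H2O cluster reports.

    . * all time zones (same as specifying nothing or *)
    . h2o list timezones _all
    . * names starting with Europe (assumed prefix)
    . h2o list timezones Europe*
    . * names ending with York (assumed suffix)
    . h2o list timezones *York
    . * names containing Pacific (assumed substring)
    . h2o list timezones *Pacific*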

 init_options                              Description
 -----------------------------------------------------------------------------------
 ip(string)                                IP address where the cluster is running
 port(#)                                   port number the cluster listens to
 nthreads(#)                               number of threads to use
 novercheck                                specify not to check the cluster and
                                             client H2O versions
 -----------------------------------------------------------------------------------
 
 connect_options                           Description
 -----------------------------------------------------------------------------------
 url(string)                               full URL of the cluster to connect to
 ip(string)                                IP address where the cluster is running
 port(#)                                   port number the cluster listens to
 novercheck                                specify not to check the cluster and
                                             client H2O versions
 -----------------------------------------------------------------------------------

Description

h2o provides utilities for accessing H2O from within Stata. H2O is a scalable and distributed open-source machine learning and predictive analytics platform. With these utilities, users can start or connect to an H2O cluster to access H2O’s capabilities. See H2O intro for more discussion about H2O clusters.

h2o init attempts to connect to a local or remote H2O cluster by default. If one is not found, it starts a new local H2O cluster and connects to it. The remote cluster is specified by an IP address and a port number.

h2o connect connects to an existing local or remote H2O cluster. The remote cluster is specified by an IP address and a port number, or by a URL address.

h2o shutdown shuts down the H2O cluster or disconnects Stata from it.

h2o query lists the current H2O cluster information.

h2o flow opens H2O Flow UI in the browser.

h2o set progress sets whether to display the H2O execution progress. The execution progress is displayed as a percentage.

h2o set timezone sets the time zone on the H2O cluster.

h2o list timezones lists the time zones accepted by the H2O cluster, along with their aliases, or only those that match the specified pattern.

Options

Options for h2o init

ip(string) specifies the IP address where the H2O cluster is running. The address is specified as a string of format #.#.#.#.

By default, h2o init checks whether an H2O cluster is running at localhost:54321 (IP address 127.0.0.1). When ip() is specified, h2o init checks the specified address instead. If a cluster is found, h2o init tries to connect to it. If no cluster is found or the connection fails, h2o init launches a local H2O cluster running at localhost:54321.

port(#) specifies the port number the H2O cluster listens to. The default is 54321. It must be an integer between 1 and 65535.

nthreads(#) specifies the maximum number of parallel threads to use when launching the H2O cluster. This option is used only when Stata starts a local H2O cluster.

novercheck specifies not to check whether the H2O version on the cluster matches the H2O version that Stata, the client, uses.

When connecting to an existing H2O cluster, if the H2O versions do not match, a warning is displayed to indicate the differences. You can still interact with the remote cluster. A mismatch causes problems only when the REST API changed between the two versions in a way that causes a failure on the client side. novercheck suppresses both the check and the warning.

When a local H2O cluster is launched from within Stata, there will be no such problem and thus no warning is displayed.
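
As an illustration, the options can be combined as sketched below. The port 54322, the thread limit 4, and the address 192.168.1.10 are hypothetical values chosen for the example, not defaults.

    . * look for a cluster on an assumed nondefault port; if Stata has to
    . * start a local cluster, use at most 4 threads
    . h2o init, port(54322) nthreads(4)

    . * check a hypothetical remote address and skip the version check
    . h2o init, ip("192.168.1.10") port(54321) novercheck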

Options for h2o connect

url(string) specifies the full URL of the H2O cluster to connect to. There are two ways to identify an existing H2O cluster: by a full URL of the form ip:port, or by an IP address and port number. If neither is specified, h2o connect checks whether an H2O cluster is running at localhost:54321 (IP address 127.0.0.1). If there is one, h2o connect connects to it; otherwise, an error is issued. url() may not be specified with ip() or port().

ip(string) specifies the IP address where the H2O cluster is running. The address is specified as a string of format #.#.#.#. ip() may not be specified with url().

If ip() is specified and an H2O cluster is running at that address, h2o connect tries to connect to it. If the connection fails, h2o connect issues an error.

port(#) specifies the port number the H2O cluster listens to. The default is 54321. It must be an integer between 1 and 65535. port() may not be specified with url().

novercheck specifies not to check whether the H2O version on the cluster matches the H2O version that Stata, the client, uses.

When connecting to an existing H2O cluster, if the H2O versions do not match, a warning is displayed to indicate the differences. You can still interact with the remote cluster. A mismatch causes problems only when the REST API changed between the two versions in a way that causes a failure on the client side. novercheck suppresses both the check and the warning.
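
The two ways of identifying a cluster can be sketched as follows. The address 192.168.1.10 is a hypothetical placeholder; substitute the address of your own cluster.

    . * connect by URL, using the ip:port form described above
    . h2o connect, url("192.168.1.10:54321")

    . * the same hypothetical cluster, identified by IP address and port
    . h2o connect, ip("192.168.1.10") port(54321)

Only one of the two commands is needed in practice, because url() may not be combined with ip() or port().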

Options for h2o shutdown

force specifies that the H2O cluster be forced to shut down from Stata.

If the cluster was started locally by Stata through h2o init, then by default h2o shutdown fails with the message “Shutting it down will also close Stata”. Shutting down the H2O server destroys the Java virtual machine (JVM) and exits the application that initialized it. Because Stata initialized the JVM, Stata would close as well. Specifying the force option shuts down the cluster and exits Stata.

If the cluster exists on a remote machine and you connected to it by typing h2o connect in Stata, then h2o shutdown will close the H2O session between Stata and the cluster. Specifying the force option will shut down the remote H2O cluster.

See Close and disconnect from the H2O cluster for more discussion.
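
For a remote cluster, the two behaviors can be sketched as below. The address is a hypothetical placeholder.

    . h2o connect, ip("192.168.1.10") port(54321)
    . * close the H2O session between Stata and the cluster
    . h2o shutdown

To shut down the remote cluster itself, rather than only close the session, the last command would instead be

    . h2o shutdown, force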

Options for h2o query

detail displays summary information about the nodes within the cluster in addition to the H2O cluster information.

Examples

 Connect to a running local H2O cluster or, if none is found, launch one
     . h2o init

 Query the H2O cluster information
     . h2o query

 Same as above, but also display each node's information
     . h2o query, detail

 List all available time zones on the H2O cluster
     . h2o list timezones

 Same as above, but list only time zones whose names start with US
     . h2o list timezones US*
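
 Turn off the H2O job progress display
     . h2o set progress off

 Set the time zone on the H2O cluster (UTC is an assumed value used for
 illustration; choose a name reported by h2o list timezones)
     . * UTC is an assumption; confirm it appears in h2o list timezones
     . h2o set timezone UTC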

 Shut down the H2O cluster and exit Stata
     . h2o shutdown, force

Stored results

 h2o query stores the following in r():

 Scalars
   r(nodes)            number of nodes in the H2O cluster
   r(total_cores)      total cores on the H2O cluster
   r(allowed_cores)    number of cores the client is allowed to use

 Macros
   r(url)              H2O connection URL
   r(version)          H2O version
   r(datatimezone)     H2O cluster data parsing time zone
   r(timezone)         H2O cluster time zone
   r(freemem)          H2O cluster free memory
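
 As a sketch of how these results might be used, the commands below inspect
 them right after h2o query. return list and display are standard Stata
 commands; the values shown depend on your cluster.

     . h2o query
     . * list everything h2o query stored in r()
     . return list
     . * scalars can be displayed directly
     . display r(total_cores)
     . * macros are expanded inside quotes
     . display "`r(version)'"

 Other r-class commands overwrite r(), so copy any value you need into a
 local macro or scalar before running them.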