Import and export an H2O frame

Syntax

Import files into an H2O frame on the H2O cluster

    _h2oframe _import impath, into(newframename)
            [_h2oframe_import_options]

Upload local files into an H2O frame on the H2O cluster

    _h2oframe _upload uppath, into(newframename)
            [_h2oframe_upload_options]

Export current H2O frame to a delimited text file

    _h2oframe _export [using] filename [if] [in]
            [, _h2oframe_export_options]

Export subset of current H2O frame to a delimited text file

    _h2oframe _export [columnlist] using filename [if] [in]
            [, _h2oframe_export_options]

impath is the complete URL or normalized file path of the file(s) to be imported. impath can point to a single file or to a directory containing multiple files of the same format. If impath contains embedded spaces, enclose it in double quotes.

uppath is the normalized path of the local file to be uploaded. If uppath contains embedded spaces, enclose it in double quotes.

columnlist is a list of column names in the current H2O frame; see Specifying a list of columns for more information.

filename is the destination .csv file.

 _h2oframe_import_options                  Description
 -----------------------------------------------------------------------------------
 * into(newframename)                      destination H2O frame
   header(#)                               treat first line of data as data or column
                                             headers
   delimiter("char")                       use char as delimiter
   skipcols(numlist)                       skip the specified columns
   nastring(string)                        interpret the specified strings as
                                             missing values
   pattern(string)                         import files that match the regular
                                             expression; applies only if impath is a
                                             folder
 -----------------------------------------------------------------------------------
 * into() is required.
 
 _h2oframe_upload_options                  Description
 -----------------------------------------------------------------------------------
 * into(newframename)                      destination H2O frame
   header(#)                               treat first line of data as data or column
                                             headers
   delimiter("char")                       use char as delimiter
   skipcols(numlist)                       skip the specified columns
   nastring(string)                        interpret the specified strings as
                                             missing values
 -----------------------------------------------------------------------------------
 * into() is required.
 
 _h2oframe_export_options                  Description
 -----------------------------------------------------------------------------------
   replace                                 overwrite existing filename
 -----------------------------------------------------------------------------------

Description

_h2oframe _import loads files into an H2O cluster as an H2O frame. The data are loaded in parallel, using multithreading, which makes the import fast. The specified path can be a complete URL, a normalized path for the file(s), or a folder that contains the file(s). The path must be a valid cluster-side path; that is, every node in the H2O cluster must be able to access it.

_h2oframe _upload pushes a local file from disk to an H2O cluster as an H2O frame. In H2O jargon, it pushes data from the client to the cluster. The specified path must be a local path.

_h2oframe _export exports an existing H2O frame to a .csv file on the local disk. Make sure you have enough disk space to accommodate the destination file because the H2O frame on the H2O cluster may be very large.

Options

Options for _h2oframe _import

into(newframename) specifies the destination H2O frame into which the files are imported. into() is required.

header(#) specifies how to parse the first line of data. -1 means that the first line is parsed as data, 1 means that the first line is parsed as column headers, and 0 means that the parser guesses. The default is header(0).

delimiter("char") specifies the character that separates values in the file. For instance, if values in the file are separated by a semicolon, then you would specify delimiter(";"). Specify delimiter("\t") to use a tab character, or specify delimiter(" ") to use whitespace as a delimiter. The default is delimiter(",").

skipcols(numlist) specifies the columns to be skipped (in other words, not imported). The columns are specified as indices starting from 1.

nastring(string) specifies a list of strings to be interpreted as missing values.

pattern(string) specifies a regular expression used to match one or more files if impath is a folder. For example, specifying *.csv will import all .csv files in the specified folder to the H2O frame.

Options for _h2oframe _upload

into(newframename) specifies the destination H2O frame into which the file is uploaded. into() is required.

header(#) specifies how to parse the first line of data. -1 means that the first line is parsed as data, 1 means that the first line is parsed as column headers, and 0 means that the parser guesses. The default is header(0).

delimiter("char") specifies the character that separates values in the file. For instance, if values in the file are separated by a semicolon, then you would specify delimiter(";"). Specify delimiter("\t") to use a tab character, or specify delimiter(" ") to use whitespace as a delimiter. The default is delimiter(",").

skipcols(numlist) specifies the columns to be skipped (in other words, not uploaded). The columns are specified as indices starting from 1.

nastring(string) specifies a list of strings to be interpreted as missing values.

Options for _h2oframe _export

replace specifies that filename be replaced if it already exists.

Examples

 Read a file into the H2O cluster as an H2O frame named auto
     . _h2oframe _import https://www.stata.com/examples/auto.csv, into(auto)

 Look at what we just loaded
     . _h2oframe _get auto
     . list
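
 -----------------------------------------------------------------------------------
 A sketch of the same import with options. It assumes the first line of the file
 holds column names; auto_sub is a hypothetical frame name. header(1) parses the
 first line as column headers, and skipcols(1 2) leaves out the first two columns.
     . _h2oframe _import https://www.stata.com/examples/auto.csv, into(auto_sub) header(1) skipcols(1 2)

 Look at what we just loaded
     . _h2oframe _get auto_sub
     . list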

 -----------------------------------------------------------------------------------
 Setup
     . sysuse auto, clear
     . export delimited auto.csv

 Upload auto.csv into the H2O cluster as an H2O frame named auto2
     . _h2oframe _upload auto.csv, into(auto2)

 Look at what we just loaded
     . _h2oframe _get auto2
     . list
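
 -----------------------------------------------------------------------------------
 Setup: write a semicolon-delimited copy whose name contains a space (the
 filename is a hypothetical choice for illustration)
     . sysuse auto, clear
     . export delimited "auto semicolon.csv", delimiter(";") replace

 A sketch of uploading that file. The path is enclosed in double quotes because
 it contains an embedded space; auto3 is a hypothetical frame name.
     . _h2oframe _upload "auto semicolon.csv", into(auto3) header(1) delimiter(";")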

 -----------------------------------------------------------------------------------
 Setup
     . _h2oframe _change auto

 Export the whole H2O frame to myauto.csv
     . _h2oframe _export myauto.csv

 -----------------------------------------------------------------------------------
 Setup
     . _h2oframe _change auto

 Same as above, but only export a subset of the data. We use the replace option
 because myauto.csv already exists.
     . _h2oframe _export make mpg rep78 foreign in 1/10 using myauto.csv, replace
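
 -----------------------------------------------------------------------------------
 Setup
     . _h2oframe _change auto

 A sketch of exporting selected columns for rows that satisfy a condition, using
 the if qualifier from the syntax diagram. highmpg.csv is a hypothetical
 filename, and the threshold of 25 is arbitrary.
     . _h2oframe _export make mpg using highmpg.csv if mpg > 25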

Stored results

 _h2oframe _import and _h2oframe _upload store the following in r():

 Scalars
   r(N)                number of rows in the H2O frame
   r(k)                number of columns in the H2O frame
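
 As an illustration, the stored results can be inspected right after an upload.
 This sketch assumes auto.csv from the earlier setup is in the working directory;
 auto4 is a hypothetical frame name.
     . _h2oframe _upload auto.csv, into(auto4)
     . return list
     . display "rows: " r(N) "  columns: " r(k)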