Import and export an H2O frame
Syntax
Import files into an H2O frame on the H2O cluster
_h2oframe import impath, into(newframename) [h2oframe_import_options]
Upload local files into an H2O frame on the H2O cluster
_h2oframe upload uppath, into(newframename) [h2oframe_upload_options]
Export current H2O frame to a delimited text file
_h2oframe export [using] filename [if] [in] [, h2oframe_export_options]
Export subset of current H2O frame to a delimited text file
_h2oframe export [columnlist] using filename [if] [in] [, h2oframe_export_options]
impath is the complete URL or normalized path of the file to be imported, or the path to a directory containing multiple files of the same format to be imported. If impath contains embedded spaces, enclose it in double quotes.
uppath is the normalized path of the local file to be uploaded. If uppath contains embedded spaces, enclose it in double quotes.
columnlist is a list of column names in the current H2O frame; see Specifying a list of columns for more information.
filename is the destination .csv file.
h2oframe_import_options   Description
-----------------------------------------------------------------------------------
* into(newframename)      destination H2O frame
  header(#)               treat first line of data as data or column headers
  delimiter("char")       use char as delimiter
  skipcols(numlist)       skip the specified columns
  nastring(string)        interpret the specified strings as missing values
  pattern(string)         import file(s) that match the regular expression;
                          applies only if impath is a folder
-----------------------------------------------------------------------------------
* into() is required.
h2oframe_upload_options   Description
-----------------------------------------------------------------------------------
* into(newframename)      destination H2O frame
  header(#)               treat first line of data as data or column headers
  delimiter("char")       use char as delimiter
  skipcols(numlist)       skip the specified columns
  nastring(string)        interpret the specified strings as missing values
-----------------------------------------------------------------------------------
* into() is required.
h2oframe_export_options Description
-----------------------------------------------------------------------------------
replace overwrite existing filename
-----------------------------------------------------------------------------------
Description
_h2oframe import loads files into an H2O cluster as an H2O frame. The data are loaded in parallel using multithreading, which makes importing fast. The specified path can be a complete URL, a normalized path to the file(s), or a folder that contains the file(s). The path must be a valid cluster-side path, meaning that every node in the H2O cluster must be able to access it.
_h2oframe upload pushes a local file from disk to an H2O cluster as an H2O frame; in H2O terminology, it pushes data from the client to the cluster. The specified path must be a local path.
_h2oframe export exports the current H2O frame to a .csv file on the local disk. Because an H2O frame on the cluster may be very large, make sure there is enough disk space for the destination file.
Options
Options for _h2oframe import
into(newframename) specifies the destination H2O frame into which the files are imported. into() is required.
header(#) specifies how to parse the first line of data: -1 means that the first line is parsed as data, 1 means that it is parsed as column headers, and 0 means that H2O guesses which it is. The default is header(0).
delimiter("char") specifies the character that separates values in the file. For instance, if values in the file are separated by semicolons, specify delimiter(";"). Specify delimiter("\t") to use a tab character or delimiter(" ") to use whitespace as the delimiter. The default is delimiter(",").
skipcols(numlist) specifies the columns to be skipped (in other words, not imported). The columns are specified as indices starting from 1.
nastring(string) specifies a list of strings to be interpreted as missing values.
pattern(string) specifies a regular expression used to match one or more files when impath is a folder. For example, specifying *.csv imports all .csv files in the specified folder into the new H2O frame.
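As a sketch of how the import options combine, the commands below assume a hypothetical folder /shared/data/sales that every node in the cluster can read, holding semicolon-separated files with a header row and the string NA for missing values. The folder path, the frame name sales, and skipping column 1 are illustrative choices only; the pattern() argument follows the *.csv form given in the description of pattern() above.

```stata
* Hypothetical folder; it must be readable from every node in the H2O cluster
_h2oframe import "/shared/data/sales", into(sales) pattern(*.csv) header(1) delimiter(";") nastring(NA) skipcols(1)

* Stored results: number of rows and columns in the new H2O frame
display r(N)
display r(k)
```

Because header(1) is given explicitly, H2O does not need to guess whether the first line holds column names.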
Options for _h2oframe upload
into(newframename) specifies the destination H2O frame into which the file is uploaded. into() is required.
header(#) specifies how to parse the first line of data: -1 means that the first line is parsed as data, 1 means that it is parsed as column headers, and 0 means that H2O guesses which it is. The default is header(0).
delimiter("char") specifies the character that separates values in the file. For instance, if values in the file are separated by semicolons, specify delimiter(";"). Specify delimiter("\t") to use a tab character or delimiter(" ") to use whitespace as the delimiter. The default is delimiter(",").
skipcols(numlist) specifies the columns to be skipped (in other words, not uploaded). The columns are specified as indices starting from 1.
nastring(string) specifies a list of strings to be interpreted as missing values.
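A minimal upload sketch, assuming a hypothetical tab-delimited local file measurements.txt in the current working directory whose first line is already data rather than column names (hence header(-1)). The file name and the frame name meas are illustrative only.

```stata
* Hypothetical local file: tab-delimited, no header row
_h2oframe upload measurements.txt, into(meas) header(-1) delimiter("\t")

* Stored results: size of the uploaded H2O frame
display r(N)
display r(k)
```

Unlike import, the path here is resolved on the client machine, so the file does not need to be visible to the cluster nodes.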
Options for _h2oframe export
replace specifies that filename be replaced if it already exists.
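The export syntax also accepts a column list together with if. A minimal sketch, assuming the auto H2O frame from the Examples already exists on the cluster; the output name efficient.csv and the mpg > 25 cutoff are arbitrary choices for illustration.

```stata
* Make auto the current H2O frame
_h2oframe change auto

* Export two columns, keeping only rows with mpg above 25; overwrite any old file
_h2oframe export make mpg using efficient.csv if mpg > 25, replace
```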
Examples
Read a file into the H2O cluster as an H2O frame named auto
. _h2oframe import https://www.stata.com/examples/auto.csv, into(auto)
Look at what we just loaded
. _h2oframe get auto
. list
-----------------------------------------------------------------------------------
Setup
. sysuse auto, clear
. export delimited auto.csv
Upload auto.csv into the H2O cluster as an H2O frame named auto2
. _h2oframe upload auto.csv, into(auto2)
Look at what we just loaded
. _h2oframe get auto2
. list
-----------------------------------------------------------------------------------
Setup
. _h2oframe change auto
Export the whole H2O frame to myauto.csv
. _h2oframe export myauto.csv
-----------------------------------------------------------------------------------
Setup
. _h2oframe change auto
Same as above, but export only a subset of the data. We use the replace option because myauto.csv already exists.
. _h2oframe export make mpg rep78 foreign in 1/10 using myauto.csv, replace
Stored results
_h2oframe import and _h2oframe upload store the following in r():
Scalars
r(N) number of rows in the H2O frame
r(k) number of columns in the H2O frame
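Because r() is overwritten by the next r-class command, a common pattern is to copy the stored results into locals right away. A sketch assuming auto.csv was created as in the Examples setup; the frame name auto3 and the local names nrows and ncols are arbitrary.

```stata
* Upload the auto.csv file written in the Examples setup
_h2oframe upload auto.csv, into(auto3)

* Save the stored results before another command replaces r()
local nrows = r(N)
local ncols = r(k)
display "auto3 has `nrows' rows and `ncols' columns"
```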