Put Stata variables into an H2O frame and vice versa
Syntax
Save data in memory to an H2O frame on the H2O cluster
_h2oframe _put [varlist] [if] [in] , into(newframename) [_put_options]
Load an existing H2O frame as the current Stata dataset
_h2oframe _get [using] framename [if] [in] [, _get_options]
Load a subset of columns in an existing H2O frame as the current Stata dataset
_h2oframe _get columnlist using framename [if] [in] [, _get_options]
varlist is a list of variable names in Stata’s current dataset.
columnlist is a list of column names in the H2O frame; see Specifying a list of columns for more information.
_put_options                   Description
-----------------------------------------------------------------------------------
* into(newframename)           destination H2O frame
  nolabel                      output numeric values (not labels) of labeled variables
-----------------------------------------------------------------------------------
* into() is required.
_get_options                   Description
-----------------------------------------------------------------------------------
  case(preserve|lower|upper)   preserve the case or read column names as lowercase
                               (the default) or uppercase
  asfloat                      load all floating-point data as floats
  asdouble                     load all floating-point data as doubles
  clear                        replace data in memory
-----------------------------------------------------------------------------------
Description
_h2oframe _put exports Stata’s current dataset to an H2O frame on the H2O cluster.
_h2oframe _get loads an existing H2O frame into Stata as the current dataset.
When exporting Stata’s current dataset into an H2O frame with _h2oframe _put, all of Stata’s categorical/factor variables are stored as enum (categorical) columns. Read What is an H2O frame? for more information about the data types in an H2O frame. On the other hand, when loading an H2O frame into Stata by using _h2oframe _get, all enum (categorical) columns are stored as string variables in the dataset.
Options
Options for _h2oframe _put
into(newframename) specifies the destination H2O frame in which to store the Stata variables. into() is required.
nolabel specifies that the numeric values of labeled variables be exported to the H2O frame rather than the label associated with each value.
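The effect of nolabel can be checked by exporting the same labeled variable twice and comparing the two frames. The following is a minimal sketch, assuming Stata is already connected to a running H2O cluster; the frame names autolab and autoval are hypothetical and chosen only for illustration.

```stata
* Assumption: an H2O cluster is running and Stata is connected to it
sysuse auto, clear

* foreign carries a value label (0 "Domestic", 1 "Foreign")
* Default: export the label text for labeled variables
_h2oframe _put make mpg foreign, into(autolab)

* nolabel: export the underlying numeric values instead
_h2oframe _put make mpg foreign, into(autoval) nolabel

* Compare the column types of the two frames
_h2oframe _change autolab
_h2oframe _describe
_h2oframe _change autoval
_h2oframe _describe
```

In the first frame, foreign should hold the label text; in the second, it should hold the values 0 and 1.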
Options for _h2oframe _get
case(preserve|lower|upper) specifies the case of the column names after loading. The default is case(lower).
asfloat loads numeric data from the H2O frame as type float. The default storage type of the columns is determined by set type.
asdouble loads numeric data from the H2O frame as type double. The default storage type of the columns is determined by set type.
clear specifies to replace the data in memory, even though the current data have not been saved to disk.
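As an illustration of how these options combine, the sketch below loads a frame with uppercase variable names and double-precision storage. It assumes that the H2O frame auto1 created in the Examples section already exists on the cluster.

```stata
* Assumption: the H2O frame auto1 already exists on the connected cluster
* Read column names as uppercase, store floating-point columns as doubles,
* and replace any data currently in memory
_h2oframe _get auto1, case(upper) asdouble clear

* Inspect the resulting variable names and storage types
describe
```

Without case(), the column names would be read as lowercase, and without asfloat or asdouble, the storage type would follow the current set type setting.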
Examples
Setup
. sysuse auto
Export this dataset to an H2O frame named auto1
. _h2oframe _put, into(auto1)
Look at what we just exported
. _h2oframe _change auto1
. _h2oframe _describe
Export a subset of the data to another H2O frame named auto2 and then list
the contents of the frame
. _h2oframe _put make mpg foreign in 1/50, into(auto2)
. _h2oframe _change auto2
. _h2oframe _list
-----------------------------------------------------------------------------------
Load the data from the H2O frame auto1 into Stata as the current dataset and
then list the data
. _h2oframe _get auto1, clear
. list
-----------------------------------------------------------------------------------
Same as above, but only load a subset of the data
. _h2oframe _get make mpg rep78 foreign using auto1 in 1/10, clear
. list
Stored results
_h2oframe _get stores the following in r():
Scalars
r(N) number of rows loaded from the H2O frame
r(k) number of columns loaded from the H2O frame
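The stored results can be inspected immediately after a load. This is a minimal sketch, again assuming the H2O frame auto1 from the Examples section exists on the cluster.

```stata
* Assumption: the H2O frame auto1 already exists on the connected cluster
_h2oframe _get make mpg rep78 foreign using auto1 in 1/10, clear

* Show everything left behind in r()
return list

* Use the scalars directly
display "rows loaded:    " r(N)
display "columns loaded: " r(k)
```

Because r() is overwritten by the next r-class command, copy r(N) and r(k) into local macros or scalars if they are needed later in a do-file.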