Create a new H2O frame
Syntax
_h2oframe _create newframename [, options]
options              Description
-----------------------------------------------------------------------------------
rows(#)              specify number of rows
cols(#)              specify number of columns
norandomize          specify not to generate the data values randomly
value(#)             specify the value for all numeric columns when
                       norandomize is specified
realfraction(#)      specify the fraction of real columns
realrange(#)         specify the range of values for real columns
catfraction(#)       specify the fraction of categorical columns
factors(#)           specify the number of factor levels in each
                       categorical column
intfraction(#)       specify the fraction of int columns
intrange(#)          specify the range of values for int columns
binfraction(#)       specify the fraction of binary-valued categorical
                       columns
binonefraction(#)    specify the fraction of ones for binary-valued
                       categorical columns
timefraction(#)      specify the fraction of time columns
strfraction(#)       specify the fraction of string columns
missfraction(#)      specify the fraction of total entries in the frame
                       to be missing
response             prepend an additional response column to the frame
resfactors(#)        specify the number of factor levels in the response
                       column
rseed(#)             specify the random-number seed used to generate the
                       random values
rseedcoltype(#)      specify the random-number seed used to generate the
                       random column types
-----------------------------------------------------------------------------------
Description
_h2oframe _create creates a new H2O frame with random data. The new H2O frame may contain real, int, enum (categorical), time, and string columns. If you are not familiar with H2O frames, read "What is an H2O frame?" first.
Options
rows(#) specifies the number of rows to generate in the destination H2O frame. The default is 10,000.
cols(#) specifies the number of columns to generate in the destination H2O frame. The default is 10.
norandomize specifies not to randomly generate the data values in the numeric columns of the destination H2O frame.
If norandomize is specified, every value in the numeric columns of the destination H2O frame equals the value specified in value(), except for entries that are set to missing because the fraction specified in missfraction() is not 0.
value(#) specifies the value for the numeric columns of the destination H2O frame when norandomize is specified. The default is 0.
realfraction(#) specifies the fraction of real columns in the destination H2O frame. The default is 0.5.
realrange(#) specifies the range of data values for all real columns. The default is 100.0, which means that all data values in real columns are between -100.0 and 100.0, inclusive.
catfraction(#) specifies the fraction of enum (categorical) columns in the destination H2O frame. The default is 0.2.
factors(#) specifies the number of factor levels in each enum column. The default is 100.
intfraction(#) specifies the fraction of int columns in the destination H2O frame. The default is 0.2.
intrange(#) specifies the range of data values for all int columns. The default is 100, which means that all data values in int columns are between -100 and 100, inclusive.
binfraction(#) specifies the fraction of binary-valued enum columns in the destination H2O frame. The default is 0.1.
binonefraction(#) specifies the fraction of ones in a binary-valued enum column. The default is 0.02.
timefraction(#) specifies the fraction of time columns in the destination H2O frame. The default is 0.
strfraction(#) specifies the fraction of string columns in the destination H2O frame. The default is 0.
missfraction(#) specifies the fraction of total entries in the destination H2O frame to be missing. The default is 0 if norandomize is specified and is 0.01 otherwise.
response specifies that an additional response column be prepended to the destination H2O frame, which makes the total number of columns cols() + 1.
resfactors(#) specifies the number of factor levels in the response column added with the option response.
rseed(#) sets the random-number seed used to generate data values in the destination H2O frame. This option can be used to reproduce the data in the H2O frame.
rseedcoltype(#) sets the random-number seed used to generate column types in the destination H2O frame. This option can be used, together with rseed(), to reproduce the column types and therefore the data in the H2O frame.
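As a hedged illustration of the two seed options, the following sketch creates two frames with identical options and seeds so that their contents can be compared. The frame names seedA and seedB and the seed value 42 are hypothetical, and the sketch assumes that an H2O cluster is already running and connected.

```stata
* Create two frames with the same options and seeds
* (frame names and the seed value 42 are illustrative assumptions)
_h2oframe _create seedA, rows(100) cols(5) rseed(42) rseedcoltype(42)
_h2oframe _create seedB, rows(100) cols(5) rseed(42) rseedcoltype(42)

* List the first rows of each frame; with identical seeds and options,
* the two listings are expected to match
_h2oframe _change seedA
_h2oframe _list in 1/5
_h2oframe _change seedB
_h2oframe _list in 1/5
```

Setting only one of the two seeds would leave either the column types or the data values free to vary between runs.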
Examples
Create a new H2O frame with 10,000 rows and 10 columns
. _h2oframe _create frame1, rseed(17) rseedcoltype(17)
. _h2oframe _change frame1
. _h2oframe _describe
. _h2oframe _list in 1/10
Same as above, but include a string column
. _h2oframe _create frame2, strfraction(0.1) rseed(17) rseedcoltype(17)
. _h2oframe _change frame2
. _h2oframe _describe
. _h2oframe _list in 1/10
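The column-type fractions can also be set all at once. The sketch below requests 20 columns split across all six column types; the frame name mixframe is hypothetical. The fractions are chosen to sum to 1 as a conservative assumption, because this page does not state whether that is required or how fractions are rounded to whole columns.

```stata
* Request a mix of all six column types in a 20-column frame
* (mixframe is an illustrative name; fractions sum to 1 by assumption)
_h2oframe _create mixframe, cols(20) realfraction(0.4) catfraction(0.2) ///
        intfraction(0.2) binfraction(0.1) timefraction(0.05)            ///
        strfraction(0.05) rseed(17) rseedcoltype(17)

* Inspect the resulting column types and the first rows
_h2oframe _change mixframe
_h2oframe _describe
_h2oframe _list in 1/10
```

If the column counts follow the fractions exactly, this would yield 8 real, 4 enum, 4 int, 2 binary, 1 time, and 1 string column; _h2oframe _describe shows what was actually generated.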
Create a new H2O frame with all numeric values set to 5
. _h2oframe _create frame3, norandomize value(5)
. _h2oframe _change frame3
. _h2oframe _describe
. _h2oframe _list in 1/10
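The response, resfactors(), and missfraction() options are not covered by the examples above. A minimal sketch, using the hypothetical frame name respframe, might combine them as follows.

```stata
* Prepend a 3-level response column and make about 5% of entries missing
* (respframe is an illustrative name; option values are arbitrary choices)
_h2oframe _create respframe, response resfactors(3) missfraction(0.05) ///
        rseed(17) rseedcoltype(17)

* With the default cols(10), the frame should have 11 columns in total
_h2oframe _change respframe
_h2oframe _describe
_h2oframe _list in 1/10
```

Because response adds one column to cols(), the description should report cols() + 1 columns, with the response column listed first.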