Split an existing H2O frame into multiple H2O frames
Syntax
_h2oframe split framename, into(newframelist) [options]
options                 Description
-----------------------------------------------------------------------------------
* into(newframelist)    specify list of destination H2O frames
  split(numlist)        specify numlist of proportions or ratios for the split
  rseed(#)              specify random-number seed
  replace               replace the H2O frames if they already exist
-----------------------------------------------------------------------------------
* into() is required.
Description
_h2oframe split splits an existing H2O frame into a list of H2O frames based on the specified proportions or ratios for each frame.
Options
into(newframelist) specifies a list of names for the new frames that will be created by splitting the existing H2O frame. into() is required and at least two names must be specified.
split(numlist) splits the H2O frame into a list of H2O frames whose sizes are proportional to the values of numlist. Each value in numlist must be a positive number. You can specify proportions that sum to 1, or you can specify integers that define ratios for the sizes of the resulting frames. In either case, the proportions of the split are the values in numlist divided by their sum; for example, split(3 1) is equivalent to split(0.75 0.25). The number of values specified in split() must equal the number of names specified in into(). The default is split(0.75 0.25).
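As a minimal sketch of how ratios are normalized, the two commands below request the same 60/20/20 split, once with proportions and once with integer ratios. The frame names train, valid, test, train2, valid2, and test2 are hypothetical, and an H2O frame named iris is assumed to exist already (see the Setup under Examples).

```stata
* Assumes an H2O frame named iris already exists (see Setup under Examples).
* The destination frame names are hypothetical.

* Proportions that sum to 1
_h2oframe split iris, into(train valid test) split(0.6 0.2 0.2) rseed(17)

* Integer ratios: 3/(3+1+1) = 0.6, 1/(3+1+1) = 0.2, 1/(3+1+1) = 0.2
_h2oframe split iris, into(train2 valid2 test2) split(3 1 1) rseed(17)
```

Different destination names are used in the second command so that neither command depends on replace.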
Unlike Stata’s splitsample command, which performs an exact split on Stata’s current dataset, _h2oframe split may not provide an exact split. Instead, the proportions among the resulting H2O frames may only approximate the proportions you specified. This is because H2O uses a probabilistic splitting method to segment the original frame, which is more efficient for large datasets than an exact split.
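To see why a probabilistic split is only approximate, the sketch below mimics one in plain Stata by assigning each observation to a part with an independent uniform draw. This illustrates the general idea only; it is an assumption that H2O's internal method resembles it, and the variable names u and part are hypothetical.

```stata
* Illustration only: mimics a probabilistic split in plain Stata.
* This is not the algorithm H2O runs internally.
webuse iris, clear
set seed 17

* Draw one uniform random number per observation
generate double u = runiform()

* Assign each observation to part 1 with probability 0.8, else to part 2
generate byte part = cond(u < 0.8, 1, 2)

* The observed shares will typically be close to, but not exactly, 80/20
tabulate part
```

Because each observation is assigned independently, the realized counts vary with the seed, whereas splitsample fixes the group sizes in advance.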
rseed(#) sets the random-number seed. This option can be used to reproduce the split.
replace specifies that if an H2O frame with a name specified in into() already exists, its contents are replaced by the new H2O frame.
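The following sketch combines rseed() and replace to rerun a split reproducibly. It assumes that an H2O frame named iris already exists (see the Setup under Examples); the frame names iris1 and iris2 follow the first example below.

```stata
* Assumes an H2O frame named iris already exists (see Setup under Examples).

* First run: create iris1 and iris2 with a fixed seed
_h2oframe split iris, into(iris1 iris2) split(0.8 0.2) rseed(17)

* Second run: the same seed reproduces the split; replace is specified
* because iris1 and iris2 now exist
_h2oframe split iris, into(iris1 iris2) split(0.8 0.2) rseed(17) replace
```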
Examples
Setup
. webuse iris, clear
. _h2oframe put, into(iris)
Split the iris H2O frame into two H2O frames, with approximately 80% of the data
stored in frame iris1 and 20% of the data stored in frame iris2
. _h2oframe split iris, into(iris1 iris2) split(0.8 0.2) rseed(17)
List all H2O frames
. _h2oframe dir
-----------------------------------------------------------------------------------
Split the iris data into two samples by using splitsample, with 80% of observations
in sample 1 and 20% of observations in sample 2
. webuse iris, clear
. splitsample, generate(svar, replace) split(0.8 0.2)
Store each sample in a separate H2O frame
. _h2oframe put if svar==1, into(iris3)
. _h2oframe put if svar==2, into(iris4)
List all H2O frames
. _h2oframe dir