Work with H2O frames
In this entry, we discuss how to manipulate data on the H2O cluster from within Stata. If you are new to the H2O cluster, see Introduction to integration with H2O for more information. When we refer to “the data” in this entry, we mean an H2O frame, which is the main object used for data manipulation within the H2O cluster. Note that an H2O frame is unrelated to Stata’s data frame.
H2O frames live on the H2O cluster and do not exist in Stata’s memory. Stata loads data into the H2O cluster and stores the data in the form of an H2O frame. Once the data are stored, all the data manipulation operations on them, such as data generation and replacement, are carried out by H2O instead of by Stata. Users can send requests from within Stata to manipulate and keep track of those frames. Below, we present how this works.
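The workflow described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes that H2O and its Java dependency are installed so that a cluster can be started locally, and the frame name `auto` and the choice of the `auto` example dataset are ours, not part of this entry.

```stata
* Start a new H2O cluster from within Stata
* (assumption: H2O is installed and can be launched locally)
h2o init

* Load an example dataset into Stata's memory
sysuse auto, clear

* Copy the dataset in memory to the cluster as an H2O frame
* (the frame name "auto" is an illustrative choice)
_h2oframe put, into(auto)

* Make "auto" the current H2O frame
_h2oframe change auto

* Describe the current H2O frame; the request is sent from Stata,
* but the frame itself stays on the H2O cluster
_h2oframe describe
```

After `_h2oframe put`, the dataset in Stata's memory and the H2O frame are separate copies, so later changes to one are not reflected in the other.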
Introduction to H2O frames
Create and manipulate H2O frames
- Create a new H2O frame
- Import and export an H2O frame
- Put Stata variables into an H2O frame and vice versa
- Split an existing H2O frame into multiple H2O frames
- Display names of all H2O frames on the H2O cluster
- Rename existing H2O frame
- Drop frame from H2O cluster
- Make a copy of an H2O frame
- Append multiple H2O frames rowwise and columnwise
Switch H2O frames
Work on current H2O frame
- Describe data in current H2O frame
- List values of columns in the current H2O frame
- Drop or keep columns or observations from the current H2O frame
- Summary statistics for the current H2O frame
- Rename columns in the current H2O frame
- Change column types in current H2O frame
- Ascending and descending sort
- Distinct values of a column in current H2O frame
- Create or change contents of column in current H2O frame
- Scale and center columns in the current H2O frame