Describe data in current H2O frame
Syntax
_h2oframe _describe [columnlist] [, options]
columnlist is a list of column names in the H2O frame; see Specifying a list of columns for more information.
options       Description
-----------------------------------------------------------------------------------
simple        display only column names
fullnames     do not abbreviate column names
numbers       display column number along with name
replace       make dataset of description, not written report
clear         replace the data in memory; only valid with replace
columnlist    programmer's option; store r(columnlist) in addition to
              usual stored results
-----------------------------------------------------------------------------------
Description
_h2oframe _describe produces a summary of the data in the current H2O frame.
For a compact listing of column names, use _h2oframe _describe, simple.
Options
simple displays only the column names in a compact format. simple may not be combined with other options.
fullnames specifies that _h2oframe _describe display the full names of the columns. The default is to present an abbreviation when the column name is longer than 15 characters. The fullnames and numbers options may not be specified together.
numbers specifies that _h2oframe _describe present the column number with the column name. If numbers is specified, column names are abbreviated when the name is longer than eight characters. The numbers and fullnames options may not be specified together.
replace and clear are alternatives to the options above. _h2oframe _describe usually produces a written report, and the options above specify what the report is to contain. If you specify replace, however, no report is produced; the information about the H2O frame that the report would have presented will be loaded into Stata as the current dataset. Each observation of the new data describes a column in the H2O frame; see _h2oframe _describe, replace below.
clear may be specified only when replace is specified. clear specifies that the data in memory be cleared and replaced with the description information, even if the original data have not been saved to disk.
columnlist, an option for programmers, specifies that r(columnlist) be stored in addition to the usual stored results. r(columnlist) will contain the names of the columns described.
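As a minimal sketch of how a program might use the columnlist option, the do-file fragment below loops over the columns that were described. It assumes the Setup commands from the Examples section have been run, so that auto is the current H2O frame; the local macro name cols is only illustrative.

```stata
* Assumes an H2O cluster is running and auto is the current H2O frame
* (see Setup under Examples).
_h2oframe _describe t*, columnlist

* r() results are overwritten by later r-class commands, so copy them first
local cols `r(columnlist)'

* Visit each described column by name
foreach c of local cols {
    display as text "described column: " as result "`c'"
}
```

Copying r(columnlist) into a local macro right away is a general Stata habit rather than a requirement of this command.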
Remarks
Remarks are presented under the headings _h2oframe _describe and _h2oframe _describe, replace.
_h2oframe _describe
If _h2oframe _describe is typed without any column names, all columns in the current H2O frame are described.
_h2oframe _describe, replace
_h2oframe _describe with the replace option is rarely used. _h2oframe _describe, replace replaces the data in memory with a dataset in which each observation describes a column in the current H2O frame. The new variables are
position, a variable containing the numeric position of the original column (1, 2, 3, …).
column, a variable containing the name of the original column, such as “make”, “price”, “mpg”, and so on.
type, a variable containing the storage type of the original column, such as “real”, “int”, “enum”, and “string”. See What is an H2O frame? for more information about the data types in an H2O frame.
missing, a variable containing the number of missing values in the original column.
zeros, a variable containing the number of zeros in the original column.
pinf, a variable containing the number of values set to positive infinity in the original column.
ninf, a variable containing the number of values set to negative infinity in the original column.
cardinality, a variable containing the number of categorical levels in the original column if the column is of type enum.
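The description dataset can be queried like any other Stata dataset. The following do-file sketch lists the columns that have missing values and the number of levels of the categorical columns. It assumes that auto is the current H2O frame (see Setup under Examples), that missing is stored as a numeric variable, and that type is stored as a string variable. These storage types are assumptions and are not stated above.

```stata
* Assumes an H2O cluster is running and auto is the current H2O frame.
preserve                              // keep a copy of the data in memory
_h2oframe _describe, replace clear    // one observation per H2O column

* Columns with at least one missing value
* (assumes missing is a numeric variable)
list position column type missing if missing > 0 & missing < ., noobs

* Number of levels for categorical columns
* (assumes type is a string variable)
list column cardinality if type == "enum", noobs

restore                               // bring back the original data
```

Wrapping the commands in preserve and restore is one way to inspect the description without losing the data that were in memory.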
Examples
Setup
. sysuse auto
. _h2oframe _put, into(auto)
. _h2oframe _change auto
Describe dataset in current H2O frame
. _h2oframe _describe
Describe all columns in the current H2O frame whose names begin with t
. _h2oframe _describe t*
Describe dataset in current H2O frame, displaying full column names
. _h2oframe _describe, fullnames
Replace the dataset in memory with meta information on the current H2O frame
. _h2oframe _describe, replace
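The remaining display options can be tried in the same way. The do-file sketch below assumes the Setup commands above have been run and, for the last command, that the data in memory are unchanged, because clear is not specified. The column names make and price come from the auto dataset.

```stata
* Compact listing of column names only
_h2oframe _describe, simple

* Column numbers shown next to the (possibly abbreviated) names
_h2oframe _describe, numbers

* Restrict the report to two named columns
_h2oframe _describe make price

* Build the description dataset for the same two columns
_h2oframe _describe make price, replace
```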
Stored results
_h2oframe _describe stores the following in r():
Scalars
r(N) number of rows in the H2O frame
r(k) number of columns in the H2O frame
Macro
r(columnlist) columns described (if columnlist specified)
_h2oframe _describe, replace stores nothing in r().
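As a short sketch of how the scalars might be used, the fragment below saves them in local macros before anything else can overwrite r(). It assumes auto is the current H2O frame (see Setup under Examples); the macro names nrows and ncols are only illustrative.

```stata
* Assumes an H2O cluster is running and auto is the current H2O frame.
_h2oframe _describe

* Save the stored results before another r-class command replaces them
local nrows = r(N)
local ncols = r(k)

display as text "rows: " as result `nrows' as text "  columns: " as result `ncols'
```

Because the replace option stores nothing in r(), any code that needs r(N) or r(k) would have to call _h2oframe _describe without replace first.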