Distinct values of a column in current H2O frame
Syntax
_h2oframe unique columnname [if] [in] [, options]

options               Description
-----------------------------------------------------------------------------------
clean                 display string values without compound double quotes
missing               include missing values of columnname in calculation
separate(separator)   separator to serve as punctuation for the values of the
                      returned list; default is a space
-----------------------------------------------------------------------------------
Description
_h2oframe unique displays a list of the distinct values of the column columnname. columnname may not be a string column.
Options
clean displays string values without compound double quotes. By default, each distinct string value is displayed within compound double quotes, because these are the most general delimiters. If you know that the string values in columnname do not include embedded spaces or embedded quotes, then clean is an appropriate option. clean does not affect the display of values from numeric columns.
missing specifies that missing values of columnname be included in the calculation. The default is to exclude them.
separate(separator) specifies a separator to serve as punctuation for the values of the returned list. The default is a space. A useful alternative is a comma.
Remarks
_h2oframe unique serves two different functions. First, it gives a compact display of the distinct values of columnname. More commonly, it is useful when you desire to cycle through the distinct values of columnname with (for example) foreach. _h2oframe unique leaves behind a list in r(uniques) that may be used in a subsequent command.
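Because the list is returned in a macro, _h2oframe unique may hit the limits imposed by your edition of Stata, such as the maximum length of a macro. In practice, the command is typically used when the number of distinct values of columnname is not extremely large.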
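The looping use described above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: it assumes that an H2O cluster is already connected, that the auto frame from the Examples section is the current H2O frame, and that rep78 is a numeric column. The local macro name levels is chosen here for illustration only.

```stata
* Sketch only: assumes a connected H2O cluster and that the auto frame
* is the current H2O frame.
_h2oframe unique rep78

* Copy r(uniques) right away; a later r-class command would overwrite r().
local levels `r(uniques)'

* Visit each distinct value of rep78 in turn.
foreach v of local levels {
    display "rep78 level: `v'"
}
```

Copying the list into a local macro before the loop keeps it available even if a command inside the loop body replaces the contents of r().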
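Examples
Load the auto dataset and copy it into an H2O frame named auto
. sysuse auto
. _h2oframe put, into(auto)
Make auto the current H2O frame
. _h2oframe change auto
List the distinct values of rep78, separated by spaces
. _h2oframe unique rep78
. display "`r(uniques)'"
List the distinct values of rep78, separated by commas
. _h2oframe unique rep78, sep(,)
. display "`r(uniques)'"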
Stored results
_h2oframe unique stores the following in r():
Scalar
r(r) number of distinct values
Macro
r(uniques) list of distinct values
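As a brief sketch of how these results might be read back, the lines below display the scalar and the macro immediately after the command runs. The example assumes a connected H2O cluster, a current H2O frame containing the numeric column rep78 (as in the Examples section), and no option that adds quotes to the returned values.

```stata
* Sketch only: assumes a connected H2O cluster and a current H2O frame
* that contains the numeric column rep78.
_h2oframe unique rep78

* r(r) is a scalar, so it is referenced without macro quotes.
display "number of distinct values: " r(r)

* r(uniques) is a macro, so it is expanded with macro quotes.
display "distinct values: `r(uniques)'"

* return list shows everything currently stored in r().
return list
```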