Thanks to Kit Baum, a new version of the -keyby- package is now available for download from SSC. In Stata, use the -ssc- command to do this, or -adoupdate- if you already have an earlier version of -keyby-.
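As a minimal sketch, the two routes mentioned above might be typed as follows. Only one of the first two commands is needed, depending on whether an earlier version of -keyby- is already installed.

```stata
* First-time installation of the package from SSC
ssc install keyby

* Alternatively, if an earlier version is already installed, update it
adoupdate keyby, update
```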
The -keyby- package is described below, as on my website. The new version has a second module, -keybygen-. This sorts a dataset by a varlist that does not necessarily uniquely identify the observations, and generates a new variable containing, in each observation, the sequential order of that observation within its by-group. The new variable is appended to the end of the existing varlist to form a primary key, which uniquely identifies the observations and by which the dataset is sorted.
The -keyby- package is therefore a "clean" version of -sort-. It has a companion package, -addinby-, also downloadable from SSC, which is a "clean" version of -merge-. Together, the two packages can be used to enforce the relational database model, in which a dataset is a mathematical function whose domain is the set of existing value combinations of its primary key variables, and whose range is the set of all possible value combinations of its non-key variables.
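To illustrate the idea of building a primary key from a non-unique varlist, here is a rough sketch that uses only built-in Stata commands. It is not the package's own code. The auto dataset and the variable name seqnum are assumptions chosen purely for illustration.

```stata
* Load an example dataset shipped with Stata (illustrative choice)
sysuse auto, clear

* foreign alone does not uniquely identify the observations, so sort by it,
* preserving the existing order of observations within each by-group
sort foreign, stable

* Generate the sequential order of each observation within its by-group
* (seqnum is a hypothetical variable name)
by foreign: generate long seqnum = _n

* foreign and seqnum together should now form a primary key;
* isid exits with an error if they do not uniquely identify the observations
isid foreign seqnum

* Sort by the full key and move the key variables to the start of the dataset
sort foreign seqnum
order foreign seqnum
```

With the package installed, -keybygen- is intended to do this kind of job in a single step, and -keyby- performs the sorting and uniqueness check for a varlist that is already a key. See the help files installed with the package for the exact syntax and options.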
Best wishes
Roger
Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
Opinions expressed are those of the author, not of the institution.
-----------------------------------------------------------------------------
package keyby from http://www.imperial.ac.uk/nhli/r.newson/stata10
-----------------------------------------------------------------------------
TITLE
keyby: Key the dataset by a variable list
DESCRIPTION/AUTHOR(S)
keyby sorts the dataset currently in memory by the variables in a
varlist, checking that the variables in the varlist uniquely
identify the observations. This makes the variables in the
varlist a primary key for the dataset in memory. If the user does
not specify otherwise, then keyby also reorders the variables in
the varlist to the start of the variable order in the dataset, and
checks that all values of these variables are nonmissing.
keybygen sorts the dataset currently in memory by the variables in
a varlist, preserving the existing order of observations within
each by-group, and then generates a new variable, containing the
sequential order of each observation within its by-group, to form
a primary key with the existing variables in the varlist. keyby
and keybygen can be useful if the user combines multiple datasets
using merge, which may cause a dataset in memory to become
unsorted.
Author: Roger Newson
Distribution-Date: 19april2009
Stata-Version: 10
INSTALLATION FILES
keyby.ado
keybygen.ado
keyby.sthlp
keybygen.sthlp
-----------------------------------------------------------------------------