Hi Jacob,
Another solution is to keep the last observation for each Name and drop
the rest as follows:
by Name, sort: drop if _n != _N
Then you can summarize or count.
Mostafa
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of jwegelin
Sent: Friday, January 04, 2008 12:00 PM
To: [email protected]
Subject: st: tabulate question
There is probably an obvious answer to this.
Suppose you have a dataset in long form with repeated (say, yearly)
measures on a large number of individuals. The data are *not* balanced:
Some individuals have 10 or more observations (rows); some have only one
observation. You'd like to find out how many individuals are in your
dataset.
In the current example, mytmp.dta contains NAME, the id variable
(labeled numeric), as well as Sex01 (labeled numeric), Age, and Year.
Below are two solutions, neither particularly elegant. Either way, a
couple steps are required to find that there are 3969 individuals in the
dataset. Both ways replace the long dataset with a dataset which is only
useful for counting the number of individuals. Is there a more elegant
solution?
One way *not* to do it is
tabulate NAME
since this causes an inconveniently long table to scroll past which has
a row for each NAME.
One solution is to pick any variable other than NAME (provided the
variable has no missing values) and collapse as follows:
. use mytmp, clear
. summarize NAME
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
NAME | 8282 1995.673 1154.142 1 3969
. collapse (count) nYears=Sex01, by(NAME)
. summarize NAME
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
NAME | 3969 1985 1145.896 1 3969
Another way is to use reshape:
. use mytmp, clear
. keep NAME Age Year Sex01
. reshape wide Age, i(NAME) j(Year)
(note: j = 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
1998 1999 2000 2001 2002 2003 2004 2005 2006 2007)
Data long -> wide
------------------------------------------------------------------------
-----
Number of obs. 8282 -> 3969
Number of variables 4 -> 24
j variable (22 values) Year -> (dropped)
xij variables:
Age -> Age1986 Age1987 ...
Age2007
------------------------------------------------------------------------
-----
. summarize NAME
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
NAME | 3969 1985 1145.896 1 3969
Thanks for any suggestions.
Jacob A. Wegelin
Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
U.S.A.
http://www.people.vcu.edu/~jwegelin
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/