Chih-Mao Hsieh asked:
 
> I have a data file with three columns: citing, cited, nclass.  For
> every "citing", there are multiple "cited", and for each "cited"
> there is a "nclass".  The file is sorted by citing, then nclass.
> I need a program to count the number of unique "nclass" strings
> associated to each "citing".
> 
> As a simple example, given the following data file "data.dta":
> 
> citing     cited         nclass
> 100         20            12
> 100         22            15
> 100         23            15
> 101         32            14
> 101         33            15
> 101         34            15
> 101         40            17
> 
> I need the following output file:
> 
> citing    numpatclass
> 100            2             [12 and 15 are unique, 15 is repeated]
> 101            3             [14, 15, 17 are unique, 15 is repeated]
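
Phil Ryan gave excellent advice explaining how this can be done,
without loops, by using -by:-. In addition, note the FAQ

"How do I compute the number of distinct observations?"
http://www.stata.com/support/faqs/data/distinct.html

which explains approaches using -by:-, similar in spirit to Phil's
solution, and also gives manual references and references to
user-written software in this area.

Thus, a canned solution here is

    bysort citing : egen numpatclass = nvals(nclass)
    by citing : keep if _n == 1

Note that the -egen- function nvals() is not part of official Stata.
It comes with the -egenmore- package, which can be installed with
-ssc install egenmore-. To end up with just the two columns shown in
the desired output, follow with -keep citing numpatclass-.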
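
For anyone who prefers to use only official Stata commands, the -by:-
approach described in the FAQ can be sketched roughly as follows. This
is an illustration rather than Phil's exact code; the variable name
"first" is just a placeholder, and total() assumes a version of Stata
in which that -egen- function is available.

```stata
* tag the first observation of each distinct (citing, nclass) pair
bysort citing nclass : gen byte first = _n == 1

* sum the tags within citing: the number of distinct nclass values
by citing : egen numpatclass = total(first)

* keep one observation per citing, and only the variables wanted
by citing : keep if _n == 1
keep citing numpatclass
```

On the example data this leaves 2 for citing 100 and 3 for citing 101.
Note that this sketch counts a missing nclass as a distinct value in
its own right; add an -if- condition to the tagging step if that is
not wanted.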
Nick 