Chih-Mao Hsieh wrote:
> I have a data file with three columns: citing, cited, nclass.
> For every "citing", there are multiple "cited", and for each
> "cited" there is a "nclass". The file is sorted by citing,
> then nclass. I need a program to count the number of
> unique "nclass" strings associated to each "citing".
>
> As a simple example, given the following data file "data.dta":
>
> citing cited nclass
> 100 20 12
> 100 22 15
> 100 23 15
> 101 32 14
> 101 33 15
> 101 34 15
> 101 40 17
>
> I need the following output file:
>
> citing numpatclass
> 100 2 [12 and 15 are unique, 15 is repeated]
> 101 3 [14, 15, 17 are unique, 15 is repeated]
Phil Ryan gave excellent advice explaining how
this can be done, without loops, by using -by:-.
In addition, note the FAQ
How do I compute the number of distinct observations?
http://www.stata.com/support/faqs/data/distinct.html
which explains approaches using -by:-, similar in
spirit to Phil's solution, and also gives manual
references and references to user-written software
in this area.
Thus, a canned solution here, using the -egen- function
nvals() from the user-written -egenmore- package on SSC, is
bysort citing : egen numpatclass = nvals(nclass)
by citing : keep if _n == 1
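For comparison, the -by:- approach can be sketched using only official Stata commands. This is a minimal sketch, not necessarily the exact code in Phil's post or in the FAQ. It assumes the variables are named citing, cited and nclass as in the question, that the data are in data.dta, and that -egen-'s total() function is available (older versions of Stata call it sum()). The variable name first is made up for illustration.

```stata
* Assumption: data.dta holds citing, cited, nclass, as in the question.
use data, clear

* Tag the first observation within each citing-nclass combination.
* (first is a hypothetical helper variable.)
bysort citing nclass : gen byte first = (_n == 1)

* Within citing, the number of tags is the number of distinct nclass values.
by citing : egen numpatclass = total(first)

* Reduce to one observation per citing and keep only the wanted variables.
by citing : keep if _n == 1
keep citing numpatclass
list, noobs
```

With the example data in the question, this should leave two observations: 100 with numpatclass 2 and 101 with numpatclass 3. Note that missing values of nclass, if any, would be counted here as one more distinct value.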
Nick