Dear Statalisters,
I am trying to calculate the so-called h index for a large number of
scientists. The h index of a scientist is the largest integer h such
that the scientist has h papers that are each cited at least h times.
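Stated formally (the notation n and c_(r) is introduced here for illustration only): sort a scientist's n papers by citation count in decreasing order. Then the h index is the last rank at which the paper still has at least as many citations as its rank:

```latex
h = \max\bigl\{\, r \in \{1,\dots,n\} : c_{(r)} \ge r \,\bigr\},
\qquad c_{(1)} \ge c_{(2)} \ge \dots \ge c_{(n)},
```

with h = 0 if no r qualifies. Because c_(r) never increases while r always does, the condition holds for r = 1, ..., h and fails for every larger r, so a single pass over the sorted papers is enough.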
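For example, the scientist below has an h index of 19: his 19 most-cited
papers each have at least 19 citations (the least cited of them has
exactly 19), while his 20th most-cited paper has only 12.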
scientist_id article_id nbcites
GEORGE 10101157 8
GEORGE 12242494 10
GEORGE 11156976 12
GEORGE 9409826 19
GEORGE 7635312 23
GEORGE 7799970 23
GEORGE 11290701 28
GEORGE 8034742 42
GEORGE 8334302 43
GEORGE 2656402 74
GEORGE 2005819 79
GEORGE 2643162 111
GEORGE 8943317 127
GEORGE 1956405 146
GEORGE 9314530 153
GEORGE 2404021 204
GEORGE 3049620 302
GEORGE 2195038 373
GEORGE 2476649 393
GEORGE 2005809 527
GEORGE 6365931 614
GEORGE 6365930 670
I have written a program that calculates this for one scientist (see
below). The problem is that I have a very large number of scientists,
so I would like to combine the program below with "by scientist_id:".
I am not sure exactly how to do that in Stata. Could anyone help?
Thanks,
Pierre
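#delimit ;
/* Assumes the data hold ONE scientist, so that _N is that scientist's
   number of papers; sorting ascending makes paper i rank N-i+1 */
sort nbcites;
gen h_index=.;
local N = _N;
forvalues i = 1(1)`N' {;
  /* the first i with nbcites >= N-i+1 gives the largest such rank, i.e. h */
  replace h_index=`N'-`i'+1 if (nbcites[`i']>=`N'-`i'+1 & h_index==.);
  replace h_index=`N'-`i'+1 if (nbcites[`i']>=`N'-`i'+1 &
    h_index<`N'-`i'+1 & h_index!=.);
};
#delimit cr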
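The same calculation can be done for all scientists at once, without a loop, by ranking papers within scientist_id using -bysort- together with _n and _N (which are computed within each by-group). This is a sketch rather than a definitive solution. It assumes variables named as in the listing above, one observation per scientist-article pair, and no missing values of nbcites (Stata treats missing as larger than any number, so a missing count would pass the test). The new variables rank and h_index are hypothetical names. It re-enters the GEORGE data so that it runs on its own:

```stata
clear
* Example data from the listing above (article_id stored as long to keep all 8 digits)
input str10 scientist_id long article_id nbcites
GEORGE 10101157 8
GEORGE 12242494 10
GEORGE 11156976 12
GEORGE 9409826 19
GEORGE 7635312 23
GEORGE 7799970 23
GEORGE 11290701 28
GEORGE 8034742 42
GEORGE 8334302 43
GEORGE 2656402 74
GEORGE 2005819 79
GEORGE 2643162 111
GEORGE 8943317 127
GEORGE 1956405 146
GEORGE 9314530 153
GEORGE 2404021 204
GEORGE 3049620 302
GEORGE 2195038 373
GEORGE 2476649 393
GEORGE 2005809 527
GEORGE 6365931 614
GEORGE 6365930 670
end

* Within each scientist, sort papers by citations (ascending) and give the
* most-cited paper rank 1, the next rank 2, and so on.
* Assumes nbcites has no missing values.
bysort scientist_id (nbcites): gen rank = _N - _n + 1

* h index = largest rank r whose paper has at least r citations;
* papers failing the test contribute 0, so h = 0 if none qualifies.
by scientist_id: egen h_index = max(cond(nbcites >= rank, rank, 0))
```

Ties in nbcites do not matter here: tied papers may swap ranks, but the citation count found at each rank is the same either way. If one row per scientist is wanted afterwards, `collapse (max) h_index, by(scientist_id)` reduces the data accordingly.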
-------------------------------------------------------------------
Pierre Azoulay
Assistant Professor of Strategy
Massachusetts Institute of Technology
Sloan School of Management
50 Memorial Drive — E52-555
Cambridge, MA 02142-1947
Tel [Sloan]: (617) 258-9766
Tel [NBER]: (617) 588-1464
Fax: (617) 253-2660