st: RE: Re: SSC archive stats
Kit,
Could you explain the relationship between the SSC Archive activity
statistics that you circulate monthly, and the access statistics for
software packages on RePEc/LogEc?
For example, in your October 2006 SSC Archive activity email, the top
entry is
        npkghit  author            package
   1.    956.00  John Luke Gallup  OUTREG
The RePEc download statistics for -outreg- show the following:
Access Statistics for the software item
Month     Abstract Views   Downloads
2006-10   224              121
What is the source of the large difference between the SSC figure
(956) and the RePEc download count (121)?
Cheers,
Mark
NB: For everybody's convenience, Kit's posting to Statalist in which he
describes how the npkghit figure is calculated is reproduced below.
> -----Original Message-----
> From: Kit Baum [mailto:[email protected]]
> Sent: Tuesday, October 18, 2005 11:23 PM
> To: David Roodman
> Cc: statalist
> Subject: st: Re: SSC archive stats
>
> David,
>
> I'm afraid that my prior explanations of this issue have been
> lost in the mists of time; my earliest post on Statalist @
> HSPH (Jan 2001) says 'please see prior month's explanation',
> but the archives do not go back to 2000, and not even Google
> can find them. So I will try to reconstruct the logic.
>
> The web server records every "hit" on an ado-file. Ado-files
> are part of a package, which may contain a single ado- (and
> hlp-) file or may contain several, or indeed many (e.g.
> egenmore). A 'score' should not be inflated by an author's
> inclusion of many files in a package, but we don't track
> packages; we track hits on individual ado files. Many ado
> files are multiply authored, and we want each author to
> receive credit for his or her work. So we have one file, an
> extract from the web server log at
> http://fmwww.bc.edu/fmrc/reports/report.ssc.html
> which says that, e.g., the single file xtabond2.ado was
> requested 478 times last month.
>
> From the RePEc templates that define the SSC Archive, we
> generate another file (with perl)* in which each record
> contains the name of one ado-file, the SSC package of which
> it is a part, the number of ado-files in that package and an
> author's name. There is a record for each author|ado
> combination. So for instance your xtabond2 records in this
> file look like
> XTABOND2 /repec/bocode/x/xtabond2.ado 2 David Roodman
> XTABOND2 /repec/bocode/x/xtab2_p.ado 2 David Roodman
> defining a package which has two components and one author.
>
> Stata then reads the first file (containing the web server log
> excerpt) and merges the second file on the URL field shared
> by both files (the URL from which that ado may be downloaded
> at SSC). npkghit is generated as nhit/nmods for each file and
> then summed over the package's files -- so if xtabond2.ado and
> xtab2_p.ado were both downloaded 478 times, the package's
> npkghit would be 478 as well. But last month, xtab2_p.ado was
> downloaded only 393 times, so npkghit is now potentially
> fractional (and for xtabond2 is 478/2 + 393/2 = 435.5). We then
> collapse this file to compute the sum of npkghit by(author
> package). This
> gives us the first listing distributed in my monthly emails,
> which shows e.g.
> 2. 435.50 David Roodman XTABOND2
>
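The package-level calculation described above can be sketched roughly
as follows. This is an illustrative Python version, not the actual
forthedean.do or perl code; the function name package_scores and the
choice to count an unmatched URL as zero hits are assumptions made for
the example. The hit counts and records are the xtabond2 figures
quoted in the post.

```python
from collections import defaultdict

# Web-server log extract: URL -> hits last month (figures from the post).
hits = {
    "/repec/bocode/x/xtabond2.ado": 478,
    "/repec/bocode/x/xtab2_p.ado": 393,
}

# One record per author|ado combination: (package, url, nmods, author).
records = [
    ("XTABOND2", "/repec/bocode/x/xtabond2.ado", 2, "David Roodman"),
    ("XTABOND2", "/repec/bocode/x/xtab2_p.ado", 2, "David Roodman"),
]


def package_scores(hits, records):
    """Sum nhit/nmods over each (author, package) pair."""
    totals = defaultdict(float)
    for package, url, nmods, author in records:
        # Assumption: a file absent from the log counts as zero hits.
        nhit = hits.get(url, 0)
        totals[(author, package)] += nhit / nmods
    return dict(totals)


scores = package_scores(hits, records)
print(scores[("David Roodman", "XTABOND2")])  # 435.5
```

Dividing each file's hits by nmods before summing means a package is
not rewarded simply for containing more files.
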
> We then collapse to compute the sum of npkghit by(author) to
> generate the second listing, e.g.
> 6. David Roodman 594.25
> which reflects the totals from the several packages you have
> authored on SSC.
>
> In contrast to the citation-count literature fashionable
> among tenure and promotion committees, I do not give each
> author of a package with N authors 1/Nth of the hits; I give
> each author all the hits.
>
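The author-level step, with every co-author credited in full, might
look like the sketch below. The package names, author names and scores
are hypothetical, chosen only to show that a co-authored package
contributes its whole score to each author rather than 1/Nth; the
function name author_totals is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical package-level scores keyed by (author, package).
# PKG1 has two authors, so it appears once per author, and each
# entry carries the full package score.
scores_by_package = {
    ("Author A", "PKG1"): 100.0,
    ("Author B", "PKG1"): 100.0,
    ("Author A", "PKG2"): 50.5,
}


def author_totals(scores_by_package):
    """Sum package-level npkghit over all packages for each author."""
    totals = defaultdict(float)
    for (author, _package), npkghit in scores_by_package.items():
        totals[author] += npkghit
    return dict(totals)


totals = author_totals(scores_by_package)
print(totals["Author A"])  # 150.5
print(totals["Author B"])  # 100.0
```
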
> So where are these fractions coming from? Recall that when
> you 'ssc install xtabond2, replace' Stata is smart--it only
> downloads the files which have changed. You updated
> xtabond2.ado but did not update xtab2_p.ado recently. Those
> updating their copies of xtabond2 installed only one file,
> while those installing it for the first time installed two.
> That explains why the 'hit counts' are not equal for all
> files in a package. (If people are downloading files from a
> web browser--even to just look at them on the screen--this
> would also be the case; they might look at the main .ado and
> not be interested in ancillary files).
>
> Historical note: the one-page Stata program that does this
> manipulation is named 'forthedean.do', written to satisfy a
> UK economist who thought these stats would be appreciated by the Dean.
>
> Now that I have explained this, I should be able to find the
> explanation in the Statalist archives for the next five years or so!
>
> Cheers
> Kit
>
> * Note to Bill Gould: this perl program predated Stata's
> -file- command. If I wrote it today, I'd do it in Stata. It
> couldn't readily be done at the time I started crunching
> these numbers.
>
> Kit Baum, Boston College Economics
> http://ideas.repec.org/e/pba1.html
>
>
> On Oct 18, 2005, at 5:45 PM, David Roodman wrote:
>
> > Kit, is there documentation somewhere for how you massage the SSC
> > download stats? How do the fractions come about in the number of
> > hits?
> > Thanks much.
> >
> > --David
> >
> > David Roodman
> > Research Fellow
> > Center for Global Development
> > 1776 Massachusetts Ave. NW
> > Washington, DC 20036
> > [email protected]
> > +1 (202) 416-0723
> > fax: +1 (202) 416-0750
> >
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
>