Nick, thanks for the tips.
I tried the first suggestion that you mentioned, which is essentially
the following:
. egen cited2 = group(cited) ;
. gen allcited = "" ;
. tostring citing ;
. tostring cited2 ;
. bysort citing (cited2) : replace allcited = allcited[_n-1] + " " +
cited2 ;
. by citing : keep if _n == _N ;
. bysort allcited (citing) : gen counter = _n - 1 ;
. sort citing ;
As might be expected, when Stata reaches the first -bysort-, it returns
the error message "no room to add more variables due to width". My
question is: is there a good way to truncate the concatenated string
before it exceeds the maximum length (presumably 255?), preferably
without any loops?
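To make the question concrete, the kind of thing I have in mind is
sketched below: wrap the concatenation in substr() so that the running
string is cut off at a fixed length. This is only a sketch. The cap of
80 held in the local macro maxlen is my own assumption, not a documented
limit for my setup, and I have not confirmed that it avoids the width
error.

```stata
* Sketch only: cap the running concatenation at an assumed length.
* maxlen is a hypothetical cap; check -help limits- for the real maximum.
local maxlen = 80
gen str1 allcited = ""
* Within each citing patent, append the next cited2 to the string built
* so far, keeping only the first `maxlen' characters of the result.
bysort citing (cited2) : replace allcited = substr(allcited[_n-1] + " " + cited2, 1, `maxlen')
```

Two citing patents whose lists agree over the first maxlen characters
would be treated as matches under this approach, so the cap trades
accuracy for width.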
Chihmao.
P.S. I tried the second option with reshape that you suggested -- it is
consuming much more computing time than this -bysort- method, so I will
stick with this.
-----------------------------------------------------
Chihmao Hsieh
John M. Olin School of Business
Washington University
Box 1133, One Brookings Drive
St. Louis, MO 63130
Email: [email protected]
http://students.olin.wustl.edu/~hsieh
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Tuesday, September 30, 2003 11:41 AM
To: [email protected]
Subject: st: RE: RE: RE: RE: RE: Using -collapse- extensively to find
historical, irregular matches: Better way?
Should be _n - 1, not _N - 1.
Nick
[email protected]
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Nick Cox
> Sent: 30 September 2003 16:59
> To: [email protected]
> Subject: st: RE: RE: RE: RE: Using -collapse- extensively to find
> historical, irregular matches: Better way?
>
>
> Chih-Mao Hsieh
> >
> > I had been shying away from converting "cited" to
> > strings because the numbers are in the millions, i.e.
> > strings would be length 7. Many of the "citing" patents
> > have more than 35-40 "cited" patents, and so the
> > concatenation might surpass the string's length limit.
> >
> > Of course, the chances are not high that two patents
> > would match each other over the first 35 patents, so your
> > way does appear to be better.
>
> Another way is to -reshape-, something
> like this:
>
> bysort citing (cited) : gen j = _n
> reshape wide cited, i(citing) j(j)
> bysort cited* (citing) : gen counter = _N - 1
>
> At this moment, I think that's a lot better
> than my earlier suggestions.
>
> Nick
> [email protected]
>
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>