Chih-Mao Hsieh
> The first suggestion that you mentioned, essentially the following:
>
> . egen cited2 = group(cited) ;
>
> . gen allcited = "" ;
> . tostring citing ;
> . tostring cited2 ;
>
> . bysort citing (cited2) : replace allcited = allcited[_n-1] + " " +
> cited2 ;
> . by citing : keep if _n == _N ;
> . bysort allcited (citing) : gen counter = _n - 1 ;
> . sort citing ;
>
> As can be expected, when it tries to do the first -bysort-, it returns
> the error message "no room to add more variables due to width". My
> question is: Is there a best way to truncate the "concatenation" before
> it goes over the max (presumably 255?), preferably without any loops?
In general, as memory is short, -compress- the data and -drop- any
variables you don't need.

You are of course right that for this approach -cited2- needs
to be string (the code below converts it on the fly with
string()). However, once you have -cited2- you do not
need -cited-, at least for the purpose of identifying which
groups match. (-cited- is needed for identifying on which
patents they match.)

In addition, you could -drop- any observations for which
no patent is cited, although there may be none.
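
As a rough sketch of that housekeeping, assuming -cited- really will
not be needed again in this run (save a copy of the full dataset first
if it will be), something like this could go before the matching code:

```stata
* Sketch only: assumes -cited- is not needed again in this run.
* Drop observations in which no patent is cited (there may be none).
drop if missing(cited)
* Map each cited patent to a compact integer id, then discard the original.
egen cited2 = group(cited)
drop cited
* Store each remaining variable in the smallest type that holds it.
compress
```

If you do run this first, skip the -egen- line in the code that
follows, as -cited2- will already exist.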
You could match on the first so many patents, e.g. 7:
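* run from a do-file: /// continues a command on the next line
* map cited patents to integers 1, 2, ... so that identifiers stay short
egen cited2 = group(cited)
gen allcited = ""
* within each citing patent, build up the list of the first 7 cited ids
bysort citing (cited2) : replace allcited = allcited[_n-1] + " " + ///
    string(cited2) if _n <= 7
* carry the truncated list forward to any later observations
by citing : replace allcited = allcited[_n-1] if mi(allcited)
* keep one observation per citing patent, holding its list
by citing : keep if _n == _N
* number the citing patents sharing each list 0, 1, 2, ...
bysort allcited (citing) : gen counter = _n - 1
sort citing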
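
To see what truncation does, here is the same code on a made-up toy
dataset. The data and the local macro -maxcites- are purely
illustrative (a cap of 2 stands in for the 7 above):

```stata
clear
input citing cited
1 101
1 102
1 103
2 101
2 102
2 104
3 105
end

* hypothetical cap on how many cited patents enter the match
local maxcites 2

egen cited2 = group(cited)
gen allcited = ""
bysort citing (cited2) : replace allcited = allcited[_n-1] + " " + ///
    string(cited2) if _n <= `maxcites'
by citing : replace allcited = allcited[_n-1] if mi(allcited)
by citing : keep if _n == _N
bysort allcited (citing) : gen counter = _n - 1
sort citing
list citing allcited counter
```

Here citing patents 1 and 2 should both end up with -allcited- equal to
" 1 2", so they are counted as a match (counter 0 and 1) even though
they differ on their third cited patent. That is the price of
truncating: a match means only that the first so many patents agree.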
> P.S. I tried the second option with reshape that you suggested -- it is
> consuming much more computing time than this -bysort- method, so I will
> stick with this.
A pity, as I think that -reshape- offers a much cleaner
approach.
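
The earlier -reshape- suggestion is not quoted in this message, so the
following is only a guess at the general idea, not a record of it. It
assumes the data hold one observation per citing-cited pair, and the
names -j- and -pattern- are made up for illustration:

```stata
* Sketch only: one possible -reshape- route, under the assumptions above.
egen cited2 = group(cited)
* keep just what is needed, as -reshape wide- requires other variables
* to be constant within -citing-
keep citing cited2
bysort citing (cited2) : gen j = _n
reshape wide cited2, i(citing) j(j)
* citing patents with identical sets of cited ids get the same group;
* -missing- lets patents with fewer citations be grouped too
egen pattern = group(cited2*), missing
bysort pattern (citing) : gen counter = _n - 1
sort citing
```

This avoids long strings altogether, at the cost of one new variable
per cited patent for the most prolific citing patent.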
Nick