Re: st: how to generate sum of distinct id1, by id2, in the last n years
From: "Austin Nichols" <[email protected]>
To: [email protected]
Subject: Re: st: how to generate sum of distinct id1, by id2, in the last n years
Date: Tue, 18 Sep 2007 11:06:42 -0400
Pierre Azoulay <[email protected]>:
The language setting up the problem seems perversely unclear: "create
a variable that records the sum of distinct [id values] in the last 3
years" does not seem to be what you want at all. A sum can still help,
though, if what you want is the number of distinct values of id across
years t, t-1, and t-2, saved in a new variable at t. Copy each
observation into the three years in which it counts, tag one copy per
id within each star and year, and sum the tags, like so:
clear
input star_id id year nbpapers
1 2 1972 1
1 2 1973 0
1 2 1974 2
1 2 1975 3
1 2 1976 0
1 2 1977 4
1 3 1970 1
1 3 1971 0
1 3 1972 0
1 3 1973 2
1 4 1978 2
1 4 1979 1
1 5 1977 4
1 5 1978 1
1 5 1979 0
1 5 1980 1
1 5 1981 1
end
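* number the original rows so that each row's copies can be grouped
gen obs = _n
* three copies of each row, one per year in which it counts
expand 3
bysort obs: gen n = _n
gen yr = year + n - 1
* tag one copy per id within each star and year
bysort star_id yr id: gen d = _n == 1
* sum of tags = number of distinct ids in years yr-2 to yr
egen ndistinct = total(d), by(star_id yr)
* keep one copy per original row, the one with yr equal to year
drop if n > 1
collapse ndistinct, by(star_id year)
fillin star_id year
list, noobs clean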
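The same steps should carry over to a window of any length, as the
"last n years" in the subject line suggests. What follows is only a
minimal sketch under that assumption, not part of the original reply:
the local macro w is a name chosen here for illustration, and setting
it to 3 should reproduce the table requested in the quoted message.

```stata
* Sketch: distinct ids per star over the last `w' years (w is an assumed name)
clear
input star_id id year nbpapers
1 2 1972 1
1 2 1973 0
1 2 1974 2
1 2 1975 3
1 2 1976 0
1 2 1977 4
1 3 1970 1
1 3 1971 0
1 3 1972 0
1 3 1973 2
1 4 1978 2
1 4 1979 1
1 5 1977 4
1 5 1978 1
1 5 1979 0
1 5 1980 1
1 5 1981 1
end
local w 3
gen obs = _n
* one copy of each row for every year in which it falls inside the window
expand `w'
bysort obs: gen n = _n
gen yr = year + n - 1
* tag one copy per id within each star and year, then sum the tags
bysort star_id yr id: gen d = _n == 1
egen ndistinct = total(d), by(star_id yr)
* keep the copy dated at the original year and reduce to one row per star-year
keep if n == 1
collapse (mean) ndistinct, by(star_id year)
list, noobs clean
```

Only star-years that appear in the data survive the final step, so a
year in which a star has no rows at all gets no count this way.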
On 9/17/07, Pierre Azoulay <[email protected]> wrote:
> Dear Statalisters,
>
> I have what I believe is a simple programming question that I can't quite solve.
> I have a panel of dyads, where each member of the dyad is a coauthor.
> Each dyad is composed of a "superstar" and a "simple joe/jane."
>
> For instance:
>
> star_id id year nbpapers
> ---------------------------------------------------------
> 1 2 1972 1
> 1 2 1973 0
> 1 2 1974 2
> 1 2 1975 3
> 1 2 1976 0
> 1 2 1977 4
> 1 3 1970 1
> 1 3 1971 0
> 1 3 1972 0
> 1 3 1973 2
> 1 4 1978 2
> 1 4 1979 1
> 1 5 1977 4
> 1 5 1978 1
> 1 5 1979 0
> 1 5 1980 1
> 1 5 1981 1
>
> So superstar #1 has 4 "simple joe collaborators" numbered 2,3,4, and 5.
> In each year, the data records how many publications exist for
> superstar i and simple joe/jane j.
>
>
> I would like to collapse this data at the superstar/year level and
> create a variable that records the sum of distinct "simple joes" in
> the last 3 years.
> In other words, I'd like to create the variable stk_nbcoauth_it that is:
>
> star_id year stk_nbcoauth_it
> ---------------------------------
> 1 1970 1
> 1 1971 1
> 1 1972 2
> 1 1973 2
> 1 1974 2
> 1 1975 2
> 1 1976 1
> 1 1977 2
> 1 1978 3
> 1 1979 3
> 1 1980 2
> 1 1981 2
>
> I have fiddled with bysort star_id id (year), but without clear
> success. Could anyone help?
>