|
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: questions about duplicate observations
.
Use -by:- for your calculations. -collapse- only at the end. Recall that
the mean of a constant is that constant.
e.g.
bysort yeara cnum (offering_amt) : gen max = offering_amt[_N]
by yeara cnum : gen num = sum(bond_yield * offering_amt)
by yeara cnum : gem den = sum(offering_amt)
by yeara cnum : gen avewt_bind_yield = num[_N] / den[_N]
Nick
[email protected]
Wen Xia Ge <[email protected]>
I have a dataset that contains thousands of firms with bond issues from
1980-2004. Some firms have single bond issue in a given year, while some
firms have multiple bond issues in a given year. For example, the
dataset looks like this (I have more variables than listed here):
issue_id issuer_id issuer_cusip offering_year offering_amt
bond_yield maturity
1 3 000361 1989 ###
### ###
2 3 000361 1993 ###
### ###
3 4 00077D 1994 ###
### ###
4 4 00077D 1994 ###
### ###
5 4 00077T 1993 ###
### ###
6 4 00077T 1993 ###
### ###
7 6 000800 1982 700000
### ###
8 6 000800 1982 680000
### ###
9 6 000800 1982 800000
### ###
10 6 000800 1984 ###
### ###
For firms with single bond issue in a year, they will be kept in the
dataset as it is. I want to use two approaches to handle firms with
multiple bond issues in a given year:
First, I want to keep the issue with the largest offering_amt in the
dataset (for example, for company 000800, drop issue 7 and 8, keep issue
9, for year 1982, and keep issue 10 for year 1984), so that in the
resulting panel data, in a given year, a company can only show up at
most once (with the largest offering amount);
Second, I want to calculate weighted average bond_yield (using
offering_amt as the weights), average offering_amt and average maturity,
and save it as a new observation while dropping original multiple bond
issues. For example, in the resulting panel data, there will be one
observation for 000800 in 1982, the offering_amt will take the average
of 700000, 680000, and 800000, and the bond_yiled will take the weighted
value of the three bond_yield on issue 7, 8 and 9.
I tried the following command:
sort yeara cnum offering_amt
collapse (last) offering_amt, by(yeara cnum)
And:
sort yeara cnum
collapse (max) offering_amt, by(yeara cnum)
The resulting dataset contains only three variables: yeara cnum and
offering_amt, but I need to keep other variables.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/