As I understand it, this is not a collapsing problem
at all, as you want as many observations at the
end as you start with.
One way to approach this is to notice that
sum of all others = sum of all - value for this
which suggests the following line of attack:
bysort product: gen ave_others = sum(markup)
by product: replace ave_others = (ave_others[_N] - markup) / (_N - 1)
This would need some tweaking if markup was ever missing:
gen byte touse = markup < .
bysort touse product: gen ave_others = sum(markup) if touse
by touse product: replace ave_others = (ave_others[_N] - markup) / (_N - 1)
Some searching would have pointed to a ragbag of tricks in
How do I create variables summarizing for each individual properties of the
other members of a group?
http://www.stata.com/support/faqs/data/members.html
Nick
[email protected]
Jason Hwang
> I'm trying to generate a dataset of statistics by collapsing another
> dataset, but each of the id's I'm collapsing by, I want to use every
> observation in the dataset except the one for the id under
> consideration.
> Would there be a way to do this?
>
> For example, I have data of the form:
>
> company product markup
> 100 31 .3
> 100 55 .2
> 111 31 0
> 111 55 .1
> 120 31 .1
> 120 55 .1
>
> Now I want to ask the question: for each company, calculate
> the average
> markup of each product it produces, where the average is
> taken over all
> companies that sell the product except the company itself. So
> I want to
> end up with
>
> company product averagemarkup
> 100 31 .05
> 100 55 .1
> 111 31 .2
> 111 55 .15
> 120 31 .15
> 120 55 .15
>
> Obviously collapsing the data the standard way is not going
> to do this. I
> need to do this for hundreds of thousands of observations (hundreds of
> companies and thousands of products) so am looking for a way
> to do this that would be relatively quick.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/