I am trying to calculate some characteristics of the median
observation within groups defined by several variables. As an
example, I have data (sales and ages) on firms by year in different
industries. I would like to find the age of the firm with the median
value of sales for each year and industry. The data look like this:
AGE  SALES   IND  YEAR  ID
  2   1.04  3339  1991   1
  3   1.75  3339  1991   2
  3   3.08  3339  1991   3
 31   .496  3339  1991   4
 42   .546  3339  1991   5
 42    1.5  3339  1991   6
  5      .  3411  1991   7
  8   .584  3411  1991   8
 30   .491  3411  1991   9
 19   .944  3411  1991  10
 20   .692  3411  1991  11
 28   1.81  3411  1991  12
 29   .601  3411  1991  13
 32   .509  3411  1991  14
 42   .938  3411  1991  15
 42   .886  3411  1991  16
The median sales for industry 3411 in 1991 is .692 (ignoring the
observation with missing sales); I'd like a new variable, say medAge,
that contains 20, the age of that firm. The problem comes from groups
with an even number of members, like industry 3339. There I'm trying
to get the average of the two observations that make up the median
calculation. That is, I need medAge to contain (2+42)/2 = 22, from the
observations with sales of 1.04 and 1.5. If every group contained an
odd number of observations, I could compute the median by group and
merge it back onto the dataset, but this obviously won't work for
groups with an even number of observations, because the median is
then not an observed value of sales. Is this something where I should
use forvalues within each group, or is there a faster way?
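
To pin down the computation I have in mind, here is a minimal sketch
of the logic, written in Python rather than Stata purely as an
illustration. The function name med_age and the tuple layout are
made up for this example, and it assumes that observations with
missing sales are dropped before ranking.

```python
from collections import defaultdict

# (AGE, SALES, IND, YEAR) rows copied from the example above.
# None marks the missing sales value (shown as "." in the listing).
ROWS = [
    (2, 1.04, 3339, 1991), (3, 1.75, 3339, 1991), (3, 3.08, 3339, 1991),
    (31, .496, 3339, 1991), (42, .546, 3339, 1991), (42, 1.5, 3339, 1991),
    (5, None, 3411, 1991), (8, .584, 3411, 1991), (30, .491, 3411, 1991),
    (19, .944, 3411, 1991), (20, .692, 3411, 1991), (28, 1.81, 3411, 1991),
    (29, .601, 3411, 1991), (32, .509, 3411, 1991), (42, .938, 3411, 1991),
    (42, .886, 3411, 1991),
]


def med_age(rows):
    """Return {(ind, year): mean age of the middle observation(s) by sales}."""
    groups = defaultdict(list)
    for age, sales, ind, year in rows:
        if sales is not None:  # assumption: missing sales are excluded
            groups[(ind, year)].append((sales, age))

    result = {}
    for key, obs in groups.items():
        obs.sort()  # order by sales within the group (ties broken by age)
        n = len(obs)
        lo, hi = (n - 1) // 2, n // 2  # the same index when n is odd
        result[key] = (obs[lo][1] + obs[hi][1]) / 2
    return result


print(med_age(ROWS))
```

The point is only that the middle position(s) after sorting by sales,
not the median value itself, identify the observations whose ages are
averaged, so odd and even groups are handled by the same rule.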
Thanks in advance,
John
=================================
John Hund
Visiting Assistant Professor
Jones Graduate School of Business
Rice University
Houston, TX 77005