> -----Original Message-----
> From: Yi, Bingsheng [mailto:[email protected]]
> Sent: Tuesday, June 18, 2002 7:11 PM
> To: [email protected]
> Subject: st: how to generate groups based on some
> characteristics and obtain the mean/median value for each group
>
>
> Dear Statalisters,
>
> I wonder whether you will help me figure out the codes to solve the
> following problem:
>
> I have 12 years of panel data containing these four variables: Tobin's q,
> size, 4-digit industry code (ind4), and id. For each year, I want to make
> some adjustments in one variable (Tobin's q) based on the other two
> variables (industry and size). First I need to ensure that there are at
> least 10 firms within each industry. If the number of firms within a
> 4-digit industry code is less than 10, I use the 3-digit industry code
> generated by gen str4 ind3=substr(ind4,1,3) and see whether the number of
> firms with the same 3-digit industry code is greater than or equal to 10;
> if not, I generate and use the 2-digit industry code. So in the end there
> are at least 10 firms within an industry (which is classified by 4-digit,
> 3-digit, 2-digit, or 1-digit industry code). The problem is how to get and
> record the number of firms in each industry.
For this piece, try something like this. First, generate four variables
holding the 1-, 2-, 3- and 4-digit industry codes for **ALL** records,
named ind1, ind2, ind3 and ind4. Then:
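* generate the number of records in each group
forval i=1/4 {
    sort ind`i'
    by ind`i': gen num`i'=_N
}
* group the records: start from the 1-digit code, then switch to each
* more detailed code that still has at least 10 records
* (str4 so that the longer string codes fit)
gen str4 finalgrp = ind1
forval i=2/4 {
    replace finalgrp = ind`i' if num`i'>=10
}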
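This should create the variable "finalgrp", which will contain the
grouping you desire. Then you can calculate whatever statistics you
want in those groups. Note that _N counts records, not firms, so with
several years of data per firm you will want to add your year variable
to the sort and by lists to get the number of firms within each year.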
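
Put together for the per-year case, the whole thing might look like the
sketch below. It is untested and rests on assumptions that are not in the
original post: a year variable called year, Tobin's q stored in a variable
called q, ind4 stored as a 4-character string, and one record per firm per
year. The names nfirms, meanq and medq are made up for illustration.

```stata
* build the 1-, 2- and 3-digit codes from the string ind4
forvalues i = 1/3 {
    gen str`i' ind`i' = substr(ind4, 1, `i')
}

* count firms per industry code within each year
* (assumes one record per firm per year, so _N is a firm count)
forvalues i = 1/4 {
    bysort year ind`i': gen num`i' = _N
}

* keep the most detailed code with at least 10 firms, and record
* the number of firms in the industry at that level
gen str4 finalgrp = ind1
gen nfirms = num1
forvalues i = 2/4 {
    replace finalgrp = ind`i' if num`i' >= 10
    replace nfirms   = num`i' if num`i' >= 10
}

* mean and median of q within each year and final group
bysort year finalgrp: egen meanq = mean(q)
bysort year finalgrp: egen medq  = median(q)
```

Two caveats. A firm whose 1-digit industry has fewer than 10 firms in a
year still ends up in that 1-digit group, so check nfirms before relying on
it. Also, meanq and medq are computed only over the firms assigned to each
finalgrp, which can be fewer than nfirms when other firms with the same
coarse code were placed in a more detailed group.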
Nick Winter