Special thanks to N Cox and N Winter!
I tried the code provided by N Winter. It works, but there is still a
problem: the code does not ensure that there are at least 10 firms within
each final group. I also tried another approach, but the results are similar.
I can't figure out the reason and have to ask for your help again.
* generate the number of records in each group
* ind4 is the 4-digit industry code
gen str4 ind3=substr(ind4,1,3)
gen str4 ind2=substr(ind4,1,2)
gen str4 ind1=substr(ind4,1,1)
forval i=1/4 {
sort ind`i'
by ind`i': gen num`i'=_N
}
* group the records
gen str4 industry=ind1
drop if num1<10  // exclude an industry if it contains fewer than 10 firms
forval i=2/4 {
replace industry=ind`i' if num`i'>=10
}
sort industry
by industry: gen _freq=_N
list ind4 industry _freq if _freq<10
After running the code above, there are still some industries with fewer
than 10 firms:
        ind4   industry   _freq
  41.   1044        104       1
  79.   1321         13       5
  80.   1330         13       5
  81.   1390         13       5
  82.   1320         13       5
  83.   1320         13       5
 282.   1610         16       2
 283.   1611         16       2
 284.   1622        162       9
 285.   1623        162       9
 286.   1623        162       9
 287.   1623        162       9
 288.   1623        162       9
 289.   1623        162       9
 290.   1623        162       9
 291.   1629        162       9
 292.   1623        162       9
There are 9 firms with the 3-digit industry code 162. If the 2-digit industry
code "16" had been used instead, there would have been 11 firms in industry
16. I don't know why the code above didn't do what it is supposed to do.
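One possible explanation, not verified against the full data: num1 to num4
are counted once on the whole sample, so num3 for 162 also counts firms that
later move into their own 4-digit groups, and the firms left behind in 162
can number fewer than 10. A minimal sketch of an alternative, meant to replace
the "group the records" step, starts from the 4-digit code and recounts after
each coarsening. The variable name nfirms is only an illustration, and ind4 is
assumed to be a 4-character string:

* start at the finest level, then coarsen only the groups that are too small
gen str4 industry=ind4
forvalues i=3(-1)1 {
by industry, sort: gen long nfirms=_N
replace industry=substr(ind4,1,`i') if nfirms<10
drop nfirms
}
* recount and show whatever is still too small
by industry, sort: gen long nfirms=_N
list ind4 industry nfirms if nfirms<10

With the listing above, 1622, 1623 and 1629 would first be pooled into 162
(9 firms), then into 16 together with 1610 and 1611 (11 firms). Leftover
1-digit groups could still have fewer than 10 firms and would have to be
dropped or handled separately.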
I also tried the following code, but it still does not solve the problem.
* group the records
gen str4 industry=ind1
drop if num1<10
// the 1-digit industry code should contain the largest number of firms;
// if it has fewer than 10, such an industry shouldn't be considered any more
replace industry=ind3 if num4<10 & num3>10
replace industry=ind2 if num4<10 & num3<10 & num2>10
replace industry=ind1 if num4<10 & num3<10 & num2<10
tabulate industry  // show the number of firms contained in each industry
. tabulate industry
   industry |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        234        5.10        5.10
         10 |         22        0.48        5.57
        104 |          1        0.02        5.60
         13 |          5        0.11        5.71
         15 |         11        0.24        5.95
        152 |         18        0.39        6.34
         16 |          2        0.04        6.38
        162 |          9        0.20        6.58
         26 |         20        0.44       19.82
        262 |          2        0.04       19.86
        267 |         19        0.41       20.27
         27 |         37        0.81       21.08
        271 |          1        0.02       21.10
        275 |         17        0.37       21.47
The firms in industry 262 could go to industry 26, so why didn't the code do
this?
More importantly, I wonder whether you could give me some ideas for the
follow-up problem:
Suppose that in the end each industry contains at least 10 firms. I want to
subdivide the firms in each industry into three groups based on size: small,
middle and large.
The small group contains the smallest 30% of firms by size within an industry,
the middle group contains the middle 40% (30% to 70%), and the large group
contains the firms whose size is in the largest 30% in that industry. How can
I subdivide firms according to size within an industry? I need to get the
median or mean value of Tobin's q for each of these groups within an
industry. I also need the size range for each group. The problem is how to
compute and record them.
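One way the split might be done is sketched below. It is untested on the real
data and assumes that size is a numeric variable holding firm size, that q
holds Tobin's q (both names are placeholders), and that industry is the final
grouping variable. The other variable names are illustrations only.

* rank firms by size within industry (ties are broken arbitrarily)
egen long rk=rank(size), unique by(industry)
egen long nsize=count(size), by(industry)
gen double pct=rk/nsize
* 1 = small (bottom 30%), 2 = middle (30% to 70%), 3 = large (top 30%)
gen byte sizegrp=cond(pct<=0.3, 1, cond(pct<=0.7, 2, 3)) if !missing(size)
* record median and mean q and the size range of each group
preserve
collapse (median) q_med=q (mean) q_mean=q ///
(min) size_lo=size (max) size_hi=size ///
if !missing(sizegrp), by(industry sizegrp)
list, sepby(industry)
restore

With exactly 10 firms in an industry this gives 3, 4 and 3 firms in the three
groups. Inside the preserve/restore block, a save command could be added to
store the summary table as a separate dataset.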
Next I need to decide which group a firm belongs to within its industry,
based on its size. For example, if the size range of the small group in
industry 10 is 12 to 20 and a firm in industry 10 has size 15, then the
industry-size adjusted Tobin's q = the Tobin's q of that firm - the
mean/median Tobin's q of the small group in industry 10.
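A minimal sketch of this adjustment, assuming a variable sizegrp (1, 2 or 3)
already marks each firm's size group within its industry and q holds Tobin's
q. The names sizegrp, q, q_ref and q_adj are all placeholders:

* group median of q, stored on every firm in the same industry and size group
egen double q_ref=median(q), by(industry sizegrp)
* industry-size adjusted Tobin's q
gen double q_adj=q-q_ref if !missing(sizegrp)

Replacing median(q) with mean(q) would give the mean-adjusted version.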
I'm sorry to trouble you again. I greatly appreciate your help and look
forward to your reply!
Bing