Dear statalister,
I have a problem in creating a variable recording the sum of another
variable by group in panel data. I would be grateful if any of you
could help me in this.
The data is a 12 year unbalanced panel, including more than 15000
domestic firms and 9000 foreign firms in 82 three-digit industries. I
want to calculate the foreign presence in each three-digit industry,
that is,
FDI= sum of the output of foreign firms/sum of the output of all
firms, for each industry and each year.
I tried this:
(*first, calculate the sum of output of all firms in each industry)
egen alloutput=total(output) if year==1986, by(industry)
egen alloutput87=total(output) if year==1987, by(industry)
......
egen alloutput97=total(output) if year==1997, by(industry)
replace alloutput=alloutput87 if year==1987
...
replace alloutput=alloutput97 if year==1997
(*second, calculate the sum of output of foreign firms in each industry)
egen FDI=total(output) if year==1986 & firm=="foreign", by(industry)
egen FDI87=total(output) if year==1987 &firm=="foreign", by(industry)
......
egen FDI97=total(output) if year==1997, & firm=="foreign", by(industry)
replace FDI=FDI87 if year==1987 & firm=="foreign"
...
replace FDI=FDI97 if year==1997 & firm=="foreign"
egen FDInew=max(FDI), by(industry)
egen FDInew87=max(FDI) if year==1987, by(industry)
....
egen FDInew=max(FDI) if year==1997, by(industry)
replace FDInew=FDInew87 if year==1987
...
replace FDInew=FDInew97 if year==1997
(*third,calculate the foreign presence)
gen FDIpresence=FDInew/alloutput
However, some of the ratio are over 1, which is not possible. I tried
to use "in" instead of "if" in the above syntax and obtained different
results.
I know I must have made some mistakes but can't figured it out myself.
Could anyone here help me? Thanks a lot.
Natalie.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/