You are assuming that the relation between X1 and the Z's at the industry
level matches the relation at the firm level. This creates several
problems.
First, you must use variables that aggregate naturally from firm to
industry. Industry sales equal the sum of firm sales. However, neither
the sum nor the mean of firm returns on assets equals total industry
income divided by total industry assets (industry ROA).
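To make the aggregation point concrete, here is a small sketch in Python. The three firms and all their figures are invented for illustration only; nothing here comes from real data.

```python
# Hypothetical firms in one industry; all figures are invented for illustration.
firms = [
    {"sales": 500.0, "income": 10.0, "assets": 100.0},
    {"sales": 400.0, "income": 5.0, "assets": 200.0},
    {"sales": 800.0, "income": 90.0, "assets": 300.0},
]

# Sales aggregate naturally: the industry figure is the sum of firm figures.
industry_sales = sum(f["sales"] for f in firms)

# ROA does not: compute three candidate "industry" numbers and compare them.
firm_roas = [f["income"] / f["assets"] for f in firms]
sum_of_roas = sum(firm_roas)
mean_of_roas = sum_of_roas / len(firm_roas)
industry_roa = sum(f["income"] for f in firms) / sum(f["assets"] for f in firms)

print(f"industry sales     : {industry_sales:.1f}")
print(f"sum of firm ROAs   : {sum_of_roas:.4f}")
print(f"mean of firm ROAs  : {mean_of_roas:.4f}")
print(f"industry ROA       : {industry_roa:.4f}")
```

The mean of firm ROAs weights every firm equally, while industry ROA implicitly weights each firm by its assets, so the two agree only in special cases.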
Second, you must assume these relations are stable both within and
across industries. This is highly unlikely. Industries vary
dramatically in almost all such relations.
When you estimate X(i) = b*Z(i) + e and then use Z to predict X, you
assume that X(i)/Z(i) is, apart from noise, the same constant b for
every i. For example, predicting income from a regression of income on
assets assumes all firms have the same return on assets. A multivariate
analysis just assumes more such constant relations.
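A rough sketch of the constant-ratio point, assuming the no-intercept model written above and four made-up firms (the asset and income figures are hypothetical). The slope uses the usual least-squares formula for a regression through the origin, b = sum(Z*X) / sum(Z*Z).

```python
# Hypothetical firm data, invented for illustration.
assets = [100.0, 200.0, 300.0, 400.0]   # Z(i)
income = [12.0, 10.0, 45.0, 20.0]       # X(i)

# Least-squares slope for X(i) = b*Z(i) + e (no intercept).
b_hat = sum(z * x for z, x in zip(assets, income)) / sum(z * z for z in assets)

# Predicted income, and the return on assets each prediction implies.
predicted_income = [b_hat * z for z in assets]
implied_roa = [p / z for p, z in zip(predicted_income, assets)]

# The return on assets the firms actually earned.
actual_roa = [x / z for x, z in zip(income, assets)]

print(f"b_hat       : {b_hat:.4f}")
print("implied ROA :", [round(r, 4) for r in implied_roa])
print("actual ROA  : ", [round(r, 4) for r in actual_roa])
```

Every predicted value implies the same ROA, namely b_hat, no matter how different the firms' actual ROAs are.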
Industries vary dramatically in these relations. I know of few such
relations where the data would not easily reject the assumption that b
is constant across industries. Indeed, even within industries, the data
can often reject the hypothesis of a constant b: firms differ in
return on assets, return on sales, capital intensity, etc. Firms within
an industry vary significantly in almost any ratio you wish to examine,
while your model assumes a constant ratio both within and across
industries.
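As an illustration of what a non-constant b does to the predictions, the sketch below fits the same no-intercept slope separately for two made-up industries and then pooled. The industry names and numbers are hypothetical, and for simplicity the firms within each industry share one ratio, which real data would not.

```python
def slope_through_origin(z, x):
    """Least-squares slope for x = b*z + e with no intercept."""
    return sum(zi * xi for zi, xi in zip(z, x)) / sum(zi * zi for zi in z)

# Hypothetical (assets, income) data for two industries, invented for illustration.
industries = {
    "retail":   {"assets": [100.0, 200.0, 300.0], "income": [5.0, 10.0, 15.0]},
    "software": {"assets": [100.0, 200.0, 300.0], "income": [20.0, 40.0, 60.0]},
}

# Slope estimated separately within each industry.
slopes = {
    name: slope_through_origin(d["assets"], d["income"])
    for name, d in industries.items()
}

# Slope estimated on all firms together, i.e. assuming one common b.
pooled_assets = [z for d in industries.values() for z in d["assets"]]
pooled_income = [x for d in industries.values() for x in d["income"]]
pooled_slope = slope_through_origin(pooled_assets, pooled_income)

# Predict income for a retail firm with assets of 200 using the pooled slope.
pooled_prediction = pooled_slope * 200.0
actual_income = 10.0

print("per-industry slopes:", slopes)
print(f"pooled slope       : {pooled_slope:.4f}")
print(f"pooled prediction  : {pooled_prediction:.1f} (actual {actual_income:.1f})")
```

The pooled slope lands between the two industry slopes and so fits neither industry; a proxy built from it is wrong in a systematic, industry-specific way rather than merely noisy.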
This is an ill-advised proxy. You might be well advised to look further
for data: most industry data come from aggregations of firm data, so
the underlying firm-level figures may well exist.
Phil Bromiley
University of California, Irvine
[email protected] asked:
> I would like to run a panel regression using firm-level annual data
> such that
> y=a0+a1x1+a2x2+...
> The problem here is that I have no data for x1 (firm level), so I
> instead tried to regress x1 on related variables using "industry
> level" data and use the fitted values to generate x1.
>
> More specifically,
>
> 1st step: run X1=b0+b1*Z1+b2*Z2+... and find the fitted values of
> all b's (X1 and all Z's are industry-level data, but not annual)
>
> 2nd step: using the fitted b's and firm-level data,
> find x1 such that x1=b0hat + b1hat*z1+b2hat*z2
> (all z's are firm-level annual data)
>
> 3rd step: run first regression.
>
> It looks like generating a proxy variable for x1.
> I would like to know if there is anything wrong with this approach in
> the case that no data exist, or if there are better suggestions.