joe,
if you're going to use a heckman two-step, you should really spend some
time deriving a selection equation from a maximization problem -- using
heckman's 1979 econometrica paper as a starting point.
that said, my feeling from what you've written is that it would be very
hard to come up with a reasonable & intuitive selection model based on the
variables you described.
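for reference, here's a rough sketch of the correction term the two-step adds. it's written in python rather than stata just to show the arithmetic, and the function names are made up for illustration. it assumes a first-stage probit has already given you an index z = w'g for each firm (w = selection variables, g = probit coefficients). the second step then adds lambda(z) = phi(z)/Phi(z), the inverse mills ratio, as an extra regressor in the outcome equation. this is the same quantity stata's `heckman ..., twostep mills(newvar)` stores.

```python
import math


def std_normal_pdf(z: float) -> float:
    """phi(z): standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Phi(z): standard normal cdf, via erfc for better accuracy in the left tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def inverse_mills_ratio(z: float) -> float:
    """lambda(z) = phi(z) / Phi(z) for a probit index z.

    Large for firms that were unlikely to be selected (very negative z),
    close to 0 for firms that were almost certain to be selected.
    Phi(z) underflows to 0 for extremely negative z, so this is a sketch,
    not a production implementation.
    """
    return std_normal_pdf(z) / std_normal_cdf(z)


# hypothetical probit indices for three firms
for z in (-2.0, 0.0, 2.0):
    print(z, round(inverse_mills_ratio(z), 4))
```

the term shrinks toward zero as selection becomes near-certain, which is why it only matters when a meaningful share of firms is on the margin of exporting.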
i'm not quite clear on what question you're trying to answer, but i
gather it's something like: "if we entice a company to export more -- by
changing the prices/costs that it faces -- does this company also choose
to spend more on research?" and that the selection problem is: a cost
shock might entice more firms to start exporting -- and if those marginal
entrants export a smaller share of their sales than incumbents do, we
could mistakenly observe this as a decrease in export intensity. is that
right?
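to make that composition effect concrete, here's a toy calculation with made-up numbers (python, purely illustrative, not your data): every incumbent exporter raises its export intensity after the shock, but because two marginal firms start exporting at low intensity, the average intensity among observed exporters still falls.

```python
# made-up export intensities (exports / sales) for three incumbent exporters
before = [0.40, 0.50, 0.60]

# after a favourable cost shock, each incumbent exports a bit more...
incumbents_after = [x + 0.05 for x in before]
# ...and two marginal firms start exporting, but only a little
entrants = [0.05, 0.10]
after = incumbents_after + entrants


def mean(values):
    return sum(values) / len(values)


print(round(mean(before), 2), round(mean(after), 2))  # prints 0.5 0.36
```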
the simplest way that i can think of to address this problem would be to
create an "exporter" dummy for whether exports exceed zero -- and then to
estimate 2sls with two endogenous right-hand-side variables -- both the
exporter dummy and export intensity.
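a bare-bones sketch of that 2sls setup, in python with no libraries so the algebra is visible. X holds a constant, the exporter dummy, and export intensity (both treated as endogenous); Z holds a constant plus two instruments. the data-generating process, the outcome "research", and the instruments z1, z2 (think of them as hypothetical cost shifters) are all invented for illustration. note that you need at least as many excluded instruments as endogenous regressors. it returns point estimates only; in practice you'd use stata's ivreg (ivregress in later versions) to get correct standard errors.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b for x (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def two_sls(y, X, Z):
    """2sls point estimates: beta = (Xhat'X)^-1 Xhat'y, with Xhat = Z (Z'Z)^-1 Z'X."""
    Zt = transpose(Z)
    first_stage = solve(matmul(Zt, Z), matmul(Zt, X))  # (Z'Z)^-1 Z'X
    Xhat = matmul(Z, first_stage)                       # fitted endogenous regressors
    Xht = transpose(Xhat)
    beta = solve(matmul(Xht, X), matmul(Xht, [[v] for v in y]))
    return [row[0] for row in beta]


# simulated data (hypothetical): u drives exporting, intensity AND research
rng = random.Random(0)
y, X, Z = [], [], []
for _ in range(2000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    u = rng.gauss(0, 1)
    exporter = 1.0 if z1 + u > 0 else 0.0
    intensity = exporter * (0.5 + 0.2 * z2 + 0.1 * u)
    research = 1.0 + 2.0 * exporter + 3.0 * intensity + u + rng.gauss(0, 1)
    y.append(research)
    X.append([1.0, exporter, intensity])
    Z.append([1.0, z1, z2])

print("2sls:", [round(b, 2) for b in two_sls(y, X, Z)])
print("ols: ", [round(b, 2) for b in two_sls(y, X, X)])  # Z = X reduces to ols
```

passing X as its own instrument matrix gives plain ols, which in this made-up setup is biased because u enters both the export decision and research spending.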
alternatively, you could try re-weighting the data based on your x
variables -- i.e., split the sample into quantile bins using your x
variables & estimate a sampling frequency (the rate at which you observe a
non-zero export intensity) within each bin -- and then weight your
regressions using the inverse of that frequency. this would be similar to
imputing export intensity (based on your x's) for the non-exporters.
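here's a minimal sketch of that re-weighting for a single x variable (python; using one x and equal-sized rank bins are simplifying assumptions, and the function name is made up). each exporter gets weight 1 / (share of firms in its bin with non-zero export intensity); non-exporters get weight 0 because they drop out of the export-intensity regression anyway.

```python
def inverse_frequency_weights(x, expint, n_bins=4):
    """Weight each exporter by 1 / (share of firms in its x-quantile bin with expint > 0)."""
    n = len(x)
    # assign roughly equal-sized quantile bins from the ranks of x
    order = sorted(range(n), key=lambda i: x[i])
    bin_of = [0] * n
    for rank, i in enumerate(order):
        bin_of[i] = rank * n_bins // n

    # count firms and exporters in each bin
    counts = [0] * n_bins
    exporters = [0] * n_bins
    for i in range(n):
        counts[bin_of[i]] += 1
        if expint[i] > 0:
            exporters[bin_of[i]] += 1

    # exporters get counts / exporters (>= 1 exporter in their own bin, so no div by 0)
    return [counts[bin_of[i]] / exporters[bin_of[i]] if expint[i] > 0 else 0.0
            for i in range(n)]


# hypothetical data: a cost-type x and export intensity for eight firms
x = [1, 2, 3, 4, 5, 6, 7, 8]
expint = [0.0, 0.0, 0.0, 0.2, 0.1, 0.3, 0.4, 0.5]
print(inverse_frequency_weights(x, expint, n_bins=2))
# prints [0.0, 0.0, 0.0, 4.0, 1.0, 1.0, 1.0, 1.0]
```

the lone exporter in the low-x bin gets weight 4 because it stands in for the three similar firms that don't export. the weights could then go into a weighted regression, e.g. as [pweight=...] in stata.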
i'd suggest trying a couple of different ways & hopefully there are some
robust relationships that aren't too sensitive to how you cut the data.
again, i wouldn't recommend the heckman procedure for this particular
problem but i think there are other ways you can try modeling the
selection.
chris
On Tue, 9 Sep 2003, joe jacob wrote:
> Hi Chris,
>
> Your comments have been quite useful. Many thanks for that.
>
> I think dropping IRD (learning from embodied R&D, which is an endogenous
> variable) from the export intensity (EXPINT) and export decision equations
> can solve one big problem. (I agree that cost-related factors are crucial
> to exporting, as are technological ones. I am using these variables in the
> export intensity equation.)
>
> I still wonder whether I could combine the selection and simultaneous
> estimation procedures. (The idea is to derive the IMR from a heckman
> selection estimation and then insert it into the export intensity equation
> in the simultaneous estimation.) I describe these as Stata commands below.
>
> (1) Deriving the IMR for use in step 2:
>
> heckman EXPINT drd droy wagerate skill size size2 forg, ///
>     twostep select(drd droy wagerate skill outshare gov forg) mills(IMR)
>
> (2) Estimating the IRD and EXPINT equations simultaneously using 3SLS:
>
> reg3 (IRD EXPINT drd droy skill outshare gov forg) ///
>     (EXPINT IMR drd droy wagerate skill size size2 forg), ///
>     3sls inst(drd droy wagerate skill outshare size size2 gov forg)
>
> Note that the IMR estimated in step 1 is used in step 2 (in the second
> equation, where EXPINT is the dependent variable). My concern now is: is
> inserting the IMR from step 1 into step 2 the right way of addressing
> selection bias?
>
> An alternative, as you hinted, is to set aside the question of selection
> bias and do only the simultaneous estimation of step 2 above (without the
> IMR variable).
>
> Thanks in advance,
>
> Joe
>
>
> >From: Chris Rohlfs <[email protected]>
> >Reply-To: [email protected]
> >To: [email protected]
> >Subject: Re: st: 3sls, selection
> >Date: Tue, 9 Sep 2003 10:06:23 -0500 (CDT)
> >
> >joe,
> >
> >this is a difficult problem.
> >
> >so heckman wrote the two-step method with the particular example of
> >education in mind, where agents have perfect foresight & face a decision
> >between a wage offer in the high school labor market versus a wage offer
> >in the college labor market. the primary feature of the model is that
> >agents maximize a known function based on variables unobservable to the
> >econometrician, and that the variable they're maximizing (in this case
> >wages) is the dependent variable of interest. in this education example,
> >you can use the model to estimate how much an agent's schooling decision
> >affects his/her wages.
> >
> >ok -- so let's say you had a simple model in which firms decide whether or
> >not to enter the international sector or remain in the domestic sector
> >based entirely on long-term profits. in that case, i think the heckman
> >two-step would apply -- and you could use such a model to determine how
> >much the decision to export affects a company's profits.
> >
> >i think it makes a big difference that you're using REVENUE (as far as i
> >can tell, that's what EXPINT is) rather than PROFITS. i'd think that most
> >of the factors that firms consider are cost-related, not revenue-related
> >-- most of the variation in REVENUE is going to be driven by scale. even
> >if you had the profits data, though -- my feeling is that the model is
> >getting extremely complicated at this point & a simpler model would
> >probably do a much better job of explaining the data in a credible way.
> >
> >i would strongly recommend considering another approach toward modeling
> >selection. you do have a lot of cost-related variables. you might want
> >to consider just assuming that the selection is entirely based on observed
> >cost variables (in which case unweighted least squares that controls for
> >those variables would still be unbiased).
> >
> >chris
> >
> >On Tue, 9 Sep 2003, joe jacob wrote:
> >
> > > Chris and others,
> > >
> > > I should apologise for not describing the variables in the first mail.
> > > Let me explain.
> > >
> > > I have an establishment-level data set covering about 8 years
> > > (100,000-plus observations).
> > >
> > > The key equation of interest is
> > > IRD = EXPINT + drd + droy + skill + outshare + gov + forg /*Eqn 1.*/
> > >
> > > where IRD captures learning efforts from embodied R&D (derived from the
> > > sectoral R&D stock of OECD countries and distributed across establishments
> > > of a developing country), and EXPINT is the export intensity variable.
> > > (The other variables are basically control variables.) Since EXPINT is
> > > an endogenous variable, we have a second equation:
> > >
> > > EXPINT = IRD + drd + droy + wagerate + skill + size + size2 + forg /*Eqn 2.*/
> > >
> > > This calls for using a simultaneous estimation procedure like 3sls.
> > >
> > > The problem is that, since not all firms export, there is a selection
> > > bias, which has to be accounted for using the Heckman procedure.
> > >
> > > The selection variables for EXPINT are the following:
> > >
> > > IRD, drd, droy, wagerate, skill, outshare, gov, forg.
> > >
> > >
> > > What I originally thought (though probably not correctly) was to
> > > estimate Eqn 2 using the Heckman procedure, calculate the inverse Mills
> > > ratio (IMR), then plug this variable into Eqn 2 and apply 3SLS to Eqns 1
> > > and 2. But when I run heckman I can't account for the endogeneity of the
> > > variable IRD.
> > >
> > > Hope the problem is clear now.
> > >
> > > Thanks in advance for any help.
> > >
> > > Joe
> > >
> > > >From: Chris Rohlfs <[email protected]>
> > > >Reply-To: [email protected]
> > > >To: [email protected]
> > > >Subject: Re: st: 3sls, selection
> > > >Date: Mon, 8 Sep 2003 15:36:58 -0500 (CDT)
> > > >
> > > >jacob,
> > > >
> > > >could you please describe the variables you're looking at?
> > > >
> > > >chris
> > > >
> > > >On Mon, 8 Sep 2003, joe jacob wrote:
> > > >
> > > > > Dear all,
> > > > >
> > > > > This is my first mail to Statalist, and it comes after days of
> > > > > learning from the discussions on the list.
> > > > >
> > > > > I have a two-equation system to estimate.
> > > > >
> > > > > Eq. (1) y1 = y2 + x1 + x2 + x3 + x4 + u
> > > > > Eq. (2) y2 = y1 + x1 + x2 + v,
> > > > >
> > > > > with the endogenous variables y1 and y2 (both continuous) appearing in
> > > > > the RHS of both equations. Thus simultaneous-equations estimation is
> > > > > of course the right way to proceed.
> > > > >
> > > > > But y1 needs to be corrected for the selection hazard using the
> > > > > Heckman procedure, because some observations are zero due to
> > > > > 'self-selection'. Thus we have a selection equation involving the
> > > > > variables (y2, x1, x2, x3, x4, x5).
> > > > >
> > > > > One approach I could think of is to calculate the IMR from a heckman
> > > > > estimation of equation 1, plug it back into the same equation, and run
> > > > > a 3SLS estimation involving equations 1 and 2. BUT I think that does
> > > > > not make much sense, because the IMR is calculated from two equations
> > > > > (Eqn 1 and the selection equation) that have an endogenous explanatory
> > > > > variable (y2).
> > > > >
> > > > > My question is how I can take care of these two problems: 1. the
> > > > > endogeneity (simultaneity) of y1 and y2; 2. the selection bias
> > > > > pertaining to y1.
> > > > >
> > > > > Thanks in advance for your kind suggestions.
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Itty Jacob
> > > > >
> > > > > PS: My apologies for any wrong terminology.
> > > > >
> > > > >
> > > > > *
> > > > > * For searches and help try:
> > > > > * http://www.stata.com/support/faqs/res/findit.html
> > > > > * http://www.stata.com/support/statalist/faq
> > > > > * http://www.ats.ucla.edu/stat/stata/
> > > > >
> > > >
> > >
> > >
> >
>