Re: st: 3sls, selection

From	Chris Rohlfs <[email protected]>
To	[email protected], joe jacob <[email protected]>
Subject	Re: st: 3sls, selection
Date	Tue, 9 Sep 2003 14:44:02 -0500 (CDT)
joe,

if you're going to use a heckman two-step, you should really spend some
time deriving a selection equation from a maximization problem -- using
heckman's 1979 econometrica paper as a starting point.

that said, my feeling from what you've written is that it would be very
hard to come up with a reasonable & intuitive selection model based on the
variables you described.

i'm not quite clear on what question you're trying to answer,
but i gather it's something like: "if we entice a company to export more
-- by changing the prices/costs that it faces -- does this company also
choose to spend more on research?"  and that the selection problem is: a
cost shock might entice more exporters to enter the market -- which we
could mistakenly observe as a decrease in export intensity.  is that
right?

the simplest way that i can think of to address this problem would be to
create an "exporter" dummy for whether exports exceed zero -- and then to
estimate 2sls with two right-hand endogenous variables -- both the
exporter dummy and export intensity.
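
a rough sketch of that 2sls setup in stata, using the variable names from
your message.  it assumes EXPINT is recorded as zero for non-exporters, and
it treats wagerate, size and size2 (which are in your Eqn 2 but not your
Eqn 1) as the excluded instruments -- that exclusion restriction is an
assumption you'd have to defend.  the name "exporter" is just illustrative.
on older versions of stata, ivreg takes the same variable lists.

```stata
* exporter dummy: 1 if the establishment exports at all
* (assumes EXPINT == 0, not missing, for non-exporters)
generate byte exporter = (EXPINT > 0) if !missing(EXPINT)

* 2sls for the research equation with two endogenous right-hand variables;
* wagerate, size and size2 are assumed to be valid excluded instruments
ivregress 2sls IRD drd droy skill outshare gov forg ///
    (exporter EXPINT = wagerate size size2)
```

with two endogenous regressors you need at least two excluded instruments,
so the three here leave one overidentifying restriction.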

alternatively, you could try re-weighting the data based on your x
variables -- i.e., split the sample into quantiles using your x variables
& estimate a sampling frequency (the rate at which you observe a non-zero
export intensity) within each bin -- and then weight your regressions
using the inverse of that frequency.  this would be similar to imputing
export intensity (based on your x's) for the non-exporters.
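
a minimal sketch of that re-weighting in stata, again with your variable
names.  binning on size and wagerate, using quintiles, and the names
has_exports, sizeq, wageq, cell, p_export and ipw are all illustrative
assumptions, and the weighted regression is shown for the export intensity
equation only as an example.

```stata
* indicator for non-zero export intensity (assumes EXPINT == 0 for non-exporters)
generate byte has_exports = (EXPINT > 0) if !missing(EXPINT)

* bins: quintiles of two x variables, crossed into cells
xtile sizeq = size, nquantiles(5)
xtile wageq = wagerate, nquantiles(5)
egen cell = group(sizeq wageq)

* share of establishments in each cell with non-zero export intensity
egen double p_export = mean(has_exports), by(cell)

* weight each exporter by the inverse of its cell's export frequency
generate double ipw = 1/p_export if has_exports == 1

* weighted regression on the exporters only
regress EXPINT drd droy wagerate skill size size2 forg ///
    [pweight = ipw] if has_exports == 1
```

exporters in cells where few establishments export get the largest weights,
so it's worth checking how sensitive the results are to the number of bins.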

i'd say maybe try a couple of different ways & hopefully there are some
robust relationships that aren't too sensitive to how you cut the data.
again, i wouldn't recommend the heckman procedure for this particular
problem but i think there are other ways you can try modeling the
selection.

chris

On Tue, 9 Sep 2003, joe jacob wrote:

> Hi Chris,
> 
> Your comments have been quite useful.  Many thanks for that.
> 
> I think dropping IRD (learning from embodied R&D, which is an endogenous
> variable) from the export intensity (EXPINT) and export decision equations
> can solve one big problem. I agree that cost-related factors are crucial to
> exporting (as are technological ones); I am using these variables in the
> export intensity equation.
> 
> I still wonder if I could combine the selection and simultaneous estimation
> procedures. The idea is to derive the IMR from the heckman selection
> estimation and then insert it in the export intensity equation in the
> simultaneous estimation procedure. I describe these as Stata commands below.
> 
> (1) Deriving the IMR for use in step 2.
> 
> . heckman EXPINT drd droy wagerate skill size size2 forg, ///
>       twostep select(drd droy wagerate skill outshare gov forg) ///
>       mills(IMR)
> 
> (2) Simultaneously estimating the equations with dependent variables IRD and
> EXPINT using 3sls.
> 
> . reg3 (IRD EXPINT drd droy skill outshare gov forg) ///
>       (EXPINT IMR drd droy wagerate skill size size2 forg), ///
>       3sls inst(drd droy wagerate skill outshare size size2 gov forg)
> 
> Note that the IMR estimated in step 1 is used in step 2 (in the second
> equation, where EXPINT is the dependent variable). My concern now is: is
> inserting the IMR from step 1 into step 2 the right way of addressing
> selection bias?
> 
> An alternative is to discard the question of bias as you hinted and do only 
> the simultaneous estimation of step 2 above (without IMR variable).
> 
> Thanks in advance,
> 
> Joe
> 
> 
> >From: Chris Rohlfs <[email protected]>
> >Reply-To: [email protected]
> >To: [email protected]
> >Subject: Re: st: 3sls, selection
> >Date: Tue, 9 Sep 2003 10:06:23 -0500 (CDT)
> >
> >joe,
> >
> >this is a difficult problem.
> >
> >so heckman wrote the two-step method with the particular example of
> >education in mind, where agents have perfect foresight & face a decision
> >between a wage offer in the high school labor market versus a wage offer
> >in the college labor market.  the primary feature of the model is that
> >agents maximize a known function based on variables unobservable to the
> >econometrician.  and that the variable they're maximizing (in this case
> >wages) is the dependent variable of interest.  in this education example,
> >you can use the model to estimate how much an agent's schooling decision
> >affects his/her wages.
> >
> >ok -- so let's say you had a simple model in which firms decide whether or
> >not to enter the international sector or remain in the domestic sector
> >based entirely on long-term profits.  in that case, i think the heckman
> >two-step would apply -- and you could use such a model to determine how
> >much the decision to export affects a company's profits.
> >
> >i think it makes a big difference that you're using REVENUE (as far as i
> >can tell, that's what EXPINT is) rather than PROFITS.  i'd think that most
> >of the factors that firms consider are cost-related, not revenue-related
> >-- most of the variation in REVENUE is going to be driven by scale.  even
> >if you had the profits data, though -- my feeling is that the model is
> >getting extremely complicated at this point & a simpler model would
> >probably do a much better job of explaining the data in a credible way.
> >
> >i would strongly recommend considering another approach toward modeling
> >selection.  you do have a lot of cost-related variables.  you might want
> >to consider just assuming that the selection is entirely based on observed
> >cost variables (in which case unweighted least squares would still be
> >unbiased).
> >
> >chris
> >
> >On Tue, 9 Sep 2003, joe jacob wrote:
> >
> > > Chris and others,
> > >
> > > I should apologise for not describing the variables in the first mail.
> > > Let me explain.
> > >
> > > I have an establishment-level data set covering about 8 years
> > > (100,000-plus observations).
> > >
> > > The key equation of interest is
> > > IRD = EXPINT + drd + droy + skill + outshare + gov + forg  /*Eqn 1.*/
> > >
> > > where IRD captures learning efforts from embodied R&D (derived from the
> > > sectoral R&D stock of OECD countries and distributed across establishments
> > > of a developing country) and EXPINT is the export intensity variable. (The
> > > other variables are basically control variables.) Since EXPINT is an
> > > endogenous variable, we have a second equation,
> > >
> > > EXPINT = IRD + drd + droy + wagerate + skill + size + size2 + forg  /*Eqn 2.*/
> > >
> > > This calls for using a simultaneous estimation procedure like 3sls.
> > >
> > > The problem is that, since not all firms export, there is a selection
> > > bias, which has to be accounted for using the Heckman procedure.
> > >
> > > The selection variables for EXPINT are the following.
> > >
> > > IRD, drd, droy,wagerate, skill,  outshare, gov, forg.
> > >
> > >
> > > What I originally thought (though probably not correctly) was to estimate
> > > Eqn 2 using the heckman procedure, calculate the inverse Mills ratio (IMR),
> > > and then plug this variable into equation 2 and apply 3sls to equations 1
> > > and 2. But when I do heckman I can't account for the endogeneity of the
> > > variable IRD.
> > >
> > > Hope the problem is clear now.
> > >
> > > Thanks in advance for any help.
> > >
> > > Joe
> > >
> > > >From: Chris Rohlfs <[email protected]>
> > > >Reply-To: [email protected]
> > > >To: [email protected]
> > > >Subject: Re: st: 3sls, selection
> > > >Date: Mon, 8 Sep 2003 15:36:58 -0500 (CDT)
> > > >
> > > >jacob,
> > > >
> > > >could you please describe the variables you're looking at ?
> > > >
> > > >chris
> > > >
> > > >On Mon, 8 Sep 2003, joe jacob wrote:
> > > >
> > > > > Dear all,
> > > > >
> > > > > > This is my first mail to statalist, and it comes after days of
> > > > > > learning from the discussions on the listserver.
> > > > >
> > > > > I have a two-equation system to estimate.
> > > > >
> > > > > > Eq. (1)  y1 = y2 + x1 + x2 + x3 + x4 + u
> > > > > > Eq. (2)  y2 = y1 + x1 + x2 + v,
> > > > >
> > > > > > with the endogenous variables y1 and y2 (both continuous) each
> > > > > > appearing on the RHS of the other's equation. Thus a simultaneous-
> > > > > > equations estimator is of course the right way to proceed.
> > > > >
> > > > > > But variable y1 needs to be corrected for the selection hazard using
> > > > > > the Heckman procedure. This is because some observations are zero due
> > > > > > to 'self selection'. Thus we have a selection equation involving the
> > > > > > variables (y2, x1, x2, x3, x4, x5).
> > > > >
> > > > > > One approach I could think of is to calculate the IMR from heckman
> > > > > > estimation of equation 1, plug it back into the same equation, and
> > > > > > run a 3sls estimation involving equations 1 and 2. BUT I think that
> > > > > > does not make much sense, because the IMR is calculated from two
> > > > > > equations (Eqn 1 and the selection equation) that have an endogenous
> > > > > > explanatory variable (y2).
> > > > >
> > > > > > My question is how I could take care of these two problems: 1. the
> > > > > > endogeneity (simultaneity) of y1 and y2, and 2. the selection bias
> > > > > > pertaining to variable y1.
> > > > >
> > > > > Thanks in advance for your kind suggestions.
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Itty Jacob
> > > > >
> > > > > PS: My apologies for any wrong terminology.
> > > > >
> > > > > _________________________________________________________________
> > > > > Need more e-mail storage? Get 10MB with Hotmail Extra Storage.
> > > > > http://join.msn.com/?PAGE=features/es
> > > > >
> > > > > *
> > > > > *   For searches and help try:
> > > > > *   http://www.stata.com/support/faqs/res/findit.html
> > > > > *   http://www.stata.com/support/statalist/faq
> > > > > *   http://www.ats.ucla.edu/stat/stata/
> > > > >
> > > >
> > > >*
> > > >*   For searches and help try:
> > > >*   http://www.stata.com/support/faqs/res/findit.html
> > > >*   http://www.stata.com/support/statalist/faq
> > > >*   http://www.ats.ucla.edu/stat/stata/
> > >

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/