Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: comparing equality of coefficients from two subsamples
From: Rebecca Pope <[email protected]>
To: [email protected]
Subject: Re: st: comparing equality of coefficients from two subsamples
Date: Thu, 21 Feb 2013 11:24:25 -0600
The FAQ link was intended to be helpful in a "first-principles" sense.
I sent it because you seemed not to understand what Jay was saying
about constraining variances, and it provides a simple introduction. You
won't be able to use those exact steps with your problem, however, not
least because -xtreg, fe- requires weights to be constant within panel,
so the -aweight-s used in the FAQ won't work here.
Now, let's try to clarify what you are wanting before proceeding any
further because I want to make sure that we're clear on the use of
"interaction".
Say, for example that you are interested in the model
log(wage) = intercept + tenure + tenure^2 + not_smsa + wks_ue
where not_smsa indicates that the respondent doesn't live in a
metropolitan area and wks_ue is the number of weeks she was unemployed
in the previous year. This data is from -webuse nlswork-, the example
given with -xtreg-.
Now, suppose that you think that the effect of wks_ue differs by
whether or not the respondent lives in an urban area. For this, you
have a simple interaction term. (You can think of this like your
policy indicator).
The Stata syntax for this is:
xtreg ln_w tenure c.tenure#c.tenure i.not_smsa##c.wks_ue, fe
Now, suppose you further hypothesize that the model above does not
apply equally to southern areas. The model could differ in multiple
ways, but the two of interest concern the south somehow moderating the
effects of unemployment and rural residence. You can approach this in
one of two ways.
The first is to simply model the difference with respect to not_smsa
and wks_ue; all other effects are the same. The second does not
constrain any of the coefficients to be equal across groups, here
south/not south.
The first syntax is:
xtreg ln_w tenure c.tenure#c.tenure i.south##i.not_smsa##c.wks_ue, fe
This is what you say you want in your most recent post.
The second approach though is what you have written:
xtreg ln_w tenure c.tenure#c.tenure i.not_smsa##c.wks_ue if south==0, fe
xtreg ln_w tenure c.tenure#c.tenure i.not_smsa##c.wks_ue if south==1, fe
If you estimate these equations, you get different parameter estimates
for _all_ terms by "south". This is why I said that you were working
with a fully-interacted model. To understand this, note that you must
estimate the two equations above _as one_ in order to test whether
rural unemployment differs in the south (or your government policy
differs by firm type).
The correct Stata syntax is:
xtreg ln_w i.south#c.tenure i.south#c.tenure#c.tenure i.south##i.not_smsa##c.wks_ue, fe
Do not take "simply" above to mean that it is somehow inferior. I just
mean that the model has fewer parameters to estimate. Your choice of
specification must be theory-driven. If you think that approach 2 is
incorrect, then nothing stops you using approach 1. However, that
isn't what you indicated you were estimating when you wrote 2 separate
equations.
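To make the test concrete, here is a minimal sketch using the -nlswork-
example (not your data). It fits the single fully interacted equation and
then tests whether the not_smsa-by-wks_ue interaction differs in the
south. The -xtset- line is only a precaution in case the panel settings
are not stored with the dataset, and the second -test- is just an
illustration of comparing one pair of group-specific slopes.

```stata
webuse nlswork, clear
xtset idcode year

* Approach 2: every slope is allowed to differ by south
xtreg ln_w i.south#c.tenure i.south#c.tenure#c.tenure i.south##i.not_smsa##c.wks_ue, fe

* Does the rural-unemployment interaction differ in the south?
test 1.south#1.not_smsa#c.wks_ue

* Illustration only: is the linear tenure slope the same in both groups?
test _b[0.south#c.tenure] = _b[1.south#c.tenure]
```

The first -test- line can be used unchanged after the approach 1 model;
there the tenure terms are simply constrained to be equal across groups.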
With all of these approaches, you get 1 error term for both groups. Is
this a problem? It depends on your groups. You have to look at your
data and decide. If you decide you shouldn't constrain the variance,
you'll need to choose an appropriate approach at that point.
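One informal way to look at this, again sketched with -nlswork-, is to
fit the model separately by group and compare the residual standard
deviations that -xtreg, fe- stores in e(sigma_e). The scalar names se0
and se1 are mine, and this is only an eyeball comparison, not a formal
test of equal variances.

```stata
webuse nlswork, clear
xtset idcode year

* Same model, fitted separately in each group; keep sigma_e from each fit
xtreg ln_w tenure c.tenure#c.tenure i.not_smsa##c.wks_ue if south==0, fe
scalar se0 = e(sigma_e)

xtreg ln_w tenure c.tenure#c.tenure i.not_smsa##c.wks_ue if south==1, fe
scalar se1 = e(sigma_e)

* Compare the two residual standard deviations side by side
display "sigma_e, south==0: " se0
display "sigma_e, south==1: " se1
```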
Now, what do you observe with respect to the coefficients? Probably
that the "pooled" regression does not exactly reproduce the
coefficients of the separate regressions with -xtreg, fe-. This
shouldn't surprise you. -xtreg, fe- is estimating a model on the
"demeaned" data, the so-called "within" estimator. When you pool the
observations, you alter the calculation of the mean within the j-th
unit. This occurs because there are some respondents, in this example,
who have lived in and out of the south. If that weren't the case,
"1.south" would be dropped from the FE part of our model when we
pooled results and we would be left with a single overall intercept.
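The point about movers can be checked directly. The sketch below assumes
the -nlswork- data and uses variable names I made up for illustration
(south_lo, south_hi, mover, tag, id_south). It first counts respondents
observed both in and out of the south. It then refits the fully
interacted model with one fixed effect per respondent-by-south cell, so
that each observation is demeaned within its own south group. Under that
setup the slope estimates should line up with the separate regressions,
though the standard errors need not, since the pooled fit still uses one
error variance.

```stata
webuse nlswork, clear
xtset idcode year

* Flag respondents who are observed both in and out of the south
bysort idcode: egen byte south_lo = min(south)
bysort idcode: egen byte south_hi = max(south)
generate byte mover = south_lo != south_hi if !missing(south_lo)

* Count each respondent once
egen byte tag = tag(idcode)
tabulate mover if tag

* Hypothetical check: one fixed effect per respondent-by-south cell
egen long id_south = group(idcode south)
xtset id_south
xtreg ln_w i.south#c.tenure i.south#c.tenure#c.tenure i.south##i.not_smsa##c.wks_ue, fe
```

In this last fit 1.south is constant within each panel, so Stata omits
it, and the intercept shift between groups is absorbed by the fixed
effects.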
Quite apart from that, if you submitted
xtreg ln_w tenure c.tenure#c.tenure i.south##i.not_smsa##c.wks_ue, fe
thinking you were going to get the same results as:
xtreg ln_w tenure c.tenure#c.tenure i.not_smsa##c.wks_ue if south==0, fe
xtreg ln_w tenure c.tenure#c.tenure i.not_smsa##c.wks_ue if south==1, fe
you would be wrong, even if you were using simple linear regression
because you are working with fundamentally different views of how your
grouping variable relates to the other covariates.
I hope this helps,
Rebecca
On Wed, Feb 20, 2013 at 4:10 PM, Mario Jose <[email protected]> wrote:
> Thank you, Rebecca, for the links; they were very useful for understanding
> Jay's previous comment.
> I implemented Bill Gould's strategy (allowing for different
> variances), but got the error message "weight must be
> constant within id"... Anyway, I do not want to introduce interactions
> with all independent variables, only with one.
>
> Below I describe the specific problem I have.
>
> I have a panel sample of firms, and in the middle of the period
> (2004) the government implemented a specific fiscal
> measure. I want to test whether this measure affected the
> profits reported by firms. As I think that the measure affected
> a specific subsample of firms, I divided the sample into two subsamples,
> group1 and group2 (split according to the firms' debt/assets ratio).
>
> I ran the model for the two groups separately:
> xtreg Y x1 control1 control2 ... i.pos i.pos#c.x1 if group==1, fe
> xtreg Y x1 control1 control2 ... i.pos i.pos#c.x1 if group==2, fe
>
> (pos is a binary variable equal to 1 for years after the implementation of the policy)
>
> and I obtain the following estimates for group 1 and 2, respectively:
>
> *******output excerpt************
>
> -----------------------------------------------------------------------------------
>                   |               Robust
>                 Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> ------------------+----------------------------------------------------------------
>                x1 |  -2.053274   .5641935    -3.64   0.000    -3.159248   -.9473006
>          control1 |   .5904103   .0267907    22.04   0.000     .5378933    .6429273
>          control2 |   .0947558   .0233539     4.06   0.000     .0489758    .1405358
>               ... |  -.0234459   .2617354    -0.09   0.929    -.5365189    .4896271
>        year dum.. |
>             1.pos |  -.5814072   .1512517    -3.84   0.000     -.877902   -.2849124
>        1.pos#c.x1 |   1.256448   .4183398     3.00   0.003     .4363875    2.076508
>             _cons |  -6.099231   1.766059    -3.45   0.001    -9.561191   -2.637272
> ------------------+----------------------------------------------------------------
>           sigma_u |  2.1744991
>           sigma_e |  .77651905
>               rho |  .88690051   (fraction of variance due to u_i)
> -----------------------------------------------------------------------------------
>
>
> -----------------------------------------------------------------------------------
>                   |               Robust
>                 Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> ------------------+----------------------------------------------------------------
>                x1 |  -2.047585   .6997248    -2.93   0.003     -3.41921   -.6759593
>          control1 |   .4552402   .0232387    19.59   0.000     .4096868    .5007936
>          control2 |    .028412   .0110095     2.58   0.010     .0068306    .0499933
>               ... |
>       year dum .. |
>             1.pos |  -.4291118   .1817098    -2.36   0.018    -.7853059    -.072917
>        1.pos#c.x1 |   .6220617   .5078439     1.22   0.221    -.3734318    1.617555
>             _cons |  -7.341474   1.606579    -4.57   0.000    -10.49075   -4.192201
> ------------------+----------------------------------------------------------------
>           sigma_u |  2.4369753
>           sigma_e |  .70849863
>               rho |  .92206421   (fraction of variance due to u_i)
> -----------------------------------------------------------------------------------
>
> **********end of excerpt*************
>
> These results are in the predicted direction, but when I pooled
> the sample to compare the coefficients, the estimates appear to be
> significantly different. They are as follows:
>
> *******output excerpt************
> --------------------------------------------------------------------------------------------------
>                                  |               Robust
>                                Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> ---------------------------------+----------------------------------------------------------------
>                               x1 |  -1.601963   .5324727    -3.01   0.003    -2.645681   -.5582453
>                         control1 |   .5435240   .0232387    19.59   0.000     .4096868    .5007936
>                         control2 |     .03976   .0110095     2.58   0.010     .0068306    .0499933
>                              ... |
>                      year dum .. |
>                            1.pos |   -.382873   .1487651    -2.57   0.010    -.6744726   -.0912734
>                         pos#c.x1 |   .5273469   .4331443     1.22   0.223    -.3216739    1.376368
>                          1.group |      .2575    .175552     1.47   0.142    -.0866054         .60
>                     1.group#c.x1 |  -.8550352   .5470408    -1.56   0.118    -1.927308     .217238
>                      1.group#pos |  -.2539677   .1681945    -1.51   0.131    -.5836514     .075716
>                 1.group#pos#c.x1 |   .8948809    .528096     1.69   0.090     -.140258     1.93002
>                            _cons |  -6.485282   1.161574    -5.58   0.000    -8.762123   -4.208441
> ---------------------------------+----------------------------------------------------------------
>                          sigma_u |  2.2954577
>                          sigma_e |  .76123454
>                              rho |  .90092029   (fraction of variance due to u_i)
>
> **********end of excerpt*************
>
> Do you find something wrong with the last equation?
>
> I would appreciate any help.
> Best
> MJ
>
> 2013/2/20 Rebecca Pope <[email protected]>:
>> Jay has given you important advice as it pertains to the group
>> residual variances.
>>
>> You are correct that Wooldridge gives an explanation of interaction
>> terms. He also notes that a fully interacted model (as I assume you
>> will be estimating since your initial post seemed to suggest that you
>> expect different coefficients for all covariates for males and
>> females) assumes group error homogeneity (pg 245 of the 4th ed).
>> Unfortunately, there doesn't appear to be any discussion, at least in
>> that section, of how to address heteroskedasticity between the groups.
>> I didn't read through the rest of the book.
>>
>> You might want to take a look at this FAQ by Bill Gould:
>> http://www.stata.com/support/faqs/statistics/pooling-data-and-chow-tests/
>>
>> And these slides from a talk by Bobby Gutierrez:
>> http://www.stata.com/meeting/fnasug08/gutierrez.pdf
>>
>> Only you can see your data and judge whether the constrained variance
>> model is appropriate or not. I wouldn't just dismiss the issue out of
>> hand though.
>>
>> Rebecca
>>
>> On Wed, Feb 20, 2013 at 5:47 AM, Mario Jose <[email protected]> wrote:
>>> Thank you for the comments. Testing for equality of coefficients from
>>> different subsamples, as suggested by Marteen, can be solved by
>>> interactions.
>>> There is an excellent explanation of the procedure in Wooldridge,
>>> Introductory Econometrics: A Modern Approach, pp. 243-246 and pp. 449-450, and in
>>> the following link:
>>> http://www.stata.com/support/faqs/statistics/chow-tests/
>>>
>>> Best,
>>> MJ
>>>
>>> 2013/2/18 JVerkuilen (Gmail) <[email protected]>:
>>>> As someone else indicated, your syntax is odd.
>>>>
>>>> The main question I have is whether you want to allow for different
>>>> group residual variances. If not, interaction. If so, then I guess the
>>>> easiest approach would be -suest-.
>>>>
>>>> On Mon, Feb 18, 2013 at 11:15 AM, Mario Jose <[email protected]> wrote:
>>>>> Dear Statalisters,
>>>>>
>>>>> I have tried to solve the question below by searching for help in the
>>>>> Stata archive, without much success...
>>>>>
>>>>> I have estimated a fixed effects linear regression for two different
>>>>> groups on my sample (say, sex male/female), using this strategy:
>>>>> xtreg dv iv, if sex==male
>>>>> xtreg dv iv, if sex==female
>>>>>
>>>>> I am interested in testing whether or not the coefficient b1 is
>>>>> identical to each other in the two subsamples.
>>>>>
>>>>> I would really appreciate any help.
>>>>> Regards
>>>>> MJ
>>>>> *
>>>>> * For searches and help try:
>>>>> * http://www.stata.com/help.cgi?search
>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>
>>>>
>>>>
>>>> --
>>>> JVVerkuilen, PhD
>>>> [email protected]
>>>>
>>>> http://lesswrong.com/
>>>>
>>>> "Everybody loves progress but nobody likes change." ---Fortune cookie, 1/13/13.