From:    Steven Samuels <[email protected]>
To:      [email protected]
Subject: Re: st: regress with vce(robust) and hascons
Date:    Mon, 13 Dec 2010 13:00:20 -0500
Thanks, Jeff. I suggest that, when the -vce(robust)- and -hascons- options are
present, the results contain a remark about the change in the F test.

Steve
On Dec 13, 2010, at 12:21 PM, Jeff Pitblado, StataCorp LP wrote:
Michael N. Mitchell <[email protected]> is using the -hascons- option with
-regress, vce(robust)- and noticed the model F statistic has a different
interpretation than the one for -regress- without the -vce(robust)- option:
I am puzzled by the behavior of Stata when I include the -vce(robust)- option
along with the -hascons- option.

Consider the example below in which I estimate a model predicting -price-
from -foreign- but do so using a cell means model by specifying ibn.foreign
and thus include the -hascons- option. I further want robust standard errors
so specify the -vce(robust)- option.
. sysuse auto, clear
(1978 Automobile Data)
. regress price ibn.foreign, vce(robust) hascons
Linear regression                                      Number of obs =      74
                                                       F(  2,    72) =  165.64
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0024
                                                       Root MSE      =  2966.4

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
          0  |   6072.423   431.2084    14.08   0.000     5212.825    6932.021
          1  |   6384.682   553.6754    11.53   0.000      5280.95    7488.413
             |
       _cons |  (omitted)
------------------------------------------------------------------------------
The omnibus F test shows 2 degrees of freedom, but I only expected 1 df. The
omnibus F test appears to be testing the joint hypothesis that each of the
cell means is 0 (see below).
. test 0.foreign 1.foreign
( 1) 0bn.foreign = 0
( 2) 1.foreign = 0
F( 2, 72) = 165.64
Prob > F = 0.0000
But because I specified -hascons- I expect it to test the equality of the
cell means. This is the case when I omit the -vce(robust)-, as shown below.
. regress price ibn.foreign, hascons
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
       Model |  1507382.66     1  1507382.66           Prob > F      =  0.6802
    Residual |   633558013    72  8799416.85           R-squared     =  0.0024
-------------+------------------------------           Adj R-squared = -0.0115
       Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
          0  |   6072.423    411.363    14.76   0.000     5252.386     6892.46
          1  |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
------------------------------------------------------------------------------
In this case, the omnibus F test matches the test of the equality of the
cell means.
. test 0.foreign = 1.foreign
( 1) 0bn.foreign - 1.foreign = 0
F( 1, 72) = 0.17
Prob > F = 0.6802
Perhaps someone can help me understand where I am askew in my thinking about
this.
Short reply:
There is no bug in the value of the F statistic when the -vce(robust)- option
is used with the -hascons- option. The -vce(robust)- option causes -regress-
to perform all inference based on the linearized variance estimator instead
of using the reduction in the error sum of squares.

We did notice that -regress, hascons vce(robust)- reports an '(omitted)'
intercept when it shouldn't. This will be fixed in the next executable
update.
Long reply:
Let's look at the model F statistic that -regress- reports. First let's fit
a simple linear regression of 'price' on the 'foreign' factor variable:
***** BEGIN:
. regress price i.foreign
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
       Model |  1507382.66     1  1507382.66           Prob > F      =  0.6802
    Residual |   633558013    72  8799416.85           R-squared     =  0.0024
-------------+------------------------------           Adj R-squared = -0.0115
       Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   1.foreign |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
       _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
------------------------------------------------------------------------------
***** END:
The model F statistic is 0.17. This value is a function of the reduction in
the error sum of squares, and is the ratio of the model mean square to the
error mean square.
***** BEGIN:
. di "MS(model) = " e(mss)/e(df_m)
MS(model) = 1507382.7
. di "MS(error) = " e(rmse)^2
MS(error) = 8799416.9
. di (e(mss)/e(df_m))/e(rmse)^2
.17130484
***** END:
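As a cross-check (a sketch, not part of the original session), the same 0.17
can also be built directly from the drop in the residual sum of squares when
going from the mean-only model to the model with 'foreign':

***** BEGIN:
* Sketch, not run in the original session; assumes the auto data are in memory.
quietly regress price              // restricted (mean-only) model
scalar rss0 = e(rss)               // residual SS of the restricted model
quietly regress price i.foreign    // full model, as fit above
display ((rss0 - e(rss))/e(df_m)) / (e(rss)/e(df_r))
***** END: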
We can also compute this value by performing a Wald test on all the
coefficients in the model (excluding the intercept), with the Null hypothesis
that they are all equal to zero:
***** BEGIN:
. test [#1]
( 1) 0b.foreign = 0
( 2) 1.foreign = 0
Constraint 1 dropped
F( 1, 72) = 0.17
Prob > F = 0.6802
***** END:
We see that the ANOVA-style F statistic (based on the ratio of mean squares)
is computationally equivalent to the Wald F statistic.
For this particular model, the above Null hypothesis also implies that the
expected value of 'price' for Foreign cars is equal to the expected value of
'price' for Domestic cars.
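Equivalently (a sketch, not from the original session): because the only
tested coefficient, 1.foreign, is the difference between the two cell means,
this Wald F is just the square of its t statistic.

***** BEGIN:
* Sketch; assumes the e() results from -regress price i.foreign- above
* are still in memory.
display "Wald F = " (_b[1.foreign]/_se[1.foreign])^2
***** END: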
Now let's refit our model with the -noconstant- option; we'll also use the
-bn- operator on 'foreign' to prevent Stata from omitting a base level.
***** BEGIN:
. regress price bn.foreign, nocons
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    72) =  159.91
       Model |  2.8143e+09     2  1.4071e+09           Prob > F      =  0.0000
    Residual |   633558013    72  8799416.85           R-squared     =  0.8162
-------------+------------------------------           Adj R-squared =  0.8111
       Total |  3.4478e+09    74  46592355.7           Root MSE      =  2966.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
          0  |   6072.423    411.363    14.76   0.000     5252.386     6892.46
          1  |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
------------------------------------------------------------------------------
***** END:
Notice that the value and degrees of freedom for the model F statistic have
changed; so has the model sum of squares. Here we reproduce the mean squares
and the model F statistic for this model:
***** BEGIN:
. di "MS(model) = " e(mss)/e(df_m)
MS(model) = 1.407e+09
. di "MS(error) = " e(rmse)^2
MS(error) = 8799416.9
. di (e(mss)/2)/e(rmse)^2
159.91266
***** END:
Here is the equivalent Wald test:
***** BEGIN:
. test [#1]
( 1) 0bn.foreign = 0
( 2) 1.foreign = 0
F( 2, 72) = 159.91
Prob > F = 0.0000
***** END:
It is now clear that the Null hypothesis for this model F statistic is not
the same as in our previous model. Here the Null hypothesis is that 'price'
has an expected value of zero.
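As another cross-check (a sketch, not from the original session), the same
2-df Wald F can be computed directly from e(b) and e(V), assuming the results
of the -nocons- fit above are still in memory:

***** BEGIN:
* Sketch: the Wald statistic b * V^(-1) * b', divided by the number of
* tested coefficients (2), reproduces the reported model F of 159.91.
matrix b = e(b)
matrix V = e(V)
matrix W = b * invsym(V) * b'
display "Wald F(2,72) = " el(W,1,1)/2
***** END: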
Now let's refit Michael's -hascons- model, without the -vce(robust)- option:
***** BEGIN:
. regress price bn.foreign, hascons
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
       Model |  1507382.66     1  1507382.66           Prob > F      =  0.6802
    Residual |   633558013    72  8799416.85           R-squared     =  0.0024
-------------+------------------------------           Adj R-squared = -0.0115
       Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
          0  |   6072.423    411.363    14.76   0.000     5252.386     6892.46
          1  |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
------------------------------------------------------------------------------
***** END:
Again, the model F statistic is derived from the reduction in the error sum
of squares. The -hascons- option implies that there is a constant in the
model, so the model F statistic tests against the mean-only model, just as in
our first model fit. Here is the F statistic computed using the mean squares:
***** BEGIN:
. di "MS(model) = " e(mss)/e(df_m)
MS(model) = 1507382.7
. di "MS(error) = " e(rmse)^2
MS(error) = 8799416.9
. di e(mss)/e(rmse)^2
.17130484
***** END:
And here is the F statistic from the Wald test:
***** BEGIN:
. test [#1]
( 1) 0bn.foreign = 0
( 2) 1.foreign = 0
F( 2, 72) = 159.91
Prob > F = 0.0000
***** END:
Notice that, given our current model fit, the Null hypothesis for this Wald
test is that the expected value of 'price' is zero. This is not the same as
the Null for the reported model F statistic in the -hascons- model.
Now consider refitting this model with the robust/linearized variance
estimator (VCE). Using the -vce(robust)- option causes -regress- to perform
all inference using the linearized VCE. The equivalence between the Wald F
statistic and the ANOVA-style F statistic breaks down in this case. With
-vce(robust)-, -regress- is forced to use the Wald F statistic; there is no
equivalent linearized version of the F statistic formed from the ratio of
mean squares.
Here is the -hascons- model fit with linearized VCE:
***** BEGIN:
. regress price bn.foreign, hascons vce(robust)
Linear regression                                      Number of obs =      74
                                                       F(  2,    72) =  165.64
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0024
                                                       Root MSE      =  2966.4

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
          0  |   6072.423   431.2084    14.08   0.000     5212.825    6932.021
          1  |   6384.682   553.6754    11.53   0.000      5280.95    7488.413
------------------------------------------------------------------------------
***** END:
Notice that -regress- doesn't even report the ANOVA table when -vce(robust)-
is specified. We can still compute the mean squares from -regress-'s -e()-
results:
***** BEGIN:
. di "MS(model) = " e(mss)/e(df_m)
MS(model) = 753691.33
. di "MS(error) = " e(rmse)^2
MS(error) = 8799416.9
. di e(mss)/e(rmse)^2
.17130484
***** END:
However, -vce(robust)- prevents us from using them to make inferences.

Here we show that the model F statistic reported by -regress, vce(robust)-
comes from the Wald test on the model coefficients.
***** BEGIN:
. test [#1]
( 1) 0bn.foreign = 0
( 2) 1.foreign = 0
F( 2, 72) = 165.64
Prob > F = 0.0000
***** END:
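One practical follow-up (a sketch, not part of the original session): the
1-df test of equal cell means that Michael had in mind is still available
after the robust fit; it simply is not what the reported model F corresponds
to. Its F is based on the linearized VCE, so in general it will not match the
ANOVA-based 0.17.

***** BEGIN:
* Sketch: explicit test of equality of the two cell means after
* -regress price bn.foreign, hascons vce(robust)-.
test 0.foreign = 1.foreign
***** END: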
--Jeff
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*