FAQ: Computing the Chow statistic

How can I compute the Chow test statistic?

Title		Computing the Chow statistic
Author		William Gould, StataCorp

Note: This FAQ has been updated for Stata 14. The data are simulated, so results differ from those in previous versions because of the new 64-bit Mersenne Twister pseudorandom numbers.

You can include the group dummy variables and their interactions with the regressors in a regression of the full model and then use the test command on those terms. You could also run each of the models, write down the appropriate numbers, and calculate the statistic by hand; Stata also provides functions for obtaining the appropriate p-values.

Here is a longer answer:

Let’s start with the Chow test to which many refer. Consider the model,

    y = a + b*x1 + c*x2 + u

and say we have two groups of data. We could fit that model on the two groups separately,

    y = a1 + b1*x1 + c1*x2 + u         for group == 1

    y = a2 + b2*x1 + c2*x2 + u         for group == 2

and we could fit a single, pooled regression

    y = a  + b*x1  + c*x2 + u          for both groups

In the last regression, we are asserting that a1==a2, b1==b2, and c1==c2. The formula for the “Chow test” of this constraint is

         ess_c - (ess_1+ess_2)
         ---------------------
                  k
    ---------------------------------
            ess_1 + ess_2
           ---------------
           N_1 + N_2 - 2*k

and this is the formula to which people refer. ess_1 and ess_2 are the error sums of squares from the separate regressions, ess_c is the error sum of squares from the pooled (constrained) regression, k is the number of estimated parameters (k=3 in our case), and N_1 and N_2 are the numbers of observations in the two groups.

The resulting test statistic is distributed F(k, N_1+N_2-2*k).

Let’s try this. I have created small datasets:

 clear
 set obs 100
 set seed 1234
 generate x1 = uniform() 
 generate x2 = uniform()
 generate y = 4*x1 - 2*x2 + 2*invnormal(uniform())
 generate group = 1
 save one, replace

 clear
 set obs 80
 generate x1 = uniform()
 generate x2 = uniform()
 generate y = -2*x1 + 3*x2 + 8*invnormal(uniform())
 generate group = 2
 save two, replace 

 use one, clear
 append using two
 save combined, replace

The models differ between the two groups, as do the residual variances and the numbers of observations. With this dataset, I can carry out the Chow test. First, I run the separate regressions:

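. regress y x1 x2 if group==1

      Source |       SS           df       MS      Number of obs   =       100
-------------+----------------------------------   F(2, 97)        =     19.08
       Model |  156.695964         2  78.3479821   Prob > F        =    0.0000
    Residual |  398.206631        97    4.105223   R-squared       =    0.2824
-------------+----------------------------------   Adj R-squared   =    0.2676
       Total |  554.902595        99  5.60507672   Root MSE        =    2.0261

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   3.586476   .6442618     5.57   0.000     2.307795    4.865158
          x2 |  -1.915656   .7189693    -2.66   0.009    -3.342611   -.4887013
       _cons |   .3636508   .5146138     0.71   0.482    -.6577151    1.385017
------------------------------------------------------------------------------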

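. regress y x1 x2 if group==2

      Source |       SS           df       MS      Number of obs   =        80
-------------+----------------------------------   F(2, 77)        =      0.87
       Model |  107.332801         2  53.6664005   Prob > F        =    0.4227
    Residual |  4745.17268        77  61.6256192   R-squared       =    0.0221
-------------+----------------------------------   Adj R-squared   =   -0.0033
       Total |  4852.50548        79    61.42412   Root MSE        =    7.8502

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |  -2.860412   2.840325    -1.01   0.317    -8.516223    2.795398
          x2 |   2.971855   3.161894     0.94   0.350    -3.324281    9.267991
       _cons |  -1.108295   2.200774    -0.50   0.616    -5.490597    3.274006
------------------------------------------------------------------------------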

Then I run the combined regression:

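. regress y x1 x2

      Source |       SS           df       MS      Number of obs   =       180
-------------+----------------------------------   F(2, 177)       =      0.34
       Model |  21.1546997         2  10.5773499   Prob > F        =    0.7157
    Residual |  5587.34576       177  31.5669252   R-squared       =    0.0038
-------------+----------------------------------   Adj R-squared   =   -0.0075
       Total |  5608.50046       179  31.3324048   Root MSE        =    5.6184

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   1.081461   1.337802     0.81   0.420    -1.558633    3.721556
          x2 |  -.2318499   1.489658    -0.16   0.876    -3.171626    2.707926
       _cons |  -.1078511   1.056195    -0.10   0.919    -2.192207    1.976505
------------------------------------------------------------------------------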

For the Chow test,

           ess_c - (ess_1+ess_2)
           ---------------------
                    k
     ---------------------------------
               ess_1 + ess_2
              ---------------
              N_1 + N_2 - 2*k

here are the relevant numbers copied from the output above:

    ess_c =  5587.34576            (from combined regression)

    ess_1 =   398.206631           (from group==1 regression)
    ess_2 =  4745.17268            (from group==2 regression)

        k = 3                      (we estimate 3 parameters)
      N_1 = 100                    (from group==1 regression)
      N_2 =  80                    (from group==2 regression)

So, plugging in, we get

      5587.34576 - (398.206631+4745.17268)              443.96645
      ------------------------------------              ---------
                      3                                     3
    -----------------------------------------  =     ---------------
            398.206631 + 4745.17268                     5143.3793
            -----------------------                     ---------
                 100+80-2*3                                174

                                                        147.98882
                                               =       ----------
                                                        29.559651


                                               =        5.0064466

The Chow test is F(k,N_1+N_2-2*k) = F(3,174), so our test statistic is F(3,174) = 5.0064466.
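
Rather than copying numbers from the output, the same arithmetic can be done from the results that regress stores. What follows is a minimal sketch, assuming the combined dataset created above is in memory. The scalar names (ess_1, df2, chow, and so on) are my own labels for illustration, not anything Stata requires. e(rss) is the residual sum of squares, e(N) is the number of observations, and e(df_m) is the model degrees of freedom, so k is e(df_m) + 1 when the model includes a constant. Ftail() returns the upper-tail probability of the F distribution, which serves as the p-value.

```stata
* Separate regressions: store residual sums of squares and sample sizes
quietly regress y x1 x2 if group==1
scalar ess_1 = e(rss)
scalar N_1   = e(N)

quietly regress y x1 x2 if group==2
scalar ess_2 = e(rss)
scalar N_2   = e(N)

* Pooled (constrained) regression
quietly regress y x1 x2
scalar ess_c = e(rss)
scalar k     = e(df_m) + 1

* Chow statistic and its p-value
scalar df2  = N_1 + N_2 - 2*k
scalar chow = ((ess_c - (ess_1 + ess_2))/k) / ((ess_1 + ess_2)/df2)
display "F(" k ", " df2 ") = " chow
display "p-value = " Ftail(k, df2, chow)
```

Working from stored results avoids transcription errors and keeps full precision, which matters when the numerator is a small difference between two large sums of squares.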

Now I will do the same problem by running one regression and using test to test whether certain coefficients are equal to zero. What I want to do is fit the model

     y = a3 + b3*x1 + c3*x2 + a3'*g2 + b3'*g2*x1 + c3'*g2*x2 + u

where g2=1 if group==2 and g2=0 otherwise. I can do this by typing

. generate g2 = (group==2)
. generate g2x1 = g2*x1
. generate g2x2 = g2*x2
. regress y x1 x2 g2 g2x1 g2x2

Think about the predictions from this model. The model says

    y =     a3   +       b3*x1 +       c3*x2 + u     when g2==0
    y = (a3+a3') + (b3+b3')*x1 + (c3+c3')*x2 + u     when g2==1

Thus the model is equivalent to fitting the separate models

    y = a1 + b1*x1 + c1*x2 + u         for group == 1
    y = a2 + b2*x1 + c2*x2 + u         for group == 2

The relationship being

    a1 = a3               a2 = a3 + a3'
    b1 = b3               b2 = b3 + b3'
    c1 = c3               c2 = c3 + c3'
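
This mapping can be checked directly once the interacted regression has been fit. The sketch below assumes g2, g2x1, and g2x2 have been generated with the commands listed earlier; lincom reports a linear combination of the coefficients from the most recent estimation.

```stata
* Fit the fully interacted model on the combined data
quietly regress y x1 x2 g2 g2x1 g2x2

* Recover the group 2 coefficients as sums of the pooled-model coefficients
* a2 = a3 + a3'
lincom _cons + g2
* b2 = b3 + b3'
lincom x1 + g2x1
* c2 = c3 + c3'
lincom x2 + g2x2
```

The point estimates should match the coefficients from the group==2 regression. The standard errors will not match, because the pooled model uses a single estimate of the residual variance for both groups.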

Some of you may be concerned that in the pooled model (the one estimating a3, b3, etc.), we are constraining var(u) to be the same for each group, whereas, in the separate-equation model, we estimate different variances for group 1 and group 2. This does not matter, because the model is fully interacted. That is probably not convincing, but what should be convincing is that I am about to obtain the same F(3,174) = 5.01 answer and, in my concocted data, I have different variances in each group.

So, here is the result of the alternative approach, which tests coefficients against 0 in a pooled specification:

. generate g2 = (group==2)
 
. generate g2x1 = g2*x1
    
. generate g2x2 = g2*x2

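. regress y x1 x2 g2 g2x1 g2x2

      Source |       SS           df       MS      Number of obs   =       180
-------------+----------------------------------   F(5, 174)       =      3.15
       Model |  465.121148         5  93.0242295   Prob > F        =    0.0096
    Residual |  5143.37931       174  29.5596512   R-squared       =    0.0829
-------------+----------------------------------   Adj R-squared   =    0.0566
       Total |  5608.50046       179  31.3324048   Root MSE        =    5.4369

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   3.586476   1.728796     2.07   0.040      .174367    6.998585
          x2 |  -1.915656   1.929264    -0.99   0.322    -5.723428    1.892115
          g2 |  -1.471946   2.056721    -0.72   0.475    -5.531279    2.587387
        g2x1 |  -6.446889   2.618856    -2.46   0.015     -11.6157   -1.278075
        g2x2 |   4.887512   2.918483     1.67   0.096    -.8726743     10.6477
       _cons |   .3636508   1.380901     0.26   0.793    -2.361822    3.089124
------------------------------------------------------------------------------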
 
. test g2 g2x1 g2x2
 
 ( 1)  g2 = 0
 ( 2)  g2x1 = 0
 ( 3)  g2x2 = 0

       F(  3,   174) =    5.01
            Prob > F =    0.0024

Same answer.
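
The test output rounds F to two decimal places. To compare more digits with the hand calculation, you can display the results that test leaves behind in r(). This is a small sketch, meant to be run immediately after the test command above; r(F), r(df), r(df_r), and r(p) are the stored F statistic, numerator degrees of freedom, denominator degrees of freedom, and p-value.

```stata
* Run right after: test g2 g2x1 g2x2
display "F  = " %10.7f r(F)
display "df = " r(df) ", " r(df_r)
display "p  = " %6.4f r(p)
```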

This definition of the “Chow test” is equivalent to pooling the data, fitting the fully interacted model, and then testing the group 2 coefficients against 0.

That is why I said, “Chow Test is a term I have heard used by economists in the context of testing a set of regression coefficients being equal to 0.”

Admittedly, this leaves a lot unsaid.

The issue of the variance of u being equal in the two groups is subtle, but I do not want that to get in the way of understanding that the Chow test is equivalent to the “pool the data, interact, and test” procedure. They are equivalent.

Concerning variances, the Chow test itself is testing against a pooled, uninteracted model and so has buried in it an assumption of equal variances. It is really a test that the coefficients are equal and that variance(u) is equal across the groups. It is, however, a weak test of the equality of variances because that assumption manifests itself only in how the pooled coefficient estimates are manufactured. Because the Chow test and the “pool the data, interact, and test” procedure are the same, the same is true of both procedures.

Your second concern might be that in the “pool the data, interact, and test” procedure there is an extra assumption of equality of variances because everything comes from the pooled model. As shown, this is not true. It is not true because the model is fully interacted, so the assumption of equal variances never makes a difference in the calculation of the coefficients.
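
One way to see this numerically is to compare residual sums of squares. The sketch below assumes the combined dataset is in memory and that g2, g2x1, and g2x2 exist as generated earlier; the scalar names rss_1 and rss_2 are hypothetical labels of my own choosing.

```stata
* Residual sums of squares from the separate regressions
quietly regress y x1 x2 if group==1
scalar rss_1 = e(rss)
quietly regress y x1 x2 if group==2
scalar rss_2 = e(rss)

* Residual sum of squares from the fully interacted pooled regression
quietly regress y x1 x2 g2 g2x1 g2x2

* The difference should be zero up to rounding error
display e(rss) - (rss_1 + rss_2)
```

Because the fully interacted model reproduces the separate fits exactly, its residual sum of squares is the sum of the two group-specific ones (5143.37931 = 398.206631 + 4745.17268 in the output above). What does differ is the variance estimate: the pooled model reports a single residual mean square, 29.56, in place of 4.11 for group 1 and 61.63 for group 2.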

In Stata 12 or more recent versions, you can also use the contrast command with factor variables to perform the same test:

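. regress y c.x1##i.g2 c.x2##i.g2

      Source |       SS           df       MS      Number of obs   =       180
-------------+----------------------------------   F(5, 174)       =      3.15
       Model |  465.121148         5  93.0242295   Prob > F        =    0.0096
    Residual |  5143.37931       174  29.5596512   R-squared       =    0.0829
-------------+----------------------------------   Adj R-squared   =    0.0566
       Total |  5608.50046       179  31.3324048   Root MSE        =    5.4369

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   3.586476   1.728796     2.07   0.040      .174367    6.998585
        1.g2 |  -1.471946   2.056721    -0.72   0.475    -5.531279    2.587387
             |
     g2#c.x1 |
          1  |  -6.446889   2.618856    -2.46   0.015     -11.6157   -1.278075
             |
          x2 |  -1.915656   1.929264    -0.99   0.322    -5.723428    1.892115
             |
     g2#c.x2 |
          1  |   4.887512   2.918483     1.67   0.096    -.8726743     10.6477
             |
       _cons |   .3636508   1.380901     0.26   0.793    -2.361822    3.089124
------------------------------------------------------------------------------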

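. contrast g2 g2#c.x1 g2#c.x2, overall

Contrasts of marginal linear predictions

Margins: asbalanced

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
          g2 |          1        0.51     0.4752
             |
     g2#c.x1 |          1        6.06     0.0148
             |
     g2#c.x2 |          1        2.80     0.0958
             |
     Overall |          3        5.01     0.0024
             |
 Denominator |        174
------------------------------------------------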

An additional example can be found in the “Chow tests” section of [R] contrast.
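
If you prefer to stay with a Wald-style command after the factor-variable regression, testparm accepts factor-variable notation and should give the same joint test. This is a minimal sketch, assuming g2 exists as a 0/1 variable in the combined dataset.

```stata
* Fit the fully interacted model with factor-variable notation
quietly regress y c.x1##i.g2 c.x2##i.g2

* Jointly test the group 2 intercept shift and both slope shifts
testparm i.g2 i.g2#c.x1 i.g2#c.x2
```

The advantage of the factor-variable form is that no interaction variables need to be generated by hand, so the test terms cannot get out of step with the model.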
