RE: st: p-value after cluster option

From   "Maarten Buis" <[email protected]>
To   <[email protected]>
Subject   RE: st: p-value after cluster option
Date   Tue, 29 Aug 2006 09:39:01 +0200

Rich appears to be right: the problem is indeed the degrees of 
freedom. The p-values are calculated using a t-distribution, not a 
normal distribution, where the number of degrees of freedom is 
N - number of variables - 1. If the number of degrees of freedom is 
large, the t is well approximated by a normal, and 1627 degrees of 
freedom is more than enough to make this approximation work 
extremely well. With the cluster option, however, the degrees of 
freedom are determined by the number of clusters (number of 
clusters - 1), since the clusters, not the observations, are now 
the independent bits of information. So if you have few clusters 
the normal approximation may no longer work. Stata stores the 
appropriate degrees of freedom in e(df_r), so you can use that to 
recreate the p-values, as in the example below:


*-----------------begin example----------------
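sysuse auto, clear
* standard errors clustered on rep78
reg price mpg foreign, cluster(rep78)
di "t = " abs(_b[mpg]/_se[mpg])
* residual degrees of freedom: number of clusters - 1
di "df = " e(df_r)
* p-value as reported by Stata: t-distribution with e(df_r) df
di "p = " 2*ttail(e(df_r),abs(_b[mpg]/_se[mpg]))
* p-value from the standard normal, which does not match the output
di "p is not " 2*normal(-abs(_b[mpg]/_se[mpg]))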
*-----------------end example-------------------
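
To see how much the degrees of freedom matter, here is a minimal 
sketch that holds the t-statistic fixed and only changes the 
reference distribution. The t-value of 2 and the 4 degrees of 
freedom (as you would get with 5 clusters) are made-up numbers for 
illustration; 1627 is the degrees of freedom from Sara's 
unclustered model.

*-----------------begin example----------------
* hypothetical t-statistic, chosen for illustration only
local t = 2
di "normal:  p = " 2*normal(-`t')
di "df 1627: p = " 2*ttail(1627,`t')
* hypothetical small number of clusters: 5 clusters, so 4 df
di "df 4:    p = " 2*ttail(4,`t')
*-----------------end example-------------------

The first two p-values should be nearly identical, while the third 
should be noticeably larger.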

---sara borelli wrote:
> in my regression I have N=1647 and  have 19 variables.
> To calculate the P-values I used the tables of the
> standard normal distribution as in the standard way to
> calculate the P-values.

--- Richard Goldstein wrote:
> > sounds like a problem with your getting the
> > degrees of freedom correct 

--- sara borelli wrote:
> > > But when I use cluster, the p-values look "wrong",
> > > that is if I calculate the p-value using the
> > > "new-clustered" t-statistics and the statistical
> > > tables I get a result that is different from the
> > > Stata output

