I was wondering what might explain the following missing F(.,.) values when I use the
cluster option. I have about 40 households per cluster and four clusters
(168 unique households in total). I'd like to run the model at the cluster
level to estimate a difference-in-differences model.
Initially I thought the issue was that, with only 4 clusters, I would not be
able to estimate it, since it is using 4 cluster means to estimate the
standard errors. However, the problem remains if I cluster at the
survey code (or household) level.
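
The four-cluster hunch above can be sketched on made-up data. This is only an illustration under stated assumptions, written in Python rather than Stata: the numbers are random integers, not the survey data, and the names (solve, rank, scores, meat) are hypothetical. It builds the middle ("meat") matrix of a cluster-robust variance estimate from per-cluster score sums and checks its rank exactly. Because the score sums add up to zero across clusters when the model has a constant, that matrix has rank at most (number of clusters - 1), which is less than the 6 coefficients fitted here. It says nothing about why the same thing happens with 168 survey-level clusters.

```python
from fractions import Fraction
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def rank(a):
    """Rank of a matrix by exact row reduction."""
    m = [row[:] for row in a]
    rk = 0
    for col in range(len(m[0])):
        piv = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][col] / m[rk][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[rk])]
        rk += 1
    return rk


# Made-up data (assumption): 4 clusters, 10 observations each,
# a constant plus 5 regressors, all small random integers.
random.seed(0)
n_clusters, per_cluster, k = 4, 10, 6  # k counts the constant
rows = []
for g in range(n_clusters):
    for _ in range(per_cluster):
        x = [Fraction(1)] + [Fraction(random.randint(0, 9)) for _ in range(k - 1)]
        y = Fraction(random.randint(0, 9))
        rows.append((g, x, y))

# OLS coefficients from the normal equations (X'X) b = X'y.
xtx = [[sum(x[i] * x[j] for _, x, _ in rows) for j in range(k)] for i in range(k)]
xty = [sum(x[i] * y for _, x, y in rows) for i in range(k)]
beta = solve(xtx, xty)

# Per-cluster score sums: sum of residual * x over each cluster's rows.
scores = [[Fraction(0)] * k for _ in range(n_clusters)]
for g, x, y in rows:
    resid = y - sum(b * xi for b, xi in zip(beta, x))
    for i in range(k):
        scores[g][i] += resid * x[i]

# "Meat" of the cluster-robust sandwich: sum over clusters of s_g s_g'.
meat = [[sum(s[i] * s[j] for s in scores) for j in range(k)] for i in range(k)]
meat_rank = rank(meat)

print("coefficients fitted:", k)
print("clusters:", n_clusters)
print("rank of cluster-robust middle matrix:", meat_rank)
```

With a rank-deficient variance matrix, a joint test of all the coefficients cannot be formed, which would be consistent with the missing F in the four-cluster case.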
-MODEL 1 -
reg y1 DiD vdc post season cdum2 cdum4, cluster(clust)

Regression with robust standard errors          Number of obs =     672
                                                F(  1,     3) =       .
                                                Prob > F      =       .
                                                R-squared     =  0.1220
Number of clusters (village) = 4                Root MSE      =  .29762

-MODEL 2 -
reg y1 DiD vdc post season vdum2 vdum4, cluster(survey)

Regression with robust standard errors          Number of obs =     672
                                                F(  5,   167) =       .
                                                Prob > F      =       .
                                                R-squared     =  0.1220
Number of clusters (survey) = 168               Root MSE      =  .29762

Model 1 estimates the SEs at the cluster level, while Model 2 does it at the
ID level. Model 3 uses the robust option, and everything works out fine. The
help suggests that I may be estimating more parameters than I can possibly
estimate with the data. I am not sure I see that, since I have a sample of
over 670 observations and I am estimating between 5 and 8 variables at most.
I was hoping someone might have some intuition as to what may be tripping me
up.