Al,
Other patterns in the data can also produce a missing F-stat. For example, you
might have a variable that is the cluster equivalent of a singleton dummy:
the variable takes just two values, =x for all obs in one cluster and =y for
all the rest.
It's a bit ad hoc, but try rerunning your regression clustered on household
(survey code), dropping one variable at a time, and see if and when
the problem goes away. This will let you trace the problem to a specific
variable.
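A rough sketch of that drop-one-at-a-time check in Stata. The variable
names are copied from your Model 2 command (quoted below) and are an
assumption about what is actually in your dataset, so adjust them as needed.
e(F), e(df_m) and e(df_r) are results that -regress- saves; e(F) is missing
whenever the header shows a dot.

```stata
* Regressor list copied from the quoted Model 2 command; adjust as needed.
local rhs DiD vdc post season vdum2 vdum4

foreach v of local rhs {
    * All regressors except the one being dropped in this round
    local keep : list rhs - v
    quietly regress y1 `keep', cluster(survey)
    display "dropped `v':  F(" e(df_m) ", " e(df_r) ") = " e(F)
}
```

Run it from a do-file so the local macros stay defined. Any variable whose
omission brings back a non-missing F is the first suspect.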
Cheers,
Mark
NB: Does the above recommendation remind anyone else of the following
ancient computing joke?
Q: How do you know that's an IBM repairman on the side of the road with a
flat tire?
A: He changes each tire, one after the other, until he finds out which one
is flat.
> Hi Mark,
>
> Yes! I clicked that and it goes on to talk about situations in which
> F(.,.) goes missing. All the discussion is about when the number of
> parameters is equal to or more than the number of observations. For
> example, "You might see chi2(6) or F(6, 5). If you were to count the
> number of coefficients that would be constrained to 0 in a model test in
> this case, you would find that number to be greater than 6. You could find
> out what that number is by reestimating the model parameters without the
> robust and cluster() options".
>
> I don't think this is my problem - I have enough observations (about 40
> observations per cluster per season, so about 120 since I have three
> seasons). Also, I can estimate the model with robust, but not with
> cluster().
>
> So I am not sure what is going on.
>
> Thanks for your email Mark!
>
> -Anerdy
>
>
>
>>From: "Mark Schaffer" <[email protected]>
>>Reply-To: [email protected]
>>To: [email protected]
>>CC: "mes " <[email protected]>
>>Subject: Re: st: SE with cluster option
>>Date: Tue, 18 Oct 2005 19:17:49 +0100 (BST)
>>
>>Al,
>>
>> > Hi Everyone,
>> >
>> > I was wondering what may explain the following F(.,.) values when I use
>> > the cluster option. I have about 40 households per cluster, and four
>> > clusters (total of 168 unique households). I'd like to run the model at
>> > the cluster level to estimate a Difference in Difference model.
>> >
>> > Initially I thought the issue was that since there are only 4 clusters,
>> > I'd not be able to estimate it since it's using 4 cluster means to
>> > estimate the standard errors.
>>
>>You are right - in effect, you have 4 observations ("super-observations"
>>is perhaps more accurate) to calculate your var-cov matrix, which means
>>you won't get very far this way.
>>
>> > However, the problem still remains if I cluster at the
>> > survey code (or household) level
>>
>>Is there a clickable hyperlink on the missing F-stat in this case, and if
>>so, what does it say?
>>
>>--Mark
>>
>>
>> > -MODEL 1 -
>> >
>> > reg y1 DiD vdc post season cdum2 cdum4, cluster(clust)
>> >
>> > Regression with robust standard errors                  Number of obs =     672
>> >                                                         F(  1,     3) =       .
>> >                                                         Prob > F      =       .
>> >                                                         R-squared     =  0.1220
>> > Number of clusters (village) = 4                        Root MSE      =  .29762
>> >
>> > ------------------------------------------------------------------------------
>> >              |               Robust
>> >     cropfail |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> > -------------+----------------------------------------------------------------
>> >          DiD |   .1867678   .0381533     4.90   0.016     .0653468    .3081888
>> >        cdum1 |   .0407624   .0190767     2.14   0.122    -.0199481    .1014729
>> >         post |   .0377531   .0255782     1.48   0.236    -.0436482    .1191544
>> >       season |  -.0803571   .0418741    -1.92   0.151    -.2136192    .0529049
>> >        cdum2 |   .0830587   5.54e-16        .   0.000     .0830587    .0830587
>> >        cdum4 |    .085874   1.02e-15        .   0.000      .085874     .085874
>> >        _cons |   .1601304   .0901628     1.78   0.174    -.1268078    .4470686
>> > ------------------------------------------------------------------------------
>> >
>> >
>> > -MODEL 2 -
>> >
>> > reg y1 DiD vdc post season vdum2 vdum4, cluster(survey)
>> > Regression with robust standard errors                  Number of obs =     672
>> >                                                         F(  5,   167) =       .
>> >                                                         Prob > F      =       .
>> >                                                         R-squared     =  0.1220
>> > Number of clusters (survey) = 168                       Root MSE      =  .29762
>> >
>> > ------------------------------------------------------------------------------
>> >              |               Robust
>> >     cropfail |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> > -------------+----------------------------------------------------------------
>> >          DiD |   .1867678   .0788515     2.37   0.019     .0310936     .342442
>> >        cdum1 |   .0407624    .012909     3.16   0.002     .0152765    .0662484
>> >         post |   .0377531   .0240521     1.57   0.118    -.0097322    .0852384
>> >       season |  -.0803571   .0200387    -4.01   0.000     -.119919   -.0407952
>> >        cdum2 |   .0830587   .0201067     4.13   0.000     .0433627    .1227547
>> >        cdum4 |    .085874   .0476556     1.80   0.073     -.008211     .179959
>> >        _cons |   .1601304   .0483279     3.31   0.001     .0647181    .2555428
>> > ------------------------------------------------------------------------------
>> >
>> >
>> > -MODEL 3 -
>> > . reg y1 DiD vdc post season vdum2 vdum4, robust
>> >
>> > Regression with robust standard errors                  Number of obs =     672
>> >                                                         F(  6,   665) =   10.49
>> >                                                         Prob > F      =  0.0000
>> >                                                         R-squared     =  0.1220
>> >                                                         Root MSE      =  .29762
>> >
>> > ------------------------------------------------------------------------------
>> >              |               Robust
>> >     cropfail |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> > -------------+----------------------------------------------------------------
>> >          DiD |   .1867678   .0658962     2.83   0.005     .0573781    .3161575
>> >        cdum1 |   .0407624   .0144458     2.82   0.005     .0123976    .0691272
>> >         post |   .0377531   .0276749     1.36   0.173    -.0165876    .0920938
>> >       season |  -.0803571   .0229621    -3.50   0.000    -.1254441   -.0352702
>> >        cdum2 |   .0830587   .0206597     4.02   0.000     .0424926    .1236247
>> >        cdum4 |    .085874   .0436286     1.97   0.049     .0002076    .1715403
>> >        _cons |   .1601304   .0566039     2.83   0.005     .0489866    .2712742
>> > ------------------------------------------------------------------------------
>> >
>> >
>> > Model 1 estimates the SEs at the cluster level, while Model 2 does it at
>> > the ID level. Model 3 uses the robust option, and everything works out
>> > fine. The help suggests that I may be estimating more parameters than I
>> > can possibly estimate with the data. I am not sure I see that since I have
>> > a sample of over 670 observations, and I am estimating between 5 and 8
>> > variables at most.
>> >
>> > I was hoping someone has some intuition here as to what may be messing me
>> > up.
>> >
>> > thanks.
>> > al
>> >
>> >
>> > *
>> > * For searches and help try:
>> > * http://www.stata.com/support/faqs/res/findit.html
>> > * http://www.stata.com/support/statalist/faq
>> > * http://www.ats.ucla.edu/stat/stata/
>> >
>>
>>
>>Prof. Mark Schaffer
>>Director, CERT
>>Department of Economics
>>School of Management & Languages
>>Heriot-Watt University, Edinburgh EH14 4AS
>>tel +44-131-451-3494 / fax +44-131-451-3294
>>email: [email protected]
>>web: http://www.sml.hw.ac.uk/ecomes
>>