Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: First stage F stats - xtivreg
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: First stage F stats - xtivreg
Date: Tue, 21 Jun 2011 13:42:05 -0400
Agnese Romiti <[email protected]>:
You are right about -xtivreg2- refusing to participate, so you could
instead include dummies for every fixed effect in -ivreg2- and partial
them out, e.g.
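webuse nhanes2, clear
* one-way clustering on the panel variable: panels nest within clusters
xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(houssiz)
* two-way clustering where panels do not nest within clusters: xtivreg2 refuses
xtivreg2 hlthstat (iron=lead), fe i(houssiz) cluster(location sampl)
* workaround: one dummy per fixed effect, partialled out by -ivreg2-
qui ta houssiz, gen(d_)
ivreg2 hlthstat (iron=lead) d_*, fwl(d_*) cluster(location sampl)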
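A minimal sketch of reading the weak-identification statistics off the -ivreg2- workaround, on the same nhanes2 example. It assumes -ivreg2- and its helper -ranktest- are installed from SSC, and it uses the partial() option, assumed here to do what the older fwl() option above does. The ffirst option prints the first-stage summary; e(cdf) holds the Cragg-Donald Wald F and e(widstat) the weak-identification statistic, which under cluster() is the Kleibergen-Paap rk Wald F.

```stata
webuse nhanes2, clear
* one dummy per "panel" (household size), as in the example above
quietly tabulate houssiz, generate(d_)
* two-way clustering; the fixed-effect dummies are partialled out
* (partial() is assumed here to be equivalent to fwl() above)
ivreg2 hlthstat (iron=lead) d_*, partial(d_*) cluster(location sampl) ffirst
* weak-identification statistics saved by -ivreg2-
display "Cragg-Donald Wald F:       " e(cdf)
display "Kleibergen-Paap rk Wald F: " e(widstat)
```

Only the Kleibergen-Paap statistic allows for the assumed clustering; the Cragg-Donald statistic assumes i.i.d. errors, so the two can differ a lot.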
Or you could instead cluster by each individual's initial region, e.g.
(i and t being your individual and time identifiers)
bys i (t): g initregion=region[1]
which involves different assumptions, but will also give you evidence
of how the data seem to be clustered.
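A self-contained sketch of the initial-region idea on simulated data. Everything in it is hypothetical: the variable names id, year, region, hours, x and z, the 500 individuals over 5 years, the 19 regions, and the roughly 2 percent of person-years given a freshly drawn region. Because initregion is constant within each panel, panels nest within clusters, so -xtivreg2- (from SSC) should accept cluster(initregion).

```stata
clear
set seed 12345
* hypothetical panel: 500 individuals, 5 years each, 19 regions
set obs 500
generate long id = _n
generate byte region0 = floor(19*runiform()) + 1
expand 5
bysort id: generate int year = 2000 + _n
* about 2% of person-years get a freshly drawn region (some of them moves)
generate byte region = cond(runiform() < 0.02, floor(19*runiform()) + 1, region0)
* hypothetical instrument, endogenous regressor and outcome
generate z = rnormal()
generate x = z + rnormal()
generate hours = 35 + 0.5*x + rnormal()

* cluster on the region observed in each person's first year
bysort id (year): generate byte initregion = region[1]
* initregion is constant within person, so panels nest within clusters
bysort id: assert initregion == initregion[1]

xtset id year
xtivreg2 hours (x = z), fe cluster(initregion)
```

The resulting SEs and first-stage statistics can then be set against the region-year and two-way clustered versions discussed in this thread.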
On Tue, Jun 21, 2011 at 1:06 PM, Agnese Romiti <[email protected]> wrote:
> Dear Austin,
>
> When I used region-year, or region alone, as the cluster unit, I had to
> run ivreg2 on data that I had previously transformed into deviations
> from the mean (the within transformation), because xtivreg2 requires
> that no panel overlap more than one cluster, i.e. each panel must be
> uniquely assigned to a cluster.
> I then tried running xtivreg2 with two cluster variables, as you
> suggested, but received the error message "cluster(): too many variables
> specified", apparently because I don't have the latest version of the
> commands. I have just run -update all-, and my Stata appears to be
> updated to 30 March 2011 (executable and ado-files) and to 1 Sept 2010
> (utilities). Is there a reason why I still get the error?
>
> Thanks
> Agnese
>
>
>
>
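A short sketch for checking which versions of the user-written commands are installed. -update all- updates official Stata only (the executable and the official ado-files); packages installed from SSC, such as -xtivreg2-, -ivreg2- and its helper -ranktest-, are updated separately with -adoupdate- (or by reinstalling with -ssc install ..., replace-). Whether a newer version removes this particular error is not established here.

```stata
* print the path and *! version line of each installed file
which xtivreg2
which ivreg2
which ranktest
* update these SSC packages; -update all- does not touch them
adoupdate xtivreg2 ivreg2 ranktest, update
```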
> 2011/6/21 Austin Nichols <[email protected]>:
>> Agnese Romiti <[email protected]>:
>> I don't see how it matters that individuals move across clusters,
>> unless you want to cluster by individual as well, and -xtivreg2-
>> allows two dimensions of clustering. When you cluster by region-year,
>> you assume that a draw from the data-generating process (DGP) of
>> person i in year t is independent of a draw from the DGP of person i
>> in year t+1, which is clearly problematic. You should try clustering by
>> individual, by region, and then try two dimensions of clustering. Let
>> us know how the first-stage diagnostic statistics and the SEs on the
>> main variables of interest compare, in each of those 3 cases, with your
>> region-year-clustered version.
>>
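A sketch of the three comparisons suggested above, using the nlswork panel as a stand-in. Both of these are assumptions for illustration: ind_code plays the role of region, since workers change industry much as individuals in the thread change region, and the IV specification is chosen only so the code runs. Clustering by individual goes through -xtivreg2-. The region-type and two-way versions use -ivreg2- on within-transformed data, the route Agnese describes above. -ivreg2- does not know that individual means were absorbed, so if the small option is added its degrees-of-freedom correction will not count them.

```stata
webuse nlswork, clear
* keep complete cases so the within transformation uses the estimation sample
keep if !missing(ln_wage, age, not_smsa, tenure, union, south, ind_code)
xtset idcode year

* (1) cluster by individual: panels nest within clusters, so xtivreg2 accepts it
xtivreg2 ln_wage age not_smsa (tenure = union south), fe cluster(idcode)
display "KP rk Wald F, cluster(idcode): " e(widstat)

* within transformation by hand, for the ivreg2 route
foreach v of varlist ln_wage age not_smsa tenure union south {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
}

* (2) one-way clustering by the stand-in for region
ivreg2 w_ln_wage w_age w_not_smsa (w_tenure = w_union w_south), nocons cluster(ind_code)
display "KP rk Wald F, cluster(ind_code): " e(widstat)

* (3) two-way clustering by individual and the stand-in for region
ivreg2 w_ln_wage w_age w_not_smsa (w_tenure = w_union w_south), nocons cluster(idcode ind_code)
display "KP rk Wald F, two-way: " e(widstat)
```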
>> On Tue, Jun 21, 2011 at 10:47 AM, Agnese Romiti <[email protected]> wrote:
>>> Austin,
>>>
>>> I chose region-year as the cluster unit because some individuals
>>> (around 8 percent) move across regions over time, so their region is
>>> not unique.
>>>
>>> Many thanks again for your help and the ref.
>>> Agnese
>>>
>>> 2011/6/21 Austin Nichols <[email protected]>:
>>>> Agnese Romiti <[email protected]>
>>>> In that case the cluster-robust SE will be biased downward slightly,
>>>> resulting in overrejection and an overstated first-stage F statistic,
>>>> but I expect they will still outperform the SE and F from clustering by
>>>> region-year. You would have to do simulations matching your exact
>>>> setup to be sure; see e.g.
>>>> http://www.stata.com/meeting/13uk/nichols_crse.pdf
>>>>
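A minimal Monte Carlo sketch of the kind of simulation meant here, under invented assumptions. It uses 19 regions, one of which holds about 18 percent of the observations (400 of 2,200, echoing the figures Agnese reports), a regressor that varies only across regions, a region-level error component, and a true coefficient of zero. It counts how often a nominal 5 percent test based on cluster-robust SEs rejects. For brevity only OLS is simulated; a real check would mimic the actual design, IV first stage included.

```stata
capture program drop crse_draw
program define crse_draw, rclass
    drop _all
    * 19 hypothetical regions
    set obs 19
    generate region = _n
    * region-level error component and a regressor varying only by region
    generate u_r = rnormal()
    generate x = rnormal()
    * largest region holds about 18% of the 2,200 observations
    generate n = cond(region == 1, 400, 100)
    expand n
    * true coefficient on x is zero
    generate y = u_r + rnormal()
    regress y x, vce(cluster region)
    * nominal 5% two-sided t test with the G-1 residual df used by regress
    return scalar reject = abs(_b[x]/_se[x]) > invttail(e(df_r), 0.025)
end

simulate reject = r(reject), reps(1000) seed(20110621): crse_draw
summarize reject
display "empirical rejection rate at nominal 5%: " %5.3f r(mean)
```

A rejection rate clearly above 0.05 would indicate overrejection of the kind described above.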
>>>> On Tue, Jun 21, 2011 at 3:27 AM, Agnese Romiti <[email protected]> wrote:
>>>>> Hi,
>>>>> Thanks again
>>>>> In my data I have 19 regions, and around 18 percent of the data are in
>>>>> the largest region.
>>>>>
>>>>> Agnese
>>>>>
>>>>>
>>>>> 2011/6/21 Austin Nichols <[email protected]>:
>>>>>> Agnese Romiti <[email protected]>:
>>>>>> No, you should cluster by region to correctly account for possible
>>>>>> serial correlation, assuming you have sufficiently many regions in
>>>>>> your data; how many are there? What percent of the data is in the
>>>>>> largest region?
>>>>>>
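A small sketch for answering those two questions in Stata. It runs on nhanes2, with its region variable standing in for the user's region; with real estimates it should be run on the estimation sample only (e.g. after -keep if e(sample)-).

```stata
webuse nhanes2, clear
* number of distinct regions (clusters)
egen byte first_obs = tag(region)
count if first_obs
scalar n_regions = r(N)
* share of observations in the largest region
bysort region: generate long n_region = _N
summarize n_region, meanonly
scalar max_share = r(max)/_N
display "regions: " n_regions "   largest share: " %5.1f 100*max_share "%"
```

-tabulate region- shows the same information at a glance; the scalars are handy when the check is repeated across specifications.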
>>>>>> On Mon, Jun 20, 2011 at 5:19 PM, Agnese Romiti <[email protected]> wrote:
>>>>>>> Many thanks Austin,
>>>>>>>
>>>>>>> I'm actually clustering the standard errors at the region-year level
>>>>>>> rather than at the region level, because I have one regressor that
>>>>>>> varies at the region-year level. Is that correct?
>>>>>>> Do you think the high first-stage F statistic might be a sign of a
>>>>>>> bad instrument, e.g. a failure of the exogeneity requirement?
>>>>>>>
>>>>>>> Agnese
>>>>>>>
>>>>>>>
>>>>>>> 2011/6/20 Austin Nichols <[email protected]>:
>>>>>>>> Agnese Romiti <[email protected]>:
>>>>>>>> Are you clustering by region to account for the likely correlation of
>>>>>>>> errors within region?
>>>>>>>> Also see
>>>>>>>> http://www.stata.com/meeting/boston10/boston10_nichols.pdf
>>>>>>>> for an alternative model that allows your dep var to be nonnegative.
>>>>>>>>
>>>>>>>> On Mon, Jun 20, 2011 at 3:49 AM, Agnese Romiti <[email protected]> wrote:
>>>>>>>>> Dear Statalist users,
>>>>>>>>>
>>>>>>>>> I'm running a fixed-effects model with IV (xtivreg2); my dependent
>>>>>>>>> variable is a measure of labor supply at the individual level
>>>>>>>>> (working hours), while my endogenous variable varies only at the
>>>>>>>>> region-year level.
>>>>>>>>> My question is about the first-stage statistics: the weak
>>>>>>>>> identification test gives an extremely high F statistic, F=3289,
>>>>>>>>> which makes me worry that something is wrong.
>>>>>>>>> Do you have any clue about potential reasons driving this odd result?
>>>>>>>>>
>>>>>>>>> Many thanks in advance for your help.
>>>>>>>>>
>>>>>>>>> Agnese