Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: RE: Matsize and Estimation of the Variance Matrix in a Regression
From: Joe Canner <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: RE: Matsize and Estimation of the Variance Matrix in a Regression
Date: Thu, 5 Sep 2013 00:37:48 +0000
I was asking about levels because the -areg- documentation warns about having more levels in the absorbing variable than in the cluster variable. But that doesn't seem to be the problem.
Do you really have only 597 observations? If so, that doesn't seem like enough to estimate the parameters for week, retailer, and state (141 + 73 + 24 = 238 levels). Maybe someone with more experience with this than I can provide more focused advice.
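As a rough check on that parameter count, here is the arithmetic in Python. It uses only the figures posted in this thread: 597 observations in the output, the 7 covariates in the -areg- command, and the levels Alex reports (141 weeks, 73 retailers, 24 states, 25 absorbed products, 46 clusters). The counting rule is an assumption made for illustration, not -areg-'s exact bookkeeping. It assumes one base level is dropped per factor and that the 25 product intercepts stand in for the constant. It also ignores the terms Stata omits for collinearity, so the result is an upper bound.

```python
# Rough parameter count for the -areg- specification discussed in this thread.
# Assumption (illustrative, not -areg-'s exact bookkeeping): one base level is
# dropped per factor, and the absorbed product intercepts replace the constant.
# Collinearity omissions are ignored, so n_params is an upper bound.
n_obs = 597          # "Number of obs" in the posted output
n_covariates = 7     # treatment ... price_index in the areg command
factor_levels = {"week": 141, "retailer_id": 73, "state_id": 24}
n_products = 25      # levels of the absorbed variable
n_clusters = 46      # levels of clusterID

factor_dummies = sum(levels - 1 for levels in factor_levels.values())
n_params = n_covariates + factor_dummies + n_products
residual_df = n_obs - n_params
cluster_df = n_clusters - 1  # matches the denominator df in F( ., 45)

print(factor_dummies, n_params, residual_df, cluster_df)  # 235 267 330 45
```

Under these assumptions, roughly 330 residual degrees of freedom remain, so the observation count alone does not look binding. The number of clusters (46), however, is far smaller than the number of coefficients.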
________________________________________
From: [email protected] [[email protected]] on behalf of Alex MacKay [[email protected]]
Sent: Wednesday, September 04, 2013 7:00 PM
To: [email protected]
Subject: Re: st: RE: Matsize and Estimation of the Variance Matrix in a Regression
In isolating the problem, I no longer think it has to do with matsize;
rather, there seems to be some randomness in Stata's results. This is
troubling. Simply re-running the same areg command, I get the results
below. Note that the model degrees of freedom go from 48 to 0 to 49.
In additional runs I have also seen values as low as 45 and as high
as 51. Fixing the random seed does not seem to have any effect.
Should I repost this under a new thread name?
The levels are: 141 (week), 73 (retailer_id), 24 (state_id), 25
(product), and 46 (clusterID), for a total of 309. Given that I am
only estimating 7 other coefficients, I think we can reject my earlier
hypothesis.
Alex
- - -
1.
note: 2599.week omitted because of collinearity
note: 597.retailer_id omitted because of collinearity
note: 866.retailer_id omitted because of collinearity
note: 877.retailer_id omitted because of collinearity
note: 9101.retailer_id omitted because of collinearity
note: 54.state_id omitted because of collinearity
note: 3997.retailer_id omitted because of collinearity
note: 4955.retailer_id omitted because of collinearity
note: 7005.retailer_id omitted because of collinearity
note: 7599.retailer_id omitted because of collinearity
Linear regression, absorbing indicators Number of obs = 597
F( 48, 45) = .
Prob > F = .
R-squared = 0.9256
Adj R-squared = 0.8695
Root MSE = 0.3082
(Std. Err. adjusted for 46 clusters in clusterID)
--------------------------------------------------------------------------------------
                     |               Robust
            ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
           treatment |  -4.044072    3.152507  -1.28    0.206      -10.39355  2.305404
          postperiod |  -.5653387    .3338128  -1.69    0.097      -1.237672  .1069948
treatmentXpostperiod |  -.0178175    .1210774  -0.15    0.884      -.2616798  .2260448
2.
note: 2599.week omitted because of collinearity
note: 597.retailer_id omitted because of collinearity
note: 866.retailer_id omitted because of collinearity
note: 877.retailer_id omitted because of collinearity
note: 9101.retailer_id omitted because of collinearity
note: 54.state_id omitted because of collinearity
Warning: variance matrix is nonsymmetric or highly singular
note: 3997.retailer_id omitted because of collinearity
note: 4955.retailer_id omitted because of collinearity
note: 7005.retailer_id omitted because of collinearity
note: 7599.retailer_id omitted because of collinearity
Linear regression, absorbing indicators Number of obs = 597
F( 0, 45) = .
Prob > F = .
R-squared = 0.9256
Adj R-squared = 0.8695
Root MSE = 0.2950
(Std. Err. adjusted for 46 clusters in clusterID)
--------------------------------------------------------------------------------------
                     |               Robust
            ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
           treatment |  -4.044072           .      .        .              .         .
          postperiod |  -.5653387           .      .        .              .         .
treatmentXpostperiod |  -.0178175           .      .        .              .         .
3.
note: 2599.week omitted because of collinearity
note: 597.retailer_id omitted because of collinearity
note: 866.retailer_id omitted because of collinearity
note: 877.retailer_id omitted because of collinearity
note: 9101.retailer_id omitted because of collinearity
note: 54.state_id omitted because of collinearity
note: 3997.retailer_id omitted because of collinearity
note: 4955.retailer_id omitted because of collinearity
note: 7005.retailer_id omitted because of collinearity
note: 7599.retailer_id omitted because of collinearity
Linear regression, absorbing indicators Number of obs = 597
F( 49, 45) = .
Prob > F = .
R-squared = 0.9256
Adj R-squared = 0.8695
Root MSE = 0.3085
(Std. Err. adjusted for 46 clusters in clusterID)
--------------------------------------------------------------------------------------
                     |               Robust
            ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
           treatment |  -4.044072    3.152507  -1.28    0.206      -10.39355  2.305404
          postperiod |  -.5653387    .3338128  -1.69    0.097      -1.237672  .1069948
treatmentXpostperiod |  -.0178175    .1210774  -0.15    0.884      -.2616798  .2260448
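All three runs report F( k, 45) = . with 46 clusters. A standard property of the cluster-robust ("sandwich") variance estimator may be relevant here, though the thread itself does not establish it as the cause. The estimator's middle term is a sum of one outer product per cluster. For least squares, those per-cluster score vectors sum to zero (the normal equations X'e = 0), so the matrix has rank at most G - 1. With 46 clusters, that caps the rank at 45, which is far fewer than the coefficients in this model. The Python sketch below checks this with exact rational arithmetic. Its input is a set of small hypothetical score vectors (5 clusters, 8 coefficients) built to sum to zero. The data and function names are made up for illustration.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of lists) via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a nonzero pivot in this column at or below the current rank row.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(n_rows):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def cluster_meat(scores):
    """Sum over clusters of the outer product s_g s_g' (the sandwich 'meat')."""
    k = len(scores[0])
    meat = [[Fraction(0)] * k for _ in range(k)]
    for s in scores:
        for i in range(k):
            for j in range(k):
                meat[i][j] += Fraction(s[i] * s[j])
    return meat


def zero_sum_scores(n_clusters, k):
    """Hypothetical per-cluster score vectors that sum to zero, as OLS scores do."""
    scores = [[(g + 1) * (j + 2) % 7 - 3 + (g == j) for j in range(k)]
              for g in range(n_clusters - 1)]
    # The last cluster's score is minus the sum of the others, mimicking X'e = 0.
    scores.append([-sum(s[j] for s in scores) for j in range(k)])
    return scores


G, K = 5, 8  # hypothetical: 5 clusters, 8 coefficients (more than G - 1)
meat = cluster_meat(zero_sum_scores(G, K))
rank = matrix_rank(meat)
print(rank, G - 1, K)  # 4 4 8
```

On this made-up input, the rank comes out as G - 1 = 4, half the number of coefficients, so the matrix is singular. That is consistent with the missing F statistics in every run above. It does not, by itself, explain why the reported model degrees of freedom change from run to run.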
On Wed, Sep 4, 2013 at 11:07 AM, Joe Canner <[email protected]> wrote:
> How many levels are in week, retailer_id, state, product, and clusterID?
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Alex MacKay
> Sent: Wednesday, September 04, 2013 12:03 PM
> To: [email protected]
> Subject: Re: st: RE: Matsize and Estimation of the Variance Matrix in a Regression
>
> 1. The full specification is:
>
> areg ln_price treatment postperiod treatmentXpostperiod ln_unemployment ln_population ln_income price_index ///
> i.week i.retailer_id i.state, absorb(product) vce(cluster clusterID)
>
> 2. The fixed effects variables are stored as integers.
>
> 3. I'm increasing the matsize because I am running several regressions, and for some I run into the issue: "matsize too small." I re-ran all regressions, and for a few (like the one above) that did not have the error, the results changed.
>
> Alex
>
> On Wed, Sep 4, 2013 at 10:24 AM, Joe Canner <[email protected]> wrote:
>> Alex,
>>
>> I'm no -areg- expert, but I would suggest that if you want to get more traction with this question, you should probably provide additional information, including:
>>
>> 1. The complete specification of your model
>> 2. A description of the variables in your model (e.g., if categorical, how many levels)
>> 3. Why you are increasing the -matsize- in the first place
>>
>> I suspect that the model has some intrinsic problems that need to be fixed (perhaps something similar to what you have suggested) which will probably take care of the -matsize- issue (which is probably more of a symptom than a cause), but we would need to know more before offering a solution.
>>
>> Regards,
>> Joe Canner
>> Johns Hopkins University School of Medicine
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Alex MacKay
>> Sent: Wednesday, September 04, 2013 9:58 AM
>> To: [email protected]
>> Subject: st: Matsize and Estimation of the Variance Matrix in a Regression
>>
>> Dear statalist,
>>
>> I have run into an issue: when I increase the matsize, a regression
>> that previously ran with no warnings can return
>> "Warning: variance matrix is nonsymmetric or highly singular."
>>
>> It estimates exactly the same coefficients in both cases. I've put
>> the log for the first coefficient below; notice the warning ahead
>> of the output. With the larger matsize (10000), it does not estimate
>> standard errors, and the model degrees of freedom are zero.
>>
>> I am using the areg command to absorb the variable product_id. Is it
>> possible that Stata is trying to generate a number of fixed effects
>> that exceeds 800, the original matsize, and decides to drop the
>> product_id dummy variables? That might be what allows it to estimate
>> standard errors. If so, I think it should be reported as a bug.
>>
>> Alex
>>
>> (Note: I'm reposting in a way that may more clearly identify the
>> issues, now that I am familiar with replying).
>>
>>
>> //Matsize = 10000
>>
>>
>> note: 2599.week omitted because of collinearity
>> note: 597.retailer_id omitted because of collinearity
>> note: 866.retailer_id omitted because of collinearity
>> note: 877.retailer_id omitted because of collinearity
>> note: 9101.retailer_id omitted because of collinearity
>> note: 54.state_id omitted because of collinearity
>> Warning: variance matrix is nonsymmetric or highly singular
>> note: 3997.retailer_id omitted because of collinearity
>> note: 4955.retailer_id omitted because of collinearity
>> note: 7005.retailer_id omitted because of collinearity
>> note: 7599.retailer_id omitted because of collinearity
>>
>> Linear regression, absorbing indicators Number of obs = 597
>>
>> F( 0, 45) = .
>> Prob > F = .
>> R-squared = 0.9256
>> Adj R-squared = 0.8695
>> Root MSE = 0.2950
>>
>> (Std. Err. adjusted for 46 clusters in clusterID)
>> ------------------------------------------------------------------------------
>>              |               Robust
>>     ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>    treatment |  -4.044072           .      .        .              .         .
>>
>>
>>
>> //Matsize = 800
>>
>> note: 2599.week omitted because of collinearity
>> note: 597.retailer_id omitted because of collinearity
>> note: 866.retailer_id omitted because of collinearity
>> note: 877.retailer_id omitted because of collinearity
>> note: 9101.retailer_id omitted because of collinearity
>> note: 54.fips omitted because of collinearity
>> note: 3997.retailer_id omitted because of collinearity
>> note: 4955.retailer_id omitted because of collinearity
>> note: 7005.retailer_id omitted because of collinearity
>> note: 7599.retailer_id omitted because of collinearity
>>
>> Linear regression, absorbing indicators Number of obs = 597
>>
>> F( 49, 45) = .
>> Prob > F = .
>> R-squared = 0.9256
>> Adj R-squared = 0.8695
>> Root MSE = 0.3085
>>
>> (Std. Err. adjusted for 46 clusters in clusterID)
>> ------------------------------------------------------------------------------
>>              |               Robust
>>     ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>    treatment |  -4.044072    3.152507  -1.28    0.206      -10.39355  2.305404
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/