I have a problem that I'm hoping Stata-listers can help with. I've
scoured the archives and FAQ, to no avail.
I have a dataset with 500,000 observations, and am interested in the
share of variance in variable Y that is explained by each of three
categorical independent variables, X1, X2, and X3.
The problem is that each of the X's has many categories: 90,000, 2,000,
and 800, respectively. So I can't use anova because of matsize limits.
More importantly, I'm not really familiar with ANOVA, so maybe that's
the wrong way to go altogether.
I've also thought of just running fixed-effects regressions and looking
at the increments to adjusted R-squared upon adding each set of dummies.
However, I run into the same matsize limits.
The areg command works fine with each of the variables individually,
but cannot handle more than one set of fixed effects. Based on a
previous Statalist post, I also tried grouping the fixed effects, i.e.
egen fe1=group(X1 X2) and egen fe2=group(X1 X2 X3), which also works.
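
A minimal sketch of the grouped-effects regressions described above, for
concreteness. It assumes the variables are literally named Y, X1, X2,
and X3, as in this post; the scalar names (r2a_1 and so on) are made up
for illustration. It is not a recommended decomposition, only the
sequence of commands in question.

```stata
* Sketch only: Y, X1, X2, X3 are the names used in the post;
* the scalar names below are hypothetical.
egen fe1 = group(X1 X2)       // one id per observed X1-by-X2 combination
egen fe2 = group(X1 X2 X3)    // one id per observed X1-by-X2-by-X3 combination

* areg absorbs one set of effects at a time and stores the
* adjusted R-squared in e(r2_a)
areg Y, absorb(X1)
scalar r2a_1 = e(r2_a)

areg Y, absorb(fe1)
scalar r2a_12 = e(r2_a)

areg Y, absorb(fe2)
scalar r2a_123 = e(r2_a)

* Increments to adjusted R-squared as each variable is added
display "X1 alone:         " r2a_1
display "increment for X2: " r2a_12 - r2a_1
display "increment for X3: " r2a_123 - r2a_12
```

Note that group(X1 X2) indexes every observed combination of X1 and X2,
so fe1 absorbs the X1-by-X2 cells rather than additive X1 and X2
effects, and the increments above depend on the order in which the
variables are added.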
The question is, can I just look at increments to the adjusted R-squared
from these regressions to estimate the share of variance due to each set?
My instinct is no. But I'd be interested in hearing others' thoughts
on this. More importantly, any other thoughts on how to proceed?
Any help would be much appreciated.
VB