I sympathise with your plight. Extra-large datasets are a mixed blessing.
At any rate, if you contract the data, you may find the program gllamm
useful. It is possible to use frequency weights with xtlogit but the
weight must be the same for all units in the "cluster". If you have a
contracted dataset then this may not hold, which I think makes it impossible
to use contracted datasets with xtlogit. gllamm does not require frequency
weights to be equal within clusters, so a contracted dataset can both speed
up estimation considerably and save space and resources (e.g., memory,
matsize).
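
A rough sketch of what contracting and then calling gllamm might look like,
using the variable names from the command quoted below. The weight variable
name wt1 is my own choice: gllamm's weight() option takes a stub and looks for
level-1 frequency weights in the variable named stub1. I have not run this on
your data, so treat it as a starting point only.

```stata
* Sketch only: variable names come from the quoted xtlogit command.
* Collapse to one row per distinct combination of outcome, covariates
* and hospital; wt1 records how many original records each row stands for.
contract emerg gestat age xdelmid provid, freq(wt1)

* Random-intercept logistic model with hospital (provid) as level 2.
* weight(wt) tells gllamm to use wt1 as the level-1 frequency weight.
xi: gllamm emerg i.gestat i.age i.xdelmid, i(provid) family(binom) link(logit) weight(wt)
```

Two caveats. Contracting only shrinks the data much when the covariates are
categorical with few levels, as they appear to be here. And gllamm fits a
random-intercept (subject-specific) model, not the population-averaged model
that the pa option of xtlogit gives, so the coefficients are not directly
comparable.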
Hope this helps or, at the very least, does not hinder.
Sam
On Fri, 19 Jul 2002, Alves, Bernadette wrote:
> I'm a student looking for help with my MSc dissertation looking at factors
> associated with delivery by caesarean section. It's an analysis of a
> database of about half a million records of women who gave birth in
> hospital. I am using logistic regression and because my data are naturally
> grouped, I'm using a multi-level approach to take account of the correlation
> between women in the same hospital. I am therefore using xtlogit (rather
> than logit). I find that I cannot run xtlogit with my entire 500,000
> records - stata comes back with an error saying that it needs to be able to
> set matsize to approximately 18,000. Unfortunately the matsize limit for
> stata 7.0 is 800.
>
> I then took a 4% sample (approximately 20,000 records), which is the largest
> that stata can cope with at a matsize of 800. But, and here's the weird
> thing that I need help with.... The parameter estimates are very dependent
> on the sample I take. Sometimes I get a p-value of 0.05, for other samples I
> get a p-value of 0.7. Here's an example of what I do to test whether
> xdelmid is a predictor of emergency caesarean section.
>
> sample 4 /* this gives me the 4% sample */
>
> xi: xtlogit emerg i.gestat i.age i.xdelmid, pa corr(exch) robust i(provid)
>
> testparm _Ixdel* /* this does a wald test on xdelmid */
>
> Taking 10 different 4% samples, I find my estimates differ considerably and
> my p-values range from 0.04 to 0.71.
>
> Why can't stata cope with the full dataset and why are the parameter
> estimates so sensitive to the sample taken?
>
> I would be extremely grateful if someone could help me with this.
>
> Bernadette
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>