Re: st: multi-level modelling with a big dataset using xtlogit

From   David Harless
To   [email protected]
Subject   Re: st: multi-level modelling with a big dataset using xtlogit
Date   Fri, 19 Jul 2002

"Alves, Bernadette" wrote:
> I'm a student looking for help with my MSc dissertation looking at factors
> associated with delivery by caesarean section. It's an analysis of a
> database of about half a million records of women who gave birth in
> hospital.   I am using logistic regression and because my data are naturally
> grouped, I'm using a multi-level approach to take account of the correlation
> between women in the same hospital.  I am therefore using xtlogit (rather
> than logit).   I find that I cannot run xtlogit with my entire 500,000
> records - stata comes back with an error saying that it needs to be able to
> set matsize to approximately 18,000.  Unfortunately the matsize limit for
> stata 7.0 is 800.
> I then took a 4% sample (approximately 20,000 records ) which is the largest
> that stata can cope with at a matsize of 800.  But, and here's the weird
> thing that I need help with.... The parameter estimates are very dependent
> on the sample I take. Sometimes I get a p-value of 0.05, for other samples I
> get a p-value of 0.7.  Here's an example of what I do to test whether
> xdelmid is a predictor of emergency caesarean section.

Here is another possible approach to this problem. If the number of women at
each hospital is sufficiently large, you can overcome it and get consistent
estimates by using logit and explicitly including a dummy variable for each
hospital. For a detailed explanation, see message numbers 17853 and 17874 in
the Yahoo Stata archives.
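
A minimal sketch of what the dummy-variable approach might look like in Stata 7.
The names ecs (a 0/1 indicator for emergency caesarean section) and hospid
(a hospital identifier) are hypothetical, since the original post does not show
the variable names; xdelmid is the predictor mentioned in the question. This
assumes the number of hospitals is small enough for the dummies to fit within
the matsize limit.

```stata
* Hypothetical names: ecs = 0/1 emergency caesarean, hospid = hospital id.
* xdelmid is the predictor of interest from the original question.

* xi expands i.hospid into one dummy per hospital (omitting a base category)
* and logit then estimates on the full dataset, not on a sample.
xi: logit ecs xdelmid i.hospid

* Optional: joint test that the hospital dummies are all zero.
testparm _Ihospid_*
```

The coefficient on xdelmid is then estimated from within-hospital variation,
which is why each hospital needs enough women for the estimates to be reliable.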

Dave Harless
