Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Ariel Linden" <ariel.linden@gmail.com> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | Re: Re: st: A reference for "how many independent variables one regression can have?" |
Date | Sat, 14 Dec 2013 14:46:11 -0500 |
I want to thank Richard Goldstein, Marta Garcia-Granero, and Richard Williams for their very helpful responses to my posting!

Ariel

Date: Fri, 13 Dec 2013 12:10:11 -0500
From: Richard Williams <richardwilliams.ndu@gmail.com>
Subject: Re: st: A reference for "how many independent variables one regression can have?"

A few comments:

* Long and Freese lay out some sample size suggestions for Maximum Likelihood Methods (e.g. logit) on p. 77 of http://www.stata.com/bookstore/regression-models-categorical-dependent-variables/ . I summarize their recommendations on pp. 3-4 of http://www3.nd.edu/~rwilliam/xsoc73994/L02.pdf .

* This paper claims that 10 may be more than you need: http://aje.oxfordjournals.org/content/165/6/710.full.pdf

* I would say 10 cases per parameter rather than 10 cases per independent variable. With something like an mlogit model, you might estimate, say, 3 parameters for every independent variable.

* As Richard Goldstein suggests, you may need a minimum number of cases. Long and Freese say you need at least 100 cases for an ML analysis. On the other hand, for something like a t test and the regression model equivalents of it, you can get by with some absurdly small number of cases if assumptions of normality are met. (Interesting tidbit: Counter to common practice, Long and Freese say you need to use more stringent p values when N is small, since the small sample properties of ML significance tests are not known.)

* As a practical matter, I suspect you usually need much more than 10 cases per parameter if you want to get statistically significant results.

At 10:50 AM 12/13/2013, Ariel Linden wrote:
>Hi All,
>
>I came across a statement in a book I am using to teach a class on
>evaluation that says "a common rule of thumb is that 1 independent variable
>can be added for every 10 observations." (it goes on to say that this
>depends on multicollinearity and desired level of precision). The book does
>not provide a reference for this statement.
>
>Does someone know of a reference for this ratio, or perhaps a different
>ratio?
>
>Thanks!
>
>Ariel
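
The "cases per parameter" arithmetic in Richard Williams's reply can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not a method taken from any of the cited references. The function names, the 300-case example, and the choice to ignore intercepts are all assumptions made here. The one statistical fact relied on is that a multinomial logit with J outcome categories estimates J - 1 coefficients per independent variable.

```python
def mlogit_params_per_variable(n_outcomes):
    """Coefficients estimated per independent variable in a multinomial
    logit with n_outcomes categories (one category is the base outcome).
    Hypothetical helper for illustration only."""
    if n_outcomes < 2:
        raise ValueError("need at least 2 outcome categories")
    return n_outcomes - 1


def max_independent_variables(n_cases, cases_per_parameter=10,
                              params_per_variable=1):
    """Largest number of independent variables allowed by a
    'cases per parameter' rule of thumb.

    Assumption: intercepts are ignored, so this slightly overstates
    what the rule would permit."""
    if n_cases < 0 or cases_per_parameter <= 0 or params_per_variable <= 0:
        raise ValueError("arguments must be positive")
    max_parameters = n_cases // cases_per_parameter
    return max_parameters // params_per_variable


# Hypothetical sample of 300 cases.
# One parameter per variable (e.g. OLS or binary logit):
print(max_independent_variables(300))  # 30

# mlogit with 4 outcome categories: 3 parameters per variable,
# matching the "say, 3 parameters" example in the reply.
print(max_independent_variables(
    300, params_per_variable=mlogit_params_per_variable(4)))  # 10
```

Counting parameters instead of variables cuts the allowance from 30 variables to 10 in this example, which is the point of the third comment above. The Long and Freese minimum of 100 cases for ML estimation is a separate floor that this sketch does not enforce.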