Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: FW: help on variable selection problem
From: "Lachenbruch, Peter" <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: FW: help on variable selection problem
Date: Fri, 10 Jun 2011 12:40:23 -0700
This is not especially a Stata question, but it is driven by an analysis issue...
A student is trying to analyze data from a national survey (no weights needed). She has 26 variables plus 10 years of data. There are about 1,000,000 observations. With this many observations, every coefficient is significantly different from 0. She's using mlogit (predicting medical care expenses, which have four categories), and she'd like to cut the predictors down to the 'important' ones. I have thought of several options: (1) backward stepwise, which is not available with mlogit; (2) look at effect size and insist it be larger than 0.05, which again is not available since there are four categories of the response variable; (3) use a Bonferroni inequality on the coefficients and insist on a low p-value to begin with, e.g. try for a size of 0.01 adjusting for 25 tests, so p must be less than 0.01/25 = 0.0004. The issue seems to be the huge sample size pushing everything to significance.
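To make the arithmetic concrete, here is a minimal sketch in Python (not Stata). The size of 0.01, the 25 tests, and the sample size of about 1,000,000 come from the description above. The effect of 0.005 and the assumption that its standard error is roughly 1/sqrt(n) are hypothetical, chosen only to show how a tiny effect can still clear the Bonferroni cutoff at this sample size.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall size at or below alpha."""
    return alpha / n_tests


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Values taken from the post: overall size 0.01, adjusted for 25 tests.
cutoff = bonferroni_threshold(0.01, 25)

# Hypothetical illustration (not from the survey data): a standardized
# effect of 0.005 whose standard error is assumed to be about 1/sqrt(n).
n = 1_000_000
effect = 0.005
z = effect * math.sqrt(n)
p = two_sided_p(z)

print(f"per-test cutoff: {cutoff:.4f}")
print(f"z = {z:.1f}, p = {p:.2e}, below cutoff: {p < cutoff}")
```

Under these assumptions the p-value falls far below the cutoff even though the effect itself is negligible, which suggests that tightening the p-value alone may not separate the important predictors from the unimportant ones.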
Does anybody have any ideas?
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001