Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: David Hoaglin <dchoaglin@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: find categorical variables
Date: Thu, 22 Mar 2012 06:22:38 -0400
Jakob,

In this situation (and in the binary vs. continuous discussion), the decision should be based, first, on a clear understanding of the definition of the variable. That stage does not involve looking at the data. It involves understanding the "measurement process."

If a "continuous" variable takes too few values in a particular set of data, it might be appropriate to treat it as an (ordered) categorical variable. In a regression-like model, that choice may depend on whether the variable is the response or a predictor. A similar consideration applies when the variable is a count.

Data that are naturally "continuous" or counts are sometimes collected in categories. Income is one common example. Analysts sometimes use the midpoint of each category, but that distorts the data because it ignores the variation within each category that would have been present if the data had not been collected in categories. Also, an open-ended top category may require special treatment, since it has no upper bound and therefore no midpoint.

In building a regression model, when one has enough data, it may be useful to turn a continuous variable into a detailed set of categories and fit a separate coefficient for each category, so that the data can guide the choice of functional form for that variable.

If the analyst has not understood the nature of all the variables, what are the results worth?

David Hoaglin
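The last modelling suggestion, fitting a separate coefficient for each category of a binned continuous predictor so the data can suggest a functional form, can be sketched as follows. This is a minimal illustration in Python with NumPy rather than Stata. The function name `binned_coefficients`, the bin edges, and the toy data are hypothetical choices made for this sketch. It assumes a single predictor and hand-chosen edges. The model uses one 0/1 indicator per category and no intercept, so each coefficient is simply the mean of the response within that category.

```python
import numpy as np


def binned_coefficients(x, y, edges):
    """Regress y on a full set of category indicators for x (no intercept).

    Hypothetical helper for illustration. Each category gets its own
    coefficient, so no functional form is imposed on x. With indicators
    only, each coefficient equals the mean of y within that category.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.asarray(edges, dtype=float)
    k = len(edges) - 1  # number of categories

    # Category index 0..k-1 via the interior edges: x < edges[1] -> 0,
    # edges[1] <= x < edges[2] -> 1, ..., x >= edges[-2] -> k-1.
    # Values outside [edges[0], edges[-1]] fall into the first or last bin.
    idx = np.digitize(x, edges[1:-1])

    # Design matrix: one 0/1 indicator column per category.
    X = np.zeros((x.size, k))
    X[np.arange(x.size), idx] = 1.0

    # Least-squares fit; an empty category makes X rank-deficient, and
    # lstsq then returns the minimum-norm solution (0 for that column).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    midpoints = (edges[:-1] + edges[1:]) / 2
    return midpoints, coef


# Toy data (not real): y = x**2, binned into four categories of width 1.
x = np.array([0.2, 0.8, 1.2, 1.8, 2.2, 2.8, 3.2, 3.8])
mids, coef = binned_coefficients(x, x**2, [0, 1, 2, 3, 4])
for m, c in zip(mids, coef):
    print(f"category centred at {m}: coefficient {c:.2f}")
```

On this toy input the coefficients are approximately 0.34, 2.34, 6.34, and 12.34. Their successive differences (about 2, 4, 6) grow steadily, which points toward a quadratic rather than a linear term. Plotting coefficients against category midpoints like this is one way to let the data guide the functional form before choosing a smooth specification.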