st: how to deal with categories?
Dear all,
I am wondering what the best way is to deal with categorical
information.
1.
What is the best way to include income groups in a regression? E.g.,
as income (usually) has no upper limit, I tend to generate a dummy
variable that is 1 if the individual is in the highest income
category (0 otherwise). Further, am I right in assuming that building
categories is usually not sensible when the number of observations is
high? One issue I face is that very young adults and very old adults
are under-represented in the dataset (meaning there are not many
unique observations for these groups; the sample itself is good). Is
there a rule of thumb for which is better: building categories for
all age classes (which increases the number of observations in the
young/old groups) or not building classes at all (which keeps the
more detailed information)? It's clearly a trade-off, but maybe
there's some advice. I tend not to use categories here, also because
age-squared might be important to have at hand later.
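
A minimal sketch of the two specifications being compared in point 1. The variable names (y, age, incgrp), the coding of incgrp as 1-5 with 5 as the top group, and the age cut points are all hypothetical and only serve to illustrate the question:

```stata
* Assumed variables: y (outcome), age (years), incgrp (1-5, 5 = highest group)

* Option A: keep age continuous, add a quadratic term,
* and flag only the top income group with a 0/1 dummy
generate age2 = age^2
generate byte topinc = (incgrp == 5) if !missing(incgrp)
regress y age age2 topinc

* Option B: collapse age into classes and enter all groups as dummies
* (ages outside 18-119 become missing with these illustrative cut points)
egen agecat = cut(age), at(18 30 45 60 75 120)
xi: regress y i.agecat i.incgrp
```

Option A keeps the detailed age information and makes age-squared available; Option B pools the sparse young and old observations into wider classes.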
2.
The "xi" command can help to make life less messy (in large data
sets, I think). But it seems to drop all the value labels of these
categorical variables! I could not find any option that tells "xi" to
use the already defined value labels. Is there a workaround so that
the regression table uses the defined value labels (e.g. for sex:
0==male, 1==female) as the names of the new variables?
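
One possible workaround for point 2, sketched under assumptions rather than as a built-in "xi" feature: build the dummies by hand and name them after the value labels. This assumes a variable sex with the value label 0 "male", 1 "female" attached, and that every label is a valid Stata variable name not already in use; y and age are hypothetical.

```stata
* Assumed: sex carries a value label (0 "male", 1 "female")
levelsof sex, local(levels)
foreach l of local levels {
    * look up the value label text for this level
    local lab : label (sex) `l'
    * create a 0/1 dummy named after the label (e.g. male, female)
    generate byte `lab' = (sex == `l') if !missing(sex)
    label variable `lab' "sex: `lab'"
}

* male is left out as the base category
regress y female age
```

The regression table then shows "female" instead of a generated name such as _Isex_1.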
Many thanks for all your input,
Andrea