Regarding 1., it is hard to see why you would let dummies eat up your
degrees of freedom when you have a continuous variable "income" that
-regress- accepts as is. The fact that income has no upper limit does not
render it invalid as a covariate. Just include income itself without further
ado (pun intended :-) ).
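
To illustrate the degrees-of-freedom point, here is a minimal sketch. It is
not run on your data: it assumes the nlsw88 example dataset that ships with
Stata, with -wage- standing in for your "income", and the grouping variable
-wagegrp- is a name made up for the example.

```stata
* Sketch only: nlsw88 is a shipped example dataset; wage stands in for income
sysuse nlsw88, clear

* Continuous covariate: wage costs a single degree of freedom
regress hours wage age

* For comparison: four wage groups (quartiles) cost three degrees of freedom
xtile wagegrp = wage, nquantiles(4)
xi: regress hours i.wagegrp age
```

Comparing the "Model" df in the two ANOVA tables shows what the grouping
costs you, on top of the detail thrown away within each group.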
Regarding 2., take a look at -h extended_fcn- (extended macro functions),
which let you extract variable labels, value labels and the like. Other
listers may have more elaborate advice...
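
A rough sketch of one possible workaround along these lines, again assuming
the shipped nlsw88 dataset and its labeled variable -race- rather than your
data. The new names race_* are my own choice, and the -rename- only works as
written when the value labels are legal pieces of a variable name (no spaces
or special characters).

```stata
* Sketch only: rename the dummies made by -xi- after their value labels
sysuse nlsw88, clear
xi i.race                        // creates _Irace_* dummies, one category omitted

* Extended macro function: name of the value label attached to race
local lblname : value label race

foreach v of varlist _Irace_* {
    * strip the prefix to recover the category value, e.g. _Irace_2 -> 2
    local val : subinstr local v "_Irace_" ""
    * look up the label text that belongs to this value
    local txt : label `lblname' `val'
    label variable `v' "race: `txt'"
    rename `v' race_`txt'
}

regress wage race_* age
```

The regression table then lists the renamed dummies instead of the
_Irace_* names.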
Martin Weiss
_________________________________________________________________
Diplom-Kaufmann Martin Weiss
Mohlstrasse 36
Room 415
72074 Tuebingen
Germany
Fon: 0049-7071-2978184
Home: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1130
Publications: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1131
SSRN: http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=669945
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andrea Bennett
Sent: Tuesday, June 03, 2008 2:40 PM
To: [email protected]
Subject: st: how to deal with categories?
Dear all,
Right now I am wondering what is the better way to deal with
categorical information.
1.
What is the best way to implement income groups in a
regression? E.g. as income has (usually) no upper limits, I tend to
generate an interaction term (dummy==1) if the individual is in the
highest income category (0 if else). Further, am I right in the
assumption that building categories is usually not sensible when the
number of observations is high? One issue I face is that very young
adults and very old adults are under-represented in the dataset
(meaning, not that many unique observations for these groups, sample
itself is good). Is there a rule of thumb for which is better:
building categories for all age classes (increasing observations in
young/old group) or not building classes at all (having more detailed
info)? It's clearly a trade-off but maybe there's some advice. I tend
not to use categories here, also because age-squared might be
important to have at hand, later.
2.
The "xi" command can help to make life less messy (in large data sets,
I think). But it seems to kill all my value labels in these
categorical groups! I could not find any option to tell "xi" to use
the already defined value labels. Is there a workaround at hand so
that the regression table will instead use the values defined (e.g.
for sex; 0==male, 1==female) as new variable names?
Many thanks for all your inputs,
Andrea
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/