I am loath to belabor the point, but I think that there are no hard and fast
rules for or against "dummization" of ordered categorical variables. It
depends on what the variable represents.
The income ranges that one normally works with have no significance per se.
They are often determined by statistical offices or by enumerators for their own
convenience and respond to a multiplicity of objectives. Thus, intervals are
almost always of unequal size (one interval may be 100-200, another 1200-5200,
and the final interval may be open, e.g., all incomes above 150,000). So
including one dummy for people with incomes between 100 and 200 and another for
people with incomes between 1200 and 5200 does not test anything you have a
prior reason to explore, because these intervals do not correspond to any "real"
inherent differences between these types of people (the implied H0 is that
people with incomes between 100 and 200 all behave the same way, and differently
from people with incomes between 1200 and 5200, who in their turn all behave the
same way).
Keeping the same ranges and using dummies would be particularly problematic if
you have panel data, because the same nominal income ranges may mean totally
different things at different points in time (inflation, real income growth).
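To make the panel-data point concrete, here is a minimal sketch in Python (not Stata). The bracket bounds and the price index are hypothetical numbers chosen only for illustration, not real statistical-office or CPI data. It deflates one fixed nominal bracket to base-year prices and shows a household whose real income never changes crossing a bracket boundary purely through inflation.

```python
# Hypothetical nominal brackets, held fixed across waves of a panel.
# Each entry is a lower bound; the last bracket (>= 150,000) is open.
BRACKETS = [100, 200, 1200, 5200, 150000]

# Hypothetical price index (2000 = 100); illustrative numbers only.
CPI = {2000: 100.0, 2008: 125.0}


def real_bounds(lower, upper, year, base_year=2000):
    """Express a nominal bracket [lower, upper) in base-year prices."""
    return (lower * CPI[base_year] / CPI[year],
            upper * CPI[base_year] / CPI[year])


def bracket_of(nominal_income):
    """0-based index of the nominal bracket containing an income
    (None if the income is below the lowest bound)."""
    idx = None
    for i, lower in enumerate(BRACKETS):
        if nominal_income >= lower:
            idx = i
    return idx


# The same nominal bracket covers a lower real range in 2008 ...
print(real_bounds(1200, 5200, 2008))  # (960.0, 4160.0)

# ... and a household whose real income stays at 4,500 (2000 prices)
# lands in a different nominal bracket in 2008.
for year in (2000, 2008):
    nominal = 4500 * CPI[year] / CPI[2000]
    print(year, nominal, bracket_of(nominal))
```

With these made-up numbers the household is in bracket 2 (1200-5200) in 2000 and in bracket 3 (5200-150,000) in 2008, so a dummy for bracket 2 would switch off without any change in real income.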
Unlike, e.g., the rural vs. urban distinction, the intervals do not reflect
something that we believe is a meaningful distinction between the groups and
which we want to explore. They are created (as I said) partly by accident and
partly for the sake of convenience ("we want to have 10 income classes", "we
create some prior intervals into which to place people, hoping that the
resulting distribution will be normal or lognormal"), yet the outcome may be
very different, so you may end up with 30% of people in one interval and 1% in
another. Using them as dummies therefore gives them an importance which they do
not possess. They are actually "compressions" of a continuous variable (income),
and it therefore makes more sense to try to "unpack" them by using the interval
means and treating them as proxies for a continuous variable (which they
actually are).
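A minimal Python sketch of the "unpacking" idea follows. Only the 100-200 and 1200-5200 intervals and the open "> 150,000" interval come from the text; the other brackets are hypothetical. Where within-bracket means are not published, it substitutes the midpoint of each closed bracket. For the open top bracket it assumes a Pareto tail with an illustrative alpha = 2, whose conditional mean above the lower bound L is alpha * L / (alpha - 1). The midpoint and Pareto choices are assumptions added here, not part of the suggestion above.

```python
# Hypothetical bracket definitions as (lower, upper); upper=None marks the
# open top bracket ("all incomes > 150,000").
BRACKETS = [(100, 200), (200, 1200), (1200, 5200), (5200, 150000),
            (150000, None)]


def bracket_value(lower, upper, pareto_alpha=2.0):
    """Point value standing in for the mean income of one bracket.

    Closed brackets use the midpoint. The open top bracket uses the mean
    of a Pareto distribution above `lower`, alpha * lower / (alpha - 1);
    alpha = 2.0 is an illustrative assumption, not an estimate.
    """
    if upper is None:
        if pareto_alpha <= 1:
            raise ValueError("Pareto mean is finite only for alpha > 1")
        return pareto_alpha * lower / (pareto_alpha - 1)
    return (lower + upper) / 2


def code_income(category):
    """Map a 0-based bracket code to its continuous proxy value."""
    lower, upper = BRACKETS[category]
    return bracket_value(lower, upper)


codes = [0, 2, 2, 4, 3]
print([code_income(c) for c in codes])
```

Here the codes 0, 2, 2, 4, 3 map to 150.0, 3200.0, 3200.0, 300000.0 and 77600.0. The top-bracket value depends heavily on the assumed alpha, so it is worth checking how much the results move when that assumption changes.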
Branko
Development Research, World Bank
Email: [email protected] or branko_mi@yahoo.
tel: 202-473-6968
World Bank, Room MC 3-559
1818 H Street NW
Washington D.C. 20433
For "Worlds Apart" see
http://www.pupress.princeton.edu/titles/7946.html
Website:
http://econ.worldbank.org/projects/inequality
For papers see also:
http://econpapers.hhs.se/
http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=149002
"Nick Cox" <[email protected]>
Sent by: owner-statalist@hsphsun2.harvard.edu
To: <[email protected]>
Subject: RE: st: RE: Your opinion on income groups and inflation
Date: 06/09/2008 05:56 AM
Please respond to: statalist@hsphsun2.harvard.edu
-mrunning- and -mlowess- are possible graphical aids here, giving
smooths of the response versus each predictor, with adjustment for other
predictors. Use -findit- to identify locations of program files.
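The idea of a smooth "with adjustment for other predictors" can be illustrated by backfitting an additive model. The Python sketch below uses a plain running-mean smoother. It illustrates the general technique only; it is not the code of -mrunning- or -mlowess-, which may differ in the smoother used and in other details. The window half-width k and the number of iterations are arbitrary assumptions.

```python
def running_mean(x, r, k=2):
    """Running-mean smooth of r on x: each point gets the average of r over
    itself and up to k neighbours on each side, in x-sorted order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    smooth = [0.0] * len(x)
    for pos, i in enumerate(order):
        window = order[max(0, pos - k): pos + k + 1]
        smooth[i] = sum(r[j] for j in window) / len(window)
    return smooth


def backfit(y, predictors, k=2, iters=20):
    """Additive fit y = a + f_1(x_1) + ... + f_p(x_p) by backfitting.

    Each f_j is re-estimated by smoothing the partial residual that
    removes the intercept and all the other f_m; this is what makes the
    smooth for one predictor "adjusted" for the others."""
    n = len(y)
    a = sum(y) / n
    fs = [[0.0] * n for _ in predictors]
    for _ in range(iters):
        for j, x in enumerate(predictors):
            partial = [y[i] - a - sum(f[i] for m, f in enumerate(fs) if m != j)
                       for i in range(n)]
            f = running_mean(x, partial, k)
            centre = sum(f) / n
            fs[j] = [v - centre for v in f]  # keep each smooth mean-zero
    return a, fs


# Made-up example: two correlated predictors, additive response.
x1 = list(range(20))
x2 = [(7 * i) % 20 for i in range(20)]
y = [u + 2 * v for u, v in zip(x1, x2)]
intercept, (f1, f2) = backfit(y, [x1, x2])
fitted = [intercept + f1[i] + f2[i] for i in range(len(y))]
print(round(sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)), 2))
```

Plotting f1 against x1 and f2 against x2 gives one adjusted smooth per predictor, which is the kind of display the post describes.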
Nick
[email protected]
Austin Nichols
Andrea--
I strongly disagree with Martin Weiss, SamL, and Branko Milanovic, who
claim that an ordered categorical explanatory variable can be included
as a sensible regressor without justification. Creating dummies *is*
justifiable; you are merely computing conditional means. Including
income (or "trust") as a single explanatory variable when income (or
"trust") is measured as an ordered categorical variable requires the
strong assumption that the effect is linear in the index of categories.
The dummy variable approach requires no such assumption.
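A small Python sketch of the contrast, on hypothetical data: with a full set of category dummies and no constant, the least-squares coefficients are simply the category means of the response, whereas regressing on the category index forces equal steps between adjacent categories.

```python
from statistics import mean


def dummy_fit(cat, y):
    """Coefficients from regressing y on a full set of category dummies
    with no constant: these equal the category means of y."""
    return {g: mean(v for c, v in zip(cat, y) if c == g)
            for g in sorted(set(cat))}


def linear_index_fit(cat, y):
    """OLS of y on the category index itself: y = b0 + b1 * cat."""
    cbar, ybar = mean(cat), mean(y)
    sxy = sum((c - cbar) * (v - ybar) for c, v in zip(cat, y))
    sxx = sum((c - cbar) ** 2 for c in cat)
    b1 = sxy / sxx
    return ybar - b1 * cbar, b1


# Hypothetical data: 4 ordered categories with a jump at the top.
cat = [0, 0, 1, 1, 2, 2, 3, 3]
y = [1, 1, 2, 2, 3, 3, 10, 10]
print(dummy_fit(cat, y))
b0, b1 = linear_index_fit(cat, y)
print(round(b0, 6), round(b1, 6))
print([round(b0 + b1 * k, 6) for k in range(4)])
```

On these made-up numbers the dummy fit returns the means 1, 2, 3 and 10, while the linear-index fit has intercept -0.2 and slope 2.8. It imposes an equal step of 2.8 between adjacent categories where the actual steps are 1, 1 and 7.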
As Richard Williams quite rightly points out, you can -test- whether
the effect is linear in the index, or whether groups of individual
dummies all have the same effect. One useful approach is to create dummies
that correspond to more interpretable groups, such as above the median,
more than twice the median, less than half the median, etc., so that you
can see directly from the regression output where deviations from
linearity occur. Graphs are also helpful for this purpose.
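The linearity check can be sketched in Python as a nested-model F test. The linear-index model is a restricted version of the full dummy model (2 parameters instead of K, one per category), so the usual F statistic for K - 2 restrictions applies. This is the same kind of comparison that -test- carries out after a regression, but the code below is not Stata's. It only computes the statistic, which would then be compared with an F(K - 2, n - K) critical value. The median-relative bands at the end are one hypothetical choice of "interpretable groups", with cut points taken from the examples in the text; the data are made up.

```python
from statistics import mean, median


def rss_dummies(cat, y):
    """Residual sum of squares when each category gets its own mean."""
    means = {g: mean(v for c, v in zip(cat, y) if c == g) for g in set(cat)}
    return sum((v - means[c]) ** 2 for c, v in zip(cat, y))


def rss_linear_index(cat, y):
    """Residual sum of squares from OLS of y on the category index."""
    cbar, ybar = mean(cat), mean(y)
    b1 = (sum((c - cbar) * (v - ybar) for c, v in zip(cat, y))
          / sum((c - cbar) ** 2 for c in cat))
    b0 = ybar - b1 * cbar
    return sum((v - b0 - b1 * c) ** 2 for c, v in zip(cat, y))


def linearity_f(cat, y):
    """F statistic and (numerator, denominator) df for
    H0: the effect is linear in the category index.
    Requires more than 2 categories and more observations than categories."""
    n, k = len(y), len(set(cat))
    rss_r, rss_u = rss_linear_index(cat, y), rss_dummies(cat, y)
    f = ((rss_r - rss_u) / (k - 2)) / (rss_u / (n - k))
    return f, (k - 2, n - k)


def median_groups(income):
    """Hypothetical interpretable bands relative to the sample median."""
    m = median(income)

    def band(x):
        if x < 0.5 * m:
            return "below half median"
        if x <= m:
            return "half median to median"
        if x <= 2 * m:
            return "median to twice median"
        return "above twice median"

    return [band(x) for x in income]


cat = [0, 0, 1, 1, 2, 2, 3, 3]
y = [1, 2, 2, 3, 3, 4, 9, 11]
print(linearity_f(cat, y))
print(median_groups([100, 400, 1000, 1500, 5000]))
```

With the made-up data the statistic is about 10.37 on (2, 4) degrees of freedom. The band labels could then be turned into dummies, so that each coefficient refers to a group defined relative to the median rather than to an arbitrary bracket.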
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*