Re: st: RE: Your opinion on income groups and inflation
Many thanks for this revealing illustration of tests! I will certainly
look into this...
Kind regards,
Andrea
On Jun 7, 2008, at 9:58 PM, Richard Williams wrote:
At 02:42 PM 6/7/2008, [email protected] wrote:
On income groups (intervals), I would not use dummies because you
have information about income _levels_ which would otherwise be lost.
An income interval of 300 to 400 is not the same thing as an income
interval of 1200 to 3600. Since you do not have information about
distribution of income within
Ch. 9 of Long & Freese's book (see especially pp. 421-422) shows how
to test whether treating an ordinal variable as interval loses
information. Basically, you run an unconstrained model where the
ordinal variable is broken up into dummies, and then run a
constrained model where you treat the ordinal variable as
continuous. If the difference is not significant, then treating the
variable as continuous is OK. I imagine you can tweak this a bit,
e.g., by assigning midpoints or some other scores to the categories
of the variable.
For info on the book, see
http://www.stata.com/bookstore/regmodcdvs.html
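For the midpoint tweak mentioned above, here is a minimal sketch of the recoding step. The variable names incgrp and incmid are hypothetical, as are the interval bounds: 300-400 and 1200-3600 come from the question, while the 400-1200 middle interval is made up for illustration. Substitute your own codes and bounds.

```stata
* Hypothetical income groups: 1 = 300-400, 2 = 400-1200 (made up), 3 = 1200-3600
clear
input incgrp
1
2
3
3
end

* Replace each interval code with the midpoint of its interval
recode incgrp (1 = 350) (2 = 800) (3 = 2400), generate(incmid)
list incgrp incmid
```

The midpoint-scored variable (incmid here) would then play the role of the continuous variable in the constrained model, while the unconstrained model still uses one dummy per income group.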
Here is an example:
sysuse auto
reg price rep78
est store constrained
xi: reg price i.rep78
est store unconstrained
lrtest constrained unconstrained
The output from the last command is

. lrtest constrained unconstrained

Likelihood-ratio test                                 LR chi2(3)  =      1.00
(Assumption: constrained nested in unconstrained)     Prob > chi2 =    0.8002
This is kind of a crummy example because the N is so small and the
relationship so weak, but in any event the test says it is OK to
treat rep78 as continuous.
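On Stata 11 or later, the same comparison can be run with factor-variable notation instead of xi. The sketch below also computes the LR statistic by hand from the stored log likelihoods, LR = 2(lnL_unconstrained - lnL_constrained), with degrees of freedom equal to the number of extra slopes in the dummy model; it should reproduce the lrtest result above (LR chi2(3) = 1.00, Prob > chi2 = 0.8002). The scalar names are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Constrained model: rep78 entered as a single continuous term
quietly regress price rep78
scalar ll_con = e(ll)
scalar k_con  = e(df_m)

* Unconstrained model: one dummy per rep78 category (factor variables)
quietly regress price i.rep78
scalar ll_unc = e(ll)
scalar k_unc  = e(df_m)

* LR statistic; df = extra slopes in the dummy model (4 - 1 = 3)
scalar lrstat = 2*(ll_unc - ll_con)
scalar lrdf   = k_unc - k_con
display "LR chi2(" lrdf ") = " %5.2f lrstat "   Prob > chi2 = " %6.4f chi2tail(lrdf, lrstat)
```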
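You can also set it up as a Wald test, which may be handy in
situations where an LR test is inappropriate (e.g., after estimation
with robust or clustered standard errors). If the X variable has
k categories, then include X and k-2 of the dummies computed from X,
and then test the dummies. X plus k-2 dummies gives exactly the same
fitted values as the full set of k-1 dummies, so testing the k-2
dummies tests the departure from linearity. For example: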
tab1 rep78, gen(rep)
reg price rep78 rep3 rep4 rep5
test rep3 rep4 rep5
The last command gives
. test rep3 rep4 rep5
( 1) rep3 = 0
( 2) rep4 = 0
( 3) rep5 = 0
F( 3, 64) = 0.31
Prob > F = 0.8160
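Because these are OLS models with default standard errors, the Wald F above should equal the classic nested-model F built from residual sums of squares, F = [(RSS_constrained - RSS_unconstrained)/q] / [RSS_unconstrained/df_resid], with q = k-2 = 3 restrictions. The sketch below checks this by comparing the linear model with the full-dummy model; it should print F(3, 64) = 0.31 and Prob > F = 0.8160. Scalar names are arbitrary.

```stata
sysuse auto, clear

* Restricted model: effect of rep78 forced to be linear
quietly regress price rep78
scalar rss_r = e(rss)

* Unrestricted model: full set of rep78 dummies (same fitted values as
* regressing price on rep78 plus rep3-rep5)
quietly regress price i.rep78
scalar rss_u = e(rss)
scalar dfres = e(df_r)

* q = k - 2 = 3 restrictions for a 5-category variable
scalar nrestr = 3
scalar Fstat  = ((rss_r - rss_u)/nrestr) / (rss_u/dfres)
display "F(" nrestr ", " dfres ") = " %5.2f Fstat "   Prob > F = " %6.4f Ftail(nrestr, dfres, Fstat)
```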
This sort of thing is also useful if, say, your X variable is
continuous (e.g. education) but you suspect its effects are not
strictly linear (a year of college has a different effect than a
year of grade school).
Now, if the N is large, you may well find that the dummy variable
approach always comes out ahead, since with enough cases even
trivial departures from linearity become statistically significant.
At that point, you may wish to consider substantive significance
(just how much do the effects differ from straight linearity?) or
some other criterion for model selection that is less affected by
sample size, e.g., BIC. There is a lot to be said for parsimony.
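One way to act on the BIC suggestion is to compute BIC = -2 lnL + p ln(N) for each model and prefer the lower value. In this sketch p counts the regression coefficients including the constant (an assumption about the counting convention; also counting the error variance would add the same amount to both models and leave the difference unchanged). With the auto data the difference works out to roughly -1.00 + 3 ln(69), or about 11.7, in favor of the linear specification.

```stata
sysuse auto, clear

* BIC = -2*lnL + p*ln(N), p = number of coefficients incl. the constant
quietly regress price rep78
scalar bic_lin = -2*e(ll) + (e(df_m) + 1)*ln(e(N))

quietly regress price i.rep78
scalar bic_dum = -2*e(ll) + (e(df_m) + 1)*ln(e(N))

* A positive difference means the linear (more parsimonious) model is preferred
display "BIC linear = " %9.2f bic_lin "   BIC dummies = " %9.2f bic_dum
display "BIC(dummies) - BIC(linear) = " %6.2f bic_dum - bic_lin
```

If you have stored the two models as in the lrtest example, estimates stats constrained unconstrained should report AIC and BIC for both side by side.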
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*