Dear Statalist,
I have a statistical question concerning the handling of discrete
variables. I apologize if the question sounds a bit simple or is not
strictly a Stata question.
Suppose we have two discrete variables (y and x), each with three
categories (coded 0, 1 and 2). This results in a 3x3 contingency table
with m=9 different cells. It seems to me that I have two possibilities
for incorporating x and y in a regression-like analysis:
1) Use four dummy variables, two for x and two for y (additionally, we
can include interaction terms between them).
2) Use one dummy for each of m-1=8 cells of the contingency table.
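To make the two codings concrete, they can be written out over the nine
cells of the table. The following is only a minimal sketch in Python
rather than Stata, and it assumes level 0 is the omitted base category
for both variables and cell (0, 0) is the omitted cell. The function
names (main_effects_row, cell_row, rank) are made up for illustration.
It prints the number of columns and the rank of each design.

```python
from fractions import Fraction
from itertools import product

LEVELS = (0, 1, 2)  # categories of x and y, as coded above


def main_effects_row(x, y, interactions=False):
    """Coding 1: constant, two dummies for x, two for y (base level 0)."""
    dx = [int(x == k) for k in (1, 2)]
    dy = [int(y == k) for k in (1, 2)]
    row = [1] + dx + dy
    if interactions:
        # Product of every x dummy with every y dummy: 2 * 2 = 4 terms.
        row += [a * b for a in dx for b in dy]
    return row


def cell_row(x, y):
    """Coding 2: constant plus one dummy per cell, omitting cell (0, 0)."""
    cells = [c for c in product(LEVELS, LEVELS) if c != (0, 0)]
    return [1] + [int((x, y) == c) for c in cells]


def rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


grid = list(product(LEVELS, LEVELS))  # one row per cell of the 3x3 table
additive = [main_effects_row(x, y) for x, y in grid]
saturated = [main_effects_row(x, y, interactions=True) for x, y in grid]
cells = [cell_row(x, y) for x, y in grid]

print(len(additive[0]), rank(additive))    # 5 5
print(len(saturated[0]), rank(saturated))  # 9 9
print(len(cells[0]), rank(cells))          # 9 9
```

Counting the constant, coding 1 without interactions uses five columns,
while coding 1 with all four interaction terms and coding 2 both use
nine columns of full rank over the nine cells.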
My questions are:
What are the differences between these two possibilities, what are the
effects of using more dummies in 2)?
How can I reconcile 1) and 2)?
I would deeply appreciate any comments or references concerning the
topic above!
Thanks,
Thomas
begin:vcard
n:Möhlmann;Thomas
tel;fax:0221-470-2305
tel;work:0221-470-2628
x-mozilla-html:FALSE
org:Universität zu Köln;Bankseminar
adr:;;Albertus-Magnus-Platz;50923 Köln;;;
version:2.1
email;internet:[email protected]
fn:Dipl.-Kfm. Thomas Möhlmann
end:vcard