Re: st: stepwise
At 10:19 AM 9/4/2006, [email protected] wrote:
> Hi Richard,
> Do you know how the SPSS pairwise procedure works? I don't think it works
> the way I want it to.
Pairwise is one of the options on the SPSS regression command. It
will compute the correlations using pairwise deletion and then do the
calculations from there. Pairwise isn't too bad if data are missing
at random and the missing data are scattered across cases. However,
nonrandom missing data can be a problem. Pairwise deletion and other commonly
used (and misused) techniques are discussed in my handout at
http://www.nd.edu/~rwilliam/stats2/l12.pdf
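
To make the pairwise idea concrete, here is a minimal Python sketch (not SPSS's actual code). It computes each correlation from whatever cases have both variables present, and compares that with listwise deletion, which keeps only cases complete on every variable. The data values and function names are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def pairwise_corr(a, b):
    """Correlate a and b using every case where BOTH are non-missing.

    Returns (r, n) so you can see how many cases each correlation used.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)

def listwise_rows(columns):
    """Keep only the cases that are complete on ALL variables."""
    return [row for row in zip(*columns) if None not in row]

# Hypothetical data; None marks a missing value.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, None, 4, 5, None, 7]
x3 = [None, 1, 3, 2, 5, 4]

r12, n12 = pairwise_corr(x1, x2)
r13, n13 = pairwise_corr(x1, x3)
r23, n23 = pairwise_corr(x2, x3)
complete = listwise_rows([x1, x2, x3])
```

Note that each pairwise correlation can rest on a different subset of cases (different n), which is part of why nonrandom missing data can cause trouble.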
> Now I'm really curious about why you suggested using a lower cut-off than
> .05. In fact I was going to use 0.15, as suggested in Hosmer and Lemeshow.
Those two may know more than I do! But the concern is that, with
stepwise, vars can enter into the equation just by chance. So,
suppose your final model has X1-X3, and you make a big deal about
your profound discovery of the importance of these vars. But, just
by chance alone, if you have 50 vars and alpha = .05, you would
expect 2 or 3 vars (50 x .05 = 2.5) to enter into the equation even
if none of them had any real effect. The problem is
compounded if you don't tell people you used stepwise and make it
sound like it was your great theory that identified those 3 winners!
The counter-argument, I guess, is that you want to increase the
likelihood that important controls are being included. However, keep
in mind that if you have 50 vars and alpha = .15, then just by chance
alone 7 or 8 could make it in.
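
The arithmetic behind those numbers can be sketched in a few lines of Python. This is a rough back-of-the-envelope calculation, assuming each candidate var gets one independent test and none has a true effect; real stepwise procedures test repeatedly and the vars are usually correlated, so the actual count could differ. The function names are just for illustration.

```python
def expected_chance_entries(n_vars, alpha):
    """Expected number of vars that pass the cut-off by chance alone."""
    return n_vars * alpha

def prob_at_least_one(n_vars, alpha):
    """Chance that at least one pure-noise var passes (independent tests)."""
    return 1 - (1 - alpha) ** n_vars

for alpha in (0.05, 0.15):
    expected = expected_chance_entries(50, alpha)
    p_any = prob_at_least_one(50, alpha)
    print(f"alpha = {alpha}: expect {expected:.1f} chance entries, "
          f"P(at least one) = {p_any:.3f}")
```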
My brief handout on stepwise is at
http://www.nd.edu/~rwilliam/stats1/x95.pdf
One other followup on what I said before:
I don't know about other fields, but in the Social Sciences it is
quite common to ask several questions that all tap the same
underlying attitude, e.g. there might be 6 questions that measure
self-efficacy, another 6 questions that tap political liberalism,
etc. If you try to include all these variables in a regression you
have a problem because you're basically including the same variable
operationalized in 6 different ways. But, choosing only one of the 6
can be a problem too, since it isn't obvious which question is the
"best." So typically, you would use factor analysis or some other
scale construction technique; and besides having fewer vars, if done
right the resulting scale should be more reliable than the individual
measures were.
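
As a rough illustration of why a scale can be more reliable than its parts, here is the Spearman-Brown formula in Python. It assumes the k items are parallel measures with the same average inter-item correlation r, which is an idealization; the r = 0.3 below is a made-up value, not something from any particular data set.

```python
def scale_reliability(k, r):
    """Spearman-Brown: reliability of a k-item scale, given an average
    inter-item correlation r (assumes parallel items)."""
    return k * r / (1 + (k - 1) * r)

single_item = scale_reliability(1, 0.3)   # one item on its own
six_items = scale_reliability(6, 0.3)     # 6 items combined into a scale
print(f"1 item: {single_item:.2f}, 6-item scale: {six_items:.2f}")
```

Under these assumptions, modestly correlated items combine into a scale that is considerably more reliable than any one of them.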
I don't know about this particular data set, but I'd be a little
surprised if X1-X50 measured 50 unique concepts. Ergo, I'd be
tempted to try scale construction before I'd use stepwise to make the
fine-line distinction between X1 and X2.
For a very brief discussion, see
http://www.nd.edu/~rwilliam/stats2/l25.pdf
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: [email protected]
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc