Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Clyde Schechter" <clyde.schechter@einstein.yu.edu> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: Transforming response scales |
Date | Wed, 17 Mar 2010 20:14:35 -0700 |
This is a statistics/questionnaire design problem.

In a study, one of our measures is a 17-item questionnaire. Each item has a discrete 4-point response scale, with response 1 anchored at "Not at all" and 4 anchored at "A great deal." No descriptors were provided for the intermediate responses.

Unfortunately, the first 70 respondents received an incorrect version of the questionnaire in which the response scale went from 1 to 6, with the same anchors at the extremes and no descriptors for responses 2 through 5. After the error was noted, the questionnaire was fixed, and the correct version has been administered to just under 400 additional participants. The study is ongoing, and we hope to obtain data from an additional 250 or so respondents by the time we're done.

The question is whether we can salvage the data from those first 70 respondents by transforming the 6-point response set onto the 4-point version. There are only 10 monotone (non-decreasing) functions from {1,2,3,4,5,6} onto {1,2,3,4}, so I tried applying each of them to see how the overall distribution of transformed responses would compare to the observed distribution of responses from those who were given the correct version of the questionnaire.
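The count of 10 can be checked by brute force. The following is a minimal sketch in Python, not the code used in the study: the function name `monotone_onto_maps` and its parameters are made up for illustration. It relies on the fact that a non-decreasing map from {1,...,6} onto {1,...,4} is determined by which 3 of the 5 gaps between consecutive source values carry a step up.

```python
from itertools import combinations


def monotone_onto_maps(n_source=6, n_target=4):
    """Return every non-decreasing map from 1..n_source onto 1..n_target.

    Each map is a tuple whose i-th entry is the image of source value i + 1.
    The map is fixed by the n_target - 1 places where the image steps up by
    one, chosen among the n_source - 1 gaps between consecutive source values.
    """
    maps = []
    for cuts in combinations(range(1, n_source), n_target - 1):
        # A cut at c means the image increases by 1 between c and c + 1.
        image = tuple(
            1 + sum(1 for c in cuts if c < x) for x in range(1, n_source + 1)
        )
        maps.append(image)
    return maps


maps = monotone_onto_maps()
for m in maps:
    print(m)
print(len(maps))
```

Each printed tuple lists the images of 1 through 6 in order, so (1, 1, 2, 3, 4, 4) is the mapping 1/2 to 1, 3 to 2, 4 to 3, 5/6 to 4.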
One transformation:

    recode resp6 (1/2=1) (3=2) (4=3) (5/6=4), gen(resp6_4)

produces a qualitatively decent match of response frequencies (counts, with column percentages beneath):

      response |     resp4    resp6_4 |     Total
    -----------+----------------------+----------
             1 |     1,028        787 |     1,815
               |     64.41      67.26 |     65.62
    -----------+----------------------+----------
             2 |       169         86 |       255
               |     10.59       7.35 |      9.22
    -----------+----------------------+----------
             3 |       158         81 |       239
               |      9.90       6.92 |      8.64
    -----------+----------------------+----------
             4 |       241        216 |       457
               |     15.10      18.46 |     16.52
    -----------+----------------------+----------
         Total |     1,596      1,170 |     2,766
               |    100.00     100.00 |    100.00

While these distributions are "statistically significantly" different by an ordinary chi-square test, that test overstates the evidence of a difference because it treats the individual responses as independent, when in fact they are nested within items and respondents.

Since these responses are ordinal in nature, and will later be analyzed by calculating the mean item response for each respondent anyway, I also looked at the difference in mean response. In a mixed model, with item as a fixed effect and a random effect for respondent, the adjusted mean difference in response between the group given the correct 4-point response set and the transformed responses from those given the 6-point response set is only 0.01, 95% CI -0.24 to +0.26. The estimated difference of 0.01 strikes me as a small enough bias to ignore, though if the true bias is near the extremes of that 95% CI, I would be concerned.

I'm operating on the assumption that the group who received the 6-point response set are something like a random subset of the respondents. They were the first 70 enrolled in the study. There is no reason to expect any secular trend in what we are measuring during the course of the study, though I suppose one could think about things like learning curves on the part of study personnel somehow influencing these responses (obtained over the telephone--respondents received the questionnaires in the mail ahead of time).
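As a rough check on the table above, the naive Pearson chi-square statistic and the unadjusted mean response in each column can be recomputed from the published counts alone. This is only a sketch in Python: it treats all 2,766 responses as independent, which is exactly the simplification cautioned against above, and it does not reproduce the mixed model. The variable names are illustrative, and 7.815 is the standard upper 5% point of the chi-square distribution with 3 degrees of freedom.

```python
# Counts copied from the cross-tabulation: rows are responses 1-4,
# columns are (resp4, resp6_4).
counts = [
    (1028, 787),
    (169, 86),
    (158, 81),
    (241, 216),
]

col_totals = [sum(row[j] for row in counts) for j in range(2)]
row_totals = [sum(row) for row in counts]
grand_total = sum(row_totals)

# Pearson chi-square statistic, ignoring the nesting of responses
# within items and respondents.
chi2 = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

df = (len(counts) - 1) * (2 - 1)
CRITICAL_05_DF3 = 7.815  # upper 5% point of chi-square with 3 df

# Unadjusted mean response in each column (response values are 1..4).
means = [
    sum((i + 1) * row[j] for i, row in enumerate(counts)) / col_totals[j]
    for j in range(2)
]
raw_diff = means[1] - means[0]  # resp6_4 minus resp4

print(f"chi2 = {chi2:.2f} on {df} df (5% critical value {CRITICAL_05_DF3})")
print(f"mean resp4 = {means[0]:.3f}, mean resp6_4 = {means[1]:.3f}")
print(f"unadjusted difference = {raw_diff:.3f}")
```

The statistic comes out well above the 5% critical value, while the unadjusted difference in means is under 0.01 in absolute value, which is in line with the adjusted estimate of 0.01 reported above.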
It dawns on me that if I were to discard the data from the first 70, treat them as missing, and (multiply) impute them from other data we are gathering from our participants using, say, a multivariate normal imputation model, nobody would object. It feels to me as if the kind of deterministic transformation I'm looking at in this message would be a more valid kind of imputation, based as it is on the response to an identical question with a related response set. But it is, of course, ad hoc, and I do not have the theoretical knowledge to figure out what statistical properties inferences based on it would have.

I am wondering whether any Statalisters have a reaction to this idea.

Thanks for your consideration.

Clyde Schechter, MA MD
Associate Professor of Family & Social Medicine
Albert Einstein College of Medicine, Bronx, NY, USA

Please note new e-mail address: clyde.schechter@einstein.yu.edu

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/