Irina Campbell asked how to get a patient dataset with a variable for multiple
questions into wide format, with one record per patient. Some of the patients
did not answer all questions, so no observation exists for those
patient-question combinations. Other patients gave more than one answer to
some questions, so there are multiple observations for those combinations.

I suggest generating an identifier for the answers, so that Stata can
distinguish multiple answers during the -reshape-. If the question variable
is a string variable, use the -string- option in the -reshape- command. Also,
-reshape- can take more than one variable in the -i()- argument in order to
uniquely identify a patient-question combination, so a single unique
identifier doesn't need to exist in the dataset. In addition, Stata will fill
in missing values in order to create a rectangular dataset when records for
some patient-question combinations do not exist in the original long format.
I've illustrated below; note that the suggested solution is only four
commands long. Most of the do-file generates a dataset that I believe is
similar in format to what Irina has, and I've assumed that both the question
(var4) and answer (var5) variables are string, although it doesn't really
matter for the latter variable.
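
To make the two steps easier to follow, here is a minimal sketch on a toy dataset. The four records and their values are made up for illustration; only the variable names (pid, var4, var5, res) follow the do-file. Patient 1 answers question A twice and skips question B, and patient 2 answers each question once.

```stata
* Toy long-format data (hypothetical values)
clear
input int pid str1 var4 str5 var5
1 "A" "Yes"
1 "A" "No"
2 "A" "Maybe"
2 "B" "Yes"
end

* Number the answers within each patient-question combination
bysort pid var4: generate byte res = _n

* Step 1: one record per patient-question combination (var51, var52)
reshape wide var5, i(pid var4) j(res)

* Step 2: one record per patient (var51A, var52A, var51B, var52B)
reshape wide var51 var52, i(pid) j(var4) string

list, noobs
```

The result should have two records. Patient 1's question-B variables are empty strings because no such record existed in the long data, and patient 2's second-answer variables are empty because only one answer was given. Which of patient 1's two answers to A ends up numbered first is not guaranteed, because -bysort- does not fix the order of records within a pid-var4 group.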
Joseph Coveney
--------------------------------------------------------------------------------
* Build a practice dataset: 242 patients by 26 questions (A-Z)
clear
local obs = 242 * 26
set obs `obs'
set seed 20030915
generate byte que = mod(_n, 26)
generate str1 var4 = char(65 + que)
sort var4
generate int pid = mod(_n, 242) + 1
* Up to three answers per patient-question combination
forvalues ans = 1/3 {
    generate byte ans`ans' = int(uniform() * 3) + 1
}
reshape long ans, i(pid que) j(a)
label define Answers 1 Yes 2 No 3 Maybe
label values ans Answers
decode ans, generate(var5)
* Randomly keep about 11,700 records, so that some combinations have
* no answer and others have more than one
local obs = `obs' * 3
drop if uniform() > 11700 / `obs'
drop que a ans
sort pid
* Three patient-level variables, constant within patient
forvalues i = 1/3 {
    generate float var`i' = .
    bys pid: replace var`i' = uniform() if _n == 1
    by pid: replace var`i' = var`i'[1]
}
*
* Begin suggested solution here
*
generate byte res = .
bysort pid var4: replace res = _n
reshape wide var5, i(pid var4) j(res)
reshape wide var51 var52 var53, i(pid) j(var4) string
*
* End suggested solution here
*
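* Optional check: -isid- exits with an error unless pid uniquely
* identifies the records, that is, unless there is one record per patient
isid pid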
* -slist- is a user-written command (see -findit slist-);
* the built-in -list in 1/2- also works
slist in 1/2, decimal(2)
exit
--------------------------------------------------------------------------------