Short question:
I would like to be able to -collapse- some data concatenating multiple
values of a string variable, rather than taking some function of a
numeric variable. Is this at all possible?
Longer explanation of why:
I am using a biochemical dataset where each individual has had a blood
sample analysed for multiple biochemical variables. Each individual
may has multiple records, each identified by a different labnumber
(labno), as different variables are measured at different times. So a
typical record might be (where x represents a non-missing and . a
missing value):
individual_id labno potassium sodium hdl cholesterol
00001 78882 x x . .
00001 72010 . . x x
I want to collapse the data so I have a single record:
individual_id potassium sodium hdl cholesterol
00001 x x x x
This is fine with "typical" records, as above. But sometimes the same
test may have been performed twice (either (a) as a check that the lab
was working ok, or (b) because something seemed wrong about one
analysis), so there'll be a record like:
individual_id labno potassium sodium hdl cholesterol
00002 78882 x x . .
00002 72010 . . x x
00002 82901 . . y y
where the hdl and cholesterol levels differ. I want to list all such
records to distinguish between cases (a) and (b). I stored the names
of all (25) biochemical variables in the local macro allvars and did:
log using biochem-multiples, replace
qui {
foreach v of local allvars {
preserve
keep individual_id `v'
collapse (min) mn=`v' (max) mx=`v' , by(individual_id)
noi di "! `v'"
noi l ind mn mx if mn!=mx
restore
}
}
log close
view biochem-multiples.smcl
which lists all the individuals with multiple non-null, differing
records for any covar. But I would like to also see the labno's for
these records. I can manually scroll through the log file typing -di
if ...-, but would like to automate it. I guess what I would love is
to be able to replace the collapse like above with
keep individual_id labno `v'
collapse (min) mn=`v' (max) mx=`v' (concatenate, punct("/")) labno=labno, by(individual_id)
so that, for individual 00002 I would have in the logfile
00002 x y 78882/72010/82901
Even better would be
00002 x y 72010/82901
(listing the labno's only for the observations that are problems).
But I have no idea how to do this. Can anyone help - perhaps with an
alternative route altogether?
Many thanks, Chris.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/