> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Chris Wallace
> Sent: Friday, December 10, 2004 5:47 AM
> To: [email protected]
> Subject: st: multiple observations - collapsing a string variable?
>
> Short question:
>
> I would like to be able to -collapse- some data concatenating multiple
> values of a string variable, rather than taking some function of a
> numeric variable. Is this at all possible?
>
>
> Longer explanation of why:
>
> I am using a biochemical dataset where each individual has had a blood
> sample analysed for multiple biochemical variables. Each individual
> may has multiple records, each identified by a different labnumber
> (labno), as different variables are measured at different times. So a
> typical record might be (where x represents a non-missing and . a
> missing value):
>
> individual_id labno potassium sodium hdl cholesterol
> 00001 78882 x x . .
> 00001 72010 . . x x
>
> I want to collapse the data so I have a single record:
>
> individual_id potassium sodium hdl cholesterol
> 00001 x x x x
>
> This is fine with "typical" records, as above. But sometimes the same
> test may have been performed twice (either (a) as a check that the lab
> was working ok, or (b) because something seemed wrong about one
> analysis), so there'll be a record like:
>
> individual_id labno potassium sodium hdl cholesterol
> 00002 78882 x x . .
> 00002 72010 . . x x
> 00002 82901 . . y y
>
> where the hdl and cholesterol levels differ. I want to list all such
> records to distinguish between cases (a) and (b). I stored the names
> of all (25) biochemical variables in the local macro allvars and did:
>
>
> log using biochem-multiples, replace
> qui {
> foreach v of local allvars {
> preserve
> keep individual_id `v'
> collapse (min) mn=`v' (max) mx=`v' , by(individual_id)
> noi di "! `v'"
> noi l ind mn mx if mn!=mx
> restore
> }
> }
> log close
> view biochem-multiples.smcl
>
> which lists all the individuals with multiple non-null, differing
> records for any covar. But I would like to also see the labno's for
> these records. I can manually scroll through the log file typing -di
> if ...-, but would like to automate it. I guess what I would love is
> to be able to replace the collapse like above with
>
> keep individual_id labno `v'
> collapse (min) mn=`v' (max) mx=`v' (concatenate, punct("/")) labno=labno,
> by(individual_id)
>
> so that, for individual 00002 I would have in the logfile
>
> 00002 x y 78882/72010/82901
>
> Even better would be
>
> 00002 x y 72010/82901
>
> (listing the labno's only for the observations that are problems).
> But I have no idea how to do this. Can anyone help - perhaps with an
> alternative route altogether?
>
Chris,
How about something like this:
. l, sepby(id) noobs
+-------------------------------------+
| id labno pot sod hdl cho |
|-------------------------------------|
| 001 78 9 9 . . |
| 001 72 . . 5 6 |
|-------------------------------------|
| 002 78 7 7 . . |
| 002 72 . . 5 5 |
| 002 82 . . 4 4 |
|-------------------------------------|
| 003 78 12 12 . . |
| 003 72 . . 6 6 |
| 003 82 . . 7 7 |
+-------------------------------------+
. egen tag = count(hdl), by(id)
. keep if tag == 2 & hdl !=.
(4 observations deleted)
. l , sepby(id) noobs
+-------------------------------------------+
| id labno pot sod hdl cho tag |
|-------------------------------------------|
| 002 72 . . 5 5 2 |
| 002 82 . . 4 4 2 |
|-------------------------------------------|
| 003 72 . . 6 6 2 |
| 003 82 . . 7 7 2 |
+-------------------------------------------+
. bysort id: gen lab2 = labno[_n-1]
(2 missing values generated)
. collapse (max) labno-lab2, by(id)
. order id labno lab2
. l, sepby(id) noobs
+--------------------------------------------------+
| id labno lab2 pot sod hdl cho tag |
|--------------------------------------------------|
| 002 82 72 . . 5 5 2 |
|--------------------------------------------------|
| 003 82 72 . . 7 7 2 |
+--------------------------------------------------+
Scott
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/