[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: multiple observations - collapsing a string variable?

From	"Scott Merryman" <[email protected]>
To	<[email protected]>
Subject	st: RE: multiple observations - collapsing a string variable?
Date	Fri, 10 Dec 2004 06:55:07 -0600

> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Chris Wallace
> Sent: Friday, December 10, 2004 5:47 AM
> To: [email protected]
> Subject: st: multiple observations - collapsing a string variable?
> 
> Short question:
> 
> I would like to be able to -collapse- some data concatenating multiple
> values of a string variable, rather than taking some function of a
> numeric variable.  Is this at all possible?
> 
> 
> Longer explanation of why:
> 
> I am using a biochemical dataset where each individual has had a blood
> sample analysed for multiple biochemical variables.  Each individual
> may has multiple records, each identified by a different labnumber
> (labno), as different variables are measured at different times.  So a
> typical record might be (where x represents a non-missing and . a
> missing value):
> 
> individual_id labno  potassium  sodium  hdl cholesterol
> 00001         78882  x          x       .   .
> 00001         72010  .          .       x   x
> 
> I want to collapse the data so I have a single record:
> 
> individual_id potassium  sodium  hdl cholesterol
> 00001         x          x       x   x
> 
> This is fine with "typical" records, as above.  But sometimes the same
> test may have been performed twice (either (a) as a check that the lab
> was working ok, or (b) because something seemed wrong about one
> analysis), so there'll be a record like:
> 
> individual_id labno  potassium  sodium  hdl cholesterol
> 00002         78882  x          x       .   .
> 00002         72010  .          .       x   x
> 00002         82901  .          .       y   y
> 
> where the hdl and cholesterol levels differ.  I want to list all such
> records to distinguish between cases (a) and (b).  I stored the names
> of all (25) biochemical variables in the local macro allvars and did:
> 
> 
> log using biochem-multiples, replace
> qui {
>   foreach v of local allvars {
>   preserve
>   keep individual_id `v'
>   collapse (min) mn=`v' (max) mx=`v' , by(individual_id)
>   noi di "!     `v'"
>   noi l ind mn mx if mn!=mx
>   restore
> }
> }
> log close
> view biochem-multiples.smcl
> 
> which lists all the individuals with multiple non-null, differing
> records for any covar.  But I would like to also see the labno's for
> these records.  I can manually scroll through the log file typing -di
> if ...-, but would like to automate it.  I guess what I would love is
> to be able to replace the collapse like above with
> 
> keep individual_id labno `v'
> collapse (min) mn=`v' (max) mx=`v' (concatenate, punct("/")) labno=labno,
> by(individual_id)
> 
> so that, for individual 00002 I would have in the logfile
> 
> 00002 x y 78882/72010/82901
> 
> Even better would be
> 
> 00002 x y 72010/82901
> 
> (listing the labno's only for the observations that are problems).
> But I have no idea how to do this.  Can anyone help - perhaps with an
> alternative route altogether?
> 


Chris,

How about something like this:


. l, sepby(id) noobs

  +-------------------------------------+
  |  id   labno   pot   sod   hdl   cho |
  |-------------------------------------|
  | 001      78     9     9     .     . |
  | 001      72     .     .     5     6 |
  |-------------------------------------|
  | 002      78     7     7     .     . |
  | 002      72     .     .     5     5 |
  | 002      82     .     .     4     4 |
  |-------------------------------------|
  | 003      78    12    12     .     . |
  | 003      72     .     .     6     6 |
  | 003      82     .     .     7     7 |
  +-------------------------------------+

. egen tag = count(hdl), by(id)

. keep if tag == 2 & hdl !=.
(4 observations deleted)

. l , sepby(id) noobs

  +-------------------------------------------+
  |  id   labno   pot   sod   hdl   cho   tag |
  |-------------------------------------------|
  | 002      72     .     .     5     5     2 |
  | 002      82     .     .     4     4     2 |
  |-------------------------------------------|
  | 003      72     .     .     6     6     2 |
  | 003      82     .     .     7     7     2 |
  +-------------------------------------------+

. bysort id: gen lab2 = labno[_n-1]
(2 missing values generated)

. collapse (max) labno-lab2, by(id)

. order id labno lab2

. l, sepby(id) noobs

  +--------------------------------------------------+
  |  id   labno   lab2   pot   sod   hdl   cho   tag |
  |--------------------------------------------------|
  | 002      82     72     .     .     5     5     2 |
  |--------------------------------------------------|
  | 003      82     72     .     .     7     7     2 |
  +--------------------------------------------------+


Scott



*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: RE: multiple observations - collapsing a string variable?
  - From: Chris Wallace <[email protected]>

References:
- st: multiple observations - collapsing a string variable?
  - From: Chris Wallace <[email protected]>

Prev by Date: st: testing additional endogeneity in 2-stage probit least squares
Next by Date: st: Convert contents of string variables to lower case
Previous by thread: st: multiple observations - collapsing a string variable?
Next by thread: Re: st: RE: multiple observations - collapsing a string variable?
Index(es):
- Date
- Thread