I have produced something similar before, and with a few tweaks, here
it is. The following code strips digits and decimal points from the
start of the lab variable (storing them in num) until it reaches a
character that is neither a digit nor a decimal point; whatever remains
is stored in unit. Both num and unit are string variables.
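* num collects the leading digits/decimal points; unit starts as a copy of lab
gen num=""
gen unit=lab
tempvar c l
gen `c'=""
gen `l'=length(lab)
* the longest string sets the number of passes needed
su `l', meanonly
local maxl=r(max)
forvalues i=1/`maxl' {
    * move the first character of unit to num while it is a digit or "."
    replace `c'=substr(unit,1,1)
    replace num=num+`c' if strpos("0123456789.",`c')
    replace unit=substr(unit,2,.) if strpos("0123456789.",`c')
}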
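For comparison, the same split can probably be done in one pass with
regular expressions. The sketch below is only an illustration and assumes
a Stata release that provides regexm() and regexs(). The test values and
the names num2, unit2 and value are made up for this example, and trim()
is added only to drop a space between the number and the unit, which the
loop above leaves in place.

```stata
* Hypothetical test data; these values are invented for illustration
clear
input str20 lab
"5.2mg/dl"
"140 mmol/L"
"0.9(MG/DL)"
"12 10^3/uL"
end

* Group 1: leading run of digits and decimal points; group 2: the rest
gen num2  = regexs(1) if regexm(lab, "^([0-9.]+)(.*)$")
gen unit2 = trim(regexs(2)) if regexm(lab, "^([0-9.]+)(.*)$")

* Numeric copy of the result; real() gives missing if num2 is not a number
gen double value = real(num2)
list
```

Observations that do not start with a digit or a decimal point are left
empty in num2 and unit2, so they are easy to pick out and check by hand.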
David
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andy Choi
Sent: 08 March 2006 21:51
To: [email protected]
Subject: st: string function
I have a large file with patients' labs. The lab variable contains the
lab result (a number) with the units (for example mg/dl) included.
There are many variations in the way the units are reported: they may be
capitalized or in parentheses, or the units themselves may contain
numbers.
I would like to create a variable that includes only the lab result
and, if possible, a separate variable with the units.
Thanks,
Andy