re: st: assigning values from a list
From: Kit Baum <[email protected]>
To: [email protected]
Subject: re: st: assigning values from a list
Date: Sat, 22 May 2010 14:06:51 -0400
Marietherese said:
Sorry I should clarify what var1 and var4 are:
When a patient presents to the clinic, they can be diagnosed with at least one
but up to 4 diseases, entered into var1-4 as codes from the Diagnostic and
Statistical Manual version 9.0. Var1 will always have a value but the remaining
var2-4 may or may not be missing, depending on the diagnosis. So if I want to
find patients suffering from a group of viral illnesses, I would have to search
all 4 variables for the codes 53.20, 54.43 etc up to 76.90.
I need to search all of var1-4 for every one of those codes to make sure I don't miss
any cases. But there are many cases where at least one of var2-4 is missing.
In that case it might be better to step through the variables one by
one.
I also tried doing this using a
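local varlist var1 var2 var3 var4
gen virus=.
// "of local" expands the macro; "in varlist" would loop over the literal word varlist
foreach var of local varlist {
    // no "if virus != ." guard: as a programming command it tests only the first
    // observation, where virus is still missing, so the replace would never run
    replace virus=1 if ((`var'==53.20) |(`var'==54.42) |(`var'==54.43) | ///
        (`var'==76.00) |(`var'==76.90))
}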
command, but I wasn't aware of the command inlist when I wrote that.
Similarly if I want to find other groups of diseases (eg fungal infections) I
need to search var1-4 for a different list of DSM codes.
There are about 20 groups of diseases that I need to identify. To complicate
things patients can have multiple diagnoses so I need to make a judgement call
about which one is more serious - var1 takes precedence.
Here is a solution that will look for matches, and return a set of indicator variables where matches are found:
------------------------
clear all
set obs 100000
forv i=1/4 {
    // make up fake data with some missing values
    g var`i' = int(10000*runiform())
    replace var`i' = cond(var`i'< 500, ., var`i')
    // create a set of mvar1,2,3,4 variables, set to missing
    g mvar`i' = .
    // create lists of var1..var4 and mvar1..mvar4
    loc vl "`vl' var`i'"
    loc rl "`rl' mvar`i'"
}
// put the diagnoses to be matched in the matrix match
// number of elements does not matter
mat match = (5320, 5442, 5443, 7600, 7690)
mata:
void matchlist(string scalar varlist, string scalar retlist)
{
    st_view(X=., ., varlist)
    st_view(Z=., ., retlist)
    match = st_matrix("match")
    for(j=1; j<=cols(X); j++) {
        // start each indicator at 0 so that matches accumulate across codes
        Z[.,j] = J(rows(X), 1, 0)
        for(k=1; k<=cols(match); k++) {
            // OR in each code; a plain assignment would keep only the last code
            Z[.,j] = Z[.,j] :| (X[.,j] :== match[k])
        }
    }
}
end
mata: matchlist("`vl'", "`rl'")
mat l match
egen anydiag = rowtotal(mvar*)
l var1 mvar1 mvar2 mvar3 mvar4 if anydiag>0, sep(0)
----------------------------
To use it, you merely need to create four variables mvar1..mvar4 (with all missing values) and place the diagnoses in a Stata matrix. You could look for different diagnoses by just placing different contents in that matrix. Any number of elements will work.
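Note that the matrix holds the codes as integers (5320 rather than 53.20). If the real data store the codes as decimals, as in the question, one option is to rescale them before matching, because a float variable holding 53.2 does not compare equal to the literal 53.2 in Stata. The following is only a sketch under the assumption that codes have at most two decimal places; the variable names icode1-icode4 and the three example records are made up for illustration:

```stata
clear
input float(var1 var2 var3 var4)
53.20 .     .     .
12.10 54.43 .     .
76.90 .     .     .
end

forvalues i = 1/4 {
    // round(100*x) maps 53.20 to 5320; missing values stay missing
    generate long icode`i' = round(100 * var`i')
}

// compare the two ways of testing for code 53.20
count if var1 == 53.20
count if icode1 == 5320
list var1 icode1 var2 icode2, sep(0)
```

The integer copies can then be passed to the matching code in place of var1-var4. Wrapping the literal in float(), as in var1 == float(53.20), would be another way to make the decimal comparison behave.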
The variable anydiag flags whether a patient has any of the diagnoses, and variables mvar1-mvar4 indicate which. In my fake data there are not any patients who have more than one diagnosis, but the code should handle that gracefully.
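For comparison, the same kind of flag can be sketched in plain Stata with inlist(), the function mentioned in the question. This is not the Mata approach above, just an alternative under the same assumption that codes are stored as integers; the names m1-m4 and nvirus and the four example records are hypothetical:

```stata
clear
input long(var1 var2 var3 var4)
5320 .    .    .
1210 5443 .    .
1210 .    .    .
7600 1210 7690 .
end

generate byte virus  = 0
generate byte nvirus = 0
forvalues i = 1/4 {
    // inlist() returns 0 for a missing var`i' because no listed code is missing
    generate byte m`i' = inlist(var`i', 5320, 5442, 5443, 7600, 7690)
    replace virus  = 1 if m`i'
    replace nvirus = nvirus + m`i'
}
list, sep(0)
```

Here virus plays the role of anydiag > 0, m1-m4 correspond to mvar1-mvar4, and nvirus counts how many of the four slots matched, so a patient with two listed codes is still flagged once. A different disease group would need its own inlist() code list.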
Kit Baum | Boston College Economics & DIW Berlin | http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming | http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata | http://www.stata-press.com/books/imeus.html