Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: RE: RE: RE: looping to value of a variable
From
Richard Fox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
st: RE: RE: RE: RE: looping to value of a variable
Date
Thu, 23 Feb 2012 13:41:21 +0000
Hi Nick,
Sorry for the confusion.
I want a single flag that tells me whether each record (row) has a missing value in the DFU variables. This would be simple were it not for the fact that for certain rows I only want to assess a subset of the variables. As in the example data, I only want to check DFU1-DFU(maxFU) for missing values.
If I could use the value of maxFU as the upper bound, as in DFU1-DFU(maxFU), then I could simply use
egen nmiss = rowmiss(DFU1-DFU(maxFU))
but I don't believe that's possible.
If I use egen nmiss = rowmiss(DFU1-DFU8), then for the first row I get 6 (the missing values counted across all eight variables), whereas I want a flag of 1. For id 3 I'd expect flag == 0.
If I don't loop over records, I believe Stata will set the flag to 1 in all rows as soon as it finds any missing value.
On further thought, this could be done with a simple formula. Nonetheless, I'm still interested in how to loop up to the value of a variable. I see that Mata may be a solution and will explore it in more detail. This is easily done in SAS, but I appreciate that Stata thinks in the opposite direction.
Not sure if it helps, but I'm cleaning data for an oncology study. For id (patient) 1 there should be 3 follow-up (FU) forms, each with a date of completion, DFU (date of follow-up).
id DFU1 DFU2 DFU3 DFU4 DFU5 DFU6 DFU7 DFU8 maxFU
1 30/10/1910 08/02/1904 3
2 16/12/1908 24/01/1913 08/02/1904 4
3 04/09/1907 13/10/1911 21/11/1915 30/12/1919 07/02/1924 17/03/1928 25/04/1932 7
4 18/10/1914 08/02/1904 18/03/1908 26/04/1912 04/06/1916 13/07/1920 21/08/1924 8
I managed to get my code working; perhaps it illustrates what I'm trying to do:
/* identify rows with missing dates */
gen flag = 0
local N = _N
forvalues i = 1/`N' {
    /* sp holds the maximum number of follow-up visits for this patient (row) */
    local sp = maxFU[`i']
    forvalues j = 1/`sp' {
        /* missing() also catches extended missing values .a-.z */
        quietly replace flag = 1 if missing(DFU`j') in `i'
    }
}
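Nick's suggestion to loop over variables rather than observations can be sketched as follows. This is a minimal sketch, assuming eight numeric date variables DFU1-DFU8 plus maxFU as in the example; the input values are hypothetical (dates stored as plain numbers, positions of the gaps invented for illustration). Each -replace- acts on every row at once, and the condition j <= maxFU restricts the check to DFU1-DFU(maxFU) row by row.

```stata
* Hypothetical data: dates as plain numbers, gaps invented for illustration
clear
input id DFU1 DFU2 DFU3 DFU4 DFU5 DFU6 DFU7 DFU8 maxFU
1 100 . 200 . . . . . 3
2 100 200 300 . . . . . 4
3 100 200 300 400 500 600 700 . 7
4 100 200 . 400 . 600 700 800 8
end

* Loop over the 8 variables, not over the observations:
* a date only counts as missing if its position is within maxFU
gen byte flag = 0
forvalues j = 1/8 {
    quietly replace flag = 1 if missing(DFU`j') & `j' <= maxFU
}
list id maxFU flag
```

Because Stata treats missing as larger than any number, a row whose maxFU is itself missing satisfies j <= maxFU for every j, so all eight variables are checked for that row.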
Thanks once again.
Best Regards
Richard Fox
Biostatistician - CRCTU
Ext: 43410
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: 23 February 2012 10:25
To: '[email protected]'
Subject: st: RE: RE: RE: looping to value of a variable
Sorry, but I am still unclear on what flags you want.
The fact that -maxFU- exists seems to be a red herring. You can create flags by
forval j = 1/8 {
    gen ismissing`j' = missing(dfu`j')
}
Or, if you want it the other way round, negate the function call with -!missing()-.
But why do you need the flags at all?
Even if I am misunderstanding you, which is quite likely, the small bit of Stata technique may be some help.
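If maxFU does matter, the per-variable indicators can take it into account and then be collapsed into one flag per row with -egen, rowmax()-. This is a sketch, assuming the lowercase names dfu1-dfu8 from Richard's message; the data are hypothetical and the ismissing* names are for illustration only.

```stata
* Hypothetical data using lowercase names dfu1-dfu8
clear
input id dfu1 dfu2 dfu3 dfu4 dfu5 dfu6 dfu7 dfu8 maxFU
1 100 . 200 . . . . . 3
2 100 200 300 . . . . . 4
3 100 200 300 400 500 600 700 . 7
4 100 200 . 400 . 600 700 800 8
end

* One 0/1 indicator per variable; positions beyond maxFU are forced to 0
forvalues j = 1/8 {
    gen byte ismissing`j' = missing(dfu`j') & `j' <= maxFU
}

* The row maximum of the indicators is 1 if any relevant date is missing
egen byte flag = rowmax(ismissing1-ismissing8)
list id maxFU flag
```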
Nick
[email protected]
Richard Fox
Hi Nick,
Yes, you're correct; sorry for the confusion over DFU and FU. I added the egen function to illustrate where the loop count values could come from. In fact, the values came from reshaping long data.
I want to flag missing dates; however, for each record I need to check only up to a certain point. These are missing follow-up forms in a medical setting: if a patient has only been followed for a certain time, I can't record a form as missing when the patient has not yet reached that time-point.
Take the example below: for the first id I only want to loop to 3 to test for missing values, for the second id only to 4, and so on. I suppose I could increment a counter only if `i' <= maxFU. Note that the code within the loops (replace flag ...) in my previous message was incomplete; it was really only the form of the loop statements that I was interested in.
id dfu1 dfu2 dfu3 dfu4 dfu5 dfu6 dfu7 dfu8 maxFU
1 30/10/1910 08/02/1904 3
2 16/12/1908 24/01/1913 08/02/1904 4
3 04/09/1907 13/10/1911 21/11/1915 30/12/1919 07/02/1924 17/03/1928 25/04/1932 7
4 18/10/1914 08/02/1904 18/03/1908 26/04/1912 04/06/1916 13/07/1920 21/08/1924 8
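The counter idea can be written without any loop over observations. In Stata a logical expression evaluates to 1 (true) or 0 (false), so adding (missing(dfu`j') & `j' <= maxFU) to a running total counts the missing dates among dfu1-dfu(maxFU). A sketch under the same assumptions as before: lowercase variable names and hypothetical data; nmiss is an illustrative name.

```stata
* Hypothetical data using lowercase names dfu1-dfu8
clear
input id dfu1 dfu2 dfu3 dfu4 dfu5 dfu6 dfu7 dfu8 maxFU
1 100 . 200 . . . . . 3
2 100 200 300 . . . . . 4
3 100 200 300 400 500 600 700 . 7
4 100 200 . 400 . 600 700 800 8
end

* Running count of missing dates within each row's first maxFU variables
gen byte nmiss = 0
forvalues j = 1/8 {
    quietly replace nmiss = nmiss + (missing(dfu`j') & `j' <= maxFU)
}

* A single flag follows directly from the count
gen byte flag = nmiss > 0
list id maxFU nmiss flag
```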
I'll have a look at the reference.
Nick Cox
Your example is not very clear. You have FU* and by implication DFU*. Do you want to flag missings or non-missings? I can read your post either way.
However, you (almost surely) do not need to loop over observations. It is sufficient to loop over variables.
See a review in this territory
SJ-9-1 pr0046. Speaking Stata: Rowwise. N. J. Cox. Stata Journal 9(1): 137-157 (Q1/09).
(help rowsort, rowranks if installed)
Shows how to exploit functions, egen functions, and Mata for working rowwise; rowsort and rowranks are introduced.
Nick
[email protected]
Richard Fox
I want to loop up to the value of a variable. Say I have generated the number of non-missing values in each row of data (maxFU in the example below). I want to loop up to that value, which can differ between records.
The following does the job but feels like cheating.
egen maxFU = rownonmissing(FU1 FU2 FU3 FU4 FU5)
forvalues j = 1/5 {
    gen flag`j' = 0
}
count
local N = r(N)
forvalues i = 1/`N' {
    local sp = maxFU[`i']
    forvalues j = 1/`sp' {
        qui replace flag`j' = 1 if DFU`j' == . in `i'
    }
}
There must be a simpler way; any ideas?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*