
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: RE: RE: RE: RE: looping to value of a variable


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: RE: RE: RE: RE: looping to value of a variable
Date   Thu, 23 Feb 2012 17:58:17 +0000

I see, I think. 

gen flag = 0 

forval j = 1/8 { 
	* check DFU`j' only for the first maxFU follow-ups of each row
	replace flag = 1 if missing(DFU`j') & flag == 0 & `j' <= maxFU
}
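A self-contained sketch for checking the loop above against the example data in this thread. The dates are typed in as strings and converted with -date()-; the string variables s1-s8 are a device of this sketch only, not part of the original data. The expected result is flag = 1 for ids 1, 2 and 4, and flag = 0 for id 3.

```stata
clear
* s1-s8: temporary string copies of the dates (device of this sketch)
input id str10 s1 str10 s2 str10 s3 str10 s4 str10 s5 str10 s6 str10 s7 str10 s8 maxFU
1 "30/10/1910" "" "08/02/1904" "" "" "" "" "" 3
2 "16/12/1908" "24/01/1913" "" "08/02/1904" "" "" "" "" 4
3 "04/09/1907" "13/10/1911" "21/11/1915" "30/12/1919" "07/02/1924" "17/03/1928" "25/04/1932" "" 7
4 "18/10/1914" "" "08/02/1904" "18/03/1908" "26/04/1912" "04/06/1916" "13/07/1920" "21/08/1924" 8
end
forvalues j = 1/8 {
    generate DFU`j' = date(s`j', "DMY")   // "" becomes a missing date
    format DFU`j' %td
}
drop s1-s8

* one flag per row: any missing date among DFU1 ... DFU`maxFU'
generate flag = 0
forvalues j = 1/8 {
    replace flag = 1 if missing(DFU`j') & `j' <= maxFU
}
list id maxFU flag
```

Note that the loop is over variables, not observations: each -replace ... if- is applied to all rows at once, and the condition `j' <= maxFU does the per-row limiting.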
 
Nick 
[email protected] 

Richard Fox

Sorry for the confusion.

I want just one flag that tells me if each record (row) has a missing value for the DFU variables. This would be simple were it not for the fact that for certain rows I only want to assess a subset of the variables for missing values. As in the example data below, I only want to check DFU1 to DFU(maxFU) for missingness.

If I could use the value of maxFU inside a varlist, as in DFU1-DFU(maxFU), then I could simply use 

egen nmiss = rowmiss(DFU1-DFU(maxFU)) 

but I don't believe that's possible.

If I use egen nmiss = rowmiss(DFU1-DFU8), then for the first row I get 6, whereas I want just 1. For id 3 I'd expect flag == 0. 

If I don't loop over records, I believe Stata will set flag to 1 in every row as soon as it finds any missing value.
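In Stata, -replace- with an -if- qualifier changes only the observations for which the condition is true, so a count restricted to DFU1 to DFU(maxFU) can be built by looping over variables and adding to a counter only while `j' <= maxFU. A sketch on the example data; the variable name nmiss and the string helpers s1-s8 are made up for illustration.

```stata
clear
* s1-s8: temporary string copies of the dates (device of this sketch)
input id str10 s1 str10 s2 str10 s3 str10 s4 str10 s5 str10 s6 str10 s7 str10 s8 maxFU
1 "30/10/1910" "" "08/02/1904" "" "" "" "" "" 3
2 "16/12/1908" "24/01/1913" "" "08/02/1904" "" "" "" "" 4
3 "04/09/1907" "13/10/1911" "21/11/1915" "30/12/1919" "07/02/1924" "17/03/1928" "25/04/1932" "" 7
4 "18/10/1914" "" "08/02/1904" "18/03/1908" "26/04/1912" "04/06/1916" "13/07/1920" "21/08/1924" 8
end
forvalues j = 1/8 {
    generate DFU`j' = date(s`j', "DMY")
    format DFU`j' %td
}
drop s1-s8

* like rowmiss(), but counting only DFU1 ... DFU`maxFU' in each row
generate nmiss = 0
forvalues j = 1/8 {
    replace nmiss = nmiss + missing(DFU`j') if `j' <= maxFU
}
list id maxFU nmiss
```

On this data nmiss is 1 for ids 1, 2 and 4 and 0 for id 3, and a single flag is then just nmiss > 0.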

On further thought, this could be done with a simple formula. Nonetheless, I'm still interested to see how to loop to the value of a variable. I see that Mata may be a solution and will explore this in more detail. This is easy to do in SAS, but I appreciate that Stata thinks in the opposite direction.
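For the Mata route, here is one sketch that really does loop over rows, with the inner loop stopping at each row's maxFU. It is an illustration of that idea, not necessarily the best way; flag_m is a made-up name, and the string helpers s1-s8 are a device of this sketch.

```stata
clear
* s1-s8: temporary string copies of the dates (device of this sketch)
input id str10 s1 str10 s2 str10 s3 str10 s4 str10 s5 str10 s6 str10 s7 str10 s8 maxFU
1 "30/10/1910" "" "08/02/1904" "" "" "" "" "" 3
2 "16/12/1908" "24/01/1913" "" "08/02/1904" "" "" "" "" 4
3 "04/09/1907" "13/10/1911" "21/11/1915" "30/12/1919" "07/02/1924" "17/03/1928" "25/04/1932" "" 7
4 "18/10/1914" "" "08/02/1904" "18/03/1908" "26/04/1912" "04/06/1916" "13/07/1920" "21/08/1924" 8
end
forvalues j = 1/8 {
    generate DFU`j' = date(s`j', "DMY")
    format DFU`j' %td
}
drop s1-s8

mata:
// copy the dates and the per-row limits into Mata
X = st_data(., "DFU1 DFU2 DFU3 DFU4 DFU5 DFU6 DFU7 DFU8")
m = st_data(., "maxFU")
f = J(rows(X), 1, 0)
for (i = 1; i <= rows(X); i++) {
    // inner loop runs to this row's maxFU (capped at the number of columns)
    for (j = 1; j <= min((m[i], cols(X))); j++) {
        if (missing(X[i, j])) f[i] = 1
    }
}
// create the new variable and copy the flags back
(void) st_addvar("byte", "flag_m")
st_store(., "flag_m", f)
end

list id maxFU flag_m
```

The Stata loop over variables gives the same answer with less machinery; the Mata version is mainly useful when the per-row logic gets more complicated than a single condition.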

Not sure if it helps, but I'm cleaning data for an oncology study. So for id (patient) 1 there should be 3 follow-up (FU) forms, each having a date of completion, DFU (date of follow-up).

id  DFU1        DFU2        DFU3        DFU4        DFU5        DFU6        DFU7        DFU8        maxFU
1   30/10/1910  .           08/02/1904  .           .           .           .           .           3
2   16/12/1908  24/01/1913  .           08/02/1904  .           .           .           .           4
3   04/09/1907  13/10/1911  21/11/1915  30/12/1919  07/02/1924  17/03/1928  25/04/1932  .           7
4   18/10/1914  .           08/02/1904  18/03/1908  26/04/1912  04/06/1916  13/07/1920  21/08/1924  8

I managed to get my code working; perhaps this illustrates what I'm trying to do:

/* identify rows with missing dates */
gen flag=0
count 
local N=r(N)
forvalues i = 1/`N' {
	/* sp holds the max number of follow-up visits for the particular patient (row) */
	local sp = maxFU[`i']
	forvalues j=1/`sp' {
		replace flag=1 if DFU`j'==. & _n==`i'
	}
}

Nick Cox

Sorry, but I am still unclear on what flags you want. 

The fact that -maxFU- exists seems to be a red herring. You can create flags by 

forval j = 1/8 { 
	gen ismissing`j' = missing(DFU`j') 
} 

Or, if you want it the other way round, negate the function call with -!missing()-

But why do you need the flags at all? 

Even if I am misunderstanding you, which is quite likely, the small bit of Stata technique may be some help. 

Nick 
[email protected] 

Richard Fox

Hi Nick,

Yes, you're correct; sorry for the confusion over DFU and FU. I added the -egen- call only to illustrate where the loop limits could come from. In fact, the values came from reshaping long data.

I want to flag missing dates; however, for each record I need to check only up to a certain point. These are missing follow-up forms in a medical setting: if a patient is followed only for a certain time, I can't record forms beyond that time point as missing.

Take the example below: for the first id I only want to loop to 3 to test for missing values; for the second id I only want to loop to 4, and so on. I suppose I could just increment a counter only if `i' <= maxFU. Note that the code within the loops (replace flag...) was incomplete in my previous message; it was really just the form of the loop statements that I was interested in.

id  dfu1        dfu2        dfu3        dfu4        dfu5        dfu6        dfu7        dfu8        maxFU
1   30/10/1910  .           08/02/1904  .           .           .           .           .           3
2   16/12/1908  24/01/1913  .           08/02/1904  .           .           .           .           4
3   04/09/1907  13/10/1911  21/11/1915  30/12/1919  07/02/1924  17/03/1928  25/04/1932  .           7
4   18/10/1914  .           08/02/1904  18/03/1908  26/04/1912  04/06/1916  13/07/1920  21/08/1924  8

I'll have a look at the reference.

Nick Cox

Your example is not very clear. You have FU* and by implication DFU*. Do you want to flag missings or non-missings? I can read your post either way. 

However, you (almost surely) do not need to loop over observations. It is sufficient to loop over variables. 

See a review in this territory 

SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

Nick 
[email protected] 

Richard Fox
I want to loop to the value of a variable. Say I have generated the number of non-missing values in a row of data (maxFU in the example below). I want to loop up to that value, which can clearly differ between records.

The following does the job but feels like cheating.

egen maxFU = rownonmissing(FU1 FU2 FU3 FU4 FU5)

count 
local N=r(N)
forvalues i = 1/`N' {
	local sp = maxFU[`i']
	forvalues j=1/`sp' {
		qui replace flag`j'=1 if DFU`j'==.
	}
}

There must be a simpler way; any ideas? 
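If one flag per variable is wanted, as with flag`j' in the code above, no loop over observations is needed: the restriction `j' <= maxFU can sit inside the expression that creates each flag. A sketch on the example data from this thread, taking maxFU as given rather than computing it with -egen, rownonmissing()-; the string helpers s1-s8 are a device of this sketch.

```stata
clear
* s1-s8: temporary string copies of the dates (device of this sketch)
input id str10 s1 str10 s2 str10 s3 str10 s4 str10 s5 str10 s6 str10 s7 str10 s8 maxFU
1 "30/10/1910" "" "08/02/1904" "" "" "" "" "" 3
2 "16/12/1908" "24/01/1913" "" "08/02/1904" "" "" "" "" 4
3 "04/09/1907" "13/10/1911" "21/11/1915" "30/12/1919" "07/02/1924" "17/03/1928" "25/04/1932" "" 7
4 "18/10/1914" "" "08/02/1904" "18/03/1908" "26/04/1912" "04/06/1916" "13/07/1920" "21/08/1924" 8
end
forvalues j = 1/8 {
    generate DFU`j' = date(s`j', "DMY")
    format DFU`j' %td
}
drop s1-s8

* flag`j' = 1 only if DFU`j' is missing AND follow-up `j' is within maxFU
forvalues j = 1/8 {
    generate byte flag`j' = missing(DFU`j') & `j' <= maxFU
}
list id maxFU flag1-flag8
```

Here flag2 is 1 for ids 1 and 4, flag3 is 1 for id 2, and flag8 is 0 throughout, because DFU8 is missing only in rows whose maxFU is below 8.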



© Copyright 1996–2018 StataCorp LLC