Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: RE: RE: RE: looping to value of a variable
Date: Thu, 23 Feb 2012 17:58:17 +0000

I see, I think.

    gen flag = 0
    forval j = 1/8 {
        replace flag = 1 if missing(DFU`j') & flag == 0 & `j' <= maxFU
    }

Nick
n.j.cox@durham.ac.uk

Richard Fox

Sorry for the confusion. I want just one flag that tells me whether each record (row) has a missing value among the DFU variables. This would be simple were it not for the fact that, for certain rows, I only want to assess a subset of the variables for missing values. As in the example data, I only want to assess DFU1-DFU(maxFU) for missingness.

If I could use the value of maxFU in a variable list, as in DFU1-DFU(maxFU), then I could simply use egen with rowmiss(DFU1-DFU(maxFU)), but I don't believe that's possible. If I use egen with rowmiss(DFU1-DFU8), then for the 1st row I'd get 6, whereas I want just 1. For id 3 I'd expect flag == 0. If I don't loop over records, I believe Stata will overwrite all flags (all rows) as 1 as soon as it finds any missing value.

After further thought, this could be performed with a simple formula. Nonetheless I'm still interested to see how to loop to a variable value. I see that Mata may be a solution and will explore this in more detail. This is something that's easily performed in SAS, but I appreciate that Stata thinks in the opposite direction.

Not sure if it helps, but I'm cleaning data for an oncology study. So for id (patient) 1 there should be 3 follow-up (fu) forms, each having a date of completion dfu (date of follow-up).

    id  DFU1  DFU2  DFU3  DFU4  DFU5  DFU6  DFU7  DFU8  maxFU
    1   30/10/1910  08/02/1904  3
    2   16/12/1908  24/01/1913  08/02/1904  4
    3   04/09/1907  13/10/1911  21/11/1915  30/12/1919  07/02/1924  17/03/1928  25/04/1932  7
    4   18/10/1914  08/02/1904  18/03/1908  26/04/1912  04/06/1916  13/07/1920  21/08/1924  8

I managed to get my code working; perhaps this may illustrate what I'm trying to do:

    /* identify rows with missing dates */
    gen flag=0
    count
    local N=r(N)
    forvalues i = 1/`N' {
        /* sp holds the max number of follow-up visits for the particular patient (row) */
        local sp = maxFU[`i']
        forvalues j=1/`sp' {
            replace flag=1 if DFU`j'==. & _n==`i'
        }
    }

Nick Cox

Sorry, but I am still unclear on what flags you want. The fact that -maxFU- exists seems to be a red herring. You can create flags by

    forval j = 1/8 {
        gen ismissing`j' = missing(DFU`j')
    }

Or, if you want it the other way round, negate the function call with -!missing()-.

But why do you need the flags at all?

Even if I am misunderstanding you, which is quite likely, the small bit of Stata technique may be some help.

Nick
n.j.cox@durham.ac.uk

Richard Fox

Hi Nick,

Yes, you're correct; sorry for the confusion over DFU and FU. I added the egen function to illustrate where the loop count values could come from. In fact the values came from reshaping long data.

I want to flag missing dates; however, for each record I need to assess only up to a certain point. These are missing follow-up forms in a medical scenario - if patients are only followed for a certain time then I can't record some forms as missing if the patient has reached that time-point.

Take the example below: for the 1st id I only want to loop to 3 to test for missing values. For the second id I only want to loop to 4, and so on. I suppose I could just increment a counter only if `i' <= maxFU.

Just to note that the code within the loops (replace flag.....) was incomplete in my previous message - it was really just the form of the loop statements that I was interested in.

    id  dfu1  dfu2  dfu3  dfu4  dfu5  dfu6  dfu7  dfu8  maxFU
    1   30/10/1910  08/02/1904  3
    2   16/12/1908  24/01/1913  08/02/1904  4
    3   04/09/1907  13/10/1911  21/11/1915  30/12/1919  07/02/1924  17/03/1928  25/04/1932  7
    4   18/10/1914  08/02/1904  18/03/1908  26/04/1912  04/06/1916  13/07/1920  21/08/1924  8

I'll have a look at the reference.

Nick Cox

Your example is not very clear. You have FU* and by implication DFU*. Do you want to flag missings or non-missings? I can read your post either way.

However, you (almost surely) do not need to loop over observations. It is sufficient to loop over variables. See a review in this territory:

SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

Nick
n.j.cox@durham.ac.uk

Richard Fox

I want to loop to the value of a variable. Let's say I have generated the number of non-missing values in a row of data (maxFU in the example below). I want to loop to that value, which clearly can differ between records. The following does the job but feels like cheating.

    egen maxFU = rownonmiss(FU1 FU2 FU3 FU4 FU5)
    count
    local N=r(N)
    forvalues i = 1/`N' {
        local sp = maxFU[`i']
        forvalues j=1/`sp' {
            qui replace flag`j'=1 if DFU`j'==.
        }
    }

There must be a simpler way; any ideas?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
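
To make the point of the thread concrete, here is a minimal sketch of flagging rows by looping over variables only, with no loop over observations. The data are hypothetical: plain numbers stand in for the dates, the positions of the missing values are assumptions made purely for illustration, and only four DFU variables are used to keep it short. The count variable nmiss is likewise an illustrative addition, not something from the original posts.

```stata
clear
* Hypothetical data: numbers stand in for dates; . marks a missing form
input id DFU1 DFU2 DFU3 DFU4 maxFU
1 100   . 120   . 3
2 100 110 120   . 3
3 100 110   .   . 2
4   . 110 120 130 4
end

* One flag per row: 1 if any of DFU1..DFU(maxFU) is missing
gen byte flag = 0
forvalues j = 1/4 {
    replace flag = 1 if missing(DFU`j') & `j' <= maxFU
}

* Optional: number of missing forms inside each patient's window
gen byte nmiss = 0
forvalues j = 1/4 {
    replace nmiss = nmiss + (missing(DFU`j') & `j' <= maxFU)
}

list id maxFU flag nmiss, noobs

* With these made-up data only ids 1 and 4 are flagged
assert flag == inlist(id, 1, 4)
assert flag == (nmiss > 0)
```

The condition `j' <= maxFU is evaluated row by row, so each record is checked only up to its own maxFU even though the loop itself runs over all the variables. Note that a missing maxFU would count as larger than any `j', so such rows would be checked across every variable; whether that is wanted depends on the data.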