Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: RE: RE: RE: RE: looping to value of a variable
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: RE: RE: RE: RE: RE: looping to value of a variable
Date: Fri, 24 Feb 2012 09:00:23 +0000
Let me address Richard's belief that
"If I don't loop over records [... S]tata will overwrite all flags (all
rows) as 1 as soon as it finds any missing value."
This is groundless. Unless you instruct otherwise Stata only works
with the current observation [row or record in non-Stata terminology].
It is common that people with a lot of experience with other software
find it more difficult to adjust to Stata's ways of thinking than
people with little! This may be happening here.
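The point about observations can be checked directly. What follows is only a minimal sketch on a made-up dataset: the numeric values, the four-variable layout and the use of `j' <= maxFU as the cutoff are assumptions for illustration, not Richard's data. The -if- condition is evaluated separately in each observation, so one pass over the variables is enough and no loop over observations is needed.

```stata
* Hypothetical toy data: 4 date variables (plain numbers here) and a cutoff
clear
input id DFU1 DFU2 DFU3 DFU4 maxFU
1 100 .   200 .   3
2 100 200 300 400 4
3 100 200 .   .   2
end

gen byte flag = 0

* Loop over variables only; -if- is evaluated observation by observation
forvalues j = 1/4 {
    replace flag = 1 if missing(DFU`j') & `j' <= maxFU
}

* id 1 is flagged (DFU2 is missing and 2 <= 3);
* id 3 is not, because its missing values lie beyond maxFU
list
```

Only the observation with a missing value inside its own range is changed; the others keep flag == 0.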
Nick
On Thu, Feb 23, 2012 at 5:58 PM, Nick Cox <[email protected]> wrote:
> I see, I think.
>
> gen flag = 0
>
> forval j = 1/8 {
>     replace flag = 1 if missing(DFU`j') & flag == 0 & `j' <= maxFU
> }
>
> Nick
> [email protected]
>
> Richard Fox
>
> Sorry for the confusion.
>
> I want just one flag that tells me if each record (row) has a missing value for the DFU variables. This would be simple were it not for the fact that for certain rows I only want to assess a subset of the variables for missing values. As per the example data I only want to assess DFU1-DFU(maxFU) for missingness.
>
> If I could use the value of maxFU as above DFU1-DFU(maxFU) then I could simply use
>
> egen rowmiss(DFU1-DFU(maxFU))
>
> but I don't believe that's possible.
>
> If I use egen with rowmiss(DFU1-DFU8) then for the 1st row I'd get 6, whereas I want just 1. For id 3 I'd expect flag == 0.
>
> If I don't loop over records I believe Stata will overwrite all flags (all rows) as 1 as soon as it finds any missing value.
>
> After further thought this could be performed with a simple formula. Nonetheless I'm still interested to see how to loop to a variable value. I see that Mata may be a solution and will explore this in more detail. This is something that's easily performed in SAS, but I appreciate that Stata thinks in the opposite direction.
>
> Not sure if it helps, but I'm cleaning data for an oncology study. So for id (patient) 1 there should be 3 follow-up (FU) forms, each having a date of completion, DFU (date of follow-up).
>
> id DFU1 DFU2 DFU3 DFU4 DFU5 DFU6 DFU7 DFU8 maxFU
> 1 30/10/1910 08/02/1904 3
> 2 16/12/1908 24/01/1913 08/02/1904 4
> 3 04/09/1907 13/10/1911 21/11/1915 30/12/1919 07/02/1924 17/03/1928 25/04/1932 7
> 4 18/10/1914 08/02/1904 18/03/1908 26/04/1912 04/06/1916 13/07/1920 21/08/1924 8
>
> I managed to get my code working; perhaps this illustrates what I'm trying to do:
>
> /* identify rows with missing dates */
> gen flag=0
> count
> local N=r(N)
> forvalues i = 1/`N' {
>
>     /* sp holds the max number of follow-up visits for the particular patient (row) */
>     local sp = maxFU[`i']
>     forvalues j=1/`sp' {
>         replace flag=1 if DFU`j'==. & _n==`i'
>     }
> }
>
> Nick Cox
>
> Sorry, but I am still unclear on what flags you want.
>
> The fact that -maxFU- exists seems to be a red herring. You can create flags by
>
> forval j = 1/8 {
>     gen ismissing`j' = missing(DFU`j')
> }
>
> Or, if you want it the other way round, negate the function call with -!missing()-
>
> But why do you need the flags at all?
>
> Even if I am misunderstanding you, which is quite likely, the small bit of Stata technique may be some help.
>
> Nick
> [email protected]
>
> Richard Fox
>
> Hi Nick,
>
> Yes you're correct, sorry for the confusion over DFU and FU. I added the egen function to illustrate where the loop count values could come from. In fact the values came from reshaping long data.
>
> I want to flag missing dates, however, for each record I need to assess only to a certain point. These are missing follow-up forms in a medical scenario - if patients are only followed for a certain time then I can't record some forms as missing if the patient has reached that time-point.
>
> Take the example below; for the 1st id I only want to loop to 3 to test for missing values. In the second id I only want to loop to 4, and so on. I suppose I could just only increment a counter if `i' <= maxFU. Just to note that the code within the loops (replace flag.....) was incomplete in my previous message - it was really just the form of the loop statements that I was interested in.
>
> id dfu1 dfu2 dfu3 dfu4 dfu5 dfu6 dfu7 dfu8 maxFU
> 1 30/10/1910 08/02/1904 3
> 2 16/12/1908 24/01/1913 08/02/1904 4
> 3 04/09/1907 13/10/1911 21/11/1915 30/12/1919 07/02/1924 17/03/1928 25/04/1932 7
> 4 18/10/1914 08/02/1904 18/03/1908 26/04/1912 04/06/1916 13/07/1920 21/08/1924 8
>
> I'll have a look at the reference.
>
> Nick Cox
>
> Your example is not very clear. You have FU* and by implication DFU*. Do you want to flag missings or non-missings? I can read your post either way.
>
> However, you (almost surely) do not need to loop over observations. It is sufficient to loop over variables.
>
> See a review in this territory
>
> Cox, N. J. 2009. Speaking Stata: Rowwise. Stata Journal 9(1): 137--157 (SJ-9-1, pr0046).
> Shows how to exploit functions, egen functions, and Mata for working rowwise;
> rowsort and rowranks are introduced (help rowsort, rowranks if installed).
>
> Nick
> [email protected]
>
> Richard Fox
> I want to loop to the value of a variable. Let's say I have generated the number of non-missing values in a row of data (maxFU in the example below). I want to loop to that value, which clearly can differ between records.
>
> The following does the job but feels like cheating.
>
> egen maxFU = rownonmissing(FU1 FU2 FU3 FU4 FU5 )
>
> count
> local N=r(N)
> forvalues i = 1/`N' {
>     local sp = maxFU[`i']
>     forvalues j=1/`sp' {
>         qui replace flag`j'=1 if DFU`j'==.
>     }
> }
>
>
>
> There must be a simpler way; any ideas?
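One possible alternative to looping at all, given that the thread mentions the data came from a long layout: do the check in long form and then return to wide. This is only a sketch under assumptions. The toy values, the variable name visit and the four-variable layout are hypothetical, and it is not a method proposed by anyone in the thread.

```stata
* Hypothetical toy data: 4 date variables (plain numbers here) and a cutoff
clear
input id DFU1 DFU2 DFU3 DFU4 maxFU
1 100 .   200 .   3
2 100 200 300 400 4
3 100 200 .   .   2
end

* One row per patient-visit; maxFU is constant within id and is carried along
reshape long DFU, i(id) j(visit)

* flag = 1 if any visit up to maxFU has a missing date, else 0
egen byte flag = max(missing(DFU) & visit <= maxFU), by(id)

* Back to one row per patient; flag is constant within id
reshape wide DFU, i(id) j(visit)
list
```

The cutoff enters as an ordinary condition on visit, so the per-patient limit is handled without reading maxFU into a local macro.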