Hi David,
Thank you. A lot of what you mentioned makes sense. Please see my previous reply to Fred and Nick, as I think I explained myself a little more clearly there.
Now on to the nitty-gritty...
On Fri, 25 Apr 2003, David Kantor wrote:
> At 12:06 PM 4/25/2003 -0700, Dan Sabath wrote:
> >Hello,
> >
> >I am very new to Stata and am having some difficulty wrapping my brain
> >around Stata's methods of data processing.
> [...]
> >I wrote a "program" to calculate the value based on args passed into it.
> >"checkfoo" returns a 1 or 0 depending if one of the arglist matches the
> >first arg.
> >so r(checked) = 1 if `k' or `l' is equal to `j'
> >otherwise r(checked) = 0.
> >
> >/*******vastly simplified**********/
> >local j = 1
> >gen k = 2
> >gen l = 3
> >
> >while `j' < 6 {
> > checkfoo `j' `k' `l' /* r(checked) returned equal to 1 when j = 2 or 3*/
> > replace k = 4 if r(checked) == 1
> > local j = `j' + 1
> >}
> >/***********************************/
> >I would like it to replace "k" on the 2nd and 3rd time through the loop
> >but not at any other time.
> >
> >I would be happy if I could just do
> >/* pseudocode */
> >replace k = 4 if checkfoo `j' `k' `l' /* with checkfoo evaluating true or
> >false */
> >[...]
>
> Nick Cox has replied to this, but I would like to add some comments.
>
> Your code generates variables k and l. Then you pass macros `k' and `l' to
> checkfoo. Variables and macros are different kinds of entities. If you
> haven't defined these macros (and they are not defined in your code sample)
> then they are empty, and you are only passing one argument (`j') to checkfoo.
Ideally I would be passing the values of k and j on that row into checkfoo. Perhaps something like k[_n] would work? I'm beginning to see that the implicit loop through the dataset exists in a different location than where I thought it did. My previous email explains it better.
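To make the k[_n] idea concrete, here is a small sketch on made-up data (the variables same and diff are hypothetical names of mine). As far as I can tell, a bare variable name inside an expression already refers to the current observation, so k and k[_n] should give the same thing:

```stata
clear
set obs 5
gen j = _n                       /* j = 1, 2, 3, 4, 5 */
gen k = 2                        /* k = 2 in every observation */
gen byte same = (k == k[_n])     /* compares k with k[_n], row by row */
gen diff = j - k[_n]             /* uses this row's j and this row's k */
list
```

If that is right, same is 1 everywhere and the explicit [_n] subscript adds nothing.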
>
> Note, also that
> gen k = 2
> gen l = 3
>
> set the variables k and l to 2 and 3 -- for all observations in the entire
> dataset.
>
Yes, that was intentional. The actual data are a little more complicated and vary on a row-by-row basis, but I used this for the example.
> When your loop comes to...
> replace k = 4 if r(checked) == 1
>
> then k will be replaced with 4 -- again, for all observations in the entire
> dataset, since r(checked) is a scalar quantity.
I was not expecting this behavior. I expected r(checked) to change with the values from each row.
>
> (Actually, this is one place where it would be equivalent to write...
> if r(checked) {
> replace k = 4
> }
> but in general, there is a big difference between the -if- statement and
> the -if- qualifier. There is a FAQ on this subject.)
It was quite a surprise to find out that the -if- statement evaluates its condition only once and not on each row. As a result, I'm not sure when it would be useful.
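If I have the distinction right, it looks something like the sketch below. The data and the variable names x and flag are made up purely for illustration.

```stata
clear
set obs 5
gen x = _n                       /* x = 1, 2, 3, 4, 5 */
gen flag = 0

* -if- qualifier: the condition is checked for every observation
replace flag = 1 if x > 3

* -if- statement: the condition is evaluated once, and a bare
* variable name here means its value in the first observation
if x > 3 {
    replace flag = 2
}
list
```

Here I would expect flag to end up 1 only in observations 4 and 5, and the block under the -if- statement never to run, since x is 1 in the first observation.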
> Since this replace k = 4 will affect every observation, there seems little
> point to doing it. Presumably there will be other code that you have
> omitted. But, since this -replace- affects all observations equally, it
> might better have been a scalar or a macro.
> But if, as I might suspect, you are thinking of looping through the
> observations, then your code is not correct. But then, most likely, there
> is no point to correcting it as such; what you want to do is probably
> easily done in a few statements, once you get the idea of how Stata
> works. In fact, your "pseudocode" sample is almost (but not quite) a
> correct Stata statement -- if you are thinking of replacing k in some
> observations and not others.
>
That is exactly what I was aiming for.
> Your pseudocode sample will not work, because in...
> replace k = 4 if checkfoo `j' `k' `l' /* with checkfoo evaluating true or
> false */
> you cannot create your own function (checkfoo) that can be referenced in an
> expression.
To my great disappointment :(
>
> You can, on the other hand, create a variable to carry the info that you
> want. You can also write a program to generate that variable. It is not
> clear whether you intended checkfoo to be such a program. (As shown in your
> example, it would appear that it yields scalar information, but you may
> have had something else in mind.)
checkfoo is actually an .ado file:
/*****************
Checkfoo checks arglist[i] against arglist[0]; returns 1 if match and 0 if not match.
usage: checkfoo primary_var check1_var check2_var ...
returns r(checked) = 0 || 1
******************/
program define checkfoo, rclass
    local checked = 0
    /* save the first argument before the shifts below discard it */
    local i = `1'
    while "`2'" ~= "" {
        if `i'==`2' {
            local checked = 1
        }
        macro shift
    }
    return scalar checked = `checked'
end
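Going by the suggestion to carry the information in a variable, I think the row-by-row version of checkfoo could be sketched like this. It assumes j, k and l are all numeric variables (in my example j was a macro), and the name checked is just borrowed from r(checked).

```stata
clear
set obs 5
gen j = _n                       /* toy data: j = 1, 2, 3, 4, 5 */
gen k = 2
gen l = 3

* 1 where j matches k or l in the same observation, 0 otherwise
gen byte checked = (j == k) | (j == l)

* the -if- qualifier then restricts the change to those observations
replace k = 4 if checked
list j k l checked
```

With this toy data, checked should be 1 only where j is 2 or 3, so k becomes 4 in just those two observations.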
>
> Overall I would suggest these points:
>
> 1: Understand the difference between variables, scalars and
> macros. (Scalars and macros are similar in that they have a single value.
> Variables have a set of values: one for each observation. Note, also that
> if a program returns something in r(), that returned value is a scalar or
> macro.)
>
At what point are scalars and macros evaluated? Can you reset the value in the middle of the run depending on other calculations? For example (pseudocode):
x = 0;
replace y = z if x < 10, x++
> 2: Most Stata statements that operate on the data do so on the whole
> dataset at once. (Actually, there is a sequential aspect to the action that
> processes the statement, but you usually don't need to think about it.) It
> may help to remember that, for example, in your code...
> gen k = 2
> gen l = 3
>
> first, k is created and set to 2 for all observations; then l is created
> and set to 3 for all observations.
I believe that this is one of the fundamental differences (and a hard one to get your head around) between Stata and other stats languages. The implicit loop through the data exists on each *line* of the do-file and not around the program as a whole. Other languages work through the data one row at a time and allow you to make as many calculations or modifications as you like on that row before proceeding to the next. Please correct me if I am missing something.
(See http://www.cpc.unc.edu/services/computer/presentations/sas_to_stata/basic.html for more examples of the differences.)
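A tiny sketch of what I mean, with made-up variables a and b: each command appears to finish its pass over every observation before the next command starts.

```stata
clear
set obs 3
gen a = _n                       /* pass 1: a = 1, 2, 3 */
replace a = a * 10               /* pass 2: a = 10, 20, 30 */
gen b = a[1]                     /* pass 3: every row sees the finished a[1] */
list
```

If my reading is correct, b is 10 in all three observations, because the -replace- has already been applied to the whole dataset by the time b is generated.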
>
> 3: Understand the difference between the -if- statement and the -if- qualifier.
>
> 4: Looping is useful for actions that occur at a level that is logically
> higher than the individual observations. You almost never need to loop
> through the observations. If you are attempting to write code to loop
> through the observations, you probably are not thinking about the problem
> correctly. (Sometimes it is necessary, and I have done it -- *very* rarely.)
And this is exactly why I'm asking. I need to adjust my thinking to approach problems in this manner. I really appreciate all the help you have given me. Thank you!
>
> I hope this helps.
It certainly has. Thanks again!
-dan