Others have replied pointing out that Stata
does this already.
A related question, perhaps not of much practical
interest, and I guess not Adrian's question,
but nevertheless a small puzzle testing grasp of Stata
technique, is how to specify that an average will be
calculated only if all the values of interest
are non-missing. A single missing value would be
enough to instruct Stata not to calculate.
Here is one way to do it. It works interactively
in Stata 8, and could be automated in a program.
. u auto
. count if mi(rep78)
. if r(N) == 0 su rep78
-count- leaves its result behind in r(N), which
the -if- command then tests. As rep78 has 5 missing
values in the auto data, r(N) is 5 and nothing is done.
. count if mi(mpg)
. if r(N) == 0 su mpg
In this case r(N) is indeed 0 and the
calculation is done.
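As a rough sketch of how this could be automated, here is a small program. The name -sumifcomplete- and the wording of the message are my own inventions for illustration, not an official command, and the code is untested beyond the logic shown above.

```stata
* Hypothetical helper: summarize a numeric variable only if
* it has no missing values at all.
program define sumifcomplete
    version 8
    syntax varname(numeric)
    * one count of the missing values, kept in a local macro
    quietly count if missing(`varlist')
    local nmiss = r(N)
    if `nmiss' == 0 {
        summarize `varlist'
    }
    else {
        display as text "`varlist' has `nmiss' missing value(s); nothing calculated"
    }
end
```

Once this is defined, -sumifcomplete rep78- should report the missing values and stop, while -sumifcomplete mpg- should go ahead and summarize mpg.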
Non-programmers should note the crucial difference
in principle between
1. if <condition> <command>
and
2. <command> if <condition>
Form 1 carries out _one_ test of the <condition>
supplied. If it is true, <command> is carried
out, but not otherwise.
Form 2 carries out a test of the <condition>
supplied for _every_ observation specified
and then carries out <command>
for the observations for which it is true.
For this problem, form 1 is better.
As it happens,
. count if mi(mpg)
. su mpg if r(N) == 0
is not only legal, but produces the correct result.
What happens is that Stata looks at the condition
if r(N) == 0
and says in turn: is this true for observation 1?
for observation 2? for observation 3? and so
forth. As it happens, r(N) == 0 has nothing
to do with any of these observations in particular,
but Stata has little notion of irrelevance, and
irrelevance doesn't make a condition false.
(You could even use a tautology like -if 2 == 2-
which Stata would then test for every observation.)
So it carries out the test again and again, which
in a large dataset is naturally very inefficient.
In other problems, you would rarely get away
with sloppiness over whether form 1 or form 2
should be used. The FAQ at
http://www.stata.com/support/faqs/lang/ifqualifier.html
explains how you could get bitten.
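A small illustration of the bite, using the auto data, with variables chosen only as an example. In form 1 a variable name is read as its value in the first observation, so the two commands below are far from equivalent.

```stata
sysuse auto, clear

* Form 1: foreign is read as foreign[1], so this is ONE test,
* made on the first observation only
if foreign == 1 summarize price

* Form 2: the condition is tested for every observation, and
* price is summarized for the observations that satisfy it
summarize price if foreign == 1
```

In the auto data as shipped, the first observation is a domestic car, so the form 1 command does nothing at all, whereas the form 2 command summarizes price for the foreign cars.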
If the issue were that a set of observations
should be non-missing on all variables specified,
something like
. egen rmiss = rmiss(<varlist>)
. count if rmiss
. if r(N) == 0 <command>
is one way to do it.
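For concreteness, here is a sketch with particular variables filled in. The variable name nmiss and the choice of -regress- as the command are assumptions for illustration only.

```stata
sysuse auto, clear

* nmiss (hypothetical name) holds the number of missing values
* in each observation across the variables listed
egen nmiss = rmiss(mpg rep78 price)

* count the observations with at least one missing value
count if nmiss

* run the command only if no observation is incomplete
if r(N) == 0 regress price mpg rep78
```

As rep78 has missing values, the regression would not be run here. In later Stata releases I believe the same -egen- function is documented as rowmiss().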
P.S. In Stata 7, the equivalent is
. count if missing(mpg)
. if r(N) == 0 { su mpg }
Nick
de la Garza, Adrian
> Can anyone tell me if it's possible to tell Stata to ignore missing
> observations when computing averages, etc.? I think that if there is a
> missing value in the observations considered, the average computed would
> be then missing too... and I need it to intelligently choose how many
> observations to use depending on whether observations are available, etc.