I certainly, and Scott probably, overlooked the fact that
you were using "." as a personal code for missing.
By and large, Stata commands do not treat "." in a string variable
as meaning missing. The main, and perhaps only, exception is -destring-,
which works on the assumption that a string variable
is really a numeric variable trapped in a string body.
(-compare- used to be another exception.)
It follows that counting missings, whether using -egen- or my
more direct approach, won't work for you until you re-code
"." as "", hence Svend's suggestion. Otherwise, Scott's and Svend's
suggestions use complementary -egen- functions: -rmiss()- counts
missing values in each observation, and -robs()- counts non-missing values.
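A minimal sketch of the recode-then-count route, assuming the variables are all named Var1, Var2, ... and reusing the four-observation toy data from the question (with Var(nth) taken to be Var4, which is an assumption). It recodes "." to "" as Svend suggested and then counts complete observations with -missing()-, building the comma-separated argument list the same way as the -unab-/-subinstr- suggestion quoted further down, but from -Var*- rather than -*-.

```stata
* Toy data from the question; Var4 stands in for Var(nth) (assumption)
clear
input str5 Var1 str5 Var2 str5 Var3 str6 Var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* Recode the personal code "." to "", Stata's missing value for strings
foreach v of varlist Var* {
    replace `v' = "" if `v' == "."
}

* Expand Var* and turn "Var1 Var2 Var3 Var4" into "Var1,Var2,Var3,Var4"
* so that missing() receives one argument per variable
unab vars : Var*
local vars : subinstr local vars " " ",", all

* Count observations with no missing value in any Var* variable
count if !missing(`vars')
```

With these data only the first observation is complete, matching what the question describes.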
I can't explain why Scott's suggestion and mine give
different results unless you have other variables that
are not captured by -Var*-. I used -*- as a wildcard,
not -Var*-, so my count also checks any such variables for missing values.
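One way to check that explanation is to compare what the two wildcards expand to. This is a sketch using Stata's macro list functions (-local ... : list A - B-), which I believe are available from Stata 8 on; the tiny dataset and its extra variable -obs- are hypothetical, standing in for whatever else is in the real data.

```stata
* Hypothetical data with one variable (obs) that Var* does not match
clear
input obs str5 Var1 str5 Var2
1 "jacn" "clstr"
2 "bstr" "."
end

* Expand both wildcards into explicit variable lists
unab all : *
unab some : Var*

* Variables picked up by * but not by Var*
local extra : list all - some
display "In * but not in Var*: `extra'"
```

On real data, any variable listed here is a candidate for the difference between the two counts.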
In your code, the -sort- and the -by:- do no harm
but are completely irrelevant. It would be easier to
count "." rather than cycle through all the other
values. With your previous set-up,
gen nperiod = 0
foreach v of var Var* {
    replace nperiod = nperiod + (`v' == ".")
}
gives you a count of period missings, after which
gen allpresent = nperiod == 0
gives what I think you want. You could also
count occurrences != ".".
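Counting occurrences that are not "." might look like this: again a sketch on the toy data from the question (Var4 standing in for Var(nth), an assumption). An observation is complete when that count equals the number of -Var*- variables. As written, a genuinely blank value "" would be counted as present, so this suits data that use only "." for missing.

```stata
* Toy data from the question; Var4 stands in for Var(nth) (assumption)
clear
input str5 Var1 str5 Var2 str5 Var3 str6 Var4
"jacn" "clstr" "lnreg" "pval"
"bstr" "."     "lgreg" "nopval"
"."    "rct"   "."     "nopval"
"jacn" "clstr" "anova" "."
end

* How many variables should each complete observation have?
unab vars : Var*
local k : word count `vars'

* Count values not coded "." (a blank "" would also count as present)
gen npresent = 0
foreach v of local vars {
    replace npresent = npresent + (`v' != ".")
}

* Complete observations have a value in every Var* variable
gen allpresent = npresent == `k'
list npresent allpresent
```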
For this and other reasons, -foreach- and -forval-
are strongly recommended. The usual searches
point to tutorials on those constructs.
Nick
n.j.cox
(much editing in this digest)
barleywater is using Stata 8.2, and asked
> My data set looks like this:
>
> obs  Var1  Var2   Var3   Var(nth)
>   1  jacn  clstr  lnreg  pval
>   2  bstr  .      lgreg  nopval
>   3  .     rct    .      nopval
>   4  jacn  clstr  anova  .
> I want to find out how many observations contained all the variables.
> In this example, only the first observation contained all the variables.
Scott Merryman suggested
> egen all_var = rmiss(Var*)
>
> count if all_var == 0
Nick Cox commented
> It can also be done without generating a new variable.
> unab var : *
> local var : subinstr local var " " ",", all
> count if !mi(`var')
barleywater replied
> I understand what Scott tried to do, but looking at his commands made
> me realise that perhaps he, and by extension Nick too,
> misunderstood my question, which could have been better expressed.
> I have less understanding of Nick's commands which use macros
> (afraid my Stata fluency doesn't go that far yet).
>
> However, running Scott's command and Nick's showed a
> difference of 1:
> Scott's returned a value of 78 whilst Nick's returned 77.
> I am not sure why that is the case. But
> neither was what I was looking for.
> Here's what I did to get what I want.
>
> gen dumvar1=.
> gen dumvar2=.
> .
> .
> .
> sort var1
> by var1: replace dumvar1 = 1 if var1 == "jacn"
> by var1: replace dumvar1 = 1 if var1 == "bstr"
> sort dumvar1
> replace dumvar1 = 0 if dumvar1==.
> .
> .
> .
> sort var2
> by var2: replace dumvar2 = 1 if var2 == "clstr"
> by var2: replace dumvar2 = 1 if var2 == "rct"
> by var2: replace dumvar2 = 1 if var2 == "xovr"
> sort dumvar2
> replace dumvar2 = 0 if dumvar2==.
> .
> .
> gen total = dumvar1 + dumvar2 + ...
> sort total
> l obs total
>
> but...
>
> 1. not elegant (not a problem since it does the job)
> 2. it loses the information the variables conveyed, by
> replacing their values with 1's
> (not ideal)
>
> I would appreciate further help/advice to shorten the do-file
> if possible (I think it needs -foreach val- at the beginning).
Svend Juul suggested
> I understand that your var1-varn are string variables. For strings,
> the missing value typically is a blank, not a period, so I would
> first:
> foreach V of varlist var1-varn {
> replace `V' = "" if `V' == "."
> }
> If you feel unsure about the above construct, you might instead give
> as many -replace- commands as you have variables:
>
> replace var1="" if var1=="."
> ...
> Now you can use egen's -robs()- function with the -strok- option:
>
> egen nonmiss = robs(var1-varn) , strok
> In Stata 9 the -robs()- function got the more telling name -rownonmiss()-.
> Now, the variable -nonmiss- tells the number of nonmissing (i.e.
> non-blank) values for each observation.
barleywater replied
> Your -egen- suggestion worked. Earlier on, at the
> stage of inputting the data, I indeed used many
> replace var if ...
> commands in a do-file to replace blanks with a period before running my
> small do-file to count, but I appreciate your -foreach- suggestion.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/