On strings: this is an FAQ.
FAQ . . . . . . . . . Counting distinct strings across a set of variables
7/04 How do I count the number of distinct strings
across a set of variables?
http://www.stata.com/support/faqs/data/distinctstrings.html
There is a typo in the very last line of code.
if `v'[`i'] != . & trim(`v'[`i']) != ""
should be
if `v'[`i'] != "." & trim(`v'[`i']) != ""
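
For tagging disagreement among string variables directly, here is a minimal do-file sketch. It is not the FAQ's code. The variables s1-s3, first, ok and tag, and the example data, are made up for illustration, and it assumes that empty strings and "." both count as missing, as in the corrected line above.

```stata
* Hypothetical example data: three string variables that should agree
clear
input str5 s1 str5 s2 str5 s3
"Stata" "Stata" ""
"Stata" "SAS"   "Stata"
""      ""      "SAS"
""      ""      ""
end

gen first = ""       // first nonmissing value seen so far in each observation
gen byte tag = 0     // becomes 1 when two nonmissing values disagree

foreach v of varlist s1 s2 s3 {
    * assumption: empty strings and "." both count as missing
    gen byte ok = trim(`v') != "" & trim(`v') != "."
    * a nonmissing value that differs from the first one seen is a conflict
    replace tag = 1 if ok & first != "" & trim(`v') != first
    * remember the first nonmissing value
    replace first = trim(`v') if ok & first == ""
    drop ok
}

list s1 s2 s3 tag
```

Only the second observation should be tagged here. The loop is over variables, not observations, so adding more variables only lengthens the varlist.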
Nick
Friedrich Huebler
> Sorry, I should have been more precise. I would like to tag individual
> observations if certain variables do not contain the same values for
> that particular observation.
>
> The purpose is error checking in household survey data. Assume every
> woman is asked about her age and every man is asked about his wife's
> age. The information is stored in separate files. When the files are
> merged, every woman has one age (if she is not married) or two ages. I
> would like to identify cases where the ages are not the same.
>
> -egen, rowmin()- and -egen, rowmax()- work for numeric variables like
> age but I hope there is a solution that also works with strings.
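
For numeric variables, the -egen, rowmin()- and -egen, rowmax()- idea mentioned above might be sketched as follows, using the auto data from the example further down. The names lo, hi and tag3 are made up for illustration. Both functions ignore missing values, and if every variable is missing in an observation both results are missing, so that observation is not tagged.

```stata
sysuse auto, clear
gen mpg2 = mpg if foreign==0
gen mpg3 = mpg if foreign==1
replace mpg3 = mpg+1 if rep78==2

* smallest and largest nonmissing value in each observation
egen lo = rowmin(mpg mpg2 mpg3)
egen hi = rowmax(mpg mpg2 mpg3)

* nonmissing values disagree exactly when minimum and maximum differ
gen byte tag3 = lo != hi
```

This extends to ten or more variables by lengthening the varlist, but it does not carry over to string variables.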
Nick Cox
> > Tagging in what sense?
> >
> > How do you tell which soldiers are out of step?
> > Majority vote? How do you split a 50:50
> > agreement? Three variables say "Stata" and three
> > say "SAS"? (No, that's an easy one to identify
> > which are incorrect.)
> >
> > (You didn't mention strings; I guess you don't
> > care about strings.)
> >
> > [...]
> >
> > Friedrich Huebler
> >
> > > I would like to compare a set of variables and tag those that do not
> > > contain the same values. Missing values should be ignored. -egen
> > > newvar = diff(varlist)- is not an option because it does not skip
> > > missing values. The last command in the example below works but it
> > > becomes impractical with a longer list of variables.
> > >
> > > . sysuse auto
> > > . gen mpg2 = mpg if foreign==0
> > > . gen mpg3 = mpg if foreign==1
> > > . replace mpg3 = mpg+1 if rep78==2
> > > . egen tag = diff(mpg mpg2 mpg3)
> > > . gen tag2 = (mpg!=mpg2 & mpg<. & mpg2<. | mpg!=mpg3 & mpg<. & mpg3<. | mpg2!=mpg3 & mpg2<. & mpg3<.)
> > >
> > > The -egen- command tags all observations, the -gen- command only those
> > > that I expect to be tagged. Are there better solutions that can also
> > > be used with ten or more variables?