Sorry, I should have been more precise. I would like to tag individual
observations if certain variables do not contain the same values for
that particular observation.
The purpose is error checking in household survey data. Assume every
woman is asked about her age and every man is asked about his wife's
age. The information is stored in separate files. When the files are
merged, every woman has one age (if she is not married) or two ages. I
would like to identify cases where the ages are not the same.
Comparing the results of -egen, rowmin()- and -egen, rowmax()- works
for numeric variables like age, because both functions ignore missing
values, but I hope there is a solution that also works with strings.
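
A minimal sketch of the idea, using the auto data from the example quoted below. The names mpgmin, mpgmax, tagnum and tagany are only illustrative. The first part is the numeric approach with -egen, rowmin()- and -egen, rowmax()-. The second part is one possible pairwise loop that should also work with string variables, assuming all variables in the list are of the same type, since -missing()- accepts strings as well as numbers.

```stata
sysuse auto, clear
gen mpg2 = mpg if foreign == 0
gen mpg3 = mpg if foreign == 1
replace mpg3 = mpg + 1 if rep78 == 2

* Numeric variables only: rowmin() and rowmax() skip missing values,
* so they differ only if at least two nonmissing values disagree.
egen mpgmin = rowmin(mpg mpg2 mpg3)
egen mpgmax = rowmax(mpg mpg2 mpg3)
gen byte tagnum = (mpgmin != mpgmax)

* Any single type (all numeric or all string): compare every pair of
* variables and tag the observation if two nonmissing values differ.
local vars mpg mpg2 mpg3
local n : word count `vars'
local nm1 = `n' - 1
gen byte tagany = 0
forvalues i = 1/`nm1' {
    local a : word `i' of `vars'
    local ip1 = `i' + 1
    forvalues j = `ip1'/`n' {
        local b : word `j' of `vars'
        replace tagany = 1 if `a' != `b' & !missing(`a', `b')
    }
}
```

The loop runs one -replace- per pair of variables, so ten variables need 45 comparisons, but only the list in the local macro vars has to be edited.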
Friedrich
On 7/27/07, Nick Cox wrote:
> Tagging in what sense?
>
> How do you tell which soldiers are out of step?
> Majority vote? How do you split a 50:50
> agreement? Three variables say "Stata" and three
> say "SAS"? (No, that's an easy one to identify
> which are incorrect.)
>
> (You didn't mention strings; I guess you don't
> care about strings.)
>
> [...]
>
> Friedrich Huebler
>
> > I would like to compare a set of variables and tag those that do not
> > contain the same values. Missing values should be ignored. -egen
> > newvar = diff(varlist)- is not an option because it does not skip
> > missing values. The last command in the example below works but it
> > becomes impractical with a longer list of variables.
> >
> > . sysuse auto
> > . gen mpg2 = mpg if foreign==0
> > . gen mpg3 = mpg if foreign==1
> > . replace mpg3 = mpg+1 if rep78==2
> > . egen tag = diff(mpg mpg2 mpg3)
> > . gen tag2 = (mpg!=mpg2 & mpg<. & mpg2<. | mpg!=mpg3 & mpg<. & mpg3<.
> > | mpg2!=mpg3 & mpg2<. & mpg3<.)
> >
> > The -egen- command tags all observations, the -gen- command only those
> > that I expect to be tagged. Are there better solutions that can also
> > be used with ten or more variables?