From | [email protected] |
To | [email protected] |
Subject | st: Sorting by and testing within subsets |
Date | Tue, 11 Nov 2003 12:19:56 +0100 |
Greetings, Statalisters. I am currently working with a dataset which I suspect
contains numerous errors - faulty settings of classification variables and
such. Thus I want to run different logical tests to sort out which observations
I need to have a closer look at.

The dataset is set up like this:

PersonID  PersonInfo
1         A
2         A
2         A
2         B
3         A
3         A
4         B
5         C
6         C
.         .
.         .

I am interested in checking whether the information
registered on a person is consistent. In the example above (sorry I can't give
you the real deal, but it's sensitive information) we can see that Person 2 is
registered twice as A and once as B. Person 3 is registered twice, both times as A.

What I would like is a list which shows the persons who have
conflicting PersonInfo, one line for each person (only Person 2 in the example
above). I figure I have to sort the observations into groups by PersonID somehow and
then check within each group whether the registered information is consistent.
However, I'm having a hard time getting Stata to do so. My best suggestion so far would be

    bysort PersonID: gen dummy=1 if( <not all values of PersonInfo within the group are equal> )

but I'm
not able to specify what goes in the if condition, since I cannot seem to find
any function which counts the number of distinct values. Also, listing the
troublesome observations with only one line per person seems to be out of my grasp.

Any comments or suggestions you might have would be greatly appreciated.

Sincerely yours,
Steinar Fossedal
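
For illustration, one possible way to express the within-group check described above is sketched here. It is only a sketch under assumptions: PersonInfo is taken to be a string variable with no missing values, the toy data simply mirror the example table, and the variable names conflict and onetag are hypothetical. The idea is to sort PersonInfo within each PersonID and compare the first and last sorted values, which avoids counting distinct values altogether.

```stata
* Toy data mirroring the example table (the real data are not shown)
clear
input PersonID str1 PersonInfo
1 "A"
2 "A"
2 "A"
2 "B"
3 "A"
3 "A"
4 "B"
5 "C"
6 "C"
end

* Sort PersonInfo within each PersonID. If the first and last sorted
* values differ, the person has more than one distinct PersonInfo.
* (conflict is a hypothetical variable name.)
bysort PersonID (PersonInfo): gen byte conflict = PersonInfo[1] != PersonInfo[_N]

* Tag exactly one observation per person so each person is listed once.
* (onetag is a hypothetical variable name.)
egen byte onetag = tag(PersonID)

* One line per person with conflicting information
list PersonID if conflict & onetag
```

On the toy data this should list only PersonID 2. If PersonInfo can be missing, an empty string would sort first and count as a differing value, so such observations may need separate handling.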