Dear all,
thank you very much for the advice!
Sincerely yours,
Ekaterina
In message <[email protected]> [email protected] writes:
> Hi,
>
> Nick Cox already mentioned the 'duplicates' command and it's just a
> little twist to use it to find non-duplicates. "duplicates" is easy to
> set up and works with different types of vars.
>
> duplicates tag zipcode var1-var5, gen(dup)
>
> "dup" holds, for each observation, the number of other observations that
> are identical to it on zipcode and var1-var5 (0 if it is unique).
> If var1-var5 are constant within a zipcode group, dup + 1 equals the
> number of cases in the group (_N):
>
> bysort zipcode : assert _N == dup+1
>
> In case of errors there may be many ways to spot and correct them,
> depending on the size of the dataset, the number of vars to compare and
> possible sources of error. It may be useful to store _N for each
> zipcode group in a variable:
>
> bysort zipcode : gen N = _N
>
> The following code tabulates the non-constant vars by zipcode:
>
> levelsof zipcode if N != dup + 1, local(ziperror)
> foreach x of local ziperror {
>     di "Zipcode: `x'"
>     foreach y of varlist var1-var5 {
>         // run quietly, only to count the distinct non-missing values, r(r)
>         qui tab `y' if zipcode == "`x'"
>         // tabulate the var if it has more than one value
>         if r(r) > 1 & r(r) < . {
>             tab `y' if zipcode == "`x'"
>         }
>     }
> }
>
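> A possible alternative, not taken from the code above: a minimal sketch
> that flags the offending groups variable by variable instead of
> tabulating them. The names diff_* and anydiff are made up for
> illustration. Sorting a variable within zipcode puts its smallest value
> first and its largest last, so the two differ exactly when the variable
> is not constant in that group. Unlike -tab-, this treats a missing
> value as a distinct value.
>
> ```stata
> * Assumes zipcode and var1-var5 exist as in the example above.
> * diff_* and anydiff are hypothetical helper variables.
> foreach y of varlist var1-var5 {
>     * 1 if the var takes more than one value within the zipcode, else 0
>     bysort zipcode (`y') : gen byte diff_`y' = `y'[1] != `y'[_N]
> }
> * 1 if at least one var is non-constant within the zipcode
> egen byte anydiff = rowmax(diff_*)
> list zipcode var1-var5 if anydiff, sepby(zipcode)
> ```
>
> The listing then shows every case from a zipcode in which at least one
> var is not constant.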
>
> *** An example with an additional string var and some errors
> *** (the assert command is commented out)
>
>
> clear
> input str10 zipcode var1 /*
> */ var2 var3 var4 var5 str1 var6
> "0182801" 1252 144 115 113 29 "A"
> "0182801" 1253 144 115 123 29 "A"
> "0182801" 1253 144 115 113 29 "B"
> "0182801" 1253 144 115 113 29 "A"
> "0183204" 91 8 8 8 0 "C"
> "0183204" 90 8 8 8 0 "D"
> "0183331" 772 81 64 62 17 "E"
> "0183331" 772 81 64 62 17 "F"
> "0183331" 772 81 64 62 17 "E"
> "0183505" 1716 262 218 211 44 "A"
> "0183505" 1716 262 218 211 44 "A"
> end
>
> duplicates tag zipcode var1-var6, gen(dup)
> * bysort zipcode : assert _N == dup+1
> bysort zipcode : gen N = _N
> levelsof zipcode if N != dup + 1, local(ziperror)
> foreach x of local ziperror {
>     di ""
>     di "Zipcode: `x'"
>     foreach y of varlist var1-var6 {
>         // run quietly, only to count the distinct non-missing values, r(r)
>         qui tab `y' if zipcode == "`x'"
>         // show vars with more than one value
>         if r(r) > 1 & r(r) < . {
>             tab `y' if zipcode == "`x'"
>         }
>     }
> }
>
>
>
> Best wishes
> Stefan Gawrich
> Dillenburg
> Germany
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
--
Ekaterina Hertog (nee Korobtseva)
Nissan Institute of Japanese Studies
27 Winchester Road, Oxford
OX2 6NA