Dear Martin,
Thank you for your help.
I am learning Stata and found that -cf- did not work as I had expected,
so I asked for help to resolve my doubt.
Thank you very much!
Best regards,
Rose
----- Original Message -----
From: Martin Weiss <[email protected]>
To: <[email protected]>
Subject: st: RE: Is it necessary to sort data before using -cf-?
Date: 2009-11-29 18:36:31
At the end of the day, it is natural that a comparison of values of a
variable should be conducted row after row, so the -sort- order does matter
for it. The manual entry and help file do not mention this fact, but I feel
that it goes without saying. What else would you compare but the values line
by line?
Note how in the following code the datasets are both ordered by -rep78-.
Given that rep78 only features 5 distinct values, this -sort- order is not
unique, though. That is the reason for the existence of the -stable- option
to -sort-, btw...
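*******
sysuse auto, clear
sort rep78
save new.dta, replace
use new.dta, clear
// disturb the order by sorting on foreign first
sort foreign
// ends up being sorted by rep78 again, but ties may be ordered differently
sort rep78
cf _all using new.dta, verbose
*******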
With only 5 distinct values to go by, -sort- has to break ties among
observations at random, and only by chance will it produce the same order
twice. -cf- subsequently picks up these differences in row order as
mismatches.
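To illustrate the role of -stable-, here is a minimal sketch. It assumes both copies start from the same original order (here, a fresh -sysuse auto-), because -stable- only preserves the order that ties already had; it does not create a common order for datasets that arrive in different orders.

```stata
* Sketch: with -stable-, ties within rep78 keep their prior relative order,
* so two sorts from the same starting order give the same result.
sysuse auto, clear
sort rep78, stable
save new.dta, replace

sysuse auto, clear
sort rep78, stable
* Both copies are now in the same row order, so -cf- should find no mismatches
cf _all using new.dta, verbose
```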
See also Phil's http://www.stata-journal.com/sjpdf.html?articlenum=dm0019
and http://www.stata.com/support/faqs/lang/sort.html
There is a -findit compdta- package, which is quite old and runs under
-version 4.0-. It does, however, feature a -sort- option.
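A similar effect can be sketched with built-in commands by sorting both datasets on a key that uniquely identifies observations before calling -cf-. The choice of -make- as the key is an assumption that holds for auto.dta; -isid- checks it and stops with an error if the key is not unique.

```stata
* Sketch: bring both datasets into the same, unique order before comparing.
sysuse auto, clear
isid make              // verify that make uniquely identifies observations
sort make
save new.dta, replace

sysuse auto, clear
sort rep78             // some other order, as in the original question
sort make              // re-sort on the unique key
cf _all using new.dta, verbose
```

Because the sort key has no duplicates, there are no ties for -sort- to break, and the row order is fully determined.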
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Sunday, 29 November 2009 10:36
To: statalist
Subject: st: Is it necessary to sort data before using -cf-?
Dear statalists,
Is it necessary to sort data before using -cf-?
Without sorting, I found that two identical datasets are reported as
different. However, I found no mention of this in -help cf-.
If sorting is necessary, how should I choose the variable(s) to sort by when
I compare all the variables or only certain variables?
Must the sort variable be free of duplicates?
For example,
. sysuse auto,clear
(1978 Automobile Data)
. sort turn
. save new,replace
file new.dta saved
. sysuse auto,clear
(1978 Automobile Data)
. sort rep78
. cf _all using new
make: 74 mismatches
price: 74 mismatches
mpg: 69 mismatches
rep78: 63 mismatches
headroom: 64 mismatches
trunk: 72 mismatches
weight: 73 mismatches
length: 73 mismatches
turn: 71 mismatches
displacement: 72 mismatches
gear_ratio: 72 mismatches
foreign: 42 mismatches
r(9);
.
Could anyone help me? Thank you.
Best regards,
Rose
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/