Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Comparing two data set
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Comparing two data set
Date: Wed, 2 Mar 2011 09:14:12 +0000
One way is to check that the .dta or other data files are identical
using your operating system.
Also, check out -cf- and -dta_equal-.
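A minimal sketch of -cf-, assuming the two rounds of entry are saved as entry1.dta and entry2.dta (hypothetical names, built here from the auto data with one simulated keying error). -cf- compares values observation by observation, so both files need the same number of observations and the same sort order (in real data, sort both by the questionnaire ID). It exits with return code 9 if anything differs, and -verbose- lists each differing observation.

```stata
* Hypothetical files entry1.dta and entry2.dta stand in for the two
* rounds of data entry; here they are built from the auto data
sysuse auto, clear
sort make
save entry1, replace

* Simulate a single keying error in the second round (hypothetical)
replace price = price + 1 in 5
save entry2, replace

* cf compares observation by observation, so both files must be
* sorted the same way before comparing
use entry1, clear
capture noisily cf _all using entry2, verbose

* _rc is 9 if any differences were found, 0 if all values match
display "cf return code: " _rc
```

-cf- compares values only, so on its own it does not check variable labels, value labels, formats or characteristics.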
Another way to approach this is to -append- the datasets and look for
-duplicates-. However, -duplicates- only compares values across
observations. In principle, the variable names, variable labels, value
labels, formats and characteristics must also be shown to be
identical.
For the -append- approach, you will need to create a dataset identifier
so that you can work out where any anomalies are.
Here is an example where, by construction, the interesting part of the
data is identical, so -duplicates- confirms that everything occurs
twice. Conversely, mismatches would show up as singletons, triplicates,
etc.
. sysuse auto
(1978 Automobile Data)
. gen ds = 1
. save auto1
file auto1.dta saved
. sysuse auto, clear
(1978 Automobile Data)
. gen ds = 2
. append using auto1
(label origin already defined)
. tab ds
         ds |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         74       50.00       50.00
          2 |         74       50.00      100.00
------------+-----------------------------------
      Total |        148      100.00
. duplicates report make-foreign
Duplicates in terms of make price mpg rep78 headroom trunk weight length turn
    displacement gear_ratio foreign

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        2 |          148            74
--------------------------------------
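To locate mismatches rather than just count them, one option (a sketch, not part of the reply above) is -duplicates tag-, which records for each observation how many *other* observations are identical on the listed variables. With two copies of the data, a value of 0 flags a record with no identical twin in the other round. The keying error injected below is hypothetical.

```stata
* Rebuild the two copies with a dataset identifier
sysuse auto, clear
generate ds = 1
save auto1, replace
sysuse auto, clear
generate ds = 2
* Inject one hypothetical data-entry error into copy 2
replace mpg = mpg + 10 in 3
append using auto1

* ncopies = number of OTHER observations identical on make-foreign
duplicates tag make-foreign, generate(ncopies)

* Singletons (ncopies == 0) are records whose two entries disagree
sort make ds
list ds make mpg if ncopies == 0, sepby(make) noobs
```

In real double-entry data, -make- would be replaced by the questionnaire ID: any ID listed here was keyed differently in the two rounds, and comparing its two rows shows which variable is at fault.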
Nick
On Wed, Mar 2, 2011 at 9:01 AM, Rajaram Subramanian Potty
<[email protected]> wrote:
> We carried out a survey and the data from the survey were entered
> twice. Now we want to compare these two data files for possible
> data entry errors. Please tell us how to compare the two data files
> and identify the data entry errors using Stata.