Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Comparing two data set
From: Rajaram Subramanian Potty <[email protected]>
To: [email protected]
Subject: Re: st: Comparing two data set
Date: Wed, 2 Mar 2011 14:55:34 +0530
Dear Nick,
Thanks for the information. I have used the -cf- command two or three
times to identify errors in two data files. However, I want the errors
to be displayed according to the ID variable. At present, -cf- reports
errors by observation number in the Stata dataset and not by the ID
variable. If I could list the errors according to the ID variable, it
would be easy for us to trace the questionnaire and find the data entry
error. So, I just want to know whether it is possible to get the errors
listed by the ID variable.
Thanks and regards,
RAJARAM. S
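
One possible way to get the differences listed by ID rather than by observation number is to -merge- the two files on the ID variable and compare each variable with its counterpart. The following is only a sketch under stated assumptions: the file names entry1.dta and entry2.dta and the variable name id are hypothetical placeholders, both files are assumed to contain the same variable names with the same storage types (string versus numeric), id is assumed to identify observations uniquely in each file, and variable names are assumed short enough to take a _2 suffix. It is meant to be run from a do-file.

```stata
* Sketch only: entry1.dta, entry2.dta and the variable name id are
* hypothetical placeholders for the two data-entry files and their ID variable.
use entry2, clear
isid id                          // stop if id does not uniquely identify observations
ds id, not                       // every variable except id
local vars `r(varlist)'
foreach v of local vars {
    rename `v' `v'_2             // suffix marks the second entry
}
tempfile second
save `second'

use entry1, clear
isid id
merge 1:1 id using `second'      // -merge 1:1- syntax needs Stata 11 or later

* IDs that appear in only one of the two files
list id _merge if _merge != 3, noobs

* For IDs in both files, list every variable whose two entries differ
foreach v of local vars {
    quietly count if `v' != `v'_2 & _merge == 3
    if r(N) > 0 {
        display as text _n "Mismatches in `v':"
        list id `v' `v'_2 if `v' != `v'_2 & _merge == 3, noobs
    }
}
```

Each listing shows the ID together with the two conflicting entries, so the questionnaire can be pulled directly. A simpler alternative is to -sort id- in both files before running -cf-, but that only lines up correctly when both files contain exactly the same set of IDs.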
On Wed, Mar 2, 2011 at 2:44 PM, Nick Cox <[email protected]> wrote:
> One way is to check that the .dta or other data files are identical
> using your operating system.
>
> Also, check out -cf- and -dta_equal-.
>
> Another way to approach this is to -append- the datasets and look for
> -duplicates-. However, -duplicates- just looks for duplicate
> observations. In principle, the variable names, variable labels, value
> labels, formats and characteristics must also be shown to be
> identical.
>
> To do this last, you will need to create a dataset identifier so that
> you can work out where any anomalies are.
>
> Here is an example where by construction the interesting part of the
> data is identical. So, -duplicates- confirms that everything occurs
> twice. Conversely, mismatches would imply singletons, triplicates,
> etc.
>
> . sysuse auto
> (1978 Automobile Data)
>
> . gen ds = 1
>
> . save auto1
> file auto1.dta saved
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . gen ds = 2
>
> . append using auto1
> (label origin already defined)
>
>
> . tab ds
>
>          ds |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |         74       50.00       50.00
>           2 |         74       50.00      100.00
> ------------+-----------------------------------
>       Total |        148      100.00
>
> . duplicates report make-foreign
>
> Duplicates in terms of make price mpg rep78 headroom trunk weight
> length turn displacement gear_ratio foreign
>
> --------------------------------------
>    copies | observations       surplus
> ----------+---------------------------
>         2 |          148            74
> --------------------------------------
>
> Nick
>
> On Wed, Mar 2, 2011 at 9:01 AM, Rajaram Subramanian Potty
> <[email protected]> wrote:
>
>> We carried out a survey, and the data from the survey were entered
>> twice. Now we want to compare these two data files for possible
>> data entry errors. Please advise how to compare the two data files
>> and identify the data entry errors using Stata.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
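
For the -append- and -duplicates- approach quoted above, the mismatching observations can also be pulled out by name instead of only being counted. The sketch below extends the auto example on the assumption that make plays the role of the ID variable; the altered price in observation 5 is a discrepancy introduced deliberately for illustration, and the variable name ndup is a hypothetical choice.

```stata
* Sketch only: reuses the auto data from the quoted example, with make
* standing in for the ID variable. Run from a do-file.
sysuse auto, clear
gen ds = 1
tempfile first
save `first'

sysuse auto, clear
gen ds = 2
replace price = price + 1 in 5   // deliberately introduce one discrepancy
append using `first'

* ndup counts the other copies of each observation: 1 means a matched
* pair, anything else means the two entries disagree somewhere
duplicates tag make-foreign, generate(ndup)
sort make ds
list make ds price if ndup != 1, sepby(make)
```

With real data, the -list- command would name the variables of interest rather than price alone. The listing only says which IDs disagree, not which variable is responsible, so it works best as a first screen.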