Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: using Stata to detect interviewer fraud
From: Robert Picard <[email protected]>
To: [email protected]
Subject: Re: st: using Stata to detect interviewer fraud
Date: Sat, 1 May 2010 18:49:54 -0400
Here's a quick and simple way to do it. It does not treat missing
values specially (two missing values count as a match), but that
should be easy to adjust. If I look for cars that have the same values
on more than 70% of the variables, I find that the Dodge Diplomat is
very similar to the Dodge Magnum.
Hope this helps,
Robert
*--------------------------- begin example -----------------------
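version 11
clear all
sysuse auto
* variables to compare (captured before the id variable is added)
unab vlist: *
gen id1 = _n
tempfile f
qui save "`f'"
* pair every observation (id2) with every observation (id1)
rename id1 id2
cross using "`f'"
* sort each id1 group so that the self-pair comes first
gen diffid = id1 != id2
sort id1 diffid id2
* count the variables on which each row matches the first row of its group
gen nmatch = 0
foreach v of local vlist {
    qui by id1: replace nmatch = nmatch + (`v'[1] == `v')
}
* nmatch[1] is the self-pair count, i.e. the number of variables compared
by id1: gen similar = nmatch / nmatch[1] > .7
* check counts the flagged rows per group, including the self-pair
by id1: egen check = total(similar)
list id1 id2 make-foreign if check>1 & similar, noobs sepby(id1)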
*--------------------- end example --------------------------
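The missing-value adjustment mentioned above could look something like the following. This is only a sketch under stated assumptions, not a definitive version: for each pair it counts the items that are nonmissing in both records (nboth) and the matches among those items (nmatch), then lists each pair once. The variable names nboth, both, and prop and the .7 cutoff are illustrative choices, not part of the example above.

```stata
version 11
clear all
sysuse auto
unab vlist: *
gen id1 = _n
tempfile f
qui save "`f'"
rename id1 id2
cross using "`f'"
gen diffid = id1 != id2
sort id1 diffid id2
gen nmatch = 0
gen nboth  = 0
foreach v of local vlist {
    * both = 1 if the item is nonmissing in the id1 record and in this row
    qui by id1: gen byte both = !missing(`v'[1]) & !missing(`v')
    * count a match only when both records have a value
    qui by id1: replace nmatch = nmatch + (both & `v'[1] == `v')
    qui replace nboth = nboth + both
    drop both
}
* proportion of jointly nonmissing items that are identical
gen prop = nmatch / nboth
* id1 < id2 drops the self-pair and keeps each pair once
list id1 id2 nmatch nboth prop if id1 < id2 & prop > .7 & prop < ., noobs
```

For survey data, replacing the `unab vlist: *` line with the list of questionnaire items and raising the cutoff to .8 should give the proportion described in the original question, assuming the items are stored one per variable.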
On Fri, Apr 30, 2010 at 11:16 PM, Michelson, Ethan <[email protected]> wrote:
> I'd be deeply grateful for help writing a more efficient, more parsimonious .do file to help detect interviewer fraud. After completing a survey of 2,500 households, I discovered that a few interviewers copied each other's questionnaires. I decided to write some code that calculates the proportion of all nonmissing questionnaire items that are identical across every other questionnaire. Although my .do file accomplishes this task, I strongly suspect I'm making Stata do tons of unnecessary work. It takes Stata about 12 hours to process 505 questionnaires (from a single survey site, since I can rule out the possibility that interviewers conspired across different survey sites).
>
> In the following code, "id" is the unique questionnaire id. There are 505 questionnaires in this batch. The final command at the bottom asks Stata to list combinations of questionnaires with >80% identical content. I have no doubt there's a far more efficient way to do this. I'd really appreciate any advice anyone can offer.
>
> ********************
> sort id
> gen order=0
> gen add=-1
> replace order=1 if _n==1
> levels id, local(levels)
> foreach l of local levels {
>     gen same_`l'=0
>     gen all_`l'=0
> }
> forv n = 1(1)504 {
>     foreach l of local levels {
>         foreach var of varlist a1* a2* a3* b* d* c1 c12 c23 c34 c44 c55 c67 c77 c88 c100 c107 c116 c126 c136 c144 c155 c165 c176 c185 c195 {
>             quietly replace same_`l'=same_`l'+1 if `var'==`var'[_n+`n']&`var'~=.&id[_n+`n']==`l'
>             quietly replace all_`l'=all_`l'+1 if `var'~=.&`var'[_n+`n']~=.&id[_n+`n']==`l'
>             display "`l' `n'"
>         }
>     }
>     quietly replace order=add if order==1
>     quietly replace add=add-1
>     gsort -order id
>     quietly replace order=1 if _n==1
> }
> foreach l of local levels {
>     gen prop_`l'=same_`l'/all_`l'*100
> }
> foreach l of local levels {
>     list id prop_`l' same_`l' all_`l' if prop_`l'>80&prop_`l'<.
> }
>
> ******************
>
> Ethan Michelson
> Departments of Sociology and East Asian Languages & Cultures, Associate Professor
> Maurer School of Law, Associate Professor of Sociology and Law
> mail address:
> Department of Sociology
> Indiana University
> 744 Ballantine Hall
> 1020 E. Kirkwood Ave.
> Bloomington, IN 47405
> Phone: (812) 856-1521
> Fax: (812) 855-0781
> Email: [email protected]
> URL: http://www.indiana.edu/~emsoc/
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>