Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Two datasets: Look for similar observations in the second dataset

From	Roberto Ferrer <[email protected]>
To	Stata Help <[email protected]>
Subject	Re: st: Two datasets: Look for similar observations in the second dataset
Date	Mon, 27 Jan 2014 00:22:37 -0430

Please follow Statalist policy and provide cross-references when
posting in other forums:
http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting

The following is one way of doing what you want. You could avoid the
-forvalues- loop if your dataset is not too big, because -joinby- could
then pair every sample firm with every population firm in one pass, but
I assume it is big. I haven't tested the speed on a large dataset, but I
hope this gets you started.

* ----------------------- begin code -----------------------

clear all
set more off

* Input fake datasets (the -dum- variable is added after each -input-)
input str1 company year size rat
A    2012    140    0.2
B    2011    200    0.4
C    2010    300    0.2
D    2010    160    0.5
end

gen dum = 1

tempfile samp
save "`samp'"

clear all
input str4 company year size rat
X       2012    150    0.19
XX      2012    150    0.20
XXX     2012    150    0.22
XXXX    2012    150    0.195
Y       2010    280    0.9
YY      2010    280    0.9
Z       2012     50    0.01
ZZ      2010    300    0.2
T       2011    200    0.95
U       2010    300    0.10
end

gen dum = 1

tempfile pop
save "`pop'"


* Main process
tempfile result
local lowlimit .8
local highlimit 1.2

quietly {
    * Read the # of observations in the sample file instead of hard-coding it
    describe using "`samp'"
    local nobs = r(N)
    forvalues i = 1/`nobs' {
      use "`samp'" in `i', clear
      rename (company year size rat) =0
      joinby dum using "`pop'"
      drop dum

      keep if year0 == year // compare companies with same year only
      keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)

      gen ratdif = abs(rat0 - rat)
      * Ties in -ratdif- are broken alphabetically by -company- name
      isid ratdif company, sort
      capture keep in 1/3

      if (`i' == 1) save "`result'"
      else {
        append using "`result'"
        save "`result'", replace
      }

    }

}

* Check and reshape
use "`result'", clear
isid company0 ratdif company, sort
list, sepby(company0)

keep company*
list, sepby(company0)

by company0: gen id = _n
reshape wide company, i(company0) j(id)
list, separator(0)

*------------------------- end code ------------------------
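
For comparison, here is a rough sketch of the loop-free variant mentioned
above, for the case where the data are small enough to hold every
same-year pair in memory at once. It is not the code I posted, and it
makes a few assumptions of its own: it joins on -year- directly instead of
on the constant -dum-, it keeps the same 0.8 and 1.2 size limits, and it
replaces -capture keep in 1/3- with a -by company0:- filter. Sample firms
with no match (D in the fake data) simply drop out, as they do in the
loop version.

```stata
clear all
set more off

* Population file (same fake data as above)
input str4 company year size rat
X       2012    150    0.19
XX      2012    150    0.20
XXX     2012    150    0.22
XXXX    2012    150    0.195
Y       2010    280    0.9
YY      2010    280    0.9
Z       2012     50    0.01
ZZ      2010    300    0.2
T       2011    200    0.95
U       2010    300    0.10
end
tempfile pop
save "`pop'"

* Sample file (same fake data as above)
clear
input str1 company year size rat
A    2012    140    0.2
B    2011    200    0.4
C    2010    300    0.2
D    2010    160    0.5
end
rename (company size rat) (company0 size0 rat0)

* Assumption: pair on -year- so only same-year combinations are formed
joinby year using "`pop'"

* Same size window as in the loop version
keep if inrange(size, .8*size0, 1.2*size0)

gen ratdif = abs(rat0 - rat)
* Ties in -ratdif- are broken alphabetically by -company- name
isid company0 ratdif company, sort
* Keep at most the three closest matches per sample firm
by company0: keep if _n <= 3

list company0 company ratdif, sepby(company0)
```

Joining on -year- forms fewer pairs than joining on -dum- and then
dropping the mismatched years, but it still creates all same-year pairs in
one go, so the loop remains the safer choice for a large population file.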

On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
<[email protected]> wrote:
> Sorry guys. Just wanted to get different opinions since it's a tough one.
>
> 2014-01-26 daniel klein <[email protected]>:
>> This is a triple post (with slight variations) that has already
>> generated two answers here
>>
>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>
>> http://www.stata-forum.de/post2400.html#p2400
>>
>>
>> Please see the FAQ concerning cross-postings
>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>
>>
>> Best
>> Daniel
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/


