Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Two datasets: Look for similar observations in the second dataset
From: Roberto Ferrer <[email protected]>
To: Stata Help <[email protected]>
Subject: Re: st: Two datasets: Look for similar observations in the second dataset
Date: Tue, 28 Jan 2014 18:32:28 -0430
I'm afraid I can't help you with your modified problem, but maybe some
other user will.
Let me comment briefly that your problem reminds me of one posted on
this list some weeks ago:
http://www.stata.com/statalist/archive/2014-01/msg00190.html
As stated, your problem has multiple solutions because the order in
which you match the firms will affect the remaining possible matches.
If you cannot justify the order in which you do the matches, your
results will rest on an arbitrary choice and will be hard to defend.
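
To make the order dependence concrete, here is a minimal sketch of a
greedy pass that applies the 7-year lockout rule described in the
message quoted below. It is only an illustration under stated
assumptions: the toy candidate pairs, the variables -chosen- and -done-,
the choice of one match per sample firm, and the processing order
(earliest year first, then firm name) are all hypothetical and are not
part of the code posted earlier in this thread.

```stata
* ----------------------- begin code -----------------------
clear all
set more off

* Hypothetical candidate pairs: one row per (sample firm, matching firm)
* pair that has already passed the same-year and size filters
input str1 company0 year0 str4 company ratdif
B 2000 A 0.01
B 2000 K 0.05
C 2002 A 0.00
C 2002 K 0.02
D 2007 A 0.03
end

* ASSUMPTION: process sample firms by year, then name; within a sample
* firm, try candidates from the smallest -ratdif- upward
sort year0 company0 ratdif company

gen byte chosen = 0   // 1 if this pair is kept as a match
gen byte done   = 0   // 1 if the sample firm already has its match

local N = _N
quietly {
    forvalues i = 1/`N' {
        if (done[`i']) continue
        local m = company[`i']
        local y = year0[`i']
        * Was matching firm `m' already chosen within 3 years of `y'?
        count if chosen & company == "`m'" & abs(year0 - `y') <= 3
        if (r(N) == 0) {
            replace chosen = 1 in `i'
            replace done = 1 if company0 == company0[`i']
        }
    }
}

keep if chosen
list company0 year0 company, noobs
*------------------------- end code ------------------------
```

With this order, B (2000) takes A, C (2002) is refused A because 2002
falls inside A's locked window and takes K instead, and D (2007) can use
A again. If C were processed before B, C would take A and B would be
left with K, even though the data are identical. That is the sense in
which the order of matching has to be justified.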
On Mon, Jan 27, 2014 at 4:35 PM, Torsten Häberle
<[email protected]> wrote:
> Sorry, I have to answer again. I kind of solved the problem with the
> missing ratios. I found a way with the if/else command to match based
> on the closest size if the ratios are missing.
>
> However, I couldn't figure out a solution to problem (2), namely:
> different sample firms can be matched to the same matching firm. To
> make my matching perfect, it would be great if the loop could be
> extended in the following way.
>
> - If a sample firm B is matched to a matching firm A in year X (2000),
> then remove matching firm A from the pool of candidate matching
> firms for the years X-3 through X+3 (1997 through 2003).
> - This means that matching firm A can be matched again with another
> sample firm, but only in years OTHER than those in that window.
> - For example, a sample firm in 2007 could be matched with our
> matching firm A in 2007. A sample firm in 2002, however, could NOT
> be matched with firm A, because A was already matched to sample
> firm B in 2000.
> - In summary, once a matching firm has been matched with a sample firm,
> it cannot be a match again in the three years before or the three
> years after that match, but it can be a match in all other years.
> A second match would lock its own "7-year period" in the same way.
>
> Sorry, this is an even more complex extension.
>
> Thanks again so much.
>
> 2014-01-27 Roberto Ferrer <[email protected]>:
>> Please follow Statalist policy and provide cross-references when
>> posting in other forums:
>> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>>
>> The following is one way of doing what you want. You could avoid the
>> -forvalues- loop if your database is not too big, but I assume it is.
>> I didn't test speed with a big data set but I hope it gets you
>> started.
>>
>> * ----------------------- begin code -----------------------
>>
>> clear all
>> set more off
>>
>> * Input fake databases (including -dum- variable)
>> input str1 company year size rat
>> A 2012 140 0.2
>> B 2011 200 0.4
>> C 2010 300 0.2
>> D 2010 160 0.5
>> end
>>
>> gen dum = 1
>>
>> tempfile samp
>> save "`samp'"
>>
>> clear all
>> input str4 company year size rat
>> X 2012 150 0.19
>> XX 2012 150 0.20
>> XXX 2012 150 0.22
>> XXXX 2012 150 0.195
>> Y 2010 280 0.9
>> YY 2010 280 0.9
>> Z 2012 50 0.01
>> ZZ 2010 300 0.2
>> T 2011 200 0.95
>> U 2010 300 0.10
>> end
>>
>> gen dum = 1
>>
>> tempfile pop
>> save "`pop'"
>>
>>
>> * Main process
>> tempfile result
>> local lowlimit .8
>> local highlimit 1.2
>>
>> quietly {
>>     forvalues i = 1/4 { // 4 is # observations in sample file
>>         use "`samp'" in `i', clear
>>         rename (company year size rat) =0
>>         joinby dum using "`pop'"
>>         drop dum
>>
>>         keep if year0 == year // compare companies with same year only
>>         keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>>
>>         gen ratdif = abs(rat0 - rat)
>>         * Ties in -ratdif- are broken alphabetically by -company- name
>>         isid ratdif company, sort
>>         capture keep in 1/3
>>
>>         if (`i' == 1) save "`result'"
>>         else {
>>             append using "`result'"
>>             save "`result'", replace
>>         }
>>
>>     }
>>
>> }
>>
>> * Check and reshape
>> use "`result'", clear
>> isid company0 ratdif company, sort
>> list, sepby(company0)
>>
>> keep company*
>> list, sepby(company0)
>>
>> by company0: gen id = _n
>> reshape wide company, i(company0) j(id)
>> list, separator(0)
>>
>> *------------------------- end code ------------------------
>>
>> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
>> <[email protected]> wrote:
>>> Sorry guys. Just wanted to get different opinions since it's a tough one.
>>>
>>> 2014-01-26 daniel klein <[email protected]>:
>>>> This is a triple post (with slight variations) that has already
>>>> generated two answers here
>>>>
>>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>>
>>>> http://www.stata-forum.de/post2400.html#p2400
>>>>
>>>>
>>>> Please see the FAQ concerning cross-postings
>>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>>
>>>>
>>>> Best
>>>> Daniel
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/