Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Two datasets: Look for similar observations in the second dataset

From	Amadou DIALLO <[email protected]>
To	[email protected]
Subject	Re: st: Two datasets: Look for similar observations in the second dataset
Date	Tue, 28 Jan 2014 20:48:04 +0100

Hi,
I'm at a conference, so I haven't looked at your data, which seem
complex, but I believe you can find a solution in this presentation by
Prof. Kit Baum (http://economics.adelaide.edu.au/research/seminars/Stata_Lecture4.pdf),
p. 194, particularly the code in nneighbor.ado, which you can customize
for your needs, perhaps combining it with the spirit of Roberto's code
(quoted below). Hope this helps.
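
For problem (2) in Torsten's message quoted below (a matching firm is
locked for three years before and after the year it was matched), here
is a minimal, untested sketch of one possible greedy approach. It is
not Roberto's method or the nneighbor.ado code. The toy data and the
names mcompany, myear, msize, mrat, sid, obs, blocked and chosen are
assumptions made up for illustration, and the 0.8/1.2 size band simply
reuses the limits from Roberto's code. Sample firms are processed in
order of name and year, which is also an assumption: a different order
can give different matches.

```stata
clear all
set more off

* Hypothetical sample firms (one row per firm-year to be matched)
input str4 company year size rat
B 2000 100 0.30
C 2002 100 0.30
D 2007 100 0.30
end
gen dum = 1
tempfile samp
save "`samp'"

* Hypothetical universe of matching firms
clear
input str4 company year size rat
A 2000 100 0.31
A 2002 100 0.31
A 2007 100 0.31
E 2002 110 0.50
end
gen dum = 1

* Form all sample/candidate pairs, then keep same year and similar size
rename (company year size rat) (mcompany myear msize mrat)
joinby dum using "`samp'"
drop dum
keep if myear == year & inrange(msize, .8*size, 1.2*size)
gen ratdif = abs(rat - mrat)

* One group per sample firm-year; best candidate first within group
egen long sid = group(company year)
sort sid ratdif mcompany
gen long obs = _n
gen byte blocked = 0
gen byte chosen = 0

summarize sid, meanonly
local nsamp = r(max)

forvalues s = 1/`nsamp' {
    * Lowest observation number = closest ratio among unblocked candidates
    summarize obs if sid == `s' & !blocked, meanonly
    if r(N) == 0 {
        continue
    }
    local k = r(min)
    quietly replace chosen = 1 in `k'
    local m = mcompany[`k']
    local y = myear[`k']
    * Lock the matched firm for the match year plus/minus three years
    quietly replace blocked = 1 if mcompany == "`m'" & abs(myear - `y') <= 3
}

keep if chosen
list company year mcompany ratdif, noobs
```

With these made-up rows the intent is that B takes A in 2000, C cannot
take A in 2002 and falls back to E, and D can take A again in 2007,
which mirrors the 2000/2002/2007 example in the quoted message.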

2014-01-27, Torsten Häberle <[email protected]>:
> Sorry, I have to answer again. I kind of solved the problem with the
> missing ratios. I found a way with the if/else command to match based
> on the closest size if the ratios are missing.
>
> However, I couldn't figure out a solution to problem (2), namely:
> different sample firms can be matched to the same matching firm. To
> make my matching perfect, it would be great if the loop could be
> extended in the following way.
>
> - If a sample firm B is matched to a matching firm A in year X (2000),
> then drop matching firm A from the universe of all matching firms for
> the years X-3 (1997) through X+3 (2003).
> - This means that matching firm A could be matched again with another
> sample firm, but only in years OTHER than those in the example above.
> - For example, if there is another sample firm in 2007, it could be
> matched with matching firm A in 2007. However, a sample firm in 2002
> could NOT be matched with A, because A was already matched to sample
> firm B in 2000.
> - In summary, once a matching firm has been matched with a sample firm,
> it cannot be a match again in the three years before or the three years
> after the year of that match, but it can be a match in all other years.
> A second match would in turn lock its own "7-year period".
>
> Sorry, this is an even more complex extension.
>
> Thanks again so much.
>
> 2014-01-27 Roberto Ferrer <[email protected]>:
>> Please follow Statalist policy and provide cross-references when
>> posting in other forums:
>> http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting
>>
>> The following is one way of doing what you want. You could avoid the
>> -forvalues- loop if your database is not too big, but I assume it is.
>> I didn't test speed with a big data set but I hope it gets you
>> started.
>>
>> * ----------------------- begin code -----------------------
>>
>> clear all
>> set more off
>>
>> * Input fake databases (including -dum- variable)
>> input str1 company year size rat
>> A  2012  140  0.2
>> B  2011  200  0.4
>> C  2010  300  0.2
>> D  2010  160  0.5
>> end
>>
>> gen dum = 1
>>
>> tempfile samp
>> save "`samp'"
>>
>> clear all
>> input str4 company year size rat
>> X     2012  150  0.19
>> XX    2012  150  0.20
>> XXX   2012  150  0.22
>> XXXX  2012  150  0.195
>> Y     2010  280  0.9
>> YY    2010  280  0.9
>> Z     2012   50  0.01
>> ZZ    2010  300  0.2
>> T     2011  200  0.95
>> U     2010  300  0.10
>> end
>>
>> gen dum = 1
>>
>> tempfile pop
>> save "`pop'"
>>
>>
>> * Main process
>> tempfile result
>> local lowlimit .8
>> local highlimit 1.2
>>
>> quietly {
>>     forvalues i = 1/4 { // 4 is # observations in sample file
>>       use "`samp'" in `i', clear
>>       rename (company year size rat) =0
>>       joinby dum using "`pop'"
>>       drop dum
>>
>>       keep if year0 == year // compare companies with same year only
>>       keep if inrange(size, `lowlimit'*size0, `highlimit'*size0)
>>
>>       gen ratdif = abs(rat0 - rat)
>>       * Ties in -ratdif- are broken alphabetically by -company- name
>>       isid ratdif company, sort
>>       capture keep in 1/3
>>
>>       if (`i' == 1) save "`result'"
>>       else {
>>         append using "`result'"
>>         save "`result'", replace
>>       }
>>
>>     }
>>
>> }
>>
>> * Check and reshape
>> use "`result'", clear
>> isid company0 ratdif company, sort
>> list, sepby(company0)
>>
>> keep company*
>> list, sepby(company0)
>>
>> by company0: gen id = _n
>> reshape wide company, i(company0) j(id)
>> list, separator(0)
>>
>> *------------------------- end code ------------------------
>>
>> On Sun, Jan 26, 2014 at 4:18 PM, Torsten Häberle
>> <[email protected]> wrote:
>>> Sorry guys. Just wanted to get different opinions since it's a tough
>>> one.
>>>
>>> 2014-01-26 daniel klein <[email protected]>:
>>>> This is a triple post (with slight variations) that has already
>>>> generated two answers here
>>>>
>>>> http://www.talkstats.com/showthread.php/53371-Find-matching-firms-in-another-dataset
>>>>
>>>> http://www.stata-forum.de/post2400.html#p2400
>>>>
>>>>
>>>> Please see the FAQ concerning cross-postings
>>>> (http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting)
>>>>
>>>>
>>>> Best
>>>> Daniel
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/


-- 
Amadou B. DIALLO, PhD.
Senior Economist, AfDB.
[email protected]
+21671101789

