Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: merge and nearest value
From
Francesco <[email protected]>
To
[email protected]
Subject
Re: st: merge and nearest value
Date
Sun, 19 Aug 2012 14:11:45 +0200
Well David my apologies because your idea is indeed really
appropriate, after some slight data manipulation
Indeed I have just to drop the Type-Date duplicates in A, use your
joinby suggestion (thus obtaining dataset C) and then perform a m:1
merge between A and C.
Thank you !
Best
On 19 August 2012 12:07, Francesco <[email protected]> wrote:
> Dear David,
>
> Many thanks for your interesting suggestion!
> Unfortunately it cannot work because
> 1) There can be numerous duplicates of TYPE and DATE in A. However in
> B TYPE and DATE uniquely idenfity observations
> 2) I have several million of observations, therefore using a joinby
> would probably destroy my computer's Ram ...
>
> :-(
>
> On 19 August 2012 05:13, David Kantor <[email protected]> wrote:
>> First, I presume that in A, TYPE and DATE uniquely identify observations.
>>
>> I suggest you do a -joinby- on TYPE. This will create a large multitude of
>> observations.
>> Then for each distinct TYPE and DATE combination, compute the difference and
>> then select (by TYPE and DATE) the one with the minimal difference.
>> You can do appropriate sorting to break ties in the manner you desire.
>>
>> HTH
>> --David
>>
>>
>> At 07:22 PM 8/18/2012, Francesco wrote:
>>>
>>> Dear Statalist,
>>>
>>> I wish again that you could help me with this particular merging
>>> problem...
>>>
>>> Let say I have a dataset A as:
>>>
>>> TYPE DATE
>>> A 2
>>> A 5
>>> A 20
>>> B 10
>>> B 2
>>>
>>>
>>> and I have another dataset B as :
>>>
>>>
>>> TYPE Special_Date
>>> A 2
>>> A 6
>>> A 20
>>> A 22
>>> B 5
>>> B 6
>>>
>>> The question is : I would like to obtain the difference between the
>>> date of each observation in A and the closest special date in B with
>>> the same type. In case of ties I would take the latest date of the
>>> two.
>>>
>>> For example I would obtain here
>>>
>>> TYPE DATE Difference
>>> A 2 0=2-2
>>> A 5 -1=5-6
>>> A 20 0=20-20
>>> B 10 +4=10-6
>>> B 2 -3=2-5
>>>
>>>
>>> I was thinking of reshaping the dataset B in order to have the special
>>> dates in column for each type, merging then on type with A, creating a
>>> difference variable between the date and each special date, and taking
>>> the minimum...
>>> But this involves creating a lot of variables and maybe there is
>>> something more simple ?
>>>
>>> Many thanks for your suggestions,
>>>
>>> Best Regards,
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/