Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: drop duplicates iff
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: drop duplicates iff
Date: Tue, 16 Apr 2013 15:08:56 +0100
It sounds as if you want
bysort ObjektID (day_of_week) : gen day_of_week2 = day_of_week if _n == 1
Nick
[email protected]
Terminology is tricky:
-duplicates- is a command, not a function.
-replace- with missing is not deletion.
An observation in Stata is an entire row, case, or record, not an
individual value of a variable.
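
A minimal sketch of the command above, run on the example data from the question quoted below. The -input- block simply retypes those seven observations; day_of_week2 is assumed not to exist yet, since -generate- stops with an error if the variable is already defined.

```stata
clear
input ObjektID day_of_week
3063 5
3066 3
3066 3
3066 3
3066 3
3066 3
3069 2
end

* Copy day_of_week into the first observation of each ObjektID only;
* every other observation in the group is left missing.
bysort ObjektID (day_of_week) : gen day_of_week2 = day_of_week if _n == 1

list, sepby(ObjektID) noobs
```

For ObjektID 3066 this leaves 3 in the first observation and missing in the other four. If day_of_week2 is already in the dataset, one alternative (an assumption about the setup, not part of the original advice) is

    bysort ObjektID (day_of_week) : replace day_of_week2 = . if _n > 1

The alternating pattern in the question arises because -replace- works through the observations in order, so day_of_week2[_n-1] is the value after any replacement already made. Once the second observation is set to missing, the third is compared with missing, not with 3, and so is left alone.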
On 16 April 2013 14:57, Joel Jönsson <[email protected]> wrote:
> I have the following problem. I'm trying to delete observations (replace by [.] ) for values that were filled in automatically when I merged two data sets with different numbers of ID observations. These values are duplicates. However, I do not wish to use "duplicates drop", since this drops observations containing information in other variables. I cannot (as far as I know) control for other variables by adding them to the "duplicates drop" function, since the information in the variable containing the most observations is unique for each observation and must not be dropped.
>
> I tried the following, only to realize that only the first and third duplicates were replaced, leaving the second and fourth duplicates intact.
>
> replace day_of_week2 = cond(day_of_week2[_n] == day_of_week2[_n-1], ., day_of_week2)
>
> This yields:
>
> ObjektID day_of_week day_of_week2
> 3063 5 5
> 3066 3 3
> 3066 3 .
> 3066 3 3
> 3066 3 .
> 3066 3 3
> 3069 2 2
>
> In this case, I would like to have all the 3 removed. Any suggestions on how to improve the code?
>
> Best,
>
> Joel
>
>
> On Apr 15, 2013, at 11:07 AM, Nick Cox wrote:
>
>> You don't.
>>
>> From what you say, you want
>>
>> duplicates drop apartment_id bidder_id
>>
>> If that would result in loss of information, -duplicates- will tell
>> you. -duplicates- is dedicated to being careful about loss of
>> information.
>>
>> Nick
>> [email protected]
>>
>>
>> On 15 April 2013 09:27, Joel Jönsson <[email protected]> wrote:
>>> Thanks for your quick response Nick. I have been looking at the documentation (help duplicates).
>>> My problem is to isolate the removal of duplicates to one Apartment-ID at a time. Which command [if] [in] [bysort] [group] do I use?
>>>
>>> On Apr 15, 2013, at 1:48 AM, Nick Cox wrote:
>>>
>>>> Did you try looking at the documentation? There is a -duplicates-
>>>> command. Once you have used it to remove duplicates, the second
>>>> question is
>>>>
>>>> bysort Apartment_ID : replace Bidder_ID = _n
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 14 April 2013 23:19, Joel Jönsson <[email protected]> wrote:
>>>>> Dear all Statalist users.
>>>>>
>>>>> I'm quite new to Stata and I'm facing the following challenge. I wish to get rid of duplicates within a
>>>>> variable (Bidder-ID) for a specific observation number (Apartment-ID) only, i.e. there are numerous
>>>>> observations with the values 49, 50, 51 etc. within Bidder-ID, each of which is allowed only once
>>>>> within the same Apartment-ID.
>>>>>
>>>>> _n Apartment-ID Bidder-ID
>>>>>
>>>>> 1. 3345 49
>>>>> 2. 3345 49
>>>>> 3. 3345 50
>>>>> 4. 3345 51
>>>>> 5. 3345 50
>>>>> 6. 5780 49
>>>>> 7. 5780 50
>>>>> 8. 5780 49
>>>>>
>>>>> I would like the result to look something like the following:
>>>>>
>>>>> _n Apartment-ID Bidder-ID
>>>>> 1. 3345 49
>>>>> 2. 3345 50
>>>>> 3. 3345 51
>>>>> 4. 5780 49
>>>>> 5. 5780 50
>>>>>
>>>>> Also, I wish to rename the observations in Bidder-ID (49, 50, 51), which could also take on numbers
>>>>> such as 2234, 2244, 2255 (each symbolizes one unique bidder), to take on values equal to when they first
>>>>> appeared in Apartment-ID. So, if Bidder-ID 49, 50, 51, 2234, 2244, 2255 exist for the same
>>>>> Apartment-ID, then 49=1, 50=2, 51=3, 2234=4 etc., not necessarily in that order (2234=2, 51=1, 49=4 …).
>>>>> Thus, it would look something like this:
>>>>>
>>>>> _n Apartment-ID Bidder-ID
>>>>> 1. 3345 1
>>>>> 2. 3345 2
>>>>> 3. 3345 3
>>>>> 4. 5780 1
>>>>> 5. 5780 2