
st: RE: RE: RE: RE: Question erase duplicates values

From	"Daniel Sepulveda-Adams" <[email protected]>
To	"'Nick Cox'" <[email protected]>, <[email protected]>
Subject	st: RE: RE: RE: RE: Question erase duplicates values
Date	Tue, 12 Aug 2008 13:27:19 -0500

Nick

Yes, I'm sure that I want to use -merge-.
I put three of these datasets together (using -append- and -merge- where
necessary), and that also generated duplicate values, so I was not able to
merge the last one. My thinking is that if I do it this way I will be able
to finish the merge with all four datasets. But maybe I'm wrong.

Daniel A. Sepulveda Adams
Research Scientist - PRIME Institute
College of Pharmacy - University of Minnesota
308 Harvard ST SE, Weaver Densford Hall, 7-159
Minneapolis, MN, 55455, USA
Phone: 612-624-8489
Cell Phone: 651-295-7771
Fax: 612-625-9931
Email: [email protected]

-----Original Message-----
From: Nick Cox [mailto:[email protected]] 
Sent: Tuesday, August 12, 2008 1:16 PM
To: Daniel Sepulveda-Adams
Subject: RE: RE: RE: RE: Question erase duplicates values

Sergiy's code, just given separately, will do what you ask for. 

That's not the difficulty at all. 

My point remains: Why do you expect a -merge- to work on the results? 

Consider just ID 1. Once you hide the fact that some of the observations
were for ID 1, a -merge- won't be able to do magic and rediscover that
fact. 
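
An illustrative sketch of the point about ID 1, not part of the original exchange: if the identifier is left intact, a many-to-one merge can attach a dataset with one row per ID to a dataset with repeated IDs. The variable `price` and the tempfile name `lookup` are hypothetical, and the `merge m:1` syntax assumes Stata 11 or later, which postdates this thread. Run it as a do-file so the tempfile macro stays defined.

```stata
* Hypothetical dataset with exactly one row per ID
clear
input ID price
1 10
2 20
3 30
end
tempfile lookup
save `lookup'

* Dataset with repeated IDs, copied from the example later in this thread
clear
input ID ndc units1 units2 units3
1 1 5 6 7
1 1 4 8 9
2 2 7 8 6
2 2 8 2 1
3 3 1 4 6
3 3 4 6 8
end

* Many-to-one merge: every row for a given ID receives that ID's price
merge m:1 ID using `lookup'
assert _merge == 3
list, sepby(ID)
```

Because every observation still carries its ID, both rows for ID 1 are matched. Nothing has to be rediscovered.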

Are you sure that you don't want an -append-, not a -merge-? 

Nick 
[email protected] 

Daniel Sepulveda-Adams

I'm doing that because it is the only way I know to create an ID that I
can use to combine this dataset with my other datasets, which have the
same ID.

As for your last paragraph, yes, you are correct. I used

duplicates drop ID, force

The reason for all of this is that I have four datasets that share the
same ID, but only one of them has duplicate values, so the only way I
know to merge them is to create an ID without the duplicated values.
Do you have any suggestions?

Nick Cox

Thanks for this, but I don't understand it at all. 

Why do you want to throw away information about your ID? If you map second
and higher occurrences of each ID to missing, you just create
duplicates of missing, and it is difficult to see how a -merge- could
then work properly. 

Ironically enough, the syntax 

duplicates drop ID

that I alluded to is illegal. Perhaps what you tried was -duplicates
drop ID, force- and that would have the effect you describe. 
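
A minimal sketch, not part of the original message, of what mapping second and higher occurrences of each ID to missing looks like on the example data from this thread. The variable names `obsno` and `ID2` are hypothetical, and this is not necessarily the code Sergiy posted.

```stata
clear
input ID ndc units1 units2 units3
1 1 5 6 7
1 1 4 8 9
2 2 7 8 6
2 2 8 2 1
3 3 1 4 6
3 3 4 6 8
end

* Remember the original row order so the sort within ID is reproducible
gen long obsno = _n

* Clone the identifier, then blank every second and later occurrence
clonevar ID2 = ID
bysort ID (obsno): replace ID2 = . if _n > 1

* Three of the six observations now share the same missing value in ID2
count if missing(ID2)
duplicates report ID2
list, sepby(ID)
```

The clone matches the layout requested, but the blanked rows are indistinguishable from one another on `ID2`, which is why it cannot serve as a merge key.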

Daniel Sepulveda-Adams

Sorry that I was not very precise. I understand your explanation; let me
see if I can be more precise. Example:

ID   ndc  units1  units2  units3
----------------------------------------
1    1    5       6       7
1    1    4       8       9
2    2    7       8       6
2    2    8       2       1
3    3    1       4       6
3    3    4       6       8

What I need is:
ID   ndc  units1  units2  units3
----------------------------------------
1    1    5       6       7
.    1    4       8       9
2    2    7       8       6
.    2    8       2       1
3    3    1       4       6
.    3    4       6       8

The command that I used was -duplicates drop ID-, but that dropped every
observation that was a duplicate, not just the duplicated values of the
variable ID.

Let me know if that helps to understand my problem.

Nick Cox

There is no code here and no example data to be clear on what you tried.

So, how can anyone answer this except by guessing? 

The fact that values of an identifier are repeated does not mean that
the dataset should be cleaned up by removing duplicates of the
identifier. That principle would wreak havoc on panel data. Cloning the
identifier makes no difference to that principle. What is true of the
original is true of the clone, necessarily. 

Perhaps you did something like 

. duplicates drop clonedid 

And -duplicates- refused. I am very pleased to hear that. I designed
that behaviour into -duplicates- to protect people from losing
information. 

Perhaps you did something else altogether, in which case please say
precisely what. 

Daniel Sepulveda-Adams

I'm trying to create a unique ID to merge two datasets. But the unique ID
is a variable that has many duplicate values, so what I did was clone the
variable and try to erase the duplicate values in the NEW variable only,
but I was not able to do that. Does anyone have an idea how to do that?
Thank you for your time.
