(I'm reposting the original mailing and my reply.
The original mailing was HTML, which I spotted and
corrected for, but it also carried an accompanying
winmail.dat, which I didn't spot; that stuck to my
reply like dirt on a shoe. The original posting will
appear as complete gibberish to recipients of the
digest version of the list. As often mentioned, please
do _not_ send mailjunk to the list.)
================================
Radu Ban
> The data are organized like this (the numbers are made up for this
> description):
>
> id dummy descriptor
> 13 1 <blank>
> 13 0 abc
> 13 1 <blank>
> 14 0 <blank>
> 14 0 def
> 14 0 def
>
> The idea is that the id variable should be unique, but for some
> reason it is not. This means that both the dummy and the descriptor
> should take the same values within each id group. A complication
> is that for the dummy, if there is a "1" anywhere in a group, the
> whole group should be "1".
>
> I want to reduce this to a clean version which looks like this:
>
> id dummy descriptor
> 13 1 abc
> 14 0 def
>
> For the dummy part I dealt with it like this (probably a convoluted
> method):
> bysort id: egen maxdummy = max(dummy)
> replace dummy = maxdummy
> bysort id: keep if _n == 1
>
> But I am a bit stuck on how to deal with the string descriptor. I
> know one way of doing it, by splitting the data and then merging
> it back, but there has to be a more efficient way.
I think you are right: you can do all you want in one place.
The dummy can be sorted out your way, or this way:
bysort id (dummy) : replace dummy = dummy[_N]
as 1s will get sorted to the end.
If I understand correctly, the descriptor can be
sorted out similarly:
bysort id (descriptor) : replace descriptor = descriptor[_N]
as the empty strings will get sorted to the beginning.
However, before you do that you should test the
assumption that all (non-empty) descriptors are
identical within -id-:
gen empty = mi(descriptor)
bysort id empty (descriptor) : assert descriptor[1] == descriptor[_N]
On the last, see also
http://www.stata.com/support/faqs/data/diff.html
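For anyone who wants to try the whole thing end to end, here is
a sketch of a do-file that types in the made-up example data
above and runs the steps in order: check first, then spread the
dummy and the descriptor, then keep one observation per -id-
(that last step is Radu's own). The storage type str3 is an
assumption that merely fits these made-up strings; real data
will need whatever width the descriptors actually have.

```stata
* Sketch only: rebuild the made-up example data and clean it
clear
input id dummy str3 descriptor
13 1 ""
13 0 "abc"
13 1 ""
14 0 ""
14 0 "def"
14 0 "def"
end

* 1. Check that non-empty descriptors agree within each id
*    (assert stops the do-file with an error if they do not)
gen byte empty = missing(descriptor)
bysort id empty (descriptor) : assert descriptor[1] == descriptor[_N]
drop empty

* 2. Spread the 1s: after sorting on dummy, dummy[_N] is the
*    largest value in the group
bysort id (dummy) : replace dummy = dummy[_N]

* 3. Spread the descriptor: "" sorts first, so descriptor[_N] is
*    non-empty whenever the group has any non-empty value
bysort id (descriptor) : replace descriptor = descriptor[_N]

* 4. Keep one observation per id
bysort id : keep if _n == 1
list
```

If two different non-empty descriptors turn up under the same
-id- (say "abc" and "xyz"), step 1 reports that the assertion is
false and the do-file stops, rather than one value being picked
silently.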
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/