st: RE: cleaning a specific data structure

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: cleaning a specific data structure
Date	Fri, 21 Nov 2003 13:06:59 -0000

Radu Ban 

> The data is organized like this, numbers are made-up for this
> description:
> 
> id dummy descriptor
> 13 1 <blank>
> 13 0 abc
> 13 1 <blank>
> 14 0 <blank>
> 14 0 def
> 14 0 def
> 
> The idea is that the id variable should be unique, but for some
> reason it is not.  This means that both the dummy and descriptor
> should have the same values across the id groups. A complication
> is that for the dummy, if there's a "1" in a group all the group
> should be "1". 
> 
> I want to reduce this to a clean version which looks like this:
> 
> id dummy descriptor
> 13 1 abc
> 14 0 def
> 
> For the dummy part I dealt with it like this (probably a convoluted
> method):
> bysort id: egen maxdummy = max(dummy)
> replace dummy = maxdummy
> bysort id: keep if _n == 1
> 
> But I am a bit stuck on how to deal with the string descriptor. I
> mean I know one way of doing by splitting the data and then
> merging it back but there has to be a more efficient way.

I think you are right: you can do all you want in one place. 

The dummy can be sorted out your way, or this way: 

bysort id (dummy) : replace dummy = dummy[_N] 

as 1s will get sorted to the end (provided -dummy- has no 
missing values, which would sort after the 1s). 

If I understand correctly, the descriptor can be 
sorted out similarly 

bysort id (descriptor) : replace descriptor = descriptor[_N] 

as the empty strings will get sorted to the beginning. 
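
As a minimal sketch of the two statements above applied to the made-up 
example data from the question (the -input- block is an assumption about 
how the data are stored: -id- and -dummy- numeric, -descriptor- a str3 
with empty strings for the blanks), followed by the poster's own step 
of keeping one observation per -id-: 

```stata
clear
input id dummy str3 descriptor
13 1 ""
13 0 "abc"
13 1 ""
14 0 ""
14 0 "def"
14 0 "def"
end

* within each id the largest dummy sorts last; copy it to every observation
bysort id (dummy) : replace dummy = dummy[_N]

* empty strings sort first, so the last descriptor is non-empty if any is
bysort id (descriptor) : replace descriptor = descriptor[_N]

* keep a single observation per id
bysort id : keep if _n == 1

list, noobs
```

This only gives a sensible answer if the non-empty descriptors agree 
within each -id-; otherwise the alphabetically last one wins silently. 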

However, before you do that you should test the 
assumption that all (non-empty) descriptors are 
identical within -id-: 

gen empty = mi(descriptor) 
bysort id empty (descriptor) : assert descriptor[1] == descriptor[_N] 

On the last, see also 
http://www.stata.com/support/faqs/data/diff.html
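
If the assertion fails, -assert- stops with an error but does not show 
which groups are the problem. A possible variation, sketched here under 
the same first-equals-last logic, flags the offending -id-s so they can 
be listed. The variable name -conflict- is hypothetical, and the example 
data are deliberately altered (one "abd" in -id- 13) so that something 
is flagged: 

```stata
clear
input id str3 descriptor
13 ""
13 "abc"
13 "abd"
14 ""
14 "def"
14 "def"
end

gen byte empty = missing(descriptor)

* within id and empty, sorted on descriptor, the first and last values
* differ only if the descriptors in that subgroup are not all identical
bysort id empty (descriptor) : gen byte conflict = descriptor[1] != descriptor[_N]

* spread the flag to every observation of an affected id
bysort id (conflict) : replace conflict = conflict[_N]

list id descriptor if conflict, sepby(id) noobs
```

Here only the observations for -id- 13 should be listed, as its two 
non-empty descriptors disagree. 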
 
Nick
[email protected]
