
st: RE: Number cases into groups based on a shared value


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Number cases into groups based on a shared value
Date   Mon, 14 Mar 2005 18:39:01 -0000

-egen, group()- is a wrapper around this 
main idea: 

bysort SomeNum : gen GroupNum = _n == 1 
replace GroupNum = sum(GroupNum) 

I have forgotten all the SPSS syntax 
I ever knew, which was very little and a 
long time ago, so I can't translate the 
other way. And -by:- is pretty Stataish. 
It may not be very translatable. 

In more words, 

0. -sort-ing on SomeNum is needed. (-egen- 
does that quietly, if needed, and then undoes 
it. With DIY, you must DIY.) You see that. 

1. Once you have 

SomeNum 
10 
10
...
11
11
...

...
16
16 
...

then you just assign 1 to the first 
observation in each block and 
0 to the others: 

SomeNum        GroupNum 
10                1 
10                0
...
11                1
11                0 
...

...
16                1
16                0 
...

2. Finally, what you want is the 
cumulative sum, given by -sum()-. 
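
As a minimal sketch of steps 0 to 2 on the 
simulated data from the question (the seed and 
the names OrigOrder, First and Check are made up 
for illustration), this keeps the two steps in 
separate variables, checks the result against 
-egen, group()- and then puts the data back in 
their original order, which is the part -egen- 
would otherwise do for you: 

clear 
set obs 100 
set seed 2005 
gen SomeNum = 10 + int(7 * uniform()) 

* step 0: note the current order, then sort 
gen long OrigOrder = _n 
* step 1: flag the first observation in each block 
bysort SomeNum : gen byte First = _n == 1 
* step 2: the running sum of the flags numbers the groups 
gen GroupNum = sum(First) 

* stops with an error if the two ever differ 
egen Check = group(SomeNum) 
assert GroupNum == Check 

* undo the sort, as -egen- would 
sort OrigOrder 
drop OrigOrder First Check 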

Another way to do it is 

sort SomeNum 
gen GroupNum = _n == 1 
replace GroupNum = cond(SomeNum != SomeNum[_n-1], GroupNum[_n-1] + 1, GroupNum[_n-1]) in 2/l 

which is closer in spirit to the code you have, but not the 
approved way to do this. 
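
A more compact sketch of the same idea, offered 
only as an illustration: a true comparison 
evaluates to 1 and a false one to 0, so the 
comparison can be added directly. It relies on 
-replace- working through the observations in 
order, so that GroupNum[_n-1] has already been 
filled in by the time it is used. 

sort SomeNum 
* only the first observation is set here; the rest start missing 
gen GroupNum = 1 in 1 
* carry the previous number forward, adding 1 when SomeNum changes 
replace GroupNum = GroupNum[_n-1] + (SomeNum != SomeNum[_n-1]) in 2/l 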

Nick 
[email protected] 

Mike Lacy

> I'm wanting to learn about a "do it yourself" way to do what is 
> accomplished by the -group- function in the -egen- command in 
> the following:
> 
> set obs 100
> gen SomeNum = 10 + int(7 * uniform())
> * Attach a sequential group number to all the
> * cases with the same value for "SomeNum"
> egen GroupNum = group(SomeNum)
> 
> 
> This works fine at accomplishing the task.  My interest in 
> the DIY approach is that the kind of algorithm I am 
> accustomed to using for this task does not fit with the 
> inner nature <grin> of Stata.  I'm accustomed (in SPSS or 
> lower level languages) to something like:
> 
> sort SomeNum
> gen MyGroup = 1 if _n ==1
> gen Same = (Somenum = Somenum[_n-1])
> gen MyGroup = MyGroup[_n-1] if Same
> gen MyGroup = 1+ MyGroup[_n-1] if ! Same
> 
> This doesn't fit with how Stata does -if-, as near as I
> understand. So, what would the Stata DIY approach to this 
> kind of algorithm be?  All I could come up with was to put 
> SomeNum into a matrix so that I could loop through it, but 
> that hardly seems like a desirable way to do things.
 
