
st: RE: RE: Is there a "running count" command in Stata?

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: Is there a "running count" command in Stata?
Date	Thu, 31 Aug 2006 16:53:59 +0100

There was a key typo in my previous post. I invented names, 
then edited back to Mingfeng's names, but with an error. 
Here is a second edition. 

Nick 
[email protected] 

There is an interesting underlying issue here: what 
exactly is "programming" in Stata? A precise
answer is that a program is whatever is defined 
by whatever follows a -program- statement. (There
is no circularity here, as "program" the English 
word belongs to the metalanguage and -program- the 
Stata command name belongs to the language.) 
 
OK, enough of that.  
 
The good news is that this can be done without
ever writing down the Stata command name -program-, 
so the answer is yes. 
 
The other news looks bad, but isn't so bad really. 
In fact, it is really good news. 
 
You can do this, but it requires a little more
Stata than you may want at this moment. However, the features
to be used are among the most Stataish of all
Stata features and are very, very useful. 
 
Using your second list of values (which differs 
slightly from your first) we have 
 
 . l
 
      +------+
      |    x |
      |------|
   1. |  cd1 |
   2. |  cd2 |
   3. |  cd2 |
   4. |  cd3 |
   5. |  cd1 |
      |------|
   6. |  cd3 |
   7. |  cd4 |
   8. |  cd1 |
   9. |  cd5 |
  10. |  cd3 |
      +------+
 
 We need to tag the first time any value 
 occurs. That will need a -sort-, and because
 of that we should keep a record of the current
 sort order, not least because we will want
 to return to it. That means 
 
 . gen order = _n
 
 If your dataset is really big, that should be 
 
 . gen long order = _n
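
 The reason is that -generate- creates a float by default, and a
 float holds integers exactly only up to 2^24 = 16,777,216. A
 small check, as a sketch to paste into a do-file (the numbers are
 chosen only to sit either side of that limit):

```stata
* a float can hold every integer exactly only up to 2^24 = 16777216
display %12.0f float(16777216)
* 16777217 has no exact float representation, so it is rounded
display %12.0f float(16777217)
* a long holds integers exactly well beyond that
display %12.0f 16777217
```

 With more observations than that, a float -order- would contain
 ties, and sorting on it could no longer restore the original order.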
 
 We sort into groups of -x- and ensure that 
 within groups of -x- the original sort order 
 is followed. Then we tag the very first occurrence 
 of each value of -x-. This can all be telescoped into one
 statement. 
 
 . bysort x (order) : gen y = _n == 1
 
 The expression -_n == 1- evaluates to 1 when true and to 0 when
 false, so -y- is 1 for each first occurrence and 0 otherwise.
 There is a FAQ on constructs like that on the right-hand 
 side of the assignment:
 
 FAQ     . . . . . . . . . . . . . . .  True and false in Stata
         2/03    What is true and false in Stata?
         http://www.stata.com/support/faqs/data/trueorfalse.html
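
 To see what the tagging step has done before sorting back, here
 is a sketch that re-enters the ten example values and lists the
 intermediate result. The -input- block and the -sepby()- option
 are only for illustration and are not part of the solution.

```stata
clear
input str3 x
"cd1"
"cd2"
"cd2"
"cd3"
"cd1"
"cd3"
"cd4"
"cd1"
"cd5"
"cd3"
end

* record the original order, then tag the first occurrence of each x
gen long order = _n
bysort x (order) : gen y = _n == 1

* data are now sorted by x, then by order within x;
* y is 1 on the lowest value of order in each group and 0 elsewhere
list x order y, sepby(x)
```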
 
 Now -sort- back to the original order. Then we just need a running
 sum of -y-, as the number of distinct values
 seen so far is equal to (or even defined as)
 the number of first occurrences seen so far. 
 
 . sort order
 
 . replace y = sum(y)
 (9 real changes made)
 
 -order- has served its purpose. Bye-bye! 
 
 . drop order
 
 What have we got? 
 
 . l
 
      +----------+
      |    x   y |
      |----------|
   1. |  cd1   1 |
   2. |  cd2   2 |
   3. |  cd2   2 |
   4. |  cd3   3 |
   5. |  cd1   3 |
      |----------|
   6. |  cd3   3 |
   7. |  cd4   4 |
   8. |  cd1   4 |
   9. |  cd5   5 |
  10. |  cd3   5 |
      +----------+
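
 For anyone who wants to replay the whole thing, here is the same
 sequence as a do-file sketch. The variable -expected- is my
 addition, typed in from the listing above, so that -assert- can
 check the result; it is not needed in real use.

```stata
clear
input str3 x byte expected
"cd1" 1
"cd2" 2
"cd2" 2
"cd3" 3
"cd1" 3
"cd3" 3
"cd4" 4
"cd1" 4
"cd5" 5
"cd3" 5
end

gen long order = _n                    // remember the original order
bysort x (order) : gen y = _n == 1     // 1 on first occurrence, else 0
sort order                             // back to the original order
replace y = sum(y)                     // running count of distinct values
drop order

assert y == expected                   // silent if every row agrees
```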
 
 Now with a little more knowledge we could wrap that 
 up into a command, or better an -egen- function. But
 in many ways it is better to use the code here and 
 understand its logic, which will help 
 for that next problem with a similar flavour. 
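
 For the curious, a minimal sketch of what such a command might
 look like. The name -runcount- and its -generate()- option are
 invented here for illustration, and the sketch makes no provision
 for -if-, -in- or missing values, which a real command should handle.

```stata
* Sketch only: -runcount- and generate() are hypothetical names.
program define runcount
    version 9
    syntax varname, Generate(name)
    confirm new variable `generate'

    tempvar order
    quietly {
        * same three steps as in the interactive session
        gen long `order' = _n
        bysort `varlist' (`order') : gen long `generate' = _n == 1
        sort `order'
        replace `generate' = sum(`generate')
    }
end

* Usage, assuming a variable x as in the example:
* runcount x, generate(y)
```

 The temporary variable plays the part of -order- and is dropped
 automatically when the program finishes.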
 
 The key construct here is -by:-. The documentation
 for -by:- is scattered around the manuals. A Mickey Mouse
 tutorial bringing together the main ideas was given in 
 
 SJ-2-1  pr0004  . . . . .  Speaking Stata: How to move step by: step
         Q1/02   SJ 2(1):86-102
         explains the use of the by varlist : construct to tackle
         a variety of problems with group structure, ranging from
         simple calculations for each of several groups to more
         advanced manipulations that use the built-in _n and _N
 
 Nick 
 [email protected] 
