
st: RE: RE: RE: Re: How to speed up loop

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: RE: Re: How to speed up loop
Date	Fri, 17 Sep 2004 12:51:54 +0100

I'll try. 

The prefix 

. by hhid: 

is, as you know, an instruction that operations are done 
separately for distinct values of -hhid-. For this to work in 
the example given, observations need to be 
in sort order of -hhid-. Also, we need them 
to be sorted by -lineno- within -hhid-, 
as is implied by the example dataset given. A more careful 
prefix is thus 

. bysort hhid (lineno): 

which does the sorting if need be. 

Now the key wrinkle is that

	under the aegis of -by <byvarlist>:-, 
	subscripts are interpreted as being within 
	groups defined by the <byvarlist> 

so if you go 

. bysort panel (time) : gen first = value[1] 

the [1] always refers to the first observation 
within each -panel- (_not_ the first observation 
in the dataset), and similarly 

. bysort panel (time) : gen last = value[_N] 

picks up the value in the last observation within each -panel-
(_not_ the last observation in the dataset). 
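
A minimal sketch of both commands on a toy dataset may make this concrete. 
The data below are invented purely for illustration; only the two 
-bysort- lines come from the text above. 

```stata
* Toy panel data (values are made up for illustration only)
clear
input panel time value
1 1 10
1 2 20
2 1 30
2 2 40
2 3 50
end

* [1] and [_N] are interpreted within each -panel-, not across the dataset
bysort panel (time) : gen first = value[1]
bysort panel (time) : gen last = value[_N]

list, sepby(panel)
```

Within each -panel-, -first- should be constant at that panel's earliest 
-value- and -last- constant at its latest. 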

These two examples already give an important hint: 
what is within the subscript can be an expression, 
and need not be a constant. (The expression need not 
even evaluate to an integer. 

. di mpg[exp(1)] 

is legal Stata, although I can't think of a use 
for it. exp(1) gets truncated to 2, by the way.) 

So also is this legal Stata: 

. by hhid : gen mage = age[mlineno]

Take 

hhid    lineno       age   mlineno      mage
  1         1        32         .         .
  1         2        30         .         .
  1         3         5         2        30

Each expression within [ ] is evaluated 
separately for each observation. For 
the first and second 

	age[mlineno] becomes age[.] 

which is taken as missing. For the third, 

	age[mlineno] becomes age[2] 

which by the wrinkle rule above is 30. 
It is the second observation _within that group_. 
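
As a check, here is a sketch that types in the example dataset quoted 
at the foot of this message and runs the command on it. It assumes, as in 
that dataset, that -lineno- runs 1, 2, 3, ... without gaps within each 
-hhid-, so that position within the group and -lineno- coincide once 
the data are sorted. 

```stata
* Example dataset as quoted in the thread (hhid 1 and hhid 2)
clear
input hhid lineno age mlineno
1 1 32 .
1 2 30 .
1 3  5 2
2 1 68 .
2 2 41 1
2 3 40 .
2 4 17 3
2 5 14 3
end

* Sort by lineno within hhid, then look up the age of the observation
* whose position within the household is given by mlineno
bysort hhid (lineno) : gen mage = age[mlineno]

list, sepby(hhid)
```

This should reproduce the -mage- column shown in the quoted example. 
In -hhid- 2, for instance, -mlineno- of 3 picks up the age in the third 
observation of that household, not the third observation of the dataset. 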

As far as -by:- is concerned, see also

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
Q1/02   SJ 2(1):86-102                                   (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N


For another example of cute subscript use, look inside 
the code for -qqplot- (or for -qplot- from SSC, which borrows 
the same trick). 

The same idea in general is what I call "cosorting". 

Cosorting sorts each variable in a varlist and replaces variables so that all 
are in sorted order, aligned so that the first of each is in the first 
observation, the second of each is in the second observation, and so on. 
Variables may be numeric or string. 

Suppose we have 

      a   b   c
      3   7   13 
      1   8   12
      2   9   11 

After cosorting we have 

      a   b   c
      1   7   11
      2   8   12
      3   9   13 

Warning: this is rarely needed and destroys information in your data set 
in so far as values in each observation are typically not kept together. 

Anyway, here is one way to do it: 

program define cosort
*! 1.0.0 NJC 3 November 1999 
	version 6 
	syntax varlist(min=2) [if] [in] 
	tokenize `varlist' 
	tempvar touse order 
	mark `touse' `if' `in' 
	qui replace `touse' = 1 - `touse' 
	sort `touse' `1' 
	gen long `order' = _n
	mac shift 
	qui while "`1'" != "" { 
		tempvar copy 
		local type : type `1' 
		gen `type' `copy' = `1' 
		sort `touse' `1' 
		replace `1' = `copy'[`order']
		drop `copy' 
		mac shift 
	}	
	sort `order' 
end
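
A short usage sketch, assuming the program above has already been 
defined in the current session (or saved as cosort.ado somewhere on the 
adopath). The data are the a, b, c example given earlier. 

```stata
* The three-observation example from the text above
clear
input a b c
3 7 13
1 8 12
2 9 11
end

* Sort each variable separately and align the sorted values by rank
cosort a b c

list
```

The listing should match the "after cosorting" table above. Working on 
a copy of the data is prudent, given the warning that cosorting breaks 
up the original observations. 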

Nick 
[email protected] 

Scott Merryman
 
> Nick,
> 
> Could you please explain how this -gen mage = age[mlineno]- 
> works or where I could find it. I realize that square brackets are used
> for explicit subscripting, but it is not clear to me how this works.
 
Nick Cox

> > Looks like
> > 
> > by hhid : gen mage = age[mlineno]
> > 
> 
> 
> <snip>
> 
> > > hhid    lineno       age   mlineno      mage
> > >    1         1        32         .         .
> > >    1         2        30         .         .
> > >    1         3         5         2        30
> > >    2         1        68         .         .
> > >    2         2        41         1        68
> > >    2         3        40         .         .
> > >    2         4        17         3        40
> > >    2         5        14         3        40
> > >
