I'll try.
The prefix
. by hhid:
is, as you know, an instruction that operations are done
separately for distinct values of -hhid-. For this to work in
the example given, observations need to be
in sort order of -hhid-. Also, we need them
to be sorted by -lineno- within -hhid-,
as is implied by the example dataset given. A more careful
prefix is thus
. bysort hhid (lineno):
which does the sorting if need be.
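
To spell that out: as far as I know, the -bysort- form is just
shorthand for an explicit -sort- followed by -by:-. A minimal sketch
as a do-file fragment, assuming the example dataset is in memory
(-seq1- and -seq2- are made-up names, for illustration only):

    * sort first, then work within households
    sort hhid lineno
    by hhid (lineno): gen seq1 = _n

    * the same thing in one line
    bysort hhid (lineno): gen seq2 = _n

    * the two should agree in every observation
    assert seq1 == seq2
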
Now the key wrinkle is that
under the aegis of -by <byvarlist>:-,
subscripts are interpreted as being within
groups defined by the <byvarlist>,
so if you go
. bysort panel (time) : gen first = value[1]
the [1] always refers to the first observation
within each -panel- (_not_ the first observation
in the dataset), and similarly
. bysort panel (time) : gen last = value[_N]
the [_N] always refers to the last observation within each -panel-
(_not_ the last observation in the dataset).
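
Here is a small sketch you can run to see that for yourself. The
data are made up purely for illustration; only the two -bysort-
lines come from the text above.

    clear
    input panel time value
    1 1 10
    1 2 20
    1 3 30
    2 1 40
    2 2 50
    end

    bysort panel (time) : gen first = value[1]
    bysort panel (time) : gen last = value[_N]
    list, sepby(panel)

    * [1] and [_N] are counted within each panel, not within the dataset
    assert first == 10 & last == 30 if panel == 1
    assert first == 40 & last == 50 if panel == 2
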
These two examples already give an important hint:
what is within the subscript can be an expression,
and need not be a constant. (The expression need not
even evaluate to an integer:
. di mpg[exp(1)]
is legal Stata, although I can't think of a use
for it. exp(1), about 2.718, gets truncated to 2, by the way.)
So also is this legal Stata:
. by hhid : gen mage = age[mlineno]
Take
    hhid  lineno  age  mlineno  mage
       1       1   32        .     .
       1       2   30        .     .
       1       3    5        2    30
Each expression within [ ] is evaluated
separately for each observation. For
the first and second observations,
age[mlineno] becomes age[.]
which is taken as missing. For the third,
mlineno is 2, so
age[mlineno] becomes age[2]
which by the wrinkle rule above is 30.
That is the age in the second observation _within that group_.
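
To check this on the whole example, here is a sketch that types in
the data quoted at the end of this post and recreates -mage-. I have
used the more careful -bysort- prefix; the -assert- lines are just my
way of confirming the values shown in the original listing.

    clear
    input hhid lineno age mlineno
    1 1 32 .
    1 2 30 .
    1 3  5 2
    2 1 68 .
    2 2 41 1
    2 3 40 .
    2 4 17 3
    2 5 14 3
    end

    * mage is the age of whoever is on line mlineno within the same hhid
    bysort hhid (lineno) : gen mage = age[mlineno]
    list, sepby(hhid)

    assert mage == 30 if hhid == 1 & lineno == 3
    assert mage == 68 if hhid == 2 & lineno == 2
    assert mage == 40 if hhid == 2 & inlist(lineno, 4, 5)
    * a missing subscript gives a missing result
    assert missing(mage) if missing(mlineno)
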
As far as -by:- is concerned, see also

    Speaking Stata: How to move step by: step
    Stata Journal 2(1): 86-102 (SJ-2-1, pr0004, Q1/02; no commands)

which explains the use of the by varlist : construct to tackle
a variety of problems with group structure, ranging from
simple calculations for each of several groups to more
advanced manipulations that use the built-in _n and _N.
For another example of cute subscript use, look inside
the code for -qqplot- (or for -qplot- from SSC, which borrows the
same trick).
The same idea in general is what I call "cosorting".
Cosorting sorts each variable in a varlist and replaces variables so that all
are in sorted order, aligned so that the first of each is in the first
observation, the second of each is in the second observation, and so on.
Variables may be numeric or string.
Suppose we have
a b c
3 7 13
1 8 12
2 9 11
After cosorting we have
a b c
1 7 11
2 8 12
3 9 13
Warning: this is rarely needed and destroys information in your data set
in so far as values in each observation are typically not kept together.
Anyway, here is one way to do it:
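program define cosort
*! 1.0.0 NJC 3 November 1999
    version 6
    syntax varlist(min=2) [if] [in]
    tokenize `varlist'
    tempvar touse order
    mark `touse' `if' `in'
    * flip the marker so that selected observations sort first
    qui replace `touse' = 1 - `touse'
    * sort on the first variable and remember that order
    sort `touse' `1'
    gen long `order' = _n
    mac shift
    * now each remaining variable in turn
    qui while "`1'" != "" {
        tempvar copy
        local type : type `1'
        gen `type' `copy' = `1'
        * sort on this variable, then give its kth smallest value
        * to the observation that was kth on the first variable
        sort `touse' `1'
        replace `1' = `copy'[`order']
        drop `copy'
        mac shift
    }
    * finish in the order of the first variable
    sort `order'
end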
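
A quick way to try it, as a sketch: this assumes the program above
has already been run in the current session (or saved as cosort.ado
somewhere on your adopath), and it uses the three-variable example
given earlier.

    clear
    input a b c
    3 7 13
    1 8 12
    2 9 11
    end

    cosort a b c
    list

    * each variable should now be in sorted order, observation by observation
    assert a == _n & b == _n + 6 & c == _n + 10
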
Nick
Scott Merryman
> Nick,
>
> Could you please explain how this -gen mage = age[mlineno]-
> works or where I
> could find it. I realize that square brackets are used for explicit
> subscripting, but is not clear to me how this working.
Nick Cox
> > Looks like
> >
> > by hhid : gen mage = age[mlineno]
> >
>
>
> <snip>
>
> > > hhid lineno age mlineno mage
> > > 1 1 32 . .
> > > 1 2 30 . .
> > > 1 3 5 2 30
> > > 2 1 68 . .
> > > 2 2 41 1 68
> > > 2 3 40 . .
> > > 2 4 17 3 40
> > > 2 5 14 3 40
> > >