Many thanks to Nick and John. Not only did I get an answer, I got some
insight into programming Stata.
Bayard Webb
Game Design, IGT
[email protected]
(775) 448-7406
-----Original Message-----
From: Nick Cox [mailto:[email protected]]
Sent: Sunday, July 13, 2003 9:40 AM
To: [email protected]
Subject: st: RE: var gen based on history
Webb.Bayard
> I have variables Category, Date and Amount and I need to
> generate two new
> variables described as follows.
>
> minAmount = smallest Amount in Category as of Date
> maxAmount = largest Amount in Category as of Date
>
>
> Example:
> Category Date Amount min max
> 1 1/1/2001 50 50 50
> 2 1/1/2001 100 100 100
> 1 1/2/2001 52 50 52
> 2 1/2/2001 98 98 100
> 1 1/3/2001 47 47 52
> 2 1/3/2001 105 98 105
> 1 1/4/2001 55 47 55
> 2 1/4/2001 93 93 105
>
> I've looked, but I can't come up with the right gen or egen
> commands and
> I've not yet tackled programming. I'm hoping someone can
> spot some tricky
> way of accomplishing this.
Solution 1. Use -record()- from -egenmore-
==========================================
The -egen- function -record()- written by Kit Baum and
myself is aimed at this problem:
record(exp) [ , by(varlist) min order(varlist) ] produces the maximum
(with min the minimum) value observed "to date" of the specified
exp.
Thus record(wage), by(id) order(year) produces the maximum wage so
far in worker's career, calculations being separate for each id
and
records being determined within each id in year order. Although
explanation and example here refer to dates, nothing in record()
restricts
its use to data ordered in time. If not otherwise specified with
by()
and/or order(), records are determined with respect to the current
order of observations. No special action is required for missing
values,
as internally record() uses either the max() or the min()
function,
both of which return results of missing only if all values are
missing.
In fact, the write-up is longer than the code:
*! 1.2.1 CFB/NJC 8 Oct 2001
* 1.2.0 CFB/NJC 8 Oct 2001
* 1.1.0 CFB 06 Oct 2001
program define _grecord
version 6.0
syntax newvarname =/exp [if] [in] [, BY(varlist) ORDER(varlist) MIN
]
tempvar touse obsno
local op = cond("`min'" == "min", "min", "max")
quietly {
mark `touse' `if' `in'
gen `typlist' `varlist' = `exp' if `touse'
gen long `obsno' = _n
sort `touse' `by' `order' `obsno'
by `touse' `by': /*
*/ replace `varlist' = `op'(`varlist',`varlist'[_n-1]) if `touse'
}
end
This function is part of the -egenmore- package on SSC. You
can install it by
. ssc inst egenmore
Solution 2: interactively from first principles
===============================================
Let's back up and tackle your problem from first principles.
We want to do calculations separately -by category:-
and naturally given -category- the appropriate -sort- order
is by -date-.
Also a simple, and perfectly adequate, starting point is that
the very first value observed is the minimum and the maximum
observed so far, so that is an appropriate initialisation:
bysort category (date) : gen min = amount[1]
by category : gen max = amount[1]
On that second line, we could, for symmetry, have
written
bysort category (date): gen max = amount[1]
but we know that the data are -sort-ed as we wish,
because we just arranged that.
A principle used here is that under the aegis of -by:-
subscripts are interpreted _within_ groups.
As each new value comes in, it is a new minimum if it is
less than the previous minimum, and a new maximum if it
is greater than the previous maximum.
by category : replace min = min(amount, min[_n-1])
by category : replace max = max(amount, max[_n-1])
And that's it, in essence.
What could go wrong, or what could complicate matters:
1. Missing values. Actually, this code is robust
against missing values. That's explained in the
write-up for -record()- above. In fact, I
could have initialised to
bysort category (date) : gen min = .
by category : gen max = .
or indeed to anything else.
2. You want to do this for some subset defined
by -if- and -in-. You could also do that interactively,
but it will be easier to use -record()-.
3. You want to do this for some expression, not
just a single variable. Same comment.
Summary
=======
canned solutions:
egen min = record(amount), by(category) order(date) min
egen max = record(amount), by(category) order(date)
from first principles:
bysort category (date) : gen min = amount[1]
by category : gen max = amount[1]
by category : replace min = min(amount, min[_n-1])
by category : replace max = max(amount, max[_n-1])
John Hennen's solution
======================
John Hennen posted a solution too:
sort date category
capture drop order
gen byte order = _n
capture drop min
capture drop max
sort category date
foreach X of new min max {
by category : gen `X' = amount[1]
}
qui by category : replace min = amount if amount < min[_n-1]
qui by category : replace min = min[_n-1] if min > min[_n-1]
qui by category : replace max = amount if amount > max[_n-1]
qui by category : replace max = max[_n-1] if max < max[_n-1] & _n >1
sort order
list
This is almost the same idea, but not quite.
The code for the maximum will perpetuate any missing values:
after the first missing value is observed, that's the
record from now on. This is because missings count
as higher than any non-missing. In practice, this
might not be a problem, but watch out.
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/