Daniel Waxman wrote:
Thanks much for the reply. There is much to learn...
Anyway, I'd imagine that there is little difference in overhead between
using -summarize- in this way and creating and then dropping a temporary
variable. I should get over the frugality issues.
--------------------------------------------------------------------------------
I believe that the difference in overhead will depend on how big the dataset
is and how often you need to perform the calculation. For 1,000,000
observations in single precision, it seems to be a matter of 10-20 seconds
per iteration on a recent-vintage piece of equipment.
For larger datasets, higher precision, and many iterations, the time saving
might make -summarize, meanonly- pay off.
Joseph Coveney
. clear
. set more off
. quietly set memory 500M
. quietly set obs `=1e6'
. set seed `=date("2006-02-07", "ymd")'
. generate float p_pred_mort = uniform()
. generate float p_act_mort = uniform()
. *
. program define use_egen, rclass
1. tempvar maxmort predmort
2. egen float `maxmort' = max(p_pred_mort)
3. egen float `predmort' = max(p_act_mort)
4. return scalar maxmax = max(`maxmort', `predmort')
5. drop `maxmort' `predmort' // Does omitting this save time?
6. end
. *
. program define use_summarize, rclass
1. tempname maxpred
2. summarize p_pred_mort, meanonly
3. scalar `maxpred' = r(max)
4. summarize p_act_mort, meanonly
5. return scalar maxmax = max(scalar(`maxpred'), r(max))
6. scalar drop `maxpred'
7. end
. *
. set rmsg on
r; t=0.00 16:57:31
. use_egen
r; t=15.47 16:57:46
. use_summarize
r; t=0.22 16:57:47
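. *
. * For timing many iterations, Stata's -timer- commands may be more
. * convenient than -set rmsg-. A sketch (the loop count of 10 is
. * arbitrary, chosen only for illustration):
. *
.     timer clear 1
.     timer on 1
.     forvalues i = 1/10 {
.         use_summarize
.     }
.     timer off 1
.     timer list 1      // reports total and per-iteration elapsed time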
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/