Others have addressed the question of efficiency,
advising the use of -by:-.
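
For concreteness, here is a minimal sketch of the -by:- approach, using the toy data from the question. It assumes the records are already in the intended order within each id; the -input- block is only there to make the example self-contained.

```stata
* Toy data copied from the example in the question
clear
input id exposure
1 10
1 14
1 15
2 8
2 10
2 15
end

* -stable- keeps the existing order of records within each id
sort id, stable

* sum() is Stata's running sum; under by: it restarts for each id
by id: generate cum_exposure = sum(exposure)

list, sepby(id)
```

Note that sum() treats missing values of exposure as zero, which may or may not be what is wanted.
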
Expanding one comment made elsewhere and making
another:
1. -sort id- could be a little dangerous if
you have no time variable: -sort- does not
promise to keep observations with the same
id in their existing order. You should check
out the -stable- option.
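2. Just in case it is not clear, your code
below is incorrect as well as inefficient.
The -replace- statements are not under -by:-,
so exposure[1] means observation 1 of the
whole dataset, not of each person. The code
will only give correct answers for
the first three observations. Perhaps you
meant it as a sketch, but programmers tend
to take code literally....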
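
To illustrate both points, here is a sketch that assumes a hypothetical variable -visit- recording the order of records within each person. No such variable appears in the question, so treat the name and values as made up.

```stata
* Toy data from the question plus a hypothetical ordering variable -visit-
clear
input id visit exposure
1 1 10
1 2 14
1 3 15
2 1 8
2 2 10
2 3 15
end

* Sorting on id and then visit fixes the order within each id
bysort id (visit): generate cum_exposure = sum(exposure)

* Under by:, [1] is the first observation within each id
by id: generate first_in_id = exposure[1]

* Without by:, [1] is observation 1 of the whole dataset,
* so this is 10 for every record, including those of id 2
generate first_overall = exposure[1]

list, sepby(id)
```

Comparing first_in_id with first_overall shows why the explicit subscripts only work for the first person.
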
Nick
Raoul C Reulen wrote:
> I've got multiple records per person and want to calculate
> some kind of cumulative exposure index per person. I’ve got a
> variable called “exposure” and want to create a new variable
> called “cum_exposure”. Each observation in the cum_exposure
> variable should give the sum of all previous cells in the
> exposure column (but per person). So something like this:
>
> Id  Exposure  list  Cum.Exposure
> 1   10        1     10
> 1   14        2     24
> 1   15        3     39
> 2   8         1     8
> 2   10        2     18
> 2   15        3     33
>
> I’ve tried this:
>
> .by id: gen list=_n
> .gen cum_exposure=.
> .replace cum_exposure= exposure[1] if list==1
> .replace cum_exposure= exposure[1] + exposure [2] if list==2
> .replace cum_exposure= exposure[1] + exposure [2] + exposure[3] if list==3
>
> But how can I do this more efficiently? In a loop for example?