<>
Shezad might also like to note that -egen, sum()- was abandoned for the
Stata 9 release (-help whatsnew8to9-), but a look at its code shows that it
calls -egen, total()- on the user`s behalf.
*************
clear*
input patient_id timepoint clinic
1 1 .
1 2 .
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
end
bysort patient_id: egen sum_clinic = sum(clinic)
bysort patient_id: egen total_clinic = total(clinic)
compare sum_clinic total_clinic
*************
The -sum()- function lives on when called by -gen- as in my code earlier.
It produces a runnning sum, though, in contrast to the (constant) total
returned by -egen, total()-...
HTH
Martin
-----Ursprüngliche Nachricht-----
Von: [email protected]
[mailto:[email protected]] Im Auftrag von Ulrich Kohler
Gesendet: Freitag, 17. April 2009 15:43
An: [email protected]
Betreff: Re: st: simple sum() question
Shehzad Ali wrote
Here is a quick summary of what I am doing. Each patient (varname:
patient_id) was observed at 4 time points and at each time point we
asked about the clinic visits (varname: clinic) in the last 3 months.
The
dataset
is in long form (shown below):
patient_id timepoint clinic
1 1 0
1 2 1
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
The line below generates a sum of all clinic visits for each patient:
bysort patient_id: egen sum_clinic = sum(clinic)
Now if at one time point, clinic visit is missing (as its seen for
patients
1 and 3), then I want stata to return missing value for the sum. The
above
command returns the total of the non-missing observations, ignoring the
missing ones (understandably). But if I tried:
bysort patient_id: egen sum_clinic = sum(clinic) if clinic!=.
then it returns missing value for the sum variable only for the time
point
which is missing and not for all the time points for that patient. Can
anyone please suggest how to resolve this?
Martin already gave a good solution. Here is an alternative starting
from first principles:
. by patient_id (clinic), sort: gen sum_clinic = sum(clinic)
. by patient_id (clinic): ///
replace sum_clinic = cond(!mi(clinic[_N]),sum_clinic[_N],.)
This uses the fact that observations which are missing on clinic==. are
sorted at the end of each observation.
Secondly, whats the best way to collapse the dataset to one observation
per
patient? Once I have the sum_clinic for each patient, it would be easier
just to have one observation per patient.
by patient_id: keep if _n==1
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/