Shehzad Ali wrote
> Here is a quick summary of what I am doing. Each patient (varname:
> patient_id) was observed at 4 time points and at each time point we asked
> about the clinic visits (varname: clinic) in the last 3 months. The dataset
> is in long form (shown below):
>
> patient_id timepoint clinic
> 1 1 0
> 1 2 1
> 1 3 .
> 2 1 2
> 2 2 0
> 3 1 1
> 3 2 .
>
> The line below generates a sum of all clinic visits for each patient:
>
> bysort patient_id: egen sum_clinic = sum(clinic)
>
> Now if at one time point, clinic visit is missing (as its seen for patients
> 1 and 3), then I want stata to return missing value for the sum. The above
> command returns the total of the non-missing observations, ignoring the
> missing ones (understandably). But if I tried:
>
> bysort patient_id: egen sum_clinic = sum(clinic) if clinic!=.
>
> then it returns missing value for the sum variable only for the time point
> which is missing and not for all the time points for that patient. Can
> anyone please suggest how to resolve this?
Martin already gave a good solution. Here is an alternative starting
from first principles:
. by patient_id (clinic), sort: gen sum_clinic = sum(clinic)
. by patient_id (clinic): ///
replace sum_clinic = cond(!mi(clinic[_N]),sum_clinic[_N],.)
This uses the fact that observations which are missing on clinic==. are
sorted at the end of each observation.
> Secondly, whats the best way to collapse the dataset to one observation per
> patient? Once I have the sum_clinic for each patient, it would be easier
> just to have one observation per patient.
by patient_id: keep if _n==1
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/