Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: How do you select and describe a single variable of interest from a merged dataset but avoiding duplication (due to the merge)?
Date: Thu, 4 Apr 2013 17:48:15 +0100

Look at the -tag()- function of the -egen- command. (It was mentioned
earlier today in another thread; posters should read Statalist as well
as write to it!)

Here is a dopey example:

. sysuse auto, clear
(1978 Automobile Data)

. egen tag = tag(rep78)

. l rep78 if tag

     +-------+
     | rep78 |
     |-------|
  1. |     3 |
  5. |     4 |
 12. |     2 |
 20. |     5 |
 40. |     1 |
     +-------+

Nick
njcoxstata@gmail.com

On 4 April 2013 17:39, Gwinyai Masukume <parturitions@gmail.com> wrote:
> I have a single dataset obtained by merging two datasets (these 2
> datasets are related – obtained from a relational database).
> e.g. 1st dataset was of patients and the second dataset was of their
> hospital visits – a single patient can have multiple hospital visits.
> So the merged dataset has many entries for a single patient.
>
> In my merged data set, I would like to analyze say patient age
> (assuming it’s fixed for that patient regardless of the number of
> visits). Since a single patient has the same age for their different
> hospital visits, a command like “sum Age” will give too many
> observations for age (duplication).
>
> Each patient has a unique ID (identification number).
> How do I issue a command to only count 1 age for each unique patient
> ID and then summarize this information?
> I have tried using the duplicates command to drop other hospital
> visits and remain with one visit, then pick say patient age from this
> to avoid the duplication mentioned above.
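
Applied to the question quoted in this post, the same idea might look like the sketch below. It is only an illustration under stated assumptions: the variable names ID, Age and visit, the new variable onepat, and the toy data are all hypothetical, chosen to mirror the poster's description of one row per hospital visit. The -assert- line checks the poster's premise that age is constant within patient before any one observation is picked.

```stata
* Toy data, made up for illustration: one row per hospital visit
clear
input ID Age visit
1 34 1
1 34 2
2 51 1
3 28 1
3 28 2
3 28 3
end

* Check the assumption that Age does not vary within patient
bysort ID (Age): assert Age[1] == Age[_N]

* Tag exactly one observation per distinct ID, then summarize on those only
egen byte onepat = tag(ID)
summarize Age if onepat
```

With these toy data, -summarize- uses three observations, one per patient, and no visits need to be dropped from the merged dataset.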