Ann,
I think this is what you want:
sort id cd4cdate cd4c
by id cd4cdate (cd4c): keep if _n == 1
The cd4c results will be sorted in ascending order within each id and date,
so this keeps only the lowest result for each date.
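To check this on the example data from the question, here is a minimal sketch. The string variable datestr and the "DM20Y" date mask (which assumes every two-digit year means 20xx) are assumptions made only to read in the dates; they are not from the original post.

```stata
* Enter the example data from the question
* (datestr is a hypothetical helper variable, not in the original data)
clear
input id cd4c str9 datestr
1 325 "01 Mar 06"
1 352 "01 Mar 06"
1 500 "03 Aug 06"
2 167 "20 Mar 06"
2 302 "20 Mar 06"
2 900 "12 Dec 06"
3 118 "20 Oct 05"
3 178 "20 Oct 05"
3 450 "01 May 06"
end

* Convert to a Stata daily date; "DM20Y" assumes two-digit years mean 20xx
gen cd4cdate = date(datestr, "DM20Y")
format cd4cdate %td
drop datestr

* Sort by id and date, with cd4c ascending within each id-date group,
* then keep the first (lowest) result in each group
bysort id cd4cdate (cd4c): keep if _n == 1

* Each id-date pair should now appear exactly once
isid id cd4cdate
list, sepby(id)
```

Here bysort combines the separate sort and by steps into one command. The variable in parentheses only controls the order within each id-date group; it does not define the groups.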
-ml
Ann Miller wrote:
Dear Statalist,
I am trying to clean some data, in which I have 2 different and contradictory lab results on the same date. Example:
id   cd4c   cd4cdate
1    325    01 Mar 06
1    352    01 Mar 06
1    500    03 Aug 06
2    167    20 Mar 06
2    302    20 Mar 06
2    900    12 Dec 06
3    118    20 Oct 05
3    178    20 Oct 05
3    450    01 May 06
I want to drop the row with the highest cd4c value when there's a date match. This is proving to be surprisingly hard to do. I tried
sort id cd4c cd4cdate
bysort imbd_id cd4cdate: gen min= cd4c[1]
and then tried to replace the cd4c with min, but in this case, min was not always the smallest cd4c value. I suspect that's because when I sort based on the date, it doesn't also sort by the cd4c value. Is there a way that I can reliably drop the row with the largest cd4c value when id and cd4cdate match?
Many thanks!
--Ann
Ann C. Miller, PhD, MPH
Research Associate
FXB Center for Health and Human Rights
Harvard School of Public Health
651 Huntington Ave, 7th Floor
Boston, MA 02115
(617) 432-7297
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
Michael I. Lichter, Ph.D. <[email protected]>
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / FAX: 716-898-3536