Jeph wrote:
Though I see a long way to solve this, I'm hoping
someone out there sees a more efficient solution.
My data consist of different reports of hospital
admissions - one from the medical record (mr) and
one from the patient (pat). I have an admission
date (admdt) & hospital code (hospcd) for each
admission. Here -id- identifies patients, and I've
added some separators manually for clarity:
id mr_admdt mr_hospcd pt_admdt pt_hospcd
---------------------------------------------------
1 1 Jan 07 35 6 Sep 06 35
1 6 Sep 06 35 1 Feb 07 36
1 . . 23 Jun 06 35
---------------------------------------------------
2 11 Oct 07 34 21 Dec 06 34
2 21 Dec 06 34 . .
---------------------------------------------------
3 1 Jan 07 33 1 Jan 07 33
---------------------------------------------------
...
The problem is to determine discrepancies. For instance,
patient [3] has no problems - the two sources identify
the same admissions (same date and place). However patient
[1] has one match (both report 6 sep 06 at hosp #35) and
then three discrepancies.
Ideally I would end up with:
id mr_admdt mr_hospcd mr_dis pt_admdt pt_hospcd pat_dis
-----------------------------------------------------------
1 1 Jan 07 35 1 6 Sep 06 35 0
1 6 Sep 06 35 0 1 Feb 07 36 1
1 . 23 Jun 06 35 1
-----------------------------------------------------------
2 11 Oct 07 34 1 21 Dec 06 34 0
2 21 Dec 06 34 0 .
-----------------------------------------------------------
3 1 Jan 07 33 0 1 Jan 07 33 0
-----------------------------------------------------------
...
My solution would be to create two datasets and merge them
on (hospdt hospcd), using _merge to identify which
records don't match. However, I it gets messy trying to
get the results back into the original file.
Is there any command or routine out there that will let
me match up two variables (I can always group the date
& code variables into one) without disturbing the original
data? I don't mind sorting the admissions with patid
(the current order is the arbitrary order in which they
are reported).
=========================================================
I don't quite understand what you want to end up with. I think
I would want something like:
+-------------------------------------------------------+
| id hospcd hospdt patdata hospdata _merge |
|-------------------------------------------------------|
1. | 1 35 23jun2006 1 . 1 |
2. | 1 35 06sep2006 1 1 3 |
3. | 1 35 01jan2007 . 1 2 |
|-------------------------------------------------------|
4. | 2 34 21dec2006 1 1 3 |
5. | 2 34 11oct2007 . 1 2 |
|-------------------------------------------------------|
6. | 3 33 01jan2007 1 1 3 |
+-------------------------------------------------------+
I obtained it with this do-file:
-----------------------------
clear
input id d m y hospcd
1 1 1 2007 35
1 6 9 2006 35
2 11 10 2007 34
2 21 12 2006 34
3 1 1 2007 33
end
gen hospdt = mdy(m,d,y)
format hospdt %td
drop d m y
generate hospdata=1
sort id hospdt hospcd
save hospdata.dta, replace
clear
input id d m y hospcd
1 6 9 2006 35
1 2 2007 36
1 23 6 2006 35
2 21 12 2006 34
3 1 1 2007 33
end
gen hospdt = mdy(m,d,y)
format hospdt %td
drop d m y
sort id hospdt hospcd
generate patdata=1
save patdata.dta, replace
merge id hospdt hospcd using hospdata.dta
sort id hospdt hospcd
list, sepby(id)
-----------------------------
Hope this helps
Svend
________________________________________________________
Svend Juul
Institut for Folkesundhed, Afdeling for Epidemiologi
(Institute of Public Health, Department of Epidemiology)
Vennelyst Boulevard 6
DK-8000 Aarhus C, Denmark
Phone, work: +45 8942 6090
Phone, home: +45 8693 7796
Fax: +45 8613 1580
E-mail: [email protected]
_________________________________________________________
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/