st: Re: interweave?
If I understand your problem, you want to essentially append the using
dataset's multiple records of information about each pid to each pid/fid
combination in the master dataset that matches on pid? If so, I
think -joinby- may be helpful:
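* drop the all-missing fid from the using data so it cannot clash with the master's fid
use usingdata, clear
drop fid
save usingdata2, replace
* reduce the master to one row per pid/fid combination
use masterdata, clear
keep pid fid
bysort pid fid: keep if _n==1
joinby pid using usingdata2, unmatched(master) _merge(mrgpidevents)
* rows with mrgpidevents==1 had no match in the using data and carry no event data
* now you have multiple copies of the pid event data
* one set for each fid associated with that pid, now append onto master
append using masterdata
sort pid fid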
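One thing the steps above leave open is that the using file's date variable is called effect_date while the master's is event_date, so after the append the two histories would sit in separate columns. Here is a minimal sketch of one way to line them up. It assumes both variables are numeric Stata dates (the listings alone don't confirm that), and the marker variable named source is hypothetical:

```stata
* Sketch only: file and variable names follow the thread.
* "source" is a hypothetical marker variable, not part of either dataset.
* Assumes effect_date and event_date are both numeric Stata dates.
use usingdata, clear
drop fid
* give the using date the same name as the master's date variable
rename effect_date event_date
* 1 = record came from the using dataset
generate byte source = 1
save usingdata2, replace

use masterdata, clear
keep pid fid
* one row per pid/fid combination
bysort pid fid: keep if _n == 1
* default unmatched(none) keeps only pids found in both files
joinby pid using usingdata2
append using masterdata
* 0 = original master record
replace source = 0 if missing(source)
sort pid fid event_date
```

With the default unmatched(none), pids that appear only in the master add no extra rows, and source lets you tell the interleaved records apart afterwards.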
Michael Blasnik
[email protected]
----- Original Message -----
From: <[email protected]>
To: <[email protected]>
Sent: Friday, April 28, 2006 3:53 PM
Subject: st: interweave?
hello all,
i am using SE v. 9.1 for macintosh.
i have two datasets where each contains an identifying (not necessarily
unique) variable (PID). in the "master" dataset, this PID may contain
multiple sub-identifiers (FID), whereas in the "using" dataset only PID
exists. each dataset contains what may be characterized as an "event
history" in -long- format w/ the event history in the "master" dataset
being FID-specific whereas in the "using" dataset it is PID-specific. i
need to interweave, so to speak, the event history from the "using"
dataset into the event history of each FID whose PID exists in both
datasets. briefly, a portion of the data from the "master" dataset:
+--------------------------------+
| pid fid event_date |
|--------------------------------|
| 800000056 56 20sep1972 |
| 800000056 56 03aug1999 |
| 800000056 56 25oct1999 |
| 800000056 56 28oct1999 |
| 800000056 56 28mar2000 |
| 800000056 56 05apr2001 |
| 800000056 56 29apr2002 |
| 800000056 56 30mar2003 |
| 800000056 56 17nov2004 |
|--------------------------------|
| 800000056 215891 25oct1999 |
| 800000056 215891 28oct1999 |
| 800000056 215891 29sep2003 |
| 800000056 215891 30mar2004 |
| 800000056 215891 17nov2004 |
| 800000056 215891 23mar2005 |
|--------------------------------|
And from the using dataset:
+--------------------------------+
| pid fid effect_date |
|--------------------------------|
| 800000056 . 01 Oct 90 |
| 800000056 . 01 Oct 94 |
| 800000056 . 01 Oct 95 |
| 800000056 . 28 Jan 03 |
| 800000056 . 01 Nov 03 |
| 800000056 . 03 Feb 06 |
| 800000056 . 16 Feb 06 |
+--------------------------------+
as it stands now, the "using" data is appended to the "master" data and i
can successfully expand the data to accommodate the additional number of
using records that must be incorporated into the event histories of the
"master" FIDs. the problem i'm having, however, is in figuring out a way
to replace the missing FID values from the "using" dataset so that the
"using" event histories are incorporated into the "master" FID event
histories.
i've searched the online- and print-documentation but to little avail (the
standard -merge-, -append-, and -joinby- commands don't seem to accomplish
precisely what i need here). i suspect this may involve some use of _n or
_N, but alas, a solution eludes me. any suggestions??
many thanks,
clint