For the "find mate" problem of:
nh pid sid age educ
1 1 2 34 3
1 2 1 29 2
1 3 . 21 2
1 4 . 27 1
2 1 3 44 12
2 2 . 23 9
2 3 1 31 11
2 4 . 19 2
2 5 . 27 3
A combined strategy of egen, by, etc. would do this type of rearrangement,
but it can be very useful to control the logic yourself by comparing one
record with its neighbours: varname[_n] refers to the current record,
varname[_n-1] to the previous record and varname[_n+1] to the next one.
Often the problem, however, is that the data are not consistent because of
data entry errors or misreading of the source material.
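
As a minimal sketch of the subscripting idea, using the example data from
the question: the variable names first, prevage and sidback are made up for
illustration, and the reciprocity check assumes that pid runs 1, 2, 3, ...
within each household, so that sid can be used directly as a subscript
under by.

```stata
clear
input nh pid sid age educ
1 1 2 34 3
1 2 1 29 2
1 3 . 21 2
1 4 . 27 1
2 1 3 44 12
2 2 . 23 9
2 3 1 31 11
2 4 . 19 2
2 5 . 27 3
end

* consistency check: person numbers must be unique within a household
isid nh pid
sort nh pid

* flag the first record of each household by comparing with the prior record
gen byte first = nh != nh[_n-1]

* under by, subscripts count within the household:
* age of the previous member (missing for the first member)
by nh: gen prevage = age[_n-1]

* consistency check: if A names B as spouse, B must name A
* (assumption: pid equals the position of the record within the household)
by nh: gen sidback = sid[sid]
assert sidback == pid if !missing(sid)

list, sepby(nh)
```

If the last assert fails, listing the records where sidback differs from
pid shows which spouse pointers need to be corrected.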
For the situation at hand one strategy could be something along the lines
of:
* assuming you only consider couples:
use datafilename
drop if sid == . //as suggested yesterday.
sort nh pid sid
* assert quality of data:
gen str10 test = string(nh)+"-" + string(pid) + "-" + string(sid)
replace test = trim(test)
assert test != test[_n-1]
* if the assert fails, list the records that failed, e.g.:
*   list if test == test[_n-1]
* and fix the data.
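* now generate an identifier (this assumes at most one couple per
* household, so that each household now holds exactly two records).
gen coupleid = _n //record number
replace coupleid = coupleid[_n-1] if nh == nh[_n-1]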
* generate variables of interest e.g. age difference:
gen agedif = age - age[_n+1] if coupleid == coupleid[_n+1]
*make a lookup table of this:
keep nh pid coupleid agedif
sort nh pid
save lookupfile, replace
* put the variables back into the original file:
use datafilename
sort nh pid
merge nh pid using lookupfile
* (from Stata 11 on the syntax is: merge 1:1 nh pid using lookupfile)
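* do whatever analysis you like, but remember to count each couple
* only once. The two partners need not be adjacent when sorted by nh pid,
* so sort on coupleid first and leave out persons without a partner, e.g.
sort coupleid pid
keep if coupleid < . & coupleid != coupleid[_n-1]
summ agedif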
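
The identifier above relies on each household holding a single couple. If
a household can contain several couples, one possible alternative (a sketch
only, not tested on real data) is to build the identifier from the lower
and the higher person number of each pair. The names lo and hi are made up
for illustration.

```stata
clear
input nh pid sid age educ
1 1 2 34 3
1 2 1 29 2
1 3 . 21 2
1 4 . 27 1
2 1 3 44 12
2 2 . 23 9
2 3 1 31 11
2 4 . 19 2
2 5 . 27 3
end

drop if missing(sid)

* order-independent key: lower and higher person number of the pair
gen lo = min(pid, sid)
gen hi = max(pid, sid)
egen coupleid = group(nh lo hi)

* each couple should consist of exactly two records
bysort coupleid (pid): assert _N == 2

* age difference (lower pid minus higher pid), kept on the first record only
by coupleid: gen agedif = age[1] - age[2] if _n == 1

list nh pid sid age coupleid agedif, sepby(coupleid)
summarize agedif
```

Because agedif is stored on one record per couple only, summarize counts
each couple once without any further keep.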
Jens Lauritsen
Odense University Hospital, Denmark
www.epidata.dk