st: Re: st Data Manupulation

From	"Jens Lauritsen" <[email protected]>
To	<[email protected]>
Subject	st: Re: st Data Manupulation
Date	Tue, 1 Mar 2005 10:41:07 +0100 (CET)

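For the "find mate" problem with data like this (nh = household, pid = person within the household, sid = pid of the person's spouse, missing if none):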
nh  pid  sid  age  educ
 1    1    2   34     3
 1    2    1   29     2
 1    3    .   21     2
 1    4    .   27     1
 2    1    3   44    12
 2    2    .   23     9
 2    3    1   31    11
 2    4    .   19     2
 2    5    .   27     3

A combination of egen, by and similar commands could do this kind of
rearrangement, but it is often more useful to control the logic yourself by
comparing one record with the next: refer to the current record as
varname[_n] and to the previous or next record as varname[_n-1] or
varname[_n+1].
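A minimal sketch of the egen/by route might look like the following. It assumes that sid holds the pid of the spouse within the same household; the helper variables lo_pid, hi_pid, npartners and mate_ok are illustrative names, not from the original post. Building the couple key from the lower and higher pid of each pair makes mates adjacent after sorting, so the [_n+1] comparison stays valid even if a household holds more than one couple.

```stata
* Sketch only: assumes sid = pid of the spouse in the same household
clear
input nh pid sid age educ
1 1 2 34 3
1 2 1 29 2
1 3 . 21 2
1 4 . 27 1
2 1 3 44 12
2 2 . 23 9
2 3 1 31 11
2 4 . 19 2
2 5 . 27 3
end

* order-free key for each pair: lower and higher pid of the two partners
* (min() and max() ignore missing values, hence the if condition)
gen lo_pid = min(pid, sid) if !missing(sid)
gen hi_pid = max(pid, sid) if !missing(sid)

* one id per couple; people without a sid get a missing coupleid
egen coupleid = group(nh lo_pid hi_pid)

* within each couple, compare the first record with the next one
sort coupleid pid
by coupleid: gen npartners = _N if !missing(coupleid)
by coupleid: gen byte mate_ok = (sid == pid[_n+1]) & (sid[_n+1] == pid) if _n == 1 & !missing(coupleid)
by coupleid: gen agedif = age - age[_n+1] if _n == 1 & !missing(coupleid)

list nh pid sid coupleid npartners mate_ok agedif, sepby(coupleid)
```

A couple whose members do not point at each other ends up split across two keys, so npartners == 1 for it, which is one quick way to spot such records.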

Often, however, the real problem is that the data are inconsistent because of
data-entry errors or misreading of the source material.
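One way to catch such errors, sketched here under the assumption that each sid should point to a partner in the same household whose own sid points back, is to merge the file with a renamed copy of itself. The record "2 3 2" below is a deliberately introduced, hypothetical entry error (pid 3 in household 2 should have sid 1). The merge m:1 syntax requires Stata 11 or later, and the names people, mates, mate_sid and bad are illustrative.

```stata
* Sketch only: flag sids that do not point to a partner who points back
clear
input nh pid sid
1 1 2
1 2 1
1 3 .
1 4 .
2 1 3
2 2 .
2 3 2
2 4 .
2 5 .
end
* note: "2 3 2" above is a hypothetical entry error (should be 2 3 1)

tempfile people mates
save "`people'"

* lookup file: rename so the partner's pid lines up with sid,
* and keep the partner's own sid as mate_sid
rename sid mate_sid
rename pid sid
save "`mates'"

* m:1 also stops with an error if nh pid is not unique in the lookup file
use "`people'", clear
merge m:1 nh sid using "`mates'", keep(master match) nogenerate

* consistent: either no sid, or the partner's sid points back to this person
gen byte bad = !missing(sid) & mate_sid != pid
sort nh pid
list nh pid sid mate_sid if bad
```

Here both partners of the broken pair are flagged: pid 1 because its mate points elsewhere, and pid 3 because it points to someone with no sid.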

For the situation at hand, one strategy could be something along these lines:

* assuming you only consider couples:
use datafilename, clear
drop if sid == .              //as suggested yesterday.
sort nh pid sid

* assert quality of data: each person (nh pid) must appear only once
gen test = string(nh) + "-" + string(pid)
assert test != test[_n-1]
* if the assert fails, list the records that failed, e.g.
*   list nh pid sid if test == test[_n-1] | test == test[_n+1]
* and fix the data.

* now generate an identifier: the record number of the household's first
* remaining record (this assumes at most one couple per household)
gen coupleid = _n
replace coupleid = coupleid[_n-1] if nh == nh[_n-1]

* generate variables of interest, e.g. age difference
* (stored on the first member of each couple only):
gen agedif = age - age[_n+1] if coupleid == coupleid[_n+1]

*make a lookup table of this:
keep nh pid coupleid agedif

sort nh pid
save lookupfile, replace

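* put the new variables back into the original file
* (people without a mate get missing coupleid and agedif):
use datafilename, clear
sort nh pid
merge nh pid using lookupfile
* (in Stata 11 and later: merge 1:1 nh pid using lookupfile)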

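* do whatever analysis you like, but remember to count each couple only
* once; mates need not be adjacent in nh pid order, so sort on coupleid first:
sort coupleid pid
keep if !missing(coupleid) & coupleid != coupleid[_n-1]
summ agedif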
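An alternative to the lookup-table route, sketched under the same assumption about sid, is to attach each mate's characteristics directly to the person's own record with a self-merge. Each couple then appears twice, once per partner, so a condition such as pid < sid keeps one record per couple. This uses the Stata 11+ merge m:1 syntax; mate_age, mate_educ and the tempfile names are illustrative.

```stata
* Sketch only: put the mate's age and educ on each person's record
clear
input nh pid sid age educ
1 1 2 34 3
1 2 1 29 2
1 3 . 21 2
1 4 . 27 1
2 1 3 44 12
2 2 . 23 9
2 3 1 31 11
2 4 . 19 2
2 5 . 27 3
end

tempfile people mates
save "`people'"

* lookup file of potential mates, renamed so the mate's pid lines up with sid
keep nh pid age educ
rename pid sid
rename age mate_age
rename educ mate_educ
save "`mates'"

* attach the mate's variables; people without a mate keep missing values
use "`people'", clear
merge m:1 nh sid using "`mates'", keep(master match) nogenerate

gen agedif = age - mate_age

* each couple appears twice (once per partner); pid < sid keeps one record
summarize agedif if pid < sid & !missing(sid)
```

The extra !missing(sid) matters because Stata treats missing as larger than any number, so pid < sid alone would also be true for everyone without a mate.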

Jens Lauritsen
Odense University Hospital, Denmark
www.epidata.dk


