This is a hot area of survey research: you get data from different
sources that are supposed to cover the same individuals, but because
of privacy concerns you don't have any individual identifiers.
Moreover, identifying information such as the date of birth may have
been perturbed with noise of, say, 5 or 10 days. So instead of doing a
plain -merge-, you have to guess who's who. Of course, you cannot do
that by hand for 10,000 observations, so probabilistic linkage is a
way to merge the data sets stochastically. I don't know that much
about it; I have just heard a couple of presentations.
I doubt anybody has implemented this in Stata. I am sure it can be
done, as the models are not that difficult; it just depends on how
much time and programming resource is available.
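To give a rough idea of the kind of calculation involved, here is a
minimal sketch in Python (not Stata, purely for illustration) of a
Fellegi-Sunter style update of posterior odds. Everything in it is an
assumption on my part: the field names, the m and u probabilities
(chance that a field agrees for a true match and for a non-match), and
the 10-day tolerance on the date of birth are made up, and it is not
taken from Clark (2004).

```python
from datetime import date

# Hypothetical agreement probabilities per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELDS = {
    "sex": (0.99, 0.50),
    "dob": (0.95, 0.01),
    "postcode": (0.90, 0.05),
}

# Assumed tolerance for a date of birth that has been perturbed with noise.
DOB_TOLERANCE_DAYS = 10


def agrees(field, a, b):
    """Return True if two field values are treated as agreeing."""
    if field == "dob":
        return abs((a - b).days) <= DOB_TOLERANCE_DAYS
    return a == b


def posterior_odds(rec_a, rec_b, prior_odds):
    """Multiply the prior odds of a match by one likelihood ratio per field."""
    odds = prior_odds
    for field, (m, u) in FIELDS.items():
        if agrees(field, rec_a[field], rec_b[field]):
            odds *= m / u              # agreement makes a match more likely
        else:
            odds *= (1 - m) / (1 - u)  # disagreement makes it less likely
    return odds


def posterior_prob(odds):
    """Convert odds to a probability."""
    return odds / (1 + odds)
```

With these made-up numbers, a pair that agrees on all three fields has
its odds multiplied by about 3386, so prior odds of 1 in 10,000 become
posterior odds of roughly 0.34. In practice the m and u values would
have to be estimated from the data, and the fields are assumed to be
conditionally independent, which is a strong assumption.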
On Tue, 21 Sep 2004 13:30:37 +0100, Nick Cox <[email protected]> wrote:
> D.E. Clark 2004. Perhaps you could add further details
> for those interested.
>
> Nick
> [email protected]
>
> Adrian Spörri-Fahrni
>
> > I'm involved in a project where we link different huge (health)
> > data sets. I'm interested in programming this linkage process in
> > stata calculating Bayesian posterior odds (see Clark,D.E.,2004, as
> > an example). Has anyone of you experience with probabilistic
> > record linkage in Stata? I know about the Australian febrl-project
> > and about linkpro in SAS, but I'd prefer having a Stata solution.
> > Thanks for any references or remarks
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Stas Kolenikov
http://stas.kolenikov.name