On Thursday, August 28, 2003, at 02:33 AM, John wrote:
What is the difference between the way .merge and .joinby work? I've been
using joinby because it appears to work the same way relational databases
do, and I'm familiar with that concept.
That is correct: joinby forms all pairwise combinations of observations
within each group of the key variables, i.e. a Cartesian product (in SQL
terms a cross join, not an outer join), which users of RDBMS are exhorted
to avoid at all costs (run a proposed SELECT statement with an
unrestricted cross join by your DBA and see what s/he says). You rarely
want a Cartesian product, which generates a row (observation) for every
combination of observations from the two sets (in Stataese, the master
and using datasets). More usually, you want to match the observations in
the using dataset with those in the master dataset, with a one-to-one,
one-to-many, or many-to-one merge.
If you have about the same number of obs. in both datasets it would
seem that you're really trying to do a one-to-one merge. joinby will
not reliably achieve that: unless the key variables uniquely identify
the observations in both datasets, it will generate a huge number of
observations in the Cartesian product (up to about 450^2 = 202,500;
450^2 obs and 47 variables is far larger than 450 obs and 45 variables).
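
To make the row counts concrete, here is a minimal sketch in Python rather
than Stata. The toy data and the helper names (cross_join,
join_within_groups, one_to_one_merge) are made up for illustration, and
nothing here is meant to describe how Stata implements joinby or merge. It
assumes 450 observations per dataset, as in the figures above.

```python
from itertools import product

N = 450  # observations per dataset, as in the discussion above

# Hypothetical toy data: "id" is unique within each dataset.
master = [{"id": i, "x": i * 2} for i in range(1, N + 1)]
using = [{"id": i, "y": i * 3} for i in range(1, N + 1)]


def cross_join(left, right):
    """Full Cartesian product: pair every left row with every right row."""
    return [(l, r) for l, r in product(left, right)]


def join_within_groups(left, right, key):
    """All pairwise combinations of rows that share the same key value."""
    by_key = {}
    for r in right:
        by_key.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in by_key.get(l[key], [])]


def one_to_one_merge(left, right, key):
    """Match rows on a key that must be unique in both datasets.

    Simplification: only matched rows are kept.
    """
    for rows in (left, right):
        keys = [row[key] for row in rows]
        if len(keys) != len(set(keys)):
            raise ValueError(f"{key!r} does not uniquely identify rows")
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]


print(len(cross_join(master, using)))                # 202500
print(len(join_within_groups(master, using, "id")))  # 450
print(len(one_to_one_merge(master, using, "id")))    # 450

# When the key does NOT uniquely identify rows (here every row falls in
# the same group), the within-group join grows to the full product.
same_group_master = [{"grp": 1, "x": i} for i in range(N)]
same_group_using = [{"grp": 1, "y": i} for i in range(N)]
print(len(join_within_groups(same_group_master, same_group_using, "grp")))  # 202500
```

The last case is the one to watch for: the observation count only explodes
when the key values repeat in both datasets, and a one-to-one match on the
same key fails outright instead of silently multiplying rows.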
Kit