Glenn Hoetker <[email protected]> writes, 
> I have an issue with large datasets, and am hoping for some advice on how to
> best handle it.  To simplify the issue somewhat, [...]
> 
> [...] doing this in Stata is proving more challenging.  The improved merge
> command in version 8 helps a bit, but I'm still having to rename variables
> repeatedly, save interim datasets, and sort large datasets in different
> ways.  [...]
Let me set up Glenn's problem and show how I would go about solving it.
I am not sure this will be helpful, because this may be just what
Glenn has already done.
Description of problem
----------------------
We have two datasets, containing
        PATIENTS.dta
            variables:   patno          x1               x2 ...
        CITATIONS.dta
            variables:   patno_citing   patno_cited 
To do:  Create a new dataset containing
        COMBINED.dta
            variables:   patno x1 x2 ... patno_citing x1_citing x2_citing ...
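
To make the target concrete, here is a tiny made-up example.  The values
are invented purely for illustration; only the dataset and variable names
come from the description above.  It assumes you are working in a scratch
directory where overwriting PATIENTS.dta and CITATIONS.dta does no harm.

        // Toy data only: 3 patents, 2 citations (values are hypothetical)
        . clear
        . input patno x1 x2
            1  10  100
            2  20  200
            3  30  300
          end
        . save PATIENTS, replace
        . clear
        . input patno_citing patno_cited
            2  1
            3  1
          end
        . save CITATIONS, replace

Here patent 1 is cited by patents 2 and 3, and patents 2 and 3 are cited
by nobody, so COMBINED.dta should contain these four observations (in
some order):

            patno   x1    x2   patno_citing   x1_citing   x2_citing
                1   10   100              2          20         200
                1   10   100              3          30         300
                2   20   200              .           .           .
                3   30   300              .           .           .
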
Modification of problem
-----------------------
Rather than creating COMBINED.dta directly, we will create two datasets:
        UNCITED.dta
            variables:   patno x1 x2 ...

        CITED.dta
            variables:   patno x1 x2 ... patno_citing x1_citing x2_citing ...
These two datasets, -append-ed together, are equal to COMBINED.dta.
Building them separately will save a little memory, if that matters.
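
If the single file is wanted after all, the final step might look like the
sketch below.  It assumes UNCITED.dta and CITED.dta have already been built
and that COMBINED.dta does not yet exist.  Observations coming from UNCITED
will simply have missing values in the _citing variables.

        // Sketch: stack the two pieces to recover COMBINED.dta
        . use UNCITED
        . append using CITED
        . save COMBINED
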
Solution
--------
        // Step 1:  make UNCITED.dta and 
        //          make TMP_CITED.dta = [PATIENTS.dta] w/ var patno_citing
        . use CITATIONS
        . sort patno_cited
        . rename patno_cited patno
        . save TMP1  
        . use PATIENTS
        . sort patno 
        . merge patno using TMP1, nokeep
        . save TMPRES
        . keep if _merge==1
        . drop _merge 
        . save UNCITED
        . use TMPRES 
        . drop if _merge==1
        . drop _merge
        . save TMP_CITED
        . erase TMPRES.dta
        . erase TMP1.dta
        // Step 2:
        // take TMP_CITED.dta = [PATIENTS.dta] w/ var patno_citing
        // and merge to add x1_citing, x2_citing, ...
        . use PATIENTS
        . rename x1 x1_citing
        . rename x2 x2_citing
        . ...
        . rename patno patno_citing
        . sort patno_citing
        . save TMP2
        . use TMP_CITED
        . sort patno_citing
        . merge patno_citing using TMP2, nokeep
        . assert _merge==3
        . drop _merge
        . save CITED
        . erase TMP2.dta
        . erase TMP_CITED.dta
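
When PATIENTS.dta has many variables, typing one -rename- per variable in
Step 2 gets tedious.  A loop can do the same work.  This is only a sketch:
the varlist x1 x2 stands in for whatever variables you actually need to
carry over, and the new names must still fit within Stata's limit on
variable-name length.

        // Sketch: build TMP2 with a loop instead of repeated -rename-
        . use PATIENTS
        . foreach v of varlist x1 x2 {
              rename `v' `v'_citing
          }
        . rename patno patno_citing
        . sort patno_citing
        . save TMP2

The rest of Step 2 (from -use TMP_CITED- on) would be unchanged.
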
-- Bill
[email protected]