Glenn Hoetker <[email protected]> writes,
> I have an issue with large datasets, and am hoping for some advice on how to
> best handle it. To simplify the issue somewhat, [...]
>
> [...] doing this in Stata is proving more challenging. The improved merge
> command in version 8 helps a bit, but I'm still having to rename variables
> repeatedly, save interim datasets, and sort large datasets in different
> ways. [...]
Let me set up Glenn's problem and show how I would go about solving it.
I am not sure this will be helpful because this may be just what
Glenn has already done.
Description of problem
----------------------
We have two datasets:
    PATIENTS.dta
        variables:  patno x1 x2 ...
    CITATIONS.dta
        variables:  patno_citing patno_cited
To do: create a new dataset,
    COMBINED.dta
        variables:  patno x1 x2 ... patno_citing x1_citing x2_citing ...
Modification of problem
-----------------------
Rather than creating COMBINED.dta, we will create
    UNCITED.dta
        variables:  patno x1 x2 ...
    CITED.dta
        variables:  patno x1 x2 ... patno_citing x1_citing x2_citing ...
These two datasets, -append-ed together, will be equal to COMBINED.dta.
Keeping them separate will save a little memory, if that matters.
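Should the single dataset be needed after all, it can be rebuilt once UNCITED.dta and CITED.dta exist. A minimal sketch, assuming both files are in the current directory and COMBINED.dta does not already exist:

```stata
// Sketch: rebuild COMBINED.dta from the two pieces.
// -append- fills any variables absent from UNCITED.dta with missing
// values in the observations that come from UNCITED.dta.
use UNCITED, clear
append using CITED
save COMBINED
```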
Solution
--------
// Step 1: make UNCITED.dta and
// make TMP_CITED.dta = [PATIENTS.dta] w/ var patno_citing
. use CITATIONS
. sort patno_cited
. rename patno_cited patno
. save TMP1
. use PATIENTS
. sort patno
. merge patno using TMP1, nokeep
. save TMPRES
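. keep if _merge==1
. drop _merge
// patno_citing is all missing for uncited observations; drop it so that
// UNCITED.dta contains only patno x1 x2 ...
. drop patno_citing
. save UNCITED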
. use TMPRES
. drop if _merge==1
. drop _merge
. save TMP_CITED
. erase TMPRES.dta
. erase TMP1.dta
// Step 2:
// take TMP_CITED.dta = [PATIENTS.dta] w/ var patno_citing
// and merge to add x1_citing, x2_citing, ...
. use PATIENTS
. rename x1 x1_citing
. rename x2 x2_citing
. ...
. rename patno patno_citing
. sort patno_citing
. save TMP2
. use TMP_CITED
. sort patno_citing
. merge patno_citing using TMP2, nokeep
. assert _merge==3
. drop _merge
. save CITED
. erase TMP2.dta
. erase TMP_CITED.dta
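
If PATIENTS.dta has many variables, typing one -rename- per variable in step 2 gets tedious. Here is a minimal sketch of one alternative, assuming that every variable other than patno should get the suffix _citing and that no resulting name exceeds Stata's 32-character limit. It builds the same TMP2.dta as the first part of step 2:

```stata
// Sketch only: add the suffix _citing to every variable except patno.
use PATIENTS, clear
foreach v of varlist _all {
    if "`v'" != "patno" {
        rename `v' `v'_citing
    }
}
// Rename the key last, then sort and save for use as the -merge- using file.
rename patno patno_citing
sort patno_citing
save TMP2
```

The variable list is expanded before the loop starts, so renaming inside the loop does not disturb it.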
-- Bill
[email protected]