st: joinby command and memory issues
From
"Weichle, Thomas" <[email protected]>
To
<[email protected]>
Subject
st: joinby command and memory issues
Date
Fri, 8 Oct 2010 10:57:42 -0500
Hi Statalisters,
I'm trying to combine two datasets using the joinby command. The
hgb.dta dataset contains hemoglobin test results for individuals and
may contain multiple tests on the same day. I'm keeping only the
study_id, test date, and test result. This dataset is very large. The
epo.dta dataset contains individuals who received the EPO drug, along
with the receipt date, and may contain multiple receipts per individual.
My goal is to create all pairwise combinations of the two dates in
order to determine whether any drug receipt was within 7 days of the
hemoglobin test(s).
Doing so will create a very large dataset, and I don't believe I have
the memory capacity for it. I ran "set memory 1000m", which appears to
be the maximum on my computer, but I still receive an error. Are there
any suggestions for carrying out such a large task? If there were a way
to include only the study_id and receipt date variables from the
epo.dta dataset, that might free some space, but I don't think joinby
has such an option. There are over 36,000 individuals in the epo.dta
dataset and over 406,000 total EPO receipts.
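One possible workaround, sketched here under assumptions rather than offered as a tested solution: trim epo.dta to the two needed variables first, then run joinby on slices of hgb.dta and keep only pairs within 7 days before moving to the next slice. The variable name rec_date, the chunk count of 20, and the short file names are hypothetical, and the mod() split assumes study_id holds whole numbers.

```stata
* Sketch only. Assumptions: rec_date is a hypothetical name for the
* receipt-date variable in epo.dta; study_id holds whole numbers;
* 20 chunks is an arbitrary choice.

* Keep just the two variables needed from the EPO file
use study_id rec_date using "epo.dta", clear
tempfile epo_small
save `epo_small'

local nchunks 20
local last = `nchunks' - 1

forvalues k = 0/`last' {
    * Load one slice of the hemoglobin data, selected by study_id
    use study_id ord_date result if mod(study_id, `nchunks') == `k' ///
        using "hgb.dta", clear

    * All pairwise test/receipt combinations within this slice only
    joinby study_id using `epo_small', unmatched(none)

    * Discard pairs more than 7 days apart before the next slice
    keep if abs(rec_date - ord_date) <= 7

    tempfile part`k'
    save `part`k'', emptyok
}

* Stack the filtered slices back together
use `part0', clear
forvalues k = 1/`last' {
    append using `part`k''
}
```

Because each slice is filtered before the next one is loaded, the full cross product never has to sit in memory at once. Whether a single slice fits still depends on how many tests and receipts the busiest individuals have.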
set memory 1000m
Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory        1000M     max. data space              1,000.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                             1,003.163M
. use study_id ord_date result using "G:\ESA_Cancer\ESA_DATA\ESA_USE\hgb0209.dta", clear
. unique study_id
Number of unique values of study_id is 255317
Number of records is 7438632
.
. sort study_id ord_date
.
. describe, fullnames
Contains data from G:\ESA_Cancer\ESA_DATA\ESA_USE\hgb0209.dta
  obs:     7,438,632
 vars:             3
 size:   208,281,696 (83.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
study_id        double %12.0g                 Study ID
ord_date        long   %d                     order date
result          double %12.0g
-------------------------------------------------------------------------------
Sorted by:  study_id  ord_date
.
. * Pairwise combinations
. joinby study_id using "G:\ESA_Cancer\ESA_DATA\ESA_USE\epo0209.dta", unmatched(none)
no room to add more observations
    An attempt was made to increase the number of observations beyond what is
    currently possible.  You have the following alternatives:

     1.  Store your variables more efficiently; see help compress.  (Think of
         Stata's data area as the area of a rectangle; Stata can trade off
         width and length.)

     2.  Drop some variables or observations; see help drop.

     3.  Increase the amount of memory allocated to the data area using the
         set memory command; see help memory.
r(901);
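The first alternative in the message can be tried directly. The describe output implies 28 bytes per observation (size divided by obs), and study_id and result are both stored as double. This is a minimal sketch: compress only changes a storage type when no precision is lost, so whether anything shrinks depends on the actual values.

```stata
* Bytes per observation implied by the describe output above
display 208281696 / 7438632

* Reload the three variables and let Stata pick the smallest lossless types
use study_id ord_date result using "G:\ESA_Cancer\ESA_DATA\ESA_USE\hgb0209.dta", clear
compress
describe, fullnames
```

If study_id holds whole numbers within the range of a long, compress would store it in 4 bytes instead of 8, which narrows every row produced by the join as well.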
Tom Weichle
Math Statistician
Center for Management of Complex Chronic Care (CMC3)
Hines VA Hospital, Bldg 1, C202
708-202-8387 ext. 24261
[email protected]