st: m:1 merge with string function, data set too large?
From: "David Fredericks" <[email protected]>
To: <[email protected]>
Subject: st: m:1 merge with string function, data set too large?
Date: Fri, 23 Aug 2013 11:57:20 +0530
Dear all
I just spent a frustrating morning trying to do an m:1 merge on a string
variable (uni = unique household identifier) in Stata 11.2m on an Asus
laptop with an i7 processor, 4 GB of RAM and a 64-bit operating system,
running Windows 7.
I have a large household data set with 7 rounds of data for each household
(the master file) and wanted to merge it with another file that links the
unique identifier for each household to a village name (the using file).
I used the command:
merge m:1 uni round using "filename"
That produced some odd results:
    Result                           # of obs.
    -----------------------------------------
    not matched                           146
        from master                       126  (_merge==1)
        from using                         20  (_merge==2)

    matched                             2,145  (_merge==3)
    -----------------------------------------
I should have a village name for all 2,145 matched observations. However,
only 331 village names came through, and only for one round of data.
    Village        Freq.     Percent        Cum.
    --------------------------------------------
    Vname1            12        3.70        3.70
    Vname2            23        7.10       10.80
    Vname3            22        6.79       17.59
    Vname4            22        6.79       24.38
    Vname5            16        4.94       29.32
    Vname6            22        6.79       36.11
    Vname7            16        4.94       41.05
    Vname8            18        5.56       46.60
    Vname9            40       12.35       58.95
    Vname10           16        4.94       63.89
    Vname11           22        6.79       70.68
    Vname12           30        9.26       79.94
    Vname13           53       16.36       96.30
    Vname14           12        3.70      100.00
    --------------------------------------------
    Total            324      100.00
It was not a problem with leading or trailing spaces.
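For anyone who wants to repeat that check, here is a minimal sketch of one way to test a string key for stray blanks and to confirm that the key variables identify each row. The toy data are made up for illustration; only the variable names uni and round come from the post.

```stata
* Toy data (hypothetical values): the first identifier has a trailing space
clear
input str6 uni round
"H001 " 1
"H002"  1
end

* Count identifiers that change when leading/trailing blanks are removed
count if uni != trim(uni)

* Clean the key before merging
replace uni = trim(uni)

* isid exits with an error unless uni and round jointly identify each row
isid uni round
```

Running the same two checks on both the master and the using file before the merge rules out the most common string-key problems.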
It seems to have been a problem with the size of the master data set and the
use of an m:1 merge (and possibly the fact that it was a merge on a string
variable, and memory allocation).
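Another possibility worth ruling out, which the post does not confirm and is only an assumption here, is that the master file already contained a variable with the same name as the village variable in the using file. By default, merge keeps the master's values for any variable that exists in both files, so blank values in the master would survive a successful match. The sketch below uses made-up toy data and a hypothetical variable name, village, to show the effect and the update option.

```stata
* Toy master (hypothetical values): village is present but mostly blank
clear
input str4 uni round str8 village
"H001" 1 "Vname1"
"H001" 2 ""
"H002" 1 ""
end
tempfile master
save `master'

* Toy using file (hypothetical values): village is complete
clear
input str4 uni round str8 village
"H001" 1 "Vname1"
"H001" 2 "Vname1"
"H002" 1 "Vname2"
end
tempfile lookup
save `lookup'

* Default merge: every row matches, but the master's blank values are kept
use `master', clear
merge m:1 uni round using `lookup'
count if _merge == 3 & missing(village)

* With update, missing values in the master are filled from the using file
use `master', clear
merge m:1 uni round using `lookup', update
count if missing(village)
```

If this were the cause, dropping the other variables from the master before merging would also remove the clash, which would be consistent with the workaround described in this post.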
When I dropped all the other data variables from the master file, I was able
to merge the two data sets successfully using an m:1 merge.
After that I was able to merge the original large data set with the file
containing the round, the unique household identifier and the village name
using a 1:1 merge.
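The two-step workaround described above can be sketched roughly as follows. This is an illustration on made-up toy data, not the original files; the variable names income and village and the tempfile names are hypothetical.

```stata
* Toy master (hypothetical values): keys plus one other variable
clear
input str4 uni round income
"H001" 1 100
"H001" 2 120
"H002" 1 90
end
tempfile master
save `master'

* Toy using file (hypothetical values): keys plus village name
clear
input str4 uni round str8 village
"H001" 1 "Vname1"
"H001" 2 "Vname1"
"H002" 1 "Vname2"
end
tempfile lookup
save `lookup'

* Step 1: reduce the master to its key variables and attach village names
use `master', clear
keep uni round
merge m:1 uni round using `lookup'
keep if _merge == 3
drop _merge
tempfile keys
save `keys'

* Step 2: merge the names back onto the full master, one row per key
use `master', clear
merge 1:1 uni round using `keys'
tabulate _merge
```

The 1:1 merge in step 2 only works if uni and round together identify each observation in the master, which is what the post implies.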
The data set was large (for me):

      obs:         2,271
     vars:           301                          23 Aug 2013 11:12
     size:     3,987,876 (99.2% of memory free)
I could not find any other reference to this problem on the net, so I have
posted the problem and the solution (for me) here. Of course, I would be
much better off without the string identifier for the household.
I'm sorry I can't post/share the data files to replicate this problem but
this may help someone at some stage.
df