st: RE: m:1 merge with string function, data set too large?
From: Joe Canner <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: RE: m:1 merge with string function, data set too large?
Date: Fri, 23 Aug 2013 17:05:34 +0000
David,
I know you didn't actually ask for help, but you got my curiosity up. I am very skeptical that Stata had a problem with this merge because you had too much data or because you were using a string variable.
What do you mean by "deleted all the data parameters from the master file"?
Also, how is the variable "round" defined in the -using- dataset? If you do not have an observation in the -using- dataset for each household (uni)-round combination, you could get strange results like the ones you posted. However, that would be an odd structure for this kind of file (i.e., a household-to-village linkage file duplicated for each round), so I suspect what you really want is:
. merge m:1 uni using "filename"
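A small, self-contained do-file can make the point about key structure concrete. Everything in it is made up for illustration (the IDs h001/h002, three rounds instead of seven, the village names), and it assumes one possible layout of the linkage file: one row per household, tagged with round 1. Under that assumption, keying the merge on uni and round attaches villages only to round 1, while keying on uni alone attaches them to every round.

```stata
* Hypothetical toy master: two households (string IDs), three rounds each.
clear
input str4 uni round
"h001" 1
"h001" 2
"h001" 3
"h002" 1
"h002" 2
"h002" 3
end
tempfile master
save `master'

* Hypothetical linkage file: one row per household, tagged with round 1.
clear
input str4 uni round str8 village
"h001" 1 "Vname1"
"h002" 1 "Vname2"
end
isid uni                       // errors out unless uni alone uniquely identifies rows
tempfile villages
save `villages'

* Keying on uni AND round: only round-1 observations find a match.
use `master', clear
merge m:1 uni round using `villages'
tabulate round _merge          // rounds 2 and 3 show _merge==1 (master only)

* Keying on uni alone: the village is attached to every round.
use `master', clear
merge m:1 uni using `villages', keepusing(village)
assert _merge == 3             // every household-round observation is matched
tabulate village round
```

Running -isid- (or -duplicates report-) on the key variables of both files before merging is a cheap way to confirm that the key you pass to -merge- matches how each file is actually organized.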
I'm not sure how your workaround solved the problem, but I suspect you may run into similar problems in the future if you do not adequately account for the structure of your files when you do a merge.
Regards,
Joe Canner
Johns Hopkins University School of Medicine
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of David Fredericks
Sent: Friday, August 23, 2013 2:27 AM
To: [email protected]
Subject: st: m:1 merge with string function, data set too large?
Dear all
I just spent a frustrating morning trying to perform an m:1 merge on a string variable (uni = unique household identifier) in Stata 11.2m on an Asus laptop with an i7 processor, 4 GB of RAM, and a 64-bit operating system running Windows 7.
I have a large household dataset (master) with 7 rounds of data for each household, and I wanted to merge it with another file (using) that links each household's unique identifier to a village name.
I used the command:
merge m:1 uni round using "filename"
And that produced some funny results.
    Result                           # of obs.
    -----------------------------------------
    not matched                           146
        from master                       126  (_merge==1)
        from using                         20  (_merge==2)

    matched                             2,145  (_merge==3)
    -----------------------------------------
I should have a village name for 2,145 households. However, I only got 331 village names matched for one round of data.
    Village |      Freq.     Percent        Cum.
------------+-----------------------------------
     Vname1 |         12        3.70        3.70
     Vname2 |         23        7.10       10.80
     Vname3 |         22        6.79       17.59
     Vname4 |         22        6.79       24.38
     Vname5 |         16        4.94       29.32
     Vname6 |         22        6.79       36.11
     Vname7 |         16        4.94       41.05
     Vname8 |         18        5.56       46.60
     Vname9 |         40       12.35       58.95
    Vname10 |         16        4.94       63.89
    Vname11 |         22        6.79       70.68
    Vname12 |         30        9.26       79.94
    Vname13 |         53       16.36       96.30
    Vname14 |         12        3.70      100.00
------------+-----------------------------------
      Total |        324      100.00
It was not a problem with leading or trailing spaces.
It seems to have been a problem with the size of the master dataset and the use of an m:1 merge (and possibly the fact that it was a merge on a string variable, and memory allocation).
When I deleted all the data parameters from the master file, I was able to merge the two datasets successfully using an m:1 merge.
After that, I was able to merge the original large dataset with the file containing the round, the unique household identifier, and the village name using a 1:1 merge.
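A minimal sketch of that two-step workaround follows, with hypothetical toy data (the income variable, household IDs, and village names are made up). It assumes "deleted all the data parameters" means keeping only the key variables uni and round; that reading is a guess, not something stated in the message. Because the intermediate file has exactly one row per household-round, the final 1:1 merge should give the same village assignment as a single -merge m:1 uni using- on the full master.

```stata
* Hypothetical toy master: a household-round panel plus one survey variable.
clear
input str4 uni round income
"h001" 1 10
"h001" 2 12
"h002" 1 7
"h002" 2 9
end
tempfile master
save `master'

* Hypothetical linkage file: one row per household.
clear
input str4 uni str8 village
"h001" "Vname1"
"h002" "Vname2"
end
tempfile villages
save `villages'

* Step 1 (assumed reading): keep only the keys, then attach villages m:1 on uni.
use `master', clear
keep uni round
merge m:1 uni using `villages', keep(master match) nogenerate
tempfile keyfile
save `keyfile'

* Step 2: bring the village back to the full master with a 1:1 merge on uni round.
use `master', clear
merge 1:1 uni round using `keyfile'
assert _merge == 3             // every household-round observation is matched
list uni round income village, sepby(uni)
```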
The dataset was large (for me):
  obs:         2,271
 vars:           301                          23 Aug 2013 11:12
 size:     3,987,876 (99.2% of memory free)
I could not find any other reference to this problem online, so I have posted the problem and (my) solution here. Of course, I would be much better off without a string identifier for the household.
I'm sorry I can't share the data files to replicate this problem, but this may help someone at some stage.
df
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*