st: m:1 merge with string function, data set too large?

From	"David Fredericks" <[email protected]>
To	<[email protected]>
Subject	st: m:1 merge with string function, data set too large?
Date	Fri, 23 Aug 2013 11:57:20 +0530

Dear all

I just spent a frustrating morning trying to carry out an m:1 merge on a
string variable (uni = unique household identifier) using Stata 11.2m
on an Asus laptop with an i7 processor, 4 GB of RAM, and a 64-bit operating
system running Windows 7.

I have a large household data set with 7 rounds of data for each household
(master) and wished to merge this with another file that linked the unique
identifier for each household to a village name (using file).

I used the command:
merge m:1 uni round using "filename"

And that produced some funny results. 


    Result                           # of obs.
    -----------------------------------------
    not matched                           146
        from master                       126  (_merge==1)
        from using                         20  (_merge==2)

    matched                             2,145  (_merge==3)
    -----------------------------------------

I should have a village name for 2,145 households.  However, I only got 331
village names matched for one round of data.

    Village |      Freq.     Percent        Cum.
------------+-----------------------------------
     Vname1 |         12        3.70        3.70
     Vname2 |         23        7.10       10.80
     Vname3 |         22        6.79       17.59
     Vname4 |         22        6.79       24.38
     Vname5 |         16        4.94       29.32
     Vname6 |         22        6.79       36.11
     Vname7 |         16        4.94       41.05
     Vname8 |         18        5.56       46.60
     Vname9 |         40       12.35       58.95
    Vname10 |         16        4.94       63.89
    Vname11 |         22        6.79       70.68
    Vname12 |         30        9.26       79.94
    Vname13 |         53       16.36       96.30
    Vname14 |         12        3.70      100.00
------------+-----------------------------------
      Total |        324      100.00
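
For anyone who sees the same pattern (many matched observations, few village names), one check worth running is whether the matched rows actually carry a value from the using file. The sketch below uses made-up toy data and assumed variable names (uni, round, village). It shows one known way the pattern can arise: by default, merge keeps the master's values for any non-key variable that exists in both files. Whether that applies to the data described in this post cannot be determined from the post itself.

```stata
* Toy data only: all names and values are invented for illustration.
clear
input str5 uni byte round str6 village
"HH001" 1 "Vname1"
"HH001" 2 ""
"HH002" 1 ""
"HH002" 2 ""
end
tempfile master
save `master'

clear
input str5 uni byte round str6 village
"HH001" 1 "Vname1"
"HH001" 2 "Vname1"
"HH002" 1 "Vname2"
"HH002" 2 "Vname2"
end
tempfile lookup
save `lookup'

use `master', clear
merge m:1 uni round using `lookup'

* How are matches spread across rounds?
tabulate round _merge

* Matched rows that still lack a village name
count if _merge == 3 & missing(village)
list uni round village if _merge == 3 & missing(village)
```

If the count is non-zero and village already existed in the master file, rerunning the merge with the update option (or dropping village from the master first) would be the next thing to try.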

It was not a problem with leading or trailing spaces.
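
A minimal sketch of how such a check for stray spaces can be done, using toy data and the variable name uni from this post. It uses trim(), which is available in Stata 11; the flag variable padded is a name made up for this example.

```stata
* Toy data only: the second and third values carry stray spaces.
clear
input str7 uni
"HH001"
" HH002"
"HH003 "
end

* Flag identifiers that change when leading/trailing blanks are removed
generate byte padded = (uni != trim(uni))
count if padded
list uni if padded

* Clean the key in place before merging
replace uni = trim(uni)
```

The same check would need to be run on the key in both the master and the using file.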

It seems to have been a problem with the size of the master data set and the
use of an m:1 merge (and possibly the fact that it was a merge on a string
key, and memory allocation).
When I deleted all the data variables from the master file, leaving only the
identifiers, I was able to successfully merge the two data sets using an m:1
merge.
After that I was able to merge the original large data set with the file
containing the round, the unique household identifier and village name using a
1:1 merge.
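
The two-step workaround described above can be sketched roughly as follows. This is an illustration on toy data, not the exact commands that were run: the variable income and all values are hypothetical, and temporary files stand in for the real master and village files.

```stata
* Toy stand-ins for the real files; names and values are hypothetical.
clear
input str5 uni byte round float income
"HH001" 1 100
"HH001" 2 120
"HH002" 1  90
end
tempfile master
save `master'

clear
input str5 uni byte round str6 village
"HH001" 1 "Vname1"
"HH001" 2 "Vname1"
"HH002" 1 "Vname2"
end
tempfile villages
save `villages'

* Step 1: strip the master down to the keys and do the m:1 merge
use `master', clear
keep uni round
merge m:1 uni round using `villages'
drop _merge
tempfile keyed
save `keyed'

* Step 2: 1:1 merge of the full master with the keyed village file
use `master', clear
isid uni round              // stops with an error if the keys are not unique
merge 1:1 uni round using `keyed'
tabulate _merge
```

Note that a 1:1 merge on uni and round only works if those two variables uniquely identify observations in the master file, which is what isid checks.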

The data set was large (for me):

obs:         2,271                          
vars:           301                          23 Aug 2013 11:12
size:     3,987,876 (99.2% of memory free)

I could not find any other reference to this problem on the net, so I have
posted the problem/solution (for me) here.  Of course I would be much
better off without the string identifier for the household.
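
One possible way to get a numeric household identifier is egen's group() function. The sketch below uses toy data; hhid is a name made up for this example.

```stata
* Toy data only: values are invented for illustration.
clear
input str5 uni byte round
"HH001" 1
"HH001" 2
"HH002" 1
end

* Assign 1, 2, ... to each distinct value of the string identifier
egen long hhid = group(uni)
list uni round hhid, sepby(uni)
```

Because group() numbers whatever values are in memory at the time, the numeric id should be built once (for example in a small crosswalk file of uni and hhid) and then merged into each data set, rather than generated separately in each file.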

I'm sorry I can't post/share the data files to replicate this problem but
this may help someone at some stage.

df

Follow-Ups:
- st: RE: m:1 merge with string function, data set too large?
  - From: Joe Canner <[email protected]>
