Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: merge m:1 by string

From	"Ben Ammar" <[email protected]>
To	[email protected]
Subject	Re: st: merge m:1 by string
Date	Sat, 19 Mar 2011 15:18:38 +0100

Hi Rebecca,

Thanks for your answer and your hint. You are right: there were still trailing blanks in the strings, which I hadn't considered.
For others who might encounter the same problem: apply the trim() function to your string variables before you merge the datasets.
Cheers
Ben
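
A minimal sketch of the fix described above, assuming the key variable is called name as in the thread. The data are the toy rows from the thread; the trailing blanks are added artificially to one name in the using data, and the tempfile name budgets is hypothetical. Run it as a do-file, because it relies on a local macro.

/***** begin code *****/
* using data: add trailing blanks to one name to mimic the problem
clear
input str32 name budget
"Alex T. Smith"         130
"Andrew J. Williams"    345
"Steve R. Jackson"      245
end
replace name = name + "  " in 2
count if name != trim(name)      // how many names carry leading/trailing blanks
replace name = trim(name)        // strip them
isid name                        // m:1 requires name to be unique in the using data
tempfile budgets
save `budgets'

* master data: several rows per name
clear
input str32 name household1 date
"Alex T. Smith"         45   1988
"Alex T. Smith"         33   1977
"Andrew J. Williams"    12   2004
"Steve R. Jackson"      23   1979
end
replace name = trim(name)        // trim the key on this side as well
merge m:1 name using `budgets'
tabulate _merge                  // inspect how many observations matched
/***** end code *****/

In Stata 14 and later the same function is also available as strtrim(). Trimming does not change case, so names that differ only in capitalisation still will not match.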

-------- Original Message --------
> Date: Fri, 18 Mar 2011 18:45:18 -0500
> From: Rebecca Pope <[email protected]>
> To: [email protected]
> Subject: Re: st: merge m:1 by string

> Ben,
> If this is real data from your sample, I'm not sure what is causing
> your problem. I wasn't able to duplicate the issue you describe.
> 
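> /***** begin code *****/
> * using data: one row per name
> clear
> input str32 name budget
> "Alex T. Smith"         130
> "Andrew J. Williams"    345
> "Steve R. Jackson"      245
> end
> * the dataset file is literally named using.dta
> save using, replace
> 
> * master data: several rows per name; one row has a lower-case "williams"
> clear
> input str32 name household1    date
> "Alex T. Smith"         45          1988
> "Alex T. Smith"         33          1977
> "Andrew J. williams"    12          1999
> "Andrew J. Williams"    12          2004
> "Steve R. Jackson"      23          1979
> end
> 
> * key variable is name; the second "using" is the file using.dta
> merge m:1 name using using
> 
> list
> /***** end code *****/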
> 
> /**** output - apologies if this doesn't line up on your end... ****/
> 
>      name                 househ~1   date   budget            _merge
>      ---------------------------------------------------------------
>  1.  Alex T. Smith              45   1988      130       matched (3)
>  2.  Alex T. Smith              33   1977      130       matched (3)
>  3.  Andrew J. Williams         12   2004      345       matched (3)
>  4.  Andrew J. williams         12   1999        .   master only (1)
>  5.  Steve R. Jackson           23   1979      245       matched (3)
> 
> /********/
> As you can see, Stata matches everything except obs. #4 above, but
> that's to be expected because "williams" is not equivalent to
> "Williams"; Stata is case-sensitive.
> 
> Also, please verify that either (1) this produces the same results on
> your computer or (2) that the same problem emerges even when you run
> this code. Since you didn't specify, I'm assuming you are running
> Stata 11.
> 
> If this code works for you, my guess is that there are differences in
> your actual data you can't see by just "eyeballing" it. You say you
> checked for leading spaces. Did you check for trailing ones?
> 
> As regards -encode-, I think you are using it incorrectly, or at least
> expecting it to be something it isn't. It just generates a numeric
> variable that takes a new value for each distinct value of the string.
> There is no particular relationship between the numeric variable and
> the string variable other than the alphabetical order of the distinct
> string values within that dataset. Observe the results below (code not
> shown): "ename" is the encoded name in the master set and "ename_u" is
> the encoded name in the using set. As you can see, the encoded names
> are different for obs. 4 and 5.
> 
>      name                 househ~1   date   ename   budget   ename_u   _merge
>      ------------------------------------------------------------------------
>  1.  Alex T. Smith              33   1977       1      130         1        3
>  2.  Alex T. Smith              45   1988       1      130         1        3
>  3.  Andrew J. Williams         12   2004       2      345         2        3
>  4.  Andrew J. williams         12   1999       3        .         .        1
>  5.  Steve R. Jackson           23   1979       4      245         3        3
> 
> Hope this helps,
> Rebecca
> 
> 
> 
>          __o                __o
>       _`\ <,_            _`\ <,_
>      (_)/   (_)          (_)/   (_)
> =========================
> 
> 
> On Fri, Mar 18, 2011 at 5:21 PM, Ben Ammar <[email protected]> wrote:
> >
> > Hi everybody,
> >
> > I've got a problem concerning the merge command, or rather its result.
> > I'd be very grateful for any help. There are more than 2 million names
> > (str32) in my master and 4000 names (str32) in my using for the
> > variable (name) I want to merge on. Since there are multiple
> > observations with the same name in my master but only one observation
> > per name in the using, the m:1 merge should be the correct one.
> >
> > master:
> > name               household1    date
> >
> > Alex T. Smith         45          1988
> > Alex T. Smith         33          1977
> > Andrew J. williams    12          1999
> > Andrew J. Williams    12          2004
> > Steve R. Jackson      23          1979
> >
> >
> > using:
> > name                 budget
> >
> > Alex T. Smith         130
> > Andrew J. Williams    345
> > Steve R. Jackson      245
> >
> >
> > But what happens is that the using is appended at the end of the master
> > after the merge. I think the problem here is the string variable, even
> > though I don't understand why. When I encoded the string variable (name),
> > about 8000 observations (out of 2 million) in the master were matched
> > just as they should be, but unfortunately that is not yet enough. The
> > format of the variable is the same in both data sets, and I even sorted
> > them. I also checked whether there's a space at the beginning of the
> > name or anything within the string that differs from the using name,
> > but both string variables are exactly the same. The last (unlikely)
> > case I checked was the RAM: I dropped all other variables, which could
> > have taken too much memory and so might explain why only a very small
> > part was matched when I tried to encode the string. That didn't work
> > either. Does anyone have an idea about this, or has anyone had the same
> > experience? Thanks for any comments!
> >
> > Regards
> > Ben
> >
> >
> 

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
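
Rebecca's point about -encode- can be reproduced with a short sketch. It uses only the toy names from the thread and encodes them separately in each dataset, which is an assumption about how the encoding was done originally. Because the codes follow the alphabetical order of the values within each dataset, "Steve R. Jackson" ends up with a different code in the master (four distinct names) than in the using (three distinct names), as in her ename and ename_u columns.

/***** begin code *****/
* using data: three distinct names
clear
input str32 name
"Alex T. Smith"
"Andrew J. Williams"
"Steve R. Jackson"
end
encode name, generate(ename_u)
list name ename_u, nolabel       // nolabel shows the underlying numeric codes

* master data: four distinct names, because "williams" differs from "Williams"
clear
input str32 name
"Alex T. Smith"
"Andrew J. williams"
"Andrew J. Williams"
"Steve R. Jackson"
end
encode name, generate(ename)
list name ename, nolabel
/***** end code *****/

Since the codes are only meaningful within the dataset in which they were generated, merging on the encoded variable is not a reliable substitute for merging on the cleaned string itself.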
