Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identifier values change after Merge
From: Phil Clayton <[email protected]>
To: <[email protected]>
Subject: Re: st: Identifier values change after Merge
Date: Thu, 11 Nov 2010 11:42:35 +1030
That's strange. Are you sure it's not just that the display format for the 
identifier has changed? You could check this using: 
format %11.0g tractno 
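To see how a display format alone can make distinct identifiers look identical, here is a small Python sketch. Python is used only as a neutral illustration, not as Stata syntax. It prints the six tract numbers from the question quoted below with a two-decimal scientific format, which reproduces the 1.70E+10 pattern, and then with a fixed-point format that shows every digit. The stored values never change; only the text used to show them does.

```python
# Tract identifiers quoted in the original question.
tract_ids = [17031020500, 17031020600, 17031020700,
             17031130100, 17031090100, 17031090200]

def show(values, fmt):
    """Render each value as text with the given format spec; the values are not modified."""
    return [format(float(v), fmt) for v in values]

# Scientific notation with two decimals: every ID renders as the same text.
scientific = show(tract_ids, ".2E")
# Fixed-point with no decimals: every digit of each ID is visible again.
fixed = show(tract_ids, ".0f")

print(scientific)        # six copies of '1.70E+10'
print(fixed)             # the original 11-digit numbers
print(len(set(tract_ids)))  # the underlying values are still 6 distinct numbers
```

In Stata, the analogous check is Phil's -format- command above: widening the display format changes nothing in the data, it only reveals the digits that were already stored.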
 
If the merge is indeed changing the data, a quick and dirty solution would be to 
create a copy of the identifier using -clonevar- before performing the merge. 
Then you can use the new variable as the identifier. 
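The -clonevar- workaround relies on carrying an untouched copy of the identifier alongside the merge key, so the two can be compared afterwards. A minimal Python sketch of the idea, assuming dicts stand in for datasets: the function name merge_keep_matched, the variable names tractno_copy and var_a, and the var_a values are all hypothetical. It keeps only identifiers found in both files, as keep(match) does, and then checks that each identifier still equals the copy made before the merge.

```python
def merge_keep_matched(master, using):
    """Hypothetical 1:1 merge that keeps matched identifiers only.

    master and using map each identifier to a dict of variables.
    """
    merged = {}
    for key, master_vars in master.items():
        if key in using:                # matched: identifier present in both files
            row = dict(using[key])      # start from the using-file variables
            row.update(master_vars)     # master values win for shared names
            merged[key] = row
    return merged                       # master-only and using-only keys are dropped

# Master file: each record carries a copy of its identifier (the -clonevar- step).
master = {
    17031020500: {"tractno_copy": 17031020500},
    17031020600: {"tractno_copy": 17031020600},
}
# Using file: a superset of the master identifiers, with a made-up variable var_a.
using = {
    17031020500: {"var_a": 1},
    17031020600: {"var_a": 2},
    17031020700: {"var_a": 3},
}

merged = merge_keep_matched(master, using)

# After the merge, every identifier should still equal the copy made beforehand.
for key, row in merged.items():
    assert row["tractno_copy"] == key, f"identifier changed: {key}"
print(sorted(merged))
```

Keeping master values for variables that appear in both files mirrors the default behaviour of Stata's -merge- when neither -update- nor -replace- is specified.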
 
It may also be worth looking at the user-written command -mmerge- which I find to 
be a more user-friendly command for merging datasets. 
 
Phil 
 
On Thu 11/11/10 8:42 AM, Anjanette Chan Tack [email protected] sent:
> Hi 
>  
>  
> I am using Intercooled Stata 9.1 to do a 1:1 merge on an 11-digit
> identifier that uniquely designates a census tract. As background, I
> got the census tract data from the GeoLytics Neighborhood Change Database,
> and these 11-digit numbers are the unique identifiers that come with it.
> The identifier is stored as a double. In executing the merge, I ask
> Stata to keep only the matched observations and drop the unmatched
> ones. Since the master file's identifiers are a subset of those in
> the using file, I was hoping this would let me extract that subset
> of observations and their attendant information easily. To do so, I use
> this command:
>
> merge 1:1 tractno using "C:\Program Files\Stata9\Filename", assert(match master) keep(match)
>
> In some ways the merge proceeds well: the resulting number of observations,
> N, is the N I expect. The problem is that after the merge the identifier
> values change. Where census tracts previously had unique 11-digit
> identifiers, these identifiers are all rounded to the same number in
> the new merged dataset.
>  
> Thus I have a BEFORE and AFTER that look like this: 
>  
> Before: 
>  
> 17031020500 
> 17031020600 
> 17031020700 
> 17031130100 
> 17031090100 
> 17031090200 
>  
> After 
> 1.70E+10 
> 1.70E+10 
> 1.70E+10 
> 1.70E+10 
> 1.70E+10 
> 1.70E+10 
>  
> In all cases, 1.70E+10 = 17030000000.
>  
> I thought this might be due to the way Stata is storing the
> information, so I googled "help stata is approximating numeric
> values". I found an archived response to a problem that seems similar
> here: http://www.stata.com/statalist/archive/2010-06/msg01017.html
> That answer says that the double storage type can hold up to 15
> digits. Since my identifier is only 11 digits long, I can't understand what
> the problem might be.
> I am quite unfamiliar with Stata (this is the first time I have used it in 3
> years, and the first time outside a classroom setting for basic training in
> statistics), so I would be grateful for any suggestions and advice.
> Many thanks in advance!
>  
> Anjie. 
> ------------------------------- 
> Anjanette M. Chan Tack 
> PhD student  
> University of Chicago Department of Sociology 
> * 
> *   For searches and help try: 
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>  
>  
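The question's reasoning about storage types can be checked directly. This Python sketch round-trips each tract number through IEEE 754 single precision (4 bytes) and double precision (8 bytes) using the standard struct module; it assumes Stata's 4-byte float and 8-byte double follow these IEEE formats. A double has a 53-bit significand, so every integer up to 2^53 (about 9.0e15, 16 digits) is stored exactly, and all six 11-digit IDs survive unchanged. A single has a 24-bit significand, so it is exact only up to 2^24 = 16,777,216; at the size of these IDs, neighbouring representable values are 1,024 apart, and the six IDs collapse onto three distinct values. None of them becomes 17030000000, which is consistent with Phil's suggestion that the uniform 1.70E+10 is a display effect rather than a change in the stored data.

```python
import struct

# Tract identifiers quoted in the question.
tract_ids = [17031020500, 17031020600, 17031020700,
             17031130100, 17031090100, 17031090200]

def round_trip(value, code):
    """Pack value into IEEE 754 binary form and read it back.

    code "f" = 4-byte single precision (assumed to match Stata's float),
    code "d" = 8-byte double precision (assumed to match Stata's double).
    """
    return struct.unpack(code, struct.pack(code, float(value)))[0]

as_double = [round_trip(v, "d") for v in tract_ids]
as_single = [round_trip(v, "f") for v in tract_ids]

for original, double, single in zip(tract_ids, as_double, as_single):
    # Doubles reproduce each ID exactly; singles round to a multiple of 1,024.
    print(original, int(double), int(single))
```

The practical lesson is that an 11-digit identifier is safe as a double (or as a string) but not as a float, where distinct tracts could silently share an identifier.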
 