Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identifier values change after Merge
From: Phil Clayton <[email protected]>
To: <[email protected]>
Subject: Re: st: Identifier values change after Merge
Date: Thu, 11 Nov 2010 11:42:35 +1030

That's strange. Are you sure it's not just that the display format for the
identifier has changed? You could check this using:
format %11.0g tractno
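
A minimal sketch of that check, using toy data built from the values quoted in the post below rather than the real file. It contrasts a display-format change, which leaves the stored values intact, with what genuine rounding would look like if the identifier were held as float. The dataset here is a stand-in, not the poster's data.

```stata
* Toy data standing in for the real file (values copied from the post below)
clear
input double tractno
17031020500
17031020600
17031020700
end

* With a narrow display format the values appear in scientific notation
format tractno %9.0g
list

* The stored values are untouched: widen the format to see them in full
format tractno %11.0g
list
describe tractno          // reports storage type and display format
isid tractno              // errors out if tractno did not uniquely identify rows

* By contrast, a float cannot hold 11 digits exactly, so storing the
* identifier as float would genuinely round it; both lines print the same number
display %12.0f float(17031020500)
display %12.0f float(17031020600)
```

If -isid- passes and the full values reappear after -format-, only the display changed. If -describe- reports float in either dataset, the rounding is real.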
If the merge is indeed changing the data, a quick and dirty solution would be to
create a copy of the identifier using -clonevar- before performing the merge.
Then you can use the new variable as the identifier.
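
A rough sketch of the -clonevar- approach, assuming Stata 11 or later for the -merge 1:1- syntax. The two small datasets, the variable pop, and the name tractno_orig are made up for illustration; the master file is a subset of the using file, as in the post below.

```stata
* Hypothetical using file: a superset of tracts with one extra variable
clear
input double tractno pop
17031020500 100
17031020600 200
17031020700 300
17031130100 400
end
tempfile using_data
save `using_data'

* Hypothetical master file: a subset of the tracts above
clear
input double tractno
17031020500
17031020700
end

* Exact copy of the identifier (same storage type and display format)
clonevar tractno_orig = tractno

* Keep matched observations only; nogenerate suppresses the _merge variable
merge 1:1 tractno using `using_data', keep(match) nogenerate

* Compare the identifier with its pre-merge copy
format tractno tractno_orig %11.0g
assert tractno == tractno_orig
list
```

If the -assert- fails on the real data, the merge really is altering the identifier and tractno_orig holds the original values.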
It may also be worth looking at the user-written command -mmerge- which I find to
be a more user-friendly command for merging datasets.
Phil
On Thu 11/11/10 8:42 AM, Anjanette Chan Tack [email protected] sent:
> Hi
>
>
> I am using intercooled stata 9.1 to do a 1 to 1 merge using an 11 digit
> long identifier that uniquely designates a census tract. As background, I
> got the census tract data from the geolytics Neighborhood Change database,
> and these 11 digit numbers are the unique identifiers that come with them.
>
> The identifier is being stored as double. In executing the merge, I ask
> stata to keep the matched observations only and drop the unmatched
> observations. Since the master file's list of identifiers is a subset of
> the using file, I was hoping that it would allow me to extract this subset
> of observations and their attendant information easily. To do so, I use
> this command:
>
> merge 1:1 tractno using C:\Program Files\Stata9\Filename assert (match,
> master) keep (match)
> In some ways the merge proceeds well. The resulting list of N observations
> is the N I expect. The problem is that after the merge, the values of the
> identifiers change. Where previously census tracts had unique 11-digit
> identifiers, these identifiers are all rounded to the same number in
> the new merged dataset.
>
> Thus I have a BEFORE and AFTER that look like this:
>
> Before:
>
> 17031020500
> 17031020600
> 17031020700
> 17031130100
> 17031090100
> 17031090200
>
> After
> 1.70E+10
> 1.70E+10
> 1.70E+10
> 1.70E+10
> 1.70E+10
> 1.70E+10
>
> Where 1.70E+10 = 17030000000 in all cases.
>
> I thought that this might be due to the way that stata is storing the
> information, so I googled "help stata is approximating numeric
> values". I found an archived response to a problem that seems similar
> here: http://www.stata.com/statalist/archive/2010-06/msg01017.html
> The help answer says that the double storage type can sustain up to 15
> digits. Since my identifier is only 11 digits long, I can't understand what
> the problem might be.
> I am quite unfamiliar with stata (it's the first time I'm using it in 3
> years, and the first time outside a classroom setting for basic training in
> statistics), so I would be grateful for any suggestions and advice.
> Many thanks in advance!
>
> Anjie.
> -------------------------------
> Anjanette M. Chan Tack
> PhD student
> University of Chicago Department of Sociology
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/