Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Controling precision for multiple runs of same code
From: "Lachenbruch, Peter" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: Controling precision for multiple runs of same code
Date: Tue, 4 Jun 2013 13:46:21 +0000
Please listen to the advice given and do not repeat the query. You're barking up the wrong tree.
Peter A. Lachenbruch,
Professor (retired)
________________________________________
From: [email protected] [[email protected]] on behalf of Melanie Leis [[email protected]]
Sent: Monday, June 03, 2013 7:04 PM
To: statalist
Subject: Re: st: Controling precision for multiple runs of same code
Thank you, I appreciate your recommendations about not using m:m.
Nevertheless, in each run, out of 47,267,047 observations and 14
variables, I only get a difference in 102 observations and 2
variables. This seems to me like something that should be fixable.
I insist on using the merge m:m because, overall, it yields the most
reasonable numbers for what I'm trying to do. Joinby gave me a total
population about 60 million higher than what I need (and what I get
with the merge m:m).
I understand that the lack of context and the fact that I insist on
using a merge m:m make this difficult. Nevertheless, any ideas on how
I could fix the code below to get replicable results without taking
out the merge m:m would be greatly appreciated.
Thank you!
Melanie
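
The sequential pairing that makes m:m order-dependent can be seen on a toy example. This is a minimal sketch on made-up data (the variables key, a, and b are hypothetical, not from the posted code). Per the -merge- manual entry, m:m matches the first observation within a key group to the first, the second to the second, and so on, reusing the last observation of the shorter group. -sort, stable- is used here so the typed row order is kept within key.

```stata
* Toy data, made up for illustration (not the poster's files)
clear
input key b
1 10
1 20
end
sort key, stable          // mark as sorted by key, keeping the typed row order
tempfile u
save `u'

clear
input key a
1 100
1 200
1 300
end
sort key, stable
merge m:m key using `u'
list key a b _merge

* Rows are paired by position within key: (100,10), (200,20), and the last
* using row is reused for the third master row: (300,20).
assert b == 10 if a == 100
assert b == 20 if a == 200
assert b == 20 if a == 300
```

If the two using rows were stored in the opposite order, the pairs would instead be (100,20), (200,10), (300,10). Fixing the within-key order in both files before merging is one way to make an m:m result repeatable, though it does not make the pairing itself meaningful.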
On Mon, Jun 3, 2013 at 8:05 PM, Phil Clayton
<[email protected]> wrote:
> The first thing that stands out is:
> merge m:m ...
>
> You should pretty much never do this. Don't take my word for it - the manual entry for -merge- says "m:m specifies a many-to-many merge and is a bad idea" and explains why, including that you might get non-reproducible results.
>
> You probably want m:1. If for some reason you definitely need to join all records in a many-to-many fashion based on one or more ID variables, you should use -joinby-.
>
> Phil
>
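
The difference between the three approaches mentioned above can be sketched on toy data. The variables key, a, and b are hypothetical and the row counts are specific to this made-up example; it only illustrates why -joinby- can return far more rows than -merge m:m-.

```stata
* Toy data, made up for illustration
clear
input key b
1 10
1 20
end
tempfile u
save `u'

clear
input key a
1 100
1 200
1 300
end
tempfile m
save `m'

* -joinby- forms every pairing within key: 3 x 2 = 6 rows
joinby key using `u'
assert _N == 6

* -merge m:m- pairs rows by position instead: max(3, 2) = 3 rows
use `m', clear
merge m:m key using `u'
assert _N == 3

* -merge m:1- refuses to run when key is not unique in the using data
use `m', clear
capture merge m:1 key using `u'
assert _rc != 0
```

Running -duplicates report- on the key variables in each file is a quick way to see which of these situations applies to real data.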
> On 04/06/2013, at 9:55 AM, Melanie Leis <[email protected]> wrote:
>
>> Hello,
>>
>> I'm having trouble with a section of my code that yields different
>> results each time I run it.
>>
>> I start out with a dataset, baseline_4.dta, which has 47,267,047
>> observations and 16 variables, and run this:
>>
>> merge m:m statefips agecat_census using "ABCD.dta"
>> assert _merge==3
>> drop _merge
>> egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat ///
>>     racecat iprcat_mpact iprcat coverage groupsize)
>> checkpop
>> rename pop oldpop
>> gen pop=tot_pop*prob_agecat_mpact
>> checkpop
>> collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat ///
>>     racecat iprcat_mpact iprcat coverage groupsize)
>> checkpop
>> sum
>> sort _all
>> save "baseline_5.dta", replace
>>
>> checkpop is a program that reports my total population each
>> time I run it. My total population is the same before and after the
>> collapse command (see results below).
>>
>> At the end, my total population and my number of observations in
>> baseline_5.dta are different every time I run this. I suspect the
>> difference is in rounding when it executes the gen pop line, but I've
>> tried replacing it with
>>
>> gen double pop=tot_pop*prob_agecat_mpact
>>
>> and
>>
>> gen float pop=tot_pop*prob_agecat_mpact
>>
>> And I still get differences.
>>
>> I tried using
>>
>> gen long pop=tot_pop*prob_agecat_mpact
>>
>> But I lost too much precision by doing this.
>>
>> Could you please recommend a solution to obtain the exact same numbers
>> in each run, without sacrificing precision?
>>
>> Thanks!
>>
>> Melanie
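
The storage types tried above behave quite differently, which a small sketch can show. The values are made up for illustration. Note also that rounding alone would not be expected to change the number of groups left after -collapse-, so the differing observation counts in the two runs below point to something other than storage precision (an inference from the logs, not something the post confirms).

```stata
* float holds about 7 significant digits, so not every integer above
* 16,777,216 can be stored exactly
display %12.0f float(16777217)
assert float(16777217) == 16777216

* long is an integer type, so the fractional part of a value is lost
clear
set obs 1
generate double x = 2.3
generate long   y = x
assert y == 2

* double keeps the fractional part
generate double z = x
assert z == x
```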
>>
>> The log file for 2 of the runs I've done:
>>
>> ************* RUN A ***********************
>>
>> . merge m:m statefips agecat_census using "ABCD.dta"
>>
>>     Result                           # of obs.
>>     -----------------------------------------
>>     not matched                             0
>>     matched                        47,267,047  (_merge==3)
>>     -----------------------------------------
>>
>> . assert _merge==3
>>
>> . drop _merge
>>
>> . egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
>>
>> . checkpop
>>
>> Total pop: 347,095,179
>> Observations: 47,267,047
>> Missing: 0
>>
>> . rename pop oldpop
>>
>> . gen pop=tot_pop*prob_agecat_mpact
>>
>> . checkpop
>>
>> Total pop: 332,455,972
>> Observations: 47,267,047
>> Missing: 0
>>
>> . collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
>>
>> . checkpop
>>
>> Total pop: 332,455,972
>> Observations: 36,351,520
>> Missing: 0
>>
>>
>> ************** RUN B *************
>> . merge m:m statefips agecat_census using "ABCD.dta"
>>
>>     Result                           # of obs.
>>     -----------------------------------------
>>     not matched                             0
>>     matched                        47,267,047  (_merge==3)
>>     -----------------------------------------
>>
>> . assert _merge==3
>>
>> . drop _merge
>>
>> . egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
>>
>> . checkpop
>>
>> Total pop: 347,095,179
>> Observations: 47,267,047
>> Missing: 0
>>
>> . rename pop oldpop
>>
>> . gen pop=tot_pop*prob_agecat_mpact
>>
>> . checkpop
>>
>> Total pop: 332,455,928
>> Observations: 47,267,047
>> Missing: 0
>>
>> . collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
>>
>> . checkpop
>>
>> Total pop: 332,455,928
>> Observations: 36,351,515
>> Missing: 0
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/