Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Controling precision for multiple runs of same code (out of office until12th June)
From
"Seyi Soremekun" <[email protected]>
To
<[email protected]>
Subject
Re: st: Controling precision for multiple runs of same code (out of office until12th June)
Date
Tue, 04 Jun 2013 01:07:08 +0100
I am currently out of the office until the 12th June with limited email contact.
Please contact Angela Vega ([email protected]) for any enquiries.
>>> Phil Clayton <[email protected]> 06/04/13 01:05 >>>
The first thing that stands out is:
merge m:m ...
You should pretty much never do this. Don't take my word for it - the manual entry for -merge- says "m:m specifies a many-to-many merge and is a bad idea" and explains why, including that you might get non-reproducible results.
You probably want m:1. If for some reason you definitely need to join all records in a many-to-many fashion based on one or more ID variables, you should use -joinby-.
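The two alternatives above can be sketched as follows. This is only an illustration under assumptions the thread does not confirm: that baseline_4.dta and ABCD.dta are in the working directory, and that statefips and agecat_census are the intended keys. Which option applies depends on whether ABCD.dta has one row or several rows per key combination.

```stata
* Sketch only. Assumes baseline_4.dta and ABCD.dta sit in the working
* directory and share the key variables statefips and agecat_census.

* Option 1: ABCD.dta is a lookup table with one row per key combination.
use "ABCD.dta", clear
isid statefips agecat_census    // stops with an error if the key is not unique

use "baseline_4.dta", clear
merge m:1 statefips agecat_census using "ABCD.dta"
assert _merge == 3
drop _merge

* Option 2: ABCD.dta has several rows per key (for example, one row per
* agecat_mpact within each agecat_census; this is a guess, not something
* stated in the thread). -joinby- forms every pairwise combination of
* master and using rows within each key group.
use "baseline_4.dta", clear
joinby statefips agecat_census using "ABCD.dta", unmatched(none)
```

If -isid- fails, option 1 is not appropriate and option 2 is the likelier fit. In either case the set of matched rows no longer depends on how observations happen to be ordered within each key group, which is what makes m:m results vary between runs. Note that -joinby- will generally increase the number of observations.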
Phil
On 04/06/2013, at 9:55 AM, Melanie Leis <[email protected]> wrote:
> Hello,
>
> I'm having trouble with a section of my code that yields different
> results each time I run it.
>
> I start out with a dataset, baseline_4.dta, which has 47,267,047
> observations and 16 variables, and run this:
>
> merge m:m statefips agecat_census using "ABCD.dta"
> assert _merge==3
> drop _merge
> egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat
> racecat iprcat_mpact iprcat coverage groupsize)
> checkpop
> rename pop oldpop
> gen pop=tot_pop*prob_agecat_mpact
> checkpop
> collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat
> racecat iprcat_mpact iprcat coverage groupsize)
> checkpop
> sum
> sort _all
> save "baseline_5.dta", replace
>
> checkpop is a program that tells me what my total population is each
> time I run it. My total population is the same before and after the
> collapse function (see results below).
>
> At the end, my total population and my number of observations in
> baseline_5.dta are different every time I run this. I suspect the
> difference is in rounding when it executes the gen pop line, but I've
> tried replacing it with
>
> gen double pop=tot_pop*prob_agecat_mpact
>
> and
>
> gen float pop=tot_pop*prob_agecat_mpact
>
> And I still get differences.
>
> I tried using
>
> gen long pop=tot_pop*prob_agecat_mpact
>
> But I lost too much precision by doing this.
>
> Could you please recommend a solution to obtain the exact same numbers
> in each run, without sacrificing precision?
>
> Thanks!
>
> Melanie
>
> The log file for 2 of the runs I've done:
>
> ************* RUN A ***********************
>
> . merge m:m statefips agecat_census using "ABCD.dta"
>
> Result # of obs.
> -----------------------------------------
> not matched 0
> matched 47,267,047 (_merge==3)
> -----------------------------------------
>
> . assert _merge==3
>
> . drop _merge
>
> . egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat
> racecat iprcat_mpac
>> t iprcat coverage groupsize)
>
> . checkpop
>
> Total pop: 347,095,179
> Observations: 47,267,047
> Missing: 0
>
> . rename pop oldpop
>
> . gen pop=tot_pop*prob_agecat_mpact
>
> . checkpop
>
> Total pop: 332,455,972
> Observations: 47,267,047
> Missing: 0
>
> . collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat
> racecat iprcat_mpact ip
>> rcat coverage groupsize)
>
> . checkpop
>
> Total pop: 332,455,972
> Observations: 36,351,520
> Missing: 0
>
>
> ************** RUN B *************
> . merge m:m statefips agecat_census using "ABCD.dta"
>
> Result # of obs.
> -----------------------------------------
> not matched 0
> matched 47,267,047 (_merge==3)
> -----------------------------------------
>
> . assert _merge==3
>
> . drop _merge
>
> . egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat
> racecat iprcat_mpac
>> t iprcat coverage groupsize)
>
> . checkpop
>
> Total pop: 347,095,179
> Observations: 47,267,047
> Missing: 0
>
> . rename pop oldpop
>
> . gen pop=tot_pop*prob_agecat_mpact
>
> . checkpop
>
> Total pop: 332,455,928
> Observations: 47,267,047
> Missing: 0
>
> . collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat
> racecat iprcat_mpact ip
>> rcat coverage groupsize)
>
> . checkpop
>
> Total pop: 332,455,928
> Observations: 36,351,515
> Missing: 0
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/