Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Controlling precision for multiple runs of same code
From: "[email protected]" <[email protected]>
To: <[email protected]>
Subject: RE: st: Controlling precision for multiple runs of same code
Date: Mon, 3 Jun 2013 17:03:28 -0700 (PDT)
Without going into much detail, be aware that a many-to-many merge (merge m:m) can yield
non-deterministic (and often meaningless) pairings of observations, leading to irreproducible or inconsistent results.
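To make the point above concrete, here is a minimal sketch on toy data (not the poster's files). Within each key group, merge m:m pairs the first master observation with the first using observation, the second with the second, and so on, so the pairing depends on the order the observations happen to be in. The names key, x, y and the tempfile using_data are hypothetical, and the code is meant to be run from a do-file.

```stata
* Toy illustration only: key, x, y and using_data are hypothetical names.
clear
input key y
1 10
1 20
end
tempfile using_data
save `using_data'

clear
input key x
1 1
1 2
end

* The key does not uniquely identify observations in the master
duplicates report key

* Order A: master as entered (x = 1, 2 within key)
preserve
merge m:m key using `using_data'
list key x y, sepby(key)
restore

* Order B: reverse the master within key, then merge again;
* the x-y pairings may differ from Order A
preserve
gsort key -x
merge m:m key using `using_data'
list key x y, sepby(key)
restore

* joinby forms every pairwise combination within key,
* so the set of resulting rows does not depend on the input order
joinby key using `using_data'
sort key x y
list key x y, sepby(key)
```

In the original code, the analogous change would be to replace merge m:m with joinby on statefips agecat_census. That is appropriate only if each master observation is meant to be combined with every matching row of ABCD.dta, which is an assumption here and not something the post confirms.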
---Original Message---
From: [email protected]
Sent: 6/3/2013 7:56 pm
To: "statalist" <[email protected]>
Subject: st: Controlling precision for multiple runs of same code
Hello,
I'm having trouble with a section of my code that yields different
results each time I run it.
I start out with a dataset, baseline_4.dta, which has 47,267,047
observations and 16 variables, and run this:
merge m:m statefips agecat_census using "ABCD.dta"
assert _merge==3
drop _merge
egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat ///
    racecat iprcat_mpact iprcat coverage groupsize)
checkpop
rename pop oldpop
gen pop=tot_pop*prob_agecat_mpact
checkpop
collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat ///
    racecat iprcat_mpact iprcat coverage groupsize)
checkpop
sum
sort _all
save "baseline_5.dta", replace
checkpop is a program that reports my total population each
time I run it. Within a run, my total population is the same before and
after the collapse command (see results below).
At the end, my total population and my number of observations in
baseline_5.dta are different every time I run this. I suspect the
difference comes from rounding when Stata executes the gen pop line, but
I've tried replacing it with
gen double pop=tot_pop*prob_agecat_mpact
and
gen float pop=tot_pop*prob_agecat_mpact
And I still get differences.
I tried using
gen long pop=tot_pop*prob_agecat_mpact
But I lost too much precision by doing this.
Could you please recommend a solution to obtain the exact same numbers
in each run, without sacrificing precision?
Thanks!
Melanie
The log file for 2 of the runs I've done:
************* RUN A ***********************
. merge m:m statefips agecat_census using "ABCD.dta"
    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                        47,267,047  (_merge==3)
    -----------------------------------------
. assert _merge==3
. drop _merge
. egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
. checkpop
Total pop: 347,095,179
Observations: 47,267,047
Missing: 0
. rename pop oldpop
. gen pop=tot_pop*prob_agecat_mpact
. checkpop
Total pop: 332,455,972
Observations: 47,267,047
Missing: 0
. collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
. checkpop
Total pop: 332,455,972
Observations: 36,351,520
Missing: 0
************** RUN B *************
. merge m:m statefips agecat_census using "ABCD.dta"
    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                        47,267,047  (_merge==3)
    -----------------------------------------
. assert _merge==3
. drop _merge
. egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
. checkpop
Total pop: 347,095,179
Observations: 47,267,047
Missing: 0
. rename pop oldpop
. gen pop=tot_pop*prob_agecat_mpact
. checkpop
Total pop: 332,455,928
Observations: 47,267,047
Missing: 0
. collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
. checkpop
Total pop: 332,455,928
Observations: 36,351,515
Missing: 0
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/