Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Radwin, David" <dradwin@rti.org>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Controling precision for multiple runs of same code
Date: Mon, 3 Jun 2013 20:04:10 -0400

Melanie,

The problem may be your m:m merge. The extended (PDF) documentation for -merge- says

"m:m merges are dependent on the current sort order - something which should never happen. Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge. Tips for this are given in Troubleshooting m:m merges below."

Different sort orders could be giving you different merges and therefore different results. You may want to rethink this step and consult the documentation.

David
--
David Radwin
Senior Research Associate
Education Studies Division
RTI International
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-665-8274
www.rti.org

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Melanie Leis
> Sent: Monday, June 03, 2013 4:55 PM
> To: statalist
> Subject: st: Controling precision for multiple runs of same code
>
> Hello,
>
> I'm having trouble with a section of my code that yields different
> results each time I run it.
>
> I start out with a dataset, baseline_4.dta, which has 47,267,047
> observations and 16 variables, and run this:
>
> merge m:m statefips agecat_census using "ABCD.dta"
> assert _merge==3
> drop _merge
> egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
> checkpop
> rename pop oldpop
> gen pop=tot_pop*prob_agecat_mpact
> checkpop
> collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
> checkpop
> sum
> sort _all
> save "baseline_5.dta", replace
>
> checkpop is a program that tells me what my total population is each
> time I run it. My total population is the same before and after the
> collapse function (see results below).
>
> At the end, my total population and my number of observations in
> baseline_5.dta are different every time I run this. I suspect the
> difference is in rounding when it executes the gen pop line, but I've
> tried replacing it with
>
> gen double pop=tot_pop*prob_agecat_mpact
>
> and
>
> gen float pop=tot_pop*prob_agecat_mpact
>
> And I still get differences.
>
> I tried using
>
> gen long pop=tot_pop*prob_agecat_mpact
>
> But I lost too much precision by doing this.
>
> Could you please recommend a solution to obtain the exact same numbers
> in each run, without sacrificing precision?
>
> Thanks!
>
> Melanie
>
> The log file for 2 of the runs I've done:
>
> ************* RUN A ***********************
>
> . merge m:m statefips agecat_census using "ABCD.dta"
>
>     Result                           # of obs.
>     -----------------------------------------
>     not matched                             0
>     matched                        47,267,047  (_merge==3)
>     -----------------------------------------
>
> . assert _merge==3
>
> . drop _merge
>
> . egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
>
> . checkpop
>
> Total pop: 347,095,179
> Observations: 47,267,047
> Missing: 0
>
> . rename pop oldpop
>
> . gen pop=tot_pop*prob_agecat_mpact
>
> . checkpop
>
> Total pop: 332,455,972
> Observations: 47,267,047
> Missing: 0
>
> . collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
>
> . checkpop
>
> Total pop: 332,455,972
> Observations: 36,351,520
> Missing: 0
>
>
> ************** RUN B *************
>
> . merge m:m statefips agecat_census using "ABCD.dta"
>
>     Result                           # of obs.
>     -----------------------------------------
>     not matched                             0
>     matched                        47,267,047  (_merge==3)
>     -----------------------------------------
>
> . assert _merge==3
>
> . drop _merge
>
> . egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
>
> . checkpop
>
> Total pop: 347,095,179
> Observations: 47,267,047
> Missing: 0
>
> . rename pop oldpop
>
> . gen pop=tot_pop*prob_agecat_mpact
>
> . checkpop
>
> Total pop: 332,455,928
> Observations: 47,267,047
> Missing: 0
>
> . collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
>
> . checkpop
>
> Total pop: 332,455,928
> Observations: 36,351,515
> Missing: 0
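
A minimal sketch of how the m:m diagnosis in the reply could be checked, using the file and key names from the quoted code. It assumes the intent is to pair every baseline row with every matching row of ABCD.dta within each statefips/agecat_census cell, which the thread does not confirm. If that assumption is wrong, -joinby- is the wrong tool and a 1:m or m:1 merge on a fuller key is needed instead.

```stata
* Sketch only: file names and keys are taken from the quoted code;
* the choice of -joinby- is an assumption about the intended pairing.

* Step 1: see whether the merge key is repeated on each side.
use "baseline_4.dta", clear
duplicates report statefips agecat_census

use "ABCD.dta", clear
duplicates report statefips agecat_census

* Step 2 (only if all pairwise combinations are what is wanted):
* -joinby- forms every pairing within each key group.
use "baseline_4.dta", clear
joinby statefips agecat_census using "ABCD.dta", unmatched(none)
```

If -duplicates report- shows surplus copies of the key in both files, the m:m merge was pairing rows by their position within each group, which is why it depends on sort order. The set of observations produced by -joinby- does not depend on sort order, but the observation count may be larger than the 47,267,047 shown in the logs, so the later egen and collapse steps would need to be rechecked.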