Re: st: RE: Controling precision for multiple runs of same code (out of office until12th June)

From	"Seyi Soremekun" <[email protected]>
To	<[email protected]>
Subject	Re: st: RE: Controling precision for multiple runs of same code (out of office until12th June)
Date	Tue, 04 Jun 2013 01:05:54 +0100

I am currently out of the office until the 12th June with limited email contact.
Please contact Angela Vega ([email protected]) for any enquiries.

>>> "Radwin, David" <[email protected]> 06/04/13 01:04 >>>

Melanie,

The problem may be to do with your m:m merge.

The extended (PDF) documentation for -merge- says "m:m merges are
dependent on the current sort order -- something which should never happen.
Because m:m merges are such a bad idea, we are not going to show you an
example. If you think that you need an m:m merge, then you probably need
to work with your data so that you can use a 1:m or m:1 merge. Tips for
this are given in Troubleshooting m:m merges below."

Different sort orders could be giving you different merges and therefore
different results. You may want to rethink this step and consult the
documentation.
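
One way to act on this is sketched below. It uses made-up toy data standing in for baseline_4.dta and ABCD.dta: the rows, the tempfile names, and the assumption that you want every pairing within a key are mine, not from your post. The idea is to check with -isid- whether the merge keys are unique in the using file. If they are unique, an m:1 merge is well defined. If they are not, -joinby- forms all pairwise combinations within each key, so the set of resulting rows does not depend on the sort order.

```stata
* Hypothetical toy data standing in for baseline_4.dta (master file).
clear
input byte(statefips agecat_census) double pop
1 1 100
1 1 250
1 2 300
end
tempfile master
save `master'

* Hypothetical toy data standing in for ABCD.dta (using file).
clear
input byte(statefips agecat_census agecat_mpact) double prob_agecat_mpact
1 1 1 0.4
1 1 2 0.6
1 2 3 1.0
end
tempfile lookup
save `lookup'

* Do the merge keys uniquely identify rows in the using file?
capture isid statefips agecat_census
display "isid return code: " _rc   // nonzero means the keys repeat

* In this toy data the keys repeat on both sides, so form every
* pairwise combination within each key instead of using merge m:m.
use `master', clear
joinby statefips agecat_census using `lookup'

* Impose a fully determined row order before listing or saving.
sort statefips agecat_census agecat_mpact pop
list, sepby(statefips agecat_census)
```

If -isid- returns zero on your real ABCD.dta, -merge m:1 statefips agecat_census using "ABCD.dta"- would be the simpler choice. Either way, the matching no longer depends on how rows happen to be ordered within a key.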

David
--
David Radwin
Senior Research Associate
Education Studies Division
RTI International
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-665-8274

www.rti.org


> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Melanie Leis
> Sent: Monday, June 03, 2013 4:55 PM
> To: statalist
> Subject: st: Controling precision for multiple runs of same code
> 
> Hello,
> 
> I'm having trouble with a section of my code that yields different
> results each time I run it.
> 
> I start out with a dataset, baseline_4.dta, which has 47,267,047
> observations and 16 variables, and run this:
> 
> merge m:m statefips agecat_census using "ABCD.dta"
> assert _merge==3
> drop _merge
> egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat ///
>     racecat iprcat_mpact iprcat coverage groupsize)
> checkpop
> rename pop oldpop
> gen pop=tot_pop*prob_agecat_mpact
> checkpop
> collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat ///
>     racecat iprcat_mpact iprcat coverage groupsize)
> checkpop
> sum
> sort _all
> save "baseline_5.dta", replace
> 
> checkpop is a program that tells me what my total population is each
> time I run it. My total population is the same before and after the
> collapse function (see results below).
> 
> At the end, my total population and my number of observations in
> baseline_5.dta are different every time I run this. I suspect the
> difference is in rounding when it executes the gen pop line, but I've
> tried replacing it with
> 
> gen double pop=tot_pop*prob_agecat_mpact
> 
> and
> 
> gen float pop=tot_pop*prob_agecat_mpact
> 
> And I still get differences.
> 
> I tried using
> 
> gen long pop=tot_pop*prob_agecat_mpact
> 
> But I lost too much precision by doing this.
> 
> Could you please recommend a solution to obtain the exact same numbers
> in each run, without sacrificing precision?
> 
> Thanks!
> 
> Melanie
> 
> The log file for 2 of the runs I've done:
> 
> ************* RUN A ***********************
> 
> . merge m:m statefips agecat_census using "ABCD.dta"
> 
>     Result                           # of obs.
>     -----------------------------------------
>     not matched                             0
>     matched                        47,267,047  (_merge==3)
>     -----------------------------------------
> 
> . assert _merge==3
> 
> . drop _merge
> 
> . egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
> 
> . checkpop
> 
> Total pop:       347,095,179
> Observations:     47,267,047
> Missing:                   0
> 
> . rename pop oldpop
> 
> . gen pop=tot_pop*prob_agecat_mpact
> 
> . checkpop
> 
> Total pop:       332,455,972
> Observations:     47,267,047
> Missing:                   0
> 
> . collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
> 
> . checkpop
> 
> Total pop:       332,455,972
> Observations:     36,351,520
> Missing:                   0
> 
> 
> ************** RUN B *************
> . merge m:m statefips agecat_census using "ABCD.dta"
> 
>     Result                           # of obs.
>     -----------------------------------------
>     not matched                             0
>     matched                        47,267,047  (_merge==3)
>     -----------------------------------------
> 
> . assert _merge==3
> 
> . drop _merge
> 
> . egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat iprcat_mpact iprcat coverage groupsize)
> 
> . checkpop
> 
> Total pop:       347,095,179
> Observations:     47,267,047
> Missing:                   0
> 
> . rename pop oldpop
> 
> . gen pop=tot_pop*prob_agecat_mpact
> 
> . checkpop
> 
> Total pop:       332,455,928
> Observations:     47,267,047
> Missing:                   0
> 
> . collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat iprcat_mpact iprcat coverage groupsize)
> 
> . checkpop
> 
> Total pop:       332,455,928
> Observations:     36,351,515
> Missing:                   0



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

