Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Replicability and -imputw-
From
Richard Williams <[email protected]>
To
[email protected], Stata Help <[email protected]>
Subject
Re: st: Replicability and -imputw-
Date
Sun, 25 Aug 2013 19:53:36 -0500
At 06:42 PM 8/25/2013, Roberto Ferrer wrote:
Richard, nice finding. Thank you for taking the time.
Obviously you and I should both read the Stata Blog much more
carefully. ;-) However, I think many/most people would assume that
setting the seed would also guarantee the same sort order. A few
words in the documentation noting that this is not the case might be helpful.
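
For anyone reading this in the archive, here is a minimal sketch of the point on made-up data. The variables id and g are hypothetical and are not part of Roberto's dataset, and the sketch assumes a Stata version that has runiform(). The seed fixes the stream of random draws, not the order of tied observations, so the draws only land on the same observations when the sort order itself is pinned down, here with -stable-:

```stata
* Toy data: 1,000 observations in 10 groups, so the sort key has many ties
clear
set obs 1000
gen long id = _n              // hypothetical unique identifier
gen byte g  = mod(_n, 10)     // hypothetical grouping variable

* Run 1: seed, stable sort, one uniform draw per observation
set seed 391829
sort g, stable
gen double u1 = runiform()

* Run 2: restore the original order, then repeat exactly the same steps
sort id
set seed 391829
sort g, stable
gen double u2 = runiform()

* With -stable-, ties keep their prior (id) order, so the draws line up
assert u1 == u2
```

Without -stable- on the two -sort g- calls, the tie order could differ between the runs, and the final -assert- would then most likely fail even though the seed is the same.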
Bests,
Roberto
On Mon, Aug 26, 2013 at 1:26 AM, Richard Williams
<[email protected]> wrote:
> At 06:07 PM 8/25/2013, Roberto Ferrer wrote:
>>
>> Richard,
>>
>> Thank you for your reply. I just posted my solution. I remember
>> reading that adding -stable- could in some cases obscure other
>> problems. I think in this case it was safe, but I thought it would
>> require more computation time (not exactly sure about this, though).
>>
>> Now that you mention it, I also find interesting that the seed that
>> was set just before the -sort- doesn't affect it. Maybe someone can
>> comment on that.
>
>
> I think Bill Gould already has:
>
>
> http://blog.stata.com/2012/08/03/using-statas-random-number-generators-part-2-drawing-without-replacement/
>
> "Did you know sort has its own, private random-number generator built into
> it? It does, and sort uses its random-number generator to determine the
> order of tied observations. In the manuals we at StataCorp are fond of
> writing, "the ties will be ordered randomly" and a few sophisticated users
> probably took that to mean, "the ties will be ordered in a way that we at
> StataCorp do not know and even though they might be ordered in a way that
> will cause a bias in the subsequent analysis, because we don't know, we'll
> ignore the possibility." But we meant it when wrote that the ties will be
> ordered randomly; we know that because we put a random number generator
> into sort to ensure the result. And that is why I can now write that repeated
> values of the runiform() function cause a reproducibility issue, but not a
> statistical issue."
>
> Further, in the comments section, the question is asked "Can you explain why
> sort does not use the same seed as the other random number generators? That
> would make sort also foolproof with respect to reproducibility." Gould has a
> detailed response. At the end he says "Setting the random-number seed is a
> way of reproducing results from routines that are intended to produce
> different results in different runs. -sort- is not such a function; if it
> produces different results in different runs, and that matters, that is a
> bug" - where the bug is in the code the user wrote (not in Stata).
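
One way to check for the situation Gould describes is to ask whether the sort keys uniquely identify the observations. A minimal sketch on made-up data follows; yearobs and size_b are the keys from the code in this thread, while id is a hypothetical identifier added only for illustration:

```stata
* Toy data: 12 observations, each (yearobs, size_b) pair occurs twice
clear
set obs 12
gen long id      = _n                   // hypothetical unique identifier
gen int  yearobs = 2000 + mod(_n, 3)
gen byte size_b  = mod(_n, 2)

* -isid- exits with an error when the variables do not uniquely identify
* the observations; -capture- lets us inspect the return code instead
capture isid yearobs size_b
display "keys alone:   rc = " _rc       // nonzero here: ties exist

capture isid yearobs size_b id
display "keys plus id: rc = " _rc       // 0: order is fully determined
```

A nonzero return code from the first check means a plain -sort- on those keys leaves the order of tied observations to chance, which is where any later step that depends on observation order can stop being reproducible.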
>
>
>
>
>
>> Thank you.
>>
>> Bests,
>> Roberto
>>
>> On Mon, Aug 26, 2013 at 12:54 AM, Richard Williams
>> <[email protected]> wrote:
>> > I would suggest adding the -stable- option to sort. Or (possibly better)
>> > have the data sorted before you start calling the program. The latter would
>> > be a little more efficient in terms of computing time, plus there was some
>> > sort of thread way back when saying sorting was better if you didn't use the
>> > stable option (although I don't remember why).
>> >
>> > According to the help for sort, "Without the stable option, the ordering of
>> > observations with equal values of varlist is randomized." I just ran a quick
>> > check, and as far as I can tell setting the seed does not cause the same
>> > random order to occur across multiple calls (which strikes me as odd, but
>> > maybe there is a reason for it). So, I think sorting the data first or using
>> > the stable option will give you what you want. Please let us know one way or
>> > the other.
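
The "sort the data first" route can be sketched as follows, assuming the data contain, or can be given, a unique identifier. Here id is a hypothetical variable and g stands in for yearobs size_b. Putting id in parentheses after the by-variable fixes the order within each group without changing the groups themselves:

```stata
clear
set obs 1000
gen long id = _n                    // hypothetical unique identifier
gen byte g  = mod(_n, 10)           // stands in for yearobs size_b

* Run 1: sort by g, break ties on id, then draw within each group
set seed 391829
bysort g (id): gen double u1 = runiform()

* Scramble the data on a random key, then repeat the same steps
gen double shuffle = runiform()
sort shuffle
set seed 391829
bysort g (id): gen double u2 = runiform()

* The sort order is fully determined by (g, id), so the draws match
assert u1 == u2
```

In the thread's setting this would correspond to something like -bysort yearobs size_b (id): imputw ...-, which is untested here and assumes such an id variable exists in the data.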
>> >
>> >
>> > At 02:02 PM 8/25/2013, Roberto Ferrer wrote:
>> >>
>> >> Hello,
>> >>
>> >> I've been using a user-written command -imputw- downloaded from
>> >>
>> >> http://fdz.iab.de/187/section.aspx/Publikation/k050719a04
>> >> Based on Gartner, Herman. "The Imputation of Wages Above the Contribution
>> >> Limit with the German IAB Employment Sample." FDZ, 2005.
>> >>
>> >> My problem is with replicability. I use -set seed- to control for the
>> >> randomness introduced by the command but I can't manage to obtain the
>> >> same results for the output variable -lnw_i-. Can anyone please point
>> >> to the source of "uncontrolled randomness" that is affecting the results
>> >> by inspecting the code?
>> >>
>> >> I've double checked, using -cf-, that the data going in is the same
>> >> for the replication runs. The results for the regressions are the same
>> >> for all runs (I've checked the log files in a bash terminal (linux)
>> >> using the program "diff" and they are identical except for log times).
>> >> But the final resulting variable is not the same for any two runs.
>> >>
>> >> I copy the source below, since it's not very long, along with the code
>> >> snippet I'm running.
>> >>
>> >> Thank you.
>> >>
>> >> * --------------------- User-written command ---------------------
>> >> program define imputw, byable(recall)
>> >>
>> >>     version 8
>> >>     syntax varlist [if] , Cens(varlist) Grenze(varlist) [Outvar(string asis)]
>> >>
>> >>     marksample touse
>> >>     * If no name given to the output, call it by default "lnw_i".
>> >>     if "`outvar'" == "" {
>> >>         local outvar "lnw_i"
>> >>     }
>> >>     * Estimate Tobit model
>> >>     cnreg `varlist' if `touse', censored(`cens')
>> >>     quietly {
>> >>         * Make predictions
>> >>         predict xb00 if `touse' , xb
>> >>         * Generate standardized limit for each value
>> >>         gen alpha00=(ln(`grenze')-xb00)/_b[_se] if `touse'
>> >>     }
>> >>
>> >>     cap gen `outvar'=.
>> >>     replace `outvar'=`1' if `touse'
>> >>     * Imputation
>> >>     replace `outvar'=xb00+_b[_se] * ///
>> >>         invnorm(uniform()*(1-norm(alpha00))+norm(alpha00)) if `touse' & `cens'
>> >>
>> >>     drop xb00 alpha00
>> >> end
>> >>
>> >> * ------------------- Code I'm using ------------------------------
>> >> set seed 391829 // -imputw- uses random number generator
>> >> sort yearobs size_b
>> >> by yearobs size_b: imputw lwage frau gebjahr bild esector, cens(censored) ///
>> >>     grenze(uplimit)
>> >> *
>> >> * For searches and help try:
>> >> * http://www.stata.com/help.cgi?search
>> >> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> >> * http://www.ats.ucla.edu/stat/stata/
>> >
>> >
>> > -------------------------------------------
>> > Richard Williams, Notre Dame Dept of Sociology
>> > OFFICE: (574)631-6668, (574)631-6463
>> > HOME: (574)289-5227
>> > EMAIL: [email protected]
>> > WWW: http://www.nd.edu/~rwilliam
>> >
>> >
>
>