Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Replicability and -imputw-
From
Richard Williams <[email protected]>
To
[email protected], Stata Help <[email protected]>
Subject
Re: st: Replicability and -imputw-
Date
Sun, 25 Aug 2013 19:26:48 -0500
At 06:07 PM 8/25/2013, Roberto Ferrer wrote:
Richard,
Thank you for your reply. I just posted my solution. I remember
reading that adding -stable- could in some cases obscure other
problems. I think in this case it was safe, but I thought it would
require more computation time (not exactly sure about this, though).
Now that you mention it, I also find it interesting that the seed that
was set just before the -sort- doesn't affect it. Maybe someone can
comment on that.
I think Bill Gould already has:
http://blog.stata.com/2012/08/03/using-statas-random-number-generators-part-2-drawing-without-replacement/
"Did you know sort has its own, private random-number generator built
into it? It does, and sort uses its random-number generator to
determine the order of tied observations. In the manuals we at
StataCorp are fond of writing, "the ties will be ordered randomly"
and a few sophisticated users probably took that to mean, "the ties
will be ordered in a way that we at StataCorp do not know and even
though they might be ordered in a way that will cause a bias in the
subsequent analysis, because we don't know, we'll ignore the
possibility." But we meant it when we wrote that the ties will be
ordered randomly; we know that because we put a random number
generator into sort to ensure the result. And that is why I can now
write that repeated values of the runiform() function cause a
reproducibility issue, but not a statistical issue."
Further, in the comments section, the question is asked "Can you
explain why sort does not use the same seed as the other random
number generators? That would make sort also foolproof with respect
to reproducibility." Gould has a detailed response. At the end he
says "Setting the random-number seed is a way of reproducing results
from routines that are intended to produce different results in
different runs. -sort- is not such a function; if it produces
different results in different runs, and that matters, that is a bug"
- where the bug is in the code the user wrote (not in Stata).
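To make the point concrete, here is a minimal sketch on toy data. The variable names id, g, u1 and u2 are made up for illustration and are not from Roberto's data set, and the sketch assumes a Stata version in which runiform() is available. The idea is to remove the ties rather than try to seed -sort-: either use -stable-, or sort on a key that uniquely identifies the observations, so that the random draws are assigned to the same observations in every run.

```stata
* Sketch only: toy data with hypothetical variable names (id, g, u1, u2).
clear
set obs 10
gen id = _n                 // unique identifier
gen g  = mod(_n, 2)         // grouping variable with many tied values

* Option 1: -stable- keeps tied observations in their current relative
* order, so the result is reproducible if the incoming order is the same.
sort g, stable

* Option 2: break the ties explicitly so that the sort key is unique.
sort g id
isid g id                   // stops with an error if g id is not unique

* With a unique order, the draws land on the same observations each time.
set seed 391829
gen u1 = runiform()
sort g id
set seed 391829
gen u2 = runiform()
assert u1 == u2
```

For the -imputw- call, the analogous change would presumably be -sort yearobs size_b, stable- (or adding a unique identifier to the sort key) before the -by:- prefix. That has not been tested here with -imputw- itself.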
Thank you.
Bests,
Roberto
On Mon, Aug 26, 2013 at 12:54 AM, Richard Williams
<[email protected]> wrote:
> I would suggest adding the -stable- option to sort. Or (possibly better)
> have the data sorted before you start calling the program. The latter would
> be a little more efficient in terms of computing time, plus there was some
> sort of thread way back when saying sorting was better if you didn't use the
> stable option (although I don't remember why).
>
> According to the help for sort, "Without the stable option, the ordering of
> observations with equal values of varlist is randomized." I just ran a quick
> check, and as far as I can tell setting the seed does not cause the same
> random order to occur across multiple calls (which strikes me as odd, but
> maybe there is a reason for it). So, I think sorting the data first or using
> the stable option will give you what you want. Please let us know one way or
> the other.
>
>
> At 02:02 PM 8/25/2013, Roberto Ferrer wrote:
>>
>> Hello,
>>
>> I've been using a user-written command -imputw- downloaded from
>>
>> http://fdz.iab.de/187/section.aspx/Publikation/k050719a04
>> Based on Gartner, Herman. "The Imputation of Wages Above the Contribution
>> Limit with the German IAB Employment Sample." FDZ, 2005.
>>
>> My problem is with replicability. I use -set seed- to control for the
>> randomness introduced by the command but I can't manage to obtain the
>> same results for the output variable -lnw_i-. Can anyone please inspect
>> the code and point to the source of "uncontrolled randomness" that is
>> affecting the results?
>>
>> I've double checked, using -cf-, that the data going in is the same
>> for the replication runs. The results for the regressions are the same
>> for all runs (I've checked the log files in a bash terminal (linux)
>> using the program "diff" and they are identical except for log times).
>> But the final resulting variable is not the same for any two runs.
>>
>> I copy the source below since it's not very long and the code snippet
>> I'm running.
>>
>> Thank you.
>>
>> * --------------------- User-written command -------------------------------------
>> program define imputw, byable(recall)
>>
>>     version 8
>>     syntax varlist [if] , Cens(varlist) Grenze(varlist) [Outvar(string asis)]
>>
>>     marksample touse
>>     * If no name given to the output, call it by default "lnw_i".
>>     if "`outvar'" == "" {
>>         local outvar "lnw_i"
>>     }
>>     * Estimate Tobit model
>>     cnreg `varlist' if `touse', censored(`cens')
>>     quietly {
>>         * Make predictions
>>         predict xb00 if `touse' , xb
>>         * Generate standardized limit for each value
>>         gen alpha00=(ln(`grenze')-xb00)/_b[_se] if `touse'
>>     }
>>
>>     cap gen `outvar'=.
>>     replace `outvar'=`1' if `touse'
>>     * Imputation
>>     replace `outvar'=xb00+_b[_se] * ///
>>         invnorm(uniform()*(1-norm(alpha00))+norm(alpha00)) if `touse' & `cens'
>>
>>     drop xb00 alpha00
>> end
>>
>> * ------------------- Code I'm using -----------------------------------
>> set seed 391829 // -imputw- uses random number generator
>> sort yearobs size_b
>> by yearobs size_b: imputw lwage frau gebjahr bild esector, cens(censored) ///
>>     grenze(uplimit)
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME: (574)289-5227
> EMAIL: [email protected]
> WWW: http://www.nd.edu/~rwilliam
>
>
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam