Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Routine from do-file that every time it's run gives a different result
From: "Sarah Edgington" <[email protected]>
To: <[email protected]>
Subject: RE: st: Routine from do-file that every time it's run gives a different result
Date: Wed, 6 Nov 2013 14:11:31 -0800
Clarice,
Nick's right that you need to do more digging. However, I would argue that
the solution of using the stable option to -sort- is worse than "[solving]
the problem with the price of not understanding it." Using -sort, stable-
is actually just pretending that there is not a problem at all. Yes, that
strategy will get you consistent results, but the chances that they'll be
the right results are pretty slim. Being able to reproduce the wrong answer
is generally just as bad as not being able to reproduce the answer at all.
To be a bit more explicit, what sort order you end up with clearly matters
for your results. You need to figure out why the variables you're sorting
on do not uniquely identify observations, and then figure out how to fix that.
Using -sort, stable- may very well appear to fix your problem but presumably
you care whether the average of P5 is 6.154 or 3.286. If you don't do more
investigation you'll never know which of those is the number you're really
looking for (or whether it's something else completely).
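To make the point concrete, here is a minimal sketch on toy data (the
variables grp and value are hypothetical and not from Clarice's do-file).
Sorting on grp alone leaves ties, so which row lands first within a group
is arbitrary; naming value as a tie-breaker pins the order down.

```stata
clear
input str1 grp value
"a" 10
"a" 20
"b" 30
"b" 40
end

* Ambiguous: grp has ties, so the order of rows within each group
* is not determined by the sort key
sort grp
by grp: gen first_ambiguous = value[1]

* Unambiguous: (grp, value) is unique here, so the order within grp
* is fully determined; first_defined is 10 for grp a and 30 for grp b
bysort grp (value): gen first_defined = value[1]

list, sepby(grp)
```

The second form is only the right fix once you know which observation you
actually want first. Choosing the tie-breaker is the substantive decision.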
One thing that I find useful when troubleshooting this kind of problem is to
use -sum- after every section where I create new variables whose values
depend on sort order. Then I'll run the do-file multiple times, saving a
log file with a different name each time. Usually you can pretty quickly
spot where things went wrong by comparing the log files from two different
runs, as long as you put in descriptive statistics of your created variables
along the way.
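A rough sketch of that workflow, using the auto.dta dataset shipped with
Stata as a stand-in for real data. The log file names (run1.log, run2.log)
and the variable prev_price are made up for illustration.

```stata
* Run the same steps twice, writing each run to its own log file
forvalues i = 1/2 {
    capture log close
    log using run`i'.log, text replace

    sysuse auto, clear
    sort rep78                        // rep78 has many ties
    gen prev_price = price[_n-1]      // value depends on order within ties

    * labelled checkpoint right after the sort-dependent step
    display "checkpoint: after creating prev_price"
    summarize prev_price

    log close
}
```

Comparing run1.log and run2.log side by side, the first checkpoint whose
summary statistics differ (if any do) points to the section where the sort
order started to matter.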
Another useful command when trying to identify whether you're uniquely
sorting observations is -isid-. Any combination of variables that doesn't
function as a unique ID will leave you with ties on the sort, leading to the
kind of unpredictable results you see here.
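For example, a small sketch with auto.dta (not Clarice's data). The last
comment line applies the same check to the sort keys from her do-file;
whether it passes or fails there is something only her data can tell.

```stata
sysuse auto, clear

* foreign takes only two values, so it cannot identify observations;
* -isid- exits with an error, which -capture noisily- displays
* without stopping the do-file
capture noisily isid foreign

* make is unique in auto.dta, so -isid- returns silently
isid make

* -duplicates report- shows how many observations share each key value
duplicates report foreign

* The analogous check on the sort keys used in this thread would be:
* isid quintile yrmonth rtype
```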
-Sarah
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, November 06, 2013 1:51 PM
To: [email protected]
Subject: Re: st: Routine from do-file that every time it's run gives a
different result
But that solves the problem with the price of not understanding it.
Somewhere Clarice has hidden assumptions about the -sort- order being enough
to get the right order without extra information that are not correct.
Nick
[email protected]
On 6 November 2013 21:46, Sergiy Radyakin <[email protected]> wrote:
> Clarice, add the option stable to the sort commands. Without this
> option, the -sort- command will break the ties randomly. See here:
> http://www.stata.com/help.cgi?sort
>
> Best, Sergiy
>
> On Wed, Nov 6, 2013 at 4:30 PM, Clarice Martins
> <[email protected]> wrote:
>> Dear Statalist group,
>>
>> I have a routine that apparently was running ok, and then I noticed that
>> every time I execute the code I get different results for one of the
>> variables.
>> (The routine is long, so I don't know how to best provide you guys
>> with enough info.)
>>
>> 1) I believe the problem has to do with variable -P5- since this is the
>> variable whose average changes every time I run the code.
>>
>> 2) Sample of the results I am getting: as you can see, variable P1
>> is always approximately the same (it should be the same) and variable
>> Strategy is ALWAYS the same, but var -P5- changes by a lot. (I've
>> shown two outputs, but I've run it several, several times.)
>>
>>
>> . esttab .
>>
>> ----------------------------
>> (1)
>> Mean
>> ----------------------------
>> P1 0.300***
>> (3.41)
>>
>> P5 6.154
>> (1.53)
>>
>> strategy 7.190
>> (1.78)
>> ----------------------------
>> N 150
>> ----------------------------
>>
>>
>> ----------------------------
>> (1)
>> Mean
>> ----------------------------
>> P1 0.223*
>> (2.24)
>>
>> P5 3.286
>> (1.15)
>>
>> strategy 7.190
>> (1.78)
>> ----------------------------
>> N 150
>> ----------------------------
>>
>> 3) Piece of the code that deals with creating and changing variable
>> P5: (my apologies if this is confusing or too long)
>>
>> ***create variable P1/P5 and sum all 1st/5th quintiles per <yrmonth>
>> gen P1_sell = .
>> quietly levelsof yrmonth, local(levs)
>> quietly foreach lev of local levs {
>>     egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>     replace P1_sell=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>     drop work
>> }
>>
>> gen P5_buy = .
>> quietly levelsof yrmonth, local(levs)
>> quietly foreach lev of local levs {
>>     egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>     replace P5_buy=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>     drop work
>> }
>>
>> sort quintile yrmonth rtype
>>
>> **undo the buy/sell operation
>> *in order to do the procedure, first copy quintile #s to same <co_id> but for 6 <yrmonth> LATER
>>
>> bysort co_id period: egen tocopy2 = total(quintile / (rtype == "buy_sell_period"))
>> bysort co_id rtype (negperiod) : replace quintile = tocopy2[_n+6] if missing(quintile) & rtype == "hold_period"
>> sort quintile yrmonth rtype
>>
>> *add sums of 1st/5th quintiles for <hold_period> to variables P1/P5
>>
>> quietly levelsof yrmonth, local(levs)
>> quietly foreach lev of local levs {
>>     egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>     replace P1_sell=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>     drop work
>> }
>>
>> quietly levelsof yrmonth, local(levs)
>> quietly foreach lev of local levs {
>>     egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>     replace P5_buy=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>     drop work
>> }
>> sort quintile yrmonth rtype
>>
>>
>> ***------procedures for Strategy analysis
>> **preparing time-series
>> *P1 is the variable to use for the time-series / keep -P1_sell- intact just for the sake of it
>>
>> gen P1 = P1_sell
>> gen copyP1=P1
>> replace P1 = . if P1 == copyP1[_n-1]
>> drop copyP1
>>
>> *P5 is the variable to use for the time-series / keep -P5_buy- intact just for the sake of it
>>
>> gen P5 = P5_buy
>> gen copyP5=P5
>> replace P5 = . if P5 == copyP5[_n-1]
>> drop copyP5
>>
>> *keeping only time-series variables & unique records
>> keep P1 P5 period
>>
>> sort period P1 P5
>> quietly by period P1 P5: gen dup = cond(_N==1,0,_n)
>> drop if dup>0
>> drop dup
>>
>> sort period P1 P5
>> gen P5copy = P5
>> replace P5 = P5copy[_n+1] if P5 >= .
>> replace P5 = P5copy[_n+3] if P5 >= .
>> drop P5copy
>>
>> sort period
>> quietly by period: gen dup = cond(_N==1,0,_n)
>> drop if dup>2
>> drop dup
>>
>> gen temp = P1 + P5
>> drop if temp >= .
>> drop temp
>>
>> by period: egen strategy=total(P1 + P5)
>>
>> sort strategy
>> quietly by strategy: gen dup = cond(_N==1,0,_n)
>> drop if dup>1
>> drop dup
>>
>> sort period
>>
>> ** changing into a time-series // not sure if it is necessary yet...
>> tsset period
>> mean P1 P5 strategy
>> ******end of code
>>
>> Thanks for your consideration! Any comment or suggestions will be
>> appreciated.
>> Clarice
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/