Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Routine from do-file that every time it's run gives a different result
From: Clarice Martins <[email protected]>
To: [email protected]
Subject: Re: st: Routine from do-file that every time it's run gives a different result
Date: Thu, 7 Nov 2013 16:05:09 -0200
Definitely helps, Sergiy.
Of course, after checking the code line by line, I guess my next question should be: is this the correct method? The methodology I was following just says "divide in quintiles" and I assumed (I guess wrongly) that I should use the "canned percentile functions".
After Nick's comment: (Thanks, Nick also!)
>> And -xtile- is written -sortpreserve-
>> so it doesn't change the sort order of your data.
Does this mean that I need to be aware of how the data are sorted before using -xtile-?
Thanks again!!!
Clarice
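
On the sort-order question, a minimal sketch on made-up data may help. It assumes only what Nick says in the lines quoted above: tied values share a group, and -xtile- leaves the sort order alone. The variables id, ret, u, q and q2 are hypothetical and do not come from the original do-file.

```stata
* Toy data (hypothetical): 21 values with one tie near the first cutpoint
clear
set obs 21
generate id = _n
generate ret = _n
replace ret = 4 if id == 5      // ids 4 and 5 now share the value 4

* Quintile groups via -xtile-; -tabulate- shows how many land in each group
xtile q = ret, nquantiles(5)
tabulate q

* Tied values should land in the same group
assert q[4] == q[5]

* Shuffle the data, then recompute: the grouping should not depend on sort order
generate double u = runiform()
sort u
xtile q2 = ret, nquantiles(5)
assert q == q2
```

If both -assert- lines pass, the grouping depends only on the values of ret, not on the order of the observations. Which group absorbs the "extra" observation still follows from the percentile definition -xtile- uses, so group sizes are worth checking with -tabulate-.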
On Nov 7, 2013, at 3:48 PM, Sergiy Radyakin wrote:
> Clarice,
> the following article discusses what Excel is doing to compute quartiles:
> http://stats.stackexchange.com/questions/28123/quartiles-in-excel
>
> In general don't expect different statistical packages to break your
> observations into groups (quartiles, quintiles, deciles) identically.
> This applies not only to Excel, but also to SPSS, SAS, etc.
> http://www-01.ibm.com/support/docview.wss?uid=swg21480663
> http://www.erieri.com/blog/post/technically-speaking-does-excel-always-know-what-is-best-for-your-compensation-data
> and tons of other discussions, just check Google.
>
> Multiple methods exist, and the defaults are not always identical
> across the packages.
> In some cases it might be better to be explicit, and sort and break
> the dataset into groups yourself, rather than rely on the canned
> percentile functions. Better read your code line by line, and check if
> it implements exactly what you want it to do.
>
> Hope this helps, Sergiy
>
>
>
> On Thu, Nov 7, 2013 at 12:03 PM, Nick Cox <[email protected]> wrote:
>> -xtile- is undoubtedly problematic -- as it reduces the information in
>> your data and isn't guaranteed to produce equal-sized groups even
>> when the number of observations is an exact multiple of the number of
>> groups. But one of its rules is that observations with the same value
>> always go into the same group. And -xtile- is written -sortpreserve-
>> so it doesn't change the sort order of your data.
>> Nick
>> [email protected]
>>
>>
>> On 7 November 2013 16:47, Clarice Martins <[email protected]> wrote:
>>> Thanks to all for the valuable input...
>>>
>>> Sarah, thanks for the practical tips on how to troubleshoot, I am definitely very new at any kind of programming and needed this kind of advice.
>>>
>>> Nick, I agree that it is important to verify where the error is. The -stable- option might aid me, but I will definitely search further to figure out where my hidden assumptions are.
>>>
>>> Just another question on this issue:
>>> Another portion of the code uses -xtile- to break the portfolio of returns into quintiles. (At first, I didn't think it was important.)
>>>
>>> But... Could there also be a problem with how -xtile- breaks the dataset into groups? I mean, even when I did this manually in Excel, it was always difficult to decide how many observations stay in each quintile group. (E.g., if the dataset has 21 observations, we will have 4 groups of 4 and one group of 5; which group takes the extra observation?)
>>>
>>> Thanks again!!!
>>> Clarice
>>>
>>>
>>> On Nov 6, 2013, at 8:11 PM, Sarah Edgington wrote:
>>>
>>>> Clarice,
>>>> Nick's right that you need to do more digging. However, I would argue that
>>>> the solution of using the stable option to -sort- is worse than "[solving]
>>>> the problem with the price of not understanding it." Using -sort, stable-
>>>> is actually just pretending that there is not a problem at all. Yes, that
>>>> strategy will get you consistent results, but the chances that they'll be
>>>> the right results are pretty slim. Being able to reproduce the wrong answer
>>>> is generally just as bad as not being able to reproduce the answer at all.
>>>>
>>>> To be a bit more explicit, what sort order you end up with clearly matters
>>>> for your results. You need to figure out why the variables you're sorting
>>>> on are not producing unique results and figure out how to fix that.
>>>> Using -sort, stable- may very well appear to fix your problem but presumably
>>>> you care whether the average of P5 is 6.154 or 3.286. If you don't do more
>>>> investigation you'll never know which of those is the number you're really
>>>> looking for (or whether it's something else completely).
>>>>
>>>> One thing that I find useful when troubleshooting this kind of problem is to
>>>> use -sum- after every section where I create new variables with values where
>>>> sort order matters. Then I'll run the dofile multiple times, saving a
>>>> logfile with a different name each time. Usually you can pretty quickly
>>>> spot where things went wrong by comparing the log files from two different
>>>> runs, as long as you put in descriptives of your created variables along the
>>>> way.
>>>>
>>>> Another useful command when trying to identify whether you're uniquely
>>>> sorting observations is -isid-. Any combination of variables that don't
>>>> function as a unique ID will leave you with ties on the sort, leading to the
>>>> kind of unpredictable results you see here.
>>>>
>>>> -Sarah
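
Sarah's -isid- suggestion can be sketched on a toy dataset. The variable names co_id, yrmonth and quintile are borrowed from the posted code, but the values below are invented, and whether any of these combinations is a unique key in the real data is an assumption that would need checking.

```stata
* Toy data (hypothetical values; names follow the posted code)
clear
input co_id str7 yrmonth quintile
1 "2013m01" 1
2 "2013m01" 1
3 "2013m01" 5
1 "2013m02" 1
2 "2013m02" 5
3 "2013m02" 5
end

* quintile + yrmonth do not identify rows, so a sort on them leaves ties
capture isid quintile yrmonth
display _rc                     // nonzero when the key is not unique
duplicates report quintile yrmonth

* co_id + yrmonth is unique here; adding co_id to the sort key removes the ties
isid co_id yrmonth
sort quintile yrmonth co_id
```

When -isid- fails on the variables used in a -sort-, the order within tied groups is not determined, and any later reference to `[_n-1]` or `[_n+1]` can pick up different observations from run to run.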
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Nick Cox
>>>> Sent: Wednesday, November 06, 2013 1:51 PM
>>>> To: [email protected]
>>>> Subject: Re: st: Routine from do-file that every time it's run gives a
>>>> different result
>>>>
>>>> But that solves the problem with the price of not understanding it.
>>>> Somewhere Clarice has hidden assumptions about the -sort- order being enough
>>>> to get the right order without extra information that are not correct.
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 6 November 2013 21:46, Sergiy Radyakin <[email protected]> wrote:
>>>>> Clarice, add the option stable to the sort commands. Without this
>>>>> option, the -sort- command will break the ties randomly. See here:
>>>>> http://www.stata.com/help.cgi?sort
>>>>>
>>>>> Best, Sergiy
>>>>>
>>>>> On Wed, Nov 6, 2013 at 4:30 PM, Clarice Martins
>>>>> <[email protected]> wrote:
>>>>>> Dear Statalist group,
>>>>>>
>>>>>> I have a routine that apparently was running ok, and then I noticed that
>>>>>> every time I execute the code I get different results for one of the
>>>>>> variables.
>>>>>> (The routine is long, so I don't know how to best provide you guys
>>>>>> with enough info.)
>>>>>>
>>>>>> 1) I believe the problem has to do with variable -P5- since this is the
>>>>>> variable whose average changes every time I run the code.
>>>>>>
>>>>>> 2) Sample of the results I am getting: as you can see, variable P1
>>>>>> is always approximately the same (it should be the same) and variable
>>>>>> Strategy is ALWAYS the same, but var -P5- changes by a lot. (I've
>>>>>> shown two outputs, but I've run it several, several times.)
>>>>>>
>>>>>>
>>>>>> . esttab .
>>>>>>
>>>>>> ----------------------------
>>>>>> (1)
>>>>>> Mean
>>>>>> ----------------------------
>>>>>> P1 0.300***
>>>>>> (3.41)
>>>>>>
>>>>>> P5 6.154
>>>>>> (1.53)
>>>>>>
>>>>>> strategy 7.190
>>>>>> (1.78)
>>>>>> ----------------------------
>>>>>> N 150
>>>>>> ----------------------------
>>>>>>
>>>>>>
>>>>>> ----------------------------
>>>>>> (1)
>>>>>> Mean
>>>>>> ----------------------------
>>>>>> P1 0.223*
>>>>>> (2.24)
>>>>>>
>>>>>> P5 3.286
>>>>>> (1.15)
>>>>>>
>>>>>> strategy 7.190
>>>>>> (1.78)
>>>>>> ----------------------------
>>>>>> N 150
>>>>>> ----------------------------
>>>>>>
>>>>>> 3) Piece of the code that deals with creating and changing variable
>>>>>> P5: (my apologies if this is confusing or too long)
>>>>>>
>>>>>> ***create variable P1/P5 and sum all 1st/5th quintiles per <yrmonth>
>>>>>> gen P1_sell = .
>>>>>> quietly levelsof yrmonth, local(levs)
>>>>>> quietly foreach lev of local levs {
>>>>>>     egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>>>>     replace P1_sell=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>>>>     drop work
>>>>>> }
>>>>>>
>>>>>> gen P5_buy = .
>>>>>> quietly levelsof yrmonth, local(levs)
>>>>>> quietly foreach lev of local levs {
>>>>>>     egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>>>>     replace P5_buy=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>>>>     drop work
>>>>>> }
>>>>>>
>>>>>> sort quintile yrmonth rtype
>>>>>>
>>>>>> **undo the buy/sell operation
>>>>>> *in order to do the procedure, first copy quintile #s to same <co_id> but for 6 <yrmonth> LATER
>>>>>>
>>>>>> bysort co_id period: egen tocopy2 = total(quintile / (rtype == "buy_sell_period"))
>>>>>> bysort co_id rtype (negperiod) : replace quintile = tocopy2[_n+6] if missing(quintile) & rtype == "hold_period"
>>>>>> sort quintile yrmonth rtype
>>>>>>
>>>>>> *add sums of 1st/5th quintiles for <hold_period> to variables P1/P5
>>>>>>
>>>>>> quietly levelsof yrmonth, local(levs)
>>>>>> quietly foreach lev of local levs {
>>>>>>     egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>>>>     replace P1_sell=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>>>>     drop work
>>>>>> }
>>>>>>
>>>>>> quietly levelsof yrmonth, local(levs)
>>>>>> quietly foreach lev of local levs {
>>>>>>     egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>>>>     replace P5_buy=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>>>>     drop work
>>>>>> }
>>>>>> sort quintile yrmonth rtype
>>>>>>
>>>>>>
>>>>>> ***------procedures for Strategy analysis
>>>>>> **preparing time-series
>>>>>> *P1 is the variable to use for the time-series / keep -P1_sell- intact just for the sake of it
>>>>>>
>>>>>> gen P1 = P1_sell
>>>>>> gen copyP1=P1
>>>>>> replace P1 = . if P1 == copyP1[_n-1]
>>>>>> drop copyP1
>>>>>>
>>>>>> *P5 is the variable to use for the time-series / keep -P5_buy- intact just for the sake of it
>>>>>>
>>>>>> gen P5 = P5_buy
>>>>>> gen copyP5=P5
>>>>>> replace P5 = . if P5 == copyP5[_n-1]
>>>>>> drop copyP5
>>>>>>
>>>>>> *keeping only time-series variables & unique records
>>>>>> keep P1 P5 period
>>>>>>
>>>>>> sort period P1 P5
>>>>>> quietly by period P1 P5: gen dup = cond(_N==1,0,_n)
>>>>>> drop if dup>0
>>>>>> drop dup
>>>>>>
>>>>>> sort period P1 P5
>>>>>> gen P5copy = P5
>>>>>> replace P5 = P5copy[_n+1] if P5 >= .
>>>>>> replace P5 = P5copy[_n+3] if P5 >= .
>>>>>> drop P5copy
>>>>>>
>>>>>> sort period
>>>>>> quietly by period: gen dup = cond(_N==1,0,_n)
>>>>>> drop if dup>2
>>>>>> drop dup
>>>>>>
>>>>>> gen temp = P1 + P5
>>>>>> drop if temp >= .
>>>>>> drop temp
>>>>>>
>>>>>> by period: egen strategy=total(P1 + P5)
>>>>>>
>>>>>> sort strategy
>>>>>> quietly by strategy: gen dup = cond(_N==1,0,_n)
>>>>>> drop if dup>1
>>>>>> drop dup
>>>>>>
>>>>>> sort period
>>>>>>
>>>>>> ** changing into a time-series // not sure if it is necessary yet...
>>>>>> tsset period
>>>>>> mean P1 P5 strategy
>>>>>> ******end of code
>>>>>>
>>>>>> Thanks for your consideration! Any comment or suggestions will be
>>>>>> appreciated.
>>>>>> Clarice
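
The per-month loops in the posted code compute a group total with -egen- for one value of yrmonth at a time. As a hedged sketch only, the same totals can be requested in a single by-group call. The data below are invented, P1_alt is a hypothetical name, and nothing here is claimed to fix the run-to-run instability discussed in the thread.

```stata
* Toy data (hypothetical values; variable names follow the posted code)
clear
input str7 yrmonth str15 rtype quintile return
"2013m01" "buy_sell_period" 1  0.10
"2013m01" "buy_sell_period" 1  0.20
"2013m01" "buy_sell_period" 5  0.50
"2013m02" "buy_sell_period" 1 -0.10
"2013m02" "buy_sell_period" 5  0.30
"2013m02" "hold_period"     1  0.40
end

* Loop version, following the structure of the posted code
generate P1_sell = .
quietly levelsof yrmonth, local(levs)
foreach lev of local levs {
    quietly egen work = total(return) if rtype == "buy_sell_period" & yrmonth == "`lev'" & quintile == 1
    quietly replace P1_sell = work if rtype == "buy_sell_period" & yrmonth == "`lev'" & quintile == 1
    drop work
}

* By-group version: one total per month, stored only on the rows selected by -if-
bysort yrmonth: egen P1_alt = total(return) if rtype == "buy_sell_period" & quintile == 1

* Both approaches should agree, including where the result is missing
assert P1_sell == P1_alt
```

The by-group form states the grouping variable explicitly, which makes it easier to see which observations share a total when checking the results section by section.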
>>>>>>
>>>>>>
>>>>>> *
>>>>>> * For searches and help try:
>>>>>> * http://www.stata.com/help.cgi?search
>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> * http://www.ats.ucla.edu/stat/stata/