Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: Re: st: Stata loop execution, failing to take into consideration all variables
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: Re: st: Stata loop execution, failing to take into consideration all variables
Date
Wed, 2 Mar 2011 15:03:02 +0000
Sorry, "reg" was my typo. You had "reg" in your original for -regress-
but you have no such variable. I should have deleted that.
The point remains that you could -list- what is being identified as
data for each regression.
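For example, a minimal sketch of such a check for a single window, with the typo removed. The values chosen for f, y and m are arbitrary illustrations, and the -format- line assumes that -date- holds Stata monthly dates, which the use of ym() suggests but which I cannot confirm without the data. The /// line continuations need a do-file.

```stata
* Inspect the observations that one fund/window combination would
* pass to -regress- (f, y and m are arbitrary values for illustration)
local f = 1
local y = 2001
local m = 6

* Assumption: date is a monthly date, so %tm shows it readably
format date %tm

list fundid date exret mktrf hml smb ///
    if fundid == `f' & inrange(date, ym(`y' - 3, `m'), ym(`y', `m'))
```

Repeating this for a few funds and for windows near the start and end of the data should show whether the intended observations are being selected.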
Evidently you have monthly data, and the most data points that could be
included in a regression is 37 (from February 2008 to February 2011,
inclusive, for example, is 37 data points, not 36). But if your
notional window looks back or forward beyond the edges of your data,
you can easily have fewer data points. And -regress- requires a minimum
number of distinct points to work. That said, your code should cope
with that: if a -regress- fails, e(sample) will be 0 and no variables
should be -replace-d.
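Two sketches related to the above. The first just confirms the window arithmetic. The second rests on an assumption about the failure mode: if -regress- exits with an error for a thin window (the r(2001) reported in the quoted message suggests it can), the loop stops at that point unless the error is trapped with -capture-. Variable names are those of the original post; the 1/10 range for funds is only for testing.

```stata
* ym() counts months, so a window from ym(y-3, m) to ym(y, m)
* inclusive spans 36 + 1 = 37 monthly dates
display ym(2011, 2) - ym(2008, 2) + 1

forvalues f = 1/10 {
    forvalues y = 1998/2010 {
        forvalues m = 1/12 {
            * -capture- traps any error so that the loop carries on
            capture regress exret mktrf hml smb ///
                if fundid == `f' & inrange(date, ym(`y' - 3, `m'), ym(`y', `m'))
            * copy coefficients only if -regress- ran without error
            if _rc == 0 {
                quietly replace performance = _b[_cons] if e(sample)
                quietly replace hmlexp = _b[hml] if e(sample)
                quietly replace smbexp = _b[smb] if e(sample)
            }
        }
    }
}
```

Note that because the windows overlap, each later window overwrites results stored by earlier ones for the observations they share.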
So, I think that there's something else here that I haven't spotted.
Someone else may have a better idea.
Unless it is that you are getting mixed up between -smb- which goes in
the regression and -smbexp- which gets results. Are these different
variables?
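A quick way to check, sketched under the assumption that both variables exist under exactly these names; -gotresult- is a temporary name made up for this illustration.

```stata
* Are both variables present, and how are they stored?
describe smb smbexp

* How many observations received a coefficient?
count if !missing(smbexp)

* How many distinct funds received at least one coefficient?
egen byte gotresult = tag(fundid) if !missing(smbexp)
count if gotresult == 1
drop gotresult
```

If the last count is 1, results really are being stored for one fund only; if it is larger, the missing values may simply be where you are looking.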
Nick
On Wed, Mar 2, 2011 at 2:45 PM, S.A.J.van Vijfeijken
<[email protected]> wrote:
> First of all Nick thanks for your reply,
>
> The first code you gave :
> forvalues f = 1/10 {
>     forvalues y = 1998(1)2010 {
>         forvalues m = 1(1)12 {
>             list reg exret mktrf hml smb if fundid==`f' & (date>=ym(`y'-3,`m') & date<=ym(`y',`m'))
>         }
>     }
> }
>
> gives the error variable reg not found. After removing the list command I get many output screens looking like this:
>       Source |       SS       df       MS              Number of obs =      11
> -------------+------------------------------           F(  3,     7) =   40.79
>        Model |   .00615674     3  .002052247           Prob > F      =  0.0001
>     Residual |  .000352204     7  .000050315           R-squared     =  0.9459
> -------------+------------------------------           Adj R-squared =  0.9227
>        Total |  .006508944    10  .000650894           Root MSE      =  .00709
>
> ------------------------------------------------------------------------------
>        exret |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        mktrf |    .613138   .0722452     8.49   0.000     .4423051    .7839708
>          hml |   .1122045   .1318149     0.85   0.423    -.1994881    .4238972
>          smb |   .0528789   .0639938     0.83   0.436    -.0984423    .2042001
>        _cons |  -.0006199   .0026987    -0.23   0.825    -.0070014    .0057616
>
> However, in the end an error occurs saying insufficient observations r(2001), and the last 2 tables are empty due to collinearity. I must say I cannot conclude much from this output.
>
> The second piece of code:
> forvalues f = 1/10 {
>     forvalues y = 1998(1)2010 {
>         forvalues m = 1(1)12 {
>             count if fundid==`f' & (date>=ym(`y'-3,`m') & date<=ym(`y',`m'))
>         }
>     }
> }
>
> Gives the following output:
> 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 37 37 37 37 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 !
0 0 0 0 0 0 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 36 35 34 33 32
>
> and so on until it has counted to 37 and back 10 times. It takes Stata quite some time to run this, even though it considers only 10 funds, but this is of course to be expected.
>
> This seems a bit more familiar, because when I ran the original code it went up to 37 and back as well, leaving missing data points afterwards. However, I do not know what conclusion to draw from this.
>
> Bas
>
>
> -----Original Message-----
> From: Nick Cox <[email protected]>
> To: [email protected]
> Date: Wed, 2 Mar 2011 13:41:46 +0000
> Subject: Re: st: Stata loop execution, failing to take into consideration all variables
>
> In addition, as I understand it, your various samples overlap. I can't
> see that would create your problem, but I think it won't make it
> easier.
>
> On Wed, Mar 2, 2011 at 1:28 PM, Nick Cox <[email protected]> wrote:
>> I don't have your data, but in any case the bigger issue is how to
>> debug a problem like this. I would take your loops and turn them into
>> code that spits out what data are being identified. Looking at the
>> first few funds should help to identify what is being looked at, say
>>
>> forvalues f = 1/10 {
>>     forvalues y = 1998(1)2010 {
>>         forvalues m = 1(1)12 {
>>             list reg exret mktrf hml smb if fundid==`f' & (date>=ym(`y'-3,`m') & date<=ym(`y',`m'))
>>         }
>>     }
>> }
>>
>> or (less output, and also less diagnostic, but may reveal a problem)
>>
>> forvalues f = 1/10 {
>>     forvalues y = 1998(1)2010 {
>>         forvalues m = 1(1)12 {
>>             count if fundid==`f' & (date>=ym(`y'-3,`m') & date<=ym(`y',`m'))
>>         }
>>     }
>> }
>>
>> On Wed, Mar 2, 2011 at 1:11 PM, S.A.J.van Vijfeijken
>> <[email protected]> wrote:
>>
>>
>>> As a means of return comparison in my mutual fund research I am using the Fama-French factors as a measurement of excess return.
>>>
>>> However, when I want to regress the Fama-French factors for every fund in my database Stata only calculates the values for the first fund number.
>>>
>>> The data used are from the CRSP mutual fund database: monthly mutual fund returns and the monthly Fama-French factors from the website of Kenneth R. French for the 1995-2010 period. I have used the panel data command tsset to let Stata know I'm using panel data.
>>>
>>> The part of the code that does not work is as follows:
>>>
>>> gen exret = mret - rf
>>> sort crsp_fundno
>>> egen fundid = group(crsp_fundno)
>>> gen hmlexp = .
>>> gen smbexp = .
>>> gen performance = .
>>>
>>> forvalues f = 1(1)11281 {
>>>     forvalues y = 1998(1)2010 {
>>>         forvalues m = 1(1)12 {
>>>             quietly reg exret mktrf hml smb if fundid==`f' & (date>=ym(`y'-3,`m') & date<=ym(`y',`m'))
>>>             replace performance = _b[_cons] if e(sample)
>>>             replace hmlexp = _b[hml] if e(sample)
>>>             replace smbexp = _b[smb] if e(sample)
>>>         }
>>>     }
>>> }
>>>
>>> As you can see, I want the data for the 1998-2010 period, and I currently have 11281 mutual funds in my data. However, Stata only calculates the coefficients for the first fund and leaves all the others as missing (.), the way they were generated. I looked at the Statalist archives and the FAQ, but the solutions offered there did not help.
>>> http://www.stata.com/support/faqs/data/foreach.html
>>> I cannot figure out why Stata stops after the first fund and can't find the error in the code (if there is one).