Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: regression r(103): too many variables
From: [email protected]
To: [email protected]
Subject: Re: st: RE: regression r(103): too many variables
Date: Wed, 24 Feb 2010 13:43:10 -0800
Now that you've figured out what caused the error message, perhaps you
should reconsider your proposed analysis. With about 13700 observations,
you have too few to fit 2500 predictors. The rule of thumb, I believe, is
that the ratio of observations to coefficients should be greater than
10:1.
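A quick check of that rule of thumb against the figures given in this thread (about 13700 observations, roughly 2500 predictors, ignoring the constant), treating 10:1 as a rough heuristic rather than a hard limit:

```latex
\frac{13700}{2500} = 5.48 < 10,
\qquad
\frac{13700}{10} = 1370
```

That is, the data give about 5.5 observations per coefficient, and a 10:1 ratio would support only about 1370 coefficients.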
Steve
On Wed, Feb 24, 2010 at 8:01 AM, Paul Higgins <[email protected]> wrote:
> Hi all,
>
> Thanks for all of your suggestions: they were a big help. My code contained an error that is probably a classic newbie misstep: misusing hyphens when making lists of variables. The rhs of my regression contained thousands of interactions between sets of dummy variables (96 dummies representing quarter-hour time increments interacted with 22 dates of special import for the problem I was investigating, yielding 2112 interaction terms for that one pair of variables alone). To construct these, I used code of the following form:
>
> /*****************************/
> /* generate separate dummies */
> /* for each event date */
> /*****************************/
>
> #delimit ;
> local eventdates "mdy(1,13,2009) mdy(2,20,2009) mdy(3,27,2009)
> mdy(4,10,2009) mdy(4,17,2009) mdy(5,18,2009)
> mdy(5,23,2009) mdy(5,24,2009) mdy(6,30,2009)
> mdy(7,1,2009) mdy(7,9,2009) mdy(8,14,2009)
> mdy(8,15,2009) mdy(9,16,2009) mdy(9,18,2009)
> mdy(9,19,2009) mdy(10,3,2009) mdy(11,2,2009)
> mdy(11,3,2009) mdy(12,7,2009) mdy(12,8,2009)
> mdy(12,9,2009)";
> #delimit cr
> local c = 1
> foreach x of local eventdates {
> gen byte dum_`c' = (dt==`x')
> local c = `c' + 1
> }
>
> /************************************/
> /* interact each event date dummy   */
> /* with each quarter-hour dummy     */
> /************************************/
>
> forvalues x = 1/96 {
> forvalues y = 1/22 {
> gen byte dum_`y'_int_`x' = dum_`y'*int_`x'
> }
> }
>
> Because of the order in which I nested the two loops, the variables weren't created in the sequence assumed by the hyphenated lists in my regress statement. I am a recent arrival in Stata-world (having been born in SAS-land, and having emigrated here via several other intermediate stops along the way), and in most other stats programs I've worked with, a single hyphen in a list of this type (e.g., dum_1_int_1-dum_1_int_96) would be expanded in logical sequential fashion (i.e., dum_1_int_1 dum_1_int_2 ...). But Stata expands it in the physical order in which the variables appear in the data set (i.e., dum_1_int_1 dum_2_int_1 ...), so each such range took in 2091 variables rather than 96. Thus, my regressions contained far more than 2500 rhs variables -- mostly redundant ones! Once I replaced the hyphenated lists in the regress statement with wildcard versions (e.g., dum_1_int_*), all was well.
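The ordering problem can be reproduced on a small scale. The sketch below is a hypothetical toy version (3 date dummies and 4 interval dummies instead of 22 and 96, filled with random 0/1 values) that reuses the thread's dum_y_int_x naming. It uses -ds- to count how many variables a hyphenated range picks up under the original loop nesting, compares that with a wildcard, and then shows that swapping the loop nesting also makes the hyphenated range behave.

```stata
* Hypothetical toy version of the problem: 3 date dummies and 4 interval
* dummies (the thread used 22 and 96), filled with random 0/1 values
clear
set obs 10
set seed 12345
forvalues y = 1/3 {
    gen byte dum_`y' = runiform() < .5
}
forvalues x = 1/4 {
    gen byte int_`x' = runiform() < .5
}

* Original nesting: intervals outer, dates inner, so the data set order is
* dum_1_int_1 dum_2_int_1 dum_3_int_1 dum_1_int_2 ...
forvalues x = 1/4 {
    forvalues y = 1/3 {
        gen byte dum_`y'_int_`x' = dum_`y'*int_`x'
    }
}

* A hyphenated range follows the physical order in the data set
ds dum_1_int_1-dum_1_int_4
local n_hyphen : word count `r(varlist)'
display "hyphenated range: `n_hyphen' variables"

* A wildcard matches on the names, so only the intended variables are taken
ds dum_1_int_*
local n_wild : word count `r(varlist)'
display "wildcard: `n_wild' variables"

* Alternative fix: nest the loops the other way round, so that each date's
* interactions are created next to each other
drop dum_*_int_*
forvalues y = 1/3 {
    forvalues x = 1/4 {
        gen byte dum_`y'_int_`x' = dum_`y'*int_`x'
    }
}
ds dum_1_int_1-dum_1_int_4
local n_fixed : word count `r(varlist)'
display "hyphenated range after reordering: `n_fixed' variables"
```

Under these assumptions the three counts should be 10, 4 and 4: with the original nesting, the hyphenated range also sweeps up every interaction created between its two endpoints.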
>
> Thanks again for your assistance.
>
> Paul H.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Martin Weiss
> Sent: Wednesday, February 24, 2010 1:59 AM
> To: [email protected]
> Subject: AW: st: RE: regression r(103): too many variables
>
>
> <>
>
> Andi may want to use
>
>
> *************
> des, short
> *************
>
> to prevent clutter on his screen.
>
>
> HTH
> Martin
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> [email protected]
> Sent: Wednesday, 24 February 2010 06:13
> To: [email protected]
> Subject: Re: st: RE: regression r(103): too many variables
>
> Verify that you actually have 2500 variables, possibly by running
> -des- on the variable list.
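One way to follow that advice without printing every name is to expand the right-hand-side varlist into a macro with -unab-, which applies Stata's usual varlist rules (including the physical-order meaning of hyphenated ranges), and count the words before calling -regress-. A minimal sketch, in which the variables x1-x5 and y are hypothetical stand-ins for the real ones:

```stata
* Hypothetical data standing in for the real regressors
clear
set obs 50
set seed 2010
forvalues i = 1/5 {
    gen byte x`i' = runiform() < .3
}
gen y = rnormal()

* Expand the varlist with the usual varlist rules and count the names
unab rhs : x1-x5
local k : word count `rhs'
display "the right-hand side expands to `k' variables"

* Stop early if the count is not what the specification intends
if `k' != 5 {
    display as error "unexpected number of regressors: `k'"
    exit 198
}
regress y `rhs'
```

Because -unab- keeps duplicate names, a count far above the intended number of regressors is a quick sign that a hyphenated range is reaching further than expected.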
>
> Steve
> --- Paul Higgins
>> I am trying to use regress to run a linear regression. The
>> specification has a lot of rhs variables (around 2500), the
>> majority of which are binary (0/1) variables. <snip> I am
>> getting r(103), "Too many variables specified".
>
>
> On Tue, Feb 23, 2010 at 1:08 PM, Martin Weiss <[email protected]> wrote:
>>
>> <>
>>
>>
>> This runs w/o a hitch in Stata 10.1 MP. Takes something like 2 minutes:
>>
>> *******
>> clear*
>> set mem 500m
>> set matsize 2600   // regress needs room for 2500 slopes plus the constant
>> set obs 13700
>>
>> foreach var of newlist var1-var2500{
>> gen byte `var'=runiform()<.3
>> }
>>
>> gen y=rnormal()
>> reg y var1-var2500
>> *******
>>
>>
>> HTH
>> Martin
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Paul Higgins
>> Sent: Tuesday, 23 February 2010 21:28
>> To: '[email protected]'
>> Subject: st: regression r(103): too many variables
>>
>> Hi all,
>>
>> I am trying to use regress to run a linear regression. The specification
>> has a lot of rhs variables (around 2500), the majority of which are binary
>> (0/1) variables. The data set contains about 13700 observations. At the
>> top of the .do file I set mem to 5 gigabytes, maxvar to 10000 and matsize to
>> 10000. I'm using Stata/SE 10.1 for Windows, under Windows XP Professional
>> x64 edition version 5.2, on a machine that has 8 gigabytes of physical
>> memory on-board. I am getting r(103), "Too many variables specified". I've
>> poked around the documentation, and I can see no mention of any internal
>> limits to the regress command regarding number of variables. Thus, I have
>> assumed that only the general limits for Stata/SE apply: maximum of 32767
>> variables, maximum matsize of 11000. But I appear to be wrong.
>>
>> Suggestions, please?
>>
>> PaulH
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
>
>
> --
> Steven Samuels
> [email protected]
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> 845-246-0774
>