st: Programming a slightly complex list of independent variables
From: "Nic" <[email protected]>
To: <[email protected]>
Subject: st: Programming a slightly complex list of independent variables
Date: Wed, 6 Apr 2011 00:51:55 -0400
Hi Statalist,
I am trying to write a .do file that runs a number of OLS regressions, each
containing a single continuous-by-continuous interaction term.
My ultimate question is: how can I program my regression command so that the
s* and f* variables at the end of the command refer to all "s" and "f"
variables EXCEPT for the two specific "s" and "f" (`x' and `z') variables
referenced at the beginning of the equation?
Here is the relevant part of the code I have so far:
-------------------------------------------------------------------------
foreach y of varlist d* {
    local laby : variable label `y'
    foreach x of varlist s* {
        local labx : variable label `x'
        local prex = substr("`x'",1,3)
        foreach z of varlist f* {
            local labz : variable label `z'
            local prez = substr("`z'",1,2)
            * problem: x and z are matched again by s* and f*
            regress `y' `x' `z' i`prex'`prez' g* c* e* s* f*
            * ... (further steps, e.g. the e(b)/e(V) code, go here)
        }
    }
}
--------------------------------------------------------------------------
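For readers less familiar with Stata's macro syntax, here is a minimal sketch of how the interaction-term name i`prex'`prez' is assembled. The values sales and food are hypothetical stand-ins for one s* and one f* variable, and the sketch assumes the interaction variables were created beforehand under names built this way.

```stata
* hypothetical variable names standing in for one s* and one f* variable
local x "sales"
local z "food"
local prex = substr("`x'",1,3)   // first 3 characters of x: sal
local prez = substr("`z'",1,2)   // first 2 characters of z: fo
display "i`prex'`prez'"          // name of the interaction variable: isalfo
```

The name only resolves to a real variable if a variable such as isalfo already exists in the dataset.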
Including s* and f* at the end of the regress command means that two
variables, `x' and `z', appear twice. As a result, one instance of each
repeated variable is omitted because of collinearity.
I had assumed that the second instance (from s* or f*) of each repeated
variable would be the one omitted, but this is not always so. Sometimes it is
the first instance (`x' or `z'). Apparently this is normal ("Which variable it
omits is somewhat arbitrary") according to the Stata FAQ "Why do estimation
commands sometimes omit variables?" at www.stata.com/support/faqs/stat/drop.html.
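The omission can be reproduced on Stata's shipped auto data with a minimal sketch. Here mpg2 is a hypothetical exact copy of mpg, standing in for a variable that enters the command twice; the sketch does not rely on which copy Stata drops, only on the fact that one of them is dropped.

```stata
sysuse auto, clear
* hypothetical exact copy of mpg, like x appearing both by name and via s*
generate double mpg2 = mpg
regress price mpg weight mpg2
* one of mpg/mpg2 is omitted, so only two slopes plus _cons are estimated
display "rank = " e(rank)
```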
Because Stata may drop either instance, the positions of the coefficients in
the e(b) and e(V) matrices are unpredictable. This is a problem for me because
the next step in my .do file picks out, by position, the first and second
independent variables in the regression command as well as their interaction
term (ultimately to create a graph):
----------------------
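matrix b=e(b)
matrix V=e(V)
* coefficients of x, z and their interaction, taken by column position
scalar b1=b[1,1]
scalar b2=b[1,2]
scalar b3=b[1,3]
* their variances (diagonal of V) and covariances with the interaction
scalar varb1=V[1,1]
scalar varb2=V[2,2]
scalar varb3=V[3,3]
scalar covb1b3=V[1,3]
scalar covb2b3=V[2,3]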
-----------------------
As you can see, when the second instance of each repeated variable is
omitted, b1, b2, b3, etc. refer to the intended cells in the matrices. But
when the first instance is "somewhat arbitrarily" omitted instead, b1, b2,
b3, etc. no longer refer to the intended cells.
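Independently of fixing the variable list, the extraction step can be made robust by looking coefficients up by name instead of by position. A minimal sketch on the auto data, where mpg, weight and the hypothetical interaction mpgXwt stand in for `x', `z' and i`prex'`prez'; _b[] and the rownumb()/colnumb() matrix functions are standard Stata, but using them here is a suggested alternative, not the poster's code.

```stata
sysuse auto, clear
* hypothetical interaction term standing in for i`prex'`prez'
generate double mpgXwt = mpg*weight
regress price mpg weight mpgXwt
matrix b = e(b)
matrix V = e(V)
* coefficients looked up by variable name, not column position
scalar b1 = _b[mpg]
scalar b2 = _b[weight]
scalar b3 = _b[mpgXwt]
* row/column positions found from the names stored on e(V)
scalar varb1 = V[rownumb(V,"mpg"), colnumb(V,"mpg")]
scalar varb2 = V[rownumb(V,"weight"), colnumb(V,"weight")]
scalar varb3 = V[rownumb(V,"mpgXwt"), colnumb(V,"mpgXwt")]
scalar covb1b3 = V[rownumb(V,"mpg"), colnumb(V,"mpgXwt")]
scalar covb2b3 = V[rownumb(V,"weight"), colnumb(V,"mpgXwt")]
```

Inside the loop the literal names would be replaced by "`x'", "`z'" and "i`prex'`prez'". If one of these variables has itself been omitted, its coefficient and variance are stored as zero, so the check for omission is still worth keeping.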
So my ultimate question is: how can I program my regression command so that
the s* and f* variables at the end of the command refer to all "s" and "f"
variables EXCEPT for the two specific "s" and "f" (`x' and `z') variables
referenced at the beginning of the equation? Logic tells me that this is
surely possible, but I am still so new to Stata, and to programming in
particular, that I simply have not been able to suss it out.
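One possible approach, sketched under assumptions: expand s* and f* once with unab, then use the extended macro function `: list A - B` to drop the current `x' and `z' from those lists before calling regress. The toy dataset below (d1, sal, sec, fin, fo, g1, c1, e1 and the generated i* interactions) is entirely hypothetical and exists only so the loop runs end to end.

```stata
clear
set seed 12345
set obs 200
* hypothetical toy data: outcome d1, focal s*/f* variables, controls g1 c1 e1
foreach v in d1 sal sec fin fo g1 c1 e1 {
    generate double `v' = rnormal()
}
* interaction terms named i + first 3 chars of x + first 2 chars of z
foreach x of varlist s* {
    local prex = substr("`x'",1,3)
    foreach z of varlist f* {
        local prez = substr("`z'",1,2)
        generate double i`prex'`prez' = `x'*`z'
    }
}
* expand the wildcards once, before the loops
unab svars : s*
unab fvars : f*
foreach y of varlist d* {
    foreach x of local svars {
        local prex = substr("`x'",1,3)
        foreach z of local fvars {
            local prez = substr("`z'",1,2)
            * all s* variables except the current x, all f* except the current z
            local sother : list svars - x
            local fother : list fvars - z
            regress `y' `x' `z' i`prex'`prez' g* c* e* `sother' `fother'
        }
    }
}
```

Because nothing is listed twice, nothing is dropped for that reason, and `x', `z' and the interaction stay in columns 1 to 3 of e(b).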
With gratitude,
Nic