Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Normalize Variables by s.d. (programmatically)
From: Ryan Turner <[email protected]>
To: [email protected]
Subject: st: Normalize Variables by s.d. (programmatically)
Date: Wed, 21 Dec 2011 15:46:13 -0500
Hi all,
I want to normalize each variable in my regressions by its own standard deviation. Simple enough, but I have a lot of variables and a lot of regressions, so I would like to do this programmatically to simplify the code and avoid mistakes. Further, the data have many missing records, and most of the time the number of observations is different for each regression; therefore I need to calculate each variable's standard deviation over the subset of records included in that particular regression.
I have been searching for three days for a simple way to do this and, finding none, have spent significant time writing a program, "doreg". It takes a varlist to regress on, backs up that varlist, and creates a rule that determines which records would be included in the regression (e.g. if !missing(varlist)). Then, for each variable in the varlist, it calculates the s.d. given that rule and replaces the original variable with the normalized value (the variable divided by its standard deviation). I more or less got it working, but the program is plagued by special cases: dummy variables, difference operators, wildcards. When my program devolved into manually parsing wildcards, I knew it was time to ask for help.
So, what is the proper way to accomplish my goal? Do I just need to get my program working, or is there some other, fundamentally better way to do it? I have included my program doreg for reference, but really I am looking for a higher-level response.
Thanks,
--
Ryan J. Turner <[email protected]>
// Reference program; fails when passing a wildcard in the varlist
// Function to normalize beta by standard deviation:
capture program drop doreg
program define doreg
    syntax anything [, *]
    //local reg_list `anything'
    local bak_reg_list
    // backup each `item' in `reg_list' and return `bak_reg_list'
    foreach item in `anything' {
        // generate backed up name of `item'
        local bak_item "bak_`item'"
        // check if `item' exists
        capture confirm var `item'
        if _rc == 0 { // `item' exists
            // check that `bak_item' is empty
            capture confirm var `bak_item'
            assert _rc != 0
            // move `item' to `bak_item'
            rename `item' `bak_item'
        }
        else { // `item' does not exist OR ITEM CONTAINS WILDCARD
            // check that `item' has already been backed up
            capture confirm var `bak_item'
            assert _rc == 0
        }
        // we now know all variables have been backed up; none of
        // `reg_list' exists now
        local bak_reg_list `bak_reg_list' `bak_item'
    }
    // generate test of what observations are included in regression
    local keep_list = "!missing(" + subinstr("`bak_reg_list'"," ",",",.) + ")"
    assert length("`keep_list'") < 244
    // generate normalized variables from `bak_reg_list'
    foreach bak_item of var `bak_reg_list' {
        // generate original item name
        local item = substr("`bak_item'",5,.)
        // Get s.d. of `bak_item' for observations included in the regression
        quietly: summ `bak_item' if `keep_list'
        // divide item by its s.d. and store in new_item
        gen `item' = `bak_item' / r(sd)
        // Add new_item to the list of regression variables
        local reg_list `reg_list' `item'
    }
    // do the actual regression
    xtreg `reg_list', `options'
    // drop so that we don't accidentally reuse this one-time varlist
    drop `reg_list'
end
eststo: doreg s_sem_grd_pts s_tot_time , robust // success
eststo: doreg s_n_courses s_tot_time , robust // success
eststo: doreg s_sem_grd_pts s_tot_time app_avg_grade cohort curr_year_* major_*, robust // FAILS ON curr_year_*
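One direction that might avoid the manual wildcard parsing is to let Stata's own syntax parser expand the varlist and to mark the estimation sample with marksample. The following is only an untested sketch of that idea, not a working replacement for doreg: the program name normreg is made up for illustration, it assumes the data are already xtset, and it does not try to handle difference operators or to treat dummy variables differently.

```stata
// Hypothetical sketch: "normreg" is an illustrative name, not an existing command
capture program drop normreg
program define normreg
    // syntax expands wildcards such as curr_year_* into a full varlist
    syntax varlist(numeric min=2) [if] [in] [, *]
    // touse is 1 for observations with no missing values in the varlist
    marksample touse
    // work on a temporary copy of the data; restore puts the originals back
    preserve
    foreach v of local varlist {
        // s.d. computed over the regression sample only
        quietly summarize `v' if `touse'
        // leave constants (s.d. of zero) and undefined s.d. untouched
        if r(sd) > 0 & !missing(r(sd)) {
            quietly replace `v' = `v' / r(sd)
        }
    }
    // run the regression on the normalized copies, same sample
    xtreg `varlist' if `touse', `options'
    restore
end
```

Under those assumptions, the third call above would become "eststo: normreg s_sem_grd_pts s_tot_time app_avg_grade cohort curr_year_* major_*, robust". Because the variables are rescaled in place on the preserved copy, the regression output keeps the original variable names and no backup variables are needed.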