Dear Stata list users:
I need to perform a set of extremely simple operations, but I am not
sure how to tell Stata to ignore missing values. For example, I need
to calculate the average of five variables (v1, v2, v3, v4, v5),
which is, of course, very simple to do:
gen average=(v1+v2+v3+v4+v5)/5
The problem is that whenever one or more of v1 to v5 is missing, the
above command generates a missing value for 'average'. What I need
instead is the average of the non-missing variables only, that is:
if v1==., then I need average=(v2+v3+v4+v5)/4
if v1==. & v4==., then I need average=(v2+v3+v5)/3
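To make the desired result concrete, here is a small sketch with
made-up data. The three rows are invented for illustration and are not
from my dataset, and the variable name 'naive' is just a placeholder.
The comments show the value I would like to obtain for each row next
to what the simple formula returns.

```stata
* Toy data: row 1 is complete, row 2 lacks v1, row 3 lacks v1 and v4
clear
input v1 v2 v3 v4 v5
1 2 3 4 5
. 2 3 4 5
. 2 3 . 5
end

* The simple formula returns missing whenever any of v1 to v5 is missing
gen naive = (v1+v2+v3+v4+v5)/5

* Row 1: naive = 3        desired = 3
* Row 2: naive = .        desired = (2+3+4+5)/4 = 3.5
* Row 3: naive = .        desired = (2+3+5)/3   = 3.33 (approximately)
list
```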
Below is what I did, which is somewhat convoluted. In short, I
replaced the missing values with zeros and created a variable (den)
that stores the number of non-missing values among v1 to v5; I then
calculated the average using den as the denominator.
Is there a simpler, more elegant way to calculate averages when
some of the values are missing?
'My solution'
gen average=(v1+v2+v3+v4+v5)/5
* 'p' stands for present
* The loop below takes care of cases where one or more of the vars is missing,
* so that only the non-missing vars are averaged over time
* Note: it overwrites the missing values in v1 to v5 with zeros
foreach v of var v1 v2 v3 v4 v5 {
    gen p_`v' = 1 if `v' ~= .
    replace p_`v' = 0 if p_`v' == .
    replace `v' = 0 if `v' == .
}
generate den = p_v1 + p_v2 + p_v3 + p_v4 + p_v5
replace average = (v1 + v2 + v3 + v4 + v5)/den if den~=5
Thank you very much in advance,
Laura
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/