Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "J. J. W." <bsc.j.j.w@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: Efficiently looping through countries and years, counting and computing the percentage of people who selected a specific answer
Date: Thu, 6 Jun 2013 04:58:25 +0200
Dear all,

I have a small problem. I have already solved it, but I was wondering:

- What is the usual way to do this?
- Can it be implemented more efficiently?

Suppose I have the following data:

    Country      Year  Female
    Netherlands  1990  1
    Netherlands  1990  0
    Netherlands  1990  1
    Netherlands  1991  1
    Netherlands  1991  1
    Netherlands  1991  1
    Netherlands  1992  1
    Netherlands  1992  0
    ...

I would like to calculate the number of females as a percentage of the total, and to do this for every country and every year. I have written the script below for it (country_id is a numeric identifier for Country):

    gen per_female = 0

    /* Get the minimum and maximum country identifiers */
    su country_id, meanonly

    /* Loop over all countries */
    forvalues i = `r(min)'/`r(max)' {
        /* Minimum and maximum year for this country */
        su year if country_id == `i', meanonly

        /* Loop over all years */
        forvalues j = `r(min)'/`r(max)' {
            /* Number of females in this country-year */
            count if country_id == `i' & female == 1 & year == `j'
            local nr_females = r(N)

            /* Number of non-missing answers in this country-year */
            count if country_id == `i' & year == `j' & (female == 1 | female == 0)
            local nr_obser = r(N)

            /* Store the percentage on every observation of this country-year */
            replace per_female = 100 * `nr_females' / `nr_obser' if country_id == `i' & year == `j'
        }
    }

It basically works, but there are some problems:

a) I do not think this is an efficient computation, since in a LOT of iterations no replacement is made at all. How can I make this more efficient?

b) Is my way "the way to go"? It feels more like general programming, and I am wondering how this can be done more easily in Stata (even though my method is relatively easy and straightforward).

c) I used the condition (female == 1 | female == 0). This ensures that I count only the observations for which I have an answer and excludes those with missing values (female == .). Is this correct? Should I handle missing data in this way?

Any suggestions, advice or comments are very helpful and appreciated. Thank you for your answer!
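For comparison with the loop above, here is a minimal sketch of the by-group approach that is common in Stata. It assumes variables named country (string), year and female (0/1, with missing answers coded as .), matching the listing; a numeric country_id works the same way in the by list. The toy data, including the extra missing answer for 1992, are made up for illustration, and clear discards whatever is in memory. Because the mean of a 0/1 variable is the share of 1s, egen's mean() gives the percentage in one pass over the data, and it skips missing values, which has the same effect as the (female == 1 | female == 0) condition.

```stata
* Hypothetical toy data mirroring the listing (one missing answer added for 1992)
clear
input str11 country int year byte female
"Netherlands" 1990 1
"Netherlands" 1990 0
"Netherlands" 1990 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1992 1
"Netherlands" 1992 0
"Netherlands" 1992 .
end

* One pass instead of nested loops: 100 * mean of a 0/1 variable = percentage.
* egen's mean() ignores missing values, so female == . is left out of the
* denominator automatically.
bysort country year: egen per_female = mean(100 * female)

* Same result from explicit counts, mirroring the original loop
bysort country year: egen n_female = total(female == 1)
bysort country year: egen n_obs    = total(!missing(female))
generate per_female2 = 100 * n_female / n_obs
assert reldif(per_female, per_female2) < 1e-6

* If one row per country-year is wanted instead, collapse does the same job
preserve
collapse (mean) pct_female = female (count) n_valid = female, by(country year)
replace pct_female = 100 * pct_female
list, noobs
restore
```

The counts-based lines are only there to show that they agree with mean(); in practice the single egen line is enough. collapse is the better fit when a separate dataset with one row per country-year is wanted, rather than a variable repeated on every observation.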
Wen Jun Jie