st: Re: Efficiently looping through countries and years counting and computing the percentage of people who selected a specific answer
From
"J. J. W." <[email protected]>
To
[email protected]
Subject
st: Re: Efficiently looping through countries and years counting and computing the percentage of people who selected a specific answer
Date
Thu, 6 Jun 2013 04:58:25 +0200
Dear all,
I have a small problem, which I have solved, but I was wondering:
- What is the usual way to do this?
- Can this be implemented more efficiently?
Suppose I have
Country Year Female
Netherlands 1990 1
Netherlands 1990 0
Netherlands 1990 1
Netherlands 1991 1
Netherlands 1991 1
Netherlands 1991 1
Netherlands 1992 1
Netherlands 1992 0
...
Now I would like to calculate the number of females as a
percentage of the total, for every country and every year.
For the rows above, that would be 2/3 (about 67%) for Netherlands 1990
and 3/3 (100%) for Netherlands 1991.
I have devised a script for this, presented below:
gen per_female = .
/* Get the minimum and maximum country indices */
su country_id, meanonly
/* Loop over all country indices */
forvalues i = `r(min)'/`r(max)' {
    su year if country_id == `i', meanonly
    /* Loop over all years between this country's first and last year */
    forvalues j = `r(min)'/`r(max)' {
        /* Females in this country-year */
        count if country_id == `i' & female == 1 & year == `j'
        local nr_females = r(N)
        /* Non-missing observations in this country-year */
        count if country_id == `i' & year == `j' & (female == 1 | female == 0)
        local nr_obser = r(N)
        replace per_female = `nr_females'/`nr_obser' if country_id == `i' & year == `j'
    }
}
It basically works; however, there are some problems.
a) I do not believe this is an efficient computation, since in a
LOT of cases no replacements are made at all. How can I make this
more efficient?
b) Is my way "the way to go"? I believe this is more like programming,
and I am wondering how this can be done more easily in Stata (even
though my method is relatively easy and straightforward).
c) At the moment you see that I did this: "(female == 1 | female == 0)".
Basically, this ensures that I only count the observations that I have
and excludes the ones with missing values (female == .).
Is this correct? Should I handle missing data in this way?
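For comparison, here is a minimal loop-free sketch of the same calculation. It assumes female is coded strictly 0, 1, or missing, so that the group mean of female equals the share of females among the non-missing observations. The toy data (including the one missing value) and the name pct_female are illustrative assumptions, not part of my actual data set.

```stata
* Toy data in the same layout as above; values are illustrative only
clear
input str12 country year female
"Netherlands" 1990 1
"Netherlands" 1990 0
"Netherlands" 1990 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1991 .
end

* Numeric country identifier, as used in the loop above
egen country_id = group(country)

* The mean of a 0/1 indicator is the share of ones;
* egen's mean() ignores missing values of female when computing it
bysort country_id year: egen per_female = mean(female)

* Same quantity expressed as a percentage (hypothetical variable name)
gen pct_female = 100 * per_female

list country year female per_female pct_female, sepby(country_id year)
```

Note that egen assigns the group value to every observation in the country-year, including those where female itself is missing. If one row per country-year is enough, collapse (mean) per_female = female, by(country_id year) may be an alternative, at the cost of replacing the data in memory.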
Any suggestions, advice or comments are very helpful and appreciated!
Thank you for your answer!
Wen Jun Jie