On 5/21/07, Alexander Staus <[email protected]> wrote:
Dear Stata users,
In my panel dataset I want a dummy for the most frequently occurring value of a variable.
For example, within a household a variable can take values from 1 to 250. If 15 is the value
named most often in a household, I want a dummy that is 1 when the household named 15
and 0 otherwise.
I have an idea, but I can't work out the proper loop:
tab var, gen(var)
forvalues i=1(1)250 {
bysort household: gen N_`i'=sum(var`i')
bysort household: replace N_`i'=N_`i'[_N]
}
* Now some loop for:
bysort hh: gen dummy=1 if N_1 > N_2 & N_1 > N_3 & N_1 > N_4 ********more
possible values from 1 to 250 but not every number is named************* & var1 ==1
* here all other possibilities...
Any ideas, or is there an easier way?
You can avoid using loops in this instance...
* 1. Reshape your data to long to make this easier....
reshape long N_, i(hh) j(n)
* 2. Now generate a variable holding the maximum observed value within each household
* (this needs egen; with gen, max() only compares its arguments within a row)
bysort hh: egen N_max = max(N_)
* 3. Now create a dummy variable...
bysort hh: gen dummy = cond(N_ == N_max, 1, 0)
* 4. Drop the maximum value that you created
drop N_max
* 5. If needed reshape your data back to wide...
reshape wide N_, i(hh) j(n)
I *think* that should do the trick.
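
One caveat with step 1: reshape long needs i() to identify each observation uniquely, which may not hold in a panel with several rows per household. As a rough sketch of the same max-then-compare idea that avoids the reshape, contract can build a household-by-value frequency table first. The names household, var, count and maxcount, and the toy data, are assumptions for illustration only, not taken from the original dataset.

```stata
* Toy data (assumed): two households, a few named values each
clear
input household var
1 15
1 15
1 3
2 7
2 8
2 8
end

* One row per household-value pair, with its frequency in count
contract household var, freq(count)

* Highest frequency within each household
bysort household: egen maxcount = max(count)

* 1 for the most frequently named value(s), 0 otherwise
gen byte dummy = (count == maxcount)
list, sepby(household)
```

Ties are all flagged with 1 here; contract replaces the data in memory, so the result would need to be merged back onto the original panel if the dummy is wanted there.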
I created some dummy data in long format, and it works:
set obs 100
gen hh = round(_n / 10)
gen N_ = int(uniform() * 250)
* Now perform steps 2-4
bysort hh: egen N_max = max(N_)
bysort hh: gen dummy = cond(N_ == N_max, 1, 0)
list
drop N_max
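
Note that in the dummy data above N_ is a random value rather than a count, so the dummy marks the largest value per household. If the aim is to flag the most frequent value directly on the original observations, one possible alternative is egen's mode() function. This is only a sketch on assumed toy data; the names household, var and modeval are hypothetical, and the minmode option (which picks the smallest value when there are ties) is one choice among several.

```stata
* Toy data (assumed): two households, a few named values each
clear
input household var
1 15
1 15
1 3
2 7
2 8
2 8
end

* Most frequent value of var within each household
* (minmode resolves ties by taking the smallest mode)
bysort household: egen modeval = mode(var), minmode

* 1 where the observation equals the household's modal value, 0 otherwise
gen byte dummy = (var == modeval)
list, sepby(household)
drop modeval
```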
Neil
--
"In mathematics you don't understand things. You just get used to
them." - Johann von Neumann