Re: st: data management - loop?

From	"Neil Shephard" <[email protected]>
To	[email protected]
Subject	Re: st: data management - loop?
Date	Mon, 21 May 2007 18:11:17 +0100

On 5/21/07, Alexander Staus <[email protected]> wrote:

Dear Stata users,

In my panel dataset I want a dummy for the most frequently occurring value of a variable.

E.g., for a household a variable can take values from 1 to 250. If 15 is the value named
most often within a household, I want a dummy that is 1 when the household named 15
and 0 otherwise.

I have an idea, but I can't work out the proper loop:


tab var, gen(var)

                forvalues i=1(1)250 {

                        bysort household: gen N_`i'=sum(var`i')
                        bysort household: replace N_`i'=N_`i'[_N]
}

* Now some loop for:

        bysort hh: gen dummy=1 if N_1 > N_2 & N_1 > N_3 & N_1 > N_4 /* ...more possible values from 1 to 250, but not every number is named... */ & var1==1

* here all other possibilities...

Any ideas, or an easier way?

You can avoid using loops in this instance...


* 1. Reshape your data to long to make this easier
*    (i() must uniquely identify rows, so this assumes one row per hh)
reshape long N_, i(hh) j(n)

* 2. Now generate a variable holding the maximum observed count within each household
*    (egen, not gen: gen's max() only compares its arguments within a single row)
bysort hh: egen N_max = max(N_)

* 3. Now create a dummy variable...
bysort hh: gen dummy = cond(N_ == N_max, 1, 0)

* 4. Drop the maximum value that you created
drop N_max

* 5. If needed, reshape your data back to wide
*    (dummy varies within hh, so it has to be reshaped along with N_)
reshape wide N_ dummy, i(hh) j(n)

I *think* that should do the trick.
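
As a minimal sketch, the five steps can be run end to end on a tiny made-up wide dataset. The three households and the counts N_1 to N_3 below are purely illustrative and stand in for your N_1 to N_250; I am assuming one row per hh.

```stata
* Toy wide data: one row per household, N_1-N_3 are counts (hypothetical values)
clear
input hh N_1 N_2 N_3
1 2 5 1
2 4 0 3
3 1 1 6
end

* 1. Wide to long: one row per household-by-value combination
reshape long N_, i(hh) j(n)

* 2. Largest count within each household (egen's max() is the group maximum)
bysort hh: egen N_max = max(N_)

* 3. Flag the value(s) with the largest count; tied values are all flagged
gen byte dummy = (N_ == N_max)

* 4. Drop the helper variable
drop N_max

* 5. Back to wide: gives N_1 dummy1 N_2 dummy2 N_3 dummy3
reshape wide N_ dummy, i(hh) j(n)
list
```

Note that if two values tie for the largest count in a household, both get dummy equal to 1, so you may want to decide how ties should be handled.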

I created some dummy data in long format, and it works:

* Start from an empty dataset so that set obs cannot fail
clear
set obs 100
gen hh = round(_n / 10)
gen N_ = int(uniform() * 250)
* Now perform steps 2-4
bysort hh: egen N_max = max(N_)
bysort hh: gen dummy = cond(N_ == N_max, 1, 0)
list
drop N_max
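
If the raw answers are still in a single long variable (called var here, with household as the identifier, as in the question), a possibly shorter route is egen's mode() function, which skips the tabulate and reshape steps entirely. This is only a sketch on made-up data; the variable name var_mode and the choice of minmode to break ties are my assumptions, not something from the original question.

```stata
* Toy long data: several answers per household (hypothetical values)
clear
input household var
1 15
1 15
1 7
2 3
2 9
2 9
2 9
end

* Most frequent value of var within each household.
* minmode picks the smallest modal value when there is a tie (an arbitrary
* choice); without minmode or maxmode, a tie returns missing.
bysort household: egen var_mode = mode(var), minmode

* 1 where the row's value equals the household's most frequent value, else 0
gen byte dummy = (var == var_mode)
list, sepby(household)
```

Here household 1 gets dummy equal to 1 on the rows where var is 15, and household 2 on the rows where var is 9.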

Neil
--
"In mathematics you don't understand things. You just get used to
them."  - Johann von Neumann

Email - [email protected] / [email protected]
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/