Ramani Gunatilaka asked
> I have a data set with consumption and other variables such as number of
> adults, district, sector for each household.
> I need to write a programme that requires selecting the particular household
> whose consumption is nearest to the mean consumption of all the households.
and Renzo Comolli replied
> How do you plan to handle ties (i.e. a case in which there is more than one
> household which is at the same minimal distance from the average
> consumption)?
> If you plan to pick one of them at random then
> . egen meanconsumption=mean(consumption)
> . generate absconsumptiondev=abs(meanconsumption-consumption)
> . sort absconsumptiondev
> . keep in 1
> -sort- already does the randomization for you among ties
>
> If you plan to keep all the ties
> . egen meanconsumption=mean(consumption)
> . generate absconsumptiondev=abs(meanconsumption-consumption)
> . egen mindevfromavgcons=min(absconsumptiondev)
> . keep if absconsumptiondev==mindevfromavgcons
In the same spirit, note that you don't need to
store the mean (which is clearly a constant) in
a variable.
su consumption, meanonly
produces (silently) a mean accessible immediately
thereafter as -r(mean)-, so you can then
gen absconsumptiondev = abs(consumption - r(mean))
Similarly, you don't need to store the
minimum in a variable, as a similar approach
could be used. In this case, however,
sort absconsumptiondev
would let you look at the first few households.
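Put together, the no-constants route might look like the sketch below. It assumes the household data are already in memory with a variable named consumption, as in the question; the scalar name mindev is just an illustration. It relies on -summarize, meanonly- leaving r(min) behind as well as r(mean).

```stata
* The mean is left in r(mean); no constant variable is created
summarize consumption, meanonly
generate double absconsumptiondev = abs(consumption - r(mean))

* Same trick for the smallest deviation, which is left in r(min)
summarize absconsumptiondev, meanonly
scalar mindev = r(min)    // hypothetical scalar name

* Keeps every household tied at the minimum distance
keep if absconsumptiondev == scalar(mindev)
```

Households with missing consumption get a missing deviation and are dropped by the final -keep-.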
The -egen- approach really comes into its own
when you want to do something like this
within (e.g.) panels.
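As a rough sketch of that groupwise case, suppose you wanted the household nearest to the mean of its own district. Treating district as the grouping variable is an assumption here, and the new variable names are made up for the example.

```stata
* Mean consumption within each district (district as the group is an assumption)
bysort district: egen double districtmean = mean(consumption)
generate double absdev = abs(consumption - districtmean)

* Smallest deviation within each district
bysort district: egen double mindev = min(absdev)

* Keeps the nearest household(s) in every district, ties included
keep if absdev == mindev
```

Here the mean and the minimum differ from group to group, so holding them in variables is natural, which is not so when they are single constants.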
Nick