Dear Nick,
Thanks so much! I highly appreciate your kind support.
Have A Wonderful Day!
Many thanks!
Quang
On 10/2/07, n j cox <[email protected]> wrote:
> Note that the general issue is also discussed at
>
> How do I create variables summarizing for each individual properties of
> the other members of a group?
> http://www.stata.com/support/faqs/data/members.html
>
> Apart from sums and means -- when we can use short-cuts based
> on some rearrangement of, or implication of,
>
> sum for everyone = sum for others + value for this individual
>
> -- this kind of problem usually requires a loop. In the FAQ
> just cited, it is shown that you can do it by looping
> over within-group identifiers, rather than the whole
> dataset.
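
For sums and means, the identity above needs no loop at all. A minimal sketch, assuming the toy data from the question quoted further down and no missing values in X; the names total, n, sum_others and mean_others are made up for illustration:

```stata
* Toy data as posted in the original question
clear
input byte ID str1 Group byte X
1 "a" 5
2 "a" 7
3 "a" 9
4 "a" 8
5 "b" 3
6 "b" 4
7 "b" 9
end

* Group total and group count of nonmissing X
bysort Group: egen double total = total(X)
bysort Group: egen long n = count(X)

* sum for others = sum for everyone - value for this individual
generate double sum_others = total - X

* Mean of the others; left missing for groups of size one
generate double mean_others = (total - X) / (n - 1) if n > 1

list, noobs sepby(Group)
```

Nothing comparable is available for percentiles, which is why the rest of the thread turns to loops.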
>
> However, the trade-offs are not very clear to me.
>
> -_pctile- is built in, while any call to -egen- involves
> an interpretative overhead. On the other hand, -_pctile-
> can only emit one 75th percentile at a time, and -egen-
> with -by()- can calculate several at a time by side-stepping
> -_pctile-. The precise trade-offs would probably depend on the size of
> the dataset and the number of groups.
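
One way the within-group loop might look, as a rough sketch rather than the FAQ's own code. It loops over the position of each observation within its group, so the number of passes is the size of the largest group, and each pass lets -egen- with -by()- handle all groups at once. The names j, work, p and p75_others are hypothetical:

```stata
* Toy data as posted in the original question
clear
input byte ID str1 Group byte X
1 "a" 5
2 "a" 7
3 "a" 9
4 "a" 8
5 "b" 3
6 "b" 4
7 "b" 9
end

* j = position of each observation within its group
bysort Group (ID): generate long j = _n
summarize j, meanonly
local jmax = r(max)

generate double p75_others = .
generate double work = .
quietly forvalues k = 1/`jmax' {
    * Blank out the k-th member of every group, keep the others
    replace work = cond(j == `k', ., X)
    * 75th percentile of the remaining members, all groups in one call
    egen double p = pctile(work), p(75) by(Group)
    * Store the result only for the member that was left out
    replace p75_others = p if j == `k'
    drop p
}
drop work j

list, noobs sepby(Group)
```

Whether this beats a loop over every observation would depend, as said, on the dataset: it pays off when groups are many and small.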
>
> No doubt you could also speed it up using Mata or writing
> more direct code.
>
> Nick
> [email protected]
>
> Quang Nguyen asked
>
> A simplified version of my data looks as follows:
>
> ID Group X
> 1 a 5
> 2 a 7
> 3 a 9
> 4 a 8
> 5 b 3
> 6 b 4
> 7 b 9
> ..........................
>
> I would like to generate a new variable whose value is the 75th percentile of
> X among the other individuals in the same group as the individual concerned. For
> example, for the first individual (ID=1), this will be the 75th percentile
> of {7, 9, 8}.
>
> and Joseph Coveney replied
>
> -findit percentile- turns up a lot to pore over. But among the results
> is -egen <varname> = pctile(exp), p(#)-, which can take a -by- varlist.
>
> Try something like:
> bysort Group: egen p75 = pctile(X), p(75)
>
> To finish: an observation is going to lie beneath, above or on a given
> percentile for its group, so there's a smarter (more efficient)
> algorithm, but a brute-force approach is shown below.
>
> clear *
> set more off
> set seed `=date("2007-09-29", "YMD")'
> set obs 100
> generate byte pid = _n
> generate byte group = mod(_n, 10)
> generate double response = uniform()
> *
> * Begin here
> *
> tempvar tmpvar0 tmpvar1
> sort group
> generate double p75 = .
> generate double `tmpvar0' = .
> quietly forvalues i = 1/`=_N' {
>     replace `tmpvar0' = response if _n != `i'
>     by group: egen double `tmpvar1' = pctile(`tmpvar0'), p(75)
>     replace p75 = `tmpvar1' in `i'
>     drop `tmpvar1'
>     replace `tmpvar0' = .
> }
> drop `tmpvar0'
> list in 1/20, noobs sepby(group)
> exit
>
> Although my suggestion was centered around -egen-, which is very often a
> convenience, you can usually do things more efficiently. For example,
> in this case, -_pctile if . . ., percentiles(75)- and then -replace p75
> = r() in . . . - would avoid the redundancy of -by . . .: egen . . .
> pctile()-, where all of the other groups' results are calculated and
> discarded each time. There are other ways to polish the suggestion, too,
> and the difference would be noticeable with large datasets and many groups.
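
A sketch of what that -_pctile- variant might look like, under the assumption that it is run on simulated data like that above (variables group and response). The name p75_alt and the local g are made up here, and r(r1) is the result that -_pctile- saves for the first requested percentile:

```stata
* Simulated data along the lines of the example above
clear
set seed 12345
set obs 100
generate byte group = mod(_n, 10)
generate double response = uniform()

generate double p75_alt = .
quietly forvalues i = 1/`=_N' {
    * Group of the current observation
    local g = group[`i']
    * 75th percentile of the other members of that group only
    _pctile response if group == `g' & _n != `i', percentiles(75)
    replace p75_alt = r(r1) in `i'
}

list in 1/20, noobs sepby(group)
```

Each pass touches only one group, so nothing is calculated and thrown away. What happens for groups with a single member has not been checked here and would need testing before relying on it.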
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
"My father gave me the greatest gift anyone could give another person,
he believed in me." - Jim Valvano