Statalist The Stata Listserver



Re: AW: st: RE: Decile sorts


From   Philipp Rehm <[email protected]>
To   [email protected]
Subject   Re: AW: st: RE: Decile sorts
Date   Fri, 10 Nov 2006 11:27:06 +0100

A potential work-around for a command that is not byable is to use -levelsof- in combination with a loop.

Example: Assume you want price deciles by car type (foreign). The following may get you there:

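sysuse auto, clear

* xtile is not byable, so loop over the levels of foreign instead
gen deciles_price=.
levelsof foreign, local(l)
foreach C in `l' {
    * deciles of price within one car type
    xtile deciles_price_`C'=price if foreign==`C', n(10)
    * copy them into the combined variable, then drop the temporary one
    replace deciles_price=deciles_price_`C' if foreign==`C'
    drop deciles_price_`C'
}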

You can make this loop arbitrarily complicated to incorporate more grouping variables (although it's not particularly pretty to read): just add more -levelsof- and -foreach- loops.
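Instead of nesting one loop per grouping variable, the grouping variables can also be combined into a single identifier with -egen, group()- and run through the same -levelsof-/-foreach- pattern once. The sketch below is an illustration, not part of the original post: it uses the nlsw88 example dataset shipped with Stata (wage deciles within each race-by-married cell), the names cell, deciles_wage and tmp are hypothetical, and a reasonably recent Stata is assumed.

```stata
sysuse nlsw88, clear

* One identifier per race-by-married combination (missing if either is missing)
egen cell = group(race married)

generate deciles_wage = .
levelsof cell, local(cells)
foreach c of local cells {
    * deciles of wage within one cell, copied into the combined variable
    xtile tmp = wage if cell == `c', nquantiles(10)
    replace deciles_wage = tmp if cell == `c'
    drop tmp
}

* inspect how observations spread over cells and deciles
tabulate cell deciles_wage
```

Adding a third grouping variable only changes the -egen, group()- line; the loop stays the same.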

There may well be much better ways to do this, though.

HTH,
Philipp


Thomas Erdmann wrote:

A further note on Jeph's suggestion:

It looks very convenient, but I need to adapt it, because I need the mean not
of the sorting variable itself but of a different attribute:

foreach X of varlist c1* {
    * deciles of the sorting variable
    xtile deciles_`X'=`X', n(10)
    * mean of a different attribute (c1ds_ri) within each decile
    bysort deciles_`X': egen Rr`X'=mean(c1ds_ri)
}

But a problem still remains: the deciles are calculated over all observations,
whereas I need to form the deciles within each yrm (my time variable
representing YearMonth), calculate the mean for each decile group, and
afterwards average each decile group (1-10) over all yrm values. I was not
able to integrate this into the short solution because -xtile- does not allow -by-.
-Tom
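One way to apply the -levelsof- loop to this question is sketched below. It is an illustration under assumptions, not code from the thread: the data are simulated (24 hypothetical months coded 1-24 in yrm rather than real YearMonth dates, 100 observations per month, one sorting variable c1x and a return c1ds_ri), the names dec_*, Rr*, tag_* and avgRr* are made up, and rnormal() requires a recent Stata. Deciles are formed within each month, the mean return is computed per month-decile cell, and those cell means are then averaged over months, counting each month once via -egen, tag()-.

```stata
* Hypothetical simulated data standing in for the real panel
clear
set seed 12345
set obs 2400
generate yrm = ceil(_n / 100)
generate c1x = rnormal()
generate c1ds_ri = 0.5 * c1x + rnormal()

levelsof yrm, local(months)
* List the sorting variables explicitly: c1* would also match c1ds_ri
foreach X of varlist c1x {
    * deciles of the sorting variable within each month (xtile is not byable)
    generate byte dec_`X' = .
    foreach m of local months {
        xtile tmp = `X' if yrm == `m', nquantiles(10)
        replace dec_`X' = tmp if yrm == `m'
        drop tmp
    }
    * mean return in each month-decile cell
    bysort yrm dec_`X': egen Rr`X' = mean(c1ds_ri)
    * average of the cell means over months, one tagged row per cell
    egen tag_`X' = tag(yrm dec_`X')
    bysort dec_`X': egen avgRr`X' = mean(cond(tag_`X', Rr`X', .))
    drop tag_`X'
}
```

With real data, replace the simulated block with -use- of the dataset and list the sorting variables on the -foreach- line.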




-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jeph Herrin
Sent: Friday, 10 November 2006 01:26
To: [email protected]
Subject: Re: st: RE: Decile sorts

Oops, don't forget to drop -deciles-

foreach X of varlist c1* {
xtile deciles=`X', n(10)
bys deciles: egen R`X'=mean(`X')
drop deciles
}






Jeph Herrin wrote:

Maybe I'm missing something, but why not:

foreach X of varlist c1* {
   xtile deciles=`X', n(10)
   bys deciles: egen R`X'=mean(`X')
}

?

hth,
Jeph


Nick Cox wrote:
Various comments sprinkled here and there. You may have strong reasons to use these decile bins, but binning strikes me as, usually, at best a means towards an end (or perhaps ends towards some means). Some nonparametric regression might do more justice to the data.
Also, you are mixing two naming conventions, 1...10 and 10...90. Just use one.
Nick [email protected]
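As a sketch of the nonparametric alternative mentioned above, one option (not necessarily the one Nick had in mind) is Stata's -lowess- smoother, shown next to the decile-bin means it would replace. The example uses the auto data, with price and weight standing in for the return and the sorting attribute; the names price_lowess, dec_weight and price_binmean are hypothetical.

```stata
sysuse auto, clear

* Locally weighted smooth of price on weight; bwidth(0.8) is the default
lowess price weight, bwidth(0.8) generate(price_lowess) nograph

* Decile-bin means of price, for comparison with the smooth
xtile dec_weight = weight, nquantiles(10)
bysort dec_weight: egen price_binmean = mean(price)
```

Plotting price_lowess and price_binmean against weight (for example with -twoway-) lets you compare the smooth curve with the ten step-like bin means.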
Thomas Erdmann

I am trying to sort my observations into deciles according to one attribute
and then calculate the average of another attribute within each of those ten groups.

The code I came up with is below (lines with ... are omitted); yrm is the time variable (YearMonth).

(1) As far as I can tell it works, but (a) it is a lot of code, (b) it
produces a lot of variables, and (c) generating the output is rather awkward.

Could you give me hints on how to implement a smarter solution, or point out
any errors in the way the calculation is currently carried out?

*** Generate Percentiles
sort yrm
foreach X of varlist c1* {
by yrm: egen p10_`X'= pctile(`X'), p(10.0)
by yrm: egen p20_`X'= pctile(`X'), p(20.0)
by yrm: egen p30_`X'= pctile(`X'), p(30.0)
...
by yrm: egen p90_`X'= pctile(`X'), p(90.0)
}
This is two loops rolled out into one.
sort yrm
foreach X of varlist c1* {
    forval i = 10(10)90 {
        by yrm : egen p`i'_`X' = pctile(`X'), p(`i')
    }
}

*** Sort into Percentile groups
foreach X of varlist c1* {
gen G_`X'=1 if `X'<p10_`X' & `X'~=.
replace G_`X'=2 if `X'>p10_`X' & `X'<p20_`X'
...
replace G_`X'=9 if `X'>p80_`X' & `X'<p90_`X'
replace G_`X'=10 if `X'>p90_`X' & `X'~=.
}
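Similar story. Also watch the boundary conditions: in the code above, a value exactly equal to one of the percentiles is caught by neither the < nor the > comparison and so ends up in no group.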
foreach X of varlist c1* {
    gen byte G_`X' = `X' < p10_`X'
    forval i = 2/9 {
        local j = 10 * `i'
        replace G_`X' = `i' if `X' < p`j'_`X' & G_`X' == 0
    }
    replace G_`X' = cond(`X' == ., ., 10) if G_`X' == 0
}


*** Calculate return mean for each group
sort yrm
foreach X of varlist G* {
    by yrm: egen R1`X'= mean(c1ds_ri) if `X'==1
    by yrm: egen R2`X'= mean(c1ds_ri) if `X'==2
    ...
    by yrm: egen R9`X'= mean(c1ds_ri) if `X'==9
    by yrm: egen R10`X'= mean(c1ds_ri) if `X'==10
}
Why do you need all these variables? The results for the bins are disjoint, so they can be put in a single variable.
foreach X of varlist G* {
    bysort yrm `X' : egen R`X' = mean(c1ds_ri)
}
Having said that, it can probably be done more directly with a series of -collapse-s.
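A sketch of such a two-step -collapse- route, under stated assumptions: the data are simulated with hypothetical names (yrm coded 1-24 instead of real YearMonth dates, sorting variable c1x, return c1ds_ri), rnormal() requires a recent Stata, and every month gets equal weight in the final average. Deciles are still assigned month by month with a loop, because -xtile- is not byable; the first -collapse- then leaves one row per month and decile, and the second averages those rows over months.

```stata
* Hypothetical simulated data
clear
set seed 2006
set obs 2400
generate yrm = ceil(_n / 100)
generate c1x = rnormal()
generate c1ds_ri = 0.5 * c1x + rnormal()

* Deciles of c1x within each month
generate byte G_c1x = .
levelsof yrm, local(months)
foreach m of local months {
    xtile tmp = c1x if yrm == `m', nquantiles(10)
    replace G_c1x = tmp if yrm == `m'
    drop tmp
}

* First collapse: mean return per month and decile
collapse (mean) R = c1ds_ri, by(yrm G_c1x)
* Second collapse: average the monthly means over all months
collapse (mean) R, by(G_c1x)
list, clean noobs
```

Because -collapse- replaces the data in memory, wrapping the two -collapse- steps in -preserve- and -restore- keeps the original observations available when working on a real dataset.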
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

© Copyright 1996–2025 StataCorp LLC