Try -egen, cut()- with the -group(#)- option.
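
A minimal sketch of that suggestion, assuming the shipped nlsw88 dataset and its wage variable as stand-ins for the poster's data and income variable (both are placeholders, not from the original question). Note that -egen, cut()- assigns groups by value, so observations tied on the same value still land in the same group and the frequencies may remain unequal where values are heaped.

```stata
* Sketch only: nlsw88 and wage stand in for the real data and income variable
sysuse nlsw88, clear

* Ask for 100 equal-frequency intervals; codes run from 0 to 99
egen cat = cut(wage), group(100)

* Inspect the resulting group sizes
tab cat
```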
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Gisella Young
Sent: Tuesday, December 02, 2008 3:29 PM
To: [email protected]
Subject: st: problem with dividing dataset into equally sized groups
I am trying to divide my dataset into equally sized groups on the basis of
an income variable (e.g. 100 groups from lowest to highest income). I have
tried several methods but the groups are not equally sized. For example,
-xtile cat=income, n(100)-
(similarly with pctile)
and
-sumdist income, n(100) qgp(cat)-
Each produces the desired number of groups, but the groups are not equally
sized (as the frequencies from -tab cat- show afterwards).
The differences are not small - some groups are many times larger than
others. This is not because of weighting, as the same happens without
weights. It is also not related to the size of groups. I wonder whether it
might be because incomes cluster around certain values (e.g. 10 000,
15 000) and all observations at such a value are lumped into one category.
Can anyone suggest a way to partition the sample into equally sized groups?
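
One possible way to check the clustering hypothesis above and to force groups of equal size is sketched here. It assumes the shipped nlsw88 dataset and its wage variable as placeholders for the real data and income variable, ignores survey weights, and uses an arbitrary seed. Ties are broken at random, so observations with identical income can end up in adjacent groups.

```stata
* Sketch only: nlsw88 and wage stand in for the real data and income variable
sysuse nlsw88, clear
drop if missing(wage)

* Count how many observations share each value; large counts indicate heaping
bysort wage: gen ntied = _N
summarize ntied

* Break ties at random (reproducibly), then split the sorted data by position
set seed 12345
gen double u = runiform()
sort wage u
gen cat = ceil(100 * _n / _N)

* Group sizes now differ by at most one observation
tab cat
```

Because the groups are defined by position in the sorted data rather than by income value, they no longer correspond to percentile cutpoints wherever many observations share one value.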
This actually stems from an earlier thread (but no need to read that for the
above) about plotting a chart of income distribution with the occupational
composition of each percentile. Austin's suggestion (below) comes close to
that. However, even with his code the groups are not equally sized, but they
are sized the same as when I use the sumdist or xtile commands mentioned
above.
best,
Gisella
--- On Mon, 12/1/08, Austin Nichols <[email protected]> wrote:
> From: Austin Nichols <[email protected]>
> Subject: Re: st: how to make an area graph showing distribution?
> To: [email protected]
> Date: Monday, December 1, 2008, 2:02 AM
> Gisella Young <[email protected]>:
> It may be that you are looking for a simple stacked bar graph over
> income quintiles or deciles or the like, as opposed to a parametric
> smooth over income quantiles. If so, you might want to adapt one of
> this pair of example graphs to your needs:
>
> clear all
> sysuse nlsw88
> ren industry i
> tab i, g(ind)
> g w=round(uniform()*20)
> la var w "fake survey weight"
> _pctile wage [pw=w], nq(5)
> g q=1 if wage<=r(r1)
> forv i=2/5 {
> replace q=`i' if wage>r(r`=`i'-1') & wage<=r(r`i')
> }
> loc y
> forv i=1/12 {
> loc l "`=substr("`: var la ind`i''",4,.)'"
> loc y `"`y' lab(`i' "`l'")"'
> loc lv`i' `"la var ind`i' "`l'" "'
> }
> gr bar ind* [pw=w], stack over(q) name(b) leg(`y')
> collapse ind* [pw=w], by(q)
> forv i=2/12 {
> replace ind`i'=ind`i'+ind`=`i'-1'
> }
> loc v
> forv i=1/12 {
> `lv`i''
> loc v "ind`i' `v'"
> }
> tw bar `v' q, name(tw)
>
> Note that the commands above destroy the data in memory, so make sure
> you -preserve- or -save- first as appropriate. Also note that there
> is no guarantee that the distributions of income by occupation, or
> occupation by income category, display any sort of stochastic
> dominance that would allow easy ranking of occupations.
>
> See also
> http://www.stata.com/capabilities/graphexamples.html
>
>
> On Sun, Nov 30, 2008 at 10:37 AM, Maarten buis
> <[email protected]> wrote:
> > --- Gisella Young <[email protected]> wrote:
> >> On Maarten Buis's suggestion, I am not sure why I would really need
> >> a regression - I get from his email that this is basically for
> >> smoothing?
> >
> > Yes, as income in the example dataset (and I assume in your dataset as
> > well) is a continuous variable, there just aren't enough cases for each
> > income value to estimate the proportions.
> >
> >> Since I actually want to plot the actual data (but realise
> >> that this needs smoothing),
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/