Thank you for your replies. Sorry to come back to this, but I am still stuck and wonder whether I could ask for further advice. Regarding Maarten Buis's suggestion, I am not sure why I would really need a regression; I gather from his email that it is basically for smoothing? Since I want to plot the actual data (though I realise they need some smoothing), I would prefer to plot income in, for example, percentiles on the x-axis, and to show on the y-axis the occupational composition of each income percentile.
I guess one way would be a histogram or bar chart, but what I really want is a continuous area plot with income percentiles on the x-axis and the percentage in each occupation (within each income percentile) on the y-axis.
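To make the request concrete, here is a minimal, unweighted sketch of that kind of graph on the nlsw88 example data from Maarten's reply. It assumes wage and his ind_gr industry grouping as stand-ins for income and occup; the names grp, d1-d6 and c0-c6 are illustrative only, and 20 equal-count groups are used instead of 100 because this example dataset is small. It has not been tried on the survey data.

```stata
*--------------- begin example -----------------------
* Sketch: stacked shares of ind_gr across equal-count wage groups.
* Run as a do-file (it uses a local macro and /// continuations).
sysuse nlsw88, clear
gen ind_gr = industry
recode ind_gr (1/5=1) (6=2) (7=3) (8/10=4) (11=5) (12=6)
drop if missing(ind_gr)

* Equal-count wage groups; nq(100) would give percentiles, but with
* about 2,200 observations 20 groups keeps each group reasonably large
local ng 20
xtile grp = wage, nq(`ng')

* One indicator per category; its mean within a group is the share
forvalues k = 1/6 {
    gen byte d`k' = (ind_gr == `k')
}
collapse (mean) d1-d6, by(grp)

* Cumulative shares are the lower and upper edges of each band
gen c0 = 0
forvalues k = 1/6 {
    local j = `k' - 1
    gen c`k' = c`j' + d`k'
}

sort grp
twoway rarea c0 c1 grp || rarea c1 c2 grp || rarea c2 c3 grp ||   ///
       rarea c3 c4 grp || rarea c4 c5 grp || rarea c5 c6 grp,     ///
       legend(order(6 "public administration" 5 "professional services" ///
                    4 "other services" 3 "finance" 2 "trade" 1 "manual")) ///
       xtitle("wage group (1 = lowest)") ytitle("share of group")
*---------------------- end example -----------------
```

With raw shares the bands will jump around from group to group; that noise is what the smoothing in Maarten's -mlogit- example is meant to remove.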
I'm not sure whether a dot graph, as Martin Weiss suggests, will really do this; I've looked into it, but it seems quite different, unless I am misunderstanding something.
I am also struggling with the very first step: creating a variable with the proportion in each occupation for each income group (e.g. percentile). I have tried commands such as sumdist (from SSC), pctile and xtile, but they do not divide the population into equally sized percentile groups. For instance, I have tried -sumdist income if date==2007 [fw=weight], n(100) qgp(test)-, but the groups are not of the same size.
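One likely reason for unequal groups, at least with -xtile-, is ties: all observations with the same income go into the same group, so a spike at a round income value can swallow more than one percentile. Also, with weights, equal groups mean equal sums of weights, not equal numbers of observations. A minimal sketch of one workaround, assuming it is acceptable to break income ties at random: sort on income, cut the cumulative weight share into 100 slices, then take weighted means of occupation indicators within each slice. nlsw88 stands in for the survey data; since it has no survey weight, wt is set to 1 here (with the real data one would use the survey weight and add the date==2007 restriction). The names wt, u, cumw, pct, d1-d6 and popw are hypothetical.

```stata
*--------------- begin example -----------------------
* Sketch: weighted percentile groups of (nearly) equal weighted size,
* with random tie-breaking, plus category shares per group.
sysuse nlsw88, clear
gen ind_gr = industry
recode ind_gr (1/5=1) (6=2) (7=3) (8/10=4) (11=5) (12=6)
drop if missing(ind_gr, wage)
gen double wt = 1    // placeholder: replace with the survey weight

* Random tie-breaker, so people with identical incomes can fall on
* either side of a cut point (the seed is arbitrary)
set seed 12345
gen double u = runiform()
sort wage u

* Group = position in the cumulative weight share, cut into 100 slices
gen double cumw = sum(wt)
gen int pct = max(1, ceil(100 * cumw / cumw[_N]))

* Weighted share of each category within each group, plus the total
* weight per group to check that the groups are (almost) equal
forvalues k = 1/6 {
    gen byte d`k' = (ind_gr == `k')
}
collapse (mean) d1-d6 (rawsum) popw = wt [aw = wt], by(pct)
list pct popw d1-d6 in 1/5
*---------------------- end example -----------------
```

Because one observation cannot be split across groups, the weighted group sizes (popw) can still differ by up to one observation's weight; with real weights it is worth listing popw to see whether that matters. The resulting d1-d6 variables, one row per percentile, could then be stacked with -twoway rarea- as in Maarten's example, with pct in place of wage.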
I am hoping there is a simple way to do this, partly because in Excel it can be done in a few minutes (although Excel of course cannot handle large survey data such as mine). Sorry to bother the list with these follow-up enquiries.
best,
Gisella
--- On Sun, 11/30/08, Maarten buis <[email protected]> wrote:
> From: Maarten buis <[email protected]>
> Subject: Re: st: how to make an area graph showing distribution?
> To: [email protected]
> Date: Sunday, November 30, 2008, 10:52 AM
> --- Gisella Young wrote:
> > I am trying to make a chart showing the distribution of income by
> > occupation. On the x-axis I would like the distribution of income
> > from 0 to the highest. Then on the y-axis I want to show the
> > proportion of people in different occupations. I have a variable
> > (occup) with 6 different occupational categories. In other words, I
> > want to show how the different occupations fit into the income
> > distribution, by showing how the occupational breakdown changes
> > moving up the income spectrum. I thought an area chart (summing to
> > 100) would be the best way to do this, although there might be
> > better ways, and I would be open to suggestions. I have tried the
> > twoway area command with different variations, but it doesn't seem
> > to be right (it just gives a crazy chart with lines all over) and
> > I'm not sure how to do it.
>
> You'll probably need to smooth the proportions, as you won't have
> enough cases within each occupational category at each wage in your
> data to estimate the proportions reliably. In the example below I
> have done so by estimating an -mlogit- model predicting occupational
> category with wage represented as a restricted cubic spline (see
> -help mkspline-). I treat the predicted probabilities as the
> smoothed proportions.
>
> For the graph I created the variables zero, one, and l1 through l5.
> The logic is that on the y-axis the first band should range from 0
> (zero) to the first proportion (l1), the second band should start at
> the first proportion and end at the first + the second proportion,
> etc. Two things are worth noting: 1) you need to sort first on wage
> (or use the sort option) to avoid creating modern art, and 2) I
> reversed the order in the legend (going from 6 to 1) so that the
> order in which they appear in the legend corresponds with the order
> in which they appear in the graph (1 at the bottom and 6 at the top).
>
> *--------------- begin example -----------------------
> // prepare the example data
> sysuse nlsw88, clear
> gen ind_gr = industry
> recode ind_gr 1/5=1 6=2 7=3 8/10=4 11=5 12=6
> label define ind_gr 1 "manual" ///
> 2 "trade" ///
> 3 "finance" ///
> 4 "other services" ///
> 5 "professional services" ///
> 6 "public administration"
> label value ind_gr ind_gr
>
> // smooth the proportions
> mkspline s_w=wage, cubic nknots(5)
> mlogit ind_gr s_w*
> predict pr*
>
> // create the graph
> gen zero = 0
> gen one = 1
> gen l1 = pr1
> gen l2 = pr1 + pr2
> gen l3 = pr1 + pr2 + pr3
> gen l4 = pr1 + pr2 + pr3 + pr4
> gen l5 = pr1 + pr2 + pr3 + pr4 + pr5
>
> sort wage
> twoway rarea zero l1 wage || ///
>        rarea l1 l2 wage   || ///
>        rarea l2 l3 wage   || ///
>        rarea l3 l4 wage   || ///
>        rarea l4 l5 wage   || ///
>        rarea l5 one wage,    ///
>        legend(order(6 "public administration" ///
>                     5 "professional services" ///
>                     4 "other services"        ///
>                     3 "finance"               ///
>                     2 "trade"                 ///
>                     1 "manual"))
> *---------------------- end example -----------------
> (For more on how to use examples I sent to the Statalist, see
> http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
>
> Hope this helps,
> Maarten
>
> -----------------------------------------
> Maarten L. Buis
> Department of Social Research Methodology
> Vrije Universiteit Amsterdam
> Boelelaan 1081
> 1081 HV Amsterdam
> The Netherlands
>
> visiting address:
> Buitenveldertselaan 3 (Metropolitan), room N515
>
> +31 20 5986715
>
> http://home.fsw.vu.nl/m.buis/
> -----------------------------------------
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/