--- Gisella Young <[email protected]> wrote:
> On Maarten Buis's suggestion, I am not sure why I would really need
> a regression - I get from his email that this is basically for
> smoothing?
Yes. Income in the example dataset (and I assume in your dataset as
well) is a continuous variable, so there are too few cases at each
distinct income value to estimate the proportions directly.
> Since I actually want to plot the actual data (but realise
> that this needs smoothing),
You have to choose one or the other, and if you choose to smooth, then
my use of -mlogit- is probably the easiest method that ensures that
the smoothed proportions add up to one.
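That the predicted proportions from -mlogit- add up to one can be
checked directly. The following is only a minimal sketch on a smaller
model than the one in the example further down (race on wage); the
variable names p1-p3 and total are made up for this illustration.

```stata
// fit a small multinomial logit and predict all outcome probabilities
sysuse nlsw88, clear
mlogit race wage
predict p*          // race has three categories, so this creates p1 p2 p3

// within every observation the predicted probabilities should sum to one
gen double total = p1 + p2 + p3
assert abs(total - 1) < 1e-5 if !missing(total)
summarize total
```

If the -assert- passes without an error message, the sum is one (up to
rounding) for every observation.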
> what I would prefer to do would be to have income plotted in for
> example percentiles (on the x-axis) showing for each percentile the
> composition of that income percentile in terms of occupation on the
> y-axis.
If I understand you correctly, all that is different from my example is
that you want to do that on a transformed metric of income. The way to
do the percentile rank transformation is discussed here:
http://www.stata.com/support/faqs/stat/pcrank.html
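To see what this transformation does, here is a small sketch on four
made-up incomes (the variable name income and its values are
assumptions for illustration only). With n = 4 the percentile ranks
(i - 0.5)/n*100 fall in the middle of each quarter of the 0-100 scale.

```stata
// toy data: four distinct, made-up incomes
clear
input income
10
20
30
40
end

egen n = count(income)            // number of non-missing incomes (4)
egen i = rank(income)             // ranks 1, 2, 3, 4
gen hazen = (i - 0.5) / n * 100   // percentile rank
list income i hazen

// the percentile ranks are 12.5, 37.5, 62.5 and 87.5
assert hazen[1] == 12.5 & hazen[2] == 37.5
assert hazen[3] == 62.5 & hazen[4] == 87.5
```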
This has been implemented in the example below (and, just because I felt
like it, I replaced the legend with a second y-axis).
*--------------- begin example -----------------------
// prepare the example data
sysuse nlsw88, clear
gen ind_gr = industry
recode ind_gr 1/5=1 6=2 7=3 8/10=4 11=5 12=6
label define ind_gr 1 "manual"                ///
                    2 "trade"                 ///
                    3 "finance"               ///
                    4 "other services"        ///
                    5 "professional services" ///
                    6 "public administration"
label value ind_gr ind_gr
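// compute percentile ranks as (rank - 0.5)/n*100
egen n = count(wage)  // number of non-missing wages
egen i = rank(wage)   // rank of each wage; ties get the average rank
gen hazen = (i - 0.5) / n * 100
label variable hazen "percentile rank of income"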
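// smooth the proportions: restricted cubic spline in hazen
// (5 knots gives s_w1-s_w4), then predicted probabilities pr1-pr6
mkspline s_w=hazen, cubic nknots(5)
mlogit ind_gr s_w*
predict pr*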
// create the graph: l1-l5 are the cumulative percentages
// that bound the stacked areas
gen zero = 0
gen one = 100
gen l1 = (pr1)*100
gen l2 = (pr1 + pr2)*100
gen l3 = (pr1 + pr2 + pr3)*100
gen l4 = (pr1 + pr2 + pr3 + pr4)*100
gen l5 = (pr1 + pr2 + pr3 + pr4 + pr5)*100
sort hazen
// collect the labels for the second y-axis
local mid = l1[_N]/2
local yaxis `"`mid' "manual""'
local mid = (l2[_N]-l1[_N])/2 + l1[_N]
local yaxis `"`yaxis' `mid' "trade""'
local mid = (l3[_N]-l2[_N])/2 + l2[_N]
local yaxis `"`yaxis' `mid' "finance""'
local mid = (l4[_N]-l3[_N])/2 + l3[_N]
local yaxis `"`yaxis' `mid' "other services""'
local mid = (l5[_N]-l4[_N])/2 + l4[_N]
local yaxis `"`yaxis' `mid' "professional services""'
local mid = (100-l5[_N])/2 + l5[_N]
local yaxis `"`yaxis' `mid' "public administration""'
twoway rarea zero l1 hazen, yaxis(1) || ///
       rarea l1 l2 hazen, yaxis(2)   || ///
       rarea l2 l3 hazen             || ///
       rarea l3 l4 hazen             || ///
       rarea l4 l5 hazen             || ///
       rarea l5 one hazen,              ///
       ytitle("percentage")             ///
       ylab(`yaxis', axis(2))           ///
       yscale(range(0 100) axis(1))     ///
       yscale(range(0 100) axis(2))     ///
       ytitle("", axis(2))              ///
       plotregion(margin(zero))         ///
       aspect(1)                        ///
       legend(off)
*---------------------- end example -----------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
Hope this helps,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------