Tim Victor asks:
> This should be easy but I've been working at it for three days now and
> can't find the solution. What I am trying to do is simply plot the
> profiles for 5 cluster solution. All I want to do is plot each cluster
> mean (and error bar) for each attribute in the same graph. Oddly, doing
> this in SAS is only a few lines after transposing the data:
>
> symbol i=std1mjt;
> proc gplot data=plotme;
> plot value * attribute = cluster / haxis=axis1 vaxis=axis2 frame;
> run;
>
> Any suggestions? Thanks.
As Nick Cox might say -- what is SAS?
To my knowledge, there is currently no single command or two
in Stata that will produce what you want. However, it can be
done. Let me outline the steps. These steps could be combined
into an ado program if you were doing this kind of thing a
lot.
Step 1 -- obtain the needed data (the means and std. deviations or
std. errors) in a layout that can be used in Step 2.
Step 2 -- use -serrbar- (or, for finer control, -graph- and
-gph-) to produce the graph.
I will illustrate with the auto data, plotting means and error
bars that are +/- 1.96 * std. deviation. If you want std. errors
instead, alter the code below accordingly. Step 0 is to obtain a
five-group cluster solution.
Step 0:
use auto, clear
keep head trunk turn disp
replace disp = disp/20
cluster completelink head trunk turn disp, name(mycl)
cluster gen my5 = group(5)
The variable my5 indicates the five groups. We can view the
data we will want to obtain (the means and std. dev. of the
four variables by the five groups) with:
bysort my5 : summarize head trunk turn disp
Step 1:
There are probably better ways, but here is one way that I
thought of to produce the desired dataset to be used in
graphing.
preserve
* one small dataset of by-cluster means and std. devs. per variable
foreach var in head trunk turn disp {
    statsby "summarize `var'" mean = (r(mean)) sd = (r(sd)) /*
    */ , by(my5) clear
    gen str2 name = substr("`var'",1,2)
    save mytmp`var' , replace
    restore, preserve
}
* stack the four small datasets
use mytmphead , clear
foreach var in trunk turn disp {
    append using mytmp`var'
}
sort name my5
* namecl runs 1-20: one value per variable-by-cluster combination
egen namecl = group(name my5) , label
list
save mynew , replace
-statsby- gives us what we want for a single variable. We
need the results for each of the variables in the cluster
analysis, so we loop over the variables and create little
datasets that we later -append- together.
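If std. errors are wanted instead of std. deviations, one way to alter the first loop of Step 1 is sketched below. It assumes that r(N) and r(sd) are both left behind by -summarize- (they are) and that -statsby- accepts the compound expression; the variable name se is my own choice, not something from the original code. A cluster with a single observation would get a missing std. error, since r(sd) is missing there.

```stata
* Sketch only: Step 1 loop storing std. errors (se) instead of sd
preserve
foreach var in head trunk turn disp {
    * se = sd / sqrt(N), computed within each cluster
    statsby "summarize `var'" mean = (r(mean)) /*
    */ se = (r(sd)/sqrt(r(N))) , by(my5) clear
    gen str2 name = substr("`var'",1,2)
    save mytmp`var' , replace
    restore, preserve
}
```

The rest of Step 1 is unchanged, and the -serrbar- calls in Step 2 would then take se in place of sd, for example -serrbar mean se namecl, scale(1.96)-, giving bars of roughly 95% confidence-interval width.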
Step 2:
I will present four alternatives. Note that Alternative 4 uses the
variable name2 created in Alternative 3, so run them in order.
Alternative 1
serrbar mean sd namecl , scale(1.96) xlab(1/20) ylab
Alternative 2
sort my5 name
serrbar mean sd namecl , scale(1.96) xlab(1/20) ylab c(LII)
Alternative 3
encode name, gen(name2)
sort my5 name2
serrbar mean sd name2, scale(1.96) xlab ylab c(LII)
Alternative 4
gen name3 = name2 + my5/10
serrbar mean sd name3, scale(1.96) xlab ylab c(LII)
Step 3:
restore
After producing the graph we -restore- back to the original
data.
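As a possible alternative to the -statsby- loop in Step 1, the same layout can probably be reached with -collapse- followed by -reshape long-. This is a sketch I have not run against the Stata release used above; the stub names mean_ and sd_ are my own choice, and it assumes the original data with my5 are in memory (as they are after the -restore-).

```stata
* Sketch only: build the Step 1 dataset with collapse + reshape
preserve
* one row per cluster; means and std. devs. in wide form
collapse (mean) mean_head=headroom mean_trunk=trunk /*
*/ mean_turn=turn mean_disp=displacement /*
*/ (sd) sd_head=headroom sd_trunk=trunk /*
*/ sd_turn=turn sd_disp=displacement , by(my5)
* go long: one row per cluster-by-variable combination
reshape long mean_ sd_ , i(my5) j(name) string
rename mean_ mean
rename sd_ sd
sort name my5
egen namecl = group(name my5) , label
serrbar mean sd namecl , scale(1.96) xlab(1/20) ylab
restore
```

Here name holds the full suffixes (disp, head, trunk, turn) rather than two-letter codes, but the sort order, and so the layout of namecl, is the same as before.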
I prefer Alternative 4, but more labeling etc. would be nice. To
get better control of this, you might need to use -graph- and -gph-.
Ken Higbee
StataCorp 1-800-STATAPC