Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Non-overlapping scatter mlabels
Date: Tue, 27 Aug 2013 08:05:17 +0100

Ulrich Kohler wrote an -egen- function which is in -egenmore- (SSC).

mlabvpos(yvar xvar) [ , log polynomial(#) matrix(5x5 matrix) ]
automatically generates a variable giving clock positions of marker
labels, given the names of the variables yvar and xvar that define the
axes of a scatter plot. Thus the command generates a variable to be
used in the scatter option mlabvpos().

The general idea is to pull marker labels away from the data region.
So, marker labels in the lower left of the region are at clock
positions 7 or 8, those in the upper right are at clock positions 1
or 2, etc. More precisely, considering the following rectangle as the
data region, marker labels are placed as follows:

    +--------------+
    |11 12 12 12  1|
    |10 11 12  1  2|
    | 9  9 12  3  3|
    | 8  7  6  5  4|
    | 7  6  6  6  5|
    +--------------+

Note that there is no attempt to prevent marker labels from
overplotting, which is likely in any dataset with many observations.
In such situations you might be better off simply randomizing clock
positions with, say, ceil(uniform() * 12).

If yvar and xvar are highly correlated, then the clock positions are
generated as follows (which is, however, the same general idea):

    +--------------+
    |      12  1  3|
    |   12 12  3  4|
    |11 11 12  5  5|
    |10  9  6  6   |
    | 9  7  6      |
    +--------------+

To calculate the positions, the x axis is first categorized into 5
equal intervals around the mean of xvar. Afterwards the residuals from
the regression of yvar on xvar are categorized into 5 equal intervals.
Both categorized variables are then used to calculate the positions
according to the first table above. The rule can be changed with the
option matrix().

log indicates that residuals from regression are to be calculated
using the logarithms of xvar. This might be useful if the scatter
shows a strong curvilinear relationship.

polynomial(#) indicates that residuals are to be calculated from a
regression of yvar on a polynomial of xvar. For example, use poly(2)
if the scatter shows a U-shaped relationship.

matrix(#) is used to change the general rule for the plot positions.
The positions are specified by a 5 x 5 matrix, in which cell [1,1]
gives the clock position of marker labels in the upper left part of
the data region, and so forth.

(Stata 8.2 required.)

    . egen clock = mlabvpos(mpg weight)
    . scatter mpg weight, mlab(make) mlabvpos(clock)

    . egen clock2 = mlabvpos(mpg weight), matrix(11 1 12 11 1 \ 10 2 12 10 2 \ 9 3 12 9 3 \ 8 4 6 8 4 \ 7 5 6 7 5)
    . sc mpg weight, mlab(make) mlabvpos(clock2)

Nick
njcoxstata@gmail.com

On 27 August 2013 07:44, <Stefan.Gawrich@hlpug.hessen.de> wrote:
> Dear Statalisters,
>
> I do a lot of batch processing of graphs and want to add some scatterplots to my output.
> Non-overlapping marker labeling has always been an issue in scatterplots and I'm thinking about ways to reduce the risk of overlapping.
>
> In many cases manual specification of the "mlabposition" option within -scatter- might be sufficient to avoid overlaps.
> For batch creation of graphs there has to be an algorithm for this. I will start working on this.
>
> Even more advanced ways may also consider option "mlabgap" or even different x and y scatter values for label display.
>
>
> Here's a difficult example:
>
> sysuse auto, clear
> keep if rep78 == 3
> scatter mpg trunk || scatter mpg trunk, msymbol(i) mlabel(make) mlabsize(*.7)
>
>
> Has anybody ever worked in that direction? I didn't find anything.
>
> Best wishes
> Stefan Gawrich
> Dillenburg
> Germany
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
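
For readers who want to see the default placement rule from the reply above spelled out, here is a rough sketch in Stata. It is not the -egenmore- source code: it assumes plain equal-width bins over the observed range of xvar and of the residuals, which may differ from how mlabvpos() actually forms its "5 equal intervals around the mean". The names P, xcat, rcat, res and myclock are made up for illustration.

```stata
* Sketch only: approximates the default rule, not the -egenmore- code
sysuse auto, clear

* Default 5 x 5 table of clock positions; row 1 = top of the data region
matrix P = (11, 12, 12, 12, 1 \ 10, 11, 12, 1, 2 \ 9, 9, 12, 3, 3 \ 8, 7, 6, 5, 4 \ 7, 6, 6, 6, 5)

* Column index: xvar cut into 5 equal-width intervals (1 = leftmost)
* (assumption: bins span the observed range of xvar)
summarize weight, meanonly
generate byte xcat = min(5, 1 + floor(5 * (weight - r(min)) / (r(max) - r(min))))

* Row index: residuals of yvar on xvar cut into 5 equal-width
* intervals, numbered so that the largest residuals fall in row 1 (top)
regress mpg weight
predict double res, residuals
summarize res, meanonly
generate byte rcat = 5 - min(4, floor(5 * (res - r(min)) / (r(max) - r(min))))

* Look up the clock position for each observation in the table
generate byte myclock = .
forvalues i = 1/5 {
    forvalues j = 1/5 {
        replace myclock = P[`i', `j'] if rcat == `i' & xcat == `j'
    }
}

scatter mpg weight, mlabel(make) mlabvposition(myclock)
```

Swapping in a different 5 x 5 matrix for P mimics what the matrix() option does. As the reply notes, nothing here checks whether labels collide, so crowded plots will still overlap.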