Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Non-overlapping scatter mlabels

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Non-overlapping scatter mlabels
Date	Tue, 27 Aug 2013 08:05:17 +0100

Ulrich Kohler wrote an -egen- function which is in -egenmore- (SSC).

  mlabvpos(yvar xvar) [ , log polynomial(#) matrix(5x5 matrix) ] automatically
        generates a variable giving clock positions of marker labels given names
        of variables yvar and xvar defining the axes of a scatter plot. Thus the
        command generates a variable to be used in the scatter option
        mlabvpos().

        The general idea is to pull marker labels away from the data region. So,
        marker labels in the lower left of the region are at clock positions 7
        or 8, and those in the upper right are at clock positions 1 or 2, etc.
        More precisely, considering the following rectangle as the data region,
        marker labels are placed as follows:

        +--------------+
        |11 12 12 12  1|
        |10 11 12  1  2|
        | 9  9 12  3  3|
        | 8  7  6  5  4|
        | 7  6  6  6  5|
        +--------------+

        Note that there is no attempt to prevent marker labels from
        overplotting, which is likely in any dataset with many observations.
        In such situations you might be better off simply randomizing clock
        positions with, say, ceil(uniform() * 12).
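
The randomizing fallback mentioned above could look roughly like this. It is a minimal sketch, not part of the help text: the variable name rclock and the seed are arbitrary choices, and runiform() is used as the current name for the older uniform() function.

```stata
* Sketch only: random clock positions as a fallback for crowded plots.
sysuse auto, clear
set seed 12345                                   // arbitrary seed, for reproducibility
generate byte rclock = ceil(runiform() * 12)     // integers 1 to 12
scatter mpg weight, mlabel(make) mlabvposition(rclock)
```

Random positions do not prevent overlaps either; they only break up the systematic collisions that arise when neighbouring points share a clock position.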

        If yvar and xvar are highly correlated, then the clock positions are
        generated as follows (which is, however, the same general idea):

        +--------------+
        |      12  1  3|
        |   12 12  3  4|
        |11 11 12  5  5|
        |10  9  6  6   |
        | 9  7  6      |
        +--------------+

        To calculate the positions, the x axis is first categorized into 5 equal
        intervals around the mean of xvar. Afterwards the residuals from
        regression of yvar on xvar are categorized into 5 equal intervals. Both
        categorized variables are then used to calculate the positions according
        to the first table above.  The rule can be changed with the option
        matrix().
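
The rule just described can be approximated by hand, which may help when adapting it. The following is a rough sketch under stated assumptions, not the code of the egen function itself: the bins here are five equal-width intervals over the observed range (the function may centre them differently), and the names res, c_weight, c_res, P and myclock are invented for illustration.

```stata
sysuse auto, clear

* Residuals from a linear regression of yvar on xvar
regress mpg weight
predict double res, residuals

* Five equal-width bins for xvar and for the residuals
* (assumption: bins span the observed range of each variable)
foreach v in weight res {
    summarize `v', meanonly
    generate byte c_`v' = min(5, 1 + floor(5 * (`v' - r(min)) / (r(max) - r(min)))) ///
        if !missing(`v')
}

* Default rule from the first table: row 1 = top of the data region,
* column 1 = left of the data region
matrix P = (11, 12, 12, 12, 1 \ 10, 11, 12, 1, 2 \ 9, 9, 12, 3, 3 \ ///
            8, 7, 6, 5, 4 \ 7, 6, 6, 6, 5)

* Look up the clock position for each (residual bin, x bin) pair
generate byte myclock = .
forvalues i = 1/5 {
    forvalues j = 1/5 {
        replace myclock = P[`i', `j'] if c_res == 6 - `i' & c_weight == `j'
    }
}

scatter mpg weight, mlabel(make) mlabvposition(myclock)
```

Binning the residuals rather than yvar itself is what produces the second table when the two variables are highly correlated: the corner cells far from the regression line are simply empty.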

        log indicates that residuals from regression are to be calculated using
        the logarithms of xvar. This might be useful if the scatter shows a
        strong curvilinear relationship.

        polynomial(#) indicates that residuals are to be calculated from a
        regression of yvar on a polynomial of xvar. For example, use poly(2) if
        the scatter shows a U-shaped relationship.
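
As a hedged illustration of the two options just described, following the syntax given at the top of this excerpt (egenmore must be installed from SSC first; the variable names clock_log and clock_poly are arbitrary):

```stata
* Requires egenmore from SSC:  ssc install egenmore
sysuse auto, clear
egen clock_log  = mlabvpos(mpg weight), log            // residuals use log of xvar
egen clock_poly = mlabvpos(mpg weight), polynomial(2)  // residuals from a quadratic fit
scatter mpg weight, mlabel(make) mlabvposition(clock_poly)
```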

        matrix(#) is used to change the general rule for the plot positions.
        The positions are specified by a 5 x 5 matrix, in which cell [1,1]
        gives the clock position of marker labels in the upper left part of
        the data region, and so forth.  (Stata 8.2 required.)

    . egen clock = mlabvpos(mpg weight)
    . scatter mpg weight, mlab(make) mlabvpos(clock)
    . egen clock2 = mlabvpos(mpg weight), matrix(11 1 12 11 1 \ 10 2 12 10 2 \
        9 3 12 9 3 \ 8 4 6 8 4 \ 7 5 6 7 5)
    . sc mpg weight, mlab(make) mlabvpos(clock2)

Nick
[email protected]


On 27 August 2013 07:44,  <[email protected]> wrote:
> Dear Statalisters,
>
> I do a lot of batch processing of graphs and want to add some scatterplots to my output.
> Non-overlapping marker labeling has always been an issue in scatterplots and I'm thinking about ways to reduce the risk of overlapping.
>
> In many cases manual specification of the "mlabposition" option within -scatter- might be sufficient to avoid overlaps.
> For batch creation of graphs there has to be an algorithm for this. I will start working on this.
>
> Even more advanced ways may also consider option "mlabgap" or even different x and y scatter values for label display.
>
>
> Here's a difficult example:
>
> sysuse auto, clear
> keep if rep78 == 3
> scatter mpg trunk || scatter mpg trunk , msymbol(i) mlabel(make) mlabsize(*.7)
>
>
> Has anybody ever worked in that direction? I didn't find anything.
>
> Best wishes
> Stefan Gawrich
> Dillenburg
> Germany
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
