st: RE: RE: AW: RE: AW: RE: Correct labeling in egenmore axis()?

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: AW: RE: AW: RE: Correct labeling in egenmore axis()?
Date	Thu, 13 May 2010 13:59:56 +0100

Marc Kaulisch started a thread on using -eclplot- (Roger Newson, SSC etc.) and -ciplot- (me, SSC) on 7th May. Marc's main interest is plotting confidence intervals for means. That thread mutated into this one.  

The threads mostly focused on using either the bpwide.dta or the bplong.dta installed with Stata for examples. (Being able to use the same dataset is an enormous aid to discussion.) 

As below, Marc and I had some discussion off list. 

Here is my summary of the points I want to emphasise from these threads. Naturally, that leaves every possibility for Marc to have his own differing point of view, not least on what works best with his own data. I make no comment on -eclplot-, which remains Roger's offspring. 

1. I am happy if anyone finds my -ciplot- useful, but I'd much rather people look at my -stripplot- (also SSC) first. 

2. For the blood pressure data, broken down by before vs after, sex and age group, I offer this example as a useful graph: 

sysuse bplong, clear 
egen group = group(age sex), label 
stripplot bp*, bar over(when) by(group, compact col(1) note("")) /// 
subtitle(, pos(9) ring(1) nobexpand bcolor(none) placement(e)) ytitle("")

(Perhaps "Before" should come above "After" too.)

3. My -egen- function -axis()- from -egenmore- does what it was designed to do, including omitting categories from labels to avoid repetition. 

Nick 
[email protected] 

Nick Cox

If you look back at my first posting in this thread, you will see that my mention of -egen, axis()- was clearly (a) embedded in a discussion labelled as wider-ranging than your question, (b) labelled as "some technique", not necessarily an answer to your question and (c) in response to your mention of sorting on means in your first post.  

But I don't know why you still talk about sorting on means, as all your recent code examples clearly show attempts to sort subsets by categorical variable. 

Either way, the original example of -egen, axis()- evidently does not do what you want, so you shouldn't use or copy it literally. 

To the present: "does not work as I expect it should work" is not something I can say much in response to. I am in meetings almost all of today, but I will send you my version of _gaxis.ado (1.0.2) to see if we can make progress off-list. 

Nick 
[email protected] 

Kaulisch, Marc

Nick,

By including mean1 I was following the suggestion in your first answer (see http://stata.com/statalist/archive/2010-05/msg00338.html) and the help file, in order to sort categories by their means. In this combination (with two grouping categories), the labelling with axis() does not work as I would expect. The reverse option does not change the picture, at least here.

Marc


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Tuesday, 11 May 2010 19:43
To: [email protected]
Subject: st: RE: AW: RE: Correct labeling in egenmore axis()?

Marc's replies to 1 and 3 show that what I think abstractly is a good or bad idea doesn't map onto what he wants concretely. Fair enough. 

In terms of 2, I have looked into the inside of -axis()- and find at a crucial point -- as called by Marc -- that function's view of the world looks like this: 

     +----------------------------------------------------+
     | `touse'    mean1   agegrp      sex            axis |
     |----------------------------------------------------|
  1. |        0        .    30-45     Male              . |
  2. |        0        .    30-45   Female              . |
  3. |        0        .    46-59     Male              . |
  4. |        0        .    46-59   Female              . |
  5. |        0        .      60+     Male              . |
     |----------------------------------------------------|
  6. |        0        .      60+   Female              . |
  7. |        1    149.9    30-45   Female   30-45 Female |
  8. |        1   151.15    46-59   Female          46-59 |
  9. |        1   153.45    30-45     Male     30-45 Male |
 10. |        1   159.05    46-59     Male          46-59 |
     |----------------------------------------------------|
 11. |        1   159.85      60+   Female     60+ Female |
 12. |        1    165.3      60+     Male           Male |
     +----------------------------------------------------+

-axis()-'s designed behaviour is not to mention any category with the same value as in the previous group. It does exactly that for Marc's data. But because he is including -mean1- in the arguments, the results are not what he wants. 
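
The suppression rule can be imitated with built-in commands, which may make the listing above easier to read. This is only a sketch of the rule as described here, not the code inside _gaxis.ado. It uses -collapse- in place of -statsby-, and the variable names agetext, sextext and shown are made up for illustration.

```stata
* Sketch only: imitates the rule "omit a category equal to that in the previous row".
sysuse bpwide, clear
collapse (mean) mean1 = bp_before, by(agegrp sex)
sort mean1                                  // mean1 is the leading sort key
decode agegrp, generate(agetext)            // value labels as strings
decode sex, generate(sextext)
* blank each category that repeats the value in the previous row
replace agetext = "" if _n > 1 & agegrp == agegrp[_n-1]
replace sextext = "" if _n > 1 & sex == sex[_n-1]
generate shown = trim(agetext + " " + sextext)
list mean1 agegrp sex shown, noobs
```

With mean1 as the leading sort key, neighbouring rows often share just one of the two categories, so partial labels such as "46-59" or "Male" are what this rule should be expected to produce.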

What he wants should, I think, be obtained with a different call to -egen, axis()-. 

sysuse bpwide, clear
tempfile tf1 tf2
statsby mean1=r(mean) ub1=r(ub) lb1=r(lb) N1=r(N), by(agegrp sex) ///
    saving(`tf1'): ci bp_before
statsby mean2=r(mean) ub2=r(ub) lb2=r(lb) N2=r(N), by(agegrp sex) ///
    saving(`tf2'): ci bp_after
dsconcat `tf1' `tf2'
egen axis = axis(agegrp sex), reverse
twoway scatter axis mean1 || rcap ub1 lb1 axis, hori ///
    || scatter axis mean2 || rcap ub2 lb2 axis, hori ///
    ylabel(1(1)6, labs(vsmall) nogrid val angle(hori)) ///
    ytitle("") ///
    legend(label(1 "Mean bp_before") label(2 "CI bp_before") ///
    label(3 "Mean bp_after") label(4 "CI bp_after") size(vsmall) rows(2) span)

Nick
[email protected] 

Kaulisch, Marc

Ad 1: The missings on mean1 are on purpose because I want to display/plot mean1 and mean2 in one row per category.
So the simplified code is:
---
sysuse bpwide, clear
statsby mean1=r(mean) ub1=r(ub) lb1=r(lb) N1=r(N), by(agegrp sex): ci bp_before
sort agegrp sex mean1
egen axis = axis(mean1 agegrp sex), label(agegrp sex)
egen group = group(agegrp sex), label
---

Even here, the labelling is not doing what it is supposed to do (see Nick's point 2).

Ad 3: I realised that your solution uses a long dataset. But I am not sure if it is suitable for me because (see ad 1) I would like to compare confidence intervals for blood pressure before and after in one row per category.
(I already reshape my data to long format in order to create a categorical variable.)


Nick Cox

I see three issues here:

1. What you are feeding to -egen, axis()- includes missing values on -mean1-. -list- what you are feeding it to see that. 

The -axis()- function can't know what those missing values should be. It ignores them, therefore. Note that its -missing- option won't help here, as the missings would still be classified differently from the non-missings. 

So, you need to fix the data before you call -egen, axis()-. 

2. Independently of that, I think you've unearthed a bug in -axis()-, but I don't yet know what it is. 

3. As with previous examples, I think you are making the problem more difficult than it need be. The bplong dataset has a more congenial structure than the bpwide dataset and wouldn't pose this problem for you, as one of my previous examples showed. Although that is presumably not your real data, there is probably an implication for your real data too, i.e. things may be easier after a -reshape-. 
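
One way of fixing the data for point 1, assuming the aim is one row per category holding both sets of results, is to -merge- the two -statsby- result files rather than stacking them with -dsconcat-. This is a sketch, not something proposed in the thread. -ci- is called as elsewhere in this thread (Stata 14 and later document it as -ci means-).

```stata
* Sketch only: merge instead of dsconcat, so no row is missing on mean1.
sysuse bpwide, clear
tempfile tf1 tf2
statsby mean1=r(mean) ub1=r(ub) lb1=r(lb) N1=r(N), by(agegrp sex) ///
    saving(`tf1'): ci bp_before
statsby mean2=r(mean) ub2=r(ub) lb2=r(lb) N2=r(N), by(agegrp sex) ///
    saving(`tf2'): ci bp_after
use `tf1', clear
merge 1:1 agegrp sex using `tf2', nogenerate
list agegrp sex mean1 mean2, noobs sepby(agegrp)   // check what -egen- will see
egen axis = axis(agegrp sex), reverse              // requires egenmore from SSC
```

After the merge, each combination of agegrp and sex should occupy a single row with both mean1 and mean2 present, so there is nothing for -axis()- to ignore.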

Kaulisch, Marc

Follow up on my earlier graphing issue.

It looks as if the label() option of axis() in egenmore (SSC) is not doing what it is supposed to do. Or am I overlooking something again?

-----
sysuse bpwide, clear
tempfile tf1 tf2
statsby mean1=r(mean) ub1=r(ub) lb1=r(lb) N1=r(N), by(agegrp sex) ///
    saving(`tf1'): ci bp_before
statsby mean2=r(mean) ub2=r(ub) lb2=r(lb) N2=r(N), by(agegrp sex) ///
    saving(`tf2'): ci bp_after
dsconcat `tf1' `tf2'
sort agegrp sex mean1
egen axis = axis(mean1 agegrp sex), label(agegrp sex)
replace axis = axis[_n-1] if axis == .
egen group = group(agegrp sex), label
----

Here some cases in axis are labelled correctly and others incorrectly, whereas group() labels all of them correctly.

Correct labels look like "30-45 Male".
Incorrect labels look like "46-59" or "Male".
