Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: extract values from kdensity graphic
Date: Thu, 3 May 2012 17:42:17 +0100

-x- is by construction equally spaced and in any case not the original data.

I suggest that a fairer graph is

graph twoway (connected d size if group == 1) ///
             (connected d size if group == 2) ///
             (connected d size if group == 3) ///
             (connected d size if group == 4) ///
             (connected d size if group == 5)

which shows that your method based on gaps agrees well with the kernel
density default -- in this example.

Nick

On Thu, May 3, 2012 at 5:24 PM, Seed, Paul <paul.seed@kcl.ac.uk> wrote:
> Dear Statalist,
>
> As Nick points out, this is becoming quite a complex problem.
> I actually would not use -kdensity-, as it does
> not capture the essential features of Mike's original data set.
>
> A simpler approach is to look at the differences between successive values,
> and declare a new group whenever the gap is large (for a suitable value
> of "large"). This can be quite easily done in version 8.
>
>
> ***** Begin example **********
>
> * Enter Mike's data set
> set more off
> clear
> input sampling_event size
> 1 94.74
> 2 94.89
> 3 94.95
> 4 94.97
> 5 95
> 6 95.05
> 7 95.08
> 8 96.11
> 9 96.22
> 10 96.24
> 11 96.27
> 12 96.27
> 13 96.27
> 14 96.32
> 15 96.34
> 16 97.19
> 17 97.26
> 18 97.26
> 19 97.32
> 20 97.34
> 21 97.39
> 22 98.41
> 23 100.62
> 24 100.69
> 25 100.69
> 26 100.76
> 27 100.76
> 28 100.76
> 29 100.84
> 30 100.91
> end
> list
> twoway (scatter size sampling_event)
>
> * Identify groups
> sort size
> gen step = size - size[_n-1]
>
> * Use -stem- to quickly assess the step sizes
> stem step
> * In the example, steps are all <= 0.1 or >= 0.85
> * I declare a new group for any step > 0.5
> * I could change this depending on the data set
>
> gen group = step > 0.5
> replace group = sum(group)
>
> * Check groups are well defined
> bys group : su size
>
> * Graph the various groups in different colours
> graph twoway (connected size sampling_event if group == 1) ///
>              (connected size sampling_event if group == 2) ///
>              (connected size sampling_event if group == 3) ///
>              (connected size sampling_event if group == 4) ///
>              (connected size sampling_event if group == 5)
> * That looks good
>
> * Now try out -kdensity-; pick up the plotted values in x and d
> kdensity size , w(0.1) n(30) gen(x d)
>
> graph twoway (connected d x if group == 1) ///
>              (connected d x if group == 2) ///
>              (connected d x if group == 3) ///
>              (connected d x if group == 4) ///
>              (connected d x if group == 5)
> * kdensity just does not seem to capture the groups I see in the simple scatter plot.
>
>
> ********** End example **************
>
> Paul T Seed, Senior Lecturer in Medical Statistics,
>
> Division of Women's Health, King's College London
> Women's Health Academic Centre KHP
> 020 7188 3642,
> paul.seed@kcl.ac.uk,
> http://www.kcl.ac.uk/medicine/research/divisions/wh/about/people/seedp.aspx
>
> Please do not send unencrypted un-anonymised data to this address.
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/