Re: st: extract values from kdensity graphic


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: extract values from kdensity graphic
Date   Fri, 4 May 2012 07:45:39 +0100

. quantile size

or

. qplot size

is an excellent simple plot for looking for gaps in datasets of small
or moderate size. (-quantile- is official Stata; -qplot- is from the
Stata Journal.)

Nick
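
To see why a quantile plot shows gaps so clearly, it can help to build one by hand. The lines below are only a sketch: they assume -size- is in memory with no missing values, -pos- is a made-up variable name, and the plotting position (i - 0.5)/n is what I understand -quantile- to use, so check the manual entry before relying on it. Clusters appear as nearly flat runs and gaps as vertical jumps.

```stata
* Sketch only: assumes the variable size is already in memory
* and has no missing values. pos is a hypothetical variable name.
sort size

* Plotting position of each ordered value: (i - 0.5)/n
generate pos = (_n - 0.5) / _N

* Ordered values against plotting position: flat runs are clusters,
* jumps are gaps
twoway connected size pos, ytitle("size") xtitle("fraction of the data")
```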

On Thu, May 3, 2012 at 5:42 PM, Nick Cox <[email protected]> wrote:
> -x- is by construction equally spaced and in any case not the original data.
>
> I suggest that a fairer graph is
>
> graph twoway (connected d size if group == 1) ///
>         (connected d size if group == 2) ///
>         (connected d size if group == 3) ///
>         (connected d size if group == 4) ///
>         (connected d size if group == 5)
>
> which shows that your method based on gaps agrees well with the kernel
> density default -- in this example.
>
> Nick
>
> On Thu, May 3, 2012 at 5:24 PM, Seed, Paul <[email protected]> wrote:
>> Dear Statalist,
>>
>> As Nick points out, this is becoming quite a complex problem.
>> I actually would not use -kdensity-, as it does
>> not capture the essential features of Mike's original data set.
>>
>> A simpler approach is to look at the differences between successive values,
>> and declare a new group whenever the gap is large (for a suitable value
>> of "large").  This can be quite easily done in version 8.
>>
>>
>> ***** Begin example **********
>>
>> * Enter Mike's data set
>> set more off
>> clear
>> input sampling_event size
>> 1 94.74
>> 2 94.89
>> 3 94.95
>> 4 94.97
>> 5 95
>> 6 95.05
>> 7 95.08
>> 8 96.11
>> 9 96.22
>> 10 96.24
>> 11 96.27
>> 12 96.27
>> 13 96.27
>> 14 96.32
>> 15 96.34
>> 16 97.19
>> 17 97.26
>> 18 97.26
>> 19 97.32
>> 20 97.34
>> 21 97.39
>> 22 98.41
>> 23 100.62
>> 24 100.69
>> 25 100.69
>> 26 100.76
>> 27 100.76
>> 28 100.76
>> 29 100.84
>> 30 100.91
>> end
>> list
>> twoway (scatter size sampling_event)
>>
>> * Identify groups
>> sort size
>> gen step = size - size[_n-1]
>>
>> * Use -stem- to quickly assess the step sizes
>> stem step
>> * In the example, steps are all <= 0.15 or >= 0.85
>> * I declare a new group for any step > 0.5
>> * I could change this depending on the data set
>>
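>> * step is missing in the first observation; Stata treats missing as larger
>> * than any number, so that observation starts group 1
>> gen group = step > 0.5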
>> replace group = sum(group)
>>
>> * Check groups are well defined
>> bys group : su size
>>
>> * Graph the various groups in different colours
>> graph twoway (connected size sampling_event if group == 1) ///
>>        (connected size sampling_event if group == 2) ///
>>        (connected size sampling_event if group == 3) ///
>>        (connected size sampling_event if group == 4) ///
>>        (connected size sampling_event if group == 5)
>> * That looks good
>>
>> * Now try out -kdensity-; pick up the plotted values in x and d
>> kdensity size , w(0.1) n(30) gen(x d)
>>
>> graph twoway (connected d x if group == 1) ///
>>        (connected d x if group == 2) ///
>>        (connected d x if group == 3) ///
>>        (connected d x if group == 4) ///
>>        (connected d x if group == 5)
>> * kdensity just does not seem to capture the groups I see in the simple scatter plot.
>>
>>
>> ********** End example **************
>>
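
For anyone wanting to reproduce the "fairer graph" of density against -size- suggested earlier in the thread, one possible route is to ask -kdensity- for estimates at the observed values through its at() option. This is a sketch under assumptions, not necessarily what was actually run: it presumes -size- and -group- exist as created in the example, -d_at_size- is a made-up variable name, and bwidth() is the documented spelling of the bandwidth option in recent Stata (the example above uses the older w()).

```stata
* Sketch only: assumes size and group exist as in the example above.
* d_at_size is a hypothetical variable name.

* Estimate the density at the observed sizes rather than on an
* equally spaced grid; with at(), generate() takes one new variable
kdensity size, bwidth(0.1) at(size) generate(d_at_size) nograph

* Density against the original data, one connected line per group
sort size
graph twoway (connected d_at_size size if group == 1) ///
        (connected d_at_size size if group == 2) ///
        (connected d_at_size size if group == 3) ///
        (connected d_at_size size if group == 4) ///
        (connected d_at_size size if group == 5)
```

Because every density value now sits on an actual data point, the -if group == - conditions select matching pairs, which is not the case when the x values come from the equally spaced grid.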
