Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: extract values from kdensity graphic
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: extract values from kdensity graphic
Date: Fri, 4 May 2012 07:45:39 +0100
Either

. quantile size

or

. qplot size

gives an excellent simple plot for spotting gaps in datasets of small
or moderate size. (-qplot- is a user-written command published in the
Stata Journal.)
Nick
On Thu, May 3, 2012 at 5:42 PM, Nick Cox <[email protected]> wrote:
> -x- is by construction equally spaced and in any case not the original data.
>
> I suggest that a fairer graph is
>
> graph twoway (connected d size if group == 1) ///
> (connected d size if group == 2) ///
> (connected d size if group == 3) ///
> (connected d size if group == 4) ///
> (connected d size if group == 5)
>
> which shows that your method based on gaps agrees well with the kernel
> density default -- in this example.
>
> Nick
>
> On Thu, May 3, 2012 at 5:24 PM, Seed, Paul <[email protected]> wrote:
>> Dear Statalist,
>>
>> As Nick points out, this is becoming quite a complex problem.
>> I actually would not use -kdensity-, as it does
>> not capture the essential features of Mike's original data set.
>>
>> A simpler approach is to look at the differences between successive values,
>> and declare a new group whenever the gap is large (for a suitable value
>> of "large"). This can be quite easily done in version 8.
>>
>>
>> ***** Begin example **********
>>
>> * Enter Mike's data set
>> set more off
>> clear
>> input sampling_event size
>> 1 94.74
>> 2 94.89
>> 3 94.95
>> 4 94.97
>> 5 95
>> 6 95.05
>> 7 95.08
>> 8 96.11
>> 9 96.22
>> 10 96.24
>> 11 96.27
>> 12 96.27
>> 13 96.27
>> 14 96.32
>> 15 96.34
>> 16 97.19
>> 17 97.26
>> 18 97.26
>> 19 97.32
>> 20 97.34
>> 21 97.39
>> 22 98.41
>> 23 100.62
>> 24 100.69
>> 25 100.69
>> 26 100.76
>> 27 100.76
>> 28 100.76
>> 29 100.84
>> 30 100.91
>> end
>> list
>> twoway (scatter size sampling_event)
>>
>> * Identify groups
>> sort size
>> gen step = size - size[_n-1]
>>
>> * Use -stem- to quickly assess the step sizes
>> stem step
>> * In the example, steps are all at most about 0.15 or at least 0.85
>> * I declare a new group for any step > 0.5
>> * I could change this threshold depending on the data set
>>
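>> * step is missing in the first observation; Stata treats missing as larger
>> * than any number, so step > 0.5 is true there and group numbering starts at 1
>> gen group = step > 0.5
>> replace group = sum(group)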
>>
>> * Check groups are well defined
>> bys group : su size
>>
>> * Graph the various groups in different colours
>> graph twoway (connected size sampling_event if group == 1) ///
>> (connected size sampling_event if group == 2) ///
>> (connected size sampling_event if group == 3) ///
>> (connected size sampling_event if group == 4) ///
>> (connected size sampling_event if group == 5)
>> * That looks good
>>
>> * Now try out -kdensity-; pick up the plotted values in x and d
>> kdensity size , w(0.1) n(30) gen(x d)
>>
>> graph twoway (connected d x if group == 1) ///
>> (connected d x if group == 2) ///
>> (connected d x if group == 3) ///
>> (connected d x if group == 4) ///
>> (connected d x if group == 5)
>> * kdensity just does not seem to capture the groups I see in the simple scatter plot.
>>
>>
>> ********** End example **************
>>
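
Nick's point that -x- is an equally spaced grid, not the original data, suggests asking -kdensity- for the density at the observed sizes instead. What follows is a minimal sketch, not part of the original exchange. It assumes the example data are still in memory. The variable name d_size is hypothetical, and bwidth() is the newer spelling of the w() option used above, so older Stata versions may need width() instead.

```stata
* Evaluate the kernel density at the observed values of size
* (d_size is an illustrative name, not from the original post)
kdensity size, bwidth(0.1) at(size) generate(d_size) nograph

* Plot the density against the original data rather than the grid
twoway connected d_size size, sort
```

Adding -if group == 1- and so on to separate -connected- plots, as in the graphs above, would then colour each point by the group its own size value belongs to.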