Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: extract values from kdensity graphic
From: [email protected]
To: [email protected]
Subject: Re: st: extract values from kdensity graphic
Date: Thu, 3 May 2012 02:27:43 +1000
Many thanks Nick,
-group1d- doesn't suit my application (versions of Stata aside) as I don't
want to have to specify the number of groups. I really like the kdensity
plot because it automatically determines the number of groups (which are
in the hundreds for my real data sets).
Unfortunately -round- often fails to group sizes appropriately in my full
data sets too, as the clusters don't always align with the rounding units.
The kdensity plot shows exactly what I want, but alas I can't extract its
data (the trough coordinates).
Any more thoughts from the list?
Mike.
Another way of looking at these data is to apply -group1d- (SSC). In fact
Mike cannot do that himself because it needs Stata 9, but he can use the
results. With a least-squares criterion explained in the help and
references given, -group1d- yields as the best 5 groups
Group  Size         First            Last      Mean     SD
    5     8    23  100.62      30  100.91    100.75   0.09
    4     1    22   98.41      22   98.41     98.41   0.00
    3     6    16   97.19      21   97.39     97.29   0.06
    2     8     8   96.11      15   96.34     96.25   0.07
    1     7     1   94.74       7   95.08     94.95   0.11
In fact, just about any method of cluster analysis should find the same
groups if they are genuine, e.g. -cluster kmeans-. Then use whatever
summary you prefer.
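
As a rough sketch of the -cluster kmeans- route mentioned above: the names km5 and km5grp are made up for illustration, and k(5) is assumed from the five groups in this example. By default k-means starts from randomly chosen observations, so the groups it returns may differ between runs and need not match the -group1d- result exactly.

```stata
* k-means on the single variable size, asking for 5 groups.
* km5 (cluster analysis name) and km5grp (grouping variable) are illustrative names.
cluster kmeans size, k(5) name(km5) generate(km5grp)

* Summarise each group: count, smallest, largest, mean and SD of size
tabstat size, by(km5grp) statistics(n min max mean sd)
```

Note that k() still has to be supplied, so this does not remove the need to choose the number of groups.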
Details follow for -group1d-.
. sort size
. group1d size, max(7)
Partitions of 30 data up to 7 groups

1 group: sum of squares 143.60
Group  Size         First            Last      Mean     SD
    1    30     1   94.74      30  100.91     97.43   2.19

2 groups: sum of squares 23.00
Group  Size         First            Last      Mean     SD
    2     9    22   98.41      30  100.91    100.49   0.74
    1    21     1   94.74      21   97.39     96.12   0.93

3 groups: sum of squares 6.62
Group  Size         First            Last      Mean     SD
    3     8    23  100.62      30  100.91    100.75   0.09
    2    15     8   96.11      22   98.41     96.81   0.66
    1     7     1   94.74       7   95.08     94.95   0.11

4 groups: sum of squares 1.26
Group  Size         First            Last      Mean     SD
    4     8    23  100.62      30  100.91    100.75   0.09
    3     7    16   97.19      22   98.41     97.45   0.40
    2     8     8   96.11      15   96.34     96.25   0.07
    1     7     1   94.74       7   95.08     94.95   0.11

5 groups: sum of squares 0.20
Group  Size         First            Last      Mean     SD
    5     8    23  100.62      30  100.91    100.75   0.09
    4     1    22   98.41      22   98.41     98.41   0.00
    3     6    16   97.19      21   97.39     97.29   0.06
    2     8     8   96.11      15   96.34     96.25   0.07
    1     7     1   94.74       7   95.08     94.95   0.11

6 groups: sum of squares 0.14
Group  Size         First            Last      Mean     SD
    6     8    23  100.62      30  100.91    100.75   0.09
    5     1    22   98.41      22   98.41     98.41   0.00
    4     6    16   97.19      21   97.39     97.29   0.06
    3     8     8   96.11      15   96.34     96.25   0.07
    2     5     3   94.95       7   95.08     95.01   0.05
    1     2     1   94.74       2   94.89     94.81   0.08

7 groups: sum of squares 0.10
Group  Size         First            Last      Mean     SD
    7     2    29  100.84      30  100.91    100.88   0.04
    6     6    23  100.62      28  100.76    100.71   0.05
    5     1    22   98.41      22   98.41     98.41   0.00
    4     6    16   97.19      21   97.39     97.29   0.06
    3     8     8   96.11      15   96.34     96.25   0.07
    2     5     3   94.95       7   95.08     95.01   0.05
    1     2     1   94.74       2   94.89     94.81   0.08

Groups  Sums of squares
     1           143.60
     2            23.00
     3             6.62
     4             1.26
     5             0.20
     6             0.14
     7             0.10
On Wed, May 2, 2012 at 9:34 AM, Nick Cox <[email protected]> wrote:
In practice,

    gen sizer = round(size)

is a simpler way of degrading your data. Check by

    scatter sizer size

Nick
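
A small sketch of how the rounded variable could then be used as a grouping variable. It assumes that rounding to the nearest integer separates the clusters, which holds for the 30 example values but, as noted elsewhere in the thread, not necessarily for other data sets.

```stata
* Recreate the rounded variable (dropping any earlier copy so this runs on its own)
capture drop sizer
gen sizer = round(size)

* One row per rounded value: count, smallest, largest, mean and SD of size
tabstat size, by(sizer) statistics(n min max mean sd)
```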
On Wed, May 2, 2012 at 9:16 AM, <[email protected]> wrote:
* Hi Statalist,
* I'm a beginner using version 8.
* The following measurements were collected by a machine in my lab...
clear
input sampling_event size
1 94.74
2 94.89
3 94.95
4 94.97
5 95
6 95.05
7 95.08
8 96.11
9 96.22
10 96.24
11 96.27
12 96.27
13 96.27
14 96.32
15 96.34
16 97.19
17 97.26
18 97.26
19 97.32
20 97.34
21 97.39
22 98.41
23 100.62
24 100.69
25 100.69
26 100.76
27 100.76
28 100.76
29 100.84
30 100.91
end
list
twoway (scatter size sampling_event)
* My aim is to class these size values into categories (5 categories in
* the example shown).
* kdensity will generate the following graphic...
kdensity size , w(0.1) n(30)
* The troughs of this graphic are a good way to define the bounds of
* each category.
* Category_4, for example, would include all size values larger than 98
* and less than 99.
* I'd like to extract these trough points as a kdensity post-estimation
* and output them as a new variable.
* Is this possible?
* Look forward to any advice the list has to offer.
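
One possible route to the trough coordinates, offered as a sketch and not as a tested answer: -kdensity- has a generate() option that stores the grid points and the estimated density in two new variables, after which grid points that are no higher than either neighbour can be flagged. The variable names kd_x, kd_d and trough are made up for illustration. The w(0.1) and n(30) settings are taken from the command above; in later Stata versions the bandwidth option is documented as bwidth().

```stata
* Store the estimate: kd_x holds the grid points, kd_d the estimated density
kdensity size, w(0.1) n(30) generate(kd_x kd_d)

* Flag interior grid points whose density is no higher than either neighbour.
* Using <= (not <) also flags flat stretches, e.g. where the estimate is zero.
gen byte trough = kd_d <= kd_d[_n-1] & kd_d <= kd_d[_n+1] ///
    if _n > 1 & _n < _N & kd_d < .

* Grid points lying in a trough
list kd_x kd_d if trough == 1
```

With only 30 grid points the trough positions are approximate at best. Where the density is flat between clusters, a whole run of neighbouring grid points will be flagged, and any value inside such a run could serve as a category boundary.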