Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: RE: RE: Silverman test
From
Nick Cox
Subject
st: RE: RE: RE: Silverman test
Date
Thu, 3 Mar 2011 20:27:39 +0000
Having looked again at -diptest-, I'd go for the dip test. It is free of all arbitrary details about how you estimate the density, and it leads directly to both descriptive statistics and a P-value for those so inclined.
I'd also recommend a look at -hsmode- and -shorth- from SSC.
Modes are the poor cousin of means and medians, but they have their fascination. Sooner or later I will write a Speaking Stata column about them in the Stata Journal.
Nick
Alfonso Miranda
Many thanks for your kind e-mail. I'll think carefully about your comments. Please let me know if you have any further thoughts about this.
Nick Cox
I've not done this that I can recall. I think it's likely to be more interesting in theory than in practice. I think modes work best when there is an independent scientific story about a mixture of distinct types, but even when there is a known mixture (e.g. males plus females) a mix of quite different sub-populations can still end up unimodal.
Also, in practice, although not necessarily in your case, stupid things like human observers' digit preferences and conventions about the resolution of measurement can interfere too.
I can't think of clever tricks for counting modes. I think you just count directly. A mode has higher density than its neighbours, so once a density estimate evaluated at k grid values is sorted in grid order, the main set of modes is given by
(1) density[i] > density[i-1] & density[i] > density[i+1]
You would need a decision on whether to count maxima at the extremes, i.e. if
(2) density[1] > density[2]
or
(3) density[k] > density[k-1]
Even for unimodal distributions with both tails going to zero, you might get modes of type (2) or (3) for low kernel half-width. I guess in practice it shouldn't matter much what you do so long as you do it consistently.
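Rules (1) to (3) can be sketched in a few lines. The following is an illustrative Python version rather than Stata code; the function name count_modes and the count_ends switch are hypothetical names chosen for this sketch, and indices are 0-based, so density[0] plays the role of density[1] above.

```python
def count_modes(density, count_ends=False):
    """Count local maxima in density values listed in grid order.

    `density` is assumed to hold the estimate at k sorted grid points.
    Strict inequalities are used, so a flat plateau is not counted.
    """
    k = len(density)
    # Rule (1): interior points strictly higher than both neighbours.
    modes = sum(
        1
        for i in range(1, k - 1)
        if density[i] > density[i - 1] and density[i] > density[i + 1]
    )
    if count_ends and k >= 2:
        # Rules (2) and (3): optionally count maxima at the two extremes.
        modes += int(density[0] > density[1])
        modes += int(density[-1] > density[-2])
    return modes
```

Whichever setting of count_ends is chosen, the point made above applies: use the same setting for every density estimate being compared.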
I'd add that -kdensity- often gives better results if you estimate on a transformed scale, typically logarithmic or logit, and then back transform. This is a standard trick, explained in most texts, but few people seem to know about it or use it. There are some details and examples in
<http://www.stata-journal.com/sjpdf.html?articlenum=gr0003>
But if you have a spike of zeros that will be tricky too.
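The back-transformation step is just a change of variables: if y = log(x), then the density of x is f_X(x) = f_Y(log x) / x. A minimal Python sketch of the idea follows. It is not what -kdensity- does internally; the function names gaussian_kde and kde_via_log are hypothetical, a Gaussian kernel is assumed, and the bandwidth h is on the log scale. It needs strictly positive data, which is why a spike of zeros is a problem.

```python
import math


def gaussian_kde(data, grid, h):
    """Gaussian kernel density estimate of `data` at each point of `grid`."""
    norm = len(data) * h * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - d) / h) ** 2) for d in data) / norm
        for g in grid
    ]


def kde_via_log(data, grid, h):
    """Estimate the density on the log scale, then back-transform.

    Uses f_X(x) = f_Y(log x) / x, where 1/x is the Jacobian of y = log(x).
    All data and grid values must be strictly positive.
    """
    log_data = [math.log(d) for d in data]
    log_grid = [math.log(g) for g in grid]
    f_log = gaussian_kde(log_data, log_grid, h)
    # Divide by x so that the result is a density on the original scale.
    return [f / g for f, g in zip(f_log, grid)]
```

Forgetting the division by x is the usual mistake: the result then no longer integrates to 1 on the original scale.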
I have not thought much about your last question.
Another approach is to think about measuring the bumps on the quantile function or distribution function in some way.
I implemented the Hartigan-Hartigan dip test in -diptest- from SSC. Someone a while ago suggested that I'd garbled the description of this test in its help, which seems quite likely. I've just remembered that I've still got to check that suggestion.
It strikes me that your problem, as I understand it, is extra-difficult, as you have to distinguish the granularity that is inevitable for a counted variable (which may well have implications for residuals too) from the multimodality that may or may not exist.
Nick
[email protected]

Alfonso Miranda
I need to determine whether the residuals from a (Poisson) regression are
unimodal or multimodal. I thought of using a Silverman test. Some time ago
(2004) Nick Cox and Stephen Pollard exchanged messages on Statalist about
the Silverman test (see below). Nick suggested:
> 3. Ignore -silvtest- and do this directly with
> -bootstrap- and -kdensity-.
>
> The last would seem by far the easiest. A program
> of the order of 10 lines long would appear to be
> needed to produce the mode count.
I am trying to implement this last strategy myself. Before I go on, however,
I have three questions:
1. Has Nick or somebody else already implemented this?
2. I am not sure how to count the number of modes after fitting the
kdensity. Could you give me some ideas on how to do this?
3. The critical bandwidth h_m is the smallest bandwidth h that
produces a kernel density with at most m modes. So finding the critical
bandwidth implies searching over a range of bandwidths and determining the
number of modes for each. I have no indication, however, of what range of
bandwidths I should start my search with, or of when to declare that I have
found the `smallest possible bandwidth'. Any ideas on how to do this?
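One possible way to frame the search in question 3, as a sketch in Python rather than Stata: with a Gaussian kernel the number of modes does not increase as h grows, so the critical bandwidth can be bracketed and then found by bisection. Everything specific below is an assumption made for illustration, not something from the thread: the function names, the grid of 512 points, the starting upper bound equal to the data range, and the stopping tolerance of 0.1% of the range.

```python
import math


def n_modes(data, h, grid_points=512):
    """Count interior local maxima of a Gaussian KDE evaluated on a grid."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    dens = [sum(math.exp(-0.5 * ((g - d) / h) ** 2) for d in data) for g in grid]
    # The normalising constant is omitted: it does not change where maxima are.
    return sum(
        1
        for i in range(1, grid_points - 1)
        if dens[i] > dens[i - 1] and dens[i] > dens[i + 1]
    )


def critical_bandwidth(data, m=1, tol=1e-3):
    """Approximate the smallest h giving at most m modes, by bisection."""
    span = max(data) - min(data)
    if span <= 0:
        raise ValueError("data must contain at least two distinct values")
    hi = span
    # Widen the upper bracket until the estimate has at most m modes.
    while n_modes(data, hi) > m:
        hi *= 2
    lo = 0.0  # never evaluated itself; only midpoints above 0 are used
    # Stop when the bracket is narrower than tol times the data range.
    while hi - lo > tol * span:
        mid = (lo + hi) / 2
        if n_modes(data, mid) <= m:
            hi = mid
        else:
            lo = mid
    return hi
```

For two data points 4 units apart, the estimate is an equal mixture of two Gaussians, which is unimodal only when h is at least half the separation, so the function should return a value close to 2. The grid resolution limits how well closely spaced modes are detected, so the result is only as accurate as the grid allows.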