Thanks very much Stas. The problem is that the estimate goes from a
p-value of less than 0.01% to a p-value of 19%, so I am in the dilemma
of trying to figure out which is more reliable. I would truly
appreciate a little bit more of your time. Below you suggest looking
at the confidence intervals. Are you suggesting that I compare the
bootstrap intervals with the sandwich intervals? Would it make sense
to check what happens if I increase my repetitions from 1000 to, say,
5000, given that I have more than 1600 clusters?
I would appreciate any further comments on this.
Erasmo
On 9/15/07, Stas Kolenikov <[email protected]> wrote:
> On 9/14/07, Erasmo Giambona <[email protected]> wrote:
> > Dear Stas,
> > As you expected most of my results are unchanged. However, one of the
> > variables loses significance. The number of clusters that I have is
> > quite large (about 1600). Can it be that bootstrapping is eliminating
> > the effect of some outliers?
>
> Well, if anything, the bootstrap amplifies the outliers. Think about,
> say, inference on the mean of a sample of 10: 9 values from uniform
> (0,1), and one value equal to 10. Then in 35% (=0.9^10) of the
> bootstrap samples the outlier will be absent, and the mean will be
> around 0.5; in some 39% (=10 choose 1 * 0.9^9 * 0.1), it will be
> present once, so that the mean will be around 1.5; and in the
> remaining cases, the outlier will be resampled twice or more often,
> so you'll see a mean of some 2.5 or more. Out of the blue, you've got
> a distribution with multiple modes, which may not be very close to
> the true distribution of the mean even if the original distribution
> was heavy tailed, as the distribution of the mean would probably be
> reasonably smooth. Also, the normal approximation for this
> distribution will be terrible, and the magic number 1.96 won't work
> to give you the tail 5%. You could look into -estat bootstrap- after
> all, to see how your confidence intervals are doing, as that's where
> the bootstrap really gets an edge against symmetric things like the
> sandwich standard errors.
>
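The outlier example above can be checked with a small simulation. This
is only a sketch: it is written in Python rather than Stata, and the
seed, the variable names and the helper prob_outlier_count are
assumptions made for illustration. The nine uniform draws are
simulated, not real data. It compares the exact binomial shares with
the simulated ones, and a symmetric normal-approximation interval with
a percentile interval.

```python
import math
import random
from collections import Counter

random.seed(12345)  # arbitrary seed, only so that reruns agree

# Hypothetical sample: 9 draws from uniform(0,1) plus one outlier at 10.
sample = [random.random() for _ in range(9)] + [10.0]
n = len(sample)
sample_mean = sum(sample) / n


def prob_outlier_count(k, n=10):
    """Exact probability that one given observation is drawn k times
    in a bootstrap resample of size n (binomial with p = 1/n)."""
    return math.comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k)


reps = 5000
means = []
counts = Counter()
for _ in range(reps):
    resample = random.choices(sample, k=n)       # draw n with replacement
    counts[min(resample.count(10.0), 2)] += 1    # 0, 1, or "2 or more"
    means.append(sum(resample) / n)

means.sort()
boot_mean = sum(means) / reps
boot_se = math.sqrt(sum((m - boot_mean) ** 2 for m in means) / (reps - 1))

# Symmetric interval from the normal approximation vs. percentile interval.
normal_ci = (sample_mean - 1.96 * boot_se, sample_mean + 1.96 * boot_se)
percentile_ci = (means[round(0.025 * reps)], means[round(0.975 * reps) - 1])

for k, label in [(0, "absent"), (1, "once"), (2, "twice or more")]:
    exact = prob_outlier_count(k) if k < 2 else (
        1 - prob_outlier_count(0) - prob_outlier_count(1))
    print(f"outlier {label:14s} exact {exact:.3f}  simulated {counts[k] / reps:.3f}")
print("bootstrap SE :", round(boot_se, 3))
print("normal CI    :", tuple(round(x, 3) for x in normal_ci))
print("percentile CI:", tuple(round(x, 3) for x in percentile_ci))
```

The percentile interval cannot extend below the smallest observation,
while the symmetric interval has no such restriction. A histogram of
means should show the separate clusters described above.
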
> Besides, you need to remember that anything you get out of a sample
> is subject to sampling fluctuations and type I/II/III errors. If your
> variable was borderline with p-value of 3% with the sandwich standard
> errors, and now borderline 7% with the bootstrap standard errors, I
> wouldn't bother.
>
> To Austin: I am reading the wild cluster bootstrap paper, looks
> interesting, although I will suggest another 15 or so references to
> the authors :)).
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: Please do not reply to my Gmail address as I don't check
> it regularly.
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>