st: Re: bootstrap command -- cluster and strata options

From	"Michael Blasnik" <[email protected]>
To	<[email protected]>
Subject	st: Re: bootstrap command -- cluster and strata options
Date	Wed, 14 Jul 2004 11:21:34 -0400

My understanding is that the cluster option causes bootstrap to resample
whole clusters, not observations within clusters.  Given that you have only 2
clusters, there are only 3 possible samples -- one copy of each cluster
(should happen half the time), cluster 1 twice (a quarter of the time), or
cluster 2 twice (also a quarter of the time).  The 95th percentile can
therefore take on just 3 values, depending on which of these three samples
is drawn.  Your results appear to confirm this interpretation.
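
To make the counting argument concrete, here is a minimal sketch in Python (not Stata, and not Stata's actual implementation).  It assumes the data set described in the original post: two clusters of 1000 observations each, drawn from U(0,100) and U(0,1000).  The names `percentile` and `cluster_bootstrap_p95` are made up for illustration, and the nearest-rank percentile rule is a simplification of what -summarize, detail- computes.

```python
import math
import random


def percentile(values, q):
    """Nearest-rank percentile (a simplification, not Stata's exact rule)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def cluster_bootstrap_p95(clusters, reps, rng):
    """Resample whole clusters with replacement and return the p95 of each replicate."""
    ids = list(clusters)
    results = []
    for _ in range(reps):
        # Draw as many clusters as exist, with replacement; every observation
        # in a drawn cluster enters the replicate unchanged.
        drawn = [rng.choice(ids) for _ in ids]
        sample = [x for cid in drawn for x in clusters[cid]]
        results.append(percentile(sample, 95))
    return results


rng = random.Random(2004)  # arbitrary seed, only for reproducibility
clusters = {
    1: [rng.uniform(0, 100) for _ in range(1000)],   # group 1: U(0,100)
    2: [rng.uniform(0, 1000) for _ in range(1000)],  # group 2: U(0,1000)
}

replicates = cluster_bootstrap_p95(clusters, 500, rng)
distinct = sorted(set(replicates))

# Tabulate the distinct replicate values, similar in spirit to -tabulate _bs_1-
for value in distinct:
    print(f"{value:10.2f} {replicates.count(value):5d}")
```

Under these assumptions the table should show three rows: a value below 100 (cluster 1 drawn twice), a value near 950 (cluster 2 drawn twice), and a value near 900 (one copy of each cluster).  The middle value arises because, in the mixed sample of 2000, roughly the top 1000 values all come from cluster 2, so the 1900th value overall is about the 900th value of cluster 2.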

Michael Blasnik
[email protected]


----- Original Message ----- 
From: <[email protected]>
To: <[email protected]>
Sent: Wednesday, July 14, 2004 10:14 AM
Subject: st: bootstrap command -- cluster and strata options
>
> Dear Statalisters:
>
> I am trying to understand what the "cluster" and "strata" options do on
> -bootstrap-.  I may be misinterpreting the manual with respect to what
> these options do because when I gin up a dataset to which I think I know
> what the result should be,  the Stata answer doesn't seem to be what I
> expected.
>
> Basically, I set up a data set which is drawn from two distributions --
> 1000 observations from a uniform distribution from 0 to 100 and 1000
> observations from a uniform distribution from 0 to 1000.  "Score" is the
> value, group is a "1" or "2" indicating whether it was drawn from the
> U(0,100) or U(0,1000) distribution, and id is a unique identifier.
> The final data set description and summary are as follows:
>
<snip>

> I am interested in sampling by "group" so tried both the -cluster- and
> -strata- options (only the cluster option shown below -- but both
> produce results I did not expect).  Specifically, I would like Stata,
> when it samples, to repeatedly sample from only group 1 or group 2
> (i.e., not mix a group 1 value with a group 2 value).  I am interested
> in the 95th percentile values that result from the exercise.  I would
> expect the -saving(bsout)- output from this command to contain a value
> close to 95 half of the time and close to 950 the remainder of the
> time.  This would be true if Stata were consistently sampling from the
> U(0,100) half of the time and the U(0,1000) the remaining half.  I used
> the following command (output follows):
>
>
> . bootstrap "summarize score, detail" r(p95), reps(500) saving(bsout)
> cluster(group) replace
>
> command:      summarize score , detail
> statistic:    _bs_1      = r(p95)
>
<snip>
>
> . tabulate  _bs_1
>
>      r(p95) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>       95.48 |        112       22.40       22.40
>      899.03 |        261       52.20       74.60
>      950.77 |        127       25.40      100.00
> ------------+-----------------------------------
>       Total |        500      100.00
>
>
> Again, not at all what I expected (it's discrete and tri-valued and I
> thought it would be continuous).  I thought the appropriate command
> would (for the expected continuous distribution) be -histogram _bs_1-
> and I would have seen a bimodal distribution centered on 95 and 950.
> What I would like to see is a distribution which results from either
> repeated sampling from group 1 (ca. half the time) OR repeated sampling
> from group 2 (the remainder of the time).  My reading and understanding
> of the -cluster- and -strata- options under -bootstrap- must be
> faulty.  Can anyone let me know what I am missing here?  Or what I might
> do to obtain what I am looking for?
>
> I am sure that the problem lies with my (mis)understanding, but I am
> using Stata 8.2:
>
> David Miller
> Health Effects Division
> Office of Pesticide Programs
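
One reading of the request in the original message is: for each replicate, pick one group at random, then resample observations with replacement from within that group only.  The sketch below illustrates that scheme in Python; it is an assumption about what was wanted, not a description of any -bootstrap- option.  The names `percentile` and `within_group_bootstrap_p95` are hypothetical, and the nearest-rank percentile is a simplification of what -summarize, detail- reports.

```python
import math
import random


def percentile(values, q):
    """Nearest-rank percentile (a simplification, not Stata's exact rule)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def within_group_bootstrap_p95(groups, reps, rng):
    """For each replicate: pick one group, resample its observations, take p95."""
    ids = list(groups)
    results = []
    for _ in range(reps):
        gid = rng.choice(ids)                      # choose group 1 or group 2
        data = groups[gid]
        resample = rng.choices(data, k=len(data))  # observations, with replacement
        results.append((gid, percentile(resample, 95)))
    return results


rng = random.Random(8)  # arbitrary seed, only for reproducibility
groups = {
    1: [rng.uniform(0, 100) for _ in range(1000)],   # group 1: U(0,100)
    2: [rng.uniform(0, 1000) for _ in range(1000)],  # group 2: U(0,1000)
}

draws = within_group_bootstrap_p95(groups, 500, rng)
p95_group1 = [v for g, v in draws if g == 1]
p95_group2 = [v for g, v in draws if g == 2]

# Replicate count and range of p95 values for each group
print(len(p95_group1), min(p95_group1), max(p95_group1))
print(len(p95_group2), min(p95_group2), max(p95_group2))
```

Because observations rather than whole clusters are resampled here, the replicate values vary from draw to draw, which should give the bimodal picture (one mode near 95, one near 950) that the original message expected.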

