Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Bootstrap command when used with cluster and strata options
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Bootstrap command when used with cluster and strata options
Date: Thu, 24 Oct 2013 10:56:25 +0100
Hmmm... So, what would the "fix" be? At first sight, you asked for
something you didn't want. It's difficult for Stata to know that.
Nick
[email protected]
On 24 October 2013 10:31, Chris Frost <[email protected]> wrote:
Thanks for the succinct illustration of the problem and neat "get
round" using egen. I do think that this is a trap for the unwary
though and should really be fixed in the software (I can conceive of
no situation where newid needs to be crossed with strata in the
bootstrap samples - and plenty of situations where the introduction of
this artificial sharing of newid across strata will cause errors if
not corrected).
Austin Nichols <[email protected]> 23/10/2013 19:10
> No need, I can see what you mean in a simple example:
>
> clear
> set seed 1
> set obs 10
> g s=_n<5
> g i=_n
> bsample, strata(s) cluster(i) idcluster(newid)
> egen c=group(s newid)
> list
>
> and I assume you need a newid that can act as an identifier across
> strata, so you need to generate a variable such as c above. You can
> wrap the commands to be bootstrapped in a -program- and bootstrap it.
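A minimal sketch of the -program- wrapper suggested above. It assumes simulated data, a hypothetical program name bs_mixed, and the variable names outcome, group and id from the original question; the replication count and seeds are arbitrary. The program rebuilds a cluster identifier that is unique across strata with egen group() before fitting the model, so the random effect is specified for each stratum-by-newid cluster rather than for newid alone.

```stata
* Simulated data (hypothetical): 100 subjects, 4 measures each,
* two equal-sized strata; ids never cross strata.
clear
set seed 12345
set obs 100
generate id = _n
generate group = id > 50
generate u = rnormal()              // subject-level random effect
expand 4
generate outcome = 1 + 0.5*group + u + rnormal()

* Hypothetical wrapper: build a cluster id that is unique across strata
* from the bootstrap-created newid, then fit the model on it.
capture program drop bs_mixed
program define bs_mixed, rclass
    tempvar c
    egen `c' = group(group newid)
    mixed outcome i.group || `c':
    return scalar b_group = _b[1.group]
end

* Stratified cluster bootstrap of the wrapper.
bootstrap b_group = r(b_group), strata(group) cluster(id) ///
    idcluster(newid) reps(50) seed(2013): bs_mixed
```

Because -bootstrap- calls bs_mixed on each resampled dataset after newid has been created, the egen step runs inside every replicate. The sketch assumes newid is also available when -bootstrap- first runs the command on the original data; if it is not in your version, a -bsample- loop inside -preserve-/-restore- is an alternative.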
On Wed, Oct 23, 2013 at 12:30 PM, Chris Frost <[email protected]> wrote:
Thanks for your reply - but I do think the problem is with the
program, not with the data. In my data clusters (id) do not cross
strata (group) - the problem is that in each bootstrap sample that is
created the created cluster variable (newid) DOES (erroneously) cross
strata. This can be seen if the bootstrap is run with the "noisily"
option. If you are interested in seeing the behavior, I can send you an
annotated do-file that illustrates the problem.
Austin Nichols <[email protected]> 23/10/2013 16:42
>> Sounds like a problem with your data to me, not the program. If your
>> clusters seem to cross strata, because of the coding in your data, you
>> can define a new cluster variable
>> egen newc=group(group id)
>> or you can specify that clusters are defined by two variables
>> bootstrap, strata(group) cluster(group id) idcluster(newid):
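One way to tell whether the problem lies in the data, as suggested above, is to check that each cluster falls within a single stratum before resampling. This is a small sketch on toy data whose layout is an assumption for illustration; the variable names group and id follow the thread, and the tag variables are hypothetical helpers.

```stata
* Toy data (hypothetical): 10 subjects with 2 rows each,
* subjects 1-5 in group 0 and subjects 6-10 in group 1.
clear
set obs 20
generate id = ceil(_n/2)
generate group = id > 5

* Each id should take a single value of group; this assert fails
* if any cluster crosses strata.
bysort id (group): assert group[1] == group[_N]

* Cluster identifier for each (group, id) combination, as suggested above.
egen newc = group(group id)

* With ids nested in strata, newc has as many distinct values as id.
egen tag_id = tag(id)
egen tag_newc = tag(newc)
count if tag_id
local n_id = r(N)
count if tag_newc
local n_newc = r(N)
display "distinct id: `n_id'   distinct newc: `n_newc'"
```

If the assert passes, recoding the original clusters changes nothing, and any crossing of strata must come from the resampling step instead.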
On Wed, Oct 23, 2013 at 6:11 AM, Chris Frost <[email protected]> wrote:
I think that there is a problem with the bootstrap command when used
in conjunction with the "cluster" and "strata" options. The problem
arises because the command "bootstrap, strata(group) cluster(id)
idcluster(newid) ....." creates a variable "newid" which is only
unique (at the cluster level) within each stratum. For example, if there
are 1000 subjects (with multiple measures per subject), each with a
unique id but in two equal-sized groups, the above command will result
in each bootstrap sample having only 500 values of newid, with subjects
being erroneously paired up: this will lead to incorrect variance
estimates with a command such as bootstrap, strata(group) cluster(id)
idcluster(newid): mixed outcome i.group || newid:
Am I correct? Can this be fixed?
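The claim above can be checked on a single resample. This is a minimal sketch on simulated data mirroring the example in the question (1000 subjects with several measures each, two equal-sized groups); the seed, the number of measures per subject and the tag variable names are assumptions for illustration. It compares the number of distinct values of newid with the number of distinct (group, newid) pairs, which is the number of clusters actually drawn.

```stata
* Simulated data (hypothetical): 1000 subjects, 3 measures each,
* subjects 1-500 in group 0 and 501-1000 in group 1.
clear
set seed 2013
set obs 1000
generate id = _n
generate group = id > 500
expand 3
generate outcome = rnormal()

* Draw one stratified cluster bootstrap sample.
bsample, strata(group) cluster(id) idcluster(newid)

* Distinct values of newid, ignoring strata.
egen tag_newid = tag(newid)
count if tag_newid
local n_newid = r(N)

* Distinct stratum-by-newid combinations (the sampled clusters).
egen tag_pair = tag(group newid)
count if tag_pair
local n_pair = r(N)

display "distinct newid: `n_newid'   distinct (group, newid): `n_pair'"
```

If -bsample- numbers newid within strata, as described in this thread, the first count is half the second, and a model specified with || newid: would pool pairs of subjects from different groups into one cluster.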