"uctpmtd" asks about bootstrapping -clogit- results, using the
-cluster()- option of -bootstrap-:
> I am using Stata 7.
>
> Given that -clogit- doesn't have the option of clustered standard errors, I
> performed bootstrap to correct them.
>
> This is the code:
>
> #delimit;
> set more 1;
> set matsize 800;
> set seed 1;
>
> bs "clogit choice private public time cost distpri distpub incpri incpub,
> group(id)" "_b[time] _b[cost] _b[distpri] _b[distpub] _b[incpri] _b[incpub]",
> cluster(area) reps(0) saving(bsclog) replace;
>
> I got my results, but it took Stata 2 hours to compute the se with 0
> replications, and 6 hours with 200 replications. -clogit- on the same data
> (456399 obs when arranged in the long format) takes 1mn to run and I'm
> running it on the University network. When I included the controls, it took
> Stata 1 month to compute the se, for a model it takes 13mn to run with
> clogit!!!!!
>
> I went at that point to try and check what was making it so slow, and decided
> to draw the random sample manually, then do -clogit-.
>
> This is the code for 1 draw only: (I'm planning to do a loop for the number of
> replications required once I solve the problem below)
>
> set seed 1
> set matsize 800
> set more 1
>
> bsample, cluster(area)
> clogit choice private public time cost distpri distpub incpri incpub, group(id)
>
> I got mixed results in the sense that the speed at which I obtained the
> results was as expected, but -bsample- is mixing up the data as I should have
> 1:2 matching (McFadden choice model), and I get 4:8.
>
> So summing up, my problem with -bsample- is how to incorporate the id so I
> could have the appropriate matching.

Bootstrapping -clogit- (in the absence of clusters)
------------------------------------------------------------------------------
Let's begin by discussing how to bootstrap results from -clogit-; we'll talk
about -clogit- with clustered groups later.
The -clogit- command requires grouped data. Thus, when bootstrapping the
results from -clogit-, you need to sample whole groups instead of individual
observations. That is, each group is itself a cluster of information, so use
the -cluster()- option of -bootstrap- to sample the groups.
Usually we must also name the "cluster" variable in the estimation command;
for -clogit-, it goes in the -group()- option. Because the groups are sampled
with replacement, each group that is sampled more than once must get a unique
identifier. That is, if the group with "id==1" is sampled twice, the repeated
group must have a different identifying value than the original; otherwise
-clogit- would treat the copies as one larger group. This is accomplished
using the -idcluster()- option.
Here we bootstrap the results from the first example in [R] clogit.
***** BEGIN: c1.do
version 7
clear
use http://www.stata-press.com/data/r7/clogitid
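* -myid- starts as a copy of -id-; within each bootstrap sample it holds
* a unique value for every sampled group
gen myid = id
* sample whole groups by -id-, but let -clogit- group on -myid-
bs "clogit y x1 x2, group(myid)" "_b[x1] _b[x2]", cluster(id) idclust(myid) /*
*/ dot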
***** END: c1.do
Notice that -bs- will produce cluster samples using the -id- variable, but
will call -clogit- using the -myid- variable to identify the groups. -myid-
contains unique values for each sampled group.
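
For a single draw by hand with -bsample-, as attempted in the question, the
same idea might be sketched as follows. This assumes that your -bsample-
accepts the -idcluster()- option (check -help bsample-); the file name c1b.do
and the variable name -newid- are made up for illustration.
***** BEGIN: c1b.do
version 7
clear
use http://www.stata-press.com/data/r7/clogitid
set seed 1
* draw whole groups with replacement; -newid- (a new variable, name made up
* here) gives each drawn copy of a group its own value
bsample, cluster(id) idcluster(newid)
* group on -newid-, not -id-, so repeated groups stay separate
clogit y x1 x2, group(newid)
***** END: c1b.do
Grouping on -newid- rather than -id- keeps each drawn copy of a group as a
separate group, so the original matching within groups is preserved.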

Clustered -clogit-
------------------------------------------------------------------------------
"uctpmtd" has a slightly more complicated situation. There are clusters of
groups, so we need to sample the clusters with replacement, but still uniquely
identify the sampled groups. The -bs- command cannot handle this without a
little help from the user.
If -bs- were to supply me with the -group()- and -idcluster()- variables, I
could generate a new group variable that uniquely identified the sampled
groups (across the clusters), then run the -clogit- command with the new group
variable. The following details how I accomplished this.
Using the data from the above example, I artificially create a cluster
variable -clust- that assigns the groups, in rotation, to 5 clusters.
***** BEGIN: c2a.do
version 7
clear
use http://www.stata-press.com/data/r7/clogitid
set seed 1234
* generate a cluster variable
sort id
by id: gen clust = _n==1
replace clust = 1+mod(sum(clust),5)
***** END: c2a.do
In order to ensure that -clogit- gets uniquely identified groups, while
sampling the clusters with replacement, I wrote a short program and placed it
in an ado-file: myclogit.ado (listed below).
-myclogit- is a wrapper for -clogit-. Its purpose is to generate a new
group variable from the original group variable and the -idcluster()-
variable. The remaining variables and options are passed through to -clogit-.
***** BEGIN: myclogit.ado
program define myclogit
version 7
syntax varlist , group(varname) idcluster(varname) [ * ]
/* preserve original order within -group()- */
tempvar newgroup order
gen `order' = _n
/* generate a new group id variable */
sort `idcluster' `group' `order'
by `idcluster' `group' : gen `newgroup' = _n==1
replace `newgroup' = sum(`newgroup')
clogit `varlist' , group(`newgroup') `options'
end
***** END: myclogit.ado
With -myclogit- I can now use -bs- to bootstrap the standard errors of the
coefficients, while accounting for clustering of groups.
***** BEGIN
gen myclust = clust
bs "myclogit y x1 x2, group(id) idcluster(myclust)" "_b[x1] _b[x2]", /*
*/ cluster(clust) idclust(myclust) dot
***** END
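
If you would rather draw the samples yourself with -bsample- and loop over the
replications, a minimal sketch follows. It assumes that c2a.do (above) is in
the current directory and that -bsample- accepts -idcluster()-. The names
c2b.do, -newclust-, -newid-, and bsdraws.dta are made up for illustration, and
200 replications is only an example.
***** BEGIN: c2b.do
version 7
* load the data and create -clust-
do c2a
tempname sim
postfile `sim' b_x1 b_x2 using bsdraws, replace
forvalues i = 1/200 {
    preserve
    * draw whole clusters; -newclust- numbers each drawn copy of a cluster
    bsample, cluster(clust) idcluster(newclust)
    * a group is unique within a drawn cluster
    egen newid = group(newclust id)
    quietly clogit y x1 x2, group(newid)
    post `sim' (_b[x1]) (_b[x2])
    restore
}
postclose `sim'
* the standard deviations of the draws estimate the bootstrap standard errors
use bsdraws, clear
summarize b_x1 b_x2
***** END: c2b.do
Here -egen, group()- plays the role of -myclogit-: it builds a group variable
that is unique across the drawn clusters before -clogit- is called.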

P.S.
Remember that when a group has multiple positive outcomes, -clogit- must
account for all possible choice combinations. In "uctpmtd"'s first attempt to
bootstrap results from -clogit-, each group that was sampled more than once
was merged with its copies into one larger group (the 4:8 instead of 1:2
matching seen with -bsample-), which made -clogit- do that much more work.
"uctpmtd" should not experience this if -myclogit- is used as described above.
--Jeff