
Re: st: regression models with small number of clusters


From   "Clive Nicholas" <[email protected]>
To   [email protected]
Subject   Re: st: regression models with small number of clusters
Date   Sat, 8 Jan 2005 02:19:11 -0000 (GMT)

Peter Muhlberger replied to Krishna D Rao:

> You could also try bootstrapping your coefficients from a random effects
> model, which would eliminate the small sample bias in your variance
> estimates.

A nice answer to Krishna's original poser, and one I've been able to
simulate whilst incorporating Roger Newson's suggestion to fit a fixed
effects model to his data. Following Wood's (2004) suggestion, I ran 2000
bootstrap replications.

Since, as Roger points out, Krishna doesn't give us any detailed
information on his variables, I've assumed that the response variable in
the dataset simulated below is a continuous, uniformly distributed
variable ranging from 0 to 100:

. clear

. set more off

. set seed `=date("2005-01-07", "ymd")'

. set obs 360
obs was 0, now 360

. g id=_n

. g group=ceil(uniform()*12)

. expand 2
(360 observations created)

. tab group

      group |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         50        6.94        6.94
          2 |         72       10.00       16.94
          3 |         82       11.39       28.33
          4 |         60        8.33       36.67
          5 |         54        7.50       44.17
          6 |         58        8.06       52.22
          7 |         60        8.33       60.56
          8 |         50        6.94       67.50
          9 |         62        8.61       76.11
         10 |         54        7.50       83.61
         11 |         54        7.50       91.11
         12 |         64        8.89      100.00
------------+-----------------------------------
      Total |        720      100.00

. g y=uniform()*100

. g x1=uniform()

. g x2=uniform()*5

. g x3=invnorm(uniform())

. sort id

. by id: gen time=_n

. l id group y x1 x2 x3 time in 1/10

     +----------------------------------------------------------------+
     | id   group          y         x1         x2          x3   time |
     |----------------------------------------------------------------|
  1. |  1       6   24.03016   .1445585   .6673999   -1.081646      1 |
  2. |  1       6   90.45777   .6376978   3.680331   -1.410077      2 |
  3. |  2       3   12.22887    .989752    1.70654   -1.028015      1 |
  4. |  2       3   95.63952   .6426014    4.66782    -.621906      2 |
  5. |  3      11   20.20495   .9287896   4.912792   -1.984773      1 |
     |----------------------------------------------------------------|
  6. |  3      11    57.8842   .6628636   3.113226   -.3708619      2 |
  7. |  4      10   50.49711   .3376878   2.095944    .2773025      1 |
  8. |  4      10   12.29717   .9934924   .8407423    .6124281      2 |
  9. |  5      12   88.63429   .2615153   .8085947    .1702638      1 |
 10. |  5      12    31.2398   .1373881    .556083    .4438728      2 |
     +----------------------------------------------------------------+

. tsset id time

. bs "areg y x1 x2 x3, absorb(id) cluster(group)" _b _se, size(360) reps(2000) saving(kris) dots

command:      areg y x1 x2 x3 , absorb(id) cluster(group)
statistics:   b_x1       = _b[x1]
              b_x2       = _b[x2]
              b_x3       = _b[x3]
              b_cons     = _b[_cons]
              se_x1      = _se[x1]
              se_x2      = _se[x2]
              se_x3      = _se[x3]
              se_cons    = _se[_cons]

[...]

Bootstrap statistics                              Number of obs    =     720
                                                  Replications     =    2000
----------------------------------------------------------------------------
Variable   |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-----------+----------------------------------------------------------------
      b_x1 |  2000 -4.629344 -.0557279  11.65842  -27.49327   18.23458   (N)
           |                                      -28.52434   17.79685   (P)
           |                                       -28.4749   17.84085  (BC)
      b_x2 |  2000  1.275952 -.0159533  2.563146  -3.750765   6.302669   (N)
           |                                      -3.645857   6.343257   (P)
           |                                      -3.524056   6.556383  (BC)
      b_x3 |  2000  .5331025  .1028936  3.621918  -6.570027   7.636232   (N)
           |                                      -6.388688   7.618769   (P)
           |                                      -6.421975   7.581013  (BC)
    b_cons |  2000  49.81735   .001473  8.255104   33.62784   66.00686   (N)
           |                                       33.13453    65.7711   (P)
           |                                       32.77082   65.22783  (BC)
     se_x1 |  2000  8.710757  11.37133   5.44168  -1.961202   19.38272   (N)
           |                                        11.2756    32.2606   (P)
           |                                       5.797402   5.797402  (BC)
     se_x2 |  2000  1.648332  2.705572  1.060601  -.4316664    3.72833   (N)
           |                                       2.447499   6.681056   (P)
           |                                       1.554579   1.554579  (BC)
     se_x3 |  2000  2.375892  3.526444   1.52015  -.6053518   5.357137   (N)
           |                                       3.307063   9.179192   (P)
           |                                       1.835775   1.835775  (BC)
   se_cons |  2000  5.117283  8.791256  3.831319  -2.396513   12.63108   (N)
           |                                       7.339995   22.68564   (P)
           |                                       4.573055   4.573055  (BC)
----------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
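
The replicates behind the table above can also be inspected directly. The
lines below are only a sketch: they assume the -bs- run has finished, that
-saving(kris)- wrote kris.dta to the working directory, and that the saved
variables carry the names shown in the statistics list (b_x1, se_x1, and
so on).

```stata
* Assumes kris.dta was written by the -bs- call above
use kris, clear

* Spread of the bootstrapped slope next to the average clustered SE
summarize b_x1 se_x1

* Percentile limits for b_x1; these should be close to the (P) row
centile b_x1, centile(2.5 97.5)
```

If the standard deviation of b_x1 is far from the mean of se_x1, that is a
hint that the analytic and bootstrap standard errors disagree.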

In order to fire up -areg-, I was forced to take the liberty of
-expand-ing the dataset by at least 2 and creating a -time- variable (thus
simulating repeated observations for each individual; I've also assumed
that this panel dataset is balanced). Otherwise, -areg- returns an
"insufficient observations r(2000)" error.
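
A quick way to confirm that every individual really does have two rows,
and that the panel is balanced, might look like the following. This is a
minimal sketch that assumes the simulated data above are still in memory
and that -tsset id time- has already been run.

```stata
* Stops with an error if any id does not have exactly two observations
bysort id: assert _N == 2

* Summarises the participation pattern of the panel
xtdes
```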

Note that -absorb- and -cluster- do different jobs here: -absorb(id)-
removes the _individual_ fixed effects, whereas -cluster(group)- does not
fit _group_ fixed effects but makes the standard errors robust to
correlation within groups.

Unfortunately, I cannot simulate a dependent variable that exhibits
heteroscedasticity, but this example should now give Krishna enough
ammunition to solve his dilemma.
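
That said, one rough way of building heteroscedasticity into a simulated
response might be to let the error spread grow with a covariate. Everything
specific in the sketch below is an assumption made for illustration: the
names err and yhet, the coefficients, and the choice of x2 as the variable
driving the error variance. None of it comes from Krishna's data. It also
assumes the simulated dataset above is in memory.

```stata
* Hypothetical: error SD rises linearly with x2
g err = invnorm(uniform())*(1 + 2*x2)

* Hypothetical response with made-up coefficients
g yhet = 50 - 5*x1 + x2 + err

* Same model as before, fitted to the heteroscedastic response
areg yhet x1 x2 x3, absorb(id) cluster(group)
```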

CLIVE NICHOLAS        |t: 0(044)7903 397793
Politics              |e: [email protected]
Newcastle University  |http://www.ncl.ac.uk/geps

Reference:

Wood M (2004) "Statistical Inference Using Bootstrap Confidence Intervals"
SIGNIFICANCE 1(4): 180-2.
