Maury Gittleman <[email protected]>:
Just clustering on establishment is probably sufficient.
You can also specify two levels of clustering with -svyset-, e.g.
webuse stage5a
svyset su1 [pweight=pw], fpc(fpc1) || su2
where su1 is your establishment id, fpc1 the first-stage finite
population correction (the number of establishments in the population,
or the first-stage sampling rate), and su2 is a person id.
Usually the second level of clustering is largely irrelevant. But not always...
svyset su1 [pweight=pw], fpc(fpc1) strat(strat)
svy: reg yreg x?
est sto c1lev
svyset su1 [pw=pw], fpc(fpc1) str(str) || su2, fpc(fpc2)
svy: reg yreg x?
est sto c2lev
esttab *, mti
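As a rough cross-check on the "just cluster on establishment" advice, here is a minimal sketch that leaves -svy- aside and uses the ordinary cluster-robust variance option instead. It assumes you are willing to ignore strata and the fpc; the variable names (yreg, x?, pw, su1) are the ones from the example data above, and the stored name clreg is only illustrative. Point estimates should match the svy results; standard errors should be close to a one-stage design without fpc(), though not necessarily identical because the small-sample adjustments differ.

```stata
webuse stage5a, clear
* cluster-robust SEs on the first-stage unit (establishment) only;
* no strata and no fpc, so this is an approximation to the svy design
regress yreg x? [pweight=pw], vce(cluster su1)
est sto clreg
```

If the standard errors in clreg, c1lev and c2lev are all similar, the second level of clustering is probably not doing much in your data. Note that -esttab- is part of the user-written estout package (ssc install estout).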
On 11/8/07, Gittleman, Maury - BLS <[email protected]> wrote:
> Hello,
>
> I have a question concerning Stata's approach to estimating standard
> errors in the presence of clustered survey data. The survey I'm using
> collects information on individual wages, by first selecting
> establishments at random, and then collecting information on multiple
> workers within each establishment. So, it is clear that, when I'm
> running regressions, I need to cluster on establishment.
>
> My question arises when I use two years of data from the same survey.
> For about 4/5 of the individuals, there will be data for two years, and
> I would expect that the correlation between the errors for any given
> individual will be higher than the correlation between the errors for
> two different individuals at the same establishment. My thinking is
> that I still want to define clusters by establishments, as the variance
> estimation is said to be robust to any arbitrary intra-cluster
> correlation.
>
> Is this the right way to go or is there an alternative approach that
> might be superior?
>
> Thanks very much.
>
> Maury