Angel Rodriguez-Laso <[email protected]>
> I'm confused with the following results:
>
>
>
> . svyset psu [pweight=weight2007], strata(healtharea)fpc(psusperhealtharea)
>
> pweight: weight2007
> VCE: linearized
> Strata 1: healtharea
> SU 1: psu
> FPC 1: psusperhealtharea
>
> .
> end of do-file
>
> . svy: tab p29, deff deft
> (running tabulate on estimation sample)
>
> Number of strata = 11 Number of obs = 12140
> Number of PSUs = 1266 Population size = 12134,139
> Design df = 1255
>
> -------------------------------------------------
> Any permanent
> disability | proportions deff deft
> ----------+--------------------------------------
> 0, no | ,8887 -1981 ,9783
> 1, yes | ,1113 -1981 ,9783
> |
> Total | 1
> -------------------------------------------------
> Key: proportions = cell proportions
> deff = deff for variances of cell proportions
> deft = deft for variances of cell proportions
>
>
>
>
> Why do I get large negative deff values? Deft resembles more what I
> was expecting, but it should be the square root of deff and obviously
> this is not the case. Do you have any explanation for these results?
Stas Kolenikov <[email protected]> already pointed out that the sampling
weights appear to be normalized to the sample size. In fact, the sum of the
weights (12134.139) is slightly less than the sample size (12140). When the
first stage is sampled without replacement (i.e., the 'fpc()' option in the
above -svyset-), the 'deff' calculation is
deff = V_db / ( (1 - n/W) * V_srswr )
where 'V_db' is the design-based variance estimate, 'V_srswr' is the simple
random sample with replacement variance estimate, 'n' is the sample size, and
'W' is an estimate of the population size. Here 'W' is the sum of the
sampling weights. Since Angel's sampling weights are normalized, they cannot
be used to estimate the population size: 'W' is less than 'n', so the factor
(1 - n/W) is negative and 'deff' comes out negative as well. Thus the above
'deff' calculation is not valid. Without knowing the population size, we
can't compute a valid 'deff' statistic.
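One way to check whether the weights are normalized is to compare their sum
with the number of observations. The lines below are only a sketch: they
assume the estimation sample is simply the observations with nonmissing
'p29', which may not match exactly what -svy- uses if other survey variables
have missing values.

```stata
* Sum of the sampling weights versus the number of observations.
* Assumption: the estimation sample is the set of obs with nonmissing p29.
quietly summarize weight2007 if !missing(p29)
display "sum of weights (W) = " r(sum)
display "observations   (n) = " r(N)
* The factor that appears in the denominator of deff; negative when W < n.
display "1 - n/W            = " 1 - r(N)/r(sum)
```

If the last line is zero or negative, the weights cannot be standing in for
the population size.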
On the other hand, the 'deft' calculation is
deft = sqrt( V_db / V_srswr )
which does not need an estimate of the population size, and thus will always
produce a valid value.
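Combining the two formulas gives deff = deft^2 / (1 - n/W), so the reported
'deff' can be reproduced by hand from the numbers in Angel's output. This is
just a check of the arithmetic using the rounded values shown in the table,
not a replacement for -estat effects-.

```stata
* Values copied from the output above (decimal commas read as points).
scalar n    = 12140
scalar W    = 12134.139
scalar deft = 0.9783
* Finite population factor implied by the normalized weights.
display "1 - n/W = " 1 - n/W
* deff recovered from deft; the negative factor flips the sign.
display "deff    = " deft^2 / (1 - n/W)
```

The factor is about -0.00048, so the result is close to the -1981 shown in
the table, up to rounding of 'deft'.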
We will look into changing -svy: tabulate- and -estat effects- to report
missing values for 'deff' in the case where the 'W' calculation is less than
or equal to 'n'.
--Jeff
[email protected]