Dear Statalisters,
I'm working in Stata 9.1 with survey data that include strata, cluster
and finite-population-correction variables.
I want to compare a continuous variable (quol) across the ordered
levels of a categorical variable. quol is a measure of health-related
quality of life from the EuroQol scale (social tariff). Its values
nominally lie between 0 and 1, although a few are negative. It is not
a proportion, but the highest possible score on the scale is 1. The
distribution of quol is tabulated below (sorry, I haven't been able to
copy-paste the histogram). It is strongly left-skewed, with a
discontinuity between 0.8 and 1.
tab quol
Calidad de vida (Euroquol-tarifa social) [Quality of life (Euroquol, social tariff)]
Value Freq. Percent Cum.
-,0757 6 0,05 0,05
-,0245 4 0,03 0,08
-,0161 3 0,02 0,11
,014 3 0,02 0,13
,0255 1 0,01 0,14
,0267 10 0,08 0,22
,0351 5 0,04 0,27
,0435 1 0,01 0,27
,0652 8 0,07 0,34
,0767 2 0,02 0,36
,0806 2 0,02 0,37
,0863 7 0,06 0,43
,0947 7 0,06 0,49
,1037 1 0,01 0,50
,1152 2 0,02 0,51
,1164 10 0,08 0,60
,1203 2 0,02 0,61
,1248 13 0,11 0,72
,1279 1 0,01 0,73
,1318 3 0,02 0,75
,1332 1 0,01 0,76
,1413999 1 0,01 0,77
,1459 12 0,10 0,87
,1664001 8 0,07 0,94
,1703 49 0,41 1,34
,1715 1 0,01 1,35
,1748 1 0,01 1,36
,176 6 0,05 1,41
,1799 4 0,03 1,44
,1838 1 0,01 1,45
,1844 1 0,01 1,46
,1869 1 0,01 1,47
,1875 1 0,01 1,48
,1959 1 0,01 1,48
,2009999 1 0,01 1,49
,2049 2 0,02 1,51
,2061 1 0,01 1,52
,2176 15 0,12 1,64
,2215 46 0,38 2,02
,2229 1 0,01 2,03
,2254 2 0,02 2,05
,2260001 6 0,05 2,10
,2299 7 0,06 2,16
,2311 1 0,01 2,16
,233 1 0,01 2,17
,235 1 0,01 2,18
,2356 10 0,08 2,26
,2369 1 0,01 2,27
,2426 2 0,02 2,29
,2471 1 0,01 2,30
,2561 1 0,01 2,31
,26 3 0,02 2,33
,2645 3 0,02 2,36
,2657 1 0,01 2,36
,2676001 3 0,02 2,39
,2715 23 0,19 2,58
,2727 43 0,36 2,94
,2766001 2 0,02 2,95
,2772 11 0,09 3,04
,2842 1 0,01 3,05
,2856001 3 0,02 3,08
,2881 2 0,02 3,09
,2895 1 0,01 3,10
,2946 1 0,01 3,11
,2965 1 0,01 3,12
,3022 1 0,01 3,13
,3061 5 0,04 3,17
,3112 7 0,06 3,23
,3157001 2 0,02 3,24
,3188 4 0,03 3,28
,3196 1 0,01 3,28
,3227 52 0,43 3,72
,3253 2 0,02 3,73
,3266 15 0,12 3,86
,3272001 2 0,02 3,87
,3278 8 0,07 3,94
,3292 1 0,01 3,95
,3311 6 0,05 4,00
,3368 8 0,07 4,06
,3393 3 0,02 4,09
,3477 1 0,01 4,10
,3573 1 0,01 4,10
,3612 12 0,10 4,20
,3624 3 0,02 4,23
,3663 1 0,01 4,24
,3739 55 0,46 4,69
,3747 1 0,01 4,70
,3753 2 0,02 4,72
,3778 52 0,43 5,15
,3784 2 0,02 5,17
,3862 17 0,14 5,31
,3907 3 0,02 5,33
,3989 1 0,01 5,34
,4085 1 0,01 5,35
,4124 19 0,16 5,51
,4163 40 0,33 5,84
,4175 4 0,03 5,87
,4208 7 0,06 5,93
,4253 1 0,01 5,94
,4265 1 0,01 5,95
,429 84 0,70 6,64
,4355 1 0,01 6,65
,438 1 0,01 6,66
,4458 3 0,02 6,68
,4585 13 0,11 6,79
,4636 13 0,11 6,90
,4675 90 0,75 7,65
,4681 3 0,02 7,67
,4759 55 0,46 8,13
,4765 2 0,02 8,14
,493 64 0,53 8,67
,5187 182 1,51 10,18
,5277 2 0,02 10,20
,5355 37 0,31 10,51
,5442 78 0,65 11,15
,5481001 5 0,04 11,19
,5526 16 0,13 11,33
,5827 8 0,07 11,39
,5942 68 0,56 11,96
,5993 17 0,14 12,10
,6038 31 0,26 12,36
,6077 2 0,02 12,37
,6339 12 0,10 12,47
,6378 3 0,02 12,50
,6423 2 0,02 12,51
,6454 104 0,86 13,38
,6493 116 0,96 14,34
,6538 8 0,07 14,40
,6589 9 0,07 14,48
,6839 30 0,25 14,73
,689 8 0,07 14,79
,6935 9 0,07 14,87
,6974 1 0,01 14,88
,7005 307 2,55 17,42
,705 26 0,22 17,64
,7089 26 0,22 17,85
,7351 50 0,41 18,27
,739 406 3,37 21,64
,7435 5 0,04 21,68
,7486 18 0,15 21,83
,7601 149 1,24 23,06
,7902 1.459 12,10 35,16
,7947 44 0,36 35,53
,7986 434 3,60 39,12
1 7.341 60,88 100,00
Total 12.059 100,00
At first sight, the analysis calls for one-way ANOVA (although
skewness and heterogeneity of variances could be a problem here) but,
in any case, I don't see that ANOVA or -kwallis- is supported by the
-svy- commands.
-svy: regress- could be an alternative, entering the independent
variable as a single term rather than as dummies and assessing its
significance (strong skewness could be problematic here, too).
Nevertheless, I've read in a previous post
(http://www.stata.com/statalist/archive/2004-10/msg00169.html)
that for dependent variables bounded between 0 and 1, -glm
family(binomial)- or -betafit- are recommended (although I'm not sure
whether this applies only to dependent variables that are proportions,
which is not the case here).
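To make the -svy: regress- idea concrete, here is a minimal sketch of what I have in mind. The names psu, stratum, fpcvar and level are placeholders I am using for illustration, not the real variable names in my data, and I am assuming the Stata 9 -svyset- syntax:

```stata
* Sketch only: psu, stratum, fpcvar and level are placeholder names,
* not the actual variables in the data set.

* Declare the survey design (cluster variable first, then options)
svyset psu, strata(stratum) fpc(fpcvar)

* Enter the ordered categorical variable as a single term,
* i.e. a linear trend in quol across its levels
svy: regress quol level

* Adjusted Wald test of that single coefficient
test level
```

This only covers the linear-regression route; it does not address the skewness of quol or the bounded-outcome question below.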
My questions are:
1) In a non-survey setting, do -glm- or -betafit- have to be used when
the dependent variable is not a proportion but is bounded between 0
and 1? By the way, would you recommend any introductory text
explaining this approach?
2) Can these two approaches be used in survey settings?
3) Any recommendation in case the answer to 1 or 2 is "no"?
Many thanks,
Angel Rodriguez-Laso