Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Comparing multiple means with survey data--revisited
From
Rieza Soelaeman <[email protected]>
To
[email protected]
Date
Tue, 29 May 2012 22:37:28 -0500
Dear Stata-Lers,
I need your help clarifying an earlier point about testing the
difference between means in survey data, namely that you can't or shouldn't
do this (I have copied the thread at the end of this e-mail). I am trying to
replicate the work of a colleague who left recently. She created a table
in which the rows represent levels of one variable, the columns represent
levels of another variable, and each cell contains the mean value of a third
variable for that row/column combination, along with the number of people in
that group.
Example:
In cells: Mean of Variable A (n)
---------------------------------------------------------------------------
            Variable B (years)
---------------------------------------------------------------------------
Variable C
(months)    5-10          11-15         16-20         Total         p-value
---------------------------------------------------------------------------
0-9         -1.28 (21)    -0.57 (60)    -0.36 (75)    -0.57 (156)   0.032
10-18       -1.44 (30)    -0.92 (47)    -1.00 (54)    -1.07 (132)   0.15
19-27       -1.95 (64)    -1.68 (77)    -1.63 (126)   -1.72 (268)   0.314
28-36       -1.92 (51)    -1.83 (52)    -1.72 (104)   -1.80 (206)   0.652
37-45       -1.96 (36)    -2.01 (61)    -1.65 (54)    -1.87 (151)   0.107
---------------------------------------------------------------------------
Using -svyset-, I was able to reproduce the means and ns in each cell, but
not the significance levels for the differences between the means; she used
SPSS to get the p-values. I suspect this is because I specified the cluster,
stratum, and pweights in my -svyset- command, whereas the software she used
allowed only for the specification of weights (specifying a complex sampling
design in SPSS requires an extension that costs about $600).
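A minimal sketch of the two setups being contrasted, assuming placeholder design variables psu, stratum, and wt (these are not the actual variable names). The weights-only declaration is only an approximation of what SPSS does with a weight variable, so it is not guaranteed to reproduce the SPSS p-values; it only illustrates how dropping the cluster and stratum information changes the standard errors while leaving the means unchanged.

```stata
* Placeholder names: psu (cluster), stratum, wt (sampling weight)

* Full complex design: clusters, strata, and probability weights
svyset psu [pweight = wt], strata(stratum)
svy: mean VARIABLE_A if (VARIABLE_C==4), over(VARIABLE_B)

* Weights only: each observation is treated as its own sampling unit
svyset _n [pweight = wt]
svy: mean VARIABLE_A if (VARIABLE_C==4), over(VARIABLE_B)
```

The point estimates should agree between the two runs; only the standard errors, and therefore any test based on them, should differ.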
For those who are familiar with SPSS, she used the following syntax after
applying weights, and subsetting for a specific level of VARIABLE_C:
MEANS TABLES= VARIABLE_A BY VARIABLE_B
/CELLS MEAN COUNT STDDEV
/STATISTICS ANOVA.
I believe the equivalent in Stata to get the means and p-values is to use
the following code, but as Steve pointed out in the conversation copied
below from 2009, this is not theoretically correct:
. svy: mean VARIABLE_A if (VARIABLE_C==4), over(VARIABLE_B)
. test [VARIABLE_A]_subpop_1 = [VARIABLE_A]_subpop_2 = [VARIABLE_A]_subpop_3
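A hedged variant of the same two steps, for comparison. Stata's survey documentation recommends restricting estimation with subpop() rather than -if-, so that the full design is still used for variance estimation. The coefficient names passed to -test- depend on value labels and on the Stata version, so the names below are simply carried over from the code above and may need adjusting.

```stata
* Restrict to one level of VARIABLE_C with subpop() instead of -if-
svy, subpop(if VARIABLE_C == 4): mean VARIABLE_A, over(VARIABLE_B)

* List the coefficient vector to check the actual names
matrix list e(b)

* Adjusted Wald test that the three means are equal
* (names assumed to match those used above)
test [VARIABLE_A]_subpop_1 = [VARIABLE_A]_subpop_2 = [VARIABLE_A]_subpop_3
```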
My question is whether what I am trying to do is legitimate: should I be
comparing the means with the -svyset-/-test- commands at all, or should I
omit this comparison from my tables?
Thanks,
Rieza
-----------------------------------------------------------------------------------------------------
Re: st: comparing multiple means with survey data
________________________________