Re: st: Median test & ANOVA with sampling weights
hafida--
1. "Since it's not a case control study, I thought that comparing
those with and without diabetes was inappropriate"
That's not correct. You want to compare diabetics to the whole
population. This is *equivalent* to comparing diabetics to
non-diabetics. There is no Stata command that compares part of a
sample with the whole sample, but there are plenty (-cendif- is one)
that will compare one part to the other part and give you a CI for
the difference.
This is easiest to illustrate with means: Suppose the mean for
diabetics for a variable is 10 and that for non-diabetics is 10. The
difference is zero. If diabetics are 10% of the population, the mean
for the population is (.1 x 10) + (.9 x 10) = 1 + 9 = 10. The
difference between this and the diabetics' mean is also zero. On the
other hand, suppose that the mean for non-diabetics is 20; the
difference from the mean of the diabetics is 10. Then the population
mean is (.1 x 10) + (.9 x 20) = 1 + 18 = 19; the difference from the
mean of diabetics is 9. Notice that the diabetic/population difference
is smaller than the diabetic/non-diabetic difference. This is because the d
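
The arithmetic above can be checked directly in Stata. This is only a sketch using the illustrative figures from the example (10% diabetics, means of 10 and 20); the scalar names are made up for the illustration. With p the subgroup's share of the population, the subgroup-versus-whole difference equals (1 - p) times the subgroup-versus-rest difference, so the two comparisons are zero together or non-zero together:

```stata
* Illustrative figures from the example above; scalar names are hypothetical
* p      = share of diabetics in the population
* m_diab = mean for diabetics, m_rest = mean for non-diabetics
scalar p      = 0.1
scalar m_diab = 10
scalar m_rest = 20

* Population mean as a weighted average of the two subgroup means
scalar m_pop = p*m_diab + (1 - p)*m_rest

display "population mean:             " m_pop
display "diabetic vs. rest:           " m_rest - m_diab
display "diabetic vs. population:     " m_pop - m_diab
display "(1 - p) x (rest - diabetic): " (1 - p)*(m_rest - m_diab)
```

The last two lines display the same value.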
2. As -cendif- is a rank procedure, you will get the same results for
any transformation. There is no need to transform.
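
As a rough illustration of point 2, the sketch below uses the auto data shipped with Stata (not the survey data discussed here) to show that a strictly increasing transformation such as cubing a positive variable leaves the ranks unchanged, so a test computed from ranks is unchanged too. The new variable names are made up for the example, and -ranksum- stands in for any rank procedure:

```stata
* Toy data, not the survey data from this thread
sysuse auto, clear

* Cubing is strictly increasing, so the ordering of price is preserved
generate double price3 = price^3

* Ranks before and after the transformation
egen rank_raw  = rank(price)
egen rank_cube = rank(price3)
assert rank_raw == rank_cube

* A rank-based two-sample test on either scale
ranksum price,  by(foreign)
ranksum price3, by(foreign)
```

Estimates expressed in the units of the variable (a median difference, say) are of course reported on whichever scale was analysed.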
3. If you are uncertain of basic math functions, it is time to
review; you will not be happy in epidemiology without a working
knowledge of back-transformations. To answer your question about the
"cubic": x^3 and x^(1/3) are inverses in Stata (-help operators-).
Not sure what this means? Try a Google search on: inverse function
introduction.
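
A minimal sketch of the back-transformation in point 3, assuming the analysis variable is o4gh (the score used elsewhere in this thread); gh_cubed is a hypothetical name. The cube root of the mean of the cubed values is on the original scale, but it is generally not equal to the ordinary mean of the original values:

```stata
* gh_cubed is a hypothetical name for the transformed variable
generate double gh_cubed = o4gh^3

* Mean on the cubed scale
summarize gh_cubed

* Back-transform that mean with the inverse function, the cube root
display r(mean)^(1/3)

* Compare with the ordinary mean on the original scale
summarize o4gh
```

Stata returns missing when a negative number is raised to a non-integer power, so for a variable x that can be negative, sign(x)*abs(x)^(1/3) is one way to take the cube root.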
I strongly suggest that you consult a Biostatistics staff member at
Newcastle.
Good luck!
-Steve
On Sep 19, 2008, at 11:06 PM,
[email protected] wrote:
Hi Steve and all,
I think you're correctly recognising my situation: I might have
taken the sampling issue wrong so far.
For additional information, I'm working with a data set from a
national longitudinal survey with three age cohorts (young, mids,
older) which were randomly re-sampled from the Medicare database
employing stratified random sampling.
. svyset [pweight=o1wtarea], strata(o4state)
      pweight: o1wtarea
          VCE: linearized
  Single unit: missing
     Strata 1: o4state
         SU 1: <observations>
        FPC 1: <zero>
I focus on the older cohort only at a certain time point (4th survey)
and my sample is those with diabetes. My project aims to examine whether
different patterns of cardiovascular medication use are associated
with quality of life (4 dimensions of SF-36). The study design is
pretty simple, cross sectional. However, I have received some input
that a comparison between my sample and the entire cohort
(older at survey 4) is worth performing. Since it's not a case
control study, I thought that comparing those with and without
diabetes was inappropriate, leading me to consider using -svy-
(which may be equally or even more inappropriate!). Your suggestion,
however, indicates that my previous thought was ok and I perhaps
needn't use -svy- at all. Did I take it correctly?
Some of the dependent variables are skewed and -gladder- offers
the cubic transformation as the best approximation to a normal
distribution. If any median test is not fairly robust, is comparing
transformed means acceptable in this case? (My concern is that the
cubic transformation, perhaps unlike log, will inflate type I error).
Also, what is the command to perform a back transformation from
cubic? (I'm definitely not a maths nerd :)).
thanks,
hafida--
On Sep 20, 2008, at 1:11 AM Steven Samuels to statalist wrote:
hafida--
You've given us very little information about your survey sample
and its design. More would have been helpful.
You appear to be misusing the terms "sample" and "population". A
"population" is the larger group of people represented by the
sample; statistics for a population are known from outside sources
such as a census. For example, in the U.S. a sample of 1500 people
might represent the population of millions. What you are calling
"sample" and "population" appear to be, respectively, one subgroup
of a sample (those with dmstat=1) and the entire sample.
The proper way to compare one subgroup to the whole group is to
compare the subgroup to the others. So, form two groups: group = 1
if dmstat =1 and group = 2 if dmstat is not 1 (the rest of the
sample).
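
The grouping just described might be coded as follows. This is a sketch: dmstat is the indicator named in this thread, and setting group to missing when dmstat is missing is an assumption about how such cases should be handled, not something stated above:

```stata
* Everyone starts in group 2 (the rest of the sample)
generate byte group = 2

* Those with dmstat equal to 1 form group 1
replace group = 1 if dmstat == 1

* Assumption: unknown diabetes status is excluded rather than counted in group 2
replace group = . if missing(dmstat)

* Check the coding against the original variable
tabulate dmstat group, missing
```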
-pctile- will estimate weighted medians, but the CI's will not be
correct, for they assume independent observations. To proceed, you
must know the sampling design, including cluster and stratum
information. The program -cendif- by Roger Newson (-findit cendif-)
will estimate differences in the medians and accommodates sampling
weights and clustering. The sign test, in contrast, is for a set of
paired independent observations, not for any list of paired numbers.
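
A call to -cendif- might look like the sketch below. It assumes a two-level variable group coded as described earlier (1 = dmstat is 1, 2 = the rest) and the weight variable o1wtarea shown elsewhere in this thread. The syntax here is written from memory of the -cendif- help file and should be checked against -help cendif- in the installed version, particularly the options for clustering:

```stata
* Locate and install the user-written program, as suggested above
findit cendif

* Median difference in o4gh between the two groups, with sampling weights
* (group is a hypothetical variable name; see the coding described earlier)
cendif o4gh [pweight=o1wtarea], by(group)
```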
To do ANOVA, you must first -svyset- your data and use -svy: reg-.
There is nothing special about -svy: reg-; just set up the ANOVA as
you would do with ordinary -reg-. To compare individual groups to
one another, after the regression run -test- with option
-mtest(holm)- or -mtest(sidak)-.
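
As a sketch of the steps just described: the -svyset- line is the one shown elsewhere in this thread, while medgroup is a hypothetical three-level grouping variable (coded 1, 2, 3) standing in for the medication-use categories. The i. factor-variable notation needs Stata 11 or later; in earlier versions the indicators would have to be built with -xi:- or by hand:

```stata
* Declare the design: sampling weights and strata
svyset [pweight=o1wtarea], strata(o4state)

* One-way "ANOVA" as a regression on group indicators (level 1 is the base)
svy: regress o4gh i.medgroup

* Levels 2 and 3 versus the base level, with Holm-adjusted p-values
test (2.medgroup = 0) (3.medgroup = 0), mtest(holm)

* Level 2 versus level 3
test 2.medgroup = 3.medgroup
```

The first -test- also reports the joint test that both coefficients are zero, which corresponds to the overall ANOVA F test.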
Your post shows that you are fairly new to sampling concepts.
Before proceeding, I suggest that you look at a good text; I
recommend "Sampling Design and Analysis", by Sharon Lohr. Your
faculty may be able to suggest local resources.
-Steve
On Sep 19, 2008, at 7:53 AM,
[email protected] wrote:
I'm using survey data and wonder how I can perform a
comparison between the median in the sample and in the population.
Medians were separately obtained using -pctile- or -_pctile-.
. pctile pctGH = o4gh [pw=o1wtarea], nq(4) genp(percent)
. list percent pct in 1/4
     +-----------------+
     | percent   pctGH |
     |-----------------|
  1. |      25      50 |
  2. |      50      67 |
  3. |      75      77 |
  4. |       .       . |
     +-----------------+
. pctile pctileGH1 = o4gh if dmstat==1 [pw=o1wtarea], nq(4) genp(pctGH1)
. list pctGH1 pctileGH1 in 1/4
     +------------------+
     | pctGH1 pctileGH1 |
     |------------------|
  1. |     25        40 |
  2. |     50        60 |
  3. |     75        72 |
  4. |      .         . |
     +------------------+
Should I calculate the difference between each value in the
sample and population first and carry out a sign test then? If so,
how is the sampling weight taken into account? (I mean, can I use
the weighted median in the population to subtract from each
'unweighted' value?)
Secondly, is it possible to perform one-way ANOVA with sampling
weights, particularly for post-hoc comparison? Using svy: regress
did not give enough information.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/