Re: st: Median test & ANOVA with sampling weights
I meant to finish the sentence in the second paragraph:
"This is because the mean for the population will be closer to that
of the diabetics because it contains a contribution from the diabetics."
-Steve
On Sep 21, 2008, at 12:53 PM, Steven Samuels wrote:
hafida--
1. "Since it's not a case control study, I thought that comparing
those with and without diabetes was inappropriate"
That's not correct. You want to compare diabetics to the whole
population. This is *equivalent* to comparing diabetics to
non-diabetics. There is no Stata command that compares part of a
sample with the whole sample, but there are plenty (-cendif- is
one) that will compare one part to the other part and give you a CI
for the difference.
This is easiest to illustrate with means: Suppose the mean for
diabetics for a variable is 10 and that for non-diabetics is 10.
The difference is zero. If diabetics are 10% of the population,
the mean for the population is (.1 x 10) + (.9 x 10) = 1 + 9 =
10. The difference between this and the diabetics' mean is also
zero. On the other hand, suppose that the mean for non-diabetics
is 20; the difference from the mean of the diabetics is 10. Then
the population mean is (.1 x 10) + (.9 x 20) = 1 + 18 = 19; the
difference from the mean of diabetics is 9. Notice that the
diabetic/population difference is < diabetic/non-diabetic
difference. This is because the d
2. As -cendif- is a rank procedure, you will get the same results
for any transformation. There is no need to transform.
3. If you are uncertain of basic math functions, it is time to
review; you will not be happy in epidemiology without a working
knowledge of back-transformations. To answer your question about
the "cubic": x^3 and x^(1/3) are inverses in Stata (-help
operators-). Not sure what this means? Try a Google search on:
inverse function introduction.
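
The arithmetic in point 1 and the inverse relationship in point 3 can be checked directly in Stata. This is only a sketch: o4gh is the outcome variable from the thread, while o4gh_cubed and o4gh_back are hypothetical names introduced for illustration. It assumes o4gh is non-negative (SF-36 scores run from 0 to 100), since Stata returns missing when a negative number is raised to a fractional power.

```stata
* Point 1: diabetics are 10% of the population, as in the example above
display "population mean:            " .1*10 + .9*20
display "diabetic vs non-diabetic:   " 20 - 10
display "diabetic vs population:     " (.1*10 + .9*20) - 10

* Point 3: cubing and taking the cube root undo each other
* (o4gh_cubed and o4gh_back are hypothetical variable names)
generate double o4gh_cubed = o4gh^3
generate double o4gh_back  = o4gh_cubed^(1/3)

* The round trip should reproduce the original values up to rounding error
assert reldif(o4gh_back, o4gh) < 1e-8 if !missing(o4gh)
```

Note that back-transforming a mean computed on the cubed scale gives a summary on the original scale, but not the arithmetic mean of the original values.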
I strongly suggest that you consult a Biostatistics staff member at
Newcastle.
Good luck!
-Steve
On Sep 19, 2008, at 11:06 PM,
[email protected] wrote:
Hi Steve and all,
I think you're correctly recognising my situation: I might have
taken the sampling issue wrong so far.
For additional information, I'm working with a data set from a
national longitudinal survey with three age cohorts (young, mids,
older) which were randomly re-sampled from the Medicare database
using stratified random sampling.
. svyset [pweight=o1wtarea], strata(o4state)
pweight: o1wtarea
VCE: linearized
Single unit: missing
Strata 1: o4state
SU 1: <observations>
FPC 1: <zero>
I focus on the older cohort only, at a single time point (the 4th
survey), and my sample is those with diabetes. My project aims to
look at whether different patterns of cardiovascular medication use
are associated with quality of life (4 dimensions of SF-36). The
study design is pretty simple: cross-sectional. However, I have
received some input that a comparison between my sample and the
entire cohort (older at survey 4) is worth performing. Since it's
not a case control study, I thought that comparing those with and
without diabetes was inappropriate, leading me to consider using
-svy- (which may be equally or even more inappropriate!). Your
suggestion, however, indicates that my previous thought was ok and
I perhaps needn't use -svy- at all. Did I take it correctly?
Some of the dependent variables are skewed and -gladder- suggests a
cubic transformation as the best approximation to a normal
distribution. If
any median test is not fairly robust, is comparing transformed
means acceptable in this case? (My concern is that cubic
transformation, perhaps unlike log, will inflate type I error).
Also, what is the command to perform a back transformation from
cubic? (I'm definitely not a maths nerd :)).
thanks,
hafida--
On Sep 20, 2008, at 1:11 AM Steven Samuels to statalist wrote:
hafida--
You've given us very little information about your survey sample
and its design. More would have been helpful.
You appear to be misusing the terms "sample" and "population". A
"population" is the larger group of people represented by the
sample; statistics for a population are known from outside sources
such as a census. For example, in the U.S. a sample of 1500 people
might represent the population of millions. What you are calling
"sample" and "population" appear to be, respectively, one
subgroup of a sample (those with dmstat=1) and the entire sample.
The proper way to compare one subgroup to the whole group is to
compare the subgroup to the others. So, form two groups: group = 1
if dmstat =1 and group = 2 if dmstat is not 1 (the rest of the
sample).
-pctile- will estimate weighted medians, but the CI's will not be
correct, for they assume independent observations. To proceed, you
must know the sampling design, including cluster and stratum
information. The program -cendif- by Roger Newson (-findit
cendif-) will estimate differences in the medians and accommodates
sampling weights and clustering. The sign test, in contrast, is
for a set of paired independent observations, not for any list of
paired numbers.
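
The two steps above (forming the groups, then comparing them with -cendif-) might look like the following sketch. It assumes -cendif- has been installed and that its by() and centile() options behave as described in its help file; check -help cendif- before relying on it. The variable name group follows the description above; o4gh, dmstat and o1wtarea come from the original post. Observations with missing dmstat are left out of both groups here, which is an assumption about how they should be handled.

```stata
* Locate and install the user-written program (done once)
findit cendif

* group = 1 for diabetics, 2 for the rest of the sample;
* missing dmstat stays missing (an assumption, adjust if needed)
generate byte group = 2 if !missing(dmstat)
replace group = 1 if dmstat == 1

* Median difference in o4gh between the two groups, with a CI,
* using the sampling weights from the survey
cendif o4gh [pweight = o1wtarea], by(group) centile(50)
```

If the design had clusters, the cluster() option of -cendif- would be the place to name the cluster variable; the -svyset- output in this thread shows none.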
To do ANOVA, you must first -svyset- your data and use -svy: reg-.
There is nothing special about -svy: reg-; just set up the ANOVA as
you would do with ordinary -reg-. To compare individual groups to
one another, after the regression run -test-, with options -mtest
(holm)- or -mtest(sidak)-.
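
A minimal sketch of that workflow follows. The variable medgroup is hypothetical: a three-level variable coding the medication-use pattern, which is not named anywhere in the thread. The i. factor-variable notation assumes Stata 11 or later; older releases would need the -xi:- prefix and the dummy names it creates instead.

```stata
* Declare the design, using the settings shown in the thread
svyset [pweight = o1wtarea], strata(o4state)

* One-way ANOVA as a regression on group indicators
* (medgroup is a hypothetical 3-level grouping variable)
svy: regress o4gh i.medgroup

* Overall test that all group means are equal
testparm i.medgroup

* Pairwise group comparisons with Holm-adjusted p-values;
* level 1 is the omitted base category
test (2.medgroup = 0) (3.medgroup = 0) (2.medgroup = 3.medgroup), mtest(holm)
```

The three constraints are linearly dependent, so the joint test may report one as dropped; the individual adjusted tests are the ones of interest for post-hoc comparison.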
Your post shows that you are fairly new to sampling concepts.
Before proceeding, I suggest that you look at a good text; I
recommend "Sampling Design and Analysis", by Sharon Lohr. Your
faculty may be able to suggest local resources.
-Steve
On Sep 19, 2008, at 7:53 AM,
[email protected] wrote:
I'm using survey data and wonder how I can perform a
comparison between the median in the sample and in the population.
Medians were separately obtained using -pctile- or -_pctile-.
. pctile pctGH = o4gh [pw=o1wtarea], nq(4) genp(percent)
. list percent pctGH in 1/4
+-----------------+
| percent pctGH |
|-----------------|
1. | 25 50 |
2. | 50 67 |
3. | 75 77 |
4. | . . |
+-----------------+
. pctile pctileGH1 = o4gh if dmstat==1 [pw=o1wtarea], nq(4) genp(pctGH1)
. list pctGH1 pctileGH1 in 1/4
+------------------+
| pctGH1 pctileGH1 |
|------------------|
1. | 25 40 |
2. | 50 60 |
3. | 75 72 |
4. | . . |
+------------------+
Should I calculate the difference between each value in the
sample and the population first and then carry out a sign test? If
so, how are the sampling weights taken into account? (I mean, can I
subtract the weighted median in the population from each
'unweighted' value?)
Secondly, is it possible to perform one-way ANOVA with
sampling weight, particularly for post-hoc comparison? Using svy:
regress did not give enough information.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/