Date   Sat, 20 Sep 2008 10:06:25 +0700

Hi Steve and all,
I think you have read my situation correctly: I may have misunderstood the sampling issue so far.
For additional information, I'm working with a data set from a national longitudinal survey with three age cohorts (young, mid-age, older), which were randomly re-sampled from the Medicare database using stratified random sampling.

. svyset [pweight=o1wtarea], strata(o4state)
      pweight: o1wtarea
          VCE: linearized
  Single unit: missing
     Strata 1: o4state
         SU 1: <observations>
        FPC 1: <zero>

I focus on the older cohort only, at a single time point (the 4th survey), and my sample is those with diabetes. My project aims to examine whether different patterns of cardiovascular medication use are associated with quality of life (4 dimensions of the SF-36). The study design is pretty simple: cross-sectional. However, I have received some input that a comparison between my sample and the entire cohort (older at survey 4) is worth performing. Since it's not a case-control study, I thought that comparing those with and without diabetes was inappropriate, which led me to consider using -svy- (which may be equally or even more inappropriate!). Your suggestion, however, indicates that my previous thought was OK and that I perhaps needn't use -svy- at all. Have I understood correctly?

Some of the dependent variables are skewed, and -gladder- suggests that a cubic transformation best approximates a normal distribution. If no median test is reasonably robust, is comparing transformed means acceptable in this case? (My concern is that a cubic transformation, perhaps unlike the log, will inflate the type I error rate.) Also, what is the command to back-transform from a cubic? (I'm definitely not a maths nerd :)).
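
On the back-transformation itself, here is a minimal sketch of what I have in mind. The variable name sf36_pf is made up for illustration, and the sketch assumes the scores are nonnegative (SF-36 dimensions run from 0 to 100), so a plain cube root is enough.

```stata
* Hypothetical variable: sf36_pf, an SF-36 dimension score (0-100).
* Cubic transformation of the kind -gladder- displays
generate double pf_cubed = sf36_pf^3

* Mean on the cubed scale
summarize pf_cubed, meanonly
scalar m3 = r(mean)

* Back-transform with a cube root (fine as written for nonnegative values)
display "back-transformed mean: " m3^(1/3)

* If values could be negative, the sign would have to be kept explicitly:
* generate double y = sign(y_cubed) * abs(y_cubed)^(1/3)
```

As far as I understand it, the back-transformed figure is the cube root of the mean of the cubes, not the ordinary mean of the original scores; for nonnegative data it can never be smaller than the ordinary mean.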


On Sep 20, 2008, at 1:11 AM Steven Samuels to statalist wrote:


You've given us very little information about your survey sample and its design. More would have been helpful.

You appear to be misusing the terms "sample" and "population". A "population" is the larger group of people represented by the sample; statistics for a population are known from outside sources such as a census. For example, in the U.S. a sample of 1500 people might represent the population of millions. What you are calling "sample" and "population" appear to be, respectively, one subgroup of a sample (those with dmstat=1) and the entire sample.

The proper way to compare one subgroup to the whole group is to compare the subgroup to the others. So, form two groups: group = 1 if dmstat =1 and group = 2 if dmstat is not 1 (the rest of the sample).
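
A minimal sketch of forming the two groups follows. Only dmstat comes from the original post; the names group and grouplbl and the label text are illustrative. The sketch also assumes that observations with missing dmstat should stay missing rather than fall into group 2.

```stata
* dmstat is the indicator from the original post; group is a new variable.
* Observations with missing dmstat are left missing (an assumption).
generate byte group = cond(dmstat == 1, 1, 2) if !missing(dmstat)

* Illustrative labels
label define grouplbl 1 "dmstat = 1" 2 "Rest of sample"
label values group grouplbl

* Check the split, including any missing values
tabulate group, missing
```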

-pctile- will estimate weighted medians, but the CIs will not be correct, for they assume independent observations. To proceed, you must know the sampling design, including cluster and stratum information. The program -cendif- by Roger Newson (-findit cendif-) will estimate differences in the medians and accommodates sampling weights and clustering. The sign test, in contrast, is for a set of paired independent observations, not for any list of paired numbers.
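
As a rough illustration of the point estimates only, the sketch below gets a weighted median in each group with -_pctile-. The outcome name sf36_pf is hypothetical; o1wtarea and dmstat are the names used in this thread. No confidence interval is computed here, for the reason given above.

```stata
* Hypothetical outcome: sf36_pf. Weight and indicator names are from the thread.
* Weighted median for the dmstat = 1 subgroup
_pctile sf36_pf [pweight = o1wtarea] if dmstat == 1, percentiles(50)
display "weighted median, dmstat = 1: " r(r1)

* Weighted median for the rest of the sample
_pctile sf36_pf [pweight = o1wtarea] if dmstat != 1 & !missing(dmstat), percentiles(50)
display "weighted median, rest:       " r(r1)
```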

To do ANOVA, you must first -svyset- your data and use -svy: reg-. There is nothing special about -svy: reg-; just set up the ANOVA as you would do with ordinary -reg-. To compare individual groups to one another, after the regression run -test-, with options -mtest(holm)- or -mtest(sidak)-.
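
A minimal sketch of that sequence is below. The -svyset- line uses the weight and stratum names from the poster's own output; the outcome sf36_pf and the three-level grouping variable medpattern (coded 1, 2, 3) are hypothetical. The factor-variable notation (i.medpattern) assumes Stata 11 or later; in earlier versions the indicators would have to be created with -xi- or by hand.

```stata
* Design declaration, using the names from the poster's svyset output
svyset [pweight = o1wtarea], strata(o4state)

* One-way ANOVA set up as a regression on group indicators.
* sf36_pf and medpattern are hypothetical names; level 1 is the base group.
svy: regress sf36_pf i.medpattern

* Compare groups 2 and 3 with the base group, Holm-adjusted
test (2.medpattern = 0) (3.medpattern = 0), mtest(holm)
```

With -mtest()-, -test- reports each hypothesis separately with an adjusted p-value, alongside the joint test.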

Your post shows that you are fairly new to sampling concepts. Before proceeding, I suggest that you look at a good text; I recommend "Sampling Design and Analysis", by Sharon Lohr.  Your faculty may be able to suggest local resources.


