FAQ: How does dtable handle survey data?


How do I generate a table of descriptive statistics for survey data?

Title		How does dtable handle survey data?
Author		Mia Lv, StataCorp

If your survey data have already been svyset, generating a table of descriptive statistics is straightforward: simply specify the svy option with dtable. There is no need to respecify the survey weights. All statistics are then calculated using the survey weights where applicable, and all tests are calculated using the full survey settings, including clustering and stratification. This FAQ discusses statistics and tests separately.

Statistics

When you specify svy with dtable, the default sample frequency statistic is the sum of the weights (sumw). To report the unweighted frequency instead, specify the option sample(, statistics(frequency)). For example:

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. svyset psu [pweight=finalwgt], strata(strata)
(output omitted)

. dtable, svy continuous(age , statistics(mean sd)) continuous(weight , statistics(p50))
>      factor(sex,statistics(fvfrequency fvproportion))

----------------------------
                 Summary    
----------------------------
N                117,157,513
Age (years)  42.253 (15.502)
Weight (kg)           70.420
Sex                         
  Male      56,159,480 0.479
  Female    60,998,033 0.521
----------------------------

. dtable, svy continuous(age , statistics(mean sd)) continuous(weight , statistics(p50))
>      factor(sex,statistics(fvfrequency fvproportion)) sample(Frequency,statistics(frequency))

----------------------------
                 Summary    
----------------------------
Frequency             10,351
Age (years)  42.253 (15.502)
Weight (kg)           70.420
Sex                         
  Male      56,159,480 0.479
  Female    60,998,033 0.521
----------------------------

We see that the first table reports the sum of the weights and the second one reports the sample size (frequency).
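The two sample statistics can also be checked by hand. The following is a minimal do-file sketch, not part of dtable itself: it counts the observations and sums the sampling weight variable finalwgt. Assuming no observations are dropped for missing values, the two numbers should agree with the Frequency and N rows shown above.

```stata
* Sketch: compare the unweighted sample size with the sum of the weights
webuse nhanes2l, clear

* Unweighted number of observations (the "frequency" statistic)
count

* Sum of the sampling weights (the "sumw" statistic)
quietly summarize finalwgt, meanonly
display %15.0fc r(sum)
```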

Statistics for continuous and factor variables are computed using the weights previously specified with svyset. This means that we can reproduce these statistics by specifying the weights with dtable and dropping the svy option.

. dtable [pweight=finalwgt], continuous(age , statistics(mean sd))
>      continuous(weight , statistics(p50)) factor(sex,statistics(fvfrequency fvproportion))

----------------------------
                 Summary    
----------------------------
N                117,157,513
Age (years)  42.253 (15.502)
Weight (kg)           70.420
Sex                         
  Male      56,159,480 0.479
  Female    60,998,033 0.521
----------------------------

To see the detailed formulas used to calculate statistics when weights are applied, see Methods and formulas in [R] table.
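As a cross-check outside dtable, the weighted point estimates can be compared with those from the standard survey estimation commands. This is a hedged do-file sketch: the point estimates from svy: mean and svy: proportion should match the mean and proportions in the tables above, but the standard errors these commands report are design-based measures of precision and are not the sd statistic reported by dtable.

```stata
* Sketch: cross-check dtable's weighted point estimates
webuse nhanes2l, clear
svyset psu [pweight=finalwgt], strata(strata)

* Weighted mean of age; compare the point estimate with the dtable mean
svy: mean age

* Weighted proportions of sex; compare with the fvproportion column
svy: proportion sex
```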

On the other hand, if your goal is to report descriptive statistics for a subpopulation, you need to specify both the svy and subpop() options with dtable. You can reproduce almost all the reported statistics by specifying the weights and an if qualifier with dtable; the only exceptions are the variance and sd statistics, which use different formulas for subpopulation estimation.

For example, the following two commands will report identical results for all the statistics except variance and sd.

. dtable, svy subpop(if region==1) continuous(age, statistics(mean variance sd semean))
>      continuous(weight , statistics(p50)) factor(sex,statistics(fvfrequency fvproportion))

-----------------------------------------
                       Summary           
-----------------------------------------
N                              24,237,893
Age (years) 43.185 239.608 (15.479) 0.355
Weight (kg)                        70.420
Sex                                      
  Male                   11,880,038 0.490
  Female                 12,357,855 0.510
-----------------------------------------



. dtable if region==1 [pweight=finalwgt],  continuous(age, statistics(mean variance sd semean))
>      continuous(weight , statistics(p50)) factor(sex,statistics(fvfrequency fvproportion))

-----------------------------------------
                       Summary           
-----------------------------------------
N                              24,237,893
Age (years) 43.185 244.896 (15.649) 0.355
Weight (kg)                        70.420
Sex                                      
  Male                   11,880,038 0.490
  Female                 12,357,855 0.510
-----------------------------------------

The formula for the subpopulation variance is documented in Methods and formulas in [R] dtable.

Tests

Note that the svy option changes the list of tests supported by dtable. For continuous variables, the Kruskal–Wallis rank test (kwallis) is not allowed with svy. For factor variables, the following tests are not allowed with svy: Fisher's exact test (fisher), likelihood-ratio \(\chi^2\) test (lrchi2), Goodman and Kruskal's gamma (gamma), Kendall's \(\tau\) (kendall), and Cramér's V (cramer). Conversely, the survey-adjusted likelihood-ratio test (svylr), survey-adjusted Wald test (svywald), and survey-adjusted log-linear Wald test (svyllwald) are allowed only with svy.
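As an illustration of a survey-only test, the following do-file sketch requests the survey-adjusted likelihood-ratio test for a factor variable. It uses only option names mentioned in this FAQ; the choice of sex and race is simply an example, and no output is shown here because it has not been reproduced for this FAQ.

```stata
* Sketch: request a survey-adjusted test that is allowed only with svy
webuse nhanes2l, clear
svyset psu [pweight=finalwgt], strata(strata)

* test(svylr) asks for the survey-adjusted likelihood-ratio test
dtable, svy factor(sex, test(svylr)) by(race, tests nototal)
```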

When the svy or subpop() option is specified with dtable, the tests for continuous variables are computed using the prefix svy: or svy, subpop(): with regress, poisson, or gsem. For factor variables, the tests are computed using the prefix svy: or svy, subpop(): with tabulate twoway. Refer to Methods and formulas in [R] dtable for details. Below, we demonstrate how to reproduce the test results for both continuous and factor variables.

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. svyset psu [pweight=finalwgt], strata(strata)

Sampling weights: finalwgt
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. dtable, svy subpop(if region==1) continuous(age , test(regress)) continuous(weight,
>      test(poisson))  factor(sex, test(svywald)) by(race,tests nototal)
note: using test regress across levels of race for age.
note: using test poisson across levels of race for weight.
note: using test svywald across levels of race for sex.

---------------------------------------------------------------------
                                       Race                          
                   White             Black           Other       Test
---------------------------------------------------------------------
N           22,970,498 (94.8%) 1,112,539 (4.6%)  154,856 (0.6%)      
Age (years)    43.285 (15.483)  41.626 (15.492) 39.625 (13.223) 0.617
Weight (kg)    71.494 (14.640)  75.437 (16.948) 56.621 (10.332) 0.010
Sex                                                                  
  Male      11,314,500 (49.3%)  499,951 (44.9%)  65,587 (42.4%) 0.079
  Female    11,655,998 (50.7%)  612,588 (55.1%)  89,269 (57.6%)      
---------------------------------------------------------------------



. *reproduce the p-value for age

. quietly: svy, subpop(if region==1): regress age i.race

. testparm i.race

Adjusted Wald test

 ( 1)  2.race = 0
 ( 2)  3.race = 0

       F(  2,     6) =    0.52
            Prob > F =    0.6168

. *reproduce the p-value for weight

. quietly: svy, subpop(if region==1): poisson weight i.race

. testparm i.race

Adjusted Wald test

 ( 1)  [weight]2.race = 0
 ( 2)  [weight]3.race = 0

       F(  2,     6) =   11.14
            Prob > F =    0.0096

. *reproduce the p-value for sex

. svy, subpop(if region==1): tabulate sex race, wald
(running tabulate on estimation sample)

Number of strata =  7                             Number of obs   =      2,096
Number of PSUs   = 14                             Population size = 24,237,893
                                                  Subpop. no. obs =      2,096
                                                  Subpop. size    = 24,237,893
                                                  Design df       =          7

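-------------------------------------------
          |              Race              
      Sex |   White   Black   Other   Total
----------+--------------------------------
     Male |   .4668   .0206   .0027   .4901
   Female |   .4809   .0253   .0037   .5099
          |
    Total |   .9477   .0459   .0064       1
-------------------------------------------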

Key: Cell proportion

  Wald (Pearson):
    Unadjusted    chi2(2)         =    9.2914
    Adjusted      F(2, 6)         =    3.9820     P = 0.0793

Note: 24 strata omitted because they contain no subpopulation members.
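The same approach should extend to the other survey-adjusted tests for factor variables. The do-file sketch below assumes that dtable's svylr and svyllwald tests correspond to the lr and llwald options of svy: tabulate twoway; this mapping is inferred from the option names and should be confirmed against Methods and formulas in [R] dtable.

```stata
* Sketch: candidate commands for reproducing other survey-adjusted tests
webuse nhanes2l, clear
svyset psu [pweight=finalwgt], strata(strata)

* Likelihood-ratio statistic (assumed counterpart of test(svylr))
svy, subpop(if region==1): tabulate sex race, lr

* Log-linear Wald statistic (assumed counterpart of test(svyllwald))
svy, subpop(if region==1): tabulate sex race, llwald
```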
