Home  /  Resources & support  /  FAQs  /  Combine multiple tables obtained with table or dtable using collect

Combine multiple tables obtained with table or dtable using collect

Title   Combine multiple tables obtained with table or dtable using collect
Author Mia Lv, StataCorp
Chris Cheng, StataCorp

Introduction

You may already be familiar with the new table command, but you are wondering how to combine multiple different tables obtained with table into one table (by appending them vertically or horizontally). For example, you may want to combine

. sysuse auto
(1978 automobile data)

. table (rep78), statistic(mean price) statistic(sd price)

Mean Standard deviation
Repair record 1978
1 4564.5 522.5519
2 5967.625 3579.357
3 6429.233 3525.14
4 6071.5 1709.608
5 5913 2615.763
Total 6146.043 2912.44

and

. table (foreign), statistic(mean price) statistic(sd price)

Mean Standard deviation
Car origin
Domestic 6072.423 3097.104
Foreign 6384.682 2621.915
Total 6165.257 2949.496

into one table:

Mean Standard deviation
Repair record 1978
1 4564.5 522.5519
2 5967.625 3579.357
3 6429.233 3525.14
4 6071.5 1709.608
5 5913 2615.763
Total 6146.043 2912.44
Car origin
Domestic 6072.423 3097.104
Foreign 6384.682 2621.915
Total 6165.257 2949.496

An incorrect way

Before we see the solution, I want to first show you a (commonly used) wrong way to combine two tables. Many users thought we could just use table with two variables in the row dimension because the final table has two row variables:

. table (foreign rep78), statistic(mean price) statistic(sd price)

However, this syntax will generate a different table using crossed categories between foreign and rep78 instead of including them independently. This syntax is not wrong; however, it just does not fit our purpose of appending two tables together.

Mean Standard deviation
Car origin
Domestic
Repair record 1978
1 4564.5 522.5519
2 5967.625 3579.357
3 6607.074 3661.267
4 5881.556 1592.019
5 4204.5 311.8341
Total 6179.25 3188.969
Foreign
Repair record 1978
3 4828.667 1285.613
4 6261.444 1896.092
5 6292.667 2765.629
Total 6070.143 2220.984
Total
Repair record 1978
1 4564.5 522.5519
2 5967.625 3579.357
3 6429.233 3525.14
4 6071.5 1709.608
5 5913 2615.763
Total 6146.043 2912.44

we can see that the rows are identified by two variables jointly while rep78 is nested within foreign.

Example 1a: Combine two tables vertically

Let’s continue with the example at the beginning of this FAQ, which appends two tables vertically. The appropriate way to generate the combined table is to specify the append option with each table command except the first one and then use collect layout to include all the categorical variables as the row tags.

Here is the code to generate the combined table that we are looking for:

. sysuse auto, clear
(1978 automobile data)

. collect clear

. table (rep78), statistic(mean price) statistic(sd price)

. table (foreign), statistic(mean price) statistic(sd price) append

. collect layout (rep78 foreign) (result)

The output table looks like

Mean Standard deviation
Repair record 1978
1 4564.5 522.5519
2 5967.625 3579.357
3 6429.233 3525.14
4 6071.5 1709.608
5 5913 2615.763
Total 6146.043 2912.44
Car origin
Domestic 6072.423 3097.104
Foreign 6384.682 2621.915
Total 6165.257 2949.496

Let us explain how this code works.

. collect clear

deletes anything that we previously saved in the collection.

. table (rep78), statistic(mean price) statistic(sd price)

creates the first part of the combined table in a collection called Table.

. table (foreign), statistic(mean price) statistic(sd price) append

creates the second part of the combined table also in the collection Table (default for the command table). If append was not specified here, it would clear all the elements saved previously in this collection. But if append is specified, it will not clear those elements. Instead, it will append any new elements to the existing collection Table.

After the second table command, the collect layout specification shows only the second table by default. We can type collect layout to double-check the current setting.

. collect layout

Collection: Table
      Rows: foreign
   Columns: result
   Table 1: 4 x 2

Mean Standard deviation
Car origin
Domestic 6072.423 3097.104
Foreign 6384.682 2621.915
Total 6165.257 2949.496

We see the rows only include the levels of foreign. Now let’s add rep78 to the rows using the following syntax:

. collect layout (rep78 foreign) (result)

This above syntax allows rep78 (all the levels) and foreign (all the levels) to be used to identify the table rows. And we do not need to change other specifications such as columns or the collection name.

On the other hand, if we want to include only the result of certain levels of any categorical variable instead of all the levels in the combined table, we can do that by including the levels we want in the row specification. Here is an example:

. collect layout (rep78[2 3 4] foreign[1]) (result)

Collection: Table
      Rows: rep78[2 3 4] foreign[1]
   Columns: result
   Table 1: 6 x 2

Mean Standard deviation
Repair record 1978
2 5967.625 3579.357
3 6429.233 3525.14
4 6071.5 1709.608
Car origin
Foreign 6384.682 2621.915

If you want to hide the variable label Car origin in the table row header for the variable foreign, you can use the following command:

. collect style header foreign, title(hide)

And here is the output table:

. collect layout

Collection: Table
      Rows: rep78[2 3 4] foreign[1]
   Columns: result
   Table 1: 5 x 2

Mean Standard deviation
Repair record 1978
2 5967.625 3579.357
3 6429.233 3525.14
4 6071.5 1709.608
Foreign 6384.682 2621.915

Example 1b: Combine multiple tables vertically using a loop

This method also works if you are appending more than two tables into one. Here is an example:

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. collect clear

. local varlist highlead highbp agegrp hlthstat heartatk diabetes

. foreach var of local varlist {
     table (`var'), statistic(frequency) statistic(percent) name(Table) append totals(`var')
     }

. collect layout (`varlist') (result)

Frequency Percent
High lead level
lead<25 4,655 94.08
lead>=25 293 5.92
High blood pressure
0 5,975 57.72
1 4,376 42.28
Age group
20–29 2,320 22.41
30–39 1,622 15.67
40–49 1,272 12.29
50–59 1,291 12.47
60–69 2,860 27.63
70+ 986 9.53
Health status
Excellent 2,407 23.29
Very good 2,591 25.07
Good 2,938 28.43
Fair 1,670 16.16
Poor 729 7.05
Prior heart attack
No heart attack 9,873 95.40
Had heart attack 476 4.60
Diabetes status
Not diabetic 9,850 95.18
Diabetic 499 4.82

You may notice that in this example, append is specified with every table command, even the first. This is because we are using a for loop, and we make every table command have the same options. Although append is not necessary with the first table command, it will not affect anything because we start from an empty collection (we called collect clear before).

By the way, this example aims to show you how to combine multiple tables into one using a loop. However, this type of descriptive table may be created much more easily using one dtable command. For example,

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. dtable, factor(highlead highbp agegrp hlthstat heartatk diabetes)

Summary
N 10,351 High lead level lead<25 4,655 (94.1%) lead>=25 293 (5.9%) High blood pressure 0 5,975 (57.7%) 1 4,376 (42.3%) Age group 20–29 2,320 (22.4%) 30–39 1,622 (15.7%) 40–49 1,272 (12.3%) 50–59 1,291 (12.5%) 60–69 2,860 (27.6%) 70+ 986 (9.5%) Health status Excellent 2,407 (23.3%) Very good 2,591 (25.1%) Good 2,938 (28.4%) Fair 1,670 (16.2%) Poor 729 (7.1%) Prior heart attack No heart attack 9,873 (95.4%) Had heart attack 476 (4.6%) Diabetes status Not diabetic 9,850 (95.2%) Diabetic 499 (4.8%)

Please note that dtable does not align each statistic like table does. Instead, it bundles all the statistics in the same cell and aligns the combined statistics as a whole. That is because dtable often deals with a mixture of different statistics to accommodate different variable types. However, if you want to change this style, you can further customize the table using the collect suite of commands after you call dtable. In this case, you can use the command

. collect layout (var) (result[fvfrequency fvpercent])

Example 2a: Combine two-way tables horizontally

Let’s say you want to combine

Birthweight<2500g
0 1
Race
White 73 23
Black 15 11
Other 42 25

and

Smoked during pregnancy
Nonsmoker Smoker
Race
White 44 52
Black 16 10
Other 55 12

into the following table:

Birthweight<2500g Smoked during pregnancy
0 1 Nonsmoker Smoker
Race
White 73 23 44 52
Black 15 11 16 10
Other 15 11 16 10

It contains two side-by-side tables, which are combined horizontally. How can we do that?

You can use the similar method we adopted in example 1a. In the collect layout command, instead of specifying two row tags, you should specify two column tags. Here is the code to generate the above (combined) table:

. clear all

. webuse lbw
(Hosmer & Lemeshow data)

. table race low, nototal
. table race smoke, nototal append
. collect layout (race) (low smoke)

As an alternative, you can also use dtable and collect to generate the same table:

. clear all

. webuse lbw
(Hosmer & Lemeshow data)

. dtable, 
     by(race, nototals)
     nosample
     factor(low smoke, statistics( fvfrequency))
     style(table)
. collect layout (race#result) (var)

dtable generates a descriptive table that is the transpose of our target table. Then we need to transpose that descriptive table by switching the order of row tags and column tags in collect layout to generate the table we want. For more detailed information about how to transpose a table, please see FAQ: How do you transpose a table generated with collect/table/dtable/etable?

By the way, dtable has a different default style than table. Here I specified the option style(table) with dtable to force it to have the same style as table so that the generated table will look exactly the same as the above combined table generated with table and collect.

Example 2b: Combine three one-way tables horizontally

Except for the above situations where we want to combine different tables with a joint row variable (horizontally), sometimes we want to combine tables with different row variables. For example, we want to combine these three one-way frequency tables:

. table var1

Frequency
var1
1 7
2 7
3 5
Total 19
. table var2

Frequency
var2
1 8
2 5
3 2
Total 15
. table var3

Frequency
var3
1 2
2 5
3 6
Total 13

into

Frequency
var1 var2 var3
1 7 8 2
2 7 5 5
3 5 2 6
Total 19 15 13

It makes sense to combine them because var1var3 all have the same set of unique values (1,2,3).

When we have more than one command to collect items into a collection (we need to specify the append option to avoid clearing up the previously saved items), there is a dimension called cmdset tracking different items saved by different commands. We can list all of its levels by

. collect levelsof cmdset

Now we are thinking about how to utilize that dimension to define the columns in the final table. Our first try is

. clear all
. set obs 20
. set seed 7011
. generate var1 = runiformint(1,3) if uniform()<0.8
. generate var2 = runiformint(1,3) if uniform()<0.8
. generate var3 = runiformint(1,3) if uniform()<0.8
. table var1,
. table var2, append
. table var3, append
. collect layout (var1 var2 var3) (cmdset)

However, the output table looks like

Collection: Table
      Rows: var1 var2 var3
   Columns: cmdset
   Table 1: 15 x 3

Command results index
1 2 3
var1
1 7
2 7
3 5
Total 19
var2
1 8
2 5
3 2
Total 15
var3
1 2
2 5
3 6
Total 13

Now we find that the table has nine rows instead of three rows as expected. This is because Stata sees var1[1], var2[1], and var3[1] as three different tags. Let’s fix that using collect remap and adjust the format as well.

. collect remap var2 = var1 
. collect remap var3 = var1
. collect style header var1, title(hide)
. collect style header cmdset, title(hide)
. collect label levels cmdset 1 "var1", modify
. collect label levels cmdset 2 "var2", modify
. collect label levels cmdset 3 "var3", modify
. collect layout (var1) (cmdset)

After collect remap, the dimensions var2 and var3 all become var1, so we can use var1 alone to define the rows for the table. The output table becomes

var1 var2 var3
1 7 8 2
2 7 5 5
3 5 2 6
Total 19 15 13

That’s the exact table we want to generate.

Example 2c: Combine two two-way tables vertically

The remap strategy in example 2b applies to circumstances where we want to combine tables with different row or column variables and those variables have the same set of values.

Now let’s go back to example 2a. What if we would like to put the two two-way tables

Birthweight<2500g
0 1
Race
White 73 23
Black 15 11
Other 42 25

and

Smoked during pregnancy
Nonsmoker Smoker
Race
White 44 52
Black 16 10
Other 55 12

atop one another instead of side-by-side, given both low and smoke are binary? We can use the following code:

. clear all

. webuse lbw
(Hosmer & Lemeshow data)

. table race low, nototal
. table race smoke, nototal append
. collect remap smoke = low
. collect layout (cmdset#race) (low)
. collect style header low cmdset, title(hide)
. collect label levels cmdset 1 Birthweight<2500g 2 Smoked during pregnancy
. collect layout

This table has race nested within cmdset on rows, and low defines the columns (race is remapped to low).

The output table looks like

0 1
Birthweight<2500g
Race
White 73 23
Black 15 11
Other 42 25
Smoked during pregnancy
Race
White 44 52
Black 16 10
Other 55 12

Example 3a: Combine multiple tables generated by dtable using collect combine

In the following example, we generate three descriptive tables for the same variable in three different subpopulation groups using dtable.

. dtable i.foreign mpg if rep78==3

Summary
N 30
Car origin
Domestic 27 (90.0%)
Foreign 3 (10.0%)
Mileage (mpg) 19.433 (4.141)
. dtable i.foreign mpg if rep78==4

Summary
N 18
Car origin
Domestic 9 (50.0%)
Foreign 9 (50.0%)
Mileage (mpg) 21.667 (4.935)
. dtable i.foreign mpg if rep78==5

Summary
N 11
Car origin
Domestic 2 (18.2%)
Foreign 9 (81.8%)
Mileage (mpg) 27.364 (8.732)

We want to combine them into one big table. However, we cannot use append because this option is not available with dtable. Instead, we can save them in different collections first (by specifying the option name() with dtable to overwrite the default collection name DTable) and then combine them using collect combine. When we run collect combine, Stata will create a dimension called collection in the combined collection to mark which collection each item is originally from. Then we can use the dimension collection to define the rows or columns in our table.

Here is the code:

. clear all

. sysuse auto
(1978 automobile data)

. dtable i.foreign mpg if rep78==3, name(a1)
. dtable i.foreign mpg if rep78==4, name(a2)
. dtable i.foreign mpg if rep78==5, name(a3)
. collect combine all = a1 a2 a3
. *change the display format for means and sds
. collect style cell result[mean sd], nformat(%8.2fc)
. collect layout (collection#var) (result)

Now we can specify the layout by having var nested in the three collections on rows, while result is put on the columns. In this way, we obtain a long table.

Summary
a1 N 30 Car origin Domestic 27 (90.0%) Foreign 3 (10.0%) Mileage (mpg) 19.43 (4.14) a2 N 18 Car origin Domestic 9 (50.0%) Foreign 9 (50.0%) Mileage (mpg) 21.67 (4.93) a3 N 11 Car origin Domestic 2 (18.2%) Foreign 9 (81.8%) Mileage (mpg) 27.36 (8.73)

Another possible layout is to have result nested within collection on the columns. Then we combine the three tables horizontally, and we obtain a wide table.

. collect layout (var) (collection#result)

Collection: all
      Rows: var
   Columns: collection#result
   Table 1: 5 x 3

a1 a2 a3
Summary Summary Summary
N 30 18 11 Car origin Domestic 27 (90.0%) 9 (50.0%) 2 (18.2%) Foreign 3 (10.0%) 9 (50.0%) 9 (81.8%) Mileage (mpg) 19.43 (4.14) 21.67 (4.93) 27.36 (8.73)

Example 3b: Combine tables generated by dtable and table

Occasionally, we are going to need to combine tables generated by dtable and table. For example, we want to generate a descriptive table for a set of variables, including continuous variables and categorical variables. More specifically, we need to obtain means and sds for continuous variables and obtain frequencies and percentages by rural for all the categorical variables. In this case, dtable can take care of the statistics for continuous variables. However, dtable cannot obtain factor percentages by another variable, which can be done by table. In this situation, we need to run dtable for continuous variables and table for categorical variables and then combine two tables together. Here is the complete code:

. clear all

. webuse nhanes2l
(1978 automobile data)

. * continuous variable part
. dtable height weight bpsystol albumin vitaminc zinc copper, by(rural, nototal)
. * factor variable part
. local vlist highlead highbp agegrp hlthstat heartatk diabetes
. foreach var of local vlist { 
     table (`var') (rural), statistic(frequency) statistic(percent, across(rural)) name(DTable) append nototal
     }
. collect style header highlead highbp agegrp hlthstat heartatk diabetes, title(label)
. collect layout (var highlead highbp agegrp hlthstat heartatk diabetes) (rural#result[_dtable_stats])

The output table looks like

Rural
Urban Rural
N 6,548 (63.3%) 3,803 (36.7%) Height (cm) 167.457 (9.761) 167.984 (9.465) Weight (kg) 71.427 (15.363) 72.708 (15.314) Systolic blood pressure 130.460 (23.526) 131.607 (22.980) Serum albumin (g/dL) 4.665 (0.333) 4.677 (0.328) Serum vitamin C (mg/dL) 1.066 (0.592) 0.981 (0.558) Serum zinc (mcg/dL) 86.399 (14.506) 86.691 (14.432) Serum copper (mcg/dL) 126.054 (33.126) 124.860 (31.469) High lead level lead<25 2,902 (62.3%) 1,753 (37.7%) lead>=25 209 (71.3%) 84 (28.7%) High blood pressure 0 3,798 (63.6%) 2,177 (36.4%) 1 2,750 (62.8%) 1,626 (37.2%) Age group 20–29 1,631 (70.3%) 689 (29.7%) 30–39 981 (60.5%) 641 (39.5%) 40–49 777 (61.1%) 495 (38.9%) 50–59 810 (62.7%) 481 (37.3%) 60–69 1,743 (60.9%) 1,117 (39.1%) 70+ 606 (61.5%) 380 (38.5%) Health status Excellent 1,609 (66.8%) 798 (33.2%) Very good 1,713 (66.1%) 878 (33.9%) Good 1,878 (63.9%) 1,060 (36.1%) Fair 950 (56.9%) 720 (43.1%) Poor 389 (53.4%) 340 (46.6%) Prior heart attack No heart attack 6,272 (63.5%) 3,601 (36.5%) Had heart attack 275 (57.8%) 201 (42.2%) Diabetes status Not diabetic 6,233 (63.3%) 3,617 (36.7%) Diabetic 314 (62.9%) 185 (37.1%)

In the above code, we have specified the option append with each table command to append the statistics we compute using table to the same collection that was created by the dtable command before (we must specify the collection name using name(DTable); otherwise, it will use the default name Table). If the append option is not specified, it will first clear the collection and then save its results.

You may wonder why we use the result tag result[_dtable_stats] in the final layout. This is because _dtable_stats is a composite result, which includes frequency, percent, mean, and sd, which are all the statistics involved in this table. On each row, these four statistics will not be all available together, but any available ones will be displayed. This composite result is generated by the first dtable command. You can check all the elements of this composite result by typing

. collect query composite _dtable_stats

Composite definition
Collection: DTable
 Composite: _dtable_stats
  Elements: frequency
            percent
            mean
            sd
 Delimiter: " "
      Trim: on
  Override: off

Read more about tables in the Customizable Tables and Collected Results Reference Manual.