As of Stata 17, the nptrend command performs four different nonparametric tests for trend: the Cochran-Armitage test, the Jonckheere-Terpstra test, the linear-by-linear trend test, and a test using ranks developed by Cuzick.
This question was originally posed on Statalist and answered by several users and StataCorp’s Bill Sribney. Bill’s answer, with slight editing, appears below.
Title:  A comparison of different tests for trend
Author: William Sribney, StataCorp
Let me make a bunch of comments comparing SAS PROC FREQ, Pearson’s correlation, Patrick Royston’s ptrend command, linear regression, logit/probit regression, Stata’s vwls command, and Stata’s nptrend command.
Let me use Les Kalish’s example:
                                 Outcome
     group          Good           Better         Best
     y_i            a_1 = 1        a_2 = 2        a_3 = 3
     ------------------------------------------------------
     y_1 = 0        19 (n_11)      31 (n_12)      67 (n_13)
     y_2 = 1         1 (n_21)       5 (n_22)      21 (n_23)
When I used SAS in grad school to analyze these data, we used
PROC FREQ DATA=...;
   TABLES GROUP*OUTCOME / CHISQ CMH SCORES=TABLE;
RUN;
The test statistic was shown on the output as
                                     DF      Value      Prob
     Mantel-Haenszel Chi-Square       1      4.515      0.034
The test statistic is not a Mantel–Haenszel statistic, at least not according to what I learned a Mantel–Haenszel statistic is (from Gary Koch at UNC; any errors here, I should add, are those of this student, not of this great researcher/teacher).
Dr. Koch called this chi-squared statistic Qs, where s stands for score.
Let me express Qs in terms of a simpler statistic, T:
T = (sum over group i)(sum over outcome j) nij * aj * yi
The aj are scores; here 1, 2, 3, but there can be other choices for the scores (I’ll get to this later).
Under the null hypothesis there is no association between group and outcome, so we can consider the permutation (i.e., randomization) distribution of T. That is, we fix the margins of the table, just as we do for Fisher’s exact test, and then consider all the possible permutations that give these same marginal counts.
Under this null hypothesis permutation distribution, it is easy to see that the mean of T is
E(T) = N * a_bar * y_bar
where a_bar is the weighted average of aj (using the marginal counts n+j):
a_bar = (sum over j) n+j * aj / N
Similarly, y_bar is a weighted average of yi.
The variance of T, under the permutation distribution, is (exactly)
V(T) = (N - 1) * Sa^2 * Sy^2
where Sa^2 is the standard deviation squared of the aj:
Sa^2 = (1/(N-1)) * (sum over j) n+j * (aj - a_bar)^2
We can compute a chi-squared statistic:
Qs = (T - E(T))^2 / V(T)
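As a quick numerical check, here is a small sketch (worked here from the cell counts in Les’s table, not quoted from the original post) that you can run as a do-file:

* worked check of T, E(T), V(T), and Qs from the cell counts above
scalar T    = 1*1 + 5*2 + 21*3                    // only the y=1 row contributes
scalar abar = (20*1 + 36*2 + 88*3)/144            // = 2.472
scalar ybar = 27/144                              // = 0.1875
scalar ET   = 144*abar*ybar                       // = 66.75
scalar Sa2  = (20*(1-abar)^2 + 36*(2-abar)^2 + 88*(3-abar)^2)/143
scalar Sy2  = (117*(0-ybar)^2 + 27*(1-ybar)^2)/143
display (T - ET)^2 / (143*Sa2*Sy2)                // Qs, about 4.51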
If you look at the formula for Qs, you see something interesting. It is simply
Qs = (N - 1) * r_ay^2
where r_ay is Pearson’s correlation coefficient for a and y.
This “test of trend” is nothing more than Pearson’s correlation coefficient.
Let’s try this.
. list

        y   a   weight
  1.    0   1       19
  2.    0   2       31
  3.    0   3       67
  4.    1   1        1
  5.    1   2        5
  6.    1   3       21

. corr y a [fw=weight]
(obs=144)

             |        y        a
-------------+------------------
           y |   1.0000
           a |   0.1777   1.0000
PROC FREQ gave chi-squared = 4.515.
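We can confirm the match from the correlation just shown with a two-line sketch (not part of the original output; it displays roughly 4.515):

. quietly corr y a [fw=weight]
. display (r(N)-1)*r(rho)^2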
Let’s now use Patrick Royston’s ptrend command. Patrick posted his ptrend command on Statalist. The data must look like the following for this command:
. list

        a   n1   n2
  1.    1   19    1
  2.    2   31    5
  3.    3   67   21
The “Chi2(1) for trend” is slightly different. It’s 4.546 rather than 4.515.
Well, ptrend is just using N rather than N − 1 in the formula:
Qtrend = Chi2(1) for trend = N * r_ay^2
Let’s go back to data arranged for the corr computation and show this.
. quietly corr y a [fw=weight]

. display r(N)*r(rho)^2
4.5464579
Qtrend is just Pearson’s correlation again. ptrend also performs a regression to compute the slope, and the test of slope = 0 is given by the Qtrend statistic; this is just the familiar relationship between Pearson’s correlation and regression.
Qdeparture (“Chi2(1) for departure”, as Royston’s output nicely labels it) is the statistic for the Cochran–Armitage test. But Qtrend and Qdeparture are usually performed at the same time, so the two are sometimes loosely lumped together under the name “Cochran–Armitage”.
The null hypothesis for the Cochran–Armitage test is that the trend is linear, and the test is for “departures” from linearity; i.e., it’s simply a goodness-of-fit test for the linear model.
Qs (or equivalently Qtrend) tests the null hypothesis of no association. Since it’s just a Pearson’s correlation, we know that it’s powerful against alternative hypotheses of monotonic trend, but it’s not at all powerful against curvilinear (or other) associations with a 0 linear component.
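If I remember the decomposition correctly, Qdeparture is just the overall Pearson chi-squared for the table minus Qtrend. A sketch under that assumption, with the y/a/weight data back in memory:

. quietly corr y a [fw=weight]
. local Qtrend = r(N)*r(rho)^2
. quietly tabulate y a [fw=weight], chi2
. display r(chi2) - `Qtrend'

For this table the departure comes out essentially zero, consistent with the near-perfect linearity.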
Rich Goldstein recommended logistic regression. Regression is certainly a better context to understand what you are doing—rather than all these chi-squared tests that are simply Pearson’s correlations or goodness-of-fit tests under another name. Since Pearson’s correlation is equivalent to a regression of y on “a”, why not just do the regression
. regress y a [fw=weight]

      Source |       SS       df       MS              Number of obs =     144
-------------+------------------------------           F(  1,   142) =    4.63
       Model |  .692624451     1  .692624451           Prob > F      =  0.0331
    Residual |  21.2448755   142    .1496118           R-squared     =  0.0316
-------------+------------------------------           Adj R-squared =  0.0248
       Total |     21.9375   143  .153409091           Root MSE      =   .3868

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |   .0955344   .0444011     2.15   0.033     .0077618     .183307
       _cons |  -.0486823   .1144041    -0.43   0.671    -.2748375     .177473
------------------------------------------------------------------------------
But recall that y is a 0/1 variable. Heck, wouldn’t you be laughed at by your colleagues if you presented this result? They’d say, “Don’t ya know anything, you dummy, you should be using logit/probit for a 0/1 dependent variable!” But call these same results a “chi-squared test for linear trend” and, oh wow, instant respectability. Your colleagues walk away thinking how smart you are and jealous about all those special statistical tests you know.
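For reference, the logit fit that Rich recommended is just this (a sketch; output omitted):

. logit y a [fw=weight]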
I guess it sounds as if I’m agreeing fully with Rich Goldstein’s recommendation for logit (or probit, which Rich didn’t mention). I’ll say some redeeming things about Pearson’s correlation (hey, let’s call it what it really is) below, but for now, let me give you another modeling alternative.
One can do a little better by using the command vwls (variance-weighted least squares) in Stata rather than regress:
. vwls y a [fw=weight]

Variance-weighted least-squares regression          Number of obs    =     144
Goodness-of-fit chi2(1) =       0.01                Model chi2(1)    =    7.79
Prob > chi2             =     0.9359                Prob > chi2      =  0.0053

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |   .0944406   .0338345     2.79   0.005     .0281261     .160755
       _cons |  -.0459275   .0758028    -0.61   0.545    -.1944984    .1026433
------------------------------------------------------------------------------
The “test for linear trend” is again the test of the coefficient of a = 0. It also gives you a goodness-of-fit test. The linear model is the same as regress, but the weighting is a little different. The weights are 1/Var(y|a). Now
Var(y|aj) = pj * (1 - pj) / n+j
where pj = n2j / n+j (recall n2j = #(y=1) for outcome j).
Essentially, this means you are downweighting (relative to regress) points aj that have pj close to 0 or 1. regress with weights nij puts too much weight on these points.
If you are determined to fit a linear model for y vs. aj, I believe vwls is a better way to do it.
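To see the downweighting numerically, here is a small sketch that computes pj and Var(y|aj) from the 6-row weighted data (the variable names nplus, n2j, pj, and vj are hypothetical, introduced just for this illustration):

. bysort a: egen nplus = total(weight)
. bysort a: egen n2j = total(cond(y==1, weight, 0))
. generate pj = n2j/nplus
. generate vj = pj*(1 - pj)/nplus
. list a nplus pj vj if y==0, noobs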
vwls can do more. y can be a continuous variable, too. The regressors x1, x2, x3, ... can be any variables as long as for each unique value of the combination of x1, x2, x3, ... there are enough points to reasonably estimate the variance of y. For example, Gary Koch had us using this to model y = # of colds versus x = 1 (drug) or 0 (placebo).
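A hypothetical call for that example (the variable names colds and drug are invented here; output omitted):

. vwls colds drug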
You can do the equivalent of vwls in SAS using PROC CATMOD with the RESPONSE MEANS statement.
As I said above, the scores aj are simply a regressor variable, and we can use anything we want for them.
Let’s revisit Les’s little dataset:
                                 Outcome
     group          Good           Better         Best
     y_i            a_1 = 1        a_2 = 2        a_3 = 3
     ------------------------------------------------------
     y_1 = 0        19 (n_11)      31 (n_12)      67 (n_13)
     y_2 = 1         1 (n_21)       5 (n_22)      21 (n_23)
Let’s now use the average ranks instead of aj.
Call these scores rj. Let’s compute Pearson’s correlation coefficient for y and rj.
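Where do these average ranks come from? Ordering the 144 observations by a, column a = 1 holds observations 1 through 20, a = 2 holds 21 through 56, and a = 3 holds 57 through 144, so the midranks are (worked out here, not quoted from the original post)

    r_1 = (1 + 20)/2   = 10.5
    r_2 = (21 + 56)/2  = 38.5
    r_3 = (57 + 144)/2 = 100.5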
. list

        y   a   weight       r
  1.    0   1       19    10.5
  2.    0   2       31    38.5
  3.    0   3       67   100.5
  4.    1   1        1    10.5
  5.    1   2        5    38.5
  6.    1   3       21   100.5
. corr y r [fw=weight]
(obs=144)

             |        y        r
-------------+------------------
           y |   1.0000
           r |   0.1755   1.0000
Computing Qs = (N - 1) * r^2 from this correlation gives our chi-squared test statistic for Pearson’s correlation coefficient, the one Gary Koch called Qs; it comes to about 4.41 here.
This is exactly what nptrend is doing. nptrend is again Pearson’s correlation coefficient by another name.
nptrend, unfortunately, does not allow weights, so we must expand the data:
. expand weight
(138 observations created)

. nptrend a, by(y)

          y    score      obs     sum of ranks
          0        0      117           8126.5
          1        1       27           2313.5

           z  =   2.10
  Prob > |z|  =  0.036

. ret list

scalars:
              r(p) =  .0358061342131948
              r(z) =  2.099122151413003
              r(T) =  2313.5
              r(N) =  144

. display r(z)^2
4.4063138
Voila, the same answer!
nptrend takes the variable given after the command—here “a”—and computes the average ranks for it just as we did. It then correlates the average ranks with the values in y.
In SAS, if you specify SCORES=MODRIDIT, the scores used are the average ranks. PROC FREQ with SCORES=MODRIDIT should give exactly what nptrend produces.
nptrend allows y to be anything. If y is 0, 1, 2, then we are doing Pearson’s correlation in 3 x 3 tables.
nptrend a, by(y) also lets you substitute different values (i.e., scores) for y: when you specify the score(scorevar) option, nptrend uses the values of scorevar in place of the values of y in computing the correlation coefficient.
nptrend always uses the average ranks as the scores for the “a” variable.
This is a little confusing. Remember, we are simply computing corr y x. y can be anything, and x can be anything in theory. But nptrend is restrictive; it allows any values for y but only allows x to be average ranks.
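For example, to rerun the test with scores 0 and 5 in place of 0 and 1 for y, a sketch (the variable name ysc is hypothetical; output omitted):

. generate ysc = cond(y==1, 5, 0)
. nptrend a, by(y) score(ysc)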
There are other ways to do r x c tables. We do not have to assume an ordering on r. If we don’t, guess what we get. ANCOVA! The scores for the outcome “a” are the “continuous” variable. We can model different slopes per block r. Test slopes = 0 with a chi-squared stat, give it a fancy name, and no one will know we are just doing ANCOVA!
Generalizing the above analysis to stratified 2 x c tables gives what I learned as the “Extended Mantel–Haenszel” test. Rothman calls it the “Mantel extension test” and then says it can be done with or without strata. Yes, it can be done with or without strata, but in the terminology I know, it’s called the Extended Mantel–Haenszel test only when there are strata. Thanks, Ken, for adding yet another name for unstratified 2 x c tables!
Note that the test for stratified tables is NOT just Pearson’s correlation coefficient in disguise. First, define, as we did before, the statistic
Th = (sum over group i)(sum over outcome j) nhij * ahj * yhi
for each stratum h. Then form the statistic T by summing over strata. Sometimes the sum is weighted by stratum totals, sometimes strata are equally weighted. (I believe that SAS does the former; Rothman gives the formula for the latter.) Since the strata are independent, the mean and variance of T are easy to compute from Th.
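Spelling out that step (an addition here, with stratum weights wh, where wh = 1 gives equal weighting), independence of the strata gives

    T = (sum over h) wh * Th
    E(T) = (sum over h) wh * E(Th)
    V(T) = (sum over h) wh^2 * V(Th)

and Q = (T - E(T))^2 / V(T) is again referred to a chi-squared distribution with 1 degree of freedom.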
I said that Qs = (N - 1) * r_ay^2 (where r_ay is Pearson’s correlation coefficient) had a chi-squared distribution. And one can use this to get a p-value for Pearson’s correlation coefficient. But this isn’t what pwcorr does. pwcorr uses a t distribution to get a p-value (which assumes normality of “a” and y). These are asymptotically equivalent procedures (under the null hypothesis).
Qs is a second-moment approximation to the permutation distribution of r_ay. The permutation distribution of r_ay makes no assumptions about “a” and y.
If you want to be fussy about p-values, you should compute the permutation distribution itself or compute higher-moment terms of the permutation distribution of r_ay.
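A Monte Carlo approximation to that permutation distribution is easy to get with permute, which shuffles y, recomputes the correlation each time, and reports a permutation p-value. A sketch, assuming the expanded (unweighted) data from the nptrend step are still in memory:

. permute y r(rho), reps(10000) nodots: corr y a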
For small N, Pearson’s correlation coefficient has an advantage over logistic regression. One can compute the permutation distribution of Pearson’s correlation coefficient exactly, and with the exact distribution the true type I error equals the nominal type I error. That won’t be the case for logistic regression. Also, I bet it’s more powerful than logistic regression for small N. One could do some simulations and look at this; it’s a good master’s paper project, perhaps. This is the case where Pearson’s correlation coefficient is a better choice than logistic regression or other regression modeling.
Clearly, we need a command to do r x c tables, stratified and unstratified, with various choices of scores.
We plan to implement something in the future. But, in the meantime, for moderate to large N, there is logit/probit regression (and vwls). If you do want Qs for linear scores, it’s easy to use corr as I’ve done here. One can also use stmh, stmc, and tabodds.
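For instance, with the original 6-row weighted data in memory, tabodds reports a score test for trend of odds for this table (a sketch; output omitted):

. tabodds y a [fw=weight]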
For average-rank scores, it’s not too difficult to compute the average ranks yourself and then use corr.
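A minimal sketch of that recipe, assuming the expanded, unweighted data are in memory (egen’s rank() function averages tied ranks by default; the variable name ravg is hypothetical):

. egen ravg = rank(a)
. quietly corr y ravg
. display (r(N)-1)*r(rho)^2

This should again come to about 4.41.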
For stratified tables, there’s nothing easy available I know of.