This page announced the new features in Stata 17. Please see our Stata 18 page for the new features in Stata 18.

Nonparametric tests for trend

Highlights

  • nptrend performs four nonparametric tests for trend

    • Cochran–Armitage test
    • Jonckheere–Terpstra test
    • linear-by-linear test
    • Cuzick's test with ranks
  • option for exact p-values

Trend tests involve responses in ordered groups. They test whether response values tend to either increase or decrease across groups.

Trend tests are typically used when there is only a small amount of data and no covariates to control for, and a test yielding a p-value valid in small samples is desired. nptrend has an option to compute exact p-values based on Monte Carlo permutations or a full enumeration of the permutation distribution (the latter practical only for extremely small samples).

nptrend performs four different tests for trend:

  • the Cochran–Armitage test,
  • the Jonckheere–Terpstra test,
  • the linear-by-linear trend test, and
  • a test using ranks developed by Cuzick.

To calculate the Cochran–Armitage statistic for trend, you type

. nptrend relief, group(dose) carmitage


Let's see it work

In the Cochran–Armitage test (when the response is 0/1), the linear-by-linear trend test, and Cuzick's test, the groups are assigned numeric scores in addition to being ordered. The Cochran–Armitage test assesses the trend in the proportions of positive responses across the groups.

Here we have fictional data from a clinical trial of a new drug for treating migraines. The variable dose contains the dose of the drug given to a subject. The variable relief is 0/1, with 0 indicating no relief and 1 partial or total relief.

Here is a tabulation of the data:

. webuse migraine
(Fictional migraine drug data)

. tabulate dose relief, row nokey

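           |  Relief of migraine
  Mycureit |     after 2 hours
dose in mg |         0          1 |     Total
-----------+----------------------+----------
        10 |        80        120 |       200
           |     40.00      60.00 |    100.00
-----------+----------------------+----------
        20 |        92        108 |       200
           |     46.00      54.00 |    100.00
-----------+----------------------+----------
        30 |        83        117 |       200
           |     41.50      58.50 |    100.00
-----------+----------------------+----------
        40 |        63        137 |       200
           |     31.50      68.50 |    100.00
-----------+----------------------+----------
     Total |       318        482 |       800
           |     39.75      60.25 |    100.00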

We will test whether there is a trend by dose in the proportion of subjects reporting relief.

. nptrend relief, group(dose) carmitage

Cochran–Armitage test for trend

   Number of observations =      800
         Number of groups =        4
Number of response levels =        2

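-------------------------------------------
             |              Mean
             |    Group  response    Number
       Group |    score     score    of obs
-------------+-----------------------------
dose         |
          10 |       10        .6       200
          20 |       20       .54       200
          30 |       30      .585       200
          40 |       40      .685       200
-------------------------------------------

                Statistic =     .003
                Std. err. = .0015476
                        z =    1.939
               Prob > |z| =   0.0526

Test of departure from trend:
                  chi2(2) =     5.45
              Prob > chi2 =   0.0656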

nptrend first displays a table of the mean response score by group. The mean response score in this case is simply the proportion of subjects in the group reporting relief.

The Cochran–Armitage \( z \) statistic tests for a linear trend. A \( \chi^2 \) statistic that tests for departure from a linear trend is also calculated.

When the \( z \) statistic for linear trend is large in absolute value, or the \( \chi^2 \) statistic for departure from linear trend is large, the hypothesis of independence between response and group is rejected. A large \( |z| \) indicates a linear trend. A large \( \chi^2 \) indicates differences among the groups other than a linear trend.

In the example above, the test for linear trend gave a p-value of 0.0526, just short of significance at the 0.05 level. The test of departure from trend gave a p-value of 0.0656, which is weak evidence, also not significant at the 0.05 level, of a nonlinear association between dose and relief.
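The reported statistics can be checked by hand from the tabulated counts. The sketch below assumes the usual textbook form of the Cochran–Armitage statistic (a binomial variance under independence) and assumes that the departure test is the Pearson chi-squared statistic for the 4 x 2 table minus \( z^2 \). The scalar names (ca_p, ca_num, and so on) are illustrative and are not part of nptrend.

```stata
* Hand check of the Cochran-Armitage output, using the counts tabulated above.
* Group scores are the doses 10 20 30 40 (mean 25); each group has 200 subjects.
scalar ca_p   = 482/800
scalar ca_num = 120*(10-25) + 108*(20-25) + 117*(30-25) + 137*(40-25)
scalar ca_sxx = 200*((10-25)^2 + (20-25)^2 + (30-25)^2 + (40-25)^2)
scalar ca_z   = ca_num/sqrt(ca_p*(1-ca_p)*ca_sxx)

* Pearson chi-squared for the 4 x 2 table (assumed decomposition: x2 = z^2 + departure)
scalar ca_x2  = 200*((.6-ca_p)^2 + (.54-ca_p)^2 + (.585-ca_p)^2 + (.685-ca_p)^2)/(ca_p*(1-ca_p))

display "Statistic = " ca_num/ca_sxx "   z = " %5.3f ca_z "   Prob > |z| = " %6.4f 2*normal(-abs(ca_z))
display "chi2(2) = " %4.2f (ca_x2 - ca_z^2) "   Prob > chi2 = " %6.4f chi2tail(2, ca_x2 - ca_z^2)
```

Under these assumptions, the displayed values should agree with the nptrend output above: a statistic of .003, z = 1.939 with p = 0.0526, and chi2(2) = 5.45 with p = 0.0656.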

Trends other than linear can also be tested using the scoregroup() option. For this example, specifying scoregroup(1 4 9 16) would test a quadratic trend in dose.
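As a minimal sketch, the command for that alternative scoring would look like the following. It assumes the scores in scoregroup() are assigned to the groups in ascending order of dose. The output is omitted because it was not run for this page.

```stata
webuse migraine, clear
* Scores 1 4 9 16 replace the default group scores 10 20 30 40
nptrend relief, group(dose) carmitage scoregroup(1 4 9 16)
```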

The Cochran–Armitage test requires that responses be 0/1 or else the group indicator be 0/1. The other trend tests computed by nptrend have no restriction on the response; the response variable can have any value.

Here's an example with the responses being ocular exposure to ultraviolet radiation for 32 pairs of sunglasses. Sunglasses are classified into 3 groups according to the amount of visible light transmitted. We list some of the data:

. webuse sg
(Ultraviolet radiation exposure with sunglasses)

. list in 1/12, separator(6)

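     +-----------------------+
     |      group   exposure |
     |-----------------------|
  1. |      < 25%        1.4 |
  2. |      < 25%        1.4 |
  3. |      < 25%        1.4 |
  4. |      < 25%        1.6 |
  5. |      < 25%        2.3 |
  6. |      < 25%        2.5 |
     |-----------------------|
  7. | 25% to 35%         .9 |
  8. | 25% to 35%          1 |
  9. | 25% to 35%        1.1 |
 10. | 25% to 35%        1.1 |
 11. | 25% to 35%        1.2 |
 12. | 25% to 35%        1.2 |
     +-----------------------+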

The Jonckheere–Terpstra test is useful when it is not clear what the trend might be and we simply want to test for any trend. It tests whether the ordering of the responses is associated with the ordering of the groups.

To compute the Jonckheere–Terpstra test, we specify the option jterpstra.

. nptrend exposure, group(group) jterpstra

Jonckheere–Terpstra test for trend

   Number of observations =       32
         Number of groups =        3
Number of response levels =       23

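-------------------------------------------
             |              Mean
             |    Group  response    Number
       Group |    score     score    of obs
-------------+-----------------------------
group        |
       < 25% |        1  1.766667         6
  25% to 35% |        2  2.311111        18
       > 35% |        3      4.85         8
-------------------------------------------

                Statistic =       82
                Std. err. = 54.80056
                        z =    1.496
               Prob > |z| =   0.1346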

We see that the mean response score increases as the group indicator increases, but the p-value from the Jonckheere–Terpstra test is 0.1346, not reaching significance at the 0.05 level.

Because the Jonckheere–Terpstra statistic tests for any type of trend in responses across ordered groups, it will not be as powerful as a test that accurately hypothesizes the true trend. The linear-by-linear trend test allows you to do just this. The linear-by-linear trend test uses the numeric values of the responses to specify the trend being tested. How the trend is hypothesized to vary across groups is specified by the numeric values of the group variable.

The linear-by-linear statistic is equivalent to the Pearson correlation coefficient, the difference being that the Pearson correlation coefficient is standardized by the standard deviations of the scores. The p-values are slightly different because the p-value for the linear-by-linear test is based on its permutation distribution while the p-value for the Pearson correlation coefficient assumes normality.
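To see the two p-values side by side, one could run the Pearson correlation and the linear-by-linear test on the same variables. This is only a sketch of the comparison: pwcorr is Stata's pairwise correlation command, and its sig option reports the normal-theory significance level. No output is shown because the commands were not run for this page.

```stata
webuse sg, clear
* Pearson correlation between response and group, with its normal-theory p-value
pwcorr exposure group, sig
* Linear-by-linear trend test on the same two variables
nptrend exposure, group(group) linear notable
```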

To compute the linear-by-linear test, we specify the option linear. We also specify notable to suppress the display of the mean response scores by group.

. nptrend exposure, group(group) linear notable

Linear-by-linear test for trend

   Number of observations =       32
         Number of groups =        3
Number of response levels =       23

                Statistic = .7035156
                Std. err. = .3063377
                        z =    2.297
               Prob > |z| =   0.0216

The p-value from the linear-by-linear test is 0.0216, which is considerably different from the p-value computed by the Jonckheere–Terpstra test, which was 0.1346. This is not surprising because the linear-by-linear test assumes a specific trend based on numerical values, whereas the Jonckheere–Terpstra statistic tests for any trend.

The fourth trend test computed by nptrend is a test based on ranks developed by Cuzick.

. nptrend exposure, group(group) cuzick notable

Cuzick's test with rank scores

   Number of observations =       32
         Number of groups =        3
Number of response levels =       23

                Statistic =  1.65625
                Std. err. = 1.090461
                        z =    1.519
               Prob > |z| =   0.1288

In this case, Cuzick's test produces a p-value (0.1288) that is similar to the p-value from the Jonckheere–Terpstra test (0.1346).


Exact p-values

nptrend will also compute exact p-values using Monte Carlo permutations when the exact option is specified. Here we compute the exact p-value for the Jonckheere–Terpstra test.

. nptrend exposure, group(group) jterpstra notable exact

Permutations (10,000): ..........1,000..........2,000..........3,000..........4,000..........5,00
> 0..........6,000..........7,000..........8,000..........9,000..........10,000 done

Jonckheere–Terpstra test for trend

   Number of observations =       32
         Number of groups =        3
Number of response levels =       23

                Statistic =       82
                Std. err. = 54.80056
                        z =    1.496
               Prob > |z| =   0.1346
               Exact prob =   0.1510 (10,000 Monte Carlo permutations)

By default, 10,000 Monte Carlo permutations are used. This gave an exact p-value of 0.1510, differing slightly from the p-value of 0.1346 computed using a normal approximation.

Monte Carlo permutations give results with random error, so for more precision, more permutations can be computed. Below, we use 100,000 permutations, and have a dot displayed every 1,000th permutation to monitor the progress. We specify a random-number seed so we can duplicate the results and the option show, which displays a detailed table of the Monte Carlo results.

. nptrend exposure, group(group) jterpstra notable ///
> exact(montecarlo, reps(100000) dots(1000) rseed(1234) show)

Permutations (100,000): ..........10,000..........20,000..........30,000..........40,000....
> ......50,000..........60,000..........70,000..........80,000..........90,000..........100,
> 000 done

Monte Carlo permutation results                Number of observations =      32
Permutation variable: group                    Number of permutations = 100,000

-------------------------------------------------------------------------------
             |                                               Monte Carlo error
             |                                              -------------------
           T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
-------------+-----------------------------------------------------------------
       _pm_1 |        82      lower   93358  100000  .9336  .0008  .9320  .9351
             |                upper    6874  100000  .0687  .0008  .0672  .0703
             |            two-sided                  .1375  .0011  .1353  .1396
-------------------------------------------------------------------------------
Notes: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
       For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
       For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.

Jonckheere–Terpstra test for trend

   Number of observations =       32
         Number of groups =        3
Number of response levels =       23

                Statistic =       82
                Std. err. = 54.80056
                        z =    1.496
               Prob > |z| =   0.1346
               Exact prob =   0.1375 (100,000 Monte Carlo permutations)

The exact p-value from the Monte Carlo computation is 0.1375, close to the approximate p-value of 0.1346. From the detailed table of the results, we see that the 95% confidence interval for the Monte Carlo p-value is [0.1353, 0.1396], which does not include the approximate p-value.

This example has only 32 observations. Should we wish to publish the results, we would likely want to run nptrend again, specifying 1,000,000 or more permutations to reduce the Monte Carlo error further. Permutations are generated using a fast algorithm, and the computation is not time-consuming.
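A rough way to judge how many permutations are enough is the binomial standard error \( \sqrt{p(1-p)/n} \) of a proportion estimated from \( n \) permutations. The sketch below assumes that this is the formula behind the one-sided SE(p) column and uses the upper one-sided count c = 6874 from the table. The two-sided standard error is left out because the table notes describe it only as approximate.

```stata
* Binomial standard error of the upper one-sided Monte Carlo p-value, n = 100,000
display %6.4f sqrt((6874/100000)*(1 - 6874/100000)/100000)
* Same proportion if 1,000,000 permutations were used instead
display %6.4f sqrt((6874/100000)*(1 - 6874/100000)/1000000)
```

The first line displays 0.0008, matching the table. The second displays 0.0003, reflecting that the error shrinks with the square root of the number of permutations.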

For extremely small datasets, the exact(enumerate) option can be used to fully enumerate the permutation distribution. It gives an exact p-value without any Monte Carlo error.
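As a hedged sketch, the enumeration could be tried on a small subset of the sunglasses data. It assumes the dataset is sorted as in the listing above, so that the first 12 observations form two groups of 6, and that nptrend accepts the in qualifier like most Stata commands. The output is not shown because the command was not run for this page.

```stata
webuse sg, clear
* Restrict to the first 12 observations (two groups of 6) to keep the enumeration small
nptrend exposure in 1/12, group(group) jterpstra notable exact(enumerate)
```

With two groups of 6, there are only 924 distinct ways to allocate the 12 observations, so full enumeration is feasible.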


Additional resources

Read more about tests for trend in the Stata Base Reference Manual; see [R] nptrend.