In the spotlight: nptrend
New in Stata 17, the nptrend command has three additional tests for trend: the Cochran–Armitage test, the Jonckheere–Terpstra test, and the linear-by-linear trend test. Also new in Stata 17 is an option for computing exact p-values.
Say you have two variables, x and y, and you want to see whether there is a trend between them, that is, whether larger values of x are associated with larger values of y.
For trend tests, x typically defines groups that are ordered. For example, x might define groups of subjects given different drug doses, say, 10, 20, 30, or 40 mg, in a clinical trial.
The y variable is typically a response. Responses could be 0 or 1, for example, whether or not a drug gave relief for a migraine headache. Or responses could be ordered categories, such as the degree of relief: none, a little, some, a lot, or complete. Or responses could be continuous values, such as the time when relief began.
Let's run an example using nptrend when responses are 0 or 1. Here we have the variable dose containing the dose of the drug given to a subject. The variable relief is 0/1, with 0 indicating no relief of the migraine and 1 indicating partial or total relief.
. tabulate dose relief, row nokey

           |  Relief of migraine
  Mycureit |     after 2 hours
dose in mg |         0          1 |     Total
-----------+----------------------+----------
        10 |        80        120 |       200
           |     40.00      60.00 |    100.00
-----------+----------------------+----------
        20 |        92        108 |       200
           |     46.00      54.00 |    100.00
-----------+----------------------+----------
        30 |        83        117 |       200
           |     41.50      58.50 |    100.00
-----------+----------------------+----------
        40 |        63        137 |       200
           |     31.50      68.50 |    100.00
-----------+----------------------+----------
     Total |       318        482 |       800
           |     39.75      60.25 |    100.00
For 0/1 responses, the Cochran–Armitage statistic tests whether there is a linear trend in response probabilities by group.
. nptrend relief, group(dose) carmitage

Cochran–Armitage test for trend

          Number of observations    = 800
          Number of groups          =   4
          Number of response levels =   2

                           Mean
             Group     response     Number
     Group | score        score     of obs
-----------+------------------------------
      dose |
        10 |    10           .6        200
        20 |    20          .54        200
        30 |    30         .585        200
        40 |    40         .685        200
nptrend reports not only the Cochran–Armitage test for linear trend but also a test for departure from linear trend, which assesses whether there is a nonlinear association between relief and dose.
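To make the linear-trend statistic concrete, here is a minimal sketch, written in Python rather than Stata so that each step of the arithmetic is visible. It uses only the counts from the tabulate output above. The function name `cochran_armitage` and the textbook binomial variance formula are assumptions of this sketch, not a description of how nptrend computes its result internally.

```python
from math import erfc, sqrt

# Counts from the tabulate output: dose -> (subjects, subjects with relief)
counts = {10: (200, 120), 20: (200, 108), 30: (200, 117), 40: (200, 137)}


def cochran_armitage(counts):
    """Return (T, z, two-sided p) for a linear trend in proportions.

    Assumption: textbook binomial variance and a normal approximation;
    nptrend's own formula may differ in detail.
    """
    n_total = sum(n for n, _ in counts.values())
    p_bar = sum(r for _, r in counts.values()) / n_total

    # T sums score * (observed successes - expected successes) over groups.
    t_stat = sum(x * (r - n * p_bar) for x, (n, r) in counts.items())

    # Variance of T under the hypothesis of a common response probability.
    sum_x = sum(n * x for x, (n, _) in counts.items())
    sum_x2 = sum(n * x * x for x, (n, _) in counts.items())
    variance = p_bar * (1 - p_bar) * (sum_x2 - sum_x**2 / n_total)

    z = t_stat / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return t_stat, z, p_two_sided


t_stat, z, p = cochran_armitage(counts)
print(f"T = {t_stat:.1f}, z = {z:.4f}, p = {p:.4f}")
```

With these counts, the sketch gives a standardized statistic of about 1.94 and a two-sided p-value of about 0.053, in line with the normal-approximation p-value of 0.0526 quoted in this article.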
The p-value for the Cochran–Armitage statistic is calculated using a normal approximation to a permutation test. nptrend optionally computes exact permutation p-values. Here a Monte Carlo procedure with 100,000 random permutations is used.
. nptrend relief, group(dose) carmitage exact(montecarlo, reps(100000) dots(1000) rseed(1234))

Permutations (100,000): ..........10,000..........20,000..........30,000........
..40,000..........50,000..........60,000..........70,000..........80,000......
....90,000..........100,000 done

Cochran–Armitage test for trend

          Number of observations    = 800
          Number of groups          =   4
          Number of response levels =   2

                           Mean
             Group     response     Number
     Group | score        score     of obs
-----------+------------------------------
      dose |
        10 |    10           .6        200
        20 |    20          .54        200
        30 |    30         .585        200
        40 |    40         .685        200
The exact p-value is 0.0592, which is larger than the normal-approximation p-value of 0.0526. If you need an accurate p-value, it is a good idea to compute an exact one using many Monte Carlo permutations.
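The Monte Carlo idea can be sketched outside Stata in a few lines of Python. The sketch below repeatedly reassigns the 482 relief outcomes at random among the 800 subjects and records how often the trend statistic is at least as far from its permutation mean as the observed one. The function name `monte_carlo_p`, the choice of 2,000 permutations, and the two-sided definition based on absolute deviation are all assumptions made for illustration. They are not nptrend's algorithm, so the result will not reproduce Stata's figure exactly.

```python
import random

# One dose score per subject, rebuilt from the tabulated group sizes.
scores = [10] * 200 + [20] * 200 + [30] * 200 + [40] * 200
n_relief = 120 + 108 + 117 + 137  # subjects with relief == 1
# Observed statistic: sum of dose scores over subjects with relief.
observed = 10 * 120 + 20 * 108 + 30 * 117 + 40 * 137


def monte_carlo_p(scores, n_relief, observed, reps=2000, seed=1234):
    """Two-sided Monte Carlo permutation p-value for the trend statistic.

    Assumption: "two-sided" means at least as far from the permutation
    mean as the observed statistic, in either direction.
    """
    rng = random.Random(seed)
    expected = n_relief * sum(scores) / len(scores)
    threshold = abs(observed - expected)
    hits = 0
    for _ in range(reps):
        # Permuting the 0/1 responses is equivalent to picking at random
        # which n_relief subjects experience relief.
        t_perm = sum(rng.sample(scores, n_relief))
        if abs(t_perm - expected) >= threshold:
            hits += 1
    return hits / reps


p_mc = monte_carlo_p(scores, n_relief, observed)
print(f"Monte Carlo p-value from 2,000 permutations: {p_mc:.3f}")
```

With only 2,000 permutations, the estimate carries a Monte Carlo standard error of roughly 0.005, which is why a far larger number of permutations is preferable when the p-value sits close to a conventional threshold.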
The Cochran–Armitage statistic tests for a linear trend in response probabilities for 0/1 responses. If the response is not 0/1, then nptrend with the option linear can be used to perform the linear-by-linear trend test, which is a permutation version of a Pearson correlation. It again tests for a linear trend.
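The connection to a Pearson correlation can be illustrated with a short Python sketch. Under random permutation of the responses, the correlation r between group scores and responses has mean 0 and variance 1/(N − 1), so r√(N − 1) is approximately standard normal. The function name `linear_by_linear_z` is hypothetical, and the sketch makes no claim about how nptrend standardizes its statistic. It is applied here to the migraine data rebuilt from the tabulated counts.

```python
from math import sqrt


def linear_by_linear_z(x, y):
    """Return (Pearson r, r * sqrt(N - 1)) for scores x and responses y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    return r, r * sqrt(n - 1)


# Rebuild subject-level data from the table: (dose, no relief, relief)
table = [(10, 80, 120), (20, 92, 108), (30, 83, 117), (40, 63, 137)]
dose, relief = [], []
for d, n0, n1 in table:
    dose += [d] * (n0 + n1)
    relief += [0] * n0 + [1] * n1

r, z_lbl = linear_by_linear_z(dose, relief)
print(f"r = {r:.4f}, standardized statistic = {z_lbl:.4f}")
```

For 0/1 responses, this standardized statistic is essentially the Cochran–Armitage trend statistic, differing only in whether the variance uses N or N − 1. The same function works unchanged for ordered-category or continuous responses.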
If you want to test for a general trend, not just a linear trend, you can use the Jonckheere–Terpstra test. For any two distinct groups, say, group \(j\) and group \(j'\) with \(j < j'\), the Jonckheere–Terpstra test looks at all possible pairs of responses \((y_{jk}, y_{j'k'})\), where \(k\) and \(k'\) run over all observations in groups \(j\) and \(j'\), respectively. A pair is concordant if the response from the higher-ordered group is the larger of the two and discordant if it is the smaller. The numbers of concordant and discordant pairs are counted, and the test statistic is the difference between them, summed across all pairs of distinct groups. The test makes no assumptions about the functional form of the trend, so it is a good choice when you want to test for trend but have no idea what shape the trend might take.
Here's an example. There are three groups of sunglasses, each with a different amount of light transmission. The response is exposure to ultraviolet radiation. The data are as follows:
      | Transmission of |
Group | visible light   | Ocular exposure to ultraviolet radiation
------+-----------------+---------------------------------------------
    1 | < 25%           | 1.4 1.4 1.4 1.6 2.3 2.5
    2 | 25 to 35%       | 0.9 1.0 1.1 1.1 1.2 1.2 1.5 1.9 2.2 2.6 2.6
      |                 | 2.6 2.8 2.8 3.2 3.5 4.3 5.1
    3 | > 35%           | 0.8 1.7 1.7 1.7 3.4 7.1 8.9 13.5
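The pair counting behind the Jonckheere–Terpstra statistic can be checked by hand on these data. The following Python sketch is an illustration only: the function name `concordant_discordant` is hypothetical, tied pairs are simply ignored, and the raw difference it returns is not necessarily scaled or standardized the way nptrend reports its statistic.

```python
from itertools import combinations

# Ocular UV exposure by group, ordered by increasing light transmission.
groups = [
    [1.4, 1.4, 1.4, 1.6, 2.3, 2.5],                  # group 1: < 25%
    [0.9, 1.0, 1.1, 1.1, 1.2, 1.2, 1.5, 1.9, 2.2,
     2.6, 2.6, 2.6, 2.8, 2.8, 3.2, 3.5, 4.3, 5.1],   # group 2: 25 to 35%
    [0.8, 1.7, 1.7, 1.7, 3.4, 7.1, 8.9, 13.5],       # group 3: > 35%
]


def concordant_discordant(groups):
    """Count concordant and discordant pairs across all pairs of groups.

    Groups must be listed in their natural order. A pair is concordant
    when the response from the later group is larger, discordant when it
    is smaller. Ties count as neither (an assumption of this sketch).
    """
    concordant = 0
    discordant = 0
    for lower, higher in combinations(groups, 2):
        for y_low in lower:
            for y_high in higher:
                if y_high > y_low:
                    concordant += 1
                elif y_high < y_low:
                    discordant += 1
    return concordant, discordant


concordant, discordant = concordant_discordant(groups)
statistic = concordant - discordant
print(f"concordant = {concordant}, discordant = {discordant}, "
      f"difference = {statistic}")
```

On these data, the count is 191 concordant and 109 discordant pairs out of 300, a difference of 82 in favor of an increasing trend.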
Here's the output from nptrend when computing the Jonckheere–Terpstra test:
. nptrend exposure, group(group) jterpstra

Jonckheere–Terpstra test for trend

          Number of observations    = 32
          Number of groups          =  3
          Number of response levels = 23

                            Mean
              Group     response     Number
      Group | score        score     of obs
------------+------------------------------
      group |
      < 25% |     1     1.766667          6
 25% to 35% |     2     2.311111         18
      > 35% |     3         4.85          8
For such a small sample size, you would likely want to run nptrend again with the exact option to get an exact p-value.
nptrend also computes Cuzick's test for trend, which was available in earlier versions of nptrend.
— by Bill Sribney
Principal Statistician and Software Developer