Randomization test of mean is computationally inaccessible when the number of groups exceeds two

With the advent of fast computers, the randomization test of mean (also called the permutation test) has received some attention in recent years. Here we show that the randomization test is possible only for two-group designs; comparing three groups requires a number of permutations so vast that even three groups of ten participants are beyond the capabilities of current computers. Further, we show that the number of permutations grows so quickly that simply adding one more participant per group increases the computation time by at least one order of magnitude in the three-group design, and by more when there are more groups. Hence, the exhaustive randomization test may never be a viable alternative to ANOVAs.


Denis Cousineau Université de Montréal
With the advent of fast computers, an alternative to the Analysis of Variance (ANOVA) test of mean is receiving an increasing amount of attention. This test, the randomization test (also called the permutation test), was proposed by Fisher in 1935 (Fisher, 1935/1951). It evaluates the significance of the results by examining what the data might have looked like had there been no effect of the conditions. To do so, the data are shuffled across groups and, for each permutation, the effect size is computed. Finally, the probability of the observed effect size is assessed with regard to all the possible effect sizes.
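The procedure just described can be sketched for the two-group case as follows. This is a minimal illustration, not the implementation used in the studies cited here: the function name is hypothetical, and the absolute difference between the group means is assumed as the effect-size statistic.

```python
from itertools import combinations


def exhaustive_two_group_test(group1, group2):
    """Exhaustive randomization test on the absolute difference of means.

    Returns (p_value, number_of_permutations_examined).
    The choice of statistic is an assumption made for this sketch.
    """
    pooled = list(group1) + list(group2)
    n1, total = len(group1), len(pooled)
    grand_sum = sum(pooled)

    def abs_mean_diff(sum1):
        # Difference between the two group means, given the sum of group 1.
        return abs(sum1 / n1 - (grand_sum - sum1) / (total - n1))

    observed = abs_mean_diff(sum(group1))
    count = extreme = 0
    # Each choice of n1 positions for group 1 is one permutation of the data.
    for indices in combinations(range(total), n1):
        count += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(sum(pooled[i] for i in indices)) >= observed - 1e-12:
            extreme += 1
    return extreme / count, count


# Made-up data, for illustration only.
p_value, n_perm = exhaustive_two_group_test([1, 2, 3, 4], [5, 6, 7, 8])
print(p_value, n_perm)
```

The loop visits every way of assigning the pooled data to group 1, so its running time is proportional to the number of permutations.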
The randomization test is a wonderful test because it does not require that the distribution of the population(s) from which the data are sampled be known. In particular, it does not require that the populations be normally distributed, as is the case for the ANOVA test. Further, it does not require that the variance be homogeneous across groups.
The type-I error rate and the power of the randomization test were examined in two-group designs using Monte Carlo simulations. Mewhort (2005) varied the asymmetry of the data and found the randomization test to be more powerful than the ANOVA test while maintaining the same type-I error rate. Armstrong, Bors & Cheng (2007) examined the impact of heterogeneous variances and unequal sample sizes and found that the randomization test is both powerful and reliable except when the smaller of the two groups has the larger variance. Because this last condition was extreme (a ratio of 2:1 between the sample sizes and a ratio of 9:1 between the variances), the overall pattern of results is favorable to the randomization test.
Against all these advantages, the randomization test comes with one difficulty: all the possible permutations of the data between the groups must be examined. The number of permutations increases rapidly with the number of participants. For two groups, it involves picking $n_1$ data to be placed in group 1, the remaining data being placed in the second group. The number of permutations is given by:
$$\binom{N}{n_1} = \frac{N!}{n_1!\,(N - n_1)!} \tag{1}$$
where $n_1$ is the number of data in group 1 and $N$ is the total number of data ($N = n_1 + n_2$). For example, with two groups of 10 participants, the number of permutations is 184 756, a number clearly within the grasp of current computers based on the von Neumann architecture.
The general impression is therefore that the randomization test is the test of mean to use with small sample sizes and that it will soon be used routinely.
As we show here, this impression is wrong and stems from the fact that only two-group designs were examined. Adding just one more group results in a dramatic increase in the number of permutations, and the numbers are so high that they will forever be out of reach of computers.
Computing the number of permutations for $p$ groups of $n_i$ data ($i = 1, \ldots, p$) involves first selecting $n_1$ data for group 1, then, among the remaining $N - n_1$ data, selecting $n_2$ data, and so on. The general formula is
$$\binom{N}{n_1}\binom{N - n_1}{n_2}\cdots\binom{N - n_1 - \cdots - n_{p-1}}{n_p} \tag{2}$$
which contains $p$ factors, but the last one simplifies to 1 as there is only one way to select $n_p$ data among $n_p$ remaining data. This formula simplifies to:
$$\frac{N!}{n_1!\,n_2!\cdots n_p!} \tag{3}$$
The first factor of Equation (2) is equivalent to Equation (1) in a two-group design. In the case where all the groups are of equal size ($n_1 = n_2 = \ldots = n_p = n$), this formula can be simplified to:
$$\frac{(pn)!}{(n!)^p} \tag{4}$$
As an illustration, adding a third group of 10 participants brings the number of permutations from 184 756 to more than 5 thousand billion ($5.5 \times 10^{12}$). This is an increase in the number of permutations by a factor of about 30 million.
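The counting argument above can be checked numerically. The sketch below multiplies the successive binomial coefficients of Equation (2) and compares the result with the equal-group closed form of Equation (4); the function names are hypothetical and chosen only for this illustration.

```python
from math import comb, factorial


def n_permutations(group_sizes):
    """Number of ways to split the pooled data into groups (Equation 2)."""
    remaining = sum(group_sizes)
    count = 1
    for size in group_sizes:
        count *= comb(remaining, size)  # choose this group's data
        remaining -= size               # those data are no longer available
    return count


def n_permutations_equal(p, n):
    """Closed form for p groups of n participants each (Equation 4)."""
    return factorial(p * n) // factorial(n) ** p


print(n_permutations([10, 10]))        # two groups of 10
print(n_permutations([10, 10, 10]))    # three groups of 10
print(n_permutations_equal(3, 10))     # same design, closed form
```

Because Python integers have arbitrary precision, the counts are exact even when they far exceed what could ever be enumerated.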
If we agree to run a permutation test only when the number of permutations does not exceed 10 million (requiring less than an hour on a typical computer), or 200 million (requiring less than a day; values in parentheses), Equation (4) implies that we would be able to compare (a) two groups of 12 (15) participants, (b) three groups of 5 (6) participants, (c) four groups of 3 (4) participants, or (d) five groups of 2 (3) participants. Clearly, with such small sample sizes, performing any test of means is questionable in the first place.
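The largest feasible group size under a given permutation budget can be found by direct search. The following is a small sketch assuming equal group sizes and the equal-group count $(pn)!/(n!)^p$; the function name and the loop are illustrative, and the two budgets are the ones quoted in the text.

```python
from math import factorial


def largest_group_size(p, budget):
    """Largest n such that p equal groups of n need at most `budget` permutations."""
    n = 0
    # Keep growing the groups while the next size still fits in the budget.
    while factorial(p * (n + 1)) // factorial(n + 1) ** p <= budget:
        n += 1
    return n


for p in range(2, 6):
    one_hour = largest_group_size(p, 10_000_000)
    one_day = largest_group_size(p, 200_000_000)
    print(f"{p} groups: n <= {one_hour} (10 million), n <= {one_day} (200 million)")
```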
Table 1 lists the number of permutations as a function of the number of groups and of group size (all groups assumed equal). By comparison, the number of seconds elapsed since the beginning of the universe is believed to be about 300 000 000 000 000 000 ($3 \times 10^{17}$).
Table 1. Number of permutations that have to be examined as a function of the number of groups (2 to 5) and of the number of participants per group (2 to 15).
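A table of this kind can be regenerated from the equal-group count $(pn)!/(n!)^p$. The sketch below is one possible way to do so, with a hypothetical function name and an output layout chosen for this illustration rather than taken from the published table.

```python
from math import factorial


def permutation_table(group_counts=range(2, 6), group_sizes=range(2, 16)):
    """Map each group size n to the permutation counts for every number of groups p."""
    return {
        n: [factorial(p * n) // factorial(n) ** p for p in group_counts]
        for n in group_sizes
    }


table = permutation_table()
# Header row: one column per number of groups.
print("n".rjust(3) + "".join(f"{p} groups".rjust(12) for p in range(2, 6)))
for n, row in table.items():
    # Three significant digits are enough to compare orders of magnitude.
    print(str(n).rjust(3) + "".join(f"{count:.3g}".rjust(12) for count in row))
```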
With the improvement of computers, will these figures soon be accessible? It is possible to show that this is not the case. Suppose that a computer can perform a randomization test with $p$ groups of $n$ participants in an acceptable amount of time. What would be the impact of adding just one more participant to each group? In the limit, adding one more participant to each of two groups multiplies the number of permutations by four, so that the computation time likewise increases by a factor of four. However, for a four-group design, adding one more participant to each group increases the number of permutations by a factor of 256. Following Moore's law (computers double their processing speed every year and a half), a 256-fold speed-up requires eight doublings, so it will take twelve years before this extra participant can again be handled in an acceptable amount of time.
Table 2 lists the factor of increase in the number of permutations when one extra participant is added to each group as a function of the number of data per group (all groups assumed to be of equal size). In the limit, adding one more participant to each of the $p$ groups increases the number of permutations by a factor of $p^p$.
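The growth factor and its translation into years of hardware improvement can be sketched as follows. The function names are hypothetical, and the 1.5-year doubling time is the Moore's-law figure quoted in the text, taken here as an assumption.

```python
from math import factorial, log2


def growth_factor(p, n):
    """Ratio of permutation counts when each of p groups grows from n to n + 1."""
    before = factorial(p * n) // factorial(n) ** p
    after = factorial(p * (n + 1)) // factorial(n + 1) ** p
    return after / before


def years_to_catch_up(factor, doubling_time_years=1.5):
    """Years of speed doubling needed to absorb a given slow-down factor."""
    return log2(factor) * doubling_time_years


for p in range(2, 6):
    # The finite-n factor approaches the limit p ** p from below.
    print(p, round(growth_factor(p, 10), 1), p ** p, years_to_catch_up(p ** p))
```

The ratio equals $(pn+1)(pn+2)\cdots(pn+p)/(n+1)^p$, a product of $p$ terms each no larger than $p$, which is why it stays below $p^p$ for any finite $n$.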

Discussion
If the permutation test is to become an alternative to the ANOVA test, we need to seriously reconsider the necessity of exploring all the permutations. Hayes (1996, 1998, 2000) proposed to use only a sample of permutations chosen randomly. He proposed to limit the number of permutations to 5000, but this number could now easily be increased to 50,000, a safer sample size to infer decision thresholds for small probabilities (e.g., an α of 0.01).
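A sampled version of the test can be sketched as below. This is an illustration under stated assumptions, not Hayes's procedure: the function name is hypothetical, the between-group sum of squares is assumed as the test statistic, and the "+1" in the estimate is a common convention that counts the observed arrangement among the sampled ones.

```python
import random


def sampled_permutation_test(groups, n_samples=5000, seed=None):
    """Approximate randomization test using random shuffles of the pooled data."""
    rng = random.Random(seed)
    pooled = [x for group in groups for x in group]
    sizes = [len(group) for group in groups]
    grand_mean = sum(pooled) / len(pooled)

    def between_ss(values):
        # Between-group sum of squares for consecutive chunks of `values`.
        ss, start = 0.0, 0
        for size in sizes:
            chunk = values[start:start + size]
            ss += size * (sum(chunk) / size - grand_mean) ** 2
            start += size
        return ss

    observed = between_ss(pooled)  # computed before any shuffling
    extreme = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)  # one random reassignment of data to groups
        if between_ss(pooled) >= observed - 1e-12:
            extreme += 1
    # Assumed convention: include the observed arrangement in the count.
    return (extreme + 1) / (n_samples + 1)


# Made-up data, for illustration only.
print(sampled_permutation_test([[1, 2, 3], [11, 12, 13], [21, 22, 23]], seed=1))
```

The cost is now set by `n_samples` rather than by the total number of permutations, at the price of a p-value that is only an estimate.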
The situation reported above pertains to independent-group designs. In repeated-measure designs, those figures may change drastically. Indeed, to test the significance of a within-subject factor, data need not be moved between participants. This restriction considerably reduces the number of possible permutations, as they now increase as a function of the number of participants ($n$), not the total number of measures ($N$). The total number of within-subject permutations is given by:
$$(p!)^n \tag{5}$$
where $p$ is the number of repeated measures. For example, for 10 participants measured in three conditions, there would be 60,466,176 possible permutations. This number is large, but not inaccessible to current computers. It is also about 91,803 times smaller than if independent groups had been used. Further explorations are required to assess the number of permutations in factorial designs and in designs involving both within- and between-subject factors.
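The comparison between the two designs can be reproduced with a few lines. The sketch assumes that each participant's measures are permuted independently, giving (number of measures)! arrangements per participant, and uses hypothetical function names.

```python
from math import factorial


def n_within_subject_permutations(n_participants, n_measures):
    """Each participant's measures are permuted independently of the others."""
    return factorial(n_measures) ** n_participants


def n_between_subject_permutations(n_groups, n_per_group):
    """Equal-size independent groups: (pn)! / (n!)^p."""
    return factorial(n_groups * n_per_group) // factorial(n_per_group) ** n_groups


within = n_within_subject_permutations(10, 3)
between = n_between_subject_permutations(3, 10)
print(within, between, between / within)
```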
There is still the possibility that permutation results can be computed efficiently. For example, Gill (2007) found that the two-group permutation test could be decomposed, using the Fourier transform, into a single difference statistic which can be computed in linear time. Likewise, Mewhort, Johns and Kelly (2010) showed how the Fourier transform could be used with factorial designs in which the number of levels is always 2 (e.g., a 2 × 2 design). However, it seems that a similar result cannot be achieved for sums-of-squares statistics. Hence, in the absence of a similar decomposition for multi-group designs, randomization tests may be forever inaccessible.