Confidence intervals in within-subject designs: A simpler solution to Loftus and Masson's method

Within-subject ANOVAs are a powerful tool to analyze data because the variance associated with differences between the participants is removed from the analysis. Hence, small differences, when present for most of the participants, can be significant even when the participants are very different from one another. Yet, graphs showing standard error or confidence interval bars are misleading since these bars include the between-subject variability. Loftus and Masson (1994) noticed this fact and proposed an alternative method to compute the error bars. However, i) their approach requires that the ANOVA be performed first, which is paradoxical since a graph is an aid to decide whether to perform analyses or not; ii) their method provides a single error bar for all the conditions, masking information such as the heterogeneity of variances across conditions; iii) the method proposed is difficult to implement in commonly-used graphing software. Here we propose a simple alternative and show how it can be implemented in SPSS.

Mauchly's W = 0.74, p > .50 for factor 2 and W = 0.68, p > .50 for the interaction; this test cannot be performed for factor 1 since it has only two levels, but we generated the data such that it is also homogeneous). The Greenhouse-Geisser and the Huynh-Feldt epsilons are close to 1, so we don't need to use corrections (Huynh, 1978; Rouanet and Lépine, 1970).
Using multivariate tests (such as Hotelling's T or Wilks' λ) does not change the results in any way.
The inconsistency between the graph and the analyses comes from the fact that we are using a repeated-measures design. All the participants are measured in each of the 10 combinations of the factor 1 and factor 2 levels. Hence, it is possible to assess whether a given participant systematically scores high or systematically scores low. In fact, that is what happens in the present case. Figure 2 shows the results for each individual participant. As seen, there is a tremendous amount of difference between the participants. Hence, we can safely conclude that the participants differ significantly (in fact, this information is indeed provided by most statistical software, F(1, 15) = 710, p < .001). However, in general, we don't care about this: in psychology, it is a plain fact that most humans differ. What we really want to know is whether the factors influence the results. By looking carefully at the second condition of factor 1 (the right panel of Figure 2), we see that for most of the participants, scores decrease when going from the first level of factor 2 to the fifth.
Therefore, if we could ignore the relative position of the participants, the trend would be very clear.
Figure 3 shows exactly that. Each participant's scores were adjusted so that their relative position is no longer present. As seen in the right panel, the downward trend is very clear, whereas in the other condition, on the left, there is no visible trend. Hence, we should observe an interaction.
The repeated-measures ANOVA gets those results right because it first computes the between-subject sum of squares (SS_S; Keppel, 1973) and removes it from the total sum of squares before partitioning the remaining sum of squares in the usual manner (main effects, interaction, and error terms). Hence, a great deal of variability is removed, giving the F ratio a better chance of exceeding the critical value.
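This partition can be illustrated with a short numerical sketch (hypothetical data; the 16 × 10 layout simply mirrors the design described above): the between-subject sum of squares is computed from the participant means and subtracted from the total before the remaining variability is analyzed.

```python
import numpy as np

# Hypothetical data: 16 participants (rows) x 10 conditions (columns),
# with large between-subject differences and small residual noise.
rng = np.random.default_rng(42)
subject_effect = rng.normal(0, 10, size=(16, 1))   # wide between-subject spread
data = 50 + subject_effect + rng.normal(0, 1, size=(16, 10))

grand_mean = data.mean()
ss_total = ((data - grand_mean) ** 2).sum()

# Between-subject sum of squares: each participant mean counts k times,
# once per condition.
k = data.shape[1]
subj_means = data.mean(axis=1)
ss_subjects = k * ((subj_means - grand_mean) ** 2).sum()

# What remains is partitioned into main effects, interaction, and error.
ss_within = ss_total - ss_subjects
```

With a between-subject spread this large, nearly all of the total sum of squares is removed by this first step.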
Because the mean square of the error (MSe) is a direct measure of the unexplained variation, Loftus and Masson (1994) suggested using this term for the computation of the error bars. Indeed, the ANOVA test uses the quantity √(MSe / n), where n is the number of participants, so that a confidence interval is t(α, dfe) × √(MSe / n), where α is the confidence level (often 5%) and t is obtained from a Student table with dfe degrees of freedom. By extension with other tests, we can equate the standard error to the √(MSe / n) term, as it is the part that does not depend on a confidence level.
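As a sketch of this computation, assume a one-way repeated-measures design in which the error term is the subject × condition residual (hypothetical data; the critical t value for 60 degrees of freedom is taken from a Student table):

```python
import numpy as np

# Hypothetical one-way repeated-measures data: n participants x k conditions,
# with large between-subject differences and small residual noise.
rng = np.random.default_rng(0)
n, k = 16, 5
data = rng.normal(0, 1, size=(n, k)) + rng.normal(0, 5, size=(n, 1))

grand = data.mean()
subj = data.mean(axis=1, keepdims=True)   # participant means
cond = data.mean(axis=0, keepdims=True)   # condition means

# Subject x condition residuals: the error term of the repeated-measures ANOVA
resid = data - subj - cond + grand
df_e = (n - 1) * (k - 1)                  # 15 * 4 = 60 degrees of freedom
mse = (resid ** 2).sum() / df_e

se = np.sqrt(mse / n)                     # the sqrt(MSe / n) standard error
t_crit = 2.000                            # t(.975, 60) from a Student table
half_width = t_crit * se                  # 95% confidence interval half-width
```

Note that the between-subject spread (standard deviation of 5) never enters `se`: it is absorbed by the participant means before the residuals are computed.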
Although Loftus and Masson's solution is sound in providing standard errors and confidence intervals free of between-subject differences, it has three limitations. First, the analyses must be performed first in order to get the √(MSe / n) term that is used for the error bars in the graphs. This is paradoxical since graphs should precede any analysis, as their purpose is to help anticipate the results of the analyses. In addition, in factorial designs, there is more than one error term, so which one to use is ambiguous. For instance, if you expect a main effect and no interaction, use the error term associated with that effect, whereas if you expect an interaction, use the interaction error term… Second, Loftus and Masson's method provides the size for a unique error bar that is applied to all the points. The (omnibus) ANOVA does use a single error term per effect, and in this respect, the graph is congruent with the analysis. However, we may want to look at other information on the graph. For instance, are the variances homogeneous across levels of the factors? By using a single error bar, this information is lost.
Third, most plotting software either computes error bars automatically (but then, they wrongly include between-subject differences) or requires that they be provided manually. This last technique takes a lot more time, requiring multiple steps and manual interventions.
In the following, we present an alternative technique that solves all three limitations. Further, it can be implemented easily in most statistical software. We show how to do so using SPSS 13.
Consider the data from three participants presented in Table 2. As seen from the marginal means, there seems to be an effect of the manipulation. However, there are also large differences between the participants. For instance, the first one is on average 55 ms faster than the mean of the group. Hence, if we added 55 ms to all of that participant's scores, we would "erase" the particularities of that participant. The participant mean (noted X̄_j) minus the group mean (noted X̄) measures that participant's idiosyncrasy; removing it from every score,

Y_ij = X_ij − X̄_j + X̄    (1)

for all conditions i and all participants j, erases all the individual differences. Figure 3 was made using the Ys instead of the Xs. Further, a graph of Y as a function of the conditions can be made showing means and error bars automatically. Figure 4 shows the results from the fictitious experiment. It is now evident from inspection of the graph that an interaction is present.
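Equation (1) amounts to a single line of array arithmetic. The sketch below uses hypothetical scores (chosen so that, as in the example above, the first participant is on average 55 ms faster than the group mean):

```python
import numpy as np

# Hypothetical scores: rows = participants j, columns = conditions i.
# The first participant's mean (160) is 55 ms below the group mean (215).
x = np.array([[150., 160., 170.],
              [205., 215., 225.],
              [260., 270., 280.]])

subj_mean = x.mean(axis=1, keepdims=True)   # X-bar_j, one mean per participant
grand_mean = x.mean()                       # X-bar, the group mean
y = x - subj_mean + grand_mean              # Y_ij = X_ij - X-bar_j + X-bar (1)

# Individual differences are erased: every participant now averages 215 ms,
# while the condition means are left untouched.
```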
In SPSS 13, computing the mean for each participant is performed using the following syntax:

Aggregate outfile=* mode=addvariables
  /break = subject
  /x.subj = mean(x).

where x is the name of the column containing the dependent variable and subject is the name of the column containing the subject identification.
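For readers working outside SPSS, the same two steps (attach each participant's mean, then apply equation 1) can be sketched in Python with pandas; the column names mirror the SPSS syntax above but are otherwise arbitrary:

```python
import pandas as pd

# Hypothetical long-format data, one row per participant x condition cell;
# the columns 'subject' and 'x' play the same roles as in the SPSS syntax.
df = pd.DataFrame({
    'subject':   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'condition': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'x': [150., 160., 170., 205., 215., 225., 260., 270., 280.],
})

# Equivalent of Aggregate ... mode=addvariables /break=subject /x.subj=mean(x):
# attach each participant's mean as a new column of the same table.
df['x_subj'] = df.groupby('subject')['x'].transform('mean')

# Adjusted scores of equation (1): remove the participant's own mean,
# add back the grand mean.
df['y'] = df['x'] - df['x_subj'] + df['x'].mean()
```

Means and error bars of `y` per condition can then be plotted directly by any graphing tool.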

Figure 1 :
Figure 1: Fictitious results of an experiment with two factors, the first with two levels and the second with five levels. Error bars show the mean ± 1 standard error.

Figure 2 :
Figure 2: The individual results of the 16 simulated participants of Figure 1. Left panel is for the first level of the first factor.

Figure 4 :
Figure 4: Same as in Figure 1 except that the error bars do not include variability associated with between-subject differences.

Table 1 .
Results of a 2 × 5 experiment with the first factor having two levels and the second factor having five levels. ***: p < .001

Table 2 .
Hypothetical results from a repeated-measures experiment with one factor having three levels.

Figure 3 :
Figure 3: The individual results of the 16 simulated participants of Figure 1 after the individual differences were removed.