Analysis of Variance with STATISTICA
Introduction to ANOVA / MANOVA
The Purpose of Analysis of Variance
The Partitioning of Sums of Squares
Between-Groups and Repeated Measures
Incomplete (Nested) Designs
Analysis of Covariance (ANCOVA)
Multivariate Designs: MANOVA/MANCOVA
Contrast Analysis and Post hoc Tests
Why Compare Individual Sets of Means?
Assumptions and Effects of Violating Assumptions
Deviation from Normal Distribution
Homogeneity of Variances and Covariances
Sphericity and Compound Symmetry
Methods for Analysis of Variance
Specifying the General ANOVA/MANOVA Analysis
General ANOVA/MANOVA Startup Panel and Quick Tab
One-way ANOVA in General ANOVA/MANOVA
Main Effects ANOVA in General ANOVA/MANOVA
Factorial ANOVA in General ANOVA/MANOVA
Repeated Measures ANOVA in General ANOVA/MANOVA
ANOVA/MANOVA Quick Specs – Options Tab
ANOVA/MANOVA Quick Specs – Quick Tab
Specify Within-Subjects Factor
Reviewing General ANOVA/MANOVA Results
GLM and ANOVA Results – Summary Tab
GLM, GRM, and ANOVA Results – Means Tab
GLM, GRM, and ANOVA Results – Comps Tab
GLM, GRM, and ANOVA Results – Matrix Tab
GLM, GLZ, GRM, PLS, and ANOVA Results – Report Tab
GLM, GRM, and ANOVA Results – Profiler Tab
Examples of the ANOVA Analysis
Example 1: Breakdown and One-Way ANOVA
Example 2: Simple Factorial ANOVA with Repeated Measures
Example 3: A 2 x 3 Between-Groups ANOVA Design
Example 4: A 2-Level Between-Group x 4-Level Within-Subject Repeated Measures Design
Introduction to ANOVA / MANOVA
A general introduction to ANOVA and a discussion of the general topics in the analysis of variance techniques, including repeated measures designs, ANCOVA, MANOVA, unbalanced and incomplete designs, contrast effects, post-hoc comparisons, assumptions, etc. For related information, see also Variance Components (topics related to estimation of variance components in mixed model designs), Experimental Design/DOE (topics related to specialized applications of ANOVA in industrial settings), and Repeatability and Reproducibility Analysis (topics related to specialized designs for evaluating the reliability and precision of measurement systems).
Basic Ideas
The Purpose of Analysis of Variance
In general, the purpose of analysis of variance (ANOVA) is to test for significant differences between means. Elementary Concepts provides a brief introduction to the basics of statistical significance testing. If we are only comparing two means, ANOVA will produce the same results as the t test for independent samples (if we are comparing two different groups of cases or observations) or the t test for dependent samples (if we are comparing two variables in one set of cases or observations). If you are not familiar with these tests, you may want to read Basic Statistics and Tables.
Why the name analysis of variance? It may seem odd that a procedure that compares means is called analysis of variance. However, this name is derived from the fact that in order to test for statistical significance between means, we are actually comparing (i.e., analyzing) variances.
The Partitioning of Sums of Squares
At the heart of ANOVA is the fact that variances can be divided, that is, partitioned. Remember that the variance is computed as the sum of squared deviations from the overall mean, divided by n-1 (sample size minus one). Thus, given a certain n, the variance is a function of the sums of (deviation) squares, or SS for short. Partitioning of variance works as follows. Consider this data set:
                        Group 1   Group 2
Observation 1              2         6
Observation 2              3         7
Observation 3              1         5
Mean                       2         6
Sums of Squares (SS)       2         2
Overall Mean                    4
Total Sums of Squares          28
The means for the two groups are quite different (2 and 6, respectively). The sums of squares within each group are equal to 2. Adding them together, we get 4. If we now repeat these computations ignoring group membership, that is, if we compute the total SS based on the overall mean, we get the number 28. In other words, computing the variance (sums of squares) based on the within-group variability yields a much smaller estimate of variance than computing it based on the total variability (the overall mean). The reason for this in the above example is of course that there is a large difference between means, and it is this difference that accounts for the difference in the SS. In fact, if we were to perform an ANOVA on the above data, we would get the following result:
MAIN EFFECT

          SS      df     MS      F       p
Effect    24.0     1    24.0    24.0    .008
Error      4.0     4     1.0
As can be seen in the above table, the total SS (28) was partitioned into the SS due to within-group variability (2+2=4) and variability due to differences between means (28-(2+2)=24).
SS Error and SS Effect. The within-group variability (SS) is usually referred to as Error variance. This term denotes the fact that we cannot readily explain or account for it in the current design. However, the SS Effect we can explain. Namely, it is due to the differences in means between the groups. Put another way, group membership explains this variability because we know that it is due to the differences in means.
Significance testing. The basic idea of statistical significance testing is discussed in Elementary Concepts, which also explains why many statistical tests represent ratios of explained to unexplained variability. ANOVA is a good example of this. Here, we base this test on a comparison of the variance due to the between-groups variability (called Mean Square Effect, or MSeffect) with the within-group variability (called Mean Square Error, or MSerror; this term was first used by Edgeworth, 1885). Under the null hypothesis (that there are no mean differences between groups in the population), we would still expect some minor random fluctuation in the means for the two groups when taking small samples (as in our example). Therefore, under the null hypothesis, the variance estimated based on within-group variability should be about the same as the variance due to between-groups variability. We can compare those two estimates of variance via the F test (see also F Distribution), which tests whether the ratio of the two variance estimates is significantly greater than 1. In our example above, that test is highly significant, and we would in fact conclude that the means for the two groups are significantly different from each other.
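To make this partitioning concrete, here is a minimal Python sketch (assuming NumPy/SciPy are installed; this is illustration only, not STATISTICA output) that reproduces the SS, MS, F, and p values in the table above from the raw data:

```python
from scipy import stats

group1 = [2, 3, 1]   # mean = 2
group2 = [6, 7, 5]   # mean = 6

def ss(values, mean):
    """Sum of squared deviations from a given mean."""
    return sum((x - mean) ** 2 for x in values)

ss_within = ss(group1, 2) + ss(group2, 6)   # 2 + 2 = 4  (SS Error)
ss_total  = ss(group1 + group2, 4)          # 28         (based on the overall mean)
ss_effect = ss_total - ss_within            # 24         (SS Effect)

ms_effect = ss_effect / 1                   # df Effect = groups - 1 = 1
ms_error  = ss_within / 4                   # df Error  = N - groups = 4
f_ratio   = ms_effect / ms_error            # 24.0
p_value   = stats.f.sf(f_ratio, 1, 4)       # about .008

# The same test in one call:
print(stats.f_oneway(group1, group2))       # F = 24.0, p = .008
```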
Summary of the basic logic of ANOVA. To summarize the discussion up to this point, the purpose of analysis of variance is to test differences in means (for groups or variables) for statistical significance. This is accomplished by analyzing the variance, that is, by partitioning the total variance into the component that is due to true random error (i.e., within-group SS) and the components that are due to differences between means. These latter variance components are then tested for statistical significance, and, if significant, we reject the null hypothesis of no differences between means and accept the alternative hypothesis that the means (in the population) are different from each other.
Dependent and independent variables. The variables that are measured (e.g., a test score) are called dependent variables. The variables that are manipulated or controlled (e.g., a teaching method or some other criterion used to divide observations into groups that are compared) are called factors or independent variables. For more information on this important distinction, refer to Elementary Concepts.
Multi-Factor ANOVA
In the simple example above, it may have occurred to you that we could have simply computed a t test for independent samples to arrive at the same conclusion. And, indeed, we would get the identical result if we were to compare the two groups using this test. However, ANOVA is a much more flexible and powerful technique that can be applied to much more complex research issues.
Multiple factors. The world is complex and multivariate in nature, and instances when a single variable completely explains a phenomenon are rare. For example, when trying to explore how to grow a bigger tomato, we would need to consider factors that have to do with the plants’ genetic makeup, soil conditions, lighting, temperature, etc. Thus, in a typical experiment, many factors are taken into account. One important reason for using ANOVA methods rather than multiple two-group studies analyzed via t tests is that the former method is more efficient, and with fewer observations we can gain more information. Let’s expand on this statement.
Controlling for factors. Suppose that in the above two-group example we introduce another grouping factor, for example, Gender. Imagine that in each group we have 3 males and 3 females. We could summarize this design in a 2 by 2 table:
           Experimental Group 1   Experimental Group 2
Males           2  3  1                 6  7  5
Mean               2                       6
Females         4  5  3                 8  9  7
Mean               4                       8
Before performing any computations, it appears that we can partition the total variance into at least 3 sources: (1) error (within-group) variability, (2) variability due to experimental group membership, and (3) variability due to gender. (Note that there is an additional source – interaction – that we will discuss shortly.) What would have happened had we not included gender as a factor in the study but rather computed a simple t test? If we compute the SS ignoring the gender factor (use the within-group means ignoring or collapsing across gender; the result is SS=10+10=20), we will see that the resulting within-group SS is larger than it is when we include gender (use the within-group, within-gender means to compute those SS; they will be equal to 2 in each group, thus the combined SS-within is equal to 2+2+2+2=8). This difference is due to the fact that the means for males are systematically lower than those for females, and this difference in means adds variability if we ignore this factor. Controlling for error variance increases the sensitivity (power) of a test. This example demonstrates another principle of ANOVA that makes it preferable to simple two-group t test studies: in ANOVA we can test each factor while controlling for all others; this is actually the reason why ANOVA is more statistically powerful (i.e., we need fewer observations to find a significant effect) than the simple t test.
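The error-reduction arithmetic in this paragraph is easy to verify directly; the following plain-Python sketch uses the cell data from the 2 by 2 table above and reproduces the two within-group SS totals (20 when gender is ignored, 8 when it is controlled for):

```python
def ss(values):
    """Sum of squared deviations from the sample mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

cells = {
    ("male",   "group1"): [2, 3, 1], ("male",   "group2"): [6, 7, 5],
    ("female", "group1"): [4, 5, 3], ("female", "group2"): [8, 9, 7],
}

# Collapsing across gender: one within-group SS per experimental group
collapsed = [
    cells[("male", "group1")] + cells[("female", "group1")],
    cells[("male", "group2")] + cells[("female", "group2")],
]
print(sum(ss(g) for g in collapsed))       # 10 + 10 = 20

# Keeping gender in the design: one within-cell SS per cell
print(sum(ss(c) for c in cells.values()))  # 2 + 2 + 2 + 2 = 8
```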
Interaction Effects
There is another advantage of ANOVA over simple t-tests: with ANOVA, we can detect interaction effects between variables and can therefore test more complex hypotheses about reality. Let’s consider another example to illustrate this point. (The term interaction was first used by Fisher, 1926.)
Main effects, two-way interaction. Imagine that we have a sample of highly achievement-oriented students and another of achievement “avoiders.” We now create two random halves in each sample, and give one half of each sample a challenging test, the other an easy test. We measure how hard the students work on the test. The means of this (fictitious) study are as follows:
                    Achievement-oriented   Achievement-avoiders
Challenging Test            10                      5
Easy Test                    5                     10
How can we summarize these results? Is it appropriate to conclude that (1) challenging tests make students work harder, (2) achievement-oriented students work harder than achievement-avoiders? Neither of these statements captures the essence of this clearly systematic pattern of means. The appropriate way to summarize the result would be to say that challenging tests make only achievement-oriented students work harder, while easy tests make only achievement-avoiders work harder. In other words, the type of achievement orientation and test difficulty interact in their effect on effort; specifically, this is an example of a two-way interaction between achievement orientation and test difficulty. Note that statements 1 and 2 above describe so-called main effects.
Higher order interactions. While the previous two-way interaction can be put into words relatively easily, higher order interactions are increasingly difficult to verbalize. Imagine that we had included the factor Gender in the achievement study above, and we had obtained the following pattern of means:
Females             Achievement-oriented   Achievement-avoiders
Challenging Test            10                      5
Easy Test                    5                     10

Males               Achievement-oriented   Achievement-avoiders
Challenging Test             1                      6
Easy Test                    6                      1
How can we now summarize the results of our study? Graphs of means for all effects greatly facilitate the interpretation of complex effects. The pattern shown in the table above represents a three-way interaction between factors.
Thus, we may summarize this pattern by saying that for females there is a two-way interaction between achievement-orientation type and test difficulty: Achievement-oriented females work harder on challenging tests than on easy tests, achievement-avoiding females work harder on easy tests than on difficult tests. For males, this interaction is reversed. As you can see, the description of the interaction has become much more involved.
A general way to express interactions. A general way to express all interactions is to say that an effect is modified (qualified) by another effect. Let’s try this with the two-way interaction above. The main effect for test difficulty is modified by achievement orientation. For the three-way interaction in the previous paragraph, we can say that the two-way interaction between test difficulty and achievement orientation is modified (qualified) by gender. If we have a four-way interaction, we can say that the three-way interaction is modified by the fourth variable, that is, that there are different types of interactions in the different levels of the fourth variable. As it turns out, in many areas of research five- or higher-way interactions are not that uncommon.
Complex Designs
A review of the basic “building blocks” of complex designs.
Between-Groups and Repeated Measures
When we want to compare two groups, we use the t test for independent samples; when we want to compare two variables given the same subjects (observations), we use the t test for dependent samples. This distinction – dependent and independent samples – is important for ANOVA as well. Basically, if we have repeated measurements of the same variable (under different conditions or at different points in time) on the same subjects, then the factor is a repeated measures factor (also called a within-subjects factor because to estimate its significance we compute the within-subjects SS). If we compare different groups of subjects (e.g., males and females; three strains of bacteria, etc.), we refer to the factor as a between-groups factor. The computations of significance tests are different for these different types of factors; however, the logic of computations and interpretations is the same.
Between-within designs. In many instances, experiments call for the inclusion of between-groups and repeated measures factors. For example, we may measure math skills in male and female students (gender, a between-groups factor) at the beginning and the end of the semester. The two measurements on each student would constitute a within-subjects (repeated measures) factor. The interpretation of main effects and interactions is not affected by whether a factor is between-groups or repeated measures, and both factors may obviously interact with each other (e.g., females improve over the semester while males deteriorate).
Incomplete (Nested) Designs
There are instances where we may decide to ignore interaction effects. This happens when (1) we know that the interaction effect in the population is negligible, or (2) a complete factorial design (this term was first introduced by Fisher, 1935a) cannot be used for economic reasons.
Imagine a study where we want to evaluate the effect of four fuel additives on gas mileage. For our test, our company has provided us with four cars and four drivers. A complete factorial experiment, that is, one in which each combination of driver, additive, and car appears at least once, would require 4 x 4 x 4 = 64 individual test conditions (groups). However, we may not have the resources (time) to run all of these conditions; moreover, it seems unlikely that the type of driver would interact with the fuel additive to an extent that would be of practical relevance. Given these considerations, we could actually run a so-called Latin square design and “get away” with only 16 individual groups (the four additives are denoted by letters A, B, C, and D):
            Car
            1    2    3    4
Driver 1    A    B    C    D
Driver 2    B    C    D    A
Driver 3    C    D    A    B
Driver 4    D    A    B    C
Latin square designs (this term was first used by Euler, 1782) are described in most textbooks on experimental methods (e.g., Hays, 1988; Lindman, 1974; Milliken & Johnson, 1984; Winer, 1962), and we do not want to discuss here the details of how they are constructed. Suffice it to say that this design is incomplete insofar as not all combinations of factor levels occur in the design. For example, Driver 1 will only drive Car 1 with additive A, while Driver 3 will drive that car with additive C. In a sense, the levels of the additives factor (A, B, C, and D) are placed into the cells of the car by driver matrix like “eggs into a nest.” This mnemonic device is sometimes useful for remembering the nature of nested designs.
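For illustration only (STATISTICA does not require you to construct the square yourself), the 4 x 4 Latin square shown above can be generated with a simple cyclic rule, as in this Python sketch:

```python
# Cyclic construction: the additive for (driver, car) is offset by the driver index
additives = "ABCD"
for driver in range(4):
    row = [additives[(driver + car) % 4] for car in range(4)]
    print(f"Driver {driver + 1}: ", "  ".join(row))

# Each additive appears exactly once per row (driver) and once per column (car),
# so 16 runs cover every driver-car pair while keeping the additives balanced.
```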
Note that there are several other statistical procedures that may be used to analyze these types of designs; see the section on Methods for Analysis of Variance for details. In particular, the methods discussed in the Variance Components and Mixed Model ANOVA/ANCOVA section are very efficient for analyzing designs with unbalanced nesting (when the nested factors have different numbers of levels within the levels of the factors in which they are nested), very large nested designs (e.g., with more than 200 levels overall), or hierarchically nested designs (with or without random factors).
Analysis of Covariance (ANCOVA)
General Idea
The Basic Ideas section discussed briefly the idea of “controlling” for factors and how the inclusion of additional factors can reduce the error SS and increase the statistical power (sensitivity) of our design. This idea can be extended to continuous variables, and when such continuous variables are included as factors in the design they are called covariates.
Fixed Covariates
Suppose that we want to compare the math skills of students who were randomly assigned to one of two alternative textbooks. Imagine that we also have data about the general intelligence (IQ) for each student in the study. We would suspect that general intelligence is related to math skills, and we can use this information to make our test more sensitive. Specifically, imagine that in each one of the two groups we can compute the correlation coefficient (see Basic Statistics and Tables) between IQ and math skills. Remember that once we have computed the correlation coefficient we can estimate the amount of variance in math skills that is accounted for by IQ, and the amount of (residual) variance that we cannot explain with IQ (refer also to Elementary Concepts and Basic Statistics and Tables). We may use this residual variance in the ANOVA as an estimate of the true error SS after controlling for IQ. If the correlation between IQ and math skills is substantial, then a large reduction in the error SS may be achieved.
Effect of a covariate on the F test. In the F test (see also F Distribution), to evaluate the statistical significance of between-groups differences, we compute the ratio of the between-groups variance (MSeffect) over the error variance (MSerror). If MSerror becomes smaller, due to the explanatory power of IQ, then the overall F value will become larger.
Multiple covariates. The logic described above for the case of a single covariate (IQ) can easily be extended to the case of multiple covariates. For example, in addition to IQ, we might include measures of motivation, spatial reasoning, etc., and instead of a simple correlation, compute the multiple correlation coefficient (see Multiple Regression).
When the F value gets smaller. In some studies with covariates it happens that the F value actually becomes smaller (less significant) after including covariates in the design. This is usually an indication that the covariates are not only correlated with the dependent variable (e.g., math skills), but also with the between-groups factors (e.g., the two different textbooks). For example, imagine that we measured IQ at the end of the semester, after the students in the different experimental groups had used the respective textbook for almost one year. It is possible that, even though students were initially randomly assigned to one of the two textbooks, the different books were so different that both math skills and IQ improved differentially in the two groups. In that case, the covariate will not only partition variance away from the error variance, but also from the variance due to the between-groups factor. Put another way, after controlling for the differences in IQ that were produced by the two textbooks, the math skills are not that different. Put in yet a third way, by “eliminating” the effects of IQ, we have inadvertently eliminated the true effect of the textbooks on students’ math skills.
Adjusted means. When the latter case happens, that is, when the covariate is affected by the between-groups factor, then it is appropriate to compute so-called adjusted means. These are the means that we would get after removing all differences that can be accounted for by the covariate.
Interactions between covariates and factors. Just as we can test for interactions between factors, we can also test for the interactions between covariates and between-groups factors. Specifically, imagine that one of the textbooks is particularly suited for intelligent students, while the other actually bores those students but challenges the less intelligent ones. As a result, we may find a positive correlation in the first group (the more intelligent, the better the performance), but a zero or slightly negative correlation in the second group (the more intelligent the student, the less likely he or she is to acquire math skills from the particular textbook). In some older statistics textbooks this condition is discussed as a case where the assumptions for analysis of covariance are violated (see Assumptions and Effects of Violating Assumptions). However, because ANOVA/MANOVA uses a very general approach to analysis of covariance, we can specifically estimate the statistical significance of interactions between factors and covariates.
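To illustrate these ideas with general-purpose tools, here is a hedged Python sketch using statsmodels on simulated data (the variables math, iq, and textbook, and all effect sizes, are invented for the example; this is not STATISTICA output). It fits an ANCOVA and then tests the homogeneity-of-slopes (covariate by factor interaction) hypothesis discussed above:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 30  # hypothetical study: 30 students per textbook
iq = rng.normal(100, 15, 2 * n)
textbook = np.repeat(["A", "B"], n)
math = 20 + 0.4 * iq + np.where(textbook == "B", 6, 0) + rng.normal(0, 5, 2 * n)
df = pd.DataFrame({"textbook": textbook, "iq": iq, "math": math})

# ANCOVA: test the textbook effect with IQ partialled out of the error term
ancova = smf.ols("math ~ C(textbook) + iq", data=df).fit()
print(sm.stats.anova_lm(ancova, typ=2))

# Homogeneity-of-slopes check: a significant C(textbook):iq interaction means
# the IQ-math regression slope differs between the textbook groups
slopes = smf.ols("math ~ C(textbook) * iq", data=df).fit()
print(sm.stats.anova_lm(slopes, typ=2))
```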
Changing Covariates
While fixed covariates are commonly discussed in textbooks on ANOVA, changing covariates are discussed less frequently. In general, when we have repeated measures, we are interested in testing the differences in repeated measurements on the same subjects. Thus we are actually interested in evaluating the significance of changes. If we have a covariate that is also measured at each point when the dependent variable is measured, then we can compute the correlation between the changes in the covariate and the changes in the dependent variable. For example, we could study math anxiety and math skills at the beginning and at the end of the semester. It would be interesting to see whether any changes in math anxiety over the semester correlate with changes in math skills.
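A minimal Python sketch of this idea (all numbers and variable names are invented for illustration): compute change scores for the covariate and for the dependent variable, then correlate them across subjects:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30  # hypothetical class measured at the start and end of the semester
anxiety_1 = rng.normal(50, 10, n)
anxiety_2 = anxiety_1 + rng.normal(-3, 6, n)
skills_1  = rng.normal(60, 8, n)
# Simulated so that students whose anxiety drops tend to gain more math skill
skills_2  = skills_1 + 5 - 0.3 * (anxiety_2 - anxiety_1) + rng.normal(0, 4, n)

# Correlation between change in the covariate and change in the dependent variable
r = np.corrcoef(anxiety_2 - anxiety_1, skills_2 - skills_1)[0, 1]
print(f"r(change in anxiety, change in skills) = {r:.2f}")  # negative here
```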
Multivariate Designs: MANOVA/MANCOVA
Between-Groups Designs
All examples discussed so far have involved only one dependent variable. Even though the computations become increasingly complex, the logic and nature of the computations do not change when there is more than one dependent variable at a time. For example, we may conduct a study where we try two different textbooks, and we are interested in the students’ improvements in math and physics. In that case, we have two dependent variables, and our hypothesis is that both together are affected by the difference in textbooks. We could now perform a multivariate analysis of variance (MANOVA) to test this hypothesis. Instead of a univariate F value, we would obtain a multivariate F value (Wilks’ lambda) based on a comparison of the error variance/covariance matrix and the effect variance/covariance matrix. The “covariance” here is included because the two measures are probably correlated and we must take this correlation into account when performing the significance test. Obviously, if we were to take the same measure twice, then we would really not learn anything new. If we take a correlated measure, we gain some new information, but the new variable will also contain redundant information that is expressed in the covariance between the variables.
Interpreting results. If the overall multivariate test is significant, we conclude that the respective effect (e.g., textbook) is significant. However, our next question would of course be whether only math skills improved, only physics skills improved, or both. In fact, after obtaining a significant multivariate test for a particular main effect or interaction, customarily we would examine the univariate F tests (see also F Distribution) for each variable to interpret the respective effect. In other words, we would identify the specific dependent variables that contributed to the significant overall effect.
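For readers who want to reproduce this logic outside of STATISTICA, here is a hedged sketch using the statsmodels MANOVA class on simulated data (group means, spreads, and variable names are invented). mv_test() reports Wilks’ lambda among other multivariate criteria, and the univariate follow-up F tests can then be run per variable:

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(2)
n = 20  # hypothetical study: 20 students per textbook
df = pd.DataFrame({
    "textbook": np.repeat(["A", "B"], n),
    "math":     np.r_[rng.normal(60, 8, n), rng.normal(68, 8, n)],
    "physics":  np.r_[rng.normal(55, 9, n), rng.normal(60, 9, n)],
})

# Multivariate test: do the textbook groups differ on math and physics jointly?
print(MANOVA.from_formula("math + physics ~ textbook", data=df).mv_test())

# Univariate follow-ups to see which dependent variables carry the effect
for dv in ("math", "physics"):
    a, b = df.loc[df.textbook == "A", dv], df.loc[df.textbook == "B", dv]
    print(dv, stats.f_oneway(a, b))
```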
Repeated Measures Designs
If we were to measure math and physics skills at the beginning of the semester and the end of the semester, we would have a multivariate repeated measure. Again, the logic of significance testing in such designs is simply an extension of the univariate case. Note that MANOVA methods are also commonly used to test the significance of univariate repeated measures factors with more than two levels; this application will be discussed later in this section.
Sum Scores versus MANOVA
Even experienced users of ANOVA and MANOVA techniques are often puzzled by the differences in results that sometimes occur when performing a MANOVA on, for example, three variables as compared to a univariate ANOVA on the sum of the three variables. The logic underlying the summing of variables is that each variable contains some “true” value of the variable in question, as well as some random measurement error. Therefore, by summing up variables, the measurement error will sum to approximately 0 across all measurements, and the sum score will become more and more reliable (increasingly equal to the sum of true scores). In fact, under these circumstances, ANOVA on sums is appropriate and represents a very sensitive (powerful) method. However, if the dependent variable is truly multi-dimensional in nature, then summing is inappropriate. For example, suppose that my dependent measure consists of four indicators of success in society, and each indicator represents a completely independent way in which a person could “make it” in life (e.g., successful professional, successful entrepreneur, successful homemaker, etc.). Now, summing up the scores on those variables would be like adding apples to oranges, and the resulting sum score will not be a reliable indicator of a single underlying dimension. Thus, we should treat such data as multivariate indicators of success in a MANOVA.
Contrast Analysis and Post hoc Tests
Why Compare Individual Sets of Means?
Usually, experimental hypotheses are stated in terms that are more specific than simply main effects or interactions. We may have the specific hypothesis that a particular textbook will improve math skills in males, but not in females, while another book would be about equally effective for both genders, but less effective overall for males. Now generally, we are predicting an interaction here: the effectiveness of the book is modified (qualified) by the student’s gender. However, we have a particular prediction concerning the nature of the interaction: we expect a significant difference between genders for one book, but not the other. This type of specific prediction is usually tested via contrast analysis.
Contrast Analysis
Briefly, contrast analysis allows us to test the statistical significance of predicted specific differences in particular parts of our complex design. It is a major and indispensable component of the analysis of every complex ANOVA design.
Post hoc Comparisons
Sometimes we find effects in our experiment that were not expected. Even though in most cases a creative experimenter will be able to explain almost any pattern of means, it would not be appropriate to analyze and evaluate that pattern as if we had predicted it all along. The problem here is one of capitalizing on chance when performing multiple tests post hoc, that is, without a priori hypotheses. To illustrate this point, let’s consider the following “experiment.” Imagine we were to write down a number between 1 and 10 on 100 pieces of paper. We then put all of those pieces into a hat and draw 20 samples (of pieces of paper) of 5 observations each, and compute the means (from the numbers written on the pieces of paper) for each group. How likely do you think it is that we will find two sample means that are significantly different from each other? It is very likely! Selecting the extreme means obtained from 20 samples is very different from taking only 2 samples from the hat in the first place, which is what the test via the contrast analysis implies. Without going into further detail, there are several so-called post hoc tests that are explicitly based on the first scenario (taking the extremes from 20 samples), that is, they are based on the assumption that we have chosen for our comparison the most extreme (different) means out of k total means in the design. Those tests apply “corrections” that are designed to offset the advantage of post hoc selection of the most extreme comparisons.
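A small Monte Carlo sketch (Python with NumPy/SciPy) makes this concrete: every “piece of paper” comes from the same population, yet a naive t test between the two most extreme of 20 sample means rejects far more often than the nominal 5% level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
rejections = 0
n_trials = 10_000
for _ in range(n_trials):
    # 20 samples of 5 observations each, all drawn from the numbers 1..10
    samples = rng.integers(1, 11, size=(20, 5))
    means = samples.mean(axis=1)
    lowest, highest = samples[means.argmin()], samples[means.argmax()]
    # Naive, uncorrected t test applied post hoc to the two extreme samples
    if stats.ttest_ind(lowest, highest).pvalue < 0.05:
        rejections += 1

print(rejections / n_trials)  # far above .05 - hence the post hoc corrections
```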
Assumptions and Effects of Violating Assumptions
Deviation from Normal Distribution
Assumptions. It is assumed that the dependent variable is measured on at least an interval scale level (see Elementary Concepts). Moreover, the dependent variable should be normally distributed within groups.
Effects of violations. Overall, the F test (see also F Distribution) is remarkably robust to deviations from normality (see Lindman, 1974, for a summary). If the kurtosis (see Basic Statistics and Tables) is greater than 0, then the F tends to be too small and we cannot reject the null hypothesis even though it is incorrect. The opposite is the case when the kurtosis is less than 0. The skewness of the distribution usually does not have a sizable effect on the F statistic. If the n per cell is fairly large, then deviations from normality do not matter much at all because of the central limit theorem, according to which the sampling distribution of the mean approximates the normal distribution, regardless of the distribution of the variable in the population. A detailed discussion of the robustness of the F statistic can be found in Box and Anderson (1955), or Lindman (1974).
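That robustness claim can be probed by simulation. In this Python sketch (assumptions: NumPy/SciPy, and a t distribution with 5 degrees of freedom as a convenient positive-kurtosis population), two groups are repeatedly drawn from the same population, so the proportion of rejections under the true null estimates the actual alpha level of the F test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
rejections = 0
n_trials = 20_000
for _ in range(n_trials):
    # Two groups of 10 from the same heavy-tailed (kurtosis > 0) population
    a, b = rng.standard_t(df=5, size=(2, 10))
    if stats.f_oneway(a, b).pvalue < 0.05:
        rejections += 1

# With kurtosis > 0 the empirical rate tends to fall at or below the
# nominal .05, i.e., the test becomes somewhat conservative
print(rejections / n_trials)
```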
Homogeneity of Variances
Assumptions. It is assumed that the variances in the different groups of the design are identical; this assumption is called the homogeneity of variances assumption. Remember that at the beginning of this section we computed the error variance (SS error) by adding up the sums of squares within each group. If the variances in the two groups are different from each other, then adding the two together is not appropriate, and will not yield an estimate of the common within-group variance (since no common variance exists).
Effects of violations. Lindman (1974, p. 33) shows that the F statistic is quite robust against violations of this assumption (heterogeneity of variances; see also Box, 1954a, 1954b; Hsu, 1938).
Special case: correlated means and variances. However, one instance when the F statistic is very misleading is when the means are correlated with variances across cells of the design. A scatterplot of variances or standard deviations against the means will detect such correlations. The reason why this is a “dangerous” violation is the following: Imagine that we have 8 cells in the design, 7 with about equal means but one with a much higher mean. The F statistic may suggest a statistically significant effect. However, suppose that there also is a much larger variance in the cell with the highest mean, that is, the means and the variances are correlated across cells (the higher the mean the larger the variance). In that case, the high mean in the one cell is actually quite unreliable, as is indicated by the large variance. However, because the overall F statistic is based on a pooled within-cell variance estimate, the high mean is identified as significantly different from the others, when in fact it is not at all significantly different if we based the test on the within-cell variance in that cell alone.
This pattern – a high mean and a large variance in one cell – frequently occurs when there are outliers present in the data. One or two extreme cases in a cell with only 10 cases can greatly bias the mean, and will dramatically increase the variance.
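The diagnostic scatterplot suggested above takes only a few lines. This sketch (pandas/NumPy, with simulated data in which one of eight cells has both the highest mean and the largest variance, as in the scenario just described) computes the cell means and variances and their correlation:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# Hypothetical design: 8 cells of 10 cases; cell H has a high mean AND a large variance
df = pd.DataFrame({
    "cell": np.repeat(list("ABCDEFGH"), 10),
    "y":    np.r_[rng.normal(10, 1, 70), rng.normal(18, 6, 10)],
})

cells = df.groupby("cell")["y"].agg(["mean", "var"])
print(cells)
print("r(mean, var) =", round(cells["mean"].corr(cells["var"]), 2))
cells.plot.scatter(x="mean", y="var")  # the diagnostic plot described above
```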
Homogeneity of Variances and Covariances
Assumptions. In multivariate designs, with multiple dependent measures, the homogeneity of variances assumption described earlier also applies. However, since there are multiple dependent variables, it is also required that their intercorrelations (covariances) are homogeneous across the cells of the design. There are various specific tests of this assumption.
Effects of violations. The multivariate equivalent of the F test is Wilks’ lambda. Not much is known about the robustness of Wilks’ lambda to violations of this assumption. However, because the interpretation of MANOVA results usually rests on the interpretation of significant univariate effects (after the overall test is significant), the above discussion concerning univariate ANOVA basically applies, and important significant univariate effects should be carefully scrutinized.
Special case: ANCOVA. A special serious violation of the homogeneity of variances/covariances assumption may occur when covariates are involved in the design. Specifically, if the correlations of the covariates with the dependent measure(s) are very different in different cells of the design, gross misinterpretations of results may occur. Remember that in ANCOVA, we in essence perform a regression analysis within each cell to partition out the variance component due to the covariates. The homogeneity of variances/covariances assumption implies that we perform this regression analysis subject to the constraint that all regression equations (slopes) across the cells of the design are the same. If this is not the case, serious biases may occur. There are specific tests of this assumption, and it is advisable to look at those tests to ensure that the regression equations in different cells are approximately the same.
Sphericity and Compound Symmetry
Reasons for Using the Multivariate Approach to Repeated Measures ANOVA. In repeated measures ANOVA containing repeated measures factors with more than two levels, additional special assumptions enter the picture: The compound symmetry assumption and the assumption of sphericity. Because these assumptions rarely hold (see below), the MANOVA approach to repeated measures ANOVA has gained popularity in recent years (both tests are automatically computed in ANOVA/MANOVA). The compound symmetry assumption requires that the variances (pooled within-group) and covariances (across subjects) of the different repeated measures are homogeneous (identical). This is a sufficient condition for the univariate F test for repeated measures to be valid (i.e., for the reported F values to actually follow the F distribution). However, it is not a necessary condition. The sphericity assumption is a necessary and sufficient condition for the F test to be valid; it states that the within-subject “model” consists of independent (orthogonal) components. The nature of these assumptions, and the effects of violations are usually not well-described in ANOVA textbooks; in the following paragraphs we will try to clarify this matter and explain what it means when the results of the univariate approach differ from the multivariate approach to repeated measures ANOVA.
The necessity of independent hypotheses. One general way of looking at ANOVA is to consider it a model fitting procedure. In a sense we bring to our data a set of a priori hypotheses; we then partition the variance (test main effects, interactions) to test those hypotheses. Computationally, this approach translates into generating a set of contrasts (comparisons between means in the design) that specify the main effect and interaction hypotheses. However, if these contrasts are not independent of each other, then the partitioning of variance breaks down. For example, if two contrasts A and B are identical to each other and we partition out their components from the total variance, then we take the same thing out twice. Intuitively, specifying the two (not independent) hypotheses “the mean in Cell 1 is higher than the mean in Cell 2” and “the mean in Cell 1 is higher than the mean in Cell 2” is silly and simply makes no sense. Thus, hypotheses must be independent of each other, or orthogonal (the term orthogonality was first used by Yates, 1933).
Independent hypotheses in repeated measures. The general algorithm implemented will attempt to generate, for each effect, a set of independent (orthogonal) contrasts. In repeated measures ANOVA, these contrasts specify a set of hypotheses about differences between the levels of the repeated measures factor. However, if these differences are correlated across subjects, then the resulting contrasts are no longer independent. For example, in a study where we measured learning at three times during the experimental session, it may happen that the changes from time 1 to time 2 are negatively correlated with the changes from time 2 to time 3: subjects who learn most of the material between time 1 and time 2 improve less from time 2 to time 3. In fact, in most instances where a repeated measures ANOVA is used, we would probably suspect that the changes across levels are correlated across subjects. However, when this happens, the compound symmetry and sphericity assumptions have been violated, and independent contrasts cannot be computed.
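A compact simulation of this situation (Python/NumPy; the learning-curve parameters are invented for illustration) shows gains from time 1 to time 2 correlating negatively with gains from time 2 to time 3 across subjects, which is precisely the pattern that undermines compound symmetry and sphericity:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 25  # hypothetical subjects measured at three times during a session
t1 = rng.normal(40, 6, n)
t2 = t1 + rng.normal(15, 5, n)                   # large, variable early gains
t3 = t2 + 0.4 * (55 - t2) + rng.normal(0, 2, n)  # later gains shrink near ceiling

# Fast early learners improve less later on: the contrasts are correlated
print(np.corrcoef(t2 - t1, t3 - t2)[0, 1])  # negative
```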
Effects of violations and remedies. When the compound symmetry or sphericity assumptions have been violated, the univariate ANOVA table will give erroneous results. Before multivariate procedures were well understood, various approximations were introduced to compensate for the violations (e.g., Greenhouse & Geisser, 1959; Huynh & Feldt, 1970), and these techniques are still widely used.
MANOVA approach to repeated measures. To summarize, the problem of compound symmetry and sphericity pertains to the fact that multiple contrasts involved in testing repeated measures effects (with more than two levels) are not independent of each other. However, they do not need to be independent of each other if we use multivariate criteria to simultaneously test the statistical significance of the two or more repeated measures contrasts. This “insight” is the reason why MANOVA methods are increasingly applied to test the significance of univariate repeated measures factors with more than two levels. We wholeheartedly endorse this approach because it simply bypasses the assumption of compound symmetry and sphericity altogether.
Cases when the MANOVA approach cannot be used. There are instances (designs) when the MANOVA approach cannot be applied; specifically, when there are few subjects in the design and many levels on the repeated measures factor, there may not be enough degrees of freedom to perform the multivariate analysis. For example, if we have 12 subjects and p = 4 repeated measures factors, each at k = 3 levels, then the four-way interaction would “consume” (k-1)^p = 2^4 = 16 degrees of freedom. However, we have only 12 subjects, so in this instance the multivariate test cannot be performed.
Differences in univariate and multivariate results. Anyone whose research involves extensive repeated measures designs has seen cases when the univariate approach to repeated measures ANOVA gives clearly different results from the multivariate approach. To repeat the point, this means that the differences between the levels of the respective repeated measures factors are in some way correlated across subjects. Sometimes, this insight by itself is of considerable interest.
Methods for Analysis of Variance
Several sections in this online textbook discuss methods for performing analysis of variance. Although many of the available statistics overlap in the different sections, each is best suited for particular applications.
General ANOVA/MANOVA: This section includes discussions of full factorial designs, repeated measures designs, multivariate designs (MANOVA), designs with balanced nesting (the designs may be unbalanced, i.e., have unequal n), options for evaluating planned and post hoc comparisons, etc.
General Linear Models: This extremely comprehensive section discusses a complete implementation of the general linear model, and describes both the sigma-restricted and the overparameterized approach. It includes information on incomplete designs, complex analysis of covariance designs, nested designs (balanced or unbalanced), mixed model ANOVA designs (with random effects), and the efficient analysis of huge balanced ANOVA designs. It also contains descriptions of six types of sums of squares.
General Regression Models: This section discusses between-subject designs and multivariate designs that are appropriate for stepwise regression, and describes how to perform stepwise and best-subset model building (for continuous as well as categorical predictors).
Mixed ANCOVA and Variance Components: This section includes discussions of experiments with random effects (mixed model ANOVA), estimating variance components for random effects, large main effect designs (e.g., with factors with over 100 levels) with or without random effects, and large designs with many factors in which not all interactions need to be estimated.
Experimental Design (DOE): This section includes discussions of standard experimental designs for industrial/manufacturing applications, including 2**(k-p) and 3**(k-p) designs, central composite and non-factorial designs, designs for mixtures, D and A optimal designs, and designs for arbitrarily constrained experimental regions.
Repeatability and Reproducibility Analysis (in the Process Analysis section): This topic in the Process Analysis section includes a discussion of specialized designs for evaluating the reliability and precision of measurement systems; these designs usually include two or three random factors, and specialized statistics can be computed for evaluating the quality of a measurement system (typically in industrial/manufacturing applications).
Breakdown Tables (in the Basic Statistics section): This topic includes discussions of experiments with only one factor (and many levels), or with multiple factors, when a complete ANOVA table is not required.
Specifying the General ANOVA/MANOVA Analysis
General ANOVA/MANOVA Startup Panel and Quick Tab
Ribbon bar. Select the Statistics tab. In the Base group, click ANOVA to display the General ANOVA/MANOVA Startup Panel.
Classic menus. Select ANOVA from the Statistics menu to display the General ANOVA/MANOVA Startup Panel.
The Startup Panel contains one tab, Quick, which contains options to select the desired method of analysis (see also, General ANOVA/MANOVA – Index). In order to perform a General ANOVA/MANOVA analysis, a data file must be selected at this point.
For more information, refer to the Introductory Overview. See also General ANOVA/MANOVA – Index or Methods for Analysis of Variance. For related ANOVA and regression methods, refer also to GLM, DOE and Variance Components.
OK. Click the OK button to display an analysis specification dialog box, which differs depending on the Specification method selected on the Quick tab.
Cancel. Click the Cancel button to close the Startup Panel without performing an analysis.
Options. Click the Options button to display the following menu commands:
Output. Select Output to display the Analysis/Graph Output Manager dialog box, which contains options to customize the current analysis output management of STATISTICA.
Display. Select Display to display the Analysis/Graph Display Options dialog box, which contains options to customize the current analysis display of STATISTICA.
Create Macro. Select Create Macro to display the New Macro dialog box. When running analyses in STATISTICA, all options and output choices are automatically recorded; when you click Create Macro, the complete recording of all your actions will be translated into a STATISTICA Visual Basic program that can be run to recreate the analysis. See Macro (STATISTICA Visual Basic) Overview for further details.
Close Analysis. Select Close Analysis to close all dialog boxes associated with the analysis. Note that results spreadsheets/graphs will not be closed, only analysis dialogs will close.
Open Data. Click the Open Data button to display the Select Data Source dialog box, which contains options to choose the spreadsheet on which to perform the analysis. The Select Data Source dialog box contains a list of the spreadsheets that are currently active.
Select Cases. Click the Select Cases button to display the Analysis/Graph Case Selection Conditions dialog box, which is used to create conditions for which cases will be included (or excluded) in the current analysis. More information is available in the case selection conditions overview, syntax summary, and dialog box description.
W. Click the W (Weight) button to display the Analysis/Graph Case Weights dialog box, which contains options to adjust the contribution of individual cases to the outcome of the current analysis by “weighting” those cases in proportion to the values of a selected variable.
Weighted moments. Select the Weighted moments check box to specify that each observation contributes the weighting variable’s value for that observation. The weight values need not be integers. This module can use fractional case weights in most computations. Some other modules use case weights as integer case multipliers or frequency values. This check box will only be available after you have defined a weight variable via the W option above.
DF = W-1 / N-1. When the Weighted moments check box is selected, moment statistics (e.g., mean, variance) can be based either on the sum of the weight values for the weighting variable (W-1) or on the number of (unweighted) observations (N-1). The sums of squares and cross products will always be based on the weighted values of the respective observations. However, in computations requiring the degrees of freedom (e.g., standard deviation, ANOVA tables), the value for the degrees of freedom can be computed either from the sum of the weight values or from the number of observations. Moment statistics are based on the sum of the weight values if the W-1 option button is selected, and on the number of (unweighted) observations if the N-1 option button is selected. For more information on options for using integer case weights, see also Define Weight.
Quick Tab
The Quick tab of the General ANOVA/MANOVA Startup Panel contains options to select the desired method of analysis. In order to perform a General ANOVA/MANOVA analysis, a data file must be selected at this point.
This tab presents a list of common experimental analysis designs (in the Type of analysis list; see also Methods for Analysis of Variance), and provides access to the three different user interfaces available in the STATISTICA General ANOVA/MANOVA module via the Specification method list [these user interfaces are also available in the STATISTICA General Linear Models (GLM), General Regression Models (GRM), Generalized Linear/Nonlinear Models (GLZ), and General Partial Least Squares Models (PLS) modules].
Type of analysis. The Type of Analysis list presents four choices for the type of ANOVA/MANOVA analysis model (see the Introductory Overview). Select the type of design that you want to perform. For more information on a particular type of analysis, click on the link below.
Specification method. The Specification method list presents the three alternative user interfaces available in ANOVA/MANOVA. You can choose among the three different user interfaces in the Specification method list only when Repeated measures ANOVA is selected as the Type of analysis. When any other Type of analysis is selected, only the Quick specs dialog is available.
Quick specs dialog. Select Quick specs dialog to use the respective Quick Specs dialog box corresponding to the selection in the Type of analysis box. The Quick Specs dialog box will prompt you to select dependent variable(s) and categorical predictor variables (depending on the selection in the Type of analysis box), and construct a default model. Use the options on the Quick Specs dialog box – Options tab to modify various computational specifications, or click the Syntax editor button to further customize the model via command syntax (see Analysis Syntax).
Analysis Wizard. Select Analysis Wizard to use a sequence of dialog boxes that will guide you through the steps for specifying an analysis. At the conclusion of the sequence of dialog boxes, you can either compute the results or click the Syntax editor button to further customize the model via command syntax, open an existing file with command syntax, or save syntax in a file for future repetitive use. This option is only available when Repeated measures ANOVA is selected as the Type of analysis.
Analysis syntax editor. Select Analysis syntax editor to specify a model via the MAN Analysis Syntax Editor dialog. That dialog provides various options for specifying designs and for modifying various parameters of the computational procedures. You can also open an existing text file with command syntax, or save syntax in a file for future repetitive use. Refer to the description of the Analysis Syntax Editor dialog box, and the description of the MANOVA Syntax for additional details. This option is only available when Repeated measures ANOVA is selected as the Type of analysis.
Note: between-groups designs. In order to specify a between-groups design, at least one dependent variable must be selected, at least one categorical predictor (grouping variable) must be selected, and at least two independent variable codes must be specified (via the Factor codes button, which can be found on both the ANOVA/MANOVA Quick Specs – Quick tab and the MAN Analysis Wizard Extended Options – Quick tab) for each between-groups factor. Note that if you do not explicitly specify the codes, by default, STATISTICA will use as codes all values encountered in the specified independent variables.
Note: within-groups designs. In order to specify a repeated measures factor design, at least two dependent variables must be selected, and the repeated measures factor has to be identified via the Within effects button available on the ANOVA/MANOVA Quick Specs – Quick tab. Multiple dependent variables that cannot be interpreted (by STATISTICA, given the design you specified) as levels of repeated measures factors are interpreted as multiple dependent variables in a MANOVA design (this will occur if, for example, you select two or more dependent variables and do not define them as repeated measures, or whenever you select more dependent variables than can be accounted for by the currently defined repeated measures factor and its levels). Note that if you have multiple repeated measures factors, you must use the GLM module.
Note: Empty Cell Designs. ANOVA/MANOVA will automatically handle designs with empty cells. To analyze Latin squares, Greco-Latin squares, or other balanced incomplete designs, simply specify them as if they were complete factorial designs. Then specify the design as a Main effects ANOVA, to estimate the main effects. In order to analyze unbalanced missing cell designs, or complex “messy” designs (as, for example, discussed in Milliken & Johnson, 1992), choose the appropriate type of method for constructing hypotheses by selecting one of the Sums of squares options from the ANOVA/MANOVA Quick Specs – Options tab; for a detailed discussion of how to analyze such designs, refer to the GLM Six types of sums of squares topic.
Note: Huge Balanced ANOVA Designs. Most between-groups ANOVA designs can be analyzed much more efficiently when they are balanced, i.e., when all cells in the ANOVA design have equal N and there are no missing cells in the design. STATISTICA GLM contains an option to “instruct” the program that the design is balanced so that the more efficient computational methods can be used. Even very large designs with effects with degrees of freedom in the hundreds can thus be analyzed in mere seconds, while the general computational procedures (which do not assume a balanced design) may take several minutes to accomplish the same. See Efficient Computations for (Huge) Balanced ANOVA Designs in the GLM Introductory Overview for additional details.
One-way ANOVA in General ANOVA/MANOVA
Select One-way ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab to specify one-way ANOVA or one-way MANOVA designs. In one-way experimental designs, the effect of a single grouping variable (e.g., Gender: Male vs. Female) on one or more dependent variables can be evaluated.
Main Effects ANOVA in General ANOVA/MANOVA
Select Main effects ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab to specify main effects only ANOVA or MANOVA designs. In the subsequent Quick specs dialog box, you can specify up to four categorical predictor variables. STATISTICA will estimate and evaluate the main-effects-only model. Those types of models are frequently used in the area of industrial experimentation to screen large numbers of factors in highly fractionalized designs (e.g., 2-level screening designs; see Experimental Design). This option should also be chosen when you want to analyze balanced incomplete (nested) designs, when only the main effects can be estimated. Note that if you want to select five or more categorical predictor variables, you must use the General Linear Models module.
Factorial ANOVA in General ANOVA/MANOVA
Select Factorial ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab to specify factorial ANOVA or MANOVA designs. In the subsequent Quick specs dialog box, you can specify up to four categorical predictor variables, and specify either a full factorial ANOVA (MANOVA) model, or a custom factorial design that includes terms to a user-specified factorial degree (e.g., main effects and two-way interactions only). Those types of models are frequently used in the area of industrial experimentation, where fractional factorial designs (e.g., 2**(k-p) fractional factorial designs or 3**(k-p) and Box-Behnken designs) are commonly used to evaluate many factors and their lower-order interactions in few experimental runs (observations). Note that if you want to select five or more categorical predictor variables, you must use the General Linear Models module.
Repeated Measures ANOVA in General ANOVA/MANOVA
Select Repeated measures ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab to specify designs with repeated measures. In the subsequent Quick specs dialog box, you can specify up to four categorical predictor variables, and two or more dependent variables that will be interpreted by the program as repeated measures (e.g., the scores on a test taken at Time 1 and Time 2). The Quick Specs dialog box contains options for specifying the within-subject (repeated measures) design, and you can specify univariate designs with a single factor or measurement that is measured repeatedly. Note that if you want to select five or more categorical predictor variables or multiple within-subject (repeated measures) factors, you must use the General Linear Models module.
The specification of repeated measures designs is further described in the context of the Quick Specs dialog box, as well as the MAN Analysis Syntax Editor. These designs are also discussed in Notes.
ANOVA/MANOVA Quick Specs
Select Quick specs dialog from the Specification method list and specify a Type of analysis on the General ANOVA/MANOVA Startup Panel – Quick tab. Then click the OK button to display the ANOVA/MANOVA Quick Specs dialog box for the type of analysis selected. The dialog box contains two tabs: Quick and Options. Note that the title of this dialog box will always reflect the Type of analysis that has been selected in the Startup Panel. Use the options on the Quick tab to select and specify the types of variables for your general ANOVA/MANOVA model. Use the options on the Options tab to specify the method of parameterization, type of sums of squares, and a cross validation variable.
OK. Click the OK button after you have specified your design to display the ANOVA Results dialog box.
Cancel. Click the Cancel button to return to the General ANOVA/MANOVA Startup Panel.
Options. Click the Options button to display the Options menu.
Syntax editor. Click the Syntax editor button to display the MAN Analysis Syntax Editor; all current specifications (e.g., variable selections, design specifications) on the Quick specs dialog will automatically be transferred (translated) to the syntax editor, and can be further modified or saved. However, note that if you click the < Back button in the MAN Analysis Syntax Editor after making additional customizations there, those customizations will not be translated “back” to the Quick specs dialog.
ANOVA/MANOVA Quick Specs – Options Tab
Select the Options tab of the ANOVA/MANOVA Quick Specs dialog box to access the options described here.
Sweep delta. Enter the negative exponent for a base-10 constant delta (delta = 10^(-sdelta)) in the Sweep delta field; the default value is 7. Delta is used (1) in sweeping, to detect redundant columns in the design matrix, and (2) for evaluating the estimability of hypotheses; specifically, a value of 2*delta is used for the estimability check.
Inverse delta. Enter the negative exponent for a base-10 constant delta (delta = 10^(-idelta)) in the Inverse delta field; the default value is 12. This delta is used to check for matrix singularity in matrix inversion calculations.
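Both deltas are simply base-10 tolerances. As a minimal sketch of the general idea (Python; the function below is purely illustrative and not STATISTICA's actual sweep code), a tolerance of delta = 10^(-sdelta) can be used to flag redundant (linearly dependent) columns while eliminating columns of the X'X matrix:

    import numpy as np

    def redundant_columns(X, sdelta=7):
        """Flag linearly dependent columns of a design matrix X by checking
        successive elimination pivots of X'X against delta = 10**(-sdelta).
        Illustrative sketch only -- not STATISTICA's actual sweep code."""
        delta = 10.0 ** (-sdelta)           # default sdelta = 7  ->  delta = 1e-7
        a = X.T @ X
        flagged = []
        for j in range(a.shape[0]):
            pivot = a[j, j]
            if abs(pivot) < delta:          # pivot effectively zero: column redundant
                flagged.append(j)
                continue
            # Eliminate column j (Schur complement update)
            a = a - np.outer(a[:, j], a[j, :]) / pivot
        return flagged

    # The third column is the sum of the first two, so it is flagged
    X = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [1.0, 0.0, 1.0]])
    print(redundant_columns(X))             # [2]

With the default sdelta = 7, any pivot smaller than 1e-7 in absolute value is treated as zero, and the corresponding design-matrix column is treated as redundant.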
Parameterization. Select the type of parameterization options you want to use for your general ANOVA/MANOVA model in the Parameterization group box.
Sigma-restricted. Select the Sigma-restricted check box to compute the design matrix for categorical predictors in the model based on sigma-restricted coding; if it is not selected, the overparameterized model will be used. The sigma-restricted model is the default parameterization; see the GLM Introductory Overview topic The Sigma-Restricted vs. Overparameterized Model for details.
No intercept. Select the No intercept check box to exclude the intercept from the model.
Lack of fit. Select the Lack of fit check box to compute the sums of squares for the pure error, i.e., the sums of squares within all unique combinations of values for the (categorical) predictor variables. On the ANOVA Results dialog box, options are available to test the lack-of-fit hypothesis. Note that in large designs with continuous predictors, the computations necessary to estimate the pure error can be very time consuming. See the GLM Introductory Overview topic Lack-of-Fit Tests Using Pure Error for a discussion of lack-of-fit tests and pure error; see also Experimental Design.
Cross-validation. Click the Cross-validation button to display the Cross-Validation dialog box for specifying a categorical variable and a (code) value to identify observations that should be included in the computations for fitting the model (the analysis sample); all other observations with valid data for all predictor variables and dependent variables will automatically be classified as belonging to the validation sample (see the Residuals tab for a description of the available residual statistics for observations in the validation sample); note that all observations with valid data for all predictor variables but missing data for the dependent variables will automatically be classified as belonging to the prediction sample (see the Residuals tab topic for a description of available statistics for the prediction sample).
Sums of squares. Select the method for constructing main effect and interaction hypotheses in unbalanced and incomplete designs in the Sums of squares group box. These methods are discussed in the GLM Introductory Overview. For the sigma-restricted model the default value is Type VI (unique or effective hypothesis decomposition; see Hocking, 1985) and Type IV is not valid; for the overparameterized model the default value is Type III (orthogonal; see Goodnight, 1980), and Type VI is not valid.
ANOVA/MANOVA Quick Specs – Quick Tab
Select the Quick tab of the ANOVA/MANOVA Quick Specs dialog box to access the options described here. The specific options that are available in this dialog box depend on the Type of analysis selected on the Quick tab of the Startup Panel. For alternative ways of specifying designs in ANOVA, see Methods for Specifying Designs.
Variables. Click the Variables button to display the standard variable selection dialog box. Depending on the Type of analysis selected on the General ANOVA/MANOVA Startup Panel – Quick tab, STATISTICA will prompt you to select one or more dependent variables and up to four categorical predictor variables (grouping variables or factors in the design). For example, if you select One-way ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab, you will be prompted to enter one or more dependent variables and a single predictor variable. Note that in the ANOVA module, you can only specify up to four categorical predictors. If you want to specify five or more categorical predictors, use the General Linear Models module. For details concerning different types of designs, and the distinction between continuous and categorical predictor variables, see the GLM Introductory Overview.
Within effects. Click the Within effects button to display the Specify Within-Subjects Factor dialog box. In this dialog box, specify the within-subject (repeated measures) factor and the respective number of levels (see also, Notes for a discussion of repeated measures ANOVA). In short, the variables in the dependent variable list are assigned to levels for the repeated measures factor. Note that in the ANOVA module, you can only specify one within-subject (repeated measures) factor. If you want to specify multiple repeated measures factors, use the General Linear Models module. This button is only available if you have selected Repeated measures ANOVA as the Type of analysis on the General ANOVA/MANOVA Startup Panel – Quick tab.
Factor codes. Click the Factor codes button to display the Select Codes for Indep. Vars (Factors) dialog box for selecting the codes identifying the levels of the categorical predictor variables (grouping variables). Codes can be integer values or text labels (including dates, times, etc.), and at least two codes must be specified for each categorical predictor variable.
Between effects. Click the Between Effects button to display the GLM Between Effects dialog box. In this dialog box, specify the factorial degree of your design. This button is only available if you have selected Factorial ANOVA or Repeated measures ANOVA as the Type of analysis on the General ANOVA/MANOVA Startup Panel – Quick tab.
If the design for the categorical predictor variables needs to be customized, click the Syntax editor button (in the ANOVA/MANOVA Quick Specs dialog box) to further customize the model via command syntax. For alternative ways of specifying designs in ANOVA/MANOVA, see Methods for Specifying Designs.
Specify Within-Subjects Factor
Click the Within effects button on the ANOVA/MANOVA Quick Specs dialog box – Quick tab to display the Specify Within-Subjects Factor dialog box.
In this dialog box, specify the within-subject (repeated measures) factor and the respective number of levels (see also the GLM Introductory Overview for a discussion of repeated measures ANOVA). In short, the variables in the dependent variable list are assigned to levels for the repeated measures factor. Note that with the ANOVA module, you can specify only one within-subject (repeated measures) factor. If you want to specify multiple repeated measures factors, use the General Linear Models module.
If the factor specified in this dialog box does not account for all of the previously selected dependent variables (via the Variables button), a MANOVA will be performed. See Multiple Dependent Measures (MANOVA) and ANOVA/MANOVA Notes – Specifying Within-Subjects Univariate Designs for further details.
No. of levels. Enter the Number of levels of your within-subject (repeated measures) factor.
Factor Name. Enter the Factor name of your within-subject (repeated measures) factor.
OK. Click the OK button after you have specified your within-subject (repeated measures) factor to return to the ANOVA/MANOVA Quick Specs – Quick tab.
Cancel. Click the Cancel button to return to the ANOVA/MANOVA Quick Specs – Quick tab, ignoring any changes that you have made in this dialog box.
Select Codes for …
The Select Codes for … dialog box is displayed whenever you need to select codes for grouping variables (i.e., specific values of a variable to be used to identify group membership of cases or observations). It will display the grouping variable name and then prompt you to enter the codes for that variable (e.g., 1, Male, “#52”).
You can either type specific codes in the window, or enter an * (asterisk) to accept all of the codes available for the variable. When entering text codes in this window, you can save time by typing the codes in the edit field with lower case letters and STATISTICA will automatically convert them to upper case letters (e.g., smith will be converted to SMITH). However, since STATISTICA distinguishes between lower and upper case letters in codes, be sure to place any code that needs to remain in lower case letters in single or double quotes (e.g., ‘Smith’, “Pepsi”, “StatSoft”, see below).
The following conventions apply when entering code names:
1. If codes consist of only uppercase letters and numbers (e.g., ITEM9, MALE) and do not start with a number, the codes will be displayed in the edit field without single or double quotation marks around them.
2. Codes that have been entered in the spreadsheet in upper and lower case or only lower case letters (e.g., ‘Male’, “test1”) must have quotation marks around them (single or double) in this edit window in order to preserve the upper and lower case character formatting.
3. Codes that start with a number or a character other than a letter (e.g., “=shift3”, ’39lbs’, “49°C”, “15-Apr”) must have quotation marks (single or double) around them.
4. Date values will be displayed in quotation marks (single or double) if the variable’s format is set to Date (see Variable).
For information on using code values greater than 32,000 and using dates as codes, see Codes (Values of Grouping Variables).
Shortcut. If you leave all fields in this dialog box empty (blank) and click OK, STATISTICA will identify and automatically use all available codes (for the previously specified grouping variables). Also, the same effect will be achieved if you do not enter this dialog box but click the OK button in the previous (design specification) dialog box without explicitly specifying any codes.
All. Select all of the variable’s values by clicking the All button. When you click this button, STATISTICA will automatically search the variable values (taking into account any case selection conditions) and enter all the variable’s codes in the edit window. Alternatively, to select all codes for a variable, click the OK button (leaving the edit window blank) or enter an * (asterisk) in the edit window.
Zoom. If you want to view the values of the variable before selecting specific codes, click the Zoom button to display the Labels/Stats Window. You can browse through a table of sorted variable values in this window (all values will be shown regardless of any currently specified case selection conditions).
Select All. This button is available when you have more than one variable listed in this dialog box. If you want to automatically enter all of the codes for each of the variables listed (taking into account any case selection conditions), then click the Select All button. You can also select all codes for each variable by clicking the OK button before entering any codes or clicking any other buttons.
Reviewing General ANOVA/MANOVA Results
ANOVA Results
Click the OK button in the ANOVA/MANOVA Quick Specs dialog box to display the ANOVA Results dialog box, which can contain as many as eight tabs: Quick, Summary, Means, Comps, Profiler, Resids, Matrix, and Report. Note that this dialog box will also be displayed when you click the OK (Run) button in the MAN Analysis Wizard Between Design dialog box, the MAN Analysis Wizard Extended Options dialog box, and the MAN Analysis Syntax Editor.
Use the options on the Quick tab to produce summaries of the main results, for example, ANOVA tables, parameter estimates, etc.
The Summary tab contains options to produce additional summaries of the main results, for example, R-square, descriptive statistics, etc.
The Means tab contains options to compute (1) observed unweighted means, (2) observed weighted means, and (3) least squares (predicted) means; tables of marginal means can be displayed in spreadsheets, or summarized in graphs (with or without confidence intervals).
The Comps tab contains options to test specific hypotheses about linear combinations of (least squares) means (planned comparisons).
The Profiler tab contains options to compute and display desirability profiles for combinations of multiple variables.
The Resids tab contains options to compute predicted values and detailed residual statistics (e.g., residual, deleted residuals, Mahalanobis distance, leverage values, DFFITS values, etc.).
The Matrix tab contains options to review various matrices involved in the computations of the main results, as well as detailed collinearity statistics and partial and semi-partial (or part) correlations.
Finally, the Report tab contains options to send results to a report.
More results. Click the More results button to display the ANOVA More Results dialog box with additional tabs and options.
Modify. Click the Modify button to return to the previous dialog box for the respective analysis (see Methods for Specifying Designs).
Close. Click the Close button to close the results dialog box.
Options. Click the Options button to display the Options menu.
By Group. Click the By Group button to display the By Group specification dialog box.
ANOVA Results – Quick Tab
Select the Quick tab of the ANOVA Results dialog box to access options to display the main results for the current analysis.
All effects/Graphs. Click the All effects/Graphs button to display the Table of all Effects dialog box. This dialog box shows the summary ANOVA (MANOVA) table for all effects; you can then select an effect and produce a spreadsheet or plot of the observed unweighted, observed weighted, and least squares means. Refer also to the description of the options on the Means tab for details concerning the different means computed by the program, and their standard errors.
All effects. Click the All effects button to create a spreadsheet with the ANOVA (MANOVA) table for all effects. If the design is univariate in nature (involves only a single dependent variable), then the univariate results ANOVA table will be displayed; the univariate results ANOVA table is also displayed for univariate repeated measures designs; if the design is multivariate in nature, then the multivariate results MANOVA table will be displayed, showing the statistics as selected in the Multivariate tests box (via the Summary tab). For a discussion of the different types of designs, and how the respective ANOVA/MANOVA tables are computed, see the GLM Introductory Overview.
Effect sizes. Click the Effect sizes button to create a spreadsheet with the ANOVA (MANOVA) table for all effects, along with the effect sizes and powers (i.e., Partial eta-squared, Non-centrality, and Observed power). Partial eta-squared is the proportion of the variability in the dependent variables that is explained by the effect. The Non-centrality value is the main statistic used to compute power, and the Power column contains the power values of the significance test for the effect. The ANOVA (MANOVA) table is described above; see All effects.
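These quantities follow from standard formulas that can be verified directly from any row of the ANOVA table: partial eta-squared is SS(effect) / (SS(effect) + SS(error)); the noncentrality parameter for observed power is conventionally computed as F * df(effect) (equivalently, SS(effect) / MS(error)); and observed power is the upper-tail probability of the noncentral F distribution at the critical F value. The following sketch (Python with scipy; the numbers are illustrative, not STATISTICA output) puts these conventions together:

    from scipy import stats

    def effect_size_and_power(ss_effect, ss_error, df_effect, df_error, alpha=0.05):
        # Partial eta-squared: proportion of variability attributable to the effect
        partial_eta_sq = ss_effect / (ss_effect + ss_error)
        f_obs = (ss_effect / df_effect) / (ss_error / df_error)
        # Common observed-power convention: noncentrality = F * df_effect
        noncentrality = f_obs * df_effect
        f_crit = stats.f.ppf(1.0 - alpha, df_effect, df_error)
        observed_power = stats.ncf.sf(f_crit, df_effect, df_error, noncentrality)
        return partial_eta_sq, noncentrality, observed_power

    # Example: SS(effect) = 40 on 2 df, SS(error) = 120 on 27 df
    # -> partial eta-squared = 0.25, noncentrality = 9.0, power around 0.7
    print(effect_size_and_power(40.0, 120.0, 2, 27))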
Alpha values. The values in the Alpha values group box are used in all results spreadsheets and graphs whenever a confidence limit is to be computed, or a particular result is to be highlighted based on its statistical significance.
Confidence limits. Enter a value in the Confidence limits field to be used for constructing confidence limits in the respective results spreadsheets or graphs (e.g., spreadsheet of parameter estimates, graph of means); by default 95% confidence limits will be constructed.
Significance level. Enter a value in the Significance level field to be used for all spreadsheets and graphs, where statistically significant results are to be highlighted (e.g., in the All effects spreadsheet); by default all results significant at the p < .05 level will be highlighted.
GLM and ANOVA Results – Summary Tab
Select the Summary tab of the GLM Results or the ANOVA Results dialogs to access options to display the main results for the current analysis. Depending on the type of design, whether or not random effects are present in the design, whether there are categorical predictor variables in the design, and/or whether there are within-subject (repeated measures) factors in the design, some of the options described below may not be available on the Summary tab. For instance, if you select Huge balanced ANOVA from the GLM Startup Panel – Quick tab or specify SSTYPE=BALANCED in the GLM (STATISTICA) syntax, several advanced options are not available in the results dialog box (due to the computational shortcuts employed in order to efficiently analyze huge balanced designs; see also Balanced ANOVA in the Introductory Overview for details).
All effects/Graphs. Click the All effects/Graphs button to display the Table of All Effects, which shows the summary ANOVA (MANOVA) table for all effects; you can then select an effect and produce a spreadsheet or graph of the observed unweighted, observed weighted, and least squares means. Refer also to the description of the options on the Means tab for details concerning the different means computed by STATISTICA, and their standard errors. This button is available only if 1) the current design includes categorical predictor variables or within-subject (repeated measures) effects, and 2) if there are random effects in the current design, there is only a single dependent variable (multivariate results for mixed-model designs cannot be computed).
All effects. Click the All effects button to display a spreadsheet with the ANOVA (MANOVA) table for all effects. If the design is univariate in nature (involves only a single dependent variable), the univariate results ANOVA table will be produced; the univariate results ANOVA table is also produced for univariate repeated measures designs (where appropriate, multivariate tests for repeated measures can be computed via the Within effects options, see below); if the design is multivariate in nature, the multivariate results MANOVA table will be displayed, showing the statistics as selected in the Multivariate tests group box, see below; if the design includes random effects and multiple dependent variables, multiple univariate ANOVA tables (spreadsheets) will be created, one for each dependent variable (in that case, the tests reported in the ANOVA table will use synthesized error terms). For a discussion of the different types of designs, and how the respective ANOVA/MANOVA tables are computed, see the Introductory Overview.
Univariate results. Click the Univariate results button to create a spreadsheet with the standard univariate ANOVA table for each dependent variable, regardless of whether the design includes within-subject (repeated measures) factors. To review the univariate results for the within-subject design, use the Univ. tests option in the Within effects group box (see below). If analyzing a Mixed Model ANOVA, the results generated by clicking the Univariate results button are fixed effects results. These results serve as a comparison to the results with random effects generated by clicking the All effects button.
Cell statistics. Click the Cell statistics button to create a spreadsheet of the descriptive statistics for each cell in the design; specifically, descriptive statistics are computed for the dependent variables, as well as any continuous predictors (covariates) in the design, for each column of the overparameterized design matrix for categorical effects. Thus, marginal means and standard deviations are available for each categorical effect in the design. Note that for lower-order effects (e.g., main-effects in designs that also contain interactions involving the main effects), the reported means are weighted marginal means, and as such estimates of the weighted population marginal means (for details, see, for example, Milliken and Johnson, 1984, page 132; see also the discussion of means in the description of the options on the Means tab). Least squares means (e.g., see Searle, 1987) can be computed on the Means tab, or via the All effects/Graphs option (see above); usually, in factorial designs, it is the least squares means that should be reviewed when interpreting significant effects from the ANOVA or MANOVA.
Between effects. The options in the Between effects group box allow you to review, as appropriate for the given design, various results statistics for the between-group design such as Design terms, Whole model R, Coefficients, and Estimate. For specific details on these tests/options, see Summary Results for Between Effects in GLM and ANOVA.
Within effects. The options in the Within effects group box allow you to review, as appropriate for the given design, various results statistics for the within-subject (repeated measures) design such as Multivariate tests, Univariate tests, G-G and H-F tests, Effect SSCPs, Sphericity, Error SSCPs, and Error Corrs. For specific details on these tests/options, see Summary Results for Within Effects in GLM and ANOVA. If the current design does not include within-subject (repeated measures) factors, these options are not displayed on this tab.
Random effects. The options in the Random effects group box allow you to display the results related to the analysis of the random effects in the model such as Variance components, Expected mean squares, Bar plot, Denominator synthesis, and Pie chart. This is only available in GLM, not in ANOVA. For specific details on these tests/options, see Summary Results for Random Effects in GLM.
Multiv. tests. In the Multiv. tests group box you can select the specific multivariate test statistics that are to be reported in the respective results spreadsheets. For descriptions of the different multivariate test statistics, refer to the GLM Introductory Overview topic Multivariate Designs. These options are only available if the current design is multivariate in nature, i.e., if there are multiple dependent measures, or a within-subject (repeated measures) design with effects that have more than 2 levels (and hence, multivariate tests for those effects can be computed).
Alpha values. Use the Alpha values group box to specify Confidence limits and Significance level values. These values are used in all results spreadsheets and graphs whenever a confidence limit is to be computed or a particular result is to be highlighted based on its statistical significance.
Confidence limits. Enter the value to be used for constructing confidence limits in the respective results spreadsheets or graphs (e.g., spreadsheet of parameter estimates, graph of means) in the Confidence limits field. By default 95% confidence limits will be constructed.
Significance level. Enter the value to be used for all spreadsheets and graphs where statistically significant results are to be highlighted (e.g., in the All effects spreadsheet) in the Significance level field. By default all results significant at the p<.05 level will be highlighted.
GLM, GRM, and ANOVA Results – Means Tab
Select the Means tab of the GLM Results, GLZ Results, GRM Results, or the ANOVA Results dialog boxes to access options to display the means for any effect containing categorical predictor variables only, or for repeated measures effects. If there are no categorical effects or repeated measures effects in the model, these options are not available.
Effect. Select the desired effect in the Effect drop-down list, and then select to display or plot either the Observed, unweighted; Observed, weighted; or Least squares means. You can also display the means (unweighted, weighted, or least squares) for all categorical effects by clicking the respective All marginal tables buttons (see below).
Observed, unweighted. Click the Observed, unweighted button to produce a spreadsheet of the observed unweighted means for the selected Effect (see above). These are computed by summing the cell means across the levels and combinations of levels of the factors not used in the marginal means table (or plot), and then dividing the sum by the number of means. Thus, each mean that enters into a marginal mean is implicitly assigned the same weight, regardless of the number of observations on which the respective mean is based. The resulting estimate is an unbiased estimate of m-bar (mu-bar), the population marginal mean. If the design is not balanced, and some means are based on different numbers of observations, then you can also compute the weighted marginal means (weighted by the respective cell N's). Note that the weighted mean is an unbiased estimate of the weighted population marginal mean (for details, see, for example, Milliken and Johnson, 1984, page 132), and the standard errors for the unweighted means are estimated from the pooled within-cell variances.
Plot. Click the Plot button to create a graph of the observed unweighted means for the selected Effect (see above). Depending upon your design, when you click this button, the Dependent Vars for the Plot dialog box will be displayed, which allows you to specify the dependent variables to use in the means plot. Next, the Specify the Arrangement of the Factors in the Plot dialog box may be displayed, which allows you to specify the arrangement of factors that STATISTICA will use in the means plot.
All marginal tables, observed unweighted. Click the All marginal tables, observed unweighted button to produce spreadsheets of the observed unweighted means for all of the categorical effects (regardless of what is selected in the Effect field).
Observed, weighted. Click the Observed, weighted button to produce a spreadsheet of the observed weighted means for the selected Effect (see above). These are computed as the standard means for the respective combinations of factor levels, directly from the data. Thus, the resulting means are weighted marginal means, since they are weighted by the number of observations in each cell of the design (in full factorial designs, you could also compute the weighted marginal means by averaging the cell means involved in each marginal mean, weighted by the respective number of observations in the respective cells). Note that the weighted mean is an unbiased estimate of the weighted population marginal mean (for details, see, for example, Milliken and Johnson, 1984, page 132), and the standard errors for these means are estimated from the respective cell variances for each respective mean (i.e., the respective actual observed standard deviations in each cell).
Plot. Click the Plot button to create a graph of the observed weighted means for the selected Effect (see above). Depending upon your design, when you click this button, the Dependent Vars for the Plot dialog box will be displayed, where you can specify the dependent variables to use in the means plot. Next, the Specify the Arrangement of the Factors in the Plot dialog box may be displayed, where you can specify the arrangement of factors that STATISTICA will use in the means plot.
All marginal tables, observed weighted. Click the All marginal tables, observed weighted button to produce spreadsheets of the observed weighted means for all of the categorical effects (regardless of what is selected in the Effect field).
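The difference between the two kinds of observed means matters only when the design is unbalanced. A minimal sketch (Python; made-up cell means and cell Ns for one row of an unbalanced design):

    import numpy as np

    # Cell means and cell Ns for one row (A = 1) of an unbalanced 2 x 2 design
    cell_means = np.array([10.0, 20.0])
    cell_ns = np.array([2, 8])

    # Observed unweighted marginal mean: each cell mean gets the same weight
    unweighted = cell_means.mean()                       # (10 + 20) / 2 = 15.0

    # Observed weighted marginal mean: cells weighted by their Ns,
    # the same value you would get by pooling the raw observations
    weighted = np.average(cell_means, weights=cell_ns)   # (2*10 + 8*20) / 10 = 18.0

    print(unweighted, weighted)

In a balanced design (equal cell Ns), the two values coincide.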
Least squares means. Click the Least squares means button to produce a spreadsheet of the least squares means for the selected Effect. Least squares means are the expected population marginal means, given the current model. Thus, these are usually the means of interest when interpreting significant effects from the ANOVA or MANOVA table. Note that for full factorial designs without missing cells, the Least squares means are identical to the Observed, unweighted means (see above). Least squares means are also sometimes called predicted means, because they are the predicted values when all factors in the model are either held at their means or set to the factor levels defining the respective means. Note that if there are continuous predictors (covariates) in the model, the least squares means are computed from the values for those predictors as set in the Covariate values group box (see below). For details concerning the computation of least squares means refer to Milliken and Johnson (1992), Searle, Speed, and Milliken (1980), or Searle (1987). Note that in the GLZ module, STATISTICA does not compute least squares means, but rather the equivalent expected values for the respective non-linear (generalized linear) model, i.e., the predicted means.
Plot. Click the Plot button to create a graph of the least squares means for the selected Effect. Depending upon your design, when you click this button, the Dependent Vars for the Plot dialog box will be displayed, where you can specify the dependent variables to use in the means plot. Next, the Specify the Arrangement of the Factors in the Plot dialog box may be displayed, where you can specify the arrangement of factors that STATISTICA will use in the means plot.
All marginal tables, least squares means. Click the All marginal tables, least squares means button to produce spreadsheets of the least squares means for all of the categorical effects (regardless of what is selected in the Effect field).
Covariate values. The options in the Covariate values group box determine at what values the continuous predictor variables (covariates) will be set for the computation of least squares means. By default, the values for any continuous predictors (covariates) in the model will be held at their respective overall Covariate means. You can also specify User-defined values for the covariates; after selecting this option button, click the Define button to display the Values for Covariates dialog box and specify the values. Finally, you can set the values for the continuous predictor variables so as to compute the Adjusted means; these are the predicted values (means) after “adjusting” for the variation of the means of the continuous predictor variables over the cells in the current Effect (see above). Adjusted means are widely discussed in the traditional analysis of covariance (ANCOVA) literature; see, for example, Finn (1974), Pedhazur (1973), or Winer, Brown, and Michels (1991). The Adjusted means option button is only available in full factorial designs. Note that the Covariate values group box will not be available when you are using the ANOVA module.
Show std errs. Select the Show std errs check box to display standard errors and confidence limits for the means in the spreadsheet or plot of means (see the above buttons). The plot of means will show the confidence limits as error bars around the respective means. The actual confidence limits are based on the current setting in the Confidence limits field available on the GLM Results – Quick tab.
Note: standard errors for unweighted marginal means. The standard errors for the observed unweighted means are computed based on the current error term from the ANOVA table:
Std.Err.(m-bar) = (s_est / t) * sqrt[Σ(1/n_i)]
In this formula, s_est is the estimated sigma (computed as the square root of the estimated error variance from the current ANOVA table), t is the number of means that is averaged to compute the respective marginal mean, and the n_i are the numbers of observations in the t experimental conditions from which the respective unweighted marginal mean is computed.
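As a worked example of this formula (Python; illustrative numbers):

    import math

    def se_unweighted_marginal_mean(ms_error, ns):
        # Std.Err.(m-bar) = (s_est / t) * sqrt(sum(1 / n_i))
        s_est = math.sqrt(ms_error)      # estimated sigma from the ANOVA error term
        t = len(ns)                      # number of cell means averaged
        return (s_est / t) * math.sqrt(sum(1.0 / n for n in ns))

    # Example: error variance 4.0; marginal mean averages two cells with n = 2 and n = 8
    print(se_unweighted_marginal_mean(4.0, [2, 8]))   # (2/2) * sqrt(0.625) = 0.7906...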
Note: standard errors for weighted marginal means. The standard errors for the marginal means are computed as if you had ignored the other factors (those not in the marginal means table). Thus, for weighted marginal means the standard error is not dependent on the estimate of the error variance from the current ANOVA table, and hence, it is not dependent on the current model that is being fit to the data.
Show means +/- std errs. Select this check box to show in the tables and plots of means the plus or minus standard error range around each mean. These will only be shown if the Show std errs check box is also selected. By default, when the Show means +/- std errs check box is cleared, the (95%) confidence intervals will be computed instead (or any other confidence interval, consistent with the specification in the Confidence limits field of the Quick tab).
GLM, GRM, and ANOVA Results – Comps Tab
Select the Comps tab (Comparisons tab) of the GLM Results, GRM Results, or the ANOVA Results dialogs to access options to perform a priori (planned) comparisons between the means in the design. Note that complex a priori hypotheses can also be tested via the Estimate button on the Summary tab (see the Between effects group box). A discussion of the rationale and applications of planned comparisons and post hoc tests is also provided in the Contrast Analysis and Post hoc Tests section in the context of the ANOVA/MANOVA module. Note that these options are only available if the current design contains effects for categorical predictor variables, or within-subject (repeated measures) effects.
Note: planned (a priori) comparisons (contrast analysis). A priori planned comparisons are usually performed after an analysis involving effects for categorical predictor variables has yielded significant effects. The purpose of planned comparisons then is to determine whether the pattern of means for the respective effect follows the one that was hypothesized; that is, you compare the specific means for the effect of interest that were hypothesized to be different from each other (e.g., in a 3-level effect Group, you might test whether the mean for level 1 is significantly different from the mean for level 3). STATISTICA GLM provides a convenient user interface for specifying contrast coefficients; these coefficients are then used to compare the least squares means (see also the Means tab for details) for the respective chosen Effect (see below). Thus, the contrasts for the planned comparisons are applied to the means predicted by the current model; these means are identical to the observed unweighted means in the case of full factorial designs without continuous predictors (covariates).
Note: random effects. The error terms for all planned comparisons will always be computed from the sums of squares residuals. Those error terms may not be appropriate, and when random effects are involved, you should interpret the results of planned comparisons with caution.
Effect. Select the desired effect from all of the effects in the current design in the Effect drop-down box. A priori planned comparisons are performed on the marginal means (least squares means; see below) for effects involving only categorical predictor variables.
Planned comparisons of LS means. The options in the Planned comparisons of LS means group box allow you to compute planned comparisons of the least squares means for the current model. The contrast coefficients can be entered Separately for each factor in the current Effect (see above), or Together as a vector simultaneously for all factors (see below). When there are continuous predictors (covariates) in the model, then the least squares means used in the comparison are computed from the covariates at their means (regardless of the selection in the Covariate values group box on the Means tab).
Display least squares means. Click the Display least squares means button to display a spreadsheet with the least squares means for the currently selected Effect; see also the Means tab.
Contrasts for LS means. Click the Contrasts for LS means button to display the respective contrast specification dialog for the chosen Effect. If you requested to enter the contrast coefficients Separately for each factor (see below), then the contrast specification dialog will allow you to enter the contrast coefficients for each factor; if you requested to enter the contrast coefficients Together (contrast vector), then the contrast specification dialog will prompt you to enter a matrix (or vector) of contrast coefficients for all levels of the chosen effect (the respective contrast specification dialog will show and label all levels of the respective effect on the dialog).
Depending on the type of Effect that you have selected (e.g., a main effect, within-subject effect, interaction, etc.) and the option buttons you have selected in the Enter contrasts separately or together and/or the Contrasts for dependent variables group boxes, various contrast specification dialogs will be displayed. See the Specify Contrasts for This Factor, Specify Contrasts, Contrast for Between Group Factors, Enter Contrasts for this Factor, Repeated Measures, Contrasts for Within-Subject Factors, and Contrasts for Dependent Variables dialogs for further details.
Compute. After you have specified your contrasts for the least squares means (via the Contrasts for LS means button, see above), click the Compute button to display three spreadsheets: the Between contrast coefficients spreadsheet, the Contrast estimates spreadsheet, and the Univariate or Multivariate test of significance for planned comparisons spreadsheet.
Enter contrasts separately or together. Use the options in the Enter contrasts separately or together group box to specify how you want to enter the contrasts when you click the Contrasts for LS means button (see above). Select the Separately for each factor option button to enter the contrast coefficients for each factor in the current Effect. Select the Together (contrast vector) option button to enter the contrast coefficients for each cell in the current Effect (combination of factor levels for the factors in the current Effect).
Note that the method of computing the results for the planned comparison is actually identical, regardless of how the contrast coefficients were entered, and any contrast specified via the separately method can also be represented via the together method (but not vice versa). Specifically, when Separately for each factor is selected, the Kronecker Product (see the STATISTICA Visual Basic function MatrixKroneckerMultiply) of the specified matrices of contrast coefficients for each factor will be applied to the set of least squares means for the respective chosen Effect.
Note: separately for each factor. This method of specifying contrasts is most convenient when you want to explore interaction effects, for example, to test partial interactions within the levels of other factors. Suppose you had a three-way design with factors A, B, and C, each at 2 levels (so the design is a 2x2x2 between-group full factorial design), and you found a significant three-way interaction effect. Recall that a three-way interaction effect can be interpreted as a two-way interaction, modified by the level of a third factor. Suppose further that the original hypothesis for the study was that a two-way interaction effect exists at level 1 of factor C, but no such effect exists at level 2 of factor C. Entering contrast coefficients Separately for each factor, you could enter the following coefficients:
· For factor A: 1 -1
· For factor B: 1 -1
· For factor C: 1 0
The Kronecker product of these vectors shows which least squares means in the design are compared by this hypothesis:
Levels, Factor C:   1    1    1    1    2    2    2    2
Levels, Factor B:   1    1    2    2    1    1    2    2
Levels, Factor A:   1    2    1    2    1    2    1    2
Coefficients:       1   -1   -1    1    0    0    0    0
Thus, this hypothesis tests the A by B interaction within level 1 of factor C.
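The same coefficient vector can be reproduced with a Kronecker product, taking the factors from slowest-varying (C) to fastest-varying (A) to match the table layout (Python sketch using numpy):

    import numpy as np

    a = np.array([1, -1])   # contrast for factor A
    b = np.array([1, -1])   # contrast for factor B
    c = np.array([1,  0])   # contrast for factor C (level 1 only)

    # Factor C varies slowest and factor A fastest in the table above,
    # so the combined contrast vector is kron(C, kron(B, A))
    print(np.kron(c, np.kron(b, a)))   # [ 1 -1 -1  1  0  0  0  0]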
Note: together (contrast vectors). This method of specifying contrasts can be used to compare any set of least squares means in the current Effect. In the table shown above, you could specify directly the contrast vector shown in the row labeled Coefficients. You could also compare any set of least squares means within the three-way interaction. For example:
Levels, Factor C:   1    1    1    1    2    2    2    2
Levels, Factor B:   1    1    2    2    1    1    2    2
Levels, Factor A:   1    2    1    2    1    2    1    2
Coefficients:       1    0    0   -1    0    1   -1    0
This set of coefficients cannot be set up in terms of main effects and interactions of factors (i.e., via option button Separately for each factor), and could only be specified via the Together option button.
Contrasts for dependent variables. Use the options in the Contrasts for dependent variables group box to determine whether you can specify a set of contrast matrices for the dependent measures after you click the Contrasts for LS means button (see above). Select the Yes option button if you want to specify a set of contrast matrices for the dependent measures; select the No option button if you do not. Note that these options are only available if the current design involves multiple dependent variables or, in the case of within-subject (repeated measures) designs, multiple dependent measures.
Multivariate tests. Use the options in the Multivariate tests group box to select the specific multivariate test statistics that are to be reported in the respective results spreadsheets. For descriptions of the different multivariate test statistics, refer to the Multivariate Designs topic in the Introductory Overview. These options are only available if the current design is multivariate in nature, i.e., if there are multiple dependent measures, or a within-subject (repeated measures) design with effects that have more than 2 levels (and hence, multivariate tests for those effects can be computed).
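Each of these statistics is a function of the hypothesis (H) and error (E) sums-of-squares-and-cross-products matrices; Wilks' lambda, for example, is det(E) / det(H + E). A minimal sketch (Python with numpy; the SSCP matrices below are made-up):

    import numpy as np

    def wilks_lambda(H, E):
        # Wilks' lambda = det(E) / det(H + E), from the hypothesis (H)
        # and error (E) SSCP matrices; values near 0 argue against the null
        return np.linalg.det(E) / np.linalg.det(H + E)

    # Illustrative 2 x 2 SSCP matrices for two dependent variables
    H = np.array([[8.0, 3.0], [3.0, 5.0]])
    E = np.array([[20.0, 4.0], [4.0, 16.0]])
    print(wilks_lambda(H, E))   # 304 / 539 = 0.564...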
GLM, GRM, and ANOVA Results – Matrix Tab
Select the Matrix tab of the GLM Results, GRM Results, or ANOVA Results dialog to access options to review various matrices involved in the computations of the ANOVA tables.
Between design. The options under Between design are used to review various matrices computed for the between design. For details on the specific matrices, see Between Design Matrices in GLM, GRM, and ANOVA.
Between effects. The options under Between effects are used to review the sums of squares and cross-product matrices and derived matrices for the between effects in the design. For details on the specific matrices, see Between Effects Matrices in GLM, GRM, and ANOVA.
Within effects. The options under Within effects are used to review various matrices involved in the computations for the within-subjects (repeated measures) effects. For details on the specific matrices, see Within Effects Matrices in GLM, GRM, and ANOVA.
GLM, GLZ, GRM, PLS, and ANOVA Results – Report Tab
Select the Report tab of the GLM Results, GLZ Results, GRM Results, PLS Results, or the ANOVA Results dialog boxes to access the options described here.
Send/print to Report window. Use the options in the Send/print to Report window group box to send the variable specifications, command syntax, and prediction equation to a report. Note that the Also send to Report Window check box must be selected in the Analysis/Graph Output Manager dialog box for these options to work. If it is not selected, you will be prompted to modify your output options when you click the Variables and command syntax or Pred. equation buttons (see below).
Variables and command syntax. Click this button to send the current data specifications, including the data file name, currently selected variables, and codes to a report. Use the options in the Analysis/Graph Output Manager dialog box to control the level of detail of the printout (e.g., whether to print long and short text labels, etc.). Also, this option will send the command syntax for the current analysis to the report window. The command syntax provides a detailed log of the specifications for the current analysis. You can send the command syntax to the report window even if you originally specified the design via Quick Specs dialog boxes or the Analysis Wizard dialog boxes.
Pred. equation. Click the Pred. equation button to send the current prediction equation for each dependent variable (for the between design only) to a report. If there are within (repeated measures) factors in the design, they will be ignored, i.e., a prediction equation will be printed for each dependent variable that was originally selected. This option is very useful if you want to copy the prediction equation and paste it into another spreadsheet or an equation plotter (e.g., to plot a user-defined function in 2D or 3D graphs; see the Graph Options dialog box – Custom Function options pane topic). This option is not available in GLZ and PLS.
# digits. Enter the number of digits that you want displayed in the prediction equation in the # digits field. This option is not available in GLZ and PLS.
Model Profiler. Click this button to display the Model Profiler, where you can run simulations based on the specified model. Note that this option is only available if General Linear Models was selected in the Startup Panel.
Code generator. If your program is licensed to include this feature, you can generate computer code to implement the current model for predicting new observations. When you click this button you have the following choices:
PMML. Click this command to generate code in Predictive Model Markup Language (PMML), which is an XML-based language for storing information about fully trained (parameterized) models, and for sharing those models with other applications. STATISTICA and STATISTICA Enterprise Server contain facilities to use this information to compute predicted values or classifications, i.e., to quickly and efficiently deploy models (typically in the context of data mining projects).
STATISTICA Visual Basic (SVB). Click this command to generate a STATISTICA Visual Basic program containing the code implementing the model. This code will be generated in a form compatible with the nodes of STATISTICA Data Miner; however, you can also simply copy/paste the relevant portion of this code to include it in your custom Visual Basic programs. The code will automatically be displayed in the STATISTICA Visual Basic program editor window.
C/C++. Click this command to generate code compatible with the C/C++ computer language. This option is useful if you want to include the information provided by the final model into custom (C/C++) programs. (See also, Using C/C++/C# Code for Deployment.)
C#. Click this command to generate code as C#.
Java. Click this command to generate code in Java.
SAS. Click this command to generate deployment code for the created model as SAS code (a .sas file). See also, Rules for SAS Variable Names and Example: SAS Deployment.
SQL stored procedure in C#. Click this command to generate code as a C# class intended for use in a SQL Server user defined function.
SQL User Defined Function in C#. Click this command to generate code as a C# class intended for use as a SQL Server user-defined function.
TeraData. Click this command to generate code as a C-language function intended for use as a user-defined function in a Teradata querying environment.
Deployment to STATISTICA Enterprise. Click this command to deploy the results as an Analysis Configuration in STATISTICA Enterprise. Note that appropriately formatted data must be available in a STATISTICA Enterprise Data Configuration before the results can be deployed to an Analysis Configuration.
GLM, GRM, and ANOVA Results – Profiler Tab
Select the Profiler tab of the GLM Results, GRM Results, or the ANOVA Results dialog box to access the options described here. For an overview of response/desirability profiling see Desirability Profiling in GLM, GRM, and MANOVA and Experimental Design Profiling Predicted Responses and Response Desirability.
Vars. Click the Vars button to display a standard variable selection dialog box, in which you select the dependent variables to profile. If multiple dependent variables are specified for the analysis, use this button to select the dependent variable or variables for which to profile responses. Note that you can specify desirability function settings for all dependent variables in the analysis; thus, you can use the variable selection dialog box to easily select subsets of dependent variables, or single dependent variables, to profile.
View. Click the View button to display a compound graph of the prediction profiles for each of the dependent variables that are selected to be profiled. The prediction profile compound graph contains a number of features that are useful for interpreting the effects of the predictor variables on responses on the dependent variables. For each dependent variable, a graph is produced showing the predicted values of the dependent variables at the minimum and maximum values of each predictor variable, and at each additional grid point for each predictor variable. Also shown are the current levels for each predictor variable. The predicted values that are shown for the dependent variables are the predicted responses at each level of each factor, holding all the other factors (including the block factor) constant at their current levels. Confidence intervals or prediction intervals for the predicted values are also shown if the Confidence intervals or the Prediction intervals option buttons, respectively, are selected in the Options for Response Profiler dialog box.
If the Show desirability function check box is selected, clicking the View button will also produce a desirability function graph accompanying the predicted values for each of the dependent variables. The desirability function graph shows the desirability of the response (which can range from 0.0 for undesirable up to 1.0 for very desirable) across the observed range of each dependent variable (see Desirability function specifications options for details on specifying desirability function values for each dependent variable). Similar to the graphs of the predicted values for each dependent variable, graphs are produced for the overall desirability at each level of each factor, holding all other factors (including the block factor) constant at their current levels. Inspection of the desirability function graphs shows how the desirability of the responses on the dependent variables changes as the levels of the factors change.
1. Click the 1 button to plot the desirability function values in a surface plot, along with the specified grid points for the factors. A surface graph will be produced for each pair of factors, showing how the response desirability varies at each combination of grid points for each pair of factors, holding all other factors constant at their current levels. Note that several different options can be specified for fitting the desirability function values to the surface in the Options for Response Profiler dialog box.
2. Click the 2 button to plot the contours of the desirability function, along with the specified grid points for the factors. A contour plot will be produced for each pair of factors, showing how the response desirability varies at each combination of grid points for each pair of factors, holding all other factors constant at their current levels. Note that several different options can be specified for fitting the contours of the desirability function in the Options for Response Profiler dialog box.
Options. Click the Options button to display the Options for Response Profiler dialog box.
Set factors at value. Use the options in the Set factors at value group box to specify the current levels of the predictor variables for the prediction profile compound graph (available via the View button), the surface plot (available via the 1 button), and the contour plot (available via the 2 button). Select the Means option button to set the current level of each predictor variable to the mean of the respective variable. This is the default option. Select the User vals option button to set the current level of each predictor variable to user-specified values. These values can be inspected and/or specified by clicking the accompanying button, which displays the Select Factor/Covariate Values dialog box, in which you can specify the current level for each predictor variable. Select the Optimum option button to set the current level of each predictor variable to the value determined by optimizing the response desirability.
Grid. Click the Grid button to specify the range and the grid points for each of the predictor variables in the analysis. Use the combo box by the Grid button to specify the factor for which you want to specify grid points. Clicking the Grid button will then display the Specifications for Factor Grid dialog box, in which you can specify the minimum value, the maximum value, and the number of intervals in the grid for the factor. These specifications determine the grid points for the factor: the lowest grid point is the minimum value, each successive grid point adds (maximum value – minimum value) / (number of intervals), and the highest grid point is the maximum value.
Grid points determine the plot points for the factors on the prediction profile compound graph (available via the View button), the surface plot (available via the 1 button), and the contour plot (available via the 2 button).
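As a sketch of the grid computation itself (Python; illustrative):

    def factor_grid(minimum, maximum, intervals):
        # Lowest grid point = minimum; each successive point adds
        # (maximum - minimum) / intervals; highest point = maximum
        step = (maximum - minimum) / intervals
        return [minimum + i * step for i in range(intervals + 1)]

    print(factor_grid(0.0, 10.0, 4))   # [0.0, 2.5, 5.0, 7.5, 10.0]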
Desirability function specifications. Use the Desirability function specifications group box to enter desirability function specifications for the dependent variable displayed in the Variable combo box. These specifications determine the desirability function values (from 0.0 for undesirable to 1.0 for very desirable) corresponding to predicted values of that variable. The specifications are entered in the Value and Desirability edit fields (see below). Note that the majority of the options in this group box are not available unless the Show desirability function check box is selected.
Show desirability function. Select this check box to enable the Desirability function specifications edit fields. By default, the Show desirability function check box is not selected and the Desirability function specifications edit fields are disabled. The Show desirability function check box is always selected when the Set factors at value Optimum option button has been selected. If the Set factors at value Optimum option has been selected and the Show desirability function check box is cleared, STATISTICA will deselect the Optimum option button and select the Means option button.
Variable. Use the Variable combo box to select the dependent variable for which to specify desirability function settings, and then enter the settings in the Value and Desirability edit fields.
Value – Low, Medium, and High Values. STATISTICA allows for up to three “inflection points” in the desirability function for predicted values for each dependent variable. For example, suppose that some intermediate predicted value on a dependent variable is highly desirable, and that lower and higher predicted values on the variable become progressively less desirable as they depart further from the “target” intermediate value. This type of desirability function would have three inflection points: the low value for the dependent variable, below which the response is undesirable, the high value for the dependent variable, above which the response is undesirable, and the medium value for the dependent variable, at which the response becomes increasingly desirable as it approaches the target value. The default specifications for the low value, medium value, and high value settings use a simple “higher is better” type of desirability function with only two inflection points. The low value is set to the observed minimum value for the dependent variable, the high value is set to the observed maximum value for the dependent variable, and the medium value is set to the mid-point between these two extremes. You can specify any other type of desirability function with up to three inflection points by entering the inflection points for the variable in the low value, medium value, and high value boxes. The only restriction is that adjacent inflection points must be in ascending order or equal in value.
Desirability – Low, Medium, and High Values. Desirability values (from 0.0 for undesirable to 1.0 for very desirable) can be specified for the corresponding inflection points of the desirability function for each of the dependent variables. For the example “target” type of desirability function described above, you would want to specify desirability values of 0.0 for responses with values below the low inflection point or above the high inflection point, and a desirability value of 1.0 for the targeted intermediate value. You would therefore specify values of 0.0, 1.0, and 0.0 for desirability in the low value, medium value, and high value boxes. The default specifications for the level of desirability at the three inflection points are based on a simple “higher is better” type of desirability function. Desirability is set to 0.0 at the low value, 0.5 at the medium value, and 1.0 at the high value. You can specify any other valid desirability values (from 0.0 to 1.0) by entering the appropriate value in the respective boxes.
Curvature – s and t parameters. The desirability of responses need not decrease (or increase) linearly between inflection points in the desirability function. Perhaps there is a “critical region” close to a desired, intermediate response on a dependent variable beyond which the desirability of the response at first drops off very quickly, but drops off less quickly as the departure from the “targeted” value becomes greater. To model this type of desirability function requires “curvature” parameters to take into account the nonlinearity in the “falloff” of desirability between inflection points. In the s parameter and t parameter boxes, you can specify a value for the exponent of the desirability function (from 0.0 up to 50, inclusive) representing the curvature in the desirability function between the low and medium inflection points of the function, and between the medium and high inflection points of the function, respectively. Assuming that an intermediate response is most desirable, values greater than 1.0 for the s parameter and t parameter represent initial quicker “falloff” in desirability but subsequent slower “falloff” in desirability as the departure from the “targeted” value becomes greater. Values less than 1.0 for the s parameter and t parameter represent initial slower “falloff” in desirability but subsequent quicker “falloff” in desirability as the departure from the “targeted” value becomes greater. The default specifications for the s parameter and t parameter are values of 1.0, representing linear “falloff” in desirability between the medium and low inflection points as well as between the medium and high inflection points. Further descriptions of the s parameter and t parameter and their effects in the desirability function can be found in the discussions of “two-sided” desirability functions in Derringer and Suich (1980) and in Box and Draper (1987).
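To make the roles of the inflection points, desirability values, and curvature parameters concrete, the following is a minimal sketch of a two-sided desirability function in the spirit of Derringer and Suich (1980). The function name, the interpolation form between inflection points, and the example values are illustrative assumptions, not STATISTICA's exact implementation:

    # A minimal sketch of a two-sided desirability function in the spirit of
    # Derringer and Suich (1980). The interpolation formula between inflection
    # points is an illustrative assumption, not STATISTICA's implementation.
    def desirability(y, low, med, high, d_low=0.0, d_med=1.0, d_high=0.0,
                     s=1.0, t=1.0):
        if y <= low:
            return d_low
        if y >= high:
            return d_high
        if y <= med:
            # The s parameter governs curvature between the low and medium points.
            return d_low + (d_med - d_low) * ((y - low) / (med - low)) ** s
        # The t parameter governs curvature between the medium and high points.
        return d_high + (d_med - d_high) * ((high - y) / (high - med)) ** t

    # A "target" type function: most desirable at 50, undesirable below 30 or above 70.
    print(desirability(40, low=30, med=50, high=70))         # 0.5 (linear, s = 1)
    print(desirability(40, low=30, med=50, high=70, s=2.0))  # 0.25 (slower initial rise)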
Apply to all vars. Click the Apply to all vars button to apply the desirability settings you specify for one dependent variable to all the dependent variables in the analysis. This option is particularly useful if the same dependent variable is measured on, say, successive days. For example, you could specify the desirability of radioactivity readings of waste materials on the first day after the materials are discharged from a factory, and then apply the same desirability settings to the radioactivity readings on subsequent days. If many days of readings are taken, the Apply to all vars option can save considerable data entry.
Reset specs. Click the Reset specs button to reset any changed desirability function settings for a dependent variable back to the default desirability function specifications for the variable (for details on the default specifications, see Desirability function specifications, above).
all vars. Click the all vars button to reset the desirability function settings for all dependent variables back to the default desirability function specifications (for details on the default specifications, see Desirability function specifications, above).
Open specs. Click the Open specs button to display a standard Open File dialog box that will prompt you for a file in which the desirability function settings specified for the classes in the analysis have been saved using the Save specs button (see below). Retrieval of previously saved settings can save considerable data entry in specifying desirability functions, especially when the analysis contains many classes each with distinct desirability function specifications.
Save specs. Click the Save specs button to display a standard Save As File dialog box that will prompt you for a file in which to save the desirability function settings specified for the classes in the analysis. These settings are then available for retrieval at a later time by using the Open specs option (see above). This can save considerable data entry in specifying desirability functions, especially when the analysis contains many classes each with distinct desirability function specifications.
Examples of the ANOVA Analysis
Example 1: Breakdown and One-Way ANOVA
Overview. You can compute various descriptive statistics (e.g., means, standard deviations, correlations, percentiles, etc.) broken down by one or more categorical variables (e.g., by Gender and Region) as well as perform a one-way Analysis of Variance via the Breakdown and one-way ANOVA procedure accessible from the Basic Statistics and Tables Startup Panel.
Open the example data file Adstudy.sta for this example, and start the Basic Statistics and Tables module.
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples. The Open a STATISTICA Data File dialog box is displayed. Adstudy.sta is located in the Datasets folder. Then, select the Statistics tab. In the Base group, click Basic Statistics to display the Basic Statistics and Tables Startup Panel.
Classic menus. From the File menu, select Open Examples to display the Open a STATISTICA Data File dialog box; Adstudy.sta is located in the Datasets folder. Then, from the Statistics menu, select Basic Statistics/Tables to display the Basic Statistics and Tables Startup Panel.
In the Basic Statistics and Tables Startup Panel, select Breakdown and one-way ANOVA and click the OK button.
In the Statistics by Groups (Breakdown) dialog box, select the Individual tables tab. Click the Variables button to display the variable selection dialog box.
Select Measure01 through Measure23 as the Dependent variables, and the two variables Gender (subject’s gender, Male and Female) and Advert (type of advertisement shown to the subjects, Coke and Pepsi) as the Grouping variables, and click OK.
Click the Codes for grouping variables button to display the Select codes for indep. vars (factors) dialog box, and select all codes for both of the grouping variables. To select all codes for a variable, you can either enter the code numbers in the respective edit field, click the respective All button, or enter an asterisk * in the respective edit field. Clicking the OK button without specifying any values is equivalent to selecting all values of all variables.
Click the OK button in this dialog box and in the Statistics by Groups (Breakdown) dialog box to display the Statistics by Groups – Results dialog box, which provides various options and procedures for analyzing the data within groups in order to obtain a better understanding of the differences between categories of the grouping variables.
Summary Table of Means. You can select the desired statistics to be displayed in the Summary: Table of statistics or Detailed two-way tables; select the Descriptives tab and select all the check boxes in the Statistics box. Now, click the Detailed two-way tables button to display the results spreadsheet.
This spreadsheet shows the selected descriptive statistics for the variables as broken down by the specified groups (scroll the spreadsheet to view the results for the rest of the variables). For example, looking at the means within each group in this spreadsheet, you can see that there is a slight difference between the means for Males and Females for variable Measure01. Now, examine the means within the Male and Female groups for variable Measure01; you can see that there is very little difference between the groups Pepsi and Coke within either gender; thus, the gender groups appear to be homogeneous in this respect.
One-Way ANOVA and Post-Hoc Comparisons of Means. You can easily test the significance of these differences via the Analysis of Variance button on the ANOVA & tests tab in the Results dialog box. Click this button to display the spreadsheet with the results of the univariate analysis of variance for each dependent variable.
The one-way Analysis of Variance procedure gave statistically significant results for Measure05, Measure07, and Measure09. These significant results indicate that the means across the groups are different in magnitude. Now, return to the Results dialog box and select the Post-hoc tab to perform post-hoc tests for the significant differences between individual groups (means). You will first need to select the variable(s) for the comparisons. For this example, click the Variables button, select variable Measure07, and click OK. You can choose from among several post-hoc tests (an even larger selection of tests is available in the GLM module); for this example, click the LSD test or planned comparison button.
The LSD test is equivalent to the t-test for independent samples, based on the N in the groups involved in the comparison. The t-test for independent samples results from Example 1 showed that there was a significant difference between the responses for Males and Females for Measure07. Using the Breakdown and one-way ANOVA procedure, you can see from the LSD test that a significant difference occurs only when the females are shown the Coke advertisement.
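As an aside, this equivalence is easy to illustrate outside of STATISTICA. The following minimal sketch shows the independent-samples t test to which an LSD comparison of two groups corresponds; the group data are made up for the example:

    # A minimal sketch of an independent-samples t test, to which the LSD
    # comparison of two groups corresponds; the data below are hypothetical.
    from scipy import stats

    measure07_female_coke = [4, 6, 5, 7, 3, 5]
    measure07_male_coke = [8, 9, 7, 8, 9, 10]

    result = stats.ttest_ind(measure07_female_coke, measure07_male_coke)
    print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")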
Graphical presentation of results. These differences can be viewed graphically via the many graphic options in the Statistics by Groups – Results dialog box. For example, to compare the distributions of the selected variables within the specified groups, select the Descriptives tab and click the Categorized box & whisker button. In the Box-Whisker Type dialog box, ensure that the Median/Quart./Range option button is selected, and click the OK button. Next, select the appropriate variable(s) to produce the graphs. Shown below is the box-whisker plot for variable Measure07.
As you can see in the above box and whisker plot for variable Measure07, there does appear to be a difference in the distribution of values for the Female-Coke group as compared to the Male-Coke group.
Within-Group Correlations. Now let’s look at the correlations between variables within the specified groups. Return to the Statistics by Groups – Results dialog box, and select the Correlations tab. Note that numerous options are available on this tab to display various statistics and auxiliary information, in addition to the (within-group) correlation matrices. For this example, change the p-value for highlighting option to .001. Then, click the Within-group correlations & covariances button. The Select groups dialog box will be displayed, in which you can select one group (or All Groups) for the correlation matrices.
In Example 1, a correlation matrix was produced in which the correlation between variables Measure05 and Measure09 (r = -.47) was highly significant (p<.001). The Breakdown and one-way ANOVA procedure enables you to explore this significant correlation further by computing correlations within the specified grouping variables. Now, in the Select groups dialog box, select All Groups and then click the OK button to produce all four correlation matrix spreadsheets.
As you can see, the results reveal that the pattern of correlations is differentiated across the groups (e.g., the correlation is very high in the Female/Coke group and much lower in the other three groups). None of the correlations between Measure05 and Measure09 were significant at the .001 level; however, if you were to change the p-value for highlighting field to .05 on the Correlations tab of the Results dialog box, and click the Within-group correlations & covariances button again, you would find that the correlation between Measure05 and Measure09 is significant at that level (p=.02) for the group defined by Female gender and Coke advertisement.
Note that you can use the Difference tests: r, %, means option in the Basic Statistics and Tables Startup Panel to test differences between correlation coefficients.
Categorized scatterplots. The within-group correlations can be graphically presented via the Categ. scatterplots button on the Correlations tab of the Statistics by Groups – Results dialog box. When you click this button, you will be prompted to select the variables for the analysis. Select Measure05 in the First variable list field and Measure09 in the Second variable list field and then click the OK button to produce the plot.
The above categorized scatterplot clearly shows the strong negative correlation between Measure05 and Measure09 for the group Female/Coke.
Example 2: Simple Factorial ANOVA with Repeated Measures
For this example of a 2 x 2 (between) x 3 (repeated measures) design, open the data file Adstudy.sta:
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples. The Open a STATISTICA Data File dialog box is displayed. Adstudy.sta is located in the Datasets folder.
Classic menus. From the File menu, select Open Examples to display the Open a STATISTICA Data File dialog box; Adstudy.sta is located in the Datasets folder.
Calling the ANOVA module. To start an ANOVA/MANOVA analysis:
Ribbon bar. Select the Statistics tab. In the Base group, click ANOVA to display the General ANOVA/MANOVA Startup Panel.
Classic menus. Select ANOVA from the Statistics menu to display the General ANOVA/MANOVA Startup Panel.
The Startup Panel contains options to specify very simple analyses (e.g., via One-way ANOVA – designs with only one between-group factor) and more complex analyses (e.g., via Repeated measures ANOVA – designs with between-group factors and a within-subject factor).
Select Repeated measures ANOVA as the Type of analysis and Quick specs dialog as the Specification method.
Then, click the OK button to display the ANOVA/MANOVA Repeated Measures ANOVA dialog box.
Specifying the design (variables). The first (between-group) factor is Gender (with 2 levels: Male and Female). The second (between-group) factor is Advert (with 2 levels: Pepsi and Coke). The two factors are crossed, which means that there are both Male and Female subjects in the Pepsi and Coke groups. Each of those subjects responded to 3 questions (this repeated measure factor will be called Response: it has 3 levels represented by variables Measure01, Measure02, and Measure03).
Click the Variables button (on the ANOVA/MANOVA Repeated Measures ANOVA dialog) to display the variable selection dialog. Select Measure01 through Measure03 as dependent variables (in the Dependent variable list field) and Gender and Advert as factors [in the Categorical predictors (factors) field].
Then click the OK button to return to the previous dialog.
The repeated measures design. Note that the design of the experiment that we are about to analyze can be summarized as follows:
              Between-Group     Between-Group     Repeated Measures Factor: Response
              Factor #1:        Factor #2:        Level #1:    Level #2:    Level #3:
              Gender            Advert            Measure01    Measure02    Measure03
  Subject 1   Male              Pepsi                 9            1            6
  Subject 2   Male              Coke                  6            7            1
  Subject 3   Female            Coke                  9            8            2
Specifying a repeated measures factor. The minimum necessary selection is now completed and, if you did not care about selecting the repeated measures factor, you would be ready to click the OK button and see the results of the analysis. However, for our example, specify that the three dependent variables you have selected are to be interpreted as three levels of a repeated measures (within-subject) factor. Unless you do so, STATISTICA assumes that those are three “different” dependent variables and will run a MANOVA (i.e., multivariate ANOVA).
In order to define the desired repeated measures factor, click the Within effects button to display the Specify Within-subjects Factors dialog.
Note that STATISTICA has suggested the selection of one repeated measures factor with 3 levels (default name R1). You can specify only one within-subject (repeated measures) factor via this dialog; to specify multiple within-subject factors, use the General Linear Models module (available in the optional Advanced Linear/Nonlinear Models package). Press the F1 key in this dialog to review a comprehensive discussion of repeated measures and examples of designs. Edit the name for the factor (e.g., change the default R1 to RESPONSE), and click the OK button to exit the dialog.
Codes (defining the levels) for between-group factors. You do not need to manually specify codes for between-group factors [e.g., instruct STATISTICA that variable Gender has two levels: 1 and 2 (or Male and Female)] unless you want to prevent STATISTICA from using, by default, all codes encountered in the selected grouping variables in the datafile. To enter such custom code selection, click the Factor codes button to display the Select codes for indep. vars (factors) dialog.
This dialog contains various options. For example, you can review the values of individual variables before you make your selections by clicking the Zoom button, or scan the file and fill the codes fields for individual variables or for all variables at once. For now, click the OK button; STATISTICA automatically fills in the codes fields (e.g., for Gender and Advert) with all distinct values encountered in the selected variables and closes the dialog.
Performing the analysis. When you click the OK button upon returning to the ANOVA/MANOVA Repeated Measures ANOVA dialog, the analysis is performed, and the ANOVA Results dialog is displayed. Various kinds of output spreadsheets and graphs are now available.
Note that this dialog is tabbed, which allows you to quickly locate results options. For example, if you want to perform planned comparisons, click the Comps tab. To view residual statistics, click the Resids tab. For this simple overview example, we will only use the results options available on the Quick tab.
Reviewing ANOVA results. Start by looking at the ANOVA summary of all effects table by clicking the All effects button.
The only effect (ignoring the Intercept) in this analysis that is statistically significant (p = .007) is the RESPONSE effect. Many different patterns of means for the RESPONSE effect could produce this result (for more information, see the ANOVA – Introductory Overview). We will now look at the marginal means for this effect graphically in order to interpret it.
To bring back the ANOVA Results dialog (that is, “resume” the analysis), press CTRL+R, select Resume from the Statistics menu, or click the ANOVA Results button on the Analysis bar. When the ANOVA Results dialog is displayed, click the All effects/Graphs button to review the means for individual effects.
This dialog contains a summary Table of all effects (with most of the information you have seen in the All effects spreadsheet) and is used to review individual effects from that table in the form of the plots of the respective means (or, optionally, spreadsheets of the respective mean values).
Plot of Means for a Main Effect. Double-click on the significant main effect RESPONSE (the one marked with an asterisk in the p column) to see the respective plot.
The graph indicates that there is a clear decreasing trend; the means for the consecutive three questions are gradually lower. Even though there are no significant interactions in this design (see the discussion of the Table of all effects above), we will look at the highest-order interaction to examine the consistency of this strong decreasing trend across the between-group factors.
Plot of means for a three-way interaction. To see the plot of the highest-order interaction, double-click on the row marked RESPONSE*GENDER*ADVERT, representing the interaction between factors 1 (Gender), 2 (Advert), and 3 (Response), on the Table of All Effects dialog. An intermediate dialog, Specify the arrangement of the factors in the plot, is displayed, which is used to customize the default arrangement of factors in the graph.
Note that unlike the previous plot of a simple factor, the current effect can be visualized in a variety of ways. Click the OK button to accept the default arrangement and produce the plot of means.
As you can see, this pattern of means (split by the levels of the between-group factors) does not indicate any salient deviations from the overall pattern revealed in the first plot (for the main effect, RESPONSE). Now you can continue to interactively examine other effects; run post-hoc comparisons, planned comparisons, and extended diagnostics; etc., to further explore the results.
Interactive data analysis in STATISTICA. This simple example illustrates the way in which STATISTICA supports interactive data analysis. You are not forced to specify all output to be generated before seeing any results. Even simple analysis designs can, obviously, produce large amounts of output and countless graphs, but usually you cannot know what will be of interest until you have a chance to review the basic output. With STATISTICA, you can select specific types of output, interactively conduct follow-up tests, and run supplementary “what-if” analyses after the data are processed and basic output reviewed. STATISTICA's flexible computational procedures and wide selection of options used to visualize any combination of values from numerical output offer countless methods to explore your data and verify hypotheses.
Automating analyses (macros and STATISTICA Visual Basic). Any selections that you make in the course of the interactive data analysis (including both specifying the designs and choosing the output options) are automatically recorded as industry-standard Visual Basic code. You can save such macros for repeated use (you can also assign them to toolbar buttons, modify or edit them, combine them with other programs, etc.).
Example 3: A 2 x 3 Between-Groups ANOVA Design
Data File. This example, based on a fictitious data set reported in Lindman (1974), begins with a simple analysis of a 2 x 3 complete factorial between-groups design.
Suppose that we have conducted an experiment to address the nature vs. nurture question; specifically, we test the performance of different strains of rats in the “T-maze.” The T-maze is a simple maze, and the rats’ task is to learn to run straight to the food placed in a particular location, without errors. Three strains of rats, whose general ability to solve the T-maze can be described as bright, mixed, and dull, were used. From each of these strains, we rear 4 animals in a free (stimulating) environment and 4 animals in a restricted environment. The dependent measure is the number of errors made by each rat while running the T-maze.
The data for this study are contained in the STATISTICA example data file Rats.sta. Open this data file:
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples. The Open a STATISTICA Data File dialog box is displayed. Rats.sta is located in the Datasets folder.
Classic menus. From the File menu, select Open Examples to display the Open a STATISTICA Data File dialog box; Rats.sta is located in the Datasets folder.
A portion of this file is shown below.
Specifying the Analysis. Start the ANOVA analysis:
Ribbon bar. Select the Statistics tab, and in the Base group, click ANOVA.
Classic menus. Select ANOVA from the Statistics menu.
The General ANOVA/MANOVA Startup Panel will be displayed, in which we can enter the specifications for the design.
Independent and dependent variables. The ANOVA/MANOVA module classifies variables as either independent or dependent (see Elementary Concepts). Independent variables are those under the experimenter’s control. We may also refer to these variables as grouping variables, coding variables, or between-group factors. These variables contain the codes that were used to uniquely identify to which group in the experiment the respective case belongs.
In the data file Rats.sta, the codes 1-free and 2-restricted were used in the categorical predictor variable Envirnmt to denote whether the rat belongs to the group raised in the free or the restricted environment, respectively. The codes used for the second independent variable (Strain) are 1-bright, 2-mixed, and 3-dull. The dependent variable in an experiment is the one that depends on or is affected by the independent variables; in this study it is the variable Errors, which contains the number of errors made by the rats running the maze.
Specifying the design. In the General ANOVA/MANOVA Startup Panel, select Factorial ANOVA as the Type of analysis and Quick specs dialog as the Specification Method. Then click the OK button to display the ANOVA/MANOVA Factorial ANOVA dialog. This is a 2 (Environment) by 3 (Strain) between-groups experimental factorial design.
Click the Variables button to display the standard variable selection dialog. Here, select Errors from the Dependent variable list and Envirnmt and Strain from the Categorical predictors (factors) list, and then click the OK button.
Next, specify the codes that were used to uniquely identify the groups; click the Factor codes button and either enter each of the codes for each variable or click the All button for each variable to enter all of the codes for that variable. Finally, click the OK button. The ANOVA/MANOVA Factorial ANOVA dialog will now look as follows:
Reviewing the Results. Click the OK button to begin the analysis. When complete, the ANOVA Results dialog will be displayed.
Click the All effects/Graphs button to display the table of all effects.
Summary ANOVA table. This table summarizes the main results of the analysis. Note that significant effects (p<.05) in this table are marked with an asterisk *. We can adjust the significance criterion (for highlighting) by entering the desired alpha level in the Significance level field on the Quick tab. Both of the main effects (Envirnmt and Strain) are statistically significant (p<.05) while the interaction is not (p>.05).
Reviewing marginal means. The marginal means for the Envirnmt main effect will now be reviewed. (Note that the marginal means are calculated as least squares means.) Select the Envirnmt main effect, select the Spreadsheet option button under Display, and then click the OK button to produce a spreadsheet with the table of marginal means for the selected effect.
The default graph for all spreadsheets with marginal means is the means plot. In this case, the plot is rather simple. To produce this plot of the two means for the free and restricted environment, return to the Table of All Effects dialog (by clicking the All effects/Graphs button on the Quick tab), change the Display option to Graph, and again click the OK button.
It appears that rats raised in the more restricted environment made more errors than rats raised in the free environment.
Reviewing the interaction plot. Now, let’s look at all of the means simultaneously, that is, at the plot of the interaction of Envirnmt by Strain. Once again, return to the Table of All Effects dialog, and this time select the interaction effect (Envirnmt*Strain). Click the OK button, and the Arrangement of Factors dialog is displayed:
As you can see, we have full control over the order in which the factors in the interaction will be plotted. For this example, select STRAIN under x-axis, upper and ENVIRNMT under Line pattern (see above). Click the OK button, and the graph of means is displayed.
The graph, shown below, nicely summarizes the results of this study, that is, the pattern of the two main effects. The rats raised in the restricted environment (dashed line) made more errors than those raised in the free environment (solid line). At the same time, the dull rats made the most errors, followed by the mixed rats, while the bright rats made the fewest errors.
Post Hoc Comparisons of Means. In the previous plot, we might ask whether the mixed strain of rats was significantly different from the dull and the bright strains. However, no a priori hypotheses about this question were specified; therefore, we should use post hoc comparisons to test the mean differences between strains of rats (refer to the Introductory Overview for an explanation of the logic of post hoc tests).
Specifying post hoc tests. After returning to the ANOVA Results dialog, click the More results button to display the larger ANOVA Results dialog, and then click on the Post-hoc tab. For this example, select Strain in the Effect box in order to compare the (unweighted) marginal means for that effect.
Choosing a test. The different options for post hoc tests on this tab all “protect” us to some extent against capitalizing on chance (due to the post hoc nature of the comparisons, see ANOVA/MANOVA Introductory Overview – Contrast Analysis and Post-hoc Tests). All tests enable us to compare means under the assumption that we bring no a priori hypotheses to the study. These tests are discussed in the Post-hoc tab topic. For now, select the Homogenous groups option button (in the Display group box) and click the Scheffé test button.
In this table, the means are sorted from smallest to largest, and the means that are not significantly different from each other have four “stars” (****) in the same column (i.e., they form a homogenous group of means); all means that do not share stars in the same column are significantly different from each other. Thus, as discussed in Winer, Brown, and Michels (1991, p. 528), the only means that are significantly different from each other are the means for group 1 (bright) and group 3 (dull). We would therefore conclude that the dull strain of rats made significantly more errors than the bright strain, while the mixed strain is not significantly different from either.
Testing Assumptions. The ANOVA/MANOVA and GLM Introductory Overview – Assumptions and Effects of Violating Assumptions topic explains the assumptions underlying the use of ANOVA techniques. Now, we will review the data in terms of these assumptions. Return to the ANOVA Results dialog and click on the Assumptions tab, which contains options for many different tests and graphs; some are applicable only to more complex designs.
Distribution of dependent variable. ANOVA assumes that the distribution of the dependent variable (within groups) follows the normal distribution. For now, select the Envirnmt*Strain interaction effect in the Effect drop-down box, and click the Histograms button under Distribution of vars within groups. The Select groups dialog is displayed first, in which we can choose to view the distribution for all groups combined or for only a selected group.
For this example, click the OK button to accept the default selection of All Groups, and a histogram of the distribution will be produced.
It appears as if the distribution across groups is multi-modal, that is to say, it has more than one “peak.” We could have anticipated that, given that strong main effects were found. If you want to test the normality assumption more thoroughly, you could look at the distributions within the individual groups.
For this example, a potentially more serious violation of the ANOVA assumptions will be tested.
Correlation between mean and standard deviation. As mentioned in the Introductory Overview, deviation from normality is not the major “enemy”; the most likely “trap” to fall into is to base our interpretations of an effect on an “extreme” cell in the design with much greater than average variability. Put another way, when the means and the standard deviations are correlated across cells of the design, the performance (alpha error rate) of the F-test deteriorates greatly, and we may reject the null hypothesis with p<.05 when the real p-value is possibly as high as .50!
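The diagnostic described here is straightforward to state computationally: collect the mean and the standard deviation of each cell in the design and correlate the two across cells. The following minimal sketch illustrates the idea with hypothetical cell data (the values are made up; STATISTICA computes this for you via the options described below):

    # A minimal sketch of the diagnostic described above: correlate cell means
    # with cell standard deviations across the design. Cell data are hypothetical.
    import numpy as np

    cells = {
        "free/bright": [10, 12, 11, 9], "free/mixed": [15, 18, 14, 17],
        "free/dull": [22, 28, 25, 31], "restr/bright": [14, 13, 16, 15],
        "restr/mixed": [21, 26, 24, 29], "restr/dull": [30, 41, 35, 44],
    }
    means = np.array([np.mean(v) for v in cells.values()])
    sds = np.array([np.std(v, ddof=1) for v in cells.values()])
    r = np.corrcoef(means, sds)[0, 1]
    print(f"r(mean, SD) across cells = {r:.2f}")  # a large r signals trouble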
Now, look at the correlation between the 6 means and standard deviations in this design. We can elect to plot the means vs. either the standard deviations or the variances by clicking the appropriate button (Plot means vs. std. deviations or Variances, respectively) on the Assumptions tab. For this example, click the Plot means vs. std. deviations button.
Note that in the illustration above, a linear fit and regression bands were added to the plot via the Plot: Fitting and Plot: Regr. Bands panes of the Graph Options dialog. Indeed, the means and standard deviations appear substantially correlated in this design. If an important decision were riding on this study, we would be well advised to double-check the significant main effects pattern by using, for example, some nonparametric procedure (see the Nonparametrics module) that depends not on raw scores (and variances) but on ranks. In any event, we should view these results with caution.
Homogeneity of variances. Now, look at the homogeneity of variance tests. On the Assumptions tab, various tests are available in the Homogeneity of variances/covariances group. You can try a univariate test (Cochran C, Hartley, Bartlett) to compute the standard homogeneity of variances tests, or Levene’s test; neither will yield statistically significant results. Shown below is the Levene’s Test for Homogeneity of Variances spreadsheet.
Summary. Besides illustrating the major functional aspects of the ANOVA/MANOVA module, this analysis has demonstrated how important it is to be able to graph data easily (e.g., to produce the scatterplot of means vs. standard deviations). Had we relied on nothing else but the F tests of significance and the standard tests of homogeneity of variances, we would not have caught the potentially serious violation of assumptions that was detected in the scatterplot of means vs. standard deviations. As it stands, we would probably conclude that the effects of environment and genetic factors (Strain) both seem to have an (additive) effect on performance in the T-maze. However, the data should be further analyzed using nonparametric methods to ensure that the statistical significance (p) values from the ANOVA are not inflated.
Example 4: A 2-Level Between-Group x 4-Level Within-Subject Repeated Measures Design
Overview. This example demonstrates how to set up a repeated measures design. The use of the post-hoc testing facilities will be demonstrated, and a graphical summary of the results will be produced. Moreover, the univariate and multivariate tests will be computed.
Research Problem
Overview. This example is based on a (fictitious) data set reported in Winer, Brown, and Michels (1991, Table 7.7). Suppose we are interested in learning how different factors affect people’s ability to perform a fine-tuning task. For example, operators of complex industrial machinery constantly need to read (and process) various gauges and adjust machines (dials) accordingly. In this (fictitious) study, two methods for calibrating dials were examined, and each subject was tested with 4 different shapes of dials.
The resulting design is a 2 (Factor A: Method of calibration, with 2 levels) by 4 (Factor B: Shape of dial, with 4 levels) analysis of variance. The last factor is a within-subject or repeated measures factor because it represents repeated measurements on the same subjects; the first factor is a between-groups factor because subjects are randomly assigned to work under one or the other Method condition.
Data file. The setup of a data file for repeated measures analysis is straightforward: The between-groups factor (A: Method of calibration) can be specified by setting up a variable containing the codes that uniquely identify to which experimental condition each subject belongs. Each repeated measurement is then put into a different variable. Shown below is an illustration of the data file Accuracy.sta.
Specifying the Design. Open the Accuracy.sta data set, and start the General ANOVA/MANOVA analysis.
Following are instructions to do this from the ribbon bar and from the classic menus.
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples to display the Open a STATISTICA Data File dialog box. Double-click the Datasets folder, and then open the Accuracy.sta data set.
Next, select the Statistics tab. In the Base group, click ANOVA to display the General ANOVA/MANOVA Startup Panel.
Classic menus. Open the data file by selecting Open Examples from the File menu to display the Open a STATISTICA Data File dialog. The data file is located in the Datasets folder.
Then, from the Statistics menu, select ANOVA to display the General ANOVA/MANOVA Startup Panel.
In the Startup Panel, select Repeated measures ANOVA as the Type of analysis and Quick specs dialog as the Specification method, and then click the OK button (see the Introductory Overview for different methods of specifying designs). In the ANOVA/MANOVA Repeated Measures dialog, click the Variables button to display the standard variable selection dialog. In the Dependent variable list select B1 through B4; in the Categorical predictors (factors) list, select A. Click the OK button. Next, click the Factor codes button to display the Select codes for indep. vars (factors) dialog. Click the All button to select the codes (1 and 2) for the independent variable, and click the OK button to close this dialog and return to the ANOVA/MANOVA Repeated Measures dialog.
Specifying the repeated measures factors. Next click the Within effects button to display the Specify within-subjects factor dialog. Name the repeated measures factor B (in the Factor Name box) and specify 4 as the No. of levels. When reading the data, STATISTICA will go through the list of dependent variables and assign them to represent the consecutive levels of the repeated measures factor. See General ANOVA/MANOVA and GLM Notes – Specifying Within-Subjects Univariate and Multivariate Designs for more information on how to specify repeated measures factors.
Now click the OK button to close this dialog, and click the OK button in the ANOVA/MANOVA Repeated Measures dialog to run the analysis and display the ANOVA Results dialog.
Reviewing Results. The Results dialog contains options for examining the results of the experiment in great detail.
Let’s first look at the summary table of All effects/Graphs (click the All effects/Graphs button on the Quick tab).
Select Effect B*A as shown above (even though this effect is not statistically significant), and click the OK button; also click OK in response to the prompt about the assignment of factors to aspects of the graph (i.e., accept the defaults in the Arrangement of Factors dialog).
It is apparent that the pattern of means across the levels of the repeated measures factor B is approximately the same in the two conditions A1 and A2. However, it appears that there is a particularly strong difference between the two methods for dial B4, where the confidence bars for the means do not overlap.
Planned Comparison. Next, let’s examine the differences between the means for B4. Click on the Comps (comparisons) tab, and then click the Contrasts for LS means button to specify the contrasts for least squares means. Note that the so-called least squares means represent the best estimates of the population means (μ), given the current model; hence, STATISTICA performs the planned comparison contrasts based on the least squares means. In this case the distinction matters little, because this is a complete design in which the least squares means are usually identical to the observed means.
We are interested in comparing method A1 with method A2, for dial B4 only. So, in the Specify Contrasts for this Factor dialog, for specifying the contrast for factor A, set the contrast coefficients as shown below:
Click the OK button, and in the larger Specify Contrasts for this Factor dialog (this one for factor B), set all coefficients to 0 (to ignore the respective means in the comparison), except for B4.
Refer to General ANOVA/MANOVA and GLM Notes – Specifying Univariate and Multivariate Between-Groups Designs for additional details on the logic of testing planned comparisons.
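Conceptually, the two sets of coefficients are crossed: each cell of the A x B design receives the product of its A coefficient and its B coefficient, and the contrast estimate is the weighted sum of the cell means. The following minimal sketch illustrates this logic; the cell means are made up for illustration:

    # A sketch of how the two coefficient sets combine: coefficients for A (1, -1)
    # crossed with coefficients for B (0, 0, 0, 1) pick out the A1-vs-A2
    # difference at dial B4. Cell means below are hypothetical.
    import numpy as np

    c_A = np.array([1, -1])
    c_B = np.array([0, 0, 0, 1])
    C = np.outer(c_A, c_B)                # 2 x 4 matrix of cell-contrast weights

    cell_means = np.array([[3.0, 2.5, 2.8, 4.5],   # method A1, dials B1..B4
                           [3.1, 2.4, 2.9, 2.0]])  # method A2, dials B1..B4
    estimate = float(np.sum(C * cell_means))       # A1/B4 mean minus A2/B4 mean
    print(estimate)  # 2.5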
Now click the OK button, and click the Compute button on the Comps tab. Here are the results.
It appears that, as was evident in the plot of means earlier, the two means are significantly different from each other.
Post-Hoc Testing. Since we did not have a priori hypotheses about the pattern of means in this experiment, the a priori contrast, based on our after-the-fact examination of the pattern of means, is not “fair.” As described in the Introductory Overview, the planned comparison method capitalizes on chance when you compare only those means that happen to be most different (out of, in this study, 2 x 4 = 8 means).
To compute post-hoc tests, click the More results button to display the larger and more comprehensive Results dialog. Click on the Post-hoc tab, select effect B*A (i.e., the B by A interaction effect) in the Effect box, select the Significant differences option button as the display format in the Display group, and then click the Bonferroni button.
As you can see, using this more conservative method for testing the statistical significance of differences between means, the means for dial B4 across the two levels of A are not reliably different from each other.
The post-hoc tests are further explained in the Post-hoc tests in GLM, GRM, and ANOVA topic; note also that when testing means in an interaction effect of between-group and within-subject (repeated measures) effects, there are several ways (options in STATISTICA) to estimate the proper error term for the comparison. These issues are discussed in Error Term for Post-hoc Tests in GLM, GRM, and ANOVA; see also Winer, Brown, and Michels (1991, pp. 529-531) for a discussion of the Pooled MS reported in this results spreadsheet.
Tests for the B main effect. Winer, Brown, and Michels (1991, Table 7.10) summarize the results of using the Newman-Keuls procedure for testing the differences in the B main effect. To compute those tests, on the Post-hoc tab, select B in the Effect box and click the Newman-Keuls button.
Introduction to ANOVA / MANOVA
A general introduction to ANOVA and a discussion of the general topics in the analysis of variance techniques, including repeated measures designs, ANCOVA, MANOVA, unbalanced and incomplete designs, contrast effects, post-hoc comparisons, assumptions, etc. For related information, see also Variance Components (topics related to estimation of variance components in mixed model designs), Experimental Design/DOE (topics related to specialized applications of ANOVA in industrial settings), and Repeatability and Reproducibility Analysis (topics related to specialized designs for evaluating the reliability and precision of measurement systems).
Basic Ideas
The Purpose of Analysis of Variance
In general, the purpose of analysis of variance (ANOVA) is to test for significant differences between means. Elementary Concepts provides a brief introduction to the basics of statistical significance testing. If we are only comparing two means, ANOVA will produce the same results as the t test for independent samples (if we are comparing two different groups of cases or observations) or the t test for dependent samples (if we are comparing two variables in one set of cases or observations). If you are not familiar with these tests, you may want to read Basic Statistics and Tables.
Why the name analysis of variance? It may seem odd that a procedure that compares means is called analysis of variance. However, this name is derived from the fact that in order to test for statistical significance between means, we are actually comparing (i.e., analyzing) variances.
The Partitioning of Sums of Squares
At the heart of ANOVA is the fact that variances can be divided, that is, partitioned. Remember that the variance is computed as the sum of squared deviations from the overall mean, divided by n-1 (sample size minus one). Thus, given a certain n, the variance is a function of the sums of (deviation) squares, or SS for short. Partitioning of variance works as follows. Consider this data set:
                           Group 1    Group 2
  Observation 1               2          6
  Observation 2               3          7
  Observation 3               1          5
  Mean                        2          6
  Sums of Squares (SS)        2          2
  Overall Mean                     4
  Total Sums of Squares           28
The means for the two groups are quite different (2 and 6, respectively). The sums of squares within each group are equal to 2. Adding them together, we get 4. If we now repeat these computations ignoring group membership, that is, if we compute the total SS based on the overall mean, we get the number 28. In other words, computing the variance (sums of squares) based on the within-group variability yields a much smaller estimate of variance than computing it based on the total variability (the overall mean). The reason for this in the above example is of course that there is a large difference between means, and it is this difference that accounts for the difference in the SS. In fact, if we were to perform an ANOVA on the above data, we would get the following result:
             MAIN EFFECT
           SS      df     MS       F       p
  Effect   24.0     1     24.0    24.0    .008
  Error     4.0     4      1.0
As can be seen in the above table, the total SS (28) was partitioned into the SS due to within-group variability (2+2=4) and variability due to differences between means (28-(2+2)=24).
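This partitioning can be verified directly from the data set above; the following minimal sketch (plain Python, not STATISTICA code) reproduces the three sums of squares:

    # Computing the partitioning shown above from the data set: within-group SS,
    # total SS, and their difference (the SS due to the effect).
    group1, group2 = [2, 3, 1], [6, 7, 5]
    combined = group1 + group2

    def ss(values, mean):
        """Sum of squared deviations from a given mean."""
        return sum((x - mean) ** 2 for x in values)

    overall_mean = sum(combined) / len(combined)      # 4.0
    ss_within = ss(group1, 2) + ss(group2, 6)         # 2 + 2 = 4
    ss_total = ss(combined, overall_mean)             # 28.0
    print(ss_within, ss_total, ss_total - ss_within)  # 4 28.0 24.0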
SS Error and SS Effect. The within-group variability (SS) is usually referred to as Error variance. This term denotes the fact that we cannot readily explain or account for it in the current design. However, the SS Effect we can explain. Namely, it is due to the differences in means between the groups. Put another way, group membership explains this variability because we know that it is due to the differences in means.
Significance testing. The basic idea of statistical significance testing is discussed in Elementary Concepts, which also explains why many statistical tests represent ratios of explained to unexplained variability. ANOVA is a good example of this. Here, we base this test on a comparison of the variance due to the between-groups variability (called Mean Square Effect, or MSeffect) with the within-group variability (called Mean Square Error, or MSerror; this term was first used by Edgeworth, 1885). Under the null hypothesis (that there are no mean differences between groups in the population), we would still expect some minor random fluctuation in the means for the two groups when taking small samples (as in our example). Therefore, under the null hypothesis, the variance estimated based on within-group variability should be about the same as the variance due to between-groups variability. We can compare those two estimates of variance via the F test (see also F Distribution), which tests whether the ratio of the two variance estimates is significantly greater than 1. In our example above, that test is highly significant, and we would in fact conclude that the means for the two groups are significantly different from each other.
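For completeness, the F ratio and its p-value in the table above can be reproduced from the sums of squares; a minimal sketch using scipy's F distribution (the numbers are taken from the ANOVA table above):

    # Reproducing the F test from the ANOVA table above: F = MS_effect / MS_error
    # with (1, 4) degrees of freedom.
    from scipy.stats import f

    ms_effect = 24.0 / 1          # SS effect / df effect
    ms_error = 4.0 / 4            # SS error / df error
    F = ms_effect / ms_error      # 24.0
    p = f.sf(F, 1, 4)             # P(F >= 24) under the null hypothesis
    print(f"F(1, 4) = {F:.1f}, p = {p:.3f}")  # p = .008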
Summary of the basic logic of ANOVA. To summarize the discussion up to this point, the purpose of analysis of variance is to test differences in means (for groups or variables) for statistical significance. This is accomplished by analyzing the variance, that is, by partitioning the total variance into the component that is due to true random error (i.e., within-group SS) and the components that are due to differences between means. These latter variance components are then tested for statistical significance, and, if significant, we reject the null hypothesis of no differences between means and accept the alternative hypothesis that the means (in the population) are different from each other.
Dependent and independent variables. The variables that are measured (e.g., a test score) are called dependent variables. The variables that are manipulated or controlled (e.g., a teaching method or some other criterion used to divide observations into groups that are compared) are called factors or independent variables. For more information on this important distinction, refer to Elementary Concepts.
Multi-Factor ANOVA
In the simple example above, it may have occurred to you that we could have simply computed a t test for independent samples to arrive at the same conclusion. And, indeed, we would get the identical result if we were to compare the two groups using this test. However, ANOVA is a much more flexible and powerful technique that can be applied to much more complex research issues.
Multiple factors. The world is complex and multivariate in nature, and instances when a single variable completely explains a phenomenon are rare. For example, when trying to explore how to grow a bigger tomato, we would need to consider factors that have to do with the plants’ genetic makeup, soil conditions, lighting, temperature, etc. Thus, in a typical experiment, many factors are taken into account. One important reason for using ANOVA methods rather than multiple two-group studies analyzed via t tests is that the former method is more efficient, and with fewer observations we can gain more information. Let’s expand on this statement.
Controlling for factors. Suppose that in the above two-group example we introduce another grouping factor, for example, Gender. Imagine that in each group we have 3 males and 3 females. We could summarize this design in a 2 by 2 table:
               Experimental    Experimental
               Group 1         Group 2
  Males           2               6
                  3               7
                  1               5
  Mean            2               6
  Females         4               8
                  5               9
                  3               7
  Mean            4               8
Before performing any computations, it appears that we can partition the total variance into at least 3 sources: (1) error (within-group) variability, (2) variability due to experimental group membership, and (3) variability due to gender. (Note that there is an additional source – interaction – that we will discuss shortly.) What would have happened had we not included gender as a factor in the study but rather computed a simple t test? If we compute the SS ignoring the gender factor (i.e., use the within-group means collapsed across gender; the result is SS = 10+10 = 20), we see that the resulting within-group SS is larger than it is when we include gender (use the within-group, within-gender means to compute those SS; they are equal to 2 in each cell, so the combined SS-within is 2+2+2+2 = 8). This difference is due to the fact that the means for males are systematically lower than those for females, and this difference in means adds variability if we ignore the gender factor. Controlling for this error variance increases the sensitivity (power) of the test. This example demonstrates another principle of ANOVA that makes it preferable to simple two-group t-test studies: in ANOVA we can test each factor while controlling for all others, which is the reason ANOVA is more statistically powerful (i.e., we need fewer observations to find a significant effect) than the simple t test.
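The two error terms contrasted above (SS = 20 when gender is ignored vs. SS = 8 when it is included) can be computed directly from the 2 x 2 table; a minimal sketch in plain Python:

    # Error SS around cell (group x gender) means vs. around group means that
    # ignore gender, using the data from the 2 x 2 table above.
    males = {"group1": [2, 3, 1], "group2": [6, 7, 5]}
    females = {"group1": [4, 5, 3], "group2": [8, 9, 7]}

    def ss_around_own_mean(values):
        m = sum(values) / len(values)
        return sum((x - m) ** 2 for x in values)

    # Gender included: one SS per cell, 2 + 2 + 2 + 2 = 8
    ss_cells = sum(ss_around_own_mean(v)
                   for v in list(males.values()) + list(females.values()))

    # Gender ignored: collapse across gender within each group, 10 + 10 = 20
    ss_collapsed = sum(ss_around_own_mean(males[g] + females[g])
                       for g in ("group1", "group2"))

    print(ss_cells, ss_collapsed)  # 8.0 20.0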
Interaction Effects
There is another advantage of ANOVA over simple t-tests: with ANOVA, we can detect interaction effects between variables and can therefore test more complex hypotheses about reality. Let’s consider another example to illustrate this point. (The term interaction was first used by Fisher, 1926.)
Main effects, two-way interaction. Imagine that we have a sample of highly achievement-oriented students and another of achievement “avoiders.” We now create two random halves in each sample, and give one half of each sample a challenging test, the other an easy test. We measure how hard the students work on the test. The means of this (fictitious) study are as follows:
                      Achievement-oriented    Achievement-avoiders
  Challenging Test            10                       5
  Easy Test                    5                      10
How can we summarize these results? Is it appropriate to conclude that (1) challenging tests make students work harder, (2) achievement-oriented students work harder than achievement-avoiders? Neither of these statements captures the essence of this clearly systematic pattern of means. The appropriate way to summarize the result would be to say that challenging tests make only achievement-oriented students work harder, while easy tests make only achievement-avoiders work harder. In other words, the type of achievement orientation and test difficulty interact in their effect on effort; specifically, this is an example of a two-way interaction between achievement orientation and test difficulty. Note that statements 1 and 2 above describe so-called main effects.
Higher order interactions. While the previous two-way interaction can be put into words relatively easily, higher order interactions are increasingly difficult to verbalize. Imagine that we had included factor Gender in the achievement study above, and we had obtained the following pattern of means:
  Females               Achievement-oriented    Achievement-avoiders
  Challenging Test              10                       5
  Easy Test                      5                      10

  Males                 Achievement-oriented    Achievement-avoiders
  Challenging Test               1                       6
  Easy Test                      6                       1
How can we now summarize the results of our study? Graphs of means for all effects greatly facilitate the interpretation of complex effects. The pattern shown in the table above (and in the graph below) represents a three-way interaction between factors.
Thus, we may summarize this pattern by saying that for females there is a two-way interaction between achievement-orientation type and test difficulty: achievement-oriented females work harder on challenging tests than on easy tests, while achievement-avoiding females work harder on easy tests than on difficult tests. For males, this interaction is reversed. As you can see, the description of the interaction has become much more involved.
A general way to express interactions. A general way to express all interactions is to say that an effect is modified (qualified) by another effect. Let’s try this with the two-way interaction above. The main effect for test difficulty is modified by achievement orientation. For the three-way interaction in the previous paragraph, we can summarize that the two-way interaction between test difficulty and achievement orientation is modified (qualified) by gender. If we have a four-way interaction, we can say that the three-way interaction is modified by the fourth variable, that is, that there are different types of interactions in the different levels of the fourth variable. As it turns out, in many areas of research five- or higher-way interactions are not that uncommon.
Complex Designs
A review of the basic “building blocks” of complex designs.
Between-Groups and Repeated Measures
When we want to compare two groups, we use the t test for independent samples; when we want to compare two variables given the same subjects (observations), we use the t test for dependent samples. This distinction – dependent and independent samples – is important for ANOVA as well. Basically, if we have repeated measurements of the same variable (under different conditions or at different points in time) on the same subjects, then the factor is a repeated measures factor (also called a within-subjects factor because to estimate its significance we compute the within-subjects SS). If we compare different groups of subjects (e.g., males and females; three strains of bacteria, etc.), we refer to the factor as a between-groups factor. The computations of significance tests are different for these different types of factors; however, the logic of computations and interpretations is the same.
Between-within designs. In many instances, experiments call for the inclusion of between-groups and repeated measures factors. For example, we may measure math skills in male and female students (gender, a between-groups factor) at the beginning and the end of the semester. The two measurements on each student would constitute a within-subjects (repeated measures) factor. The interpretation of main effects and interactions is not affected by whether a factor is between-groups or repeated measures, and both factors may obviously interact with each other (e.g., females improve over the semester while males deteriorate).
Incomplete (Nested) Designs
There are instances where we may decide to ignore interaction effects. This happens when (1) we know that in the population the interaction effect is negligible, or (2) a complete factorial design (this term was first introduced by Fisher, 1935a) cannot be used for economic reasons.
Imagine a study where we want to evaluate the effect of four fuel additives on gas mileage. For our test, our company has provided us with four cars and four drivers. A complete factorial experiment, that is, one in which each combination of driver, additive, and car appears at least once, would require 4 x 4 x 4 = 64 individual test conditions (groups). However, we may not have the resources (time) to run all of these conditions; moreover, it seems unlikely that the type of driver would interact with the fuel additive to an extent that would be of practical relevance. Given these considerations, we could actually run a so-called Latin square design and “get away” with only 16 individual groups (the four additives are denoted by letters A, B, C, and D):
                          Car
                1      2      3      4
  Driver 1      A      B      C      D
  Driver 2      B      C      D      A
  Driver 3      C      D      A      B
  Driver 4      D      A      B      C
Latin square designs (this term was first used by Euler, 1782) are described in most textbooks on experimental methods (e.g., Hays, 1988; Lindman, 1974; Milliken & Johnson, 1984; Winer, 1962), and we do not want to discuss here the details of how they are constructed. Suffice it to say that this design is incomplete insofar as not all combinations of factor levels occur in the design. For example, Driver 1 will only drive Car 1 with additive A, while Driver 3 will drive that car with additive C. In a sense, the levels of the additives factor (A, B, C, and D) are placed into the cells of the car by driver matrix like “eggs into a nest.” This mnemonic device is sometimes useful for remembering the nature of nested designs.
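One simple construction is a cyclic rotation of the treatment letters; the sketch below (plain Python, purely illustrative) reproduces the 4 x 4 layout shown above:

    def latin_square(treatments):
        """Cyclic Latin square: row i is the treatment list rotated left by i."""
        n = len(treatments)
        return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]

    for driver, row in enumerate(latin_square(["A", "B", "C", "D"]), start=1):
        print(f"Driver {driver}:", " ".join(row))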
Note that there are several other statistical procedures that may be used to analyze these types of designs; see the section on Methods for Analysis of Variance for details. In particular, the methods discussed in the Variance Components and Mixed Model ANOVA/ANCOVA section are very efficient for analyzing designs with unbalanced nesting (when the nested factors have different numbers of levels within the levels of the factors in which they are nested), very large nested designs (e.g., with more than 200 levels overall), or hierarchically nested designs (with or without random factors).
Analysis of Covariance (ANCOVA)
General Idea
The Basic Ideas section discussed briefly the idea of “controlling” for factors and how the inclusion of additional factors can reduce the error SS and increase the statistical power (sensitivity) of our design. This idea can be extended to continuous variables, and when such continuous variables are included as factors in the design they are called covariates.
Fixed Covariates
Suppose that we want to compare the math skills of students who were randomly assigned to one of two alternative textbooks. Imagine that we also have data about the general intelligence (IQ) for each student in the study. We would suspect that general intelligence is related to math skills, and we can use this information to make our test more sensitive. Specifically, imagine that in each one of the two groups we can compute the correlation coefficient (see Basic Statistics and Tables) between IQ and math skills. Remember that once we have computed the correlation coefficient we can estimate the amount of variance in math skills that is accounted for by IQ, and the amount of (residual) variance that we cannot explain with IQ (refer also to Elementary Concepts and Basic Statistics and Tables). We may use this residual variance in the ANOVA as an estimate of the true error SS after controlling for IQ. If the correlation between IQ and math skills is substantial, then a large reduction in the error SS may be achieved.
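The error-reduction logic can be illustrated with simulated data (Python with statsmodels; the variable names, effect sizes, and noise levels are all hypothetical assumptions, not STATISTICA output). Comparing the two tables shows the error SS shrinking once IQ enters the model:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 40
    df = pd.DataFrame({"textbook": np.repeat(["A", "B"], n // 2),
                       "iq": rng.normal(100, 15, n)})
    # math skills depend on IQ plus a small textbook effect (simulated)
    df["math"] = (0.3 * df["iq"] + np.where(df["textbook"] == "B", 3.0, 0.0)
                  + rng.normal(0, 4, n))

    anova  = smf.ols("math ~ C(textbook)", data=df).fit()
    ancova = smf.ols("math ~ C(textbook) + iq", data=df).fit()
    print(sm.stats.anova_lm(anova, typ=2))   # error SS before controlling for IQ
    print(sm.stats.anova_lm(ancova, typ=2))  # error SS shrinks; textbook F grows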
Effect of a covariate on the F test. In the F test (see also F Distribution), to evaluate the statistical significance of between-groups differences, we compute the ratio of the between-groups variance (MSeffect) over the error variance (MSerror). If MSerror becomes smaller, due to the explanatory power of IQ, then the overall F value will become larger.
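Numerically, the mechanism looks as follows (SciPy; the mean squares and degrees of freedom are made-up values for illustration). Holding MSeffect fixed while the covariate shrinks MSerror inflates F and shrinks p:

    from scipy import stats

    ms_effect, df_effect, df_error = 120.0, 1, 38   # hypothetical values
    for ms_error in (60.0, 20.0):                   # covariate shrinks the error term
        F = ms_effect / ms_error
        p = stats.f.sf(F, df_effect, df_error)
        print(f"MSerror = {ms_error:5.1f}  ->  F = {F:5.2f},  p = {p:.4f}")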
Multiple covariates. The logic described above for the case of a single covariate (IQ) can easily be extended to the case of multiple covariates. For example, in addition to IQ, we might include measures of motivation, spatial reasoning, etc., and instead of a simple correlation, compute the multiple correlation coefficient (see Multiple Regression).
When the F value gets smaller. In some studies with covariates it happens that the F value actually becomes smaller (less significant) after including covariates in the design. This is usually an indication that the covariates are not only correlated with the dependent variable (e.g., math skills), but also with the between-groups factors (e.g., the two different textbooks). For example, imagine that we measured IQ at the end of the semester, after the students in the different experimental groups had used the respective textbook for almost one year. It is possible that, even though students were initially randomly assigned to one of the two textbooks, the different books were so different that both math skills and IQ improved differentially in the two groups. In that case, the covariate will not only partition variance away from the error variance, but also from the variance due to the between-groups factor. Put another way, after controlling for the differences in IQ that were produced by the two textbooks, the math skills are not that different. Put in yet a third way, by “eliminating” the effects of IQ, we have inadvertently eliminated the true effect of the textbooks on students’ math skills.
Adjusted means. When the latter case happens, that is, when the covariate is affected by the between-groups factor, then it is appropriate to compute so-called adjusted means. These are the means that we would get after removing all differences that can be accounted for by the covariate.
Interactions between covariates and factors. Just as we can test for interactions between factors, we can also test for the interactions between covariates and between-groups factors. Specifically, imagine that one of the textbooks is particularly suited for intelligent students, while the other actually bores those students but challenges the less intelligent ones. As a result, we may find a positive correlation in the first group (the more intelligent, the better the performance), but a zero or slightly negative correlation in the second group (the more intelligent the student, the less likely he or she is to acquire math skills from the particular textbook). In some older statistics textbooks this condition is discussed as a case where the assumptions for analysis of covariance are violated (see Assumptions and Effects of Violating Assumptions). However, because ANOVA/MANOVA uses a very general approach to analysis of covariance, we can specifically estimate the statistical significance of interactions between factors and covariates.
Changing Covariates
While fixed covariates are commonly discussed in textbooks on ANOVA, changing covariates are discussed less frequently. In general, when we have repeated measures, we are interested in testing the differences in repeated measurements on the same subjects. Thus we are actually interested in evaluating the significance of changes. If we have a covariate that is also measured at each point when the dependent variable is measured, then we can compute the correlation between the changes in the covariate and the changes in the dependent variable. For example, we could study math anxiety and math skills at the beginning and at the end of the semester. It would be interesting to see whether any changes in math anxiety over the semester correlate with changes in math skills.
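Computationally, this amounts to correlating two difference scores; a minimal sketch (NumPy; the measurements are simulated placeholders):

    import numpy as np

    rng = np.random.default_rng(2)
    # Hypothetical start/end-of-semester measurements for 30 students
    anxiety_t1, anxiety_t2 = rng.normal(5, 1, 30), rng.normal(4, 1, 30)
    skills_t1,  skills_t2  = rng.normal(50, 5, 30), rng.normal(55, 5, 30)
    r = np.corrcoef(anxiety_t2 - anxiety_t1, skills_t2 - skills_t1)[0, 1]
    print(f"correlation between changes in anxiety and changes in skills: {r:.2f}")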
Multivariate Designs: MANOVA/MANCOVA
Between-Groups Designs
All examples discussed so far have involved only one dependent variable. Even though the computations become increasingly complex, the logic and nature of the computations do not change when there is more than one dependent variable at a time. For example, we may conduct a study where we try two different textbooks, and we are interested in the students’ improvements in math and physics. In that case, we have two dependent variables, and our hypothesis is that both together are affected by the difference in textbooks. We could now perform a multivariate analysis of variance (MANOVA) to test this hypothesis. Instead of a univariate F value, we would obtain a multivariate F value (Wilks’ lambda) based on a comparison of the error variance/covariance matrix and the effect variance/covariance matrix. The “covariance” here is included because the two measures are probably correlated and we must take this correlation into account when performing the significance test. Obviously, if we were to take the same measure twice, then we would really not learn anything new. If we take a correlated measure, we gain some new information, but the new variable will also contain redundant information that is expressed in the covariance between the variables.
Interpreting results. If the overall multivariate test is significant, we conclude that the respective effect (e.g., textbook) is significant. However, our next question would of course be whether only math skills improved, only physics skills improved, or both. In fact, after obtaining a significant multivariate test for a particular main effect or interaction, customarily we would examine the univariate F tests (see also F Distribution) for each variable to interpret the respective effect. In other words, we would identify the specific dependent variables that contributed to the significant overall effect.
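As a hedged illustration of this workflow (Python with statsmodels; the data are simulated, and this is not a STATISTICA procedure), the sketch below fits a one-way MANOVA on two correlated dependent variables and reports Wilks’ lambda among the multivariate criteria; significant results would then be followed up with univariate F tests per variable:

    import numpy as np
    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    rng = np.random.default_rng(3)
    n = 30
    df = pd.DataFrame({"textbook": np.repeat(["A", "B"], n // 2)})
    shift = np.where(df["textbook"] == "B", 2.0, 0.0)
    df["math"]    = 50 + shift + rng.normal(0, 3, n)
    df["physics"] = 48 + shift + rng.normal(0, 3, n)  # correlated via the shared shift

    mv = MANOVA.from_formula("math + physics ~ textbook", data=df)
    print(mv.mv_test())  # includes Wilks' lambda for the textbook effect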
Repeated Measures Designs
If we were to measure math and physics skills at the beginning of the semester and the end of the semester, we would have a multivariate repeated measure. Again, the logic of significance testing in such designs is simply an extension of the univariate case. Note that MANOVA methods are also commonly used to test the significance of univariate repeated measures factors with more than two levels; this application will be discussed later in this section.
Sum Scores versus MANOVA
Even experienced users of ANOVA and MANOVA techniques are often puzzled by the differences in results that sometimes occur when performing a MANOVA on, for example, three variables as compared to a univariate ANOVA on the sum of the three variables. The logic underlying the summing of variables is that each variable contains some “true” value of the variable in question, as well as some random measurement error. Therefore, by summing up variables, the measurement error will sum to approximately 0 across all measurements, and the sum score will become more and more reliable (increasingly equal to the sum of true scores). In fact, under these circumstances, ANOVA on sums is appropriate and represents a very sensitive (powerful) method. However, if the dependent variable is truly multi-dimensional in nature, then summing is inappropriate. For example, suppose that my dependent measure consists of four indicators of success in society, and each indicator represents a completely independent way in which a person could “make it” in life (e.g., successful professional, successful entrepreneur, successful homemaker, etc.). Now, summing up the scores on those variables would be like adding apples to oranges, and the resulting sum score will not be a reliable indicator of a single underlying dimension. Thus, we should treat such data as multivariate indicators of success in a MANOVA.
Contrast Analysis and Post hoc Tests
· Why Compare Individual Sets of Means?
Why Compare Individual Sets of Means?
Usually, experimental hypotheses are stated in terms that are more specific than simply main effects or interactions. We may have the specific hypothesis that a particular textbook will improve math skills in males, but not in females, while another book would be about equally effective for both genders, but less effective overall. Now, generally, we are predicting an interaction here: the effectiveness of the book is modified (qualified) by the student’s gender. However, we have a particular prediction concerning the nature of the interaction: we expect a significant difference between genders for one book, but not the other. This type of specific prediction is usually tested via contrast analysis.
Contrast Analysis
Briefly, contrast analysis allows us to test the statistical significance of predicted specific differences in particular parts of our complex design. It is a major and indispensable component of the analysis of every complex ANOVA design.
Post hoc Comparisons
Sometimes we find effects in our experiment that were not expected. Even though in most cases a creative experimenter will be able to explain almost any pattern of means, it would not be appropriate to analyze and evaluate that pattern as if we had predicted it all along. The problem here is one of capitalizing on chance when performing multiple tests post hoc, that is, without a priori hypotheses. To illustrate this point, let’s consider the following “experiment.” Imagine we were to write down a number between 1 and 10 on 100 pieces of paper. We then put all of those pieces into a hat and draw 20 samples (of pieces of paper) of 5 observations each, and compute the means (from the numbers written on the pieces of paper) for each group. How likely do you think it is that we will find two sample means that are significantly different from each other? It is very likely! Selecting the extreme means obtained from 20 samples is very different from taking only 2 samples from the hat in the first place, which is what the test via the contrast analysis implies. Without going into further detail, there are several so-called post hoc tests that are explicitly based on the first scenario (taking the extremes from 20 samples), that is, they are based on the assumption that we have chosen for our comparison the most extreme (different) means out of k total means in the design. Those tests apply “corrections” that are designed to offset the advantage of post hoc selection of the most extreme comparisons.
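The “hat experiment” is easy to simulate (NumPy/SciPy; purely illustrative). Drawing 20 samples of 5 and then naively t-testing only the two most extreme sample means produces “significant” differences far more often than the nominal 5% rate:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    hits, n_experiments = 0, 1000
    for _ in range(n_experiments):
        slips = rng.integers(1, 11, size=100)       # 100 slips numbered 1..10
        groups = slips.reshape(20, 5)               # 20 samples of 5 observations
        means = groups.mean(axis=1)
        lo, hi = groups[means.argmin()], groups[means.argmax()]
        if stats.ttest_ind(lo, hi).pvalue < 0.05:   # test only the extreme pair
            hits += 1
    print(f"extreme pair 'significant' in {100 * hits / n_experiments:.0f}% of runs")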
Assumptions and Effects of Violating Assumptions
· Deviation from Normal Distribution
· Homogeneity of Variances and Covariances
· Sphericity and Compound Symmetry
Deviation from Normal Distribution
Assumptions. It is assumed that the dependent variable is measured on at least an interval scale level (see Elementary Concepts). Moreover, the dependent variable should be normally distributed within groups.
Effects of violations. Overall, the F test (see also F Distribution) is remarkably robust to deviations from normality (see Lindman, 1974, for a summary). If the kurtosis (see Basic Statistics and Tables) is greater than 0, then the F tends to be too small and we cannot reject the null hypothesis even though it is incorrect. The opposite is the case when the kurtosis is less than 0. The skewness of the distribution usually does not have a sizable effect on the F statistic. If the n per cell is fairly large, then deviations from normality do not matter much at all because of the central limit theorem, according to which the sampling distribution of the mean approximates the normal distribution, regardless of the distribution of the variable in the population. A detailed discussion of the robustness of the F statistic can be found in Box and Anderson (1955), or Lindman (1974).
Homogeneity of Variances
Assumptions. It is assumed that the variances in the different groups of the design are identical; this assumption is called the homogeneity of variances assumption. Remember that at the beginning of this section we computed the error variance (SS error) by adding up the sums of squares within each group. If the variances in the two groups are different from each other, then adding the two together is not appropriate, and will not yield an estimate of the common within-group variance (since no common variance exists).
Effects of violations. Lindman (1974, p. 33) shows that the F statistic is quite robust against violations of this assumption (heterogeneity of variances; see also Box, 1954a, 1954b; Hsu, 1938).
Special case: correlated means and variances. However, one instance when the F statistic is very misleading is when the means are correlated with variances across cells of the design. A scatterplot of variances or standard deviations against the means will detect such correlations. The reason why this is a “dangerous” violation is the following: Imagine that we have 8 cells in the design, 7 with about equal means but one with a much higher mean. The F statistic may suggest a statistically significant effect. However, suppose that there also is a much larger variance in the cell with the highest mean, that is, the means and the variances are correlated across cells (the higher the mean the larger the variance). In that case, the high mean in the one cell is actually quite unreliable, as is indicated by the large variance. However, because the overall F statistic is based on a pooled within-cell variance estimate, the high mean is identified as significantly different from the others, when in fact it is not at all significantly different if we based the test on the within-cell variance in that cell alone.
This pattern – a high mean and a large variance in one cell – frequently occurs when there are outliers present in the data. One or two extreme cases in a cell with only 10 cases can greatly bias the mean, and will dramatically increase the variance.
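A quick numeric diagnostic, matching the scatterplot suggestion above, is to correlate the cell means with the cell standard deviations (NumPy; the cell data are simulated, with outliers planted in one cell):

    import numpy as np

    rng = np.random.default_rng(5)
    # Seven well-behaved cells plus one cell whose high mean is driven by outliers
    cells = [rng.normal(10, 2, 10) for _ in range(7)]
    cells.append(np.concatenate([rng.normal(10, 2, 8), [30.0, 35.0]]))

    means = np.array([c.mean() for c in cells])
    sds   = np.array([c.std(ddof=1) for c in cells])
    print("r(mean, SD) =", round(np.corrcoef(means, sds)[0, 1], 2))  # large r = warning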
Homogeneity of Variances and Covariances
Assumptions. In multivariate designs, with multiple dependent measures, the homogeneity of variances assumption described earlier also applies. However, since there are multiple dependent variables, it is also required that their intercorrelations (covariances) are homogeneous across the cells of the design. There are various specific tests of this assumption.
Effects of violations. The multivariate equivalent of the F test is Wilks’ lambda. Not much is known about the robustness of Wilks’ lambda to violations of this assumption. However, because the interpretation of MANOVA results usually rests on the interpretation of significant univariate effects (after the overall test is significant), the above discussion concerning univariate ANOVA basically applies, and important significant univariate effects should be carefully scrutinized.
Special case: ANCOVA. A special serious violation of the homogeneity of variances/covariances assumption may occur when covariates are involved in the design. Specifically, if the correlations of the covariates with the dependent measure(s) are very different in different cells of the design, gross misinterpretations of results may occur. Remember that in ANCOVA, we in essence perform a regression analysis within each cell to partition out the variance component due to the covariates. The homogeneity of variances/covariances assumption implies that we perform this regression analysis subject to the constraint that all regression equations (slopes) across the cells of the design are the same. If this is not the case, serious biases may occur. There are specific tests of this assumption, and it is advisable to look at those tests to ensure that the regression equations in different cells are approximately the same.
Sphericity and Compound Symmetry
Reasons for Using the Multivariate Approach to Repeated Measures ANOVA. In repeated measures ANOVA containing repeated measures factors with more than two levels, additional special assumptions enter the picture: The compound symmetry assumption and the assumption of sphericity. Because these assumptions rarely hold (see below), the MANOVA approach to repeated measures ANOVA has gained popularity in recent years (both tests are automatically computed in ANOVA/MANOVA). The compound symmetry assumption requires that the variances (pooled within-group) and covariances (across subjects) of the different repeated measures are homogeneous (identical). This is a sufficient condition for the univariate F test for repeated measures to be valid (i.e., for the reported F values to actually follow the F distribution). However, it is not a necessary condition. The sphericity assumption is a necessary and sufficient condition for the F test to be valid; it states that the within-subject “model” consists of independent (orthogonal) components. The nature of these assumptions, and the effects of violations are usually not well-described in ANOVA textbooks; in the following paragraphs we will try to clarify this matter and explain what it means when the results of the univariate approach differ from the multivariate approach to repeated measures ANOVA.
The necessity of independent hypotheses. One general way of looking at ANOVA is to consider it a model fitting procedure. In a sense we bring to our data a set of a priori hypotheses; we then partition the variance (test main effects, interactions) to test those hypotheses. Computationally, this approach translates into generating a set of contrasts (comparisons between means in the design) that specify the main effect and interaction hypotheses. However, if these contrasts are not independent of each other, then the partitioning of variances runs afoul. For example, if two contrasts A and B are identical to each other and we partition out their components from the total variance, then we take the same thing out twice. Intuitively, specifying the two (not independent) hypotheses “the mean in Cell 1 is higher than the mean in Cell 2” and “the mean in Cell 1 is higher than the mean in Cell 2” is silly and simply makes no sense. Thus, hypotheses must be independent of each other, or orthogonal (the term orthogonality was first used by Yates, 1933).
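With equal cell sizes, two contrasts are orthogonal when the dot product of their coefficient vectors is zero. A tiny sketch (NumPy; the contrast vectors are illustrative) makes the redundancy described above visible:

    import numpy as np

    # Contrast coefficients over three cell means (equal n assumed)
    c1 = np.array([1, -1,  0])   # "mean in Cell 1 is higher than in Cell 2"
    c2 = np.array([1,  0, -1])   # "mean in Cell 1 is higher than in Cell 3"
    c3 = np.array([1, -1,  0])   # duplicates c1 -- the "silly" case in the text

    print(np.dot(c1, c2))  # 1: not orthogonal (the hypotheses overlap)
    print(np.dot(c1, c3))  # 2: completely redundant (identical hypotheses)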
Independent hypotheses in repeated measures. The general algorithm implemented will attempt to generate, for each effect, a set of independent (orthogonal) contrasts. In repeated measures ANOVA, these contrasts specify a set of hypotheses about differences between the levels of the repeated measures factor. However, if these differences are correlated across subjects, then the resulting contrasts are no longer independent. For example, in a study where we measured learning at three times during the experimental session, it may happen that the changes from time 1 to time 2 are negatively correlated with the changes from time 2 to time 3: subjects who learn most of the material between time 1 and time 2 improve less from time 2 to time 3. In fact, in most instances where a repeated measures ANOVA is used, we would probably suspect that the changes across levels are correlated across subjects. However, when this happens, the compound symmetry and sphericity assumptions have been violated, and independent contrasts cannot be computed.
Effects of violations and remedies. When the compound symmetry or sphericity assumptions have been violated, the univariate ANOVA table will give erroneous results. Before multivariate procedures were well understood, various approximations were introduced to compensate for the violations (e.g., Greenhouse & Geisser, 1959; Huynh & Feldt, 1970), and these techniques are still widely used.
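For reference, the Greenhouse-Geisser correction multiplies the univariate degrees of freedom by an epsilon estimated from the covariance matrix of the repeated measures. A minimal sketch of the standard epsilon formula (NumPy; this is the textbook computation, not STATISTICA’s internal code):

    import numpy as np

    def gg_epsilon(S):
        """Greenhouse-Geisser epsilon from the k x k covariance matrix of the
        k repeated measures; 1.0 under sphericity, 1/(k-1) at the lower bound."""
        k = S.shape[0]
        C = np.eye(k) - np.ones((k, k)) / k   # centering matrix
        Sc = C @ S @ C                        # double-centered covariance matrix
        return np.trace(Sc) ** 2 / ((k - 1) * np.sum(Sc ** 2))

    # usage: X has subjects in rows and the k repeated measures in columns
    # eps = gg_epsilon(np.cov(X, rowvar=False))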
MANOVA approach to repeated measures. To summarize, the problem of compound symmetry and sphericity pertains to the fact that multiple contrasts involved in testing repeated measures effects (with more than two levels) are not independent of each other. However, they do not need to be independent of each other if we use multivariate criteria to simultaneously test the statistical significance of the two or more repeated measures contrasts. This “insight” is the reason why MANOVA methods are increasingly applied to test the significance of univariate repeated measures factors with more than two levels. We wholeheartedly endorse this approach because it simply bypasses the assumptions of compound symmetry and sphericity altogether.
Cases when the MANOVA approach cannot be used. There are instances (designs) when the MANOVA approach cannot be applied; specifically, when there are few subjects in the design and many levels on the repeated measures factor, there may not be enough degrees of freedom to perform the multivariate analysis. For example, if we have 12 subjects and p = 4 repeated measures factors, each at k = 3 levels, then the four-way interaction would “consume” (k-1)**p = 2**4 = 16 degrees of freedom. However, we have only 12 subjects, so in this instance the multivariate test cannot be performed.
Differences in univariate and multivariate results. Anyone whose research involves extensive repeated measures designs has seen cases when the univariate approach to repeated measures ANOVA gives clearly different results from the multivariate approach. To repeat the point, this means that the differences between the levels of the respective repeated measures factors are in some way correlated across subjects. Sometimes, this insight by itself is of considerable interest.
Methods for Analysis of Variance
Several sections in this online textbook discuss methods for performing analysis of variance. Although many of the available statistics overlap in the different sections, each is best suited for particular applications.
General ANOVA/MANOVA: This section includes discussions of full factorial designs, repeated measures designs, multivariate designs (MANOVA), designs with balanced nesting (the designs can be unbalanced, i.e., have unequal n), planned and post-hoc comparisons, etc.
General Linear Models: This extremely comprehensive section discusses a complete implementation of the general linear model, and describes the sigma-restricted as well as the overparameterized approach. This section includes information on incomplete designs, complex analysis of covariance designs, nested designs (balanced or unbalanced), mixed model ANOVA designs (with random effects), and huge balanced ANOVA designs (efficiently). It also contains descriptions of six types of Sums of Squares.
General Regression Models: This section discusses between-subject designs and multivariate designs that are appropriate for stepwise regression, and describes how to perform stepwise and best-subset model building (for continuous as well as categorical predictors).
Mixed ANCOVA and Variance Components: This section includes discussions of experiments with random effects (mixed model ANOVA), estimation of variance components for random effects, large main effect designs (e.g., with factors with over 100 levels) with or without random effects, and large designs with many factors in which we do not need to estimate all interactions.
Experimental Design (DOE): This section includes discussions of standard experimental designs for industrial/manufacturing applications, including 2**(k-p) and 3**(k-p) designs, central composite and non-factorial designs, designs for mixtures, D and A optimal designs, and designs for arbitrarily constrained experimental regions.
Repeatability and Reproducibility Analysis (in the Process Analysis section): This topic in the Process Analysis section includes a discussion of specialized designs for evaluating the reliability and precision of measurement systems; these designs usually include two or three random factors, and specialized statistics can be computed for evaluating the quality of a measurement system (typically in industrial/manufacturing applications).
Breakdown Tables (in the Basic Statistics section): This topic includes discussions of experiments with only one factor (and many levels), or with multiple factors, when a complete ANOVA table is not required.
Specifying the General ANOVA/MANOVA Analysis
General ANOVA/MANOVA Startup Panel and Quick Tab
Ribbon bar. Select the Statistics tab. In the Base group, click ANOVA to display the General ANOVA/MANOVA Startup Panel.
Classic menus. Select ANOVA from the Statistics menu to display the General ANOVA/MANOVA Startup Panel.
The Startup Panel contains one tab, Quick, which contains options to select the desired method of analysis (see also, General ANOVA/MANOVA – Index). In order to perform a General ANOVA/MANOVA analysis, a data file must be selected at this point.
For more information, refer to the Introductory Overview. See also General ANOVA/MANOVA – Index or Methods for Analysis of Variance. For related ANOVA and regression methods, refer also to GLM, DOE and Variance Components.
OK. Click the OK button to display an analysis specification dialog box, which differs depending on the Specification method selected on the Quick tab.
Cancel. Click the Cancel button to close the Startup Panel without performing an analysis.
Options. Click the Options button to display the following menu commands:
Output. Select Output to display the Analysis/Graph Output Manager dialog box, which contains options to customize the current analysis output management of STATISTICA.
Display. Select Display to display the Analysis/Graph Display Options dialog box, which contains options to customize the current analysis display of STATISTICA.
Create Macro. Select Create Macro to display the New Macro dialog box. When running analyses in STATISTICA, all options and output choices are automatically recorded; when you click Create Macro, the complete recording of all your actions will be translated into a STATISTICA Visual Basic program that can be run to recreate the analysis. See Macro (STATISTICA Visual Basic) Overview for further details.
Close Analysis. Select Close Analysis to close all dialog boxes associated with the analysis. Note that results spreadsheets/graphs will not be closed, only analysis dialogs will close.
Open Data. Click the Open Data button to display the Select Data Source dialog box, which contains options to choose the spreadsheet on which to perform the analysis. The Select Data Source dialog box contains a list of the spreadsheets that are currently active.
Select Cases. Click the Select Cases button to display the Analysis/Graph Case Selection Conditions dialog box, which is used to create conditions for which cases will be included (or excluded) in the current analysis. More information is available in the case selection conditions overview, syntax summary, and dialog box description.
W. Click the W (Weight) button to display the Analysis/Graph Case Weights dialog box, which contains options to adjust the contribution of individual cases to the outcome of the current analysis by “weighting” those cases in proportion to the values of a selected variable.
Weighted moments. Select the Weighted moments check box to specify that each observation contributes the weighting variable’s value for that observation. The weight values need not be integers. This module can use fractional case weights in most computations. Some other modules use case weights as integer case multipliers or frequency values. This check box will only be available after you have defined a weight variable via the W option above.
DF = W-1 / N-1. When the Weighted moments check box is selected, moment statistics (e.g., mean, variance) can be based either on the sum of the weight values for the weighting variable (W-1) or on the number of (unweighted) observations (N-1). The sums of squares and cross products will always be based on the weighted values of the respective observations. However, in computations requiring the degrees of freedom (e.g., standard deviation, ANOVA tables), the value for the degrees of freedom can be computed either from the sum of the weight values or from the number of observations. Moment statistics are based on the sum of the weight values if the W-1 option button is selected, and on the number of (unweighted) observations if the N-1 option button is selected. For more information on options for using integer case weights, see also Define Weight.
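To make the two degrees-of-freedom choices concrete, here is a minimal sketch (NumPy; the function and data are hypothetical, not part of STATISTICA) of a weighted variance computed both ways:

    import numpy as np

    def weighted_moments(x, w, df="W-1"):
        """Weighted mean and variance; degrees of freedom taken either from the
        sum of the weights (W-1) or from the number of observations (N-1)."""
        x, w = np.asarray(x, float), np.asarray(w, float)
        mean = np.average(x, weights=w)
        ss = np.sum(w * (x - mean) ** 2)   # weighted sum of squared deviations
        dof = w.sum() - 1 if df == "W-1" else len(x) - 1
        return mean, ss / dof

    print(weighted_moments([2.0, 4.0, 6.0], [1, 2, 3], df="W-1"))
    print(weighted_moments([2.0, 4.0, 6.0], [1, 2, 3], df="N-1"))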
Quick Tab
The Quick tab of the General ANOVA/MANOVA Startup Panel contains options to select the desired method of analysis. In order to perform a General ANOVA/MANOVA analysis, a data file must be selected at this point.
This tab presents a list of common experimental analysis designs (in the Type of analysis list; see also Methods for Analysis of Variance), and provides access to the three different user interfaces available in the STATISTICA General ANOVA/MANOVA module via the Specification method list [these user interfaces are also available in the STATISTICA General Linear Models (GLM), General Regression Models (GRM), Generalized Linear/Nonlinear Models (GLZ), and General Partial Least Squares Models (PLS) modules].
Type of analysis. The Type of Analysis list presents four choices for the type of ANOVA/MANOVA analysis model (see the Introductory Overview). Select the type of design that you want to perform. For more information on a particular type of analysis, click on the link below.
Specification method. The Specification method list presents the three alternative user interfaces available in ANOVA/MANOVA. You can choose among the three different user interfaces in the Specification method list only when Repeated measures ANOVA is selected as the Type of analysis. When any other Type of analysis is selected, only the Quick specs dialog is available.
Quick specs dialog. Select Quick specs dialog to use the respective Quick Specs dialog box corresponding to the selection in the Type of analysis box. The Quick Specs dialog box will prompt you to select dependent variable(s) and categorical predictor variables (depending on the selection in the Type of analysis box), and construct a default model. Use the options on the Quick Specs dialog box – Options tab to modify various computational specifications, or click the Syntax editor button to further customize the model via command syntax (see Analysis Syntax).
Analysis Wizard. Select Analysis Wizard to use a sequence of dialog boxes that will guide you through the steps for specifying an analysis. At the conclusion of the sequence of dialog boxes, you can either compute the results or click the Syntax editor button to further customize the model via command syntax, open an existing file with command syntax, or save syntax in a file for future repetitive use. This option is only available when Repeated measures ANOVA is selected as the Type of analysis.
Analysis syntax editor. Select Analysis syntax editor to specify a model via the MAN Analysis Syntax Editor dialog. That dialog provides various options for specifying designs and for modifying various parameters of the computational procedures. You can also open an existing text file with command syntax, or save syntax in a file for future repetitive use. Refer to the description of the Analysis Syntax Editor dialog box, and the description of the MANOVA Syntax for additional details. This option is only available when Repeated measures ANOVA is selected as the Type of analysis.
Note: between-groups designs. In order to specify a between-groups design, at least one dependent variable must be selected, at least one categorical predictor (grouping variable) must be selected, and at least two independent variable codes must be specified (via the Factor codes button, which can be found on both the ANOVA/MANOVA Quick Specs – Quick tab and the MAN Analysis Wizard Extended Options – Quick tab) for each between-groups factor. Note that if you do not explicitly specify the codes, by default, STATISTICA will use as codes all values encountered in the specified independent variables.
Note: within-groups designs. In order to specify a repeated measures factor design, at least two dependent variables must be selected, and the repeated measures factor has to be identified via the Within effects button available on the ANOVA/MANOVA Quick Specs – Quick tab. Multiple dependent variables that cannot be interpreted (by STATISTICA, given the design you specified) as levels of repeated measures factors are interpreted as multiple dependent variables in a MANOVA design (this will occur if, for example, you select two or more dependent variables and do not define them as repeated measures, or whenever you select more dependent variables than can be accounted for by the currently defined repeated measure factor and its levels). Note that if you have multiple repeated measures factors, you must use the GLM module.
Note: Empty Cell Designs. ANOVA/MANOVA will automatically handle designs with empty cells. To analyze Latin squares, Greco-Latin squares, or other balanced incomplete designs, simply specify them as if they were complete factorial designs. Then specify the design as a Main effects ANOVA, to estimate the main effects. In order to analyze unbalanced missing cell designs, or complex “messy” designs (as, for example, discussed in Milliken & Johnson, 1992) choose the appropriate type of method for constructing hypotheses by selecting one of the Sums of squares options from the ANOVA/MANOVA Quick Specs – Options tab; for a detailed discussion of how to analyze such designs, refer to the GLM Six types of sums of squares topic.
Note: Huge Balanced ANOVA Designs. Most between-groups ANOVA designs can be analyzed much more efficiently when they are balanced, i.e., when all cells in the ANOVA design have equal N and there are no missing cells in the design. STATISTICA GLM contains an option to “instruct” the program that the design is balanced, so that the more efficient computational methods can be used. Even very large designs with effects with degrees of freedom in the hundreds can thus be analyzed in mere seconds, while the general computational procedures (that do not assume a balanced design) may take several minutes to accomplish the same. See Efficient Computations for (Huge) Balanced ANOVA Designs in the GLM Introductory Overview for additional details.
One-way ANOVA in General ANOVA/MANOVA
Select One-way ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab to specify one-way ANOVA or one-way MANOVA designs. In one-way experimental designs, the effect of a single grouping variable (e.g., Gender: Male vs. Female) on one or more dependent variables can be evaluated.
Main Effects ANOVA in General ANOVA/MANOVA
Select Main effects ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab to specify main effects only ANOVA or MANOVA designs. In the subsequent Quick specs dialog box, you can specify up to four categorical predictor variables. STATISTICA will estimate and evaluate the main-effects-only model. Those types of models are frequently used in the area of industrial experimentation to screen large numbers of factors in highly fractionalized designs (e.g., 2-level screening designs; see Experimental Design). This option should also be chosen when you want to analyze balanced incomplete (nested) designs, when only the main effects can be estimated. Note that if you want to select five or more categorical predictor variables, you must use the General Linear Models module.
Factorial ANOVA in General ANOVA/MANOVA
Select Factorial ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab to specify factorial ANOVA or MANOVA designs. In the subsequent Quick specs dialog box, you can specify up to four categorical predictor variables, and specify either a full factorial ANOVA (MANOVA) model, or a custom factorial design that includes terms to a user-specified factorial degree (e.g., main effects and two-way interactions only). Those types of models are frequently used in the area of industrial experimentation, where fractional factorial designs (e.g., 2**(k-p) fractional factorial designs or 3**(k-p) and Box-Behnken designs) are commonly used to evaluate many factors and their lower-order interactions in few experimental runs (observations). Note that if you want to select five or more categorical predictor variables, you must use the General Linear Models module.
Repeated Measures ANOVA in General ANOVA/MANOVA
Select Repeated measures ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab to specify designs with repeated measures. In the subsequent Quick specs dialog box, you can specify up to four categorical predictor variables, and two or more dependent variables that will be interpreted by the program as repeated measures (e.g., the scores on a test taken at Time 1 and Time 2). The Quick Specs dialog box contains options for specifying the within-subject (repeated measures) design, and you can specify univariate designs with a single factor or measurement that is measured repeatedly. Note that if you want to select five or more categorical predictor variables or multiple within-subject (repeated measures) factors, you must use the General Linear Models module.
The specification of repeated measures designs is further described in the context of the Quick Specs dialog box, as well as the MAN Analysis Syntax Editor. These designs are also discussed in Notes.
ANOVA/MANOVA Quick Specs
Select Quick specs dialog from the Specification method list and specify a Type of analysis on the General ANOVA/MANOVA Startup Panel – Quick tab. Then click the OK button to display the ANOVA/MANOVA Quick Specs dialog box for the type of analysis selected. The dialog box contains two tabs: Quick and Options. Note that the title of this dialog box will always reflect the Type of analysis that has been selected in the Startup Panel. Use the options on the Quick tab to select and specify the types of variables for your general ANOVA/MANOVA model. Use the options on the Options tab to specify the method of parameterization, type of sums of squares, and a cross validation variable.
OK. Click the OK button after you have specified your design to display the ANOVA Results dialog box.
Cancel. Click the Cancel button to return to the General ANOVA/MANOVA Startup Panel.
Options. Click the Options button to display the Options menu.
Syntax editor. Click the Syntax editor button to display the MAN Analysis Syntax Editor; all current specifications (e.g., variable selections, design specifications) on the Quick specs dialog will automatically be transferred (translated) to the syntax editor, and can be further modified or saved. However, note that when you click the < Back button in the MAN Analysis Syntax Editor after introducing additional customizations, those customizations specified on the MAN Analysis Syntax Editor will not be translated “back” to the Quick specs dialog.
ANOVA/MANOVA Quick Specs – Options Tab
Select the Options tab of the ANOVA/MANOVA Quick Specs dialog box to access the options described here.
Sweep delta. Enter the negative exponent sdelta for a base-10 constant delta (delta = 10**(-sdelta)) in the Sweep delta field; the default value is 7. Delta is used (1) in sweeping, to detect redundant columns in the design matrix, and (2) for evaluating the estimability of hypotheses; specifically, a value of 2*delta is used for the estimability check.
Inverse delta. Enter the negative exponent idelta for a base-10 constant delta (delta = 10**(-idelta)) in the Inverse delta field; the default value is 12. This delta is used to check for matrix singularity in matrix inversion calculations.
Parameterization. Select the type of parameterization options you want to use for your general ANOVA/MANOVA model in the Parameterization group box.
Sigma-restricted. Select the Sigma-restricted check box to compute the design matrix for categorical predictors in the model based on sigma-restricted coding; if it is not selected, the overparameterized model will be used. The sigma-restricted model is the default parameterization; see the GLM Introductory Overview topic The Sigma-Restricted vs. Overparameterized Model for details.
No intercept. Select the No intercept check box to exclude the intercept from the model.
Lack of fit. Select the Lack of fit check box to compute the sums of squares for the pure error, i.e., the sums of squares within all unique combinations of values for the (categorical) predictor variables. On the ANOVA Results dialog box, options are available to test the lack-of-fit hypothesis. Note that in large designs with continuous predictors, the computations necessary to estimate the pure error can be very time consuming. See the GLM Introductory Overview topic Lack-of-Fit Tests Using Pure Error for a discussion of lack-of-fit tests and pure error; see also Experimental Design.
Cross-validation. Click the Cross-validation button to display the Cross-Validation dialog box for specifying a categorical variable and a (code) value to identify observations that should be included in the computations for fitting the model (the analysis sample). All other observations with valid data for all predictor variables and dependent variables will automatically be classified as belonging to the validation sample (see the Residuals tab for a description of the available residual statistics for observations in the validation sample). Note that all observations with valid data for all predictor variables but missing data for the dependent variables will automatically be classified as belonging to the prediction sample (see the Residuals tab topic for a description of available statistics for the prediction sample).
Sums of squares. Select the method for constructing main effect and interaction hypotheses in unbalanced and incomplete designs in the Sums of squares group box. These methods are discussed in the GLM Introductory Overview. For the sigma-restricted model the default value is Type VI (unique or effective hypothesis decomposition; see Hocking, 1985) and Type IV is not valid; for the overparameterized model the default value is Type III (orthogonal; see Goodnight, 1980), and Type VI is not valid.
ANOVA/MANOVA Quick Specs – Quick Tab
Select the Quick tab of the ANOVA/MANOVA Quick Specs dialog box to access the options described here. The specific options that are available in this dialog box depend on the Type of analysis selected on the Quick tab of the Startup Panel. For alternative ways of specifying designs in ANOVA, see Methods for Specifying Designs.
Variables. Click the Variables button to display the standard variable selection dialog box. Depending on the Type of analysis selected on the General ANOVA/MANOVA Startup Panel – Quick tab, STATISTICA will prompt you to select one or more dependent variables and up to four categorical predictor variables (grouping variables or factors in the design). For example, if you select One-way ANOVA from the General ANOVA/MANOVA Startup Panel – Quick tab, you will be prompted to enter one or more dependent variables and a single predictor variable. Note that in the ANOVA module, you can only specify up to four categorical predictors. If you want to specify five or more categorical predictors, use the General Linear Models module. For details concerning different types of designs, and the distinction between continuous and categorical predictor variables, see the GLM Introductory Overview.
Within effects. Click the Within effects button to display the Specify Within-Subjects Factor dialog box. In this dialog box, specify the within-subject (repeated measures) factor and the respective number of levels (see also, Notes for a discussion of repeated measures ANOVA). In short, the variables in the dependent variable list are assigned to levels for the repeated measures factor. Note that in the ANOVA module, you can only specify one within-subject (repeated measures) factor. If you want to specify multiple repeated measures factors, use the General Linear Models module. This button is only available if you have selected Repeated measures ANOVA as the Type of analysis on the General ANOVA/MANOVA Startup Panel – Quick tab.
Factor codes. Click the Factor codes button to display the Select Codes for Indep. Vars (Factors) dialog box for selecting the codes identifying the levels for the categorical predictor variables (grouping variables). Codes must be integer values or text labels (but can be dates, times, etc.), and at least two codes must be specified for each categorical predictor variable.
Between effects. Click the Between Effects button to display the GLM Between Effects dialog box. In this dialog box, specify the factorial degree of your design. This button is only available if you have selected Factorial ANOVA or Repeated measures ANOVA as the Type of analysis on the General ANOVA/MANOVA Startup Panel – Quick tab.
If the design for the categorical predictor variables needs to be customized, click the Syntax editor button (in the ANOVA/MANOVA Quick Specs dialog box) to further customize the model via command syntax. For alternative ways of specifying designs in ANOVA/MANOVA, see Methods for Specifying Designs.
Specify Within-Subjects Factor
Click the Within effects button on the ANOVA/MANOVA Quick Specs dialog box – Quick tab to display the Specify Within-Subjects Factor dialog box.
In this dialog box, specify the within-subject (repeated measures) factor and the respective number of levels (see also the GLM Introductory Overview for a discussion of repeated measures ANOVA). In short, the variables in the dependent variable list are assigned to levels for the repeated measures factor. Note that with the ANOVA module, you can specify only one within-subject (repeated measures) factor. If you want to specify multiple repeated measures factors, use the General Linear Models module.
If the factor specified in this dialog box does not account for all of the previously selected dependent variables (via the Variables button), a MANOVA will be performed. See Multiple Dependent Measures (MANOVA) and ANOVA/MANOVA Notes – Specifying Within-Subjects Univariate Designs for further details.
No. of levels. Enter the Number of levels of your within-subject (repeated measures) factor.
Factor Name. Enter the Factor name of your within-subject (repeated measures) factor.
OK. Click the OK button after you have specified your within-subject (repeated measures) factor to return to the ANOVA/MANOVA Quick Specs – Quick tab.
Cancel. Click the Cancel button to return to the ANOVA/MANOVA Quick Specs – Quick tab, ignoring any changes that you have made in this dialog box.
Select Codes for …
The Select Codes for … dialog box is displayed whenever you need to select codes for grouping variables (i.e., specific values of a variable to be used to identify group membership of cases or observations). It will display the grouping variable name and then prompt you to enter the codes for that variable (e.g., 1, Male, “#52”).
You can either type specific codes in the window, or enter an * (asterisk) to accept all of the codes available for the variable. When entering text codes in this window, you can save time by typing the codes in the edit field with lower case letters and STATISTICA will automatically convert them to upper case letters (e.g., smith will be converted to SMITH). However, since STATISTICA distinguishes between lower and upper case letters in codes, be sure to place any code that needs to remain in lower case letters in single or double quotes (e.g., ‘Smith’, “Pepsi”, “StatSoft”, see below).
The following conventions apply when entering code names:
1. If codes consist of only uppercase letters and numbers (e.g., ITEM9, MALE) and do not start with a number, then the codes will be displayed in the edit field without single or double quotation marks around them.
2. Codes that have been entered in the spreadsheet in upper and lower case or only lower case letters (e.g., ‘Male’, “test1”) must have quotation marks around them (single or double) in this edit window in order to preserve the upper and lower case character formatting.
3. Codes that start with a number or a character other than a letter (e.g., “=shift3”, ’39lbs’, “49°C”, “15-Apr”) must have quotation marks (single or double) around them.
4. Date values will be displayed in quotation marks (single or double) if the variable’s format is set to Date (see Variable).
For information on using code values greater than 32,000 and using dates as codes, see Codes (Values of Grouping Variables).
Shortcut. If you leave all fields in this dialog box empty (blank) and click OK, STATISTICA will identify and automatically use all available codes (for the previously specified grouping variables). Also, the same effect will be achieved if you do not enter this dialog box but click the OK button in the previous (design specification) dialog box without explicitly specifying any codes.
All. Select all of the variable’s values by clicking the All button. When you click this button, STATISTICA will automatically search the variable values (taking into account any case selection conditions) and enter all the variable’s codes in the edit window. Alternatively, to select all codes for a variable, click the OK button (leaving the edit window blank) or enter an * (asterisk) in the edit window.
Zoom. If you want to view the values of the variable before selecting specific codes, click the Zoom button to display the Labels/Stats Window. You can browse through a table of sorted variable values in this window (all values will be shown regardless of any currently specified case selection conditions).
Select All. This button is available when you have more than one variable listed in this dialog box. If you want to automatically enter all of the codes for each of the variables listed (taking into account any case selection conditions), then click the Select All button. You can also select all codes for each variable by clicking the OK button before entering any codes or clicking any other buttons.
Reviewing General ANOVA/MANOVA Results
ANOVA Results
Click the OK button in the ANOVA/MANOVA Quick Specs dialog box to display the ANOVA Results dialog box, which can contain as many as eight tabs: Quick, Summary, Means, Comps, Profiler, Resids, Matrix, and Report. Note that this dialog box will also be displayed when you click the OK (Run) button in the MAN Analysis Wizard Between Design dialog box, the MAN Analysis Wizard Extended Options dialog box, and the MANOVA Analysis Syntax Editor.
Use the options on the Quick tab to produce summaries of the main results, for example, ANOVA tables, parameter estimates, etc.
The Summary tab contains options to produce additional summaries of the main results, for example, R-square, descriptive statistics, etc.
The Means tab contains options to compute (1) observed unweighted means, (2) observed weighted means, and (3) least squares (predicted) means; tables of marginal means can be displayed in spreadsheets, or summarized in graphs (with or without confidence intervals).
The Comps tab contains options to test specific hypotheses about linear combinations of (least squares) means (planned comparisons).
The Profiler tab contains options to compute and display desirability profiles for combinations of multiple variables.
The Resids tab contains options to compute predicted values and detailed residual statistics (e.g., residuals, deleted residuals, Mahalanobis distance, leverage values, DFFITS values, etc.).
The Matrix tab contains options to review various matrices involved in the computations of the main results, as well as detailed collinearity statistics and partial and semi-partial (or part) correlations.
Finally, the Report tab contains options to send results to a report.
More results. Click the More results button to display the ANOVA More Results dialog box with additional tabs and options.
Modify. Click the Modify button to return to the previous dialog box for the respective analysis (see Methods for Specifying Designs).
Close. Click the Close button to close the results dialog box.
Options. Click the Options button to display the Options menu.
By Group. Click the By Group button to display the By Group specification dialog box.
ANOVA Results – Quick Tab
Select the Quick tab of the ANOVA Results dialog box to access options to display the main results for the current analysis.
All effects/Graphs. Click the All effects/Graphs button to display the Table of all Effects dialog box. This dialog box shows the summary ANOVA (MANOVA) table for all effects; you can then select an effect and produce a spreadsheet or plot of the observed unweighted, observed weighted, and least squares means. Refer also to the description of the options on the Means tab for details concerning the different means computed by the program, and their standard errors.
All effects. Click the All effects button to create a spreadsheet with the ANOVA (MANOVA) table for all effects. If the design is univariate in nature (involves only a single dependent variable), then the univariate results ANOVA table will be displayed; the univariate results ANOVA table is also displayed for univariate repeated measures designs; if the design is multivariate in nature, then the multivariate results MANOVA table will be displayed, showing the statistics as selected in the Multivariate tests box (via the Summary tab). For a discussion of the different types of designs, and how the respective ANOVA/MANOVA tables are computed, see the GLM Introductory Overview.
Effect sizes. Click the Effect sizes button to create a spreadsheet with the ANOVA (MANOVA) table for all effects and the effect sizes and powers (i.e., Partial eta-squared, Non-centrality, and Observed power). Partial eta-squared is the proportion of the variability in the dependent variables that is explained by the effect. The Non-centrality value is the main statistic used to compute power, and the Power column contains the power values of the significance test on the effect. The ANOVA (MANOVA) table is described above; see All effects.
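For readers who want to verify these columns by hand, the following is a minimal sketch of the standard formulas, assuming the common convention that partial eta-squared = SS_effect / (SS_effect + SS_error) and that the noncentrality parameter is taken as SS_effect / MS_error; the Python/SciPy code and all numbers are illustrative only, not a claim about STATISTICA's internal computations.

    # Hypothetical ANOVA table row: effect and error sums of squares / df
    from scipy.stats import f as f_dist, ncf

    ss_effect, df_effect = 120.0, 2
    ss_error,  df_error  = 480.0, 57

    partial_eta_sq = ss_effect / (ss_effect + ss_error)   # 0.2
    ms_error       = ss_error / df_error
    noncentrality  = ss_effect / ms_error                 # 14.25

    # observed power: P(F > F_crit) under the noncentral F distribution
    f_crit = f_dist.ppf(0.95, df_effect, df_error)        # alpha = .05
    power  = 1.0 - ncf.cdf(f_crit, df_effect, df_error, noncentrality)
    print(partial_eta_sq, noncentrality, round(power, 3))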
Alpha values. The values in the Alpha values group box are used in all results spreadsheets and graphs, whenever a confidence limit is to be computed or a particular result is to be highlighted based on its statistical significance.
Confidence limits. Enter a value in the Confidence limits field to be used for constructing confidence limits in the respective results spreadsheets or graphs (e.g., spreadsheet of parameter estimates, graph of means); by default 95% confidence limits will be constructed.
Significance level. Enter a value in the Significance level field to be used for all spreadsheets and graphs, where statistically significant results are to be highlighted (e.g., in the All effects spreadsheet); by default all results significant at the p < .05 level will be highlighted.
GLM and ANOVA Results – Summary Tab
Select the Summary tab of the GLM Results or the ANOVA Results dialogs to access options to display the main results for the current analysis. Depending on the type of design, whether or not random effects are present in the design, whether there are categorical predictor variables in the design, and/or whether there are within-subject (repeated measures) factors in the design, some of the options described below may not be available on the Summary tab. For instance, if you select Huge balanced ANOVA from the GLM Startup Panel – Quick tab or specify SSTYPE=BALANCED in the GLM (STATISTICA) syntax, several advanced options are not available in the results dialog box (due to the computational shortcuts employed in order to efficiently analyze huge balanced designs; see also Balanced ANOVA in the Introductory Overview for details).
All effects/Graphs. Click the All effects/Graphs button to display the Table of All Effects, which shows the summary ANOVA (MANOVA) table for all effects; you can then select an effect and produce a spreadsheet or graph of the observed unweighted, observed weighted, and least squares means. Refer also to the description of the options on the Means tab for details concerning the different means computed by STATISTICA, and their standard errors. This button is available only if 1) the current design includes categorical predictor variables or within-subject (repeated measures) effects, and 2) if there are random effects in the current design, there is only a single dependent variable (multivariate results for mixed-model designs cannot be computed).
All effects. Click the All effects button to display a spreadsheet with the ANOVA (MANOVA) table for all effects. If the design is univariate in nature (involves only a single dependent variable), the univariate results ANOVA table will be produced; the univariate results ANOVA table is also produced for univariate repeated measures designs (where appropriate, multivariate tests for repeated measures can be computed via the Within effects options, see below); if the design is multivariate in nature, the multivariate results MANOVA table will be displayed, showing the statistics as selected in the Multivariate tests group box (see below); if the design includes random effects and multiple dependent variables, multiple univariate ANOVA tables (spreadsheets) will be created, one for each dependent variable (in that case, the tests reported in the ANOVA table will use synthesized error terms). For a discussion of the different types of designs, and how the respective ANOVA/MANOVA tables are computed, see the Introductory Overview.
Univariate results. Click the Univariate results button to create a spreadsheet with the standard univariate ANOVA table for each dependent variable, regardless of whether the design includes within-subject (repeated measures) factors. To review the univariate results for the within-subject design, use the Univ. tests option in the Within effects group box (see below). If analyzing a Mixed Model ANOVA, the results generated by clicking the Univariate results button are fixed effects results. These results serve as a comparison to the results with random effects generated by clicking the All effects button.
Cell statistics. Click the Cell statistics button to create a spreadsheet of the descriptive statistics for each cell in the design; specifically, descriptive statistics are computed for the dependent variables, as well as any continuous predictors (covariates) in the design, for each column of the overparameterized design matrix for categorical effects. Thus, marginal means and standard deviations are available for each categorical effect in the design. Note that for lower-order effects (e.g., main effects in designs that also contain interactions involving the main effects), the reported means are weighted marginal means, and as such are estimates of the weighted population marginal means (for details, see, for example, Milliken and Johnson, 1984, page 132; see also the discussion of means in the description of the options on the Means tab). Least squares means (e.g., see Searle, 1987) can be computed on the Means tab, or via the All effects/Graphs option (see above); usually, in factorial designs, it is the least squares means that should be reviewed when interpreting significant effects from the ANOVA or MANOVA.
Between effects. The options in the Between effects group box allow you to review, as appropriate for the given design, various results statistics for the between-group design such as Design terms, Whole model R, Coefficients, and Estimate. For specific details on these tests/options, see Summary Results for Between Effects in GLM and ANOVA.
Within effects. The options in the Within effects group box allow you to review, as appropriate for the given design, various results statistics for the within-subject (repeated measures) design such as Multivariate tests, Univariate tests, G-G and H-F tests, Effect SSCPs, Sphericity, Error SSCPs, and Error Corrs. For specific details on these tests/options, see Summary Results for Within Effects in GLM and ANOVA. If the current design does not include within-subject (repeated measures) factors, these options are not displayed on this tab.
Random effects. The options in the Random effects group box allow you to display the results related to the analysis of the random effects in the model such as Variance components, Expected mean squares, Bar plot, Denominator synthesis, and Pie chart. This is only available in GLM, not in ANOVA. For specific details on these tests/options, see Summary Results for Random Effects in GLM.
Multiv. tests. In the Multiv. tests group box you can select the specific multivariate test statistics that are to be reported in the respective results spreadsheets. For descriptions of the different multivariate test statistics, refer to the GLM Introductory Overview topic Multivariate Designs. These options are only available if the current design is multivariate in nature, i.e., if there are multiple dependent measures, or a within-subject (repeated measures) design with effects that have more than 2 levels (and hence, multivariate tests for those effects can be computed).
Alpha values. Use the Alpha values group box to specify Confidence limits and Significance level values. These values are used in all results spreadsheets and graphs whenever a confidence limit is to be computed or a particular result is to be highlighted based on its statistical significance.
Confidence limits. Enter the value to be used for constructing confidence limits in the respective results spreadsheets or graphs (e.g., spreadsheet of parameter estimates, graph of means) in the Confidence limits field. By default 95% confidence limits will be constructed.
Significance level. Enter the value to be used for all spreadsheets and graphs where statistically significant results are to be highlighted (e.g., in the All effects spreadsheet) in the Significance level field. By default all results significant at the p<.05 level will be highlighted.
GLM, GRM, and ANOVA Results – Means Tab
Select the Means tab of the GLM Results, GLZ Results, GRM Results, or the ANOVA Results dialog boxes to access options to display the means for any effect containing categorical predictor variables only, or for repeated measures effects. If there are no categorical effects or repeated measures effects in the model, these options are not available.
Effect. Select the desired effect in the Effect drop-down list, and then select to display or plot either the Observed, unweighted; Observed, weighted; or Least squares means. You can also display the means (unweighted, weighted, or least squares) for all categorical effects by clicking the respective All marginal tables buttons (see below).
Observed, unweighted. Click the Observed, unweighted button to produce a spreadsheet of the observed unweighted means for the selected Effect (see above). These are computed by summing the cell means across the levels and combinations of levels of the factors not used in the marginal means table (or plot), and then dividing that sum by the number of means. Thus, each mean that enters into a marginal mean is implicitly assigned the same weight, regardless of the number of observations on which the respective mean is based. The resulting estimate is an unbiased estimate of μ-bar (mu-bar), the population marginal mean. If the design is not balanced, and some means are based on different numbers of observations, then you can also compute the weighted marginal means (weighted by the respective cell N’s). Note that the weighted mean is an unbiased estimate of the weighted population marginal mean (for details, see, for example, Milliken and Johnson, 1984, page 132), and the standard errors for the unweighted marginal means are estimated from the pooled within-cell variances.
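To make the distinction between unweighted and weighted marginal means concrete (the weighted means are produced by the Observed, weighted button described below), here is a small Python sketch with invented cell means and cell counts for a 2 x 2 design:

    import numpy as np

    # rows = levels of factor A, columns = levels of factor B
    cell_means = np.array([[10.0, 14.0],
                           [11.0, 15.0]])
    cell_n     = np.array([[20,  5],
                           [10, 10]])

    # unweighted marginal means of A: simple average of the cell means
    unweighted_A = cell_means.mean(axis=1)                  # [12.0, 13.0]

    # weighted marginal means of A: cell means weighted by cell N
    weighted_A = (cell_means * cell_n).sum(axis=1) / cell_n.sum(axis=1)
    print(unweighted_A, weighted_A)                         # [12. 13.] [10.8 13.]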
Plot. Click the Plot button to create a graph of the observed unweighted means for the selected Effect (see above). Depending upon your design, when you click this button, the Dependent Vars for the Plot dialog box will be displayed, which allows you to specify the dependent variables to use in the means plot. Next, the Specify the Arrangement of the Factors in the Plot dialog box may be displayed, which allows you to specify the arrangement of factors that STATISTICA will use in the means plot.
All marginal tables, observed unweighted. Click the All marginal tables, observed unweighted button to produce spreadsheets of the observed unweighted means for all of the categorical effects (regardless of what is selected in the Effect field).
Observed, weighted. Click the Observed, weighted button to produce a spreadsheet of the observed weighted means for the selected Effect (see above). These are computed as the standard means for the respective combinations of factor levels, directly from the data. Thus, the resulting means are weighted marginal means, since they are weighted by the number of observations in each cell of the design (in full factorial designs, you could also compute the weighted marginal means by averaging the cell means involved in each marginal mean, weighted by the respective number of observations in the respective cells). Note that the weighted mean is an unbiased estimate of the weighted population marginal mean (for details, see, for example, Milliken and Johnson, 1984, page 132), and the standard errors for these means are estimated from the respective cell variances for each respective mean (i.e., the respective actual observed standard deviations in each cell).
Plot. Click the Plot button to create a graph of the observed weighted means for the selected Effect (see above). Depending upon your design, when you click this button, the Dependent Vars for the Plot dialog box will be displayed, where you can specify the dependent variables to use in the means plot. Next, the Specify the Arrangement of the Factors in the Plot dialog box may be displayed, where you can specify the arrangement of factors that STATISTICA will use in the means plot.
All marginal tables, observed weighted. Click the All marginal tables, observed weighted button to produce spreadsheets of the observed weighted means for all of the categorical effects (regardless of what is selected in the Effect field).
Least squares means. Click the Least squares means button to produce a spreadsheet of the least squares means for the selected Effect. Least squares means are the expected population marginal means, given the current model. Thus, these are usually the means of interest when interpreting significant effects from the ANOVA or MANOVA table. Note that for full factorial designs without missing cells, the least squares means are identical to the Observed, unweighted means (see above). Least squares means are also sometimes called predicted means, because they are the predicted values when all continuous predictors in the model are held at their means and the categorical factors are set to the levels involved in the respective means. Note that if there are continuous predictors (covariates) in the model, the least squares means are computed from the values for those predictors as set in the Covariate values group box (see below). For details concerning the computation of least squares means, refer to Milliken and Johnson (1992), Searle, Speed, and Milliken (1980), or Searle (1987). Note that in the GLZ module, STATISTICA does not compute least squares means; rather, the equivalent expected values for the respective nonlinear (generalized linear) model, i.e., the predicted means, are computed.
Plot. Click the Plot button to create a graph of the least squares means for the selected Effect. Depending upon your design, when you click this button, the Dependent Vars for the Plot dialog box will be displayed, where you can specify the dependent variables to use in the means plot. Next, the Specify the Arrangement of the Factors in the Plot dialog box may be displayed, where you can specify the arrangement of factors that STATISTICA will use in the means plot.
All marginal tables, least squares means. Click the All marginal tables, least squares means button to produce spreadsheets of the least squares means for all of the categorical effects (regardless of what is selected in the Effect field).
Covariate values. The options in the Covariate values group box determine at what values the continuous predictor variables (covariates) will be set for the computation of least squares means. By default, the values for any continuous predictors (covariates) in the model will be held at their respective overall Covariate means. You can also specify User-defined values for the covariates; after selecting this option button, click the Define button to display the Values for Covariates dialog box and specify the values. Finally, you can set the values for the continuous predictor variables so as to compute the Adjusted means; these are the predicted values (means) after “adjusting” for the variation of the means of the continuous predictor variables over the cells in the current Effect (see above). Adjusted means are widely discussed in the traditional analysis of covariance (ANCOVA) literature; see, for example, Finn (1974), Pedhazur (1973), or Winer, Brown, and Michels (1991). The Adjusted means option button is only available in full factorial designs. Note that the Covariate values group box will not be available when you are using the ANOVA module.
Show std errs. Select the Show std errs check box to display standard errors and confidence limits for the means in the spreadsheet or plot of means (see the above buttons). The plot of means will show the confidence limits as error bars around the respective means. The actual confidence limits are based on the current setting in the Confidence limits field available on the GLM Results – Quick tab.
Note: standard errors for unweighted marginal means. The standard errors for the observed unweighted means are computed based on the current error term from the ANOVA table:
Std.Err.(m-bar) = (s_est / t) * sqrt[ Σ (1/n_i) ]
In this formula, s_est is the estimated sigma (computed as the square root of the estimated error variance from the current ANOVA table), t is the number of means that are averaged to compute the respective marginal mean, and n_i refers to the number of observations in each of the t experimental conditions from which the respective unweighted marginal mean is computed.
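A quick worked example of this formula (all values hypothetical):

    import math

    s_est = 2.0                 # sqrt of the ANOVA error variance
    t = 2                       # number of means averaged
    n = [20, 5]                 # observations per experimental condition

    se = (s_est / t) * math.sqrt(sum(1.0 / ni for ni in n))
    print(se)                   # (2/2) * sqrt(0.05 + 0.2) = 0.5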
Note: standard errors for weighted marginal means. The standard errors for the marginal means are computed as if you had ignored the other factors (those not in the marginal means table). Thus, for weighted marginal means the standard error is not dependent on the estimate of the error variance from the current ANOVA table, and hence, it is not dependent on the current model that is being fit to the data.
Show means +/- std errs. Select this check box to show, in the tables and plots of means, the plus or minus standard error range around each mean. These ranges are shown only if the Show std errs check box is also selected. By default, when the Show means +/- std errs check box is cleared, the (95%) confidence intervals will be computed instead (or any other confidence interval, consistent with the specification in the Confidence limits field of the Quick tab).
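To illustrate the difference between the two displays, the sketch below (invented numbers) contrasts the plus/minus one standard error range with the usual t-based 95% confidence interval:

    from scipy.stats import t as t_dist

    mean, std_err, df = 12.0, 0.5, 57
    half_width = t_dist.ppf(0.975, df) * std_err

    print(mean - std_err, mean + std_err)          # +/- std err range
    print(mean - half_width, mean + half_width)    # 95% confidence interval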
GLM, GRM, and ANOVA Results – Comps Tab
Select the Comps tab (Comparisons tab) of the GLM Results, GRM Results, or the ANOVA Results dialogs to access options to perform a priori (planned) comparisons between the means in the design. Note that complex a priori hypotheses can also be tested via the Estimate button on the Summary tab (see the Between effects group box). A discussion of the rationale and applications of planned comparisons and post-hoc tests is also provided in the Contrast analysis and post-hoc tests section in the context of the ANOVA/MANOVA module. Note that these options are only available if the current design contains effects for categorical predictor variables, or within-subject (repeated measures) effects.
Note: planned (a priori) comparisons (contrast analysis). A priori planned comparisons are usually performed after an analysis involving effects for categorical predictor variables has yielded significant effects. The purpose of planned comparisons then is to determine whether the pattern of means for the respective effect follows the one that was hypothesized, that is, you compare the specific means for the effect of interest that were hypothesized to be different from each other (e.g., in a 3-level effect Group, you might test whether the mean for level 1 is significantly different from the mean for level 3). STATISTICA GLM provides a convenient user interface for specifying contrast coefficients; these coefficients are then used to compare the least squares means (see also the Means tab for details) for the respective chosen Effect (see below). Thus, the contrasts for the planned comparisons are applied to the means predicted by the current model; these means are identical to the observed unweighted means in the case of full factorial designs without continuous predictors (covariates).
Note: random effects. The error terms for all planned comparisons are always computed from the residual sums of squares. Those error terms may not be appropriate, and when random effects are involved, you should interpret the results of planned comparisons with caution.
Effect. Select the desired effect from all of those effects in the current design in the Effect drop-down box. A priori planned comparisons are performed on the marginal means (least squares, see below) for effects involving only categorical predictor variables.
Planned comparisons of LS means. The options in the Planned comparisons of LS means group box allow you to compute planned comparisons of the least squares means for the current model. The contrast coefficients can be entered Separately for each factor in the current Effect (see above), or Together as a vector simultaneously for all factors (see below). When there are continuous predictors (covariates) in the model, then the least squares means used in the comparison are computed from the covariates at their means (regardless of the selection in the Covariate values group box on the Means tab).
Display least squares means. Click the Display least squares means button to display a spreadsheet with the least squares means for the currently selected Effect; see also the Means tab.
Contrasts for LS means. Click the Contrasts for LS means button to display the respective contrast specification dialog box for the chosen Effect. If you requested to enter the contrast coefficients Separately for each factor (see below), the contrast specification dialog box will allow you to enter the contrast coefficients for each factor; if you requested to enter the contrast coefficients Together (contrast vector), the contrast specification dialog box will prompt you to enter a matrix (or vector) of contrast coefficients for all levels of the chosen effect (the dialog box shows and labels all levels of the respective effect).
Depending on the type of Effect that you have selected (e.g., a main effect, within-subject effect, interaction, etc.) and the option buttons you have selected in the Enter contrasts separately or together and/or the Contrasts for dependent variables group boxes, various contrast specification dialogs will be displayed. See the Specify Contrasts for This Factor, Specify Contrasts, Contrast for Between Group Factors, Enter Contrasts for this Factor, Repeated Measures, Contrasts for Within-Subject Factors, and Contrasts for Dependent Variables dialogs for further details.
Compute. After you specify your contrasts for the least squares means (via the Contrasts for LS means button, see above), click the Compute button to display three spreadsheets: the Between contrast coefficients spreadsheet, the Contrast estimates spreadsheet, and the Univariate or Multivariate test of significance for planned comparisons spreadsheet.
Enter contrasts separately or together. Use the options in the Enter contrasts separately or together group box to specify how you want to enter the contrasts when you click the Contrasts for LS means button (see above). Select the Separately for each factor option button to enter the contrast coefficients for each factor in the current Effect. Select the Together (contrast vector) option button to enter the contrast coefficients for each cell in the current Effect (combination of factor levels for the factors in the current Effect).
Note that the method of computing the results for the planned comparison is actually identical, regardless of how the contrast coefficients were entered, and any contrast specified via the separately method can also be represented via the together method (but not vice versa). Specifically, when Separately for each factor is selected, the Kronecker Product (see the STATISTICA Visual Basic function MatrixKroneckerMultiply) of the specified matrices of contrast coefficients for each factor will be applied to the set of least squares means for the respective chosen Effect.
Note: separately for each factor. This method of specifying contrasts is most convenient when you want to explore interaction effects, for example, to test partial interactions within the levels of other factors. Suppose you had a three-way design with factors A, B, and C, each at 2 levels (so the design is a 2x2x2 between group full factorial design), and you found a significant three-way interaction effect. Recall that a three-way interaction effect can be interpreted as a two-way interaction, modified by the level of a third factor. Suppose further that the original hypothesis for the study was that a two-way interaction effect exists at level 1 of C, but no such effect exists at level 2 of factor C. Entering contrast coefficients Separately for each factor, you could enter the following coefficients:
· For factor A: 1 -1
· For factor B: 1 -1
· For factor C: 1 0
The Kronecker product of these vectors shows which least squares means in the design are compared by this hypothesis:
Levels, Factor C:    1    1    1    1    2    2    2    2
Levels, Factor B:    1    1    2    2    1    1    2    2
Levels, Factor A:    1    2    1    2    1    2    1    2
Coefficients:        1   -1   -1    1    0    0    0    0
Thus, this hypothesis tests the A by B interaction within level 1 of factor C.
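This Kronecker product step can be reproduced outside STATISTICA (which uses its MatrixKroneckerMultiply function for it); a minimal Python check with NumPy's kron, ordering the factors C, B, A to match the table above:

    import numpy as np

    c_A = np.array([1, -1])
    c_B = np.array([1, -1])
    c_C = np.array([1,  0])

    coeffs = np.kron(np.kron(c_C, c_B), c_A)
    print(coeffs)    # [ 1 -1 -1  1  0  0  0  0]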
Note: together (contrast vectors). This method of specifying contrasts can be used to compare any set of least squares means in the current Effect. In the table shown above, you could specify directly the contrast vector shown in the row labeled Coefficients. You could also compare any set of least squares means within the three-way interaction. For example:
Levels, Factor C:    1    1    1    1    2    2    2    2
Levels, Factor B:    1    1    2    2    1    1    2    2
Levels, Factor A:    1    2    1    2    1    2    1    2
Coefficients:        1    0    0   -1    0    1   -1    0
This set of coefficients cannot be expressed in terms of main effects and interactions of the factors (i.e., via the Separately for each factor option button); it can only be specified via the Together option button.
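For illustration, a contrast specified via the Together method is simply a vector of coefficients applied to the vector of least squares means as a dot product; the means below are invented:

    import numpy as np

    # least squares means for the 8 cells of the 2x2x2 design (invented)
    ls_means = np.array([9.0, 6.5, 7.0, 8.0, 5.0, 6.0, 7.5, 4.5])
    contrast = np.array([1, 0, 0, -1, 0, 1, -1, 0])   # vector from above

    estimate = contrast @ ls_means
    print(estimate)   # 9.0 - 8.0 + 6.0 - 7.5 = -0.5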
Contrasts for dependent variables. Use the options in the Contrasts for dependent variables group box to determine whether you can specify a set of contrast matrices for the dependent measures after you click the Contrasts for LS means button (see above). Select the Yes option button if you want to specify a set of contrast matrices for the dependent measures; select the No option button if you do not. Note that these options are only available if the current design involves multiple dependent variables, or, in the case of within-subject (repeated measures) designs, multiple dependent measures.
Multivariate tests. Use the options in the Multivariate tests group box to select the specific multivariate test statistics that are to be reported in the respective results spreadsheets. For descriptions of the different multivariate test statistics, refer to the Multivariate designs topic in the Introductory Overview. These options are only available if the current design is multivariate in nature, i.e., if there are multiple dependent measures, or a within-subject (repeated measures) design with effects that have more than 2 levels (and hence, multivariate tests for those effects can be computed).
GLM, GRM, and ANOVA Results – Matrix Tab
Select the Matrix tab of the GLM Results, GRM Results, or ANOVA Results dialog to access options to review various matrices involved in the computations of the ANOVA tables.
Between design. The options under Between design are used to review various matrices computed for the between design. For details on the specific matrices, see Between Design Matrices in GLM, GRM, and ANOVA.
Between effects. The options under Between effects are used to review the sums of squares and cross-product matrices and derived matrices for the between effects in the design. For details on the specific matrices, see Between Effects Matrices in GLM, GRM, and ANOVA.
Within effects. The options under Within effects are used to review various matrices involved in the computations for the within-subjects (repeated measures) effects. For details on the specific matrices, see Within Effects Matrices in GLM, GRM, and ANOVA.
GLM, GLZ, GRM, PLS, and ANOVA Results – Report Tab
Select the Report tab of the GLM Results, GLZ Results, GRM Results, PLS Results, or the ANOVA Results dialog boxes to access the options described here.
Send/print to Report window. Use the options in the Send/print to Report window group box to send the variable specifications, command syntax, and prediction equation to a report. Note that the Also send to Report Window check box must be selected in the Analysis/Graph Output Manager dialog box for these options to work. If it is not selected, you will be prompted to modify your output options when you click the Variables and command syntax or Pred. equation buttons (see below).
Variables and command syntax. Click this button to send the current data specifications, including the data file name, currently selected variables, and codes to a report. Use the options in the Analysis/Graph Output Manager dialog box to control the level of detail of the printout (e.g., whether to print long and short text labels, etc.). Also, this option will send the command syntax for the current analysis to the report window. The command syntax provides a detailed log of the specifications for the current analysis. You can send the command syntax to the report window even if you originally specified the design via Quick Specs dialog boxes or the Analysis Wizard dialog boxes.
Pred. equation. Click the Pred. equation button to send the current prediction equation for each dependent variable (for the between design only) to a report. If there are within (repeated measures) factors in the design, they will be ignored, i.e., a prediction equation will be printed for each dependent variable that was originally selected. This option is very useful if you want to copy the prediction equation and paste it into another spreadsheet or an equation plotter (e.g., to plot a user-defined function in 2D or 3D graphs; see the Graph Options dialog box – Custom Function options pane topic). This option is not available in GLZ and PLS.
# digits. Enter the number of digits that you want displayed in the prediction equation in the # digits field. This option is not available in GLZ and PLS.
Model Profiler. Click this button to display the Model Profiler, where you can run simulations based on the specified model. Note that this option is only available if General Linear Models was selected in the Startup Panel.
Code generator. If your program is licensed to include this feature, you can generate computer code to implement the current model for predicting new observations. When you click this button you have the following choices:
PMML. Click this command to generate code in Predictive Model Markup Language (PMML), which is an XML-based language for storing information about fully trained (parameterized) models and for sharing those models with other applications. STATISTICA and STATISTICA Enterprise Server contain facilities to use this information to compute predicted values or classifications, i.e., to quickly and efficiently deploy models (typically in the context of data mining projects).
STATISTICA Visual Basic (SVB). Click this command to generate a STATISTICA Visual Basic program containing the code implementing the model. This code will be generated in a form compatible with the nodes of STATISTICA Data Miner; however, you can also simply copy/paste the relevant portion of this code to include it in your custom Visual Basic programs. The code will automatically be displayed in the STATISTICA Visual Basic program editor window.
C/C++. Click this command to generate code compatible with the C/C++ computer language. This option is useful if you want to include the information provided by the final model into custom (C/C++) programs. (See also, Using C/C++/C# Code for Deployment.)
C#. Click this command to generate code as C#.
Java. Click this command to generate code in Java.
SAS. Click this command to generate deployment code for the created model as SAS code (a .sas file). See also, Rules for SAS Variable Names and Example: SAS Deployment.
SQL stored procedure in C#. Click this command to generate code as a C# class intended for use in a SQL Server user defined function.
SQL User Defined Function in C#. Click this command to generate code as a C# class intended for use as a SQL Server user-defined function.
TeraData. Click this command to generate code as a C-language function intended for use as a user-defined function in a TeraData querying environment.
Deployment to STATISTICA Enterprise. Click this command to deploy the results as an Analysis Configuration in STATISTICA Enterprise. Note that appropriately formatted data must be available in a STATISTICA Enterprise Data Configuration before the results can be deployed to an Analysis Configuration.
GLM, GRM, and ANOVA Results – Profiler Tab
Select the Profiler tab of the GLM Results, GRM Results, or the ANOVA Results dialog box to access the options described here. For an overview of response/desirability profiling see Desirability Profiling in GLM, GRM, and MANOVA and Experimental Design Profiling Predicted Responses and Response Desirability.
Vars. Click the Vars button to display a standard variable selection dialog box, in which you select the dependent variables to profile. If multiple dependent variables are specified for the analysis, use this button to select the dependent variable or variables for which to profile responses. Note that you can specify desirability function settings for all dependent variables in the analysis; thus, you can use the variable selection dialog box to easily select subsets of dependent variables, or single dependent variables, to profile.
View. Click the View button to display a compound graph of the prediction profiles for each of the dependent variables that are selected to be profiled. The prediction profile compound graph contains a number of features that are useful for interpreting the effects of the predictor variables on responses on the dependent variables. For each dependent variable, a graph is produced showing the predicted values of the dependent variables at the minimum and maximum values of each predictor variable, and at each additional grid point for each predictor variable. Also shown are the current levels for each predictor variable. The predicted values that are shown for the dependent variables are the predicted responses at each level of each factor, holding all the other factors (including the block factor) constant at their current levels. Confidence intervals or prediction intervals for the predicted values are also shown if the Confidence intervals or the Prediction intervals option buttons, respectively, are selected in the Options for Response Profiler dialog box.
If the Show desirability function check box is selected, clicking the View button will also produce a desirability function graph accompanying the predicted values for each of the dependent variables. The desirability function graph shows the desirability of the response (which can range from 0.0 for undesirable up to 1.0 for very desirable) across the observed range of each dependent variable (see Desirability function specifications options for details on specifying desirability function values for each dependent variable). Similar to the graphs of the predicted values for each dependent variable, graphs are produced for the overall desirability at each level of each factor, holding all other factors (including the block factor) constant at their current levels. Inspection of the desirability function graphs shows how the desirability of the responses on the dependent variables changes as the levels of the factors change.
1. Click the 1 button to plot the desirability function values in a surface plot, along with the specified grid points for the factors. A surface graph will be produced for each pair of factors, showing how the response desirability varies at each combination of grid points for each pair of factors, holding all other factors constant at their current levels. Note that several different options can be specified for fitting the desirability function values to the surface in the Options for Response Profiler dialog box.
2. Click the 2 button to plot the contours of the desirability function, along with the specified grid points for the factors. A contour plot will be produced for each pair of factors, showing how the response desirability varies at each combination of grid points for each pair of factors, holding all other factors constant at their current levels. Note that several different options can be specified for fitting the contours of the desirability function in the Options for Response Profiler dialog box.
Options. Click the Options button to display the Options for Response Profiler dialog box.
Set factors at value. Use the options in the Set factors at value group box to specify the current levels of the predictor variables for the prediction profile compound graph (available via the View button), the surface plot (available via the 1 button), and the contour plot (available via the 2 button). Select the Means option button to set the current level of each predictor variable to the mean of the respective variable. This is the default option. Select the User vals option button to set the current level of each predictor variable to user-specified values. These values can be inspected and/or specified by clicking the accompanying button, which will display the Select Factor/Covariate Values dialog box, in which you can specify the current level for each predictor variable. Select the Optimum option button to set the current level of each predictor variable to the value determined by optimizing the response desirability.
Grid. Click the Grid button to specify the range and the grid points for each of the predictor variables in the analysis. Use the combo box by the Grid button to specify the factor for which you want to specify grid points. Clicking the Grid button will then display the Specifications for Factor Grid dialog box, in which you can specify the minimum value, the maximum value, and the number of intervals in the grid for the factor. These specifications determine the grid points for the factor: the lowest grid point is the minimum value, the next lowest grid point is the minimum value plus the difference between the maximum and minimum values divided by the number of intervals, and so on up to the highest grid point (the maximum value).
Grid points serve two functions in the Response/desirability profiler: they determine the plot points for the factors on the prediction profile compound graph (available via the View button), the surface plot (available via the 1 button), and the contour plot (available via the 2 button), and they determine the factor values that are evaluated when the response desirability is optimized (see the Optimum option button, above).
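The grid rule described above amounts to an evenly spaced sequence from the minimum to the maximum; a short sketch with hypothetical values:

    import numpy as np

    minimum, maximum, intervals = 10.0, 30.0, 4
    grid = np.linspace(minimum, maximum, intervals + 1)
    print(grid)    # [10. 15. 20. 25. 30.]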
Desirability function specifications. Use the Desirability function specifications group box to enter desirability function specifications for the class displayed in the Variable combo box. These specifications determine the desirability function values (from 0.0 for undesirable to 1.0 for very desirable) corresponding to predicted values on the class. These specifications are entered in the Value and Desirability edit fields (see below). Note that the majority of the options in this group box are not available unless the Show desirability function check box is selected.
Show desirability function. Select this check box to enable the Desirability function specifications edit fields. By default, the Show desirability function check box is not selected and the Desirability function specifications edit fields are disabled. The Show desirability function check box is always selected when the Set factors at value Optimum option button has been selected. If the Set factors at value Optimum option has been selected and the Show desirability function check box is then cleared, STATISTICA will deselect the Optimum option button and select the Means option button.
Variable. Use the Variable combo box to select the class for which to specify desirability function settings; then enter the settings in the Value and Desirability edit fields.
Value – Low, Medium, and High Values. STATISTICA allows for up to three “inflection points” in the desirability function for predicted values for each dependent variable. For example, suppose that some intermediate predicted value on a dependent variable is highly desirable, and that lower and higher predicted values on the variable become progressively less desirable as they depart further from the “target” intermediate value. This type of desirability function would have three inflection points: the low value for the dependent variable, below which the response is undesirable, the high value for the dependent variable, above which the response is undesirable, and the medium value for the dependent variable, at which the response becomes increasingly desirable as it approaches the target value. The default specifications for the low value, medium value, and high value settings use a simple “higher is better” type of desirability function with only two inflection points. The low value is set to the observed minimum value for the dependent variable, the high value is set to the observed maximum value for the dependent variable, and the medium value is set to the mid-point between these two extremes. You can specify any other type of desirability function with up to three inflection points by entering the inflection points for the variable in the low value, medium value, and high value boxes. The only restriction is that adjacent inflection points must be in ascending order or equal in value.
Desirability – Low, Medium, and High Values. Desirability values (from 0.0 for undesirable to 1.0 for very desirable) can be specified for the corresponding inflection points of the desirability function for each of the dependent variables. For the example “target” type of desirability function described above, you would want to specify desirability values of 0.0 for responses with values below the low inflection point or above the high inflection point, and a desirability value of 1.0 for the targeted intermediate value. You would therefore specify values of 0.0, 1.0, and 0.0 for desirability in the low value, medium value, and high value boxes. The default specifications for the level of desirability at the three inflection points are based on a simple “higher is better” type of desirability function. Desirability is set to 0.0 at the low value, 0.5 at the medium value, and 1.0 at the high value. You can specify any other valid desirability values (from 0.0 to 1.0) by entering the appropriate value in the respective boxes.
Curvature – s and t parameters. The desirability of responses need not decrease (or increase) linearly between inflection points in the desirability function. Perhaps there is a “critical region” close to a desired, intermediate response on a dependent variable beyond which the desirability of the response at first drops off very quickly, but drops off less quickly as the departure from the “targeted” value becomes greater. To model this type of desirability function requires “curvature” parameters to take into account the nonlinearity in the “falloff” of desirability between inflection points. In the s parameter and t parameter boxes, you can specify a value for the exponent of the desirability function (from 0.0 up to 50, inclusive) representing the curvature in the desirability function between the low and medium inflection points of the function, and between the medium and high inflection points of the function, respectively. Assuming that an intermediate response is most desirable, values greater than 1.0 for the s parameter and t parameter represent initial quicker “falloff” in desirability but subsequent slower “falloff” in desirability as the departure from the “targeted” value becomes greater. Values less than 1.0 for the s parameter and t parameter represent initial slower “falloff” in desirability but subsequent quicker “falloff” in desirability as the departure from the “targeted” value becomes greater. The default specifications for the s parameter and t parameter are values of 1.0, representing linear “falloff” in desirability between the medium and low inflection points as well as between the medium and high inflection points. Further descriptions of the s parameter and t parameter and their effects in the desirability function can be found in the discussions of “two-sided” desirability functions in Derringer and Suich (1980) and in Box and Draper (1987).
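The following is a minimal sketch of such a two-sided (“target is best”) desirability function with inflection points low/medium/high, the corresponding desirability values, and curvature exponents s and t, in the spirit of Derringer and Suich (1980); it illustrates the idea and is not STATISTICA's implementation:

    def desirability(y, low, med, high, d_low=0.0, d_med=1.0, d_high=0.0,
                     s=1.0, t=1.0):
        if y <= low:
            return d_low
        if y >= high:
            return d_high
        if y <= med:    # rising branch, curvature governed by s
            return d_low + (d_med - d_low) * ((y - low) / (med - low)) ** s
        # falling branch, curvature governed by t
        return d_high + (d_med - d_high) * ((high - y) / (high - med)) ** t

    # desirability peaks at the target (med = 50) and falls off on both sides
    for y in (20, 35, 50, 65, 80):
        print(y, round(desirability(y, low=20, med=50, high=80), 3))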
Apply to all vars. Click the Apply to all vars button to apply the desirability settings you specify for one dependent variable to all the dependent variables in the analysis. This option is particularly useful if the same dependent variable is measured on, say, successive days. For example, you could specify the desirability of radioactivity readings of waste materials on the first day after the materials are discharged from a factory, then apply the same desirability settings for the radioactivity readings on subsequent days. If many days of readings are taken, the Apply desirability specifications to all variables option can save considerable data entry.
Reset specs. Click the Reset specs button to reset any changed desirability function settings for a dependent variable back to the default desirability function specifications for the variable (for details on default specifications, see Desirability function settings, above).
all vars. Click the all vars button to reset all desirability function settings for all dependent variables back to the default desirability function specifications (for details on default specifications, see Desirability function settings, above).
Open specs. Click the Open specs button to display a standard Open File dialog box that will prompt you for a file in which the desirability function settings specified for the classes in the analysis have been saved using the Save specs button (see below). Retrieval of previously saved settings can save considerable data entry in specifying desirability functions, especially when the analysis contains many classes each with distinct desirability function specifications.
Save specs. Click the Save specs button to display a standard Save As File dialog box that will prompt you for a file in which to save the desirability function settings specified for the classes in the analysis. These settings are then available for retrieval at a later time by using the Open specs option (see above). This can save considerable data entry in specifying desirability functions, especially when the analysis contains many classes each with distinct desirability function specifications.
Examples of the ANOVA Analysis
Example 1: Breakdown and One-Way ANOVA
Overview. You can compute various descriptive statistics (e.g., means, standard deviations, correlations, percentiles, etc.) broken down by one or more categorical variables (e.g., by Gender and Region) as well as perform a one-way Analysis of Variance via the Breakdown and one-way ANOVA procedure accessible from the Basic Statistics and Tables Startup Panel.
Open the example data file Adstudy.sta for this example, and start the Basic Statistics and Tables module.
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples. The Open a STATISTICA Data File dialog box is displayed. Adstudy.sta is located in the Datasets folder. Then, select the Statistics tab. In the Base group, click Basic Statistics to display the Basic Statistics and Tables Startup Panel.
Classic menus. From the File menu, select Open Examples to display the Open a STATISTICA Data File dialog box; Adstudy.sta is located in the Datasets folder. Then, from the Statistics menu, select Basic Statistics/Tables to display the Basic Statistics and Tables Startup Panel.
In the Basic Statistics and Tables Startup Panel, select Breakdown and one-way ANOVA and click the OK button.
In the Statistics by Groups (Breakdown) dialog box, select the Individual tables tab. Click the Variables button to display the variable selection dialog box.
Select Measure01 through Measure23 as the Dependent variables, and the two variables Gender (subject’s gender, Male and Female) and Advert (type of advertisement shown to the subjects, Coke and Pepsi) as the Grouping variables, and click OK.
Click the Codes for grouping variables button to display the Select codes for indep. vars (factors) dialog box, and select all codes for both of the grouping variables. To select all codes for a variable, you can enter the code numbers in the respective edit field, click the respective All button, or enter an asterisk (*) in the respective edit field. Clicking the OK button without specifying any values is equivalent to selecting all values of all variables.
Click the OK button in this dialog box and in the Statistics by Groups (Breakdown) dialog box to display the Statistics by Groups – Results dialog box, which provides various options and procedures for analyzing the data within groups in order to obtain a better understanding of the differences between categories of the grouping variables.
Summary Table of Means. You can select the desired statistics to be displayed in the Summary: Table of statistics or Detailed two-way tables; select the Descriptives tab and select all the check boxes in the Statistics box. Now, click the Detailed two-way tables button to display the results spreadsheet.
This spreadsheet shows the selected descriptive statistics for the variables as broken down by the specified groups (scroll the spreadsheet to view the results for the rest of the variables). For example, looking at the means within each group in this spreadsheet, you can see that there is a slight difference between the means for Males and Females for variable Measure01. Now, examine the means within the Male and Female groups for variable Measure01; you can see that there is very little difference between the groups Pepsi and Coke within either gender; thus, the gender groups appear to be homogeneous in this respect.
One-Way ANOVA and Post-Hoc Comparisons of Means. You can easily test the significance of these differences via the Analysis of Variance button on the ANOVA & tests tab in the Results dialog box. Click this button to display the spreadsheet with the results of the univariate analysis of variance for each dependent variable.
The one-way Analysis of Variance procedure gave statistically significant results for Measure05, Measure07, and Measure09. These significant results indicate that the means across the groups are different in magnitude. Now, return to the Results dialog box and select the Post-hoc tab to perform post-hoc tests for the significant differences between individual groups (means). You will first need to select the variable(s) for the comparisons. For this example, click the Variables button, select variable Measure07, and click OK. You can choose from among several post-hoc tests (an even larger selection of tests is available in the GLM module); for this example, click the LSD test or planned comparison button.
The LSD test is equivalent to the t-test for independent samples, based on the N in the groups involved in the comparison. The t-test for independent samples results from Example 1 showed that there was a significant difference between the responses for Males and Females for Measure07. Using the Breakdown and one-way ANOVA procedure, you can see from the LSD test that a significant difference occurs only when the females are shown the Coke advertisement.
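As a sketch of that equivalence, comparing two groups with an (unprotected) LSD test amounts to an independent-samples t test on those groups; the scores below are invented, and note that LSD implementations typically pool the error variance across all groups rather than only the two being compared:

    from scipy.stats import ttest_ind

    female_coke  = [9, 8, 7, 9, 6, 8]
    female_pepsi = [5, 6, 4, 7, 5, 6]

    t_stat, p_value = ttest_ind(female_coke, female_pepsi)
    print(t_stat, p_value)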
Graphical presentation of results. These differences can be viewed graphically via the many graphic options in the Statistics by Groups – Results dialog box. For example, to compare the distributions of the selected variables within the specified groups, select the Descriptives tab and click the Categorized box & whisker button. In the Box-Whisker Type dialog box, ensure that the Median/Quart./Range option button is selected, and click the OK button. Next, select the appropriate variable(s) to produce the graphs. Shown below is the box-whisker plot for variable Measure07.
As you can see in the above box and whisker plot for variable Measure07, there does appear to be a difference in the distribution of values for the Female-Coke group as compared to the Male-Coke group.
Within-Group Correlations. Now let’s look at the correlations between variables within the specified groups. Return to the Statistics by Groups – Results dialog box, and select the Correlations tab. Note that numerous options are available on this tab to display various statistics and auxiliary information, in addition to the (within-group) correlation matrices. For this example, change the p-value for highlighting option to .001. Then, click the Within-group correlations & covariances button. The Select groups dialog box will be displayed, in which you can select one group (or All Groups) for the correlation matrices.
In Example 1, a correlation matrix was produced in which the correlation between variables Measure05 and Measure09 (r = -.47) was highly significant (p<.001). The Breakdown and one-way ANOVA procedure enables you to explore this significant correlation further by computing correlations within the specified grouping variables. Now, in the Select groups dialog box, select All Groups and then click the OK button to produce all four correlation matrix spreadsheets.
As you can see, the results reveal that the pattern of correlations is differentiated across the groups (e.g., the correlation is very high in the Female/Coke group and much lower in the other three groups). None of the correlations between Measure05 and Measure09 were significant at the .001 level; however, if you were to change the p-value for highlighting field to .05 on the Correlations tab of the Results dialog box, and click the Within-group correlations & covariances button again, you would find that the correlation between Measure05 and Measure09 is significant at that level (p=.02) for the group defined by Female gender and Coke advertisement.
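If you want to verify such within-group correlations programmatically, a sketch along these lines works, assuming the Adstudy data have been exported to a hypothetical CSV file named adstudy.csv with the variable names used in this example:

    import pandas as pd

    df = pd.read_csv("adstudy.csv")
    within = df.groupby(["Gender", "Advert"])[["Measure05", "Measure09"]].corr()
    print(within)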
Note that you can use the Difference tests: r, %, means option in the Basic Statistics and Tables Startup Panel to test differences between correlation coefficients.
Categorized scatterplots. The within-group correlations can be graphically presented via the Categ. scatterplots button on the Correlations tab of the Statistics by Groups – Results dialog box. When you click this button, you will be prompted to select the variables for the analysis. Select Measure05 in the First variable list field and Measure09 in the Second variable list field and then click the OK button to produce the plot.
The above categorized scatterplot clearly shows the strong negative correlation between Measure05 and Measure09 for the group Female/Coke.
Example 2: Simple Factorial ANOVA with Repeated Measures
For this example of a 2 x 2 (between) x 3 (repeated measures) design, open the data file Adstudy.sta:
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples. The Open a STATISTICA Data File dialog box is displayed. Adstudy.sta is located in the Datasets folder.
Classic menus. From the File menu, select Open Examples to display the Open a STATISTICA Data File dialog box; Adstudy.sta is located in the Datasets folder.
Calling the ANOVA module. To start an ANOVA/MANOVA analysis:
Ribbon bar. Select the Statistics tab. In the Base group, click ANOVA to display the General ANOVA/MANOVA Startup Panel.
Classic menus. Select ANOVA from the Statistics menu to display the General ANOVA/MANOVA Startup Panel.
The Startup Panel contains options to specify very simple analyses (e.g., via One-way ANOVA – designs with only one between-group factor) and more complex analyses (e.g., via Repeated measures ANOVA – designs with between-group factors and a within-subject factor).
Select Repeated measures ANOVA as the Type of analysis and Quick specs dialog as the Specification method.
Then, click the OK button to display the ANOVA/MANOVA Repeated Measures ANOVA dialog box.
Specifying the design (variables). The first (between-group) factor is Gender (with 2 levels: Male and Female). The second (between-group) factor is Advert (with 2 levels: Pepsi and Coke). The two factors are crossed, which means that there are both Male and Female subjects in the Pepsi and Coke groups. Each of those subjects responded to 3 questions (this repeated measures factor will be called Response; it has 3 levels represented by the variables Measure01, Measure02, and Measure03).
Click the Variables button (on the ANOVA/MANOVA Repeated Measures ANOVA dialog) to display the variable selection dialog. Select Measure01 through Measure03 as dependent variables (in the Dependent variable list field) and Gender and Advert as factors [in the Categorical predictors (factors) field].
Then click the OK button to return to the previous dialog.
The repeated measures design. Note that the design of the experiment that we are about to analyze can be summarized as follows:
            Between-Group       Between-Group       Repeated Measures Factor: Response
            Factor #1:          Factor #2:          Level #1:   Level #2:   Level #3:
            Gender              Advert              Measure01   Measure02   Measure03
Subject 1   Male                Pepsi               9           1           6
Subject 2   Male                Coke                6           7           1
Subject 3   Female              Coke                9           8           2
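If you want to mirror this layout outside of STATISTICA, the following minimal pandas sketch reproduces the wide format shown above (one column per level of the repeated measures factor) and reshapes it to the long format that many other tools expect. Only the three example rows from the table are used here.

    import pandas as pd

    # Wide layout: one row per subject, one column per measurement
    wide = pd.DataFrame({
        "Subject":   [1, 2, 3],
        "Gender":    ["Male", "Male", "Female"],
        "Advert":    ["Pepsi", "Coke", "Coke"],
        "Measure01": [9, 6, 9],
        "Measure02": [1, 7, 8],
        "Measure03": [6, 1, 2],
    })

    # Long layout: one row per subject x level of the RESPONSE factor
    long = wide.melt(
        id_vars=["Subject", "Gender", "Advert"],
        value_vars=["Measure01", "Measure02", "Measure03"],
        var_name="Response", value_name="Score",
    )
    print(long)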
Specifying a repeated measures factor. The minimum necessary selections are now complete, and if you did not need to define a repeated measures factor, you would be ready to click the OK button and see the results of the analysis. However, for our example, specify that the three dependent variables you have selected are to be interpreted as three levels of a repeated measures (within-subject) factor. Unless you do so, STATISTICA assumes that they are three “different” dependent variables and will run a MANOVA (i.e., multivariate ANOVA).
In order to define the desired repeated measures factor, click the Within effects button to display the Specify Within-subjects Factors dialog.
Note that STATISTICA has suggested the selection of one repeated measures factor with 3 levels (default name R1). You can only specify one within-subject (repeated measures) factor via this dialog; to specify multiple within-subject factors, use the General Linear Models module (available in the optional Advanced Linear/Nonlinear Models package). Press the F1 key (or click the help button) in this dialog to review a comprehensive discussion of repeated measures and examples of designs. Edit the name for the factor (e.g., change the default R1 to RESPONSE), and click the OK button to exit the dialog.
Codes (defining the levels) for between-group factors. You do not need to manually specify codes for between-group factors [e.g., instruct STATISTICA that variable Gender has two levels: 1 and 2 (or Male and Female)] unless you want to prevent STATISTICA from using, by default, all codes encountered in the selected grouping variables in the datafile. To enter such custom code selection, click the Factor codes button to display the Select codes for indep. vars (factors) dialog.
This dialog contains various options. For example, you can review the values of individual variables before making your selections by clicking the Zoom button, or scan the file to fill in the codes fields (e.g., for Gender and Advert) for individual variables or for all of them at once. For now, click the OK button; STATISTICA automatically fills in the codes fields with all distinct values encountered in the selected variables and closes the dialog.
Performing the analysis. When you click the OK button upon returning to the ANOVA/MANOVA Repeated Measures ANOVA dialog, the analysis is performed, and the ANOVA Results dialog is displayed. Various kinds of output spreadsheets and graphs are now available.
Note that this dialog is tabbed, which allows you to quickly locate results options. For example, if you want to perform planned comparisons, click the Comps tab. To view residual statistics, click the Resids tab. For this simple overview example, we will only use the results options available on the Quick tab.
Reviewing ANOVA results. Start by looking at the ANOVA summary of all effects table by clicking the All effects button (the button with the SUMM icon).
The only effect (ignoring the Intercept) in this analysis that is statistically significant (p=.007) is the RESPONSE effect. Many different patterns of means could produce this result (for more information, see the ANOVA – Introductory Overview), so we will now look at the marginal means for this effect graphically to see what it represents.
To bring back the ANOVA Results dialog (that is, “resume” the analysis), press CTRL+R, select Resume from the Statistics menu, or click the ANOVA Results button on the Analysis bar. When the ANOVA Results dialog is displayed, click the All effects/Graphs button to review the means for individual effects.
This dialog contains a summary Table of all effects (with most of the information you have seen in the All effects spreadsheet) and is used to review individual effects from that table in the form of the plots of the respective means (or, optionally, spreadsheets of the respective mean values).
Plot of Means for a Main Effect. Double-click on the significant main effect RESPONSE (the one marked with an asterisk in the p column) to see the respective plot.
The graph indicates a clear decreasing trend: the means for the three consecutive questions are progressively lower. Even though there are no significant interactions in this design (see the discussion of the Table of all effects above), we will look at the highest-order interaction to examine the consistency of this strong decreasing trend across the between-group factors.
Plot of means for a three-way interaction. To see the plot of the highest-order interaction, double-click on the row marked RESPONSE*GENDER*ADVERT, representing the interaction between factors 1 (Gender), 2 (Advert), and 3 (Response), on the Table of All Effects dialog. An intermediate dialog, Specify the arrangement of the factors in the plot, is displayed, which is used to customize the default arrangement of factors in the graph.
Note that unlike the previous plot of a simple factor, the current effect can be visualized in a variety of ways. Click the OK button to accept the default arrangement and produce the plot of means.
As you can see, this pattern of means (split by the levels of the between-group factors) does not indicate any salient deviations from the overall pattern revealed in the first plot (for the main effect, RESPONSE). Now you can continue to interactively examine other effects; run post-hoc comparisons, planned comparisons, and extended diagnostics; etc., to further explore the results.
Interactive data analysis in STATISTICA. This simple example illustrates the way in which STATISTICA supports interactive data analysis. You are not forced to specify all output to be generated before seeing any results. Even simple analysis designs can, obviously, produce large amounts of output and countless graphs, but usually you cannot know what will be of interest until you have a chance to review the basic output. With STATISTICA, you can select specific types of output, interactively conduct follow-up tests, and run supplementary “what-if” analyses after the data are processed and basic output reviewed. STATISTICA’s flexible computational procedures and wide selection of options used to visualize any combination of values from numerical output offer countless methods to explore your data and verify hypotheses.
Automating analyses (macros and STATISTICA Visual Basic). Any selections that you make in the course of the interactive data analysis (including both specifying the designs and choosing the output options) are automatically recorded as industry-standard Visual Basic code. You can save such macros for repeated use (you can also assign them to toolbar buttons, modify or edit them, combine them with other programs, etc.).
Example 3: A 2 x 3 Between-Groups ANOVA Design
Data File. This example, based on a fictitious data set reported in Lindman (1974), begins with a simple analysis of a 2 x 3 complete factorial between-groups design.
Suppose that we have conducted an experiment to address the nature vs. nurture question; specifically, we test the performance of rats in the “T-maze.” The T-maze is a simple maze, and the rats’ task is to learn to run straight to the food placed in a particular location, without errors. Three strains of rats, whose general ability to solve the T-maze can be described as bright, mixed, and dull, were used. From each of these strains, we rear 4 animals in a free (stimulating) environment and 4 animals in a restricted environment. The dependent measure is the number of errors made by each rat while running the T-maze.
The data for this study are contained in the STATISTICA example data file Rats.sta. Open this data file:
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples. The Open a STATISTICA Data File dialog box is displayed. Rats.sta is located in the Datasets folder.
Classic menus. From the File menu, select Open Examples to display the Open a STATISTICA Data File dialog box; Rats.sta is located in the Datasets folder.
A portion of this file is shown below.
Specifying the Analysis. Start the ANOVA analysis:
Ribbon bar. Select the Statistics tab, and in the Base group, click ANOVA.
Classic menus. Select ANOVA from the Statistics menu.
The General ANOVA/MANOVA Startup Panel will be displayed, in which we can enter the specifications for the design.
Independent and dependent variables. The ANOVA/MANOVA module classifies variables as either independent or dependent (see Elementary Concepts). Independent variables are those under the experimenter’s control. We may also refer to these variables as grouping variables, coding variables, or between-group factors. These variables contain the codes that were used to uniquely identify to which group in the experiment the respective case belongs.
In the data file Rats.sta, the codes 1-free and 2-restricted were used in the categorical predictor variable Envirnmt to denote whether a rat belongs to the group raised in the free or the restricted environment, respectively. The codes used for the second independent variable (Strain) are 1-bright, 2-mixed, and 3-dull. The dependent variable in an experiment is the one that depends on or is affected by the independent variables; in this study, this is the variable Errors, which contains the number of errors made by the rats running the maze.
Specifying the design. In the General ANOVA/MANOVA Startup Panel, select Factorial ANOVA as the Type of analysis and Quick specs dialog as the Specification Method. Then click the OK button to display the ANOVA/MANOVA Factorial ANOVA dialog. This is a 2 (Environment) by 3 (Strain) between-groups experimental factorial design.
Click the Variables button to display the standard variable selection dialog. Here, select Errors from the Dependent variable list and Envirnmt and Strain from the Categorical predictors (factors) list, and then click the OK button.
Next, specify the codes that were used to uniquely identify the groups; click the Factor codes button and either enter each of the codes for each variable or click the All button for each variable to enter all of the codes for that variable. Finally, click the OK button. The ANOVA/MANOVA Factorial ANOVA dialog will now look as follows:
Reviewing the Results. Click the OK button to begin the analysis. When complete, the ANOVA Results dialog will be displayed.
Click the All effects/Graphs button to display the table of all effects.
Summary ANOVA table. This table summarizes the main results of the analysis. Note that significant effects (p<.05) in this table are marked with an asterisk *. We can adjust the significance criterion (for highlighting) by entering the desired alpha level in the Significance level field on the Quick tab. Both of the main effects (Envirnmt and Strain) are statistically significant (p<.05) while the interaction is not (p>.05).
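As a cross-check of the logic (not a reproduction of STATISTICA's output), the same 2 x 3 factorial ANOVA table can be sketched with statsmodels. The error counts below are hypothetical stand-ins for the Rats.sta values, which ship with STATISTICA.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Hypothetical data: 2 environments x 3 strains x 4 animals
    data = pd.DataFrame({
        "Envirnmt": ["free"] * 12 + ["restricted"] * 12,
        "Strain": (["bright"] * 4 + ["mixed"] * 4 + ["dull"] * 4) * 2,
        "Errors": [26, 14, 41, 16, 41, 82, 26, 86, 36, 87, 39, 99,
                   51, 35, 96, 36, 39, 114, 104, 92, 42, 133, 92, 124],
    })

    # Full factorial model: both main effects plus the interaction
    model = ols("Errors ~ C(Envirnmt) * C(Strain)", data=data).fit()
    print(sm.stats.anova_lm(model, typ=2))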
Reviewing marginal means. The marginal means for the Envirnmt main effect will now be reviewed. (Note that the marginal means are calculated as least squares means.) Select the Envirnmt main effect, select the Spreadsheet option button under Display, and then click the OK button to produce a spreadsheet with the table of marginal means for the selected effect.
The default graph for all spreadsheets with marginal means is the means plot. In this case, the plot is rather simple. To produce this plot of the two means for the free and restricted environment, return to the Table of All Effects dialog (by clicking the All effects/Graphs button on the Quick tab), change the Display option to Graph, and again click the OK button.
It appears that rats raised in the more restricted environment made more errors than rats raised in the free environment.
Reviewing the interaction plot. Now, let’s look at all of the means simultaneously, that is, at the plot of the interaction of Envirnmt by Strain. Once again, return to the Table of All Effects dialog, and this time select the interaction effect (Envirnmt*Strain). Click the OK button, and the Arrangement of Factors dialog is displayed:
As you can see, we have full control over the order in which the factors in the interaction will be plotted. For this example, select STRAIN under x-axis, upper and ENVIRNMT under Line pattern (see above). Click the OK button, and the graph of means is displayed.
The graph, shown below, nicely summarizes the results of this study, that is, the pattern of the two main effects. The rats raised in the restricted environment (dashed line) made more errors than those raised in the free environment (solid line). At the same time, the dull rats made the most errors, followed by the mixed rats, while the bright rats made the fewest errors.
Post Hoc Comparisons of Means. Looking at the previous plot, we might ask whether the mixed strain of rats is significantly different from the dull and the bright strains. However, no a priori hypotheses about this question were specified; therefore, we should use post hoc comparisons to test the mean differences between strains of rats (refer to the Introductory Overview for an explanation of the logic of post hoc tests).
Specifying post hoc tests. After returning to the ANOVA Results dialog, click the More results button to display the larger ANOVA Results dialog, and then click the Post-hoc tab. For this example, select Strain in the Effect box in order to compare the (unweighted) marginal means for that effect.
Choosing a test. The different options for post hoc tests on this tab all “protect” us to some extent against capitalizing on chance (due to the post hoc nature of the comparisons, see ANOVA/MANOVA Introductory Overview – Contrast Analysis and Post-hoc Tests). All tests enable us to compare means under the assumption that we bring no a priori hypotheses to the study. These tests are discussed in the Post-hoc tab topic. For now, select the Homogenous groups option button (in the Display group box) and click the Scheffé test button.
In this table, the means are sorted from smallest to largest, and means that are not significantly different from each other have four “stars” (*) in the same column (i.e., they form a homogeneous group of means); all means that do not share stars in the same column are significantly different from each other. Here, the only means that differ significantly from each other are those of group 1 (bright) and group 3 (dull). Thus, we would conclude that the dull strain of rats made significantly more errors than the bright strain, while the mixed strain is not significantly different from either.
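For reference, the criterion behind the Scheffé table can be computed by hand: a pairwise difference is significant when it exceeds sqrt((k-1) * F(alpha; k-1, df_error) * MSE * (1/n_i + 1/n_j)). In the sketch below, the strain means, MSE, and degrees of freedom are hypothetical placeholders, chosen only to mirror the bright vs. dull conclusion above.

    import numpy as np
    from scipy import stats

    means = {"bright": 29.5, "mixed": 58.0, "dull": 81.5}  # hypothetical
    n, k, mse, df_err = 8, 3, 700.0, 18                    # hypothetical
    f_crit = stats.f.ppf(0.95, k - 1, df_err)
    # Scheffe margin for equal group sizes (1/n + 1/n = 2/n)
    margin = np.sqrt((k - 1) * f_crit * mse * (2.0 / n))

    for a, ma in means.items():
        for b, mb in means.items():
            if a < b:  # each pair once
                sig = abs(ma - mb) > margin
                print(f"{a} vs {b}: |diff| = {abs(ma - mb):.1f}, "
                      f"critical = {margin:.1f}, significant = {sig}")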
Testing Assumptions. The ANOVA/MANOVA and GLM Introductory Overview – Assumptions and Effects of Violating Assumptions topic explains the assumptions underlying the use of ANOVA techniques. Now, we will review the data in terms of these assumptions. Return to the ANOVA Results dialog and click on the Assumptions tab, which contains options for many different tests and graphs; some are applicable only to more complex designs.
Distribution of dependent variable. ANOVA assumes that the distribution of the dependent variable (within groups) follows the normal distribution. For now, select the Envirnmt*Strain interaction effect in the Effect drop-down box, and click the Histograms button under Distribution of vars within groups. The Select groups dialog is displayed first, in which we can choose to view the distribution for all groups combined or for a selected group only.
For this example, click the OK button to accept the default selection of All Groups, and a histogram of the distribution will be produced.
It appears that the distribution across groups is multi-modal, that is to say, it has more than one “peak.” We could have anticipated that, given that strong main effects were found. If you want to test the normality assumption more thoroughly, you could look at the distributions within individual groups.
For this example, a potentially more serious violation of the ANOVA assumptions will be tested.
Correlation between mean and standard deviation. As mentioned in the Introductory Overview, deviation from normality is not the major “enemy”; the most likely “trap” to fall into is to base our interpretations of an effect on an “extreme” cell in the design with much greater than average variability. Put another way, when the means and the standard deviations are correlated across cells of the design, the performance (alpha error rate) of the F-test deteriorates greatly, and we may reject the null hypothesis with p<.05 when the real p-value is possibly as high as .50!
Now, look at the correlation between the 6 means and standard deviations in this design. We can elect to plot the means vs. either the standard deviations or the variances by clicking the appropriate button (Plot means vs. std. deviations or Variances, respectively) on the Assumptions tab. For this example, click the Plot means vs. std. deviations button.
Note that in the illustration above, a linear fit and regression bands have been added to the plot via the Graph Options dialog – Plot: Fitting options pane and the Plot: Regr. Bands options pane. Indeed, the means and standard deviations appear substantially correlated in this design. If an important decision were riding on this study, we would be well advised to double-check the significant main effects pattern by using, for example, a nonparametric procedure (see the Nonparametrics module) that depends not on raw scores (and variances) but on ranks. In any event, we should view these results with caution.
Homogeneity of variances. Now, look at the homogeneity of variance tests. On the Assumptions tab, various tests are available in the Homogeneity of variances/covariances group. You can try a univariate test (Cochran C, Hartley, Bartlett) to compute the standard homogeneity of variances tests, or Levene’s test, but neither will yield statistically significant results. Shown below is the Levene’s Test for Homogeneity of Variances spreadsheet.
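Both diagnostics are easy to reproduce in a few lines. The sketch below reuses the hypothetical error counts from the factorial sketch above (not the actual Rats.sta values) to compute each cell's mean and standard deviation, their correlation, and Levene's test across the six cells.

    import pandas as pd
    from scipy import stats

    # Hypothetical cell data, as in the earlier factorial sketch
    data = pd.DataFrame({
        "Envirnmt": ["free"] * 12 + ["restricted"] * 12,
        "Strain": (["bright"] * 4 + ["mixed"] * 4 + ["dull"] * 4) * 2,
        "Errors": [26, 14, 41, 16, 41, 82, 26, 86, 36, 87, 39, 99,
                   51, 35, 96, 36, 39, 114, 104, 92, 42, 133, 92, 124],
    })

    # Cell means and standard deviations, and their correlation
    cells = data.groupby(["Envirnmt", "Strain"])["Errors"].agg(["mean", "std"])
    print(cells)
    print("r(mean, sd) =", round(cells["mean"].corr(cells["std"]), 2))

    # Levene's test for homogeneity of variances across the six cells
    groups = [g.to_numpy()
              for _, g in data.groupby(["Envirnmt", "Strain"])["Errors"]]
    W, p = stats.levene(*groups, center="mean")
    print(f"Levene: W = {W:.3f}, p = {p:.4f}")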
Summary. Besides illustrating the major functional aspects of the ANOVA/MANOVA module, this analysis has demonstrated how important it is to be able to graph data easily (e.g., to produce the scatterplot of means vs. standard deviations). Had we relied on nothing else but the F-tests of significance and the standard tests of homogeneity of variances, we would not have caught the potentially serious violation of assumptions that was detected in the scatterplot of means vs. standard deviations. As it stands, we would probably conclude that environment and genetic factors (Strain) both have an (additive) effect on performance in the T-maze. However, the data should be further analyzed using nonparametric methods to ensure that the statistical significance (p) values from the ANOVA are not inflated.
Example 4: A 2-Level Between-Group x 4-Level Within-Subject Repeated Measures Design
Overview. This example demonstrates how to set up a repeated measures design. The use of the post-hoc testing facilities will be demonstrated, and a graphical summary of the results will be produced. Moreover, the univariate and multivariate tests will be computed.
Research Problem
This example is based on a (fictitious) data set reported in Winer, Brown, and Michels (1991, Table 7.7). Suppose we are interested in learning how different factors affect people’s ability to perform a fine-tuning task. For example, operators of complex industrial machinery constantly need to read (and process) various gauges and adjust machines (dials) accordingly. In this (fictitious) study, two methods for calibrating dials were examined, and each subject was tested with 4 different shapes of dials.
The resulting design is a 2 (Factor A: Method of calibration; with 2 levels) by 4 (Factor B: Four different shapes of dials) analysis of variance. The last factor is a within-subject or repeated measures factor because it represents repeated measurements on the same subjects; the first factor is a between-groups factor because subjects will be randomly assigned to work under one or the other Method condition.
Data file. The setup of a data file for repeated measures analysis is straightforward: The between-groups factor (A: Method of calibration) can be specified by setting up a variable containing the codes that uniquely identify to which experimental condition each subject belongs. Each repeated measurement is then put into a different variable. Shown below is an illustration of the data file Accuracy.sta.
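A small pandas sketch makes that layout explicit; the scores below are hypothetical values standing in for the Accuracy.sta data.

    import pandas as pd

    # One row per subject; A codes the calibration method (1 or 2),
    # B1..B4 hold the four repeated accuracy scores (hypothetical)
    accuracy = pd.DataFrame({
        "Subject": [1, 2, 3, 4, 5, 6],
        "A":       [1, 1, 1, 2, 2, 2],
        "B1":      [0, 3, 4, 4, 5, 7],
        "B2":      [0, 1, 3, 2, 4, 5],
        "B3":      [5, 5, 6, 7, 6, 8],
        "B4":      [3, 4, 2, 8, 9, 9],
    })
    print(accuracy)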
Specifying the Design. Open the Accuracy.sta data set, and start the General ANOVA/MANOVA analysis.
Following are instructions to do this from the ribbon bar and from the classic menus.
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples to display the Open a STATISTICA Data File dialog box. Double-click the Datasets folder, and then open the Accuracy.sta data set.
Next, select the Statistics tab. In the Base group, click ANOVA to display the General ANOVA/MANOVA Startup Panel.
Classic menus. Open the data file by selecting Open Examples from the File menu to display the Open a STATISTICA Data File dialog. The data file is located in the Datasets folder.
Then, from the Statistics menu, select ANOVA to display the General ANOVA/MANOVA Startup Panel.
In the Startup Panel, select Repeated measures ANOVA as the Type of analysis and Quick specs dialog as the Specification method, and then click the OK button (see the Introductory Overview for different methods of specifying designs). In the ANOVA/MANOVA Repeated Measures dialog, click the Variables button to display the standard variable selection dialog. In the Dependent variable list select B1 through B4; in the Categorical predictors (factors) list, select A. Click the OK button. Next, click the Factor codes button to display the Select codes for indep. vars (factors) dialog. Click the All button to select the codes (1 and 2) for the independent variable, and click the OK button to close this dialog and return to the ANOVA/MANOVA Repeated Measures dialog.
Specifying the repeated measures factors. Next click the Within effects button to display the Specify within-subjects factor dialog. Name the repeated measures factor B (in the Factor Name box) and specify 4 as the No. of levels. When reading the data, STATISTICA will go through the list of dependent variables and assign them to represent the consecutive levels of the repeated measures factor. See General ANOVA/MANOVA and GLM Notes – Specifying Within-Subjects Univariate and Multivariate Designs for more information on how to specify repeated measures factors.
Now click the OK button to close this dialog, and click the OK button in the ANOVA/MANOVA Repeated Measures dialog to run the analysis and display the ANOVA Results dialog.
Reviewing Results. The Results dialog contains options for examining the results of the experiment in great detail.
Let’s first look at the summary table of All effects/Graphs (click the All effects/Graphs button on the Quick tab).
Select Effect B*A as shown above (even though this effect is not statistically significant), and click the OK button; also click OK in response to the prompt about the assignment of factors to aspects of the graph (i.e., accept the defaults in the Arrangement of Factors dialog).
It is apparent that the pattern of means across the levels of the repeated measures factor B is approximately the same in the two conditions A1 and A2. However, it appears that there is a particularly strong difference between the two methods for dial B4, where the confidence bars for the means do not overlap.
Planned Comparison. Next, let’s examine the differences between the means for B4. Click the Comps (comparisons) tab, and then click the Contrasts for LS means button to specify the contrasts for least squares means. The so-called least squares means represent the best estimates of the population means (μ) given the current model, and hence STATISTICA performs the planned comparison contrasts on the least squares means. In this case the distinction matters little, because this is a complete design in which the least squares means are usually identical to the observed means.
We are interested in comparing method A1 with method A2 for dial B4 only. In the Specify Contrasts for this Factor dialog for factor A, set the contrast coefficients as shown below:
Click the OK button, and in the larger Specify Contrasts for this Factor dialog (for the repeated measures factor), set all coefficients to 0 (to ignore the respective means in the comparison) except for B4.
Refer to General ANOVA/MANOVA and GLM Notes – Specifying Univariate and Multivariate Between-Groups Designs for additional details on the logic of testing planned comparisons.
Now click the OK button, and click the Compute button on the Comps tab. Here are the results.
It appears that, as was evident in the plot of means earlier, the two means are significantly different from each other.
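Conceptually, this particular contrast (A1 vs. A2 on dial B4 only) amounts to comparing the B4 scores of the two method groups. The simplified sketch below conveys the idea with a plain two-sample t-test on hypothetical B4 scores; STATISTICA's own test is built from the least squares means and the appropriate model error term, so the numbers will not match exactly.

    import numpy as np
    from scipy import stats

    # Hypothetical B4 accuracy scores, one array per method group
    b4_a1 = np.array([1, 2, 2, 3, 1])
    b4_a2 = np.array([5, 6, 4, 6, 5])

    t, p = stats.ttest_ind(b4_a1, b4_a2)
    print(f"A1 vs A2 at B4: t = {t:.3f}, p = {p:.4f}")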
Post-Hoc Testing. Since we did not have a priori hypotheses about the pattern of means in this experiment, the planned contrast above, chosen only after examining the pattern of means, is not “fair.” As described in the Introductory Overview, the planned comparison method capitalizes on chance when you compare only those means that happen to be most different (here, among 2 x 4 = 8 means).
To compute post-hoc tests, click the More results button to display the larger and more comprehensive Results dialog. Click on the Post-hoc tab, select effect B*A (i.e., the B by A interaction effect) in the Effect box, select the Significant differences option button as the display format in the Display group, and then click the Bonferroni button.
As you can see, using this more conservative method for testing the statistical significance of differences between means, the B4 means for the two levels of A are not reliably different from each other.
The post-hoc tests are further explained in the Post-hoc tests in GLM, GRM, and ANOVA topic. Note also that when testing means in an interaction effect of between-group and within-subject (repeated measures) factors, there are several ways (options in STATISTICA) to estimate the proper error term for the comparison. These issues are discussed in Error Term for Post-hoc Tests in GLM, GRM, and ANOVA; see also Winer, Brown, and Michels (1991, pp. 529-531) for a discussion of the Pooled MS reported in this results spreadsheet.
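The Bonferroni logic itself is simple enough to sketch: each raw p-value is multiplied by the number of pairwise comparisons (here, among 2 x 4 = 8 cell means) and then judged against alpha. The raw p-value below is a hypothetical placeholder.

    from math import comb

    raw_p = 0.004        # hypothetical unadjusted p for one comparison
    m = comb(8, 2)       # 28 pairwise comparisons among 8 cell means
    adj_p = min(1.0, raw_p * m)
    print(f"{m} comparisons: Bonferroni-adjusted p = {adj_p:.3f}")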
Tests for the B main effect. Winer, Brown, and Michels (1991, Table 7.10) summarize the results of using the Newman-Keuls procedure for testing the differences in the B main effect. To compute those tests, on the Post-hoc tab, select as the Effect the B main effect, then select the Homogeneous groups option button under Display, and then click the Newman-Keuls button.
In this table, the means are sorted from smallest to largest, and means that are not significantly different from each other have four “stars” (*) in the same column (i.e., they form a homogeneous group of means); all means that do not share stars in the same column are significantly different from each other. Thus, and as discussed in Winer, Brown, and Michels (1991, p. 528), the only significant differences are those between B1 and B2 on the one hand and B3 and B4 on the other; B1 does not differ significantly from B2, nor B3 from B4.
Multivariate Approach. In the Introductory Overview and Notes, the special assumptions of univariate repeated measures ANOVA were discussed. In some scientific disciplines, the multivariate approach to repeated measures ANOVA with more than two levels has quickly become the only accepted way of analyzing these types of designs. This is because the multivariate approach does not rest on the assumption of sphericity or compound symmetry (see Assumptions – Sphericity and Compound Symmetry).
In short, univariate repeated measures ANOVA assumes that the changes across levels are uncorrelated across subjects. This assumption is highly suspect in most cases. In the present example, it is quite conceivable that subjects who improved a lot from time (dial) 1 to time (dial) 2 reached a ceiling in their accuracy, and improved less from time (dial) 2 to time (dial) 3 or 4. Given the suspicion that the sphericity assumption for univariate ANOVA has been violated, look at the multivariate statistics.
On the Summary tab, select all types of Multivariate tests (select all of the check boxes under Multivariate tests), and then click the Multiv. tests button under Within effects.
In this case, the same effect (B) is still statistically significant. Note that you can also apply the Greenhouse-Geisser and Huynh-Feldt corrections in this case without changing this pattern of results (see also Summary Results for Within Effects in GLM and ANOVA for a discussion of these tests).
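The Greenhouse-Geisser correction mentioned above rescales the univariate degrees of freedom by an epsilon estimated from the covariance matrix of the repeated measures. Here is a self-contained sketch of that estimate, using hypothetical scores rather than the Accuracy.sta data: epsilon is computed from the eigenvalues of the double-centered covariance matrix of the k repeated measures, and equals 1.0 when sphericity holds exactly.

    import numpy as np

    # Rows = subjects, columns = dials B1..B4 (hypothetical scores)
    scores = np.array([
        [1, 3, 4, 6],
        [2, 4, 4, 7],
        [1, 2, 3, 5],
        [3, 3, 5, 8],
        [2, 5, 6, 7],
    ])
    k = scores.shape[1]
    S = np.cov(scores, rowvar=False)         # sample covariance matrix
    C = np.eye(k) - np.ones((k, k)) / k      # centering matrix
    Sc = C @ S @ C                           # double-centered covariance
    eig = np.linalg.eigvalsh(Sc)
    # Greenhouse-Geisser epsilon: (sum of eigenvalues)^2 over
    # (k-1) times the sum of squared eigenvalues
    eps = eig.sum() ** 2 / ((k - 1) * (eig ** 2).sum())
    print(f"Greenhouse-Geisser epsilon = {eps:.3f}")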
Summary. To summarize, these analyses suggest that both factors A (Method) and B (Dials) significantly contributed to subjects’ accuracy. No evidence was found for any interaction between the two factors.