Many statistical applications in psychology, social science, business administration, and the natural sciences involve several groups.
For example, an environmentalist is interested in knowing if the average amount of pollution varies in several bodies of water.
A sociologist is interested in knowing if the amount of income a person earns varies according to his or her upbringing.
A consumer looking for a new car might compare the average gas mileage of several models.
For hypothesis tests comparing averages among more than two groups, statisticians have developed a method called “Analysis of Variance” (abbreviated ANOVA).
In this chapter, you will study the simplest form of ANOVA, called single-factor or one-way ANOVA. You will also study the F distribution, which is used for one-way ANOVA, and the test of two variances.
This chapter gives only a brief overview of one-way ANOVA. You will study this topic in much greater detail in future statistics courses. One-way ANOVA, as it is presented here, relies heavily on a calculator or computer.
The purpose of a one-way ANOVA test is to determine whether a statistically significant difference exists among several group means.
The test uses variances to help determine whether the means are equal.
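The idea that variances can reveal differences in means can be sketched in code. The following Python example is a minimal illustration, not the calculator procedure this chapter relies on. It assumes the standard F ratio (variation between the group means divided by variation within the groups); the function name one_way_f and the sample values are made up for illustration.

```python
from statistics import mean

def one_way_f(groups):
    """Return the F ratio for a list of samples (one list per group)."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation between groups: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within groups: distance of each value from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)    # between-group variance estimate
    ms_within = ss_within / (n - k)      # within-group variance estimate
    return ms_between / ms_within

# Hypothetical data, for illustration only.
group_a = [4, 5, 6]
group_b = [7, 8, 9]
group_c = [1, 2, 3]

f_stat = one_way_f([group_a, group_b, group_c])
print(f_stat)
```

For these made-up samples the ratio is 27: the group means (5, 8, and 2) are far apart compared with the spread inside each group, which is the kind of evidence the test weighs against equal means.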
In order to perform a one-way ANOVA test, there are five basic assumptions to be fulfilled:
1. Each population from which a sample is taken is assumed to be normal.
2. All samples are randomly selected and independent.
3. The populations are assumed to have equal standard deviations (or variances).
4. The factor is a categorical variable.
5. The response is a numerical variable.
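One informal way to look at assumption 3 is to compare the sample standard deviations of the groups. The sketch below is only an illustration: the helper name sd_ratio and the data are hypothetical, and the rule of thumb mentioned in the comments (largest sample standard deviation no more than about twice the smallest) is an outside convention, not something stated in this chapter.

```python
from statistics import stdev

def sd_ratio(groups):
    """Return the largest sample standard deviation divided by the smallest."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)

# Hypothetical samples, for illustration only.
sample_1 = [4, 5, 6]   # sample standard deviation 1
sample_2 = [2, 5, 8]   # sample standard deviation 3

ratio = sd_ratio([sample_1, sample_2])
print(ratio)

# Assumed rule of thumb (not from this chapter): a ratio well above 2
# suggests that the equal-standard-deviation assumption deserves a closer look.
needs_closer_look = ratio > 2
```

A check like this does not prove or disprove the assumption; it only flags samples whose spreads look very different.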
The Null and Alternative Hypotheses
The null hypothesis is simply that all the group population means are the same.
The alternative hypothesis is that at least one pair of means is different. For example, if there are k groups:
H0: μ1 = μ2 = μ3 = … = μk
Ha: At least two of the group means μ1, μ2, μ3, …, μk are not equal.
The graphs help in understanding the hypothesis test. Each graph is a set of box plots representing the distribution of values, with each group mean indicated by a horizontal line through the box.
The first graph (red box plots) shows the case H0: μ1 = μ2 = μ3. If the null hypothesis is true, the three populations have the same distribution.
The variance of the combined data is approximately the same as the variance of each of the populations.
If the null hypothesis is false, then the variance of the combined data is larger, which is caused by the different means, as shown in the second graph (green box plots).
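The effect of unequal means on the combined variance can be checked numerically. The following Python sketch uses made-up samples (not data from this chapter) and a hypothetical helper named combined_variance. In both sets each group has a sample variance of 1; only the group means differ.

```python
from statistics import variance

def combined_variance(groups):
    """Pool all groups into one data set and return its sample variance."""
    pooled = [x for g in groups for x in g]
    return variance(pooled)

# Hypothetical samples. Every group has sample variance 1.
equal_means = [[4, 5, 6], [4, 5, 6], [4, 5, 6]]      # all group means are 5
unequal_means = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]    # group means are 5, 8, 2

var_equal = combined_variance(equal_means)
var_unequal = combined_variance(unequal_means)
print(var_equal, var_unequal)
```

With equal means the combined variance (0.75) stays close to the variance of each group, while with unequal means it grows to 7.5, because the differences among the means add to the overall spread.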