Seminar 5

Analysis of Variance (ANOVA)

One-Way ANOVA: Theory, Assumptions, and Interpretation¶


1. Motivation¶

In many applications we want to compare more than two population means.

Examples:

  • mean income across several regions
  • average exam score across multiple teaching methods
  • mean response time for different algorithms
  • effect of different treatments in an experiment

A naive approach would be to perform many pairwise $t$-tests.
This is incorrect, because it inflates the Type I error rate.

ANOVA provides a single global test for comparing multiple means.


2. Statistical Question¶

Are all group means equal, or does at least one group differ?

ANOVA tests equality of means, not variances (despite the name).


3. One-Way ANOVA Model¶

3.1 Data structure¶

Suppose we have $k$ groups.

Group $i$ has observations: $$ X_{i1}, X_{i2}, \dots, X_{in_i}, \quad i = 1,\dots,k. $$

Total sample size: $$ N = \sum_{i=1}^{k} n_i. $$


3.2 Model assumption¶

The one-way ANOVA model is: $$ X_{ij} = \mu_i + \varepsilon_{ij}, $$ where:

  • $\mu_i$ = mean of group $i$
  • $\varepsilon_{ij}$ = random error

Assumptions on errors: $$ \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), $$ independently for all $i,j$.

Equivalently: $$ X_{ij} \sim \mathcal{N}(\mu_i, \sigma^2). $$


4. Hypotheses¶

Null hypothesis¶

$$ H_0: \mu_1 = \mu_2 = \dots = \mu_k $$

Alternative hypothesis¶

$$ H_1: \text{At least one } \mu_i \text{ differs} $$

Important:

  • ANOVA does not tell which means differ
  • it only tests whether any difference exists

5. Key Idea Behind ANOVA¶

ANOVA is based on variance decomposition.

Total variability in the data can be split into:

  1. variability between groups
  2. variability within groups

If group means are truly equal, between-group variability should be small relative to within-group variability.


6. Sample Means and Grand Mean¶

Group means: $$ \bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij} $$

Grand mean: $$ \bar{X} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij} $$


7. Decomposition of Sums of Squares (Core Theory)¶

7.1 Total Sum of Squares (SST)¶

Measures total variability: $$ \text{SST} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X})^2 $$


7.2 Between-Group Sum of Squares (SSB)¶

Measures variability due to differences between group means: $$ \text{SSB} = \sum_{i=1}^{k} n_i(\bar{X}_i - \bar{X})^2 $$


7.3 Within-Group Sum of Squares (SSW)¶

Measures variability within groups: $$ \text{SSW} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 $$


7.4 Fundamental identity¶

$$ \text{SST} = \text{SSB} + \text{SSW} $$

This decomposition is exact (not approximate).
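The identity can be checked numerically. A minimal sketch (the three groups below are made-up illustration data):

```python
import numpy as np

# Three illustrative groups (values are made up for demonstration)
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0, 10.0]),
          np.array([2.0, 3.0, 4.0])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

# Between-group, within-group, and total sums of squares
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
sst = ((all_obs - grand_mean) ** 2).sum()

print(ssb + ssw, sst)
assert np.isclose(ssb + ssw, sst)  # SST = SSB + SSW, exactly
```

The identity holds for any data, not just normal samples; it is pure algebra.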


8. Degrees of Freedom¶

Total¶

$$ \text{df}_{\text{total}} = N - 1 $$

Between groups¶

$$ \text{df}_{\text{between}} = k - 1 $$

Within groups¶

$$ \text{df}_{\text{within}} = N - k $$

And: $$ (N - 1) = (k - 1) + (N - k) $$


9. Mean Squares¶

To compare variances, sums of squares are normalized by degrees of freedom.

Mean square between¶

$$ \text{MSB} = \frac{\text{SSB}}{k - 1} $$

Mean square within¶

$$ \text{MSW} = \frac{\text{SSW}}{N - k} $$

Interpretation:

  • MSW estimates the common variance $\sigma^2$
  • MSB estimates $\sigma^2$ plus potential group effects

10. The F Statistic¶

The ANOVA test statistic is: $$ F_{\text{obs}} = \frac{\text{MSB}}{\text{MSW}} $$

Under $H_0$: $$ F_{\text{obs}} \sim F(k-1, N-k) $$


11. Why the F Distribution Appears¶

Key theoretical result:

  • $\text{SSB}/\sigma^2 \sim \chi^2(k-1)$
  • $\text{SSW}/\sigma^2 \sim \chi^2(N-k)$
  • SSB and SSW are independent

Therefore: $$ \frac{(\text{SSB}/(k-1))}{(\text{SSW}/(N-k))} \sim F(k-1, N-k) $$


12. Decision Rule¶

Given significance level $\alpha$:

  • Reject $H_0$ if: $$ F_{\text{obs}} > F_{1-\alpha}(k-1, N-k) $$

  • Equivalently, reject if: $$ \text{p-value} < \alpha $$
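Both forms of the rule are easy to evaluate with `scipy.stats.f`; the numbers below ($k = 3$, $N = 21$, $F_{\text{obs}} = 4.5$) are hypothetical:

```python
from scipy import stats

# Hypothetical numbers: k = 3 groups, N = 21 observations, observed F = 4.5
k, N, f_obs, alpha = 3, 21, 4.5, 0.05

f_crit = stats.f.ppf(1 - alpha, k - 1, N - k)  # F_{1-alpha}(k-1, N-k)
p_value = stats.f.sf(f_obs, k - 1, N - k)      # P(F > f_obs) under H0

print(f"critical value = {f_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if f_obs > f_crit else "fail to reject H0")
```

The two conditions always agree: $F_{\text{obs}} > F_{1-\alpha}$ exactly when the p-value is below $\alpha$.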


13. ANOVA Table¶

| Source         | Sum of Squares | df    | Mean Square        | F       |
|----------------|----------------|-------|--------------------|---------|
| Between groups | SSB            | k − 1 | MSB = SSB/(k − 1)  | MSB/MSW |
| Within groups  | SSW            | N − k | MSW = SSW/(N − k)  |         |
| Total          | SST            | N − 1 |                    |         |

14. Assumptions of One-Way ANOVA¶

  1. Independence of observations
  2. Normality within each group
  3. Homogeneity of variances: $$ \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 $$

ANOVA is:

  • fairly robust to mild non-normality
  • not robust to strong variance heterogeneity (especially with unbalanced $n_i$)

15. What ANOVA Does and Does Not Do¶

ANOVA tests:¶

  • existence of any difference among means

ANOVA does not:¶

  • identify which groups differ
  • quantify effect size (by default)
  • establish causality

16. Relationship to t-Test¶

Special case:

  • One-way ANOVA with $k=2$ groups

Then: $$ F = t^2 $$

ANOVA generalizes the two-sample $t$-test.
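The relationship $F = t^2$ can be verified numerically; the two samples below are simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=12)  # two simulated samples
b = rng.normal(11.0, 2.0, size=15)

f_stat, p_anova = stats.f_oneway(a, b)
t_stat, p_ttest = stats.ttest_ind(a, b, equal_var=True)  # pooled two-sample t

assert np.isclose(f_stat, t_stat ** 2)  # F = t^2
assert np.isclose(p_anova, p_ttest)     # identical p-values
print(f"F = {f_stat:.4f}, t^2 = {t_stat**2:.4f}")
```

Note that the equivalence requires the pooled (equal-variance) $t$-test, which shares ANOVA's homogeneity assumption.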


17. Practical Remarks¶

  • ANOVA answers "is there any effect?"
  • Always combine with:
    • diagnostic plots
    • effect sizes
    • post-hoc tests
  • For unequal variances, consider:
    • Welch ANOVA

18. Summary¶

  • ANOVA compares multiple means simultaneously
  • Based on variance decomposition
  • Test statistic: $$ F = \frac{\text{MSB}}{\text{MSW}} $$
  • Distribution: $$ F \sim F(k-1, N-k) $$
  • Requires independence, normality, equal variances

Graphical intuition for ANOVA¶

Before introducing the formal F-test, it is useful to develop a geometric and visual intuition.

In all examples below:

  • each column represents one group,
  • black squares are individual observations,
  • circles indicate group means.

The question ANOVA answers is not whether the means look different, but whether the between-group variability is large relative to the within-group variability.


(Figure: three illustrative configurations of group data, labeled “Probably equal”, “Almost surely different”, and “Ambiguous”.)

Implementing One-way ANOVA¶

Problem: One-Way ANOVA (Faculty Ages by Rank)¶

A researcher claims that there is a difference in the average age of assistant professors, associate professors, and full professors at her university.

Faculty members are selected randomly, and their ages are recorded.
Assume that faculty ages are normally distributed.

Test the researcher’s claim at the $\alpha = 0.01$ significance level.

The observed data are:

| Rank                | Ages                       |
|---------------------|----------------------------|
| Assistant Professor | 28, 32, 36, 42, 50, 33, 38 |
| Associate Professor | 44, 61, 52, 54, 62, 45, 46 |
| Professor           | 54, 56, 55, 65, 52, 50, 46 |
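One way to carry out the test is with `scipy.stats.f_oneway`; this is a sketch, not the only possible implementation:

```python
from scipy import stats

assistant = [28, 32, 36, 42, 50, 33, 38]
associate = [44, 61, 52, 54, 62, 45, 46]
professor = [54, 56, 55, 65, 52, 50, 46]

f_obs, p_value = stats.f_oneway(assistant, associate, professor)
print(f"F = {f_obs:.2f}, p = {p_value:.4f}")  # F is about 12.62

alpha = 0.01
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With $k = 3$ and $N = 21$, the statistic is compared against $F(2, 18)$; the p-value here falls well below $\alpha = 0.01$, supporting the researcher's claim.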

Post Hoc Tests After One-Way ANOVA

Why Post Hoc Tests Are Needed¶

In one-way ANOVA we test the global null hypothesis

$H_0:\ \mu_1 = \mu_2 = \dots = \mu_k$

If ANOVA rejects $H_0$, we only know that at least one mean differs, but:

❌ ANOVA does not tell us which groups differ.

To identify where the differences lie, we perform post hoc multiple comparison tests.


The Multiple Comparisons Problem¶

Suppose we have $k$ groups.

  • Number of pairwise comparisons:

$m = \binom{k}{2} = \frac{k(k-1)}{2}$

If we test each comparison at level $\alpha = 0.05$, then the probability of making at least one Type I error increases rapidly.

Why We Should NOT Use Multiple Two-Sample t-Tests¶

One should never use multiple two-sample t-tests when comparing more than two groups.
Doing so inflates the Type I error rate.


Inflation of Type I Error¶

Assume we perform hypothesis tests at significance level $\alpha = 0.05$.

For one test:

  • Probability of not making a Type I error: $1 - \alpha = 0.95$
  • Probability of a Type I error: $\alpha = 0.05$

What Happens With Multiple Comparisons?¶

Suppose we perform $m$ independent comparisons.

  • Probability of no Type I errors:

    $(1 - \alpha)^m$

  • Probability of at least one Type I error (Family-Wise Error Rate):

    $\boxed{\text{FWER} = 1 - (1 - \alpha)^m}$

This probability increases rapidly as $m$ grows.
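The growth of the FWER is easy to tabulate directly from the formula:

```python
# FWER = 1 - (1 - alpha)^m for m independent tests at level alpha
alpha = 0.05
for m in (1, 2, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:2d}  FWER = {fwer:.3f}")
```

Already at $m = 10$ the family-wise error rate exceeds 0.40, even though each individual test is run at $\alpha = 0.05$.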


Concrete Examples¶

Example 1: Two Comparisons¶

Let $\alpha = 0.05$ and $m = 2$.

  • Probability of no Type I error:

    $(1 - 0.05)^2 = 0.9025$

  • Probability of at least one Type I error:

    $1 - 0.9025 = 0.0975$

So the Type I error rate is almost doubled.


Example 2: Five Groups¶

For $k = 5$ groups:

$m = \binom{5}{2} = 10$

  • Probability of at least one Type I error:

    $1 - (1 - 0.05)^{10} \approx 0.401$

➡️ 40% chance of falsely detecting a difference!


Interpretation¶

Even if all group means are truly equal, using multiple two-sample t-tests:

  • makes false discoveries very likely
  • produces misleading scientific conclusions
  • invalidates reported p-values

Why ANOVA Fixes This¶

  • One-way ANOVA performs a single global test
  • Controls the Type I error at level $\alpha$
  • Tests:

    $H_0:\ \mu_1 = \mu_2 = \dots = \mu_k$

Only after rejecting ANOVA do we proceed to post hoc tests that explicitly control the family-wise error rate.


Key Takeaway (Exam-Ready Sentence)¶

Performing multiple two-sample t-tests inflates the Type I error rate, with
$\text{FWER} = 1 - (1 - \alpha)^m$,
which is why ANOVA followed by post hoc tests must be used instead.

Types of Post Hoc Tests (Big Picture)¶

| Method          | Controls FWER | Assumptions     | Notes                           |
|-----------------|---------------|-----------------|---------------------------------|
| Bonferroni      | Yes           | Minimal         | Conservative                    |
| Holm–Bonferroni | Yes           | Minimal         | Less conservative               |
| Tukey HSD       | Yes           | Equal variances | Most common after ANOVA         |
| Scheffé         | Yes           | Very general    | Very conservative               |
| Fisher LSD      | No            | Equal variances | Only valid if ANOVA significant |

Bonferroni Correction (Core Idea)¶

Bonferroni is based on a simple inequality:

$\mathbb{P}\left(\bigcup_{i=1}^m A_i\right) \le \sum_{i=1}^m \mathbb{P}(A_i)$

To ensure:

$\text{FWER} \le \alpha$

we test each hypothesis at level:

$\boxed{\alpha_{\text{Bonf}} = \frac{\alpha}{m}}$


Bonferroni Post Hoc Test (Step by Step)¶

Let $m = \binom{k}{2}$ pairwise comparisons.

Step 1: Form pairwise hypotheses¶

For each pair $(i,j)$:

$H_0^{(ij)}:\ \mu_i = \mu_j$

vs

$H_1^{(ij)}:\ \mu_i \neq \mu_j$


Step 2: Compute test statistics¶

Typically use two-sample t-tests:

  • Pooled variance if equal variances assumed
  • Welch t-test otherwise

Using the pooled within-group variance estimate from ANOVA (Mean Square Error):

$\text{MSE} = \text{MSW}$

the Bonferroni test statistic is

$t_{ij} = \dfrac{\bar{x}_i - \bar{x}_j}{\sqrt{\text{MSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$

where:

  • $\bar{x}_i,\bar{x}_j$ are the sample means of groups $i$ and $j$
  • $n_i,n_j$ are the corresponding sample sizes
  • $\text{MSE}$ is taken from the ANOVA table

Degrees of Freedom¶

$df = N - k$

where:

  • $N$ is the total sample size
  • $k$ is the number of groups

Bonferroni Adjustment¶

If $m = \binom{k}{2}$ pairwise comparisons are performed, the Bonferroni-adjusted significance level is

$\alpha_{\text{Bonf}} = \frac{\alpha}{m}$


Decision Rule (Two-Sided)¶

Reject $H_0^{(ij)}$ if either of the following equivalent conditions holds:

$|t_{ij}| > t_{1-\alpha_{\text{Bonf}}/2,\,df}$

or

$p_{ij} < \alpha_{\text{Bonf}}$


Equivalent p-Value Formulation¶

Alternatively, define the adjusted p-value

$p^{\text{Bonf}}_{ij} = \min(m \cdot p_{ij},\ 1)$

Reject $H_0^{(ij)}$ if

$p^{\text{Bonf}}_{ij} < \alpha$


Interpretation¶

If the Bonferroni-adjusted test rejects $H_0^{(ij)}$, we conclude that the mean responses of groups $i$ and $j$ differ, while maintaining family-wise error rate control at level $\alpha$.
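The full Bonferroni procedure can be sketched as follows; the three groups are made-up illustration data, and `scipy.stats.t` supplies the $t$-distribution tail probabilities:

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Three made-up groups for illustration
groups = {"A": np.array([4.1, 5.2, 6.0, 5.5]),
          "B": np.array([7.3, 8.1, 7.9, 8.4]),
          "C": np.array([5.0, 5.8, 6.2, 5.4])}

k = len(groups)
N = sum(len(g) for g in groups.values())
df = N - k
m = k * (k - 1) // 2  # number of pairwise comparisons

# Pooled within-group variance estimate: MSE = MSW = SSW / (N - k)
mse = sum(((g - g.mean()) ** 2).sum() for g in groups.values()) / df

alpha = 0.05
for (name_i, gi), (name_j, gj) in combinations(groups.items(), 2):
    se = np.sqrt(mse * (1 / len(gi) + 1 / len(gj)))
    t = (gi.mean() - gj.mean()) / se
    p = 2 * stats.t.sf(abs(t), df)  # unadjusted two-sided p-value
    p_bonf = min(m * p, 1.0)        # Bonferroni-adjusted p-value
    verdict = "reject" if p_bonf < alpha else "fail to reject"
    print(f"{name_i} vs {name_j}: t = {t:.2f}, adj. p = {p_bonf:.4f} -> {verdict}")
```

Using the pooled MSE (rather than per-pair variances) matches the formulas above and gives every comparison the same $N - k$ degrees of freedom.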


Properties of Bonferroni¶

Advantages¶

✔ Very simple
✔ Works with any test statistic
✔ No distributional assumptions beyond the base test
✔ Valid for unbalanced designs

Disadvantages¶

❌ Conservative, especially when $m$ is large
❌ Reduced power (more Type II errors)


When to Use Bonferroni¶

Bonferroni is appropriate when:

  • Number of comparisons is small
  • Strong control of Type I error is required
  • Assumptions for Tukey HSD are doubtful
  • You want a safe default method

Comparison with Tukey HSD¶

| Aspect                  | Bonferroni | Tukey          |
|-------------------------|------------|----------------|
| Power                   | Lower      | Higher         |
| FWER control            | Guaranteed | Guaranteed     |
| Assumes equal variances | No         | Yes            |
| Uses ANOVA MSE          | Optional   | Yes            |
| Typical use             | General    | Standard ANOVA |

Summary (Exam-Ready)¶

  • ANOVA answers whether differences exist
  • Post hoc tests answer where differences exist
  • Bonferroni controls FWER by splitting $\alpha$
  • Simple, robust, but conservative
  • Often a baseline method to compare with Tukey HSD

Key sentence:
Bonferroni correction controls the family-wise error rate by testing each comparison at level $\alpha/m$.

Tukey’s HSD (Honestly Significant Difference) Test

Context: Post Hoc Testing After ANOVA¶

Recall that one-way ANOVA tests the global hypothesis

$H_0:\ \mu_1 = \mu_2 = \dots = \mu_k$

If ANOVA rejects $H_0$, we conclude that at least one mean differs, but we still do not know which pairs of means differ.

👉 Tukey’s HSD is a post hoc multiple comparison procedure designed specifically for all pairwise comparisons after ANOVA.


What Tukey’s HSD Tests¶

For every pair of groups $(i,j)$, Tukey’s HSD tests

$H_0^{(ij)}:\ \mu_i = \mu_j$

vs

$H_1^{(ij)}:\ \mu_i \neq \mu_j$

while controlling the family-wise error rate (FWER) at level $\alpha$.


Key Idea Behind Tukey’s HSD¶

Tukey’s HSD uses the studentized range distribution, which accounts for the fact that:

  • we are comparing many means simultaneously
  • the maximum difference among sample means is more variable than a single difference

Instead of adjusting $\alpha$ (like Bonferroni), Tukey adjusts the critical value.


Assumptions of Tukey’s HSD¶

Tukey’s HSD relies on the same assumptions as one-way ANOVA:

  1. Independence of observations
  2. Normality within each group
  3. Equal population variances
  4. Balanced or approximately balanced design (robust if mildly unbalanced)

If variances are unequal, Tukey’s HSD may not be valid.


Test Statistic¶

Let:

  • $\bar x_i$ = sample mean of group $i$
  • $n_i$ = sample size of group $i$
  • $MSE$ = mean square error from ANOVA (equal to MSW, i.e. SSW divided by its degrees of freedom $N - k$)
  • $k$ = number of groups

For groups with equal sample sizes $n$:

$\text{SE} = \sqrt{\frac{MSE}{n}}$

For unequal sample sizes (Tukey–Kramer):

$\text{SE}_{ij} = \sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$


Tukey HSD Test Statistic¶

The Tukey test compares the absolute mean difference to a critical threshold:

$|\bar x_i - \bar x_j|$

Reject $H_0^{(ij)}$ if:

$|\bar x_i - \bar x_j| > q_{\alpha,k,df}\cdot \text{SE}_{ij}$

where:

  • $q_{\alpha,k,df}$ is the upper $\alpha$ quantile of the studentized range distribution
  • $df = N - k$ (within-group degrees of freedom)

Studentized Range Distribution¶

The studentized range statistic is:

$q = \dfrac{\max(\bar X_1,\dots,\bar X_k) - \min(\bar X_1,\dots,\bar X_k)}{\sqrt{MSE/n}}$

where $\sqrt{MSE/n}$ is the estimated standard error of a group mean (for common group size $n$).

This distribution explicitly accounts for multiple comparisons among means.


Family-Wise Error Control¶

Tukey’s HSD guarantees:

$\mathbb{P}(\text{at least one Type I error}) \le \alpha$

for all pairwise mean comparisons.

This is exact control, not an approximation.


Tukey HSD Confidence Intervals¶

For each pair $(i,j)$, Tukey’s method produces simultaneous confidence intervals:

$(\bar x_i - \bar x_j) \ \pm\ q_{\alpha,k,df}\cdot \text{SE}_{ij}$

All intervals jointly have coverage probability at least $1-\alpha$.

If an interval does not contain 0, the corresponding means differ significantly.
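A minimal sketch of the Tukey–Kramer procedure using `scipy.stats.studentized_range` (available in SciPy 1.7+); the group data are made up for illustration:

```python
import numpy as np
from itertools import combinations
from scipy.stats import studentized_range

# Three made-up groups for illustration; Tukey-Kramer SE allows unequal n_i
groups = {"A": np.array([4.1, 5.2, 6.0, 5.5]),
          "B": np.array([7.3, 8.1, 7.9, 8.4]),
          "C": np.array([5.0, 5.8, 6.2, 5.4])}

k = len(groups)
N = sum(len(g) for g in groups.values())
df = N - k
mse = sum(((g - g.mean()) ** 2).sum() for g in groups.values()) / df

alpha = 0.05
q_crit = studentized_range.ppf(1 - alpha, k, df)  # upper-alpha quantile

for (name_i, gi), (name_j, gj) in combinations(groups.items(), 2):
    diff = gi.mean() - gj.mean()
    se = np.sqrt((mse / 2) * (1 / len(gi) + 1 / len(gj)))  # Tukey-Kramer SE
    half_width = q_crit * se
    sig = "significant" if abs(diff) > half_width else "not significant"
    print(f"{name_i} vs {name_j}: diff = {diff:.2f}, "
          f"CI = [{diff - half_width:.2f}, {diff + half_width:.2f}] -> {sig}")
```

In practice `statsmodels.stats.multicomp.pairwise_tukeyhsd` packages the same computation; the manual version above makes the role of $q$, MSE, and the Tukey–Kramer standard error explicit.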


Comparison with Bonferroni¶

| Aspect                      | Tukey HSD | Bonferroni              |
|-----------------------------|-----------|-------------------------|
| Designed for pairwise means | Yes       | No (general)            |
| Uses ANOVA MSE              | Yes       | Optional                |
| Equal variance assumption   | Yes       | No                      |
| Power                       | Higher    | Lower                   |
| FWER control                | Exact     | Upper bound             |
| Conservativeness            | Moderate  | Often very conservative |

When to Use Tukey’s HSD¶

Use Tukey’s HSD when:

✔ ANOVA is significant
✔ You want all pairwise comparisons
✔ Variances are approximately equal
✔ You want higher power than Bonferroni

Avoid Tukey’s HSD when variances differ substantially.


Interpretation Example¶

If Tukey’s HSD finds that:

  • Group A vs B: significant
  • Group A vs C: significant
  • Group B vs C: not significant

then we conclude:

Group A differs from both B and C, while B and C are statistically indistinguishable.


Common Misconceptions¶

❌ Tukey’s HSD cannot be used without ANOVA
✔ It can, but it is intended as a post hoc method

❌ Tukey tests variances
✔ Tukey compares means, not variances

❌ Tukey is always better than Bonferroni
✔ Only when assumptions hold


Exam-Ready Summary¶

  • Tukey’s HSD is a post hoc test for all pairwise mean comparisons
  • Controls family-wise error rate exactly
  • Based on the studentized range distribution
  • More powerful than Bonferroni under equal variances
  • Standard choice after one-way ANOVA

Key sentence:
Tukey’s HSD controls the family-wise error rate by using the studentized range distribution to compare all pairwise mean differences simultaneously.


Example 2¶

Problem¶

Three fuel injection systems are tested for efficiency, and the following coded data are obtained:

| System 1 | System 2 | System 3 |
|----------|----------|----------|
| 48       | 60       | 57       |
| 56       | 56       | 55       |
| 46       | 53       | 52       |
| 45       | 60       | 50       |
| 50       | 51       | 51       |

Question¶

Do the data support the hypothesis that the three fuel injection systems offer equivalent levels of efficiency?
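One possible implementation with `scipy.stats.f_oneway`; the problem does not specify a significance level, so $\alpha = 0.05$ is assumed here:

```python
from scipy import stats

system1 = [48, 56, 46, 45, 50]
system2 = [60, 56, 53, 60, 51]
system3 = [57, 55, 52, 50, 51]

f_obs, p_value = stats.f_oneway(system1, system2, system3)
print(f"F = {f_obs:.2f}, p = {p_value:.4f}")

alpha = 0.05  # assumed; the problem does not state a level
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here $k = 3$ and $N = 15$, so the statistic is referred to $F(2, 12)$; if $H_0$ is rejected, a post hoc test (e.g. Tukey's HSD) would identify which systems differ.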