Seminar 8

Mann–Whitney U Test (Wilcoxon Rank-Sum Test)

The Mann–Whitney U test is a non-parametric test for comparing two independent samples.
It assesses whether one distribution tends to produce larger values than the other and is a robust alternative to the two-sample t-test.


1. Problem setup

Let $$ X_1,\dots,X_{n_1} \sim F, \qquad Y_1,\dots,Y_{n_2} \sim G, $$ where all observations are independent.

The goal is to compare the distributions $F$ and $G$ without assuming normality.


2. Hypotheses

  • Null hypothesis $$ H_0: F = G $$

  • Alternative hypothesis $$ H_1: F \neq G $$ (or one-sided variants: $F$ stochastically dominates $G$ or vice versa)

⚠️ Important: this is not a test of equality of means in general.

Null hypothesis (Mann–Whitney U test).

Let $X$ and $Y$ be independent random variables representing observations from the two groups. The Mann–Whitney test is based on the null hypothesis $$ H_0:\; \mathbb P(X<Y) + \tfrac12\,\mathbb P(X=Y) = \tfrac12. $$

This states that a randomly chosen observation from one group is equally likely to be smaller or larger than a randomly chosen observation from the other group, with ties counted as half in each direction. Equivalently, under $H_0$ there is no systematic tendency for one distribution to produce larger values than the other. In the continuous case, where $\mathbb P(X=Y)=0$, this reduces to $\mathbb P(X<Y)=1/2$. The null hypothesis therefore concerns stochastic ordering of the two distributions, not equality of medians, except under additional assumptions such as a pure location shift.

3. Assumptions

  • Independence within and between samples
  • Continuous distributions (no ties, for exact theory)
  • Identical shapes under $H_0$ (location-shift model interpretation)

4. Test statistic

Rank-based form

  1. Pool all observations $X_1,\dots,X_{n_1},Y_1,\dots,Y_{n_2}$
  2. Rank them from smallest to largest (average ranks in case of ties)

The Mann–Whitney statistics can be written as $$ U_X = R_X - \frac{n_1(n_1+1)}{2}, \qquad U_Y = R_Y - \frac{n_2(n_2+1)}{2}, $$ where $$ R_X = \sum_{i=1}^{n_1} R(X_i), \qquad R_Y = \sum_{j=1}^{n_2} R(Y_j) $$ are the rank sums of the $X$ and $Y$ samples, respectively. Note that $U_X + U_Y = n_1 n_2$.

The test statistic used in the Mann–Whitney test is $$ U = \min(U_X, U_Y). $$

This symmetrization ensures invariance under relabeling of the two samples.

Note that tables of critical values are usually given for the two-tailed test, and rejection occurs for small values of $U$: we fail to reject $H_0$ when the $U$ statistic is greater than the critical value from the table.


Pairwise-comparison form (theoretical form)

Equivalently, $$ U_X = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \mathbf{1}\{X_i > Y_j\}, $$ with ties handled via midranks in practice (each tied pair contributing $\tfrac12$).

This representation is central for the theoretical interpretation of the test.
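The equivalence of the two forms is easy to check numerically. Below is a minimal sketch (the sample values are invented for illustration) that computes $U_X$ via the rank-sum formula, via the pairwise count, and via SciPy; recent SciPy versions report the $U$ statistic of the first argument under the same convention.

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

x = np.array([1.3, 2.7, 3.1, 4.8])        # hypothetical sample X (n1 = 4)
y = np.array([0.9, 2.2, 3.5, 5.0, 6.1])   # hypothetical sample Y (n2 = 5)
n1, n2 = len(x), len(y)

# Rank-based form: pool, rank (midranks for ties), take the X rank sum.
ranks = rankdata(np.concatenate([x, y]))
U_x = ranks[:n1].sum() - n1 * (n1 + 1) / 2

# Pairwise-comparison form: count pairs with X_i > Y_j (ties count 1/2).
U_pairs = sum(1.0 * (xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

# SciPy's statistic for its first argument follows the same convention.
U_scipy = mannwhitneyu(x, y, alternative="two-sided").statistic

print(U_x, U_pairs, U_scipy)   # 8.0 8.0 8.0
U = min(U_x, n1 * n2 - U_x)    # symmetrized statistic used with tables
```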


5. Relationship between statistics

The rank-sum statistic $R_X$ and $U_X$ are affinely related: $$ R_X = U_X + \frac{n_1(n_1+1)}{2}. $$

All formulations $(R_X, U_X, U)$ induce identical tests and p-values, differing only by centering and symmetrization.


6. Exact null distribution (finite sample)

Under $H_0$:

  • All $n_1+n_2$ ranks are fixed
  • Every allocation of ranks to the $X$ and $Y$ samples is equally likely

Thus, $U_X$ has an exact permutation distribution depending only on $(n_1,n_2)$.

Formally: $$ \mathbb{P}(U_X = u) = \frac{\#\{\text{rank allocations yielding } u\}}{\binom{n_1+n_2}{n_1}}. $$

This distribution is:

  • discrete
  • distribution-free
  • symmetric about $\frac{n_1 n_2}{2}$

Exact null distribution: numerical example (Mann–Whitney U)

Consider two samples:

  • Sample $X$ with size $n_1 = 2$
  • Sample $Y$ with size $n_2 = 3$

Under the null hypothesis $H_0$, the two samples come from the same continuous distribution.


Step 1. Fixed ranks under $H_0$

Pool all observations and assign ranks
$1,2,3,4,5$.

Under $H_0$:

  • The ranks themselves are fixed
  • Every allocation of $n_1 = 2$ ranks to sample $X$ is equally likely

Total number of allocations: $$ \binom{n_1+n_2}{n_1} = \binom{5}{2} = 10. $$

Each allocation has probability $1/10$.


Step 2. Definition of the statistic

Let $R_X$ be the sum of the ranks assigned to sample $X$.

The Mann–Whitney statistic is defined as $$ U_X = R_X - \frac{n_1(n_1+1)}{2} = R_X - 3. $$


Step 3. Enumerate all rank allocations

Applying $U_X = R_X - 3$ to each of the 10 allocations:

| Ranks assigned to $X$ | $R_X$ | $U_X$ |
|---|---|---|
| $\{1,2\}$ | 3 | 0 |
| $\{1,3\}$ | 4 | 1 |
| $\{1,4\}$ | 5 | 2 |
| $\{1,5\}$ | 6 | 3 |
| $\{2,3\}$ | 5 | 2 |
| $\{2,4\}$ | 6 | 3 |
| $\{2,5\}$ | 7 | 4 |
| $\{3,4\}$ | 7 | 4 |
| $\{3,5\}$ | 8 | 5 |
| $\{4,5\}$ | 9 | 6 |

Step 4. Exact null distribution of $U_X$

By counting how many allocations produce each value of $U_X$, we obtain:

| $u$ | Count | $\mathbb{P}(U_X = u)$ |
|---|---|---|
| 0 | 1 | 0.1 |
| 1 | 1 | 0.1 |
| 2 | 2 | 0.2 |
| 3 | 2 | 0.2 |
| 4 | 2 | 0.2 |
| 5 | 1 | 0.1 |
| 6 | 1 | 0.1 |

Formally, $$ \mathbb{P}(U_X = u) = \frac{\#\{\text{rank allocations yielding } u\}}{\binom{5}{2}}. $$
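This enumeration is easy to reproduce in code. A short sketch: list all $\binom{5}{2}$ rank allocations, tabulate the exact pmf of $U_X$, and check the mean and variance formulas from Section 8.

```python
from collections import Counter
from itertools import combinations

n1, n2 = 2, 3
N = n1 + n2
counts = Counter()
for x_ranks in combinations(range(1, N + 1), n1):   # all equally likely allocations
    counts[sum(x_ranks) - n1 * (n1 + 1) // 2] += 1  # U_X = R_X - n1(n1+1)/2

total = sum(counts.values())                        # C(5,2) = 10
pmf = {u: c / total for u, c in sorted(counts.items())}
print(pmf)  # {0: 0.1, 1: 0.1, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.1, 6: 0.1}

mean = sum(u * p for u, p in pmf.items())
var = sum((u - mean) ** 2 * p for u, p in pmf.items())
print(mean, n1 * n2 / 2)            # both 3.0
print(var, n1 * n2 * (N + 1) / 12)  # both 3.0
```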


Step 5. Symmetry

Here, $$ n_1 n_2 = 2 \cdot 3 = 6, $$ so the distribution of $U_X$ is symmetric about $$ \frac{n_1 n_2}{2} = 3. $$

Indeed, $$ \mathbb{P}(U_X = 0) = \mathbb{P}(U_X = 6), \quad \mathbb{P}(U_X = 1) = \mathbb{P}(U_X = 5), \quad \mathbb{P}(U_X = 2) = \mathbb{P}(U_X = 4). $$

Moreover, $$ U_Y = n_1 n_2 - U_X = 6 - U_X, $$ so the two Mann–Whitney statistics are complementary for each allocation.


Conclusion

This example shows explicitly that under $H_0$:

  • $U_X$ has a finite-sample exact permutation distribution
  • The distribution depends only on $(n_1, n_2)$
  • No assumptions on the underlying population distribution are required

7. Support of the distribution

The minimum and maximum possible values of $U_X$ are: $$ U_{X,\min} = 0, \qquad U_{X,\max} = n_1 n_2. $$

Thus: $$ U_X \in \{0,1,\dots,n_1 n_2\}. $$

Each value corresponds to the number of $(X_i,Y_j)$ pairs such that $X_i > Y_j$.


8. Mean and variance under $H_0$

Under $H_0$: $$ \mathbb{E}[U_X] = \frac{n_1 n_2}{2}, $$ $$ \mathrm{Var}(U_X) = \frac{n_1 n_2 (n_1+n_2+1)}{12}. $$

By symmetry, $U_Y = n_1 n_2 - U_X$ has the same null distribution as $U_X$; the null distribution of $U=\min(U_X,U_Y)$ follows directly.


9. Why the test works (core theoretical reason)

Define the population parameter $$ \theta = \mathbb{P}(X > Y) + \tfrac12\, \mathbb{P}(X = Y). $$

Then: $$ \mathbb{E}\!\left[\frac{U_X}{n_1 n_2}\right] = \theta. $$

Under $H_0: F = G$, we have $$ \theta = \tfrac12. $$

Thus, the Mann–Whitney test is a test of $$ H_0:\; \theta = \tfrac12, $$ corresponding to absence of stochastic dominance.
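A quick Monte Carlo illustration (the distributions are chosen arbitrarily): $U_X/(n_1 n_2)$ estimates $\theta$, and under a location shift $\theta$ moves away from $\tfrac12$.

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm

rng = np.random.default_rng(0)
n1, n2, reps = 30, 40, 2000
est = []
for _ in range(reps):
    x = rng.normal(0.5, 1.0, n1)   # X ~ N(0.5, 1): shifted up, so theta > 1/2
    y = rng.normal(0.0, 1.0, n2)   # Y ~ N(0, 1)
    U_x = mannwhitneyu(x, y, alternative="two-sided").statistic
    est.append(U_x / (n1 * n2))

# Theoretical value: P(X > Y) = Phi(0.5 / sqrt(2)) ~ 0.638
print(np.mean(est), norm.cdf(0.5 / np.sqrt(2)))
```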


10. U-statistic structure

The normalized statistic $U_X/(n_1 n_2)$ is a two-sample U-statistic with kernel $$ h(x,y) = \mathbf{1}\{x > y\}. $$

By Hoeffding’s theory of U-statistics:

  • $U_X/(n_1 n_2)$ is unbiased for $\theta$
  • $U_X/(n_1 n_2)$ is consistent
  • $U_X$ is asymptotically normal after centering and scaling

11. Asymptotic null distribution (CLT)

As $n_1,n_2 \to \infty$: $$ \frac{U_X - \mathbb{E}[U_X]}{\sqrt{\mathrm{Var}(U_X)}} \;\xrightarrow{d}\; N(0,1). $$

This follows from the Hoeffding decomposition and the CLT for U-statistics.
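A minimal numerical check of the approximation (samples drawn from hypothetical normal populations): standardize $U_X$ with the null mean and variance from Section 8 and compare the asymptotic two-sided p-value with SciPy's exact one.

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 15)
y = rng.normal(0.8, 1.0, 18)
n1, n2 = len(x), len(y)

res = mannwhitneyu(x, y, alternative="two-sided", method="exact")
mu = n1 * n2 / 2
sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (res.statistic - mu) / sigma
p_normal = 2 * norm.sf(abs(z))

print(p_normal, res.pvalue)  # close even at these moderate sample sizes
```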

Dealing with Ties in the Mann–Whitney U Test

It is possible that two or more observations take the same value. In this case the Mann–Whitney $U$ statistic can still be computed by counting each tied $(X_i, Y_j)$ pair as $\tfrac12$ (equivalently, by assigning midranks).

However, when ties are present, the normal approximation to the distribution of $U$ must be used with a correction to the standard deviation. The adjusted standard deviation of $U$ is

$$ \sigma_U = \sqrt{ \frac{n_x n_y}{N (N - 1)} \left[ \frac{N^3 - N}{12} - \sum_{j=1}^{g} \frac{t_j^3 - t_j}{12} \right] }, $$

where

  • $N = n_x + n_y$,
  • $g$ is the number of groups of tied observations,
  • $t_j$ is the number of tied ranks in group $j$.
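A sketch of this correction (the integer-valued samples are invented so that ties occur); without ties the expression reduces to $\sqrt{n_x n_y (N+1)/12}$.

```python
import numpy as np
from collections import Counter

x = np.array([3, 4, 2, 6, 2, 5])     # hypothetical data with ties
y = np.array([9, 7, 5, 10, 6, 8])
n_x, n_y = len(x), len(y)
N = n_x + n_y

# Sizes t_j of the tie groups in the pooled sample
tie_sizes = [t for t in Counter(np.concatenate([x, y])).values() if t > 1]
tie_term = sum(t**3 - t for t in tie_sizes) / 12

sigma_U = np.sqrt(n_x * n_y / (N * (N - 1)) * ((N**3 - N) / 12 - tie_term))
sigma_no_ties = np.sqrt(n_x * n_y * (N + 1) / 12)
print(sigma_U, sigma_no_ties)        # the correction shrinks sigma_U
```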

12. Interpretation

  • Tests stochastic dominance
  • Detects location shifts when distributional shapes coincide
  • Robust to non-normality
  • Sensitive to changes in distribution shape

13. When the test may mislead

  • Strongly different shapes
  • Heteroscedasticity
  • Many ties (requires correction)
  • Dependence between observations

In such cases, rejection does not necessarily correspond to a pure location shift.


14. Relation to the t-test

  • If $F$ and $G$ are normal with equal variances, the t-test is more powerful
  • Under heavy tails or outliers, Mann–Whitney is more robust
  • Mann–Whitney does not estimate a mean difference

15. One-sentence summary (exam-perfect)

The Mann–Whitney test is an exact, distribution-free U-statistic test of stochastic dominance whose null distribution arises from random rank allocations and converges asymptotically to a normal distribution.


16. One-line intuition

The test counts how often observations from one sample are smaller than those from the other.


Simultaneous Confidence Intervals in One-Way ANOVA - Bonferroni and Tukey (HSD / Tukey–Kramer) Methods


1. Motivation

In one-way ANOVA we test the global null hypothesis $$ H_0:\quad \mu_1=\mu_2=\cdots=\mu_k . $$

If this hypothesis is rejected, a natural next question is:

Which means differ, and by how much?

Using ordinary (single-parameter) confidence intervals for many comparisons leads to inflated Type I error, because several intervals are examined simultaneously.

Goal: Construct confidence intervals that hold simultaneously for a family of parameters with overall confidence level $1-\alpha$.


2. Model and notation

We consider the classical one-way ANOVA model $$ X_{ij} = \mu_i + \varepsilon_{ij}, \qquad i=1,\dots,k,\quad j=1,\dots,n_i, $$ where

  • $\varepsilon_{ij} \stackrel{\text{i.i.d.}}{\sim} N(0,\sigma^2)$,
  • samples are independent,
  • group variances are equal.

Define: $$ \bar X_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \qquad N=\sum_{i=1}^k n_i . $$

The Mean Square Error (MSE) is $$ \text{MSE} = \frac{1}{N-k} \sum_{i=1}^k\sum_{j=1}^{n_i} (X_{ij}-\bar X_i)^2 , $$ with $\nu=N-k$ degrees of freedom.


3. What does “simultaneous” mean?

Let $\theta_1,\dots,\theta_m$ be parameters of interest (e.g. mean differences).

Intervals $I_1,\dots,I_m$ are simultaneous confidence intervals with level $1-\alpha$ if $$ \mathbb P\big(\theta_1\in I_1,\dots,\theta_m\in I_m\big)\ge 1-\alpha . $$

This is stronger than marginal coverage $$ \mathbb P(\theta_\ell\in I_\ell)\ge 1-\alpha \quad \text{for each } \ell . $$


4. The multiple comparison problem

If we construct $m$ ordinary $1-\alpha$ confidence intervals and they are independent, then $$ \mathbb P(\text{all correct}) = (1-\alpha)^m , $$ which can be very small for large $m$; see the simulation below.

Simultaneous methods control the family-wise error rate (FWER): $$ \mathbb P(\text{at least one false statement}) \le \alpha . $$
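A small simulation of this effect (standard normal data, true means all $0$): each $t$-interval covers its own mean about $95\%$ of the time, yet the probability that all $m=10$ intervals cover simultaneously is close to $0.95^{10}\approx 0.60$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
m, n, reps, alpha = 10, 20, 5000, 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)

all_cover = 0
for _ in range(reps):
    data = rng.normal(0.0, 1.0, size=(m, n))   # m independent samples, true mean 0
    means = data.mean(axis=1)
    ses = data.std(axis=1, ddof=1) / np.sqrt(n)
    all_cover += np.all(np.abs(means) <= tcrit * ses)  # every CI contains 0?

print(all_cover / reps)  # roughly 0.95**10 = 0.599, far below 0.95
```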


5. Bonferroni simultaneous confidence intervals

5.1 Bonferroni inequality

For any events $A_1,\dots,A_m$, $$ \mathbb P\Big(\bigcup_{\ell=1}^m A_\ell\Big) \le \sum_{\ell=1}^m \mathbb P(A_\ell). $$

This bound is distribution-free and does not require independence.


5.2 Bonferroni confidence intervals

Suppose $\hat\theta_\ell$ estimates $\theta_\ell$ and $$ \frac{\hat\theta_\ell-\theta_\ell} {\widehat{\mathrm{SE}}(\hat\theta_\ell)} \sim t_\nu . $$

Define intervals $$ I_\ell:\quad \hat\theta_\ell \pm t_{1-\alpha/(2m),\nu} \,\widehat{\mathrm{SE}}(\hat\theta_\ell), \qquad \ell=1,\dots,m . $$

Then $$ \mathbb P\big(\theta_1\in I_1,\dots,\theta_m\in I_m\big) \ge 1-\alpha . $$


5.3 Bonferroni CIs for pairwise mean differences

For comparisons $\mu_i-\mu_j$, $$ \widehat{\mu_i-\mu_j}=\bar X_i-\bar X_j , $$ with standard error $$ \widehat{\mathrm{SE}}(\bar X_i-\bar X_j) = \sqrt{\text{MSE} \Big(\frac{1}{n_i}+\frac{1}{n_j}\Big)} . $$

If $m=\binom{k}{2}$, the Bonferroni confidence interval is $$ (\bar X_i-\bar X_j) \pm t_{1-\alpha/(2m),\nu} \sqrt{\text{MSE} \Big(\frac{1}{n_i}+\frac{1}{n_j}\Big)} . $$
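A minimal sketch of these intervals (the three small groups are invented for illustration):

```python
import numpy as np
from itertools import combinations
from scipy import stats

groups = [np.array([28., 32., 36., 42.]),   # hypothetical group data
          np.array([44., 61., 52., 54.]),
          np.array([54., 56., 55., 65.])]
k = len(groups)
n = [len(g) for g in groups]
nu = sum(n) - k                             # error degrees of freedom

mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / nu
m = k * (k - 1) // 2                        # number of pairwise comparisons
tcrit = stats.t.ppf(1 - 0.05 / (2 * m), df=nu)

for i, j in combinations(range(k), 2):
    diff = groups[i].mean() - groups[j].mean()
    half = tcrit * np.sqrt(mse * (1 / n[i] + 1 / n[j]))
    print(f"mu{i+1} - mu{j+1}: {diff:7.2f} +/- {half:5.2f}")
```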


5.4 Properties of Bonferroni intervals

  • Valid for any collection of contrasts
  • No independence assumption required
  • Often conservative for large $m$
  • Most effective for few, pre-planned comparisons

6. Tukey’s method (HSD / Tukey–Kramer)

6.1 Studentized range distribution

Let $Z_1,\dots,Z_k \sim N(0,1)$ i.i.d. The studentized range is $$ Q = \frac{\max_i Z_i - \min_i Z_i}{\sqrt{S^2}}, $$ where $S^2$ is an independent variance estimator with $\nu S^2 \sim \chi^2_\nu$.

Its distribution depends on:

  • $k$: number of groups,
  • $\nu$: error degrees of freedom.

Quantiles are denoted $q_{1-\alpha}(k,\nu)$.


6.2 Tukey HSD (balanced design)

Assume equal sample sizes $$ n_1=\cdots=n_k=n . $$

Then $$ \bar X_i-\bar X_j = (\mu_i-\mu_j) + \sigma\sqrt{\frac{2}{n}}\,Z_{ij}, \qquad Z_{ij}\sim N(0,1), $$ and $$ \sqrt{\frac{\text{MSE}}{n}} $$ estimates $\sigma/\sqrt{n}$.

The Tukey HSD confidence interval is $$ (\bar X_i-\bar X_j) \pm q_{1-\alpha}(k,\nu) \sqrt{\frac{\text{MSE}}{n}} . $$

These intervals are simultaneous for all $\binom{k}{2}$ pairwise differences.


6.3 Tukey–Kramer method (unbalanced design)

When group sizes differ, the Tukey–Kramer interval is $$ (\bar X_i-\bar X_j) \pm q_{1-\alpha}(k,\nu) \sqrt{ \frac{\text{MSE}}{2} \Big(\frac{1}{n_i}+\frac{1}{n_j}\Big) } . $$

This reduces to Tukey HSD when $n_i=n$.
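A sketch of the Tukey–Kramer computation, reusing the hypothetical groups from the Bonferroni example; the studentized-range quantile $q_{1-\alpha}(k,\nu)$ is available in SciPy (1.7+) as scipy.stats.studentized_range.

```python
import numpy as np
from itertools import combinations
from scipy.stats import studentized_range

groups = [np.array([28., 32., 36., 42.]),   # same hypothetical data as above
          np.array([44., 61., 52., 54.]),
          np.array([54., 56., 55., 65.])]
k = len(groups)
n = [len(g) for g in groups]
nu = sum(n) - k

mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / nu
q = studentized_range.ppf(0.95, k, nu)      # q_{0.95}(k, nu)

for i, j in combinations(range(k), 2):
    diff = groups[i].mean() - groups[j].mean()
    half = q * np.sqrt(mse / 2 * (1 / n[i] + 1 / n[j]))  # Tukey–Kramer half-width
    print(f"mu{i+1} - mu{j+1}: {diff:7.2f} +/- {half:5.2f}")
```

Since these groups are balanced, the half-width reduces to the Tukey HSD form $q\sqrt{\text{MSE}/n}$.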


6.4 Properties of Tukey intervals

  • Exact FWER control for all pairwise comparisons
  • Shorter than Bonferroni for many groups
  • Requires normality and homoscedasticity
  • Not valid for arbitrary contrasts

7. Relationship to the ANOVA F-test

  • ANOVA F-test asks:

    Is there at least one difference among means?

  • Simultaneous confidence intervals ask:

    Which differences exist, and how large are they?

Key facts:

  • If the ANOVA F-test is not significant, the Tukey intervals typically all contain $0$ (the two procedures usually, though not always, agree)
  • A difference is significant iff its simultaneous CI excludes $0$

8. Bonferroni vs Tukey: comparison

| Aspect | Bonferroni | Tukey (HSD / Kramer) |
|---|---|---|
| Comparisons | Arbitrary | All pairwise |
| Error control | Always valid | Exact under ANOVA |
| Interval width | Often wider | Usually narrower |
| Planning | Pre-specified | Exploratory |
| Variance assumption | None | Equal variances |

9. Practical guidance

  • Few planned contrasts $\rightarrow$ Bonferroni
  • All pairwise comparisons $\rightarrow$ Tukey
  • Many groups, exploratory analysis $\rightarrow$ Tukey
  • Teaching multiple testing theory $\rightarrow$ Bonferroni first

10. Summary

  • Simultaneous confidence intervals control family-wise error
  • Bonferroni is general, simple, and conservative
  • Tukey exploits ANOVA structure for efficient pairwise inference
  • Both methods extend naturally from one-way ANOVA


Problem: One-Way ANOVA (Faculty Ages by Rank)

A researcher claims that there is a difference in the average age of assistant professors, associate professors, and full professors at her university.

Faculty members are selected randomly, and their ages are recorded.
Assume that faculty ages are normally distributed.

Test the researcher’s claim at the $\alpha = 0.01$ significance level.

The observed data are:

| Rank | Ages |
|---|---|
| Assistant Professor | 28, 32, 36, 42, 50, 33, 38 |
| Associate Professor | 44, 61, 52, 54, 62, 45, 46 |
| Professor | 54, 56, 55, 65, 52, 50, 46 |
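A sketch of the computation with SciPy's f_oneway, testing at $\alpha = 0.01$:

```python
from scipy.stats import f_oneway

assistant = [28, 32, 36, 42, 50, 33, 38]
associate = [44, 61, 52, 54, 62, 45, 46]
professor = [54, 56, 55, 65, 52, 50, 46]

F, p = f_oneway(assistant, associate, professor)
print(F, p)  # F = 12.6 with df = (2, 18); p < 0.01
```

Since $F \approx 12.6$ exceeds the critical value $F_{0.99}(2,18) \approx 6.01$, we reject $H_0$ at the $1\%$ level: the data support the researcher's claim that mean ages differ by rank.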