The Mann–Whitney U test is a non-parametric test for comparing two independent samples.
It assesses whether one distribution tends to produce larger values than the other and is a robust alternative to the two-sample t-test.
Let $$ X_1,\dots,X_{n_1} \sim F, \qquad Y_1,\dots,Y_{n_2} \sim G, $$ where all observations are independent.
The goal is to compare the distributions $F$ and $G$ without assuming normality.
Null hypothesis $$ H_0: F = G $$
Alternative hypothesis $$ H_1: F \neq G $$ (or one-sided variants: $F$ stochastically dominates $G$ or vice versa)
⚠️ Important: this is not a test of equality of means in general.
Null hypothesis (Mann–Whitney U test).
Let $X$ and $Y$ be independent random variables representing observations from the two groups. The Mann–Whitney test is based on the null hypothesis $$ H_0:\; \mathbb P(X<Y) + \tfrac12\,\mathbb P(X=Y) = \tfrac12. $$
The Mann–Whitney statistics can be written as $$ U_X = R_X - \frac{n_1(n_1+1)}{2}, \qquad U_Y = R_Y - \frac{n_2(n_2+1)}{2}, $$ where $$ R_X = \sum_{i=1}^{n_1} R(X_i), \qquad R_Y = \sum_{j=1}^{n_2} R(Y_j) $$ are the rank sums of the $X$ and $Y$ samples, respectively. Note that $U_X + U_Y = n_1 n_2$.
The test statistic used in the Mann–Whitney test is $$ U = \min(U_X, U_Y). $$
This symmetrization ensures invariance under relabeling of the two samples.
Notice that tables of critical values are usually given for the two-tailed test: we reject $H_0$ when the observed $U$ is less than or equal to the tabulated critical value, and fail to reject when $U$ exceeds it.
Equivalently, $$ U_X = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \mathbf{1}\{X_i > Y_j\}, $$ with ties handled via midranks in practice.
This representation is central for the theoretical interpretation of the test.
The rank-sum statistic $R_X$ and $U_X$ are affinely related: $$ R_X = U_X + \frac{n_1(n_1+1)}{2}. $$
All formulations $(R_X, U_X, U)$ induce identical tests and p-values, differing only by centering and symmetrization.
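These formulas can be sketched in a few lines; the helper below is illustrative (its name is not from any library) and assumes `scipy.stats.rankdata` is available for the midranking step:

```python
import numpy as np
from scipy.stats import rankdata

def mann_whitney_stats(x, y):
    """Return (U_X, U_Y, U) computed from pooled midranks."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))   # midranks handle ties
    R_x = ranks[:n1].sum()                     # rank sum of the X sample
    U_x = R_x - n1 * (n1 + 1) / 2              # affine relation to the rank sum
    U_y = n1 * n2 - U_x                        # U_X + U_Y = n1 * n2
    return U_x, U_y, min(U_x, U_y)
```

For the two-sided test one then compares $U = \min(U_X, U_Y)$ with the tabulated critical value.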
Under $H_0$, all $\binom{n_1+n_2}{n_1}$ allocations of ranks to the $X$ sample are equally likely. Thus, $U_X$ has an exact permutation distribution depending only on $(n_1,n_2)$.
Formally: $$ \mathbb{P}(U_X = u) = \frac{\#\{\text{rank allocations yielding } u\}}{\binom{n_1+n_2}{n_1}}. $$
This distribution is discrete, symmetric about $n_1 n_2/2$, and free of the underlying distribution $F$.
Consider two samples, of sizes $n_1 = 2$ and $n_2 = 3$.
Under the null hypothesis $H_0$, the two samples come from the same continuous distribution.
Pool all observations and assign ranks
$1,2,3,4,5$.
Under $H_0$, every way of choosing which ranks go to sample $X$ is equally likely.
Total number of allocations: $$ \binom{n_1+n_2}{n_1} = \binom{5}{2} = 10. $$
Each allocation has probability $1/10$.
Let $R_X$ be the sum of the ranks assigned to sample $X$.
The Mann–Whitney statistic is defined as $$ U_X = R_X - \frac{n_1(n_1+1)}{2} = R_X - 3. $$
| Ranks assigned to $X$ | $R_X$ | $U_X$ |
|---|---|---|
| $\{1,2\}$ | 3 | 0 |
| $\{1,3\}$ | 4 | 1 |
| $\{1,4\}$ | 5 | 2 |
| $\{1,5\}$ | 6 | 3 |
| $\{2,3\}$ | 5 | 2 |
| $\{2,4\}$ | 6 | 3 |
| $\{2,5\}$ | 7 | 4 |
| $\{3,4\}$ | 7 | 4 |
| $\{3,5\}$ | 8 | 5 |
| $\{4,5\}$ | 9 | 6 |
By counting how many allocations produce each value of $U_X$, we obtain:
| $u$ | Count | $\mathbb{P}(U_X = u)$ |
|---|---|---|
| 0 | 1 | 0.1 |
| 1 | 1 | 0.1 |
| 2 | 2 | 0.2 |
| 3 | 2 | 0.2 |
| 4 | 2 | 0.2 |
| 5 | 1 | 0.1 |
| 6 | 1 | 0.1 |
Formally, $$ \mathbb{P}(U_X = u) = \frac{\#\{\text{rank allocations yielding } u\}}{\binom{5}{2}}. $$
Here, $$ n_1 n_2 = 2 \cdot 3 = 6, $$ so the distribution of $U_X$ is symmetric about $$ \frac{n_1 n_2}{2} = 3. $$
Indeed, $$ \mathbb{P}(U_X = 0) = \mathbb{P}(U_X = 6), \quad \mathbb{P}(U_X = 1) = \mathbb{P}(U_X = 5), \quad \mathbb{P}(U_X = 2) = \mathbb{P}(U_X = 4). $$
Moreover, $$ U_Y = n_1 n_2 - U_X = 6 - U_X, $$ so the two Mann–Whitney statistics are complementary for each allocation.
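The enumeration above is easy to reproduce with the standard library; this short sketch rebuilds the probability table from the ten rank allocations:

```python
from itertools import combinations
from collections import Counter

# Each of the C(5,2) = 10 allocations of ranks {1,...,5} to sample X is
# equally likely under H0; U_X = R_X - 3 for n1 = 2.
counts = Counter(sum(rx) - 3 for rx in combinations(range(1, 6), 2))
null_dist = {u: counts[u] / 10 for u in sorted(counts)}
```

The resulting `null_dist` matches the table, including the symmetry $\mathbb{P}(U_X = u) = \mathbb{P}(U_X = 6 - u)$.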
This example shows explicitly that under $H_0$ the distribution of $U_X$ is discrete, symmetric, and determined entirely by $(n_1, n_2)$.
The minimum and maximum possible values of $U_X$ are: $$ U_{X,\min} = 0, \qquad U_{X,\max} = n_1 n_2. $$
Thus: $$ U_X \in \{0,1,\dots,n_1 n_2\}. $$
Each value corresponds to the number of $(X_i,Y_j)$ pairs such that $X_i > Y_j$.
Under $H_0$: $$ \mathbb{E}[U_X] = \frac{n_1 n_2}{2}, $$ $$ \mathrm{Var}(U_X) = \frac{n_1 n_2 (n_1+n_2+1)}{12}. $$
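These moment formulas can be checked against the exact permutation distribution by brute-force enumeration; the sketch below uses illustrative sizes $n_1 = 4$, $n_2 = 5$:

```python
from itertools import combinations

def exact_U_values(n1, n2):
    """U_X over all equally likely allocations of ranks 1..n1+n2 to sample X."""
    N = n1 + n2
    offset = n1 * (n1 + 1) / 2
    return [sum(c) - offset for c in combinations(range(1, N + 1), n1)]

n1, n2 = 4, 5
vals = exact_U_values(n1, n2)
mean_U = sum(vals) / len(vals)                            # expect n1*n2/2 = 10
var_U = sum((v - mean_U) ** 2 for v in vals) / len(vals)  # expect n1*n2*(n1+n2+1)/12
```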
By the symmetry of $U_X$ about $n_1 n_2/2$, the null distribution of the statistic $U=\min(U_X,U_Y)$ follows directly.
Define the population parameter $$ \theta = \mathbb{P}(X > Y) + \tfrac12\, \mathbb{P}(X = Y). $$
Then: $$ \mathbb{E}\!\left[\frac{U_X}{n_1 n_2}\right] = \theta. $$
Under $H_0: F = G$, we have $$ \theta = \tfrac12. $$
Thus, the Mann–Whitney test is a test of $$ H_0:\; \theta = \tfrac12, $$ corresponding to absence of stochastic dominance.
The statistic $U_X$ is a two-sample U-statistic with kernel $$ h(x,y) = \mathbf{1}\{x > y\}. $$
By Hoeffding’s theory of U-statistics:
As $n_1,n_2 \to \infty$: $$ \frac{U_X - \mathbb{E}[U_X]}{\sqrt{\mathrm{Var}(U_X)}} \;\xrightarrow{d}\; N(0,1). $$
This follows from the Hoeffding decomposition and the CLT for U-statistics.
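A quick simulation illustrates the normal limit: under $H_0$ the standardized statistic has mean $\approx 0$ and variance $\approx 1$. This sketch uses only the standard library; the sample sizes, seed, and replication count are arbitrary choices:

```python
import math, random

random.seed(1)
n1, n2 = 30, 40
mu = n1 * n2 / 2
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

def u_x(x, y):
    """U_X = R_X - n1(n1+1)/2 for continuous data (ties have probability 0)."""
    pooled = sorted(x + y)
    rank = {v: r + 1 for r, v in enumerate(pooled)}
    return sum(rank[v] for v in x) - n1 * (n1 + 1) / 2

z = []
for _ in range(2000):
    x = [random.random() for _ in range(n1)]
    y = [random.random() for _ in range(n2)]
    z.append((u_x(x, y) - mu) / sigma)

z_mean = sum(z) / len(z)
z_var = sum(t * t for t in z) / len(z) - z_mean ** 2
```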
It is possible that two or more observations take the same value. In this case, the Mann–Whitney U statistic can still be computed by allocating half of each tie to sample $X$ and half to sample $Y$ (equivalently, by using mean ranks).
However, when ties are present, the normal approximation to the distribution of $U$ must be used with a correction to the standard deviation. The adjusted standard deviation of $U$ is
$$ \sigma_U = \sqrt{ \frac{n_1 n_2}{N (N - 1)} \left[ \frac{N^3 - N}{12} - \sum_{j=1}^{g} \frac{t_j^3 - t_j}{12} \right] }, $$
where $N = n_1 + n_2$, $g$ is the number of tied groups, and $t_j$ is the number of observations in the $j$-th tied group.
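The adjusted standard deviation transcribes directly into code; as a sanity check, with no ties it reduces to $\sqrt{n_1 n_2 (N+1)/12}$ (a sketch using only the standard library):

```python
import math
from collections import Counter

def tie_corrected_sd(x, y):
    """Adjusted standard deviation of U under the normal approximation."""
    n1, n2 = len(x), len(y)
    N = n1 + n2
    # t^3 - t vanishes for t = 1, so only genuine tied groups contribute
    ties = sum(t**3 - t for t in Counter(list(x) + list(y)).values() if t > 1)
    return math.sqrt(n1 * n2 / (N * (N - 1)) * ((N**3 - N) / 12 - ties / 12))
```

Ties can only shrink the standard deviation, so ignoring the correction makes the test conservative in the tails.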
When $F$ and $G$ differ in shape as well as location, rejection does not necessarily correspond to a pure location shift.
The Mann–Whitney test is an exact, distribution-free U-statistic test of stochastic dominance whose null distribution arises from random rank allocations and converges asymptotically to a normal distribution.
The test counts how often observations from one sample are smaller than those from the other.
In one-way ANOVA we test the global null hypothesis $$ H_0:\quad \mu_1=\mu_2=\cdots=\mu_k . $$
If this hypothesis is rejected, a natural next question is:
Which means differ, and by how much?
Using ordinary (single-parameter) confidence intervals for many comparisons leads to inflated Type I error, because several intervals are examined simultaneously.
Goal: Construct confidence intervals that hold simultaneously for a family of parameters with overall confidence level $1-\alpha$.
We consider the classical one-way ANOVA model $$ X_{ij} = \mu_i + \varepsilon_{ij}, \qquad i=1,\dots,k,\quad j=1,\dots,n_i, $$ where $\mu_i$ is the unknown mean of group $i$ and the errors $\varepsilon_{ij}$ are i.i.d. $N(0,\sigma^2)$.
Define: $$ \bar X_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \qquad N=\sum_{i=1}^k n_i . $$
The Mean Square Error (MSE) is $$ \text{MSE} = \frac{1}{N-k} \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij}-\bar X_i)^2 , $$ with $\nu=N-k$ degrees of freedom.
Let $\theta_1,\dots,\theta_m$ be parameters of interest (e.g. mean differences).
Intervals $I_1,\dots,I_m$ are simultaneous confidence intervals with level $1-\alpha$ if $$ \mathbb P\big(\theta_1\in I_1,\dots,\theta_m\in I_m\big)\ge 1-\alpha . $$
This is stronger than marginal coverage $$ \mathbb P(\theta_\ell\in I_\ell)\ge 1-\alpha \quad \text{for each } \ell . $$
If we construct $m$ ordinary $1-\alpha$ confidence intervals and they happen to be independent, then $$ \mathbb P(\text{all correct}) = (1-\alpha)^m , $$ which can be far below $1-\alpha$ for large $m$.
Simultaneous methods control the family-wise error rate (FWER): $$ \mathbb P(\text{at least one false statement}) \le \alpha . $$
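The gap between naive and adjusted joint coverage is easy to tabulate in the independent-intervals case; $\alpha = 0.05$ here is only illustrative:

```python
alpha = 0.05
for m in (1, 5, 10, 20):
    naive = (1 - alpha) ** m        # joint coverage of unadjusted intervals
    bonf = (1 - alpha / m) ** m     # each interval at level 1 - alpha/m
    print(f"m={m:2d}  naive={naive:.4f}  adjusted={bonf:.4f}")
```

Already at $m = 10$ the naive joint coverage drops below 60%, while the adjusted intervals stay above $1-\alpha$.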
For any events $A_1,\dots,A_m$, $$ \mathbb P\Big(\bigcup_{\ell=1}^m A_\ell\Big) \le \sum_{\ell=1}^m \mathbb P(A_\ell). $$
This bound is distribution-free and does not require independence.
Suppose $\hat\theta_\ell$ estimates $\theta_\ell$ and $$ \frac{\hat\theta_\ell-\theta_\ell} {\widehat{\mathrm{SE}}(\hat\theta_\ell)} \sim t_\nu . $$
Define intervals $$ I_\ell:\quad \hat\theta_\ell \pm t_{1-\alpha/(2m),\nu} \,\widehat{\mathrm{SE}}(\hat\theta_\ell), \qquad \ell=1,\dots,m . $$
Then $$ \mathbb P\big(\theta_1\in I_1,\dots,\theta_m\in I_m\big) \ge 1-\alpha , $$ since each interval fails with probability $\alpha/m$ and the union bound caps the total failure probability at $\alpha$.
For comparisons $\mu_i-\mu_j$, $$ \widehat{\mu_i-\mu_j}=\bar X_i-\bar X_j , $$ with standard error $$
\sqrt{\text{MSE} \Big(\frac{1}{n_i}+\frac{1}{n_j}\Big)} . $$
If $m=\binom{k}{2}$, the Bonferroni confidence interval is $$ (\bar X_i-\bar X_j) \pm t_{1-\alpha/(2m),\nu} \sqrt{\text{MSE} \Big(\frac{1}{n_i}+\frac{1}{n_j}\Big)} . $$
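A minimal sketch of these intervals, assuming `scipy` is available for the $t$ quantile; the function name and return format are illustrative, not from any library:

```python
import numpy as np
from scipy import stats

def bonferroni_pairwise(groups, alpha=0.05):
    """Bonferroni simultaneous CIs for all pairwise mean differences."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [float(np.mean(g)) for g in groups]
    nu = sum(n) - k
    mse = sum(((np.asarray(g, float) - m) ** 2).sum()
              for g, m in zip(groups, means)) / nu
    m_comp = k * (k - 1) // 2                     # m = C(k, 2) comparisons
    tcrit = stats.t.ppf(1 - alpha / (2 * m_comp), nu)
    ci = {}
    for i in range(k):
        for j in range(i + 1, k):
            hw = tcrit * np.sqrt(mse * (1 / n[i] + 1 / n[j]))
            d = means[i] - means[j]
            ci[(i, j)] = (d - hw, d + hw)
    return ci
```

An interval that excludes 0 flags the corresponding pair of means as different, with family-wise error at most $\alpha$.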
Let $Z_1,\dots,Z_k \sim N(0,1)$ i.i.d. The studentized range is $$ Q = \frac{\max_i Z_i - \min_i Z_i}{S}, $$ where $S^2$ is an independent variance estimator with $\nu S^2 \sim \chi^2_\nu$.
Its distribution depends on the number of groups $k$ and the degrees of freedom $\nu$; quantiles are denoted $q_{1-\alpha}(k,\nu)$.
Assume equal sample sizes $$ n_1=\cdots=n_k=n . $$
Then $$ \bar X_i - \bar X_j = (\mu_i-\mu_j) + \sigma\sqrt{\frac{2}{n}}\,Z_{ij}, \qquad Z_{ij}\sim N(0,1), $$ and $$ \sqrt{\frac{\text{MSE}}{n}} $$ estimates $\sigma/\sqrt{n}$.
The Tukey HSD confidence interval is $$ (\bar X_i-\bar X_j) \pm q_{1-\alpha}(k,\nu) \sqrt{\frac{\text{MSE}}{n}} . $$
These intervals are simultaneous for all $\binom{k}{2}$ pairwise differences.
When group sizes differ, the Tukey–Kramer interval is $$ (\bar X_i-\bar X_j) \pm q_{1-\alpha}(k,\nu) \sqrt{ \frac{\text{MSE}}{2} \Big(\frac{1}{n_i}+\frac{1}{n_j}\Big) } . $$
This reduces to Tukey HSD when $n_i=n$.
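A sketch of the Tukey–Kramer intervals, using `scipy.stats.studentized_range` for the quantile $q_{1-\alpha}(k,\nu)$; as above, the function name and return format are illustrative:

```python
import numpy as np
from scipy import stats

def tukey_kramer(groups, alpha=0.05):
    """Tukey-Kramer simultaneous CIs for all pairwise mean differences."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [float(np.mean(g)) for g in groups]
    nu = sum(n) - k
    mse = sum(((np.asarray(g, float) - m) ** 2).sum()
              for g, m in zip(groups, means)) / nu
    q = stats.studentized_range.ppf(1 - alpha, k, nu)  # q_{1-alpha}(k, nu)
    ci = {}
    for i in range(k):
        for j in range(i + 1, k):
            hw = q * np.sqrt(mse / 2 * (1 / n[i] + 1 / n[j]))
            d = means[i] - means[j]
            ci[(i, j)] = (d - hw, d + hw)
    return ci
```

With equal group sizes the half-width reduces to $q_{1-\alpha}(k,\nu)\sqrt{\text{MSE}/n}$, the HSD form above.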
ANOVA F-test asks:
Is there at least one difference among means?
Simultaneous confidence intervals ask:
Which differences exist, and how large are they?
Key facts:
| Aspect | Bonferroni | Tukey (HSD / Kramer) |
|---|---|---|
| Comparisons | Arbitrary | All pairwise |
| Error control | Always valid | Exact under ANOVA |
| Interval width | Often wider | Usually narrower |
| Planning | Pre-specified | Exploratory |
| Variance assumption | None | Equal variances |
A researcher claims that there is a difference in the average age of assistant professors, associate professors, and full professors at her university.
Faculty members are selected randomly, and their ages are recorded.
Assume that faculty ages are normally distributed.
Test the researcher’s claim at the $\alpha = 0.01$ significance level.
The observed data are:
| Academic rank | Ages |
|---|---|
| Assistant Professor | 28, 32, 36, 42, 50, 33, 38 |
| Associate Professor | 44, 61, 52, 54, 62, 45, 46 |
| Professor | 54, 56, 55, 65, 52, 50, 46 |
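Under the stated normality assumption this is a one-way ANOVA of $k = 3$ groups; a sketch using `scipy.stats.f_oneway`:

```python
from scipy import stats

assistant = [28, 32, 36, 42, 50, 33, 38]
associate = [44, 61, 52, 54, 62, 45, 46]
professor = [54, 56, 55, 65, 52, 50, 46]

# One-way ANOVA of H0: equal mean ages across the three ranks
F, p = stats.f_oneway(assistant, associate, professor)
reject = p < 0.01  # reject H0 at the 1% level
```

Here $F$ comes out near 12.6 on $(2, 18)$ degrees of freedom, well past the 1% critical value $F_{0.99}(2,18) \approx 6.0$, so the researcher's claim of a difference in mean ages is supported at $\alpha = 0.01$.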