Seminar 7

Non-Parametric Tests (the ones we are going to discuss here)

1. One-sample tests & goodness-of-fit¶

Sign test¶

  • Tests whether the median equals a given value
  • Very weak assumptions
  • Mainly pedagogical (low power)

Wilcoxon signed-rank test¶

  • Non-parametric alternative to the one-sample t-test
  • Uses sign + magnitude
  • Assumes symmetry

Kolmogorov–Smirnov test (1-sample)¶

  • Tests full distribution against a known CDF
  • Sensitive to global differences
  • Parameters must be known (important caveat)

Anderson–Darling test¶

  • Goodness-of-fit test with strong tail sensitivity
  • Usually more powerful than KS for normality testing, especially for deviations in the tails

2. Two-sample location tests (independent samples)¶

Mann–Whitney U test (Wilcoxon rank-sum)¶

  • Alternative to two-sample t-test
  • Tests stochastic dominance, not equality of means
  • Interpretation as a pure location shift assumes identically shaped distributions

Two-sample Kolmogorov–Smirnov test¶

  • Detects any distributional difference
  • Low power for pure location shifts

Brunner–Munzel test¶

  • Robust alternative to Mann–Whitney
  • Allows heteroscedasticity

3. Paired / repeated-measures tests¶

Wilcoxon signed-rank test (paired)¶

  • Non-parametric analogue of the paired t-test
  • Assumes symmetry

Sign test (paired)¶

  • Extremely robust
  • Very low power

4. More than two groups (one-way designs)¶

Kruskal–Wallis test¶

  • Non-parametric one-way ANOVA
  • Rank-based
  • Tests equality of distributions (not means)

Post-hoc procedures¶

  • Dunn test
  • Pairwise Wilcoxon tests with multiple-testing correction

5. Blocked & repeated-measures designs¶

Friedman test¶

  • Non-parametric one-way repeated-measures ANOVA
  • Blocks typically correspond to subjects

Quade test¶

  • Weighted version of Friedman
  • More powerful when blocks differ in importance

6. Factorial designs (two or more factors)¶

Aligned Rank Transform (ART) ANOVA¶

  • Non-parametric alternative to full factorial ANOVA
  • Correctly tests main effects and interactions
  • Essential for advanced courses

Permutation-based factorial ANOVA¶

  • Model-free
  • Handles interactions naturally
  • Strong conceptual link to ML validation

7. Scale / variance tests¶

Ansari–Bradley test¶

  • Rank-based test for equality of scale
  • Assumes symmetry

Fligner–Killeen test¶

  • Fully non-parametric
  • Robust to non-normality
  • Preferred in practice

Levene / Brown–Forsythe tests¶

  • Semi-parametric but widely used
  • Robust and practical

8. Association & dependence¶

Spearman’s rho¶

  • Rank correlation
  • Detects monotone relationships

Kendall’s tau¶

  • Concordance-based measure
  • Better for small samples and ties

Hoeffding’s D (optional / advanced)¶

  • Detects general dependence
  • Computationally heavier

9. Categorical data (non-parametric by nature)¶

Chi-squared tests¶

  • Goodness-of-fit
  • Independence
  • Homogeneity

Fisher’s exact test¶

  • Exact inference
  • Suitable for small samples

10. Resampling-based inference¶

Permutation tests¶

  • Location, scale, association
  • Distribution-free
  • Unifying framework for many classical tests

Bootstrap confidence intervals¶

  • Percentile interval
  • Bootstrap-t interval
  • BCa interval (recommended)

Minimum required¶

  1. Wilcoxon signed-rank
  2. Mann–Whitney U
  3. Kruskal–Wallis
  4. Friedman
  5. Spearman / Kendall
  6. Chi-squared + Fisher
  7. Aligned Rank Transform ANOVA
  8. Permutation tests
  9. Bootstrap confidence intervals


For now, we will cover:

  • Sign test
  • Wilcoxon signed-rank test
  • Mann–Whitney U test (Wilcoxon rank-sum)
  • Wilcoxon signed-rank test (paired)
  • Ansari–Bradley test
  • Levene / Brown–Forsythe tests

Sign Test

Purpose¶

Tests whether the median of a population equals a given value, or whether the median of paired differences equals zero.


Hypotheses¶

  • H₀: median = m₀
  • H₁: median ≠ m₀ (or one-sided)

Exact statistic¶

Let

  • $$ d_i = x_i - m_0 $$
  • $$ S = \sum_{i=1}^n \mathbf{1}\{ d_i > 0 \} $$

Zero differences are discarded, so $n$ denotes the number of non-zero differences.


Null distribution¶

Under H₀: $$ S \sim \text{Binomial}(n, 1/2) $$

Exact p-values are computed from the binomial distribution.
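As a quick cross-check, the exact binomial p-value can also be obtained from SciPy. The sketch below assumes SciPy 1.7 or later, where `scipy.stats.binomtest` is available. The counts `n_nonzero` and `s_pos` are illustrative stand-ins for $n$ and $S$, not values taken from a real study.

```python
from scipy.stats import binomtest

# Illustrative counts: n = 8 non-zero differences, S = 5 of them positive
n_nonzero, s_pos = 8, 5

# Under H0, S ~ Binomial(n, 1/2); binomtest returns the exact p-value
p_two_sided = binomtest(s_pos, n_nonzero, p=0.5, alternative="two-sided").pvalue
p_greater = binomtest(s_pos, n_nonzero, p=0.5, alternative="greater").pvalue

print(f"two-sided p-value: {p_two_sided:.4f}")
print(f"one-sided p-value: {p_greater:.4f}")
```

Because the null distribution is symmetric for $p = 1/2$, the two-sided value here coincides with twice the smaller tail probability.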


Assumptions¶

  • Independent observations
  • Continuous distribution
  • No symmetry required

Why the test works (theory)¶

If m₀ is the true median and the distribution is continuous, then $$ \mathbb{P}(X_i > m_0) = \mathbb{P}(X_i < m_0) = \tfrac{1}{2}. $$

By independence, the indicator variables are i.i.d. Bernoulli(1/2), yielding an exact distribution-free test.
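The distribution-free property can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it draws from a skewed Exp(1) population, whose true median is $\ln 2$, and checks that the sign statistic still behaves like a Binomial$(n, 1/2)$ variable. The seed, sample size and number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n, reps = 15, 4000

# Skewed population: Exp(1) has median ln(2), so m0 = ln(2) is the true median
m0 = np.log(2)
samples = rng.exponential(scale=1.0, size=(reps, n))

# Sign statistic S for each simulated sample
S = np.sum(samples - m0 > 0, axis=1)

# Under H0, S ~ Binomial(n, 1/2): mean n/2 and variance n/4
print("mean of S:", S.mean(), "(theory:", n / 2, ")")
print("var of S: ", S.var(), "(theory:", n / 4, ")")
```

The empirical mean and variance should be close to $n/2$ and $n/4$ even though the population is far from symmetric.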


Interpretation¶

  • Uses only signs → extremely robust
  • Low power due to loss of magnitude information
In [2]:
import numpy as np
from math import comb

def sign_test(x, y=None, median=0, alternative="two-sided"):
    """
    Sign test.
    
    Parameters
    ----------
    x : array-like
        Sample data (or first sample if paired test).
    y : array-like or None
        Second sample for paired sign test.
    median : float
        Hypothesized median (used only if y is None).
    alternative : {"two-sided", "greater", "less"}
    
    Returns
    -------
    dict with test statistic and p-value
    """
    
    x = np.asarray(x)
    
    if y is not None:
        y = np.asarray(y)
        d = x - y
    else:
        d = x - median
    
    # Remove zeros (ties)
    d = d[d != 0]
    n = len(d)
    
    if n == 0:
        raise ValueError("All differences are zero.")
    
    S = int(np.sum(d > 0))  # number of positive signs (plain int for clean output)
    
    # Binomial probabilities
    if alternative == "two-sided":
        k = min(S, n - S)
        p_value = 2 * sum(comb(n, i) * 0.5**n for i in range(k + 1))
        p_value = min(p_value, 1.0)
    elif alternative == "greater":
        p_value = sum(comb(n, i) * 0.5**n for i in range(S, n + 1))
    elif alternative == "less":
        p_value = sum(comb(n, i) * 0.5**n for i in range(0, S + 1))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    
    return {
        "n": n,
        "S": S,
        "p_value": p_value
    }
In [3]:
# Example data
x = [2.1, -0.3, 1.4, 0.7, -1.2, 0.5, 0.9, -0.4]

result = sign_test(x, median=0, alternative="two-sided")

result
Out[3]:
{'n': 8, 'S': 5, 'p_value': 0.7265625}

Example¶

A study is done to determine the effects of removing a renal blockage in patients whose renal function is impaired because of advanced metastatic malignancy of nonurologic cause. The arterial blood pressure of a random sample of 10 patients was measured before and after surgery for treatment of the blockage, yielding the following data:

  • before = [150, 132, 130, 116, 107, 100, 101, 96, 90, 78]
  • after = [90, 102, 80, 82, 90, 94, 84, 93, 90, 80]
In [12]:
before = [150, 132, 130, 116, 107, 100, 101, 96, 90, 78]
after  = [90, 102, 80, 82, 90, 94, 84, 93, 90, 80]

sign_test(before, after, alternative="greater")
Out[12]:
{'n': 9, 'S': 8, 'p_value': 0.01953125}

Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test is a non-parametric test for detecting a location shift in one-sample or paired-sample settings.
It is a robust alternative to the one-sample or paired t-test under non-normality.


1. Problem setup¶

One-sample case¶

Let $$ X_1, \dots, X_n \quad \text{i.i.d.} $$ and let $m_0$ be a hypothesized median.

Define differences: $$ d_i = X_i - m_0. $$

Paired two-sample case¶

Given paired observations $(X_i, Y_i)$, define $$ d_i = X_i - Y_i. $$

In both cases, inference is performed on the distribution of the differences $d_i$.


2. Hypotheses¶

  • Null hypothesis $$ H_0: \text{the distribution of } d_i \text{ is symmetric about } 0 $$

  • Alternative hypothesis $$ H_1: \text{the distribution of } d_i \text{ is not symmetric about } 0 $$ (or one-sided variants)

⚠️ Note: this is not merely a median test; symmetry is essential.


3. Assumptions¶

  • Independence of observations (or pairs)
  • Continuity (no ties in $|d_i|$)
  • Symmetry of the distribution of $d_i$ under $H_0$

4. Test statistic¶

  1. Remove zero differences $d_i = 0$
  2. Compute absolute differences $|d_i|$
  3. Rank $|d_i|$ to obtain ranks $$ R_i \in \{1,2,\dots,n\} $$
  4. Define signs $$ S_i = \operatorname{sign}(d_i) \in \{-1,+1\} $$

Signed-rank statistic (theoretical form)¶

$$ T_n = \sum_{i=1}^n R_i S_i $$

Positive rank-sum statistic (computational form)¶

$$ W^+ = \sum_{i=1}^n R_i \mathbf{1}\{d_i > 0\} $$

5. Relationship between the statistics¶

The two statistics are affinely equivalent: $$ W^+ = \frac{n(n+1)}{4} + \frac{1}{2} T_n, \quad T_n = 2W^+ - \frac{n(n+1)}{2}. $$

They induce identical tests, p-values, and decisions.
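The four steps and the affine relationship can be traced on a tiny example. The sketch below uses hypothetical differences chosen only for illustration, and `scipy.stats.rankdata` for the ranking (it assigns midranks if ties occur).

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical differences, used only for illustration
d = np.array([1.3, -0.4, 2.2, 0.0, -0.9, 3.1, 0.6])

d = d[d != 0]                       # step 1: drop zero differences
n = len(d)
ranks = rankdata(np.abs(d))         # steps 2-3: rank |d_i| (midranks if tied)
signs = np.sign(d)                  # step 4: signs in {-1, +1}

T_n = np.sum(ranks * signs)         # signed-rank statistic
W_plus = np.sum(ranks[d > 0])       # positive rank-sum statistic

# Affine relationship between the two forms
assert W_plus == n * (n + 1) / 4 + T_n / 2
print(n, T_n, W_plus)
```

Here $n = 6$, $T_n = 13$ and $W^+ = 17$, and indeed $17 = 10.5 + 6.5$.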


6. Exact null distribution (finite sample)¶

Under $H_0$:

  • The ranks $R_1,\dots,R_n$ are fixed
  • The signs $S_i$ are i.i.d. with $$ \mathbb{P}(S_i = +1) = \mathbb{P}(S_i = -1) = \tfrac12 $$

Thus, $$ T_n = \sum_{i=1}^n R_i S_i $$ has an exact permutation distribution.

Equivalently, $$ W^+ = \sum_{i=1}^n R_i B_i, \quad B_i \sim \text{Bernoulli}(1/2). $$

This distribution:

  • is discrete
  • depends only on $n$
  • is symmetric
  • is distribution-free

7. Support of the distribution¶

Let $$ S = \sum_{k=1}^n k = \frac{n(n+1)}{2}. $$

Then $$ W^+ \in \{0,1,\dots,S\}. $$

Each value corresponds to the sum of a subset of $\{1,\dots,n\}$.

Formally: $$ \mathbb{P}(W^+ = w) = \frac{\#\{\text{subsets of } \{1,\dots,n\} \text{ with sum } w\}}{2^n}. $$
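The counting formula can be evaluated with a small dynamic program over subset sums. The sketch below is one possible implementation, assuming no ties so that the ranks are exactly $1,\dots,n$; the helper name `signed_rank_null_counts` is hypothetical, not a library function. As an illustration it computes the two-sided p-value for an observed $W^+ = 40$ with $n = 10$.

```python
def signed_rank_null_counts(n):
    """counts[w] = number of subsets of {1, ..., n} whose elements sum to w."""
    total = n * (n + 1) // 2
    counts = [0] * (total + 1)
    counts[0] = 1                       # the empty subset has sum 0
    for k in range(1, n + 1):
        # iterate downwards so that each rank k is used at most once
        for w in range(total, k - 1, -1):
            counts[w] += counts[w - k]
    return counts

n = 10
counts = signed_rank_null_counts(n)
pmf = [c / 2**n for c in counts]        # P(W+ = w) under H0

# Two-sided exact p-value for an observed W+ = 40 with n = 10 (no ties)
w_obs = 40
upper_tail = sum(pmf[w_obs:])           # P(W+ >= 40)
p_two_sided = min(1.0, 2 * upper_tail)
print(p_two_sided)  # 0.232421875, i.e. 238/1024
```

Exactly 119 subsets of $\{1,\dots,10\}$ have sum at least 40, which gives $2 \cdot 119 / 1024 \approx 0.2324$.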


8. Symmetry of the distribution¶

For every subset $A \subseteq \{1,\dots,n\}$, its complement $A^c$ satisfies: $$ \sum_{k \in A^c} k = S - \sum_{k \in A} k. $$

Hence: $$ \mathbb{P}(W^+ = w) = \mathbb{P}(W^+ = S - w), $$ and $$ \mathbb{E}[W^+] = \frac{S}{2} = \frac{n(n+1)}{4}. $$


9. Mean and variance under $H_0$¶

For $W^+$: $$ \mathbb{E}[W^+] = \frac{n(n+1)}{4}, \quad \mathrm{Var}(W^+) = \frac{n(n+1)(2n+1)}{24}. $$

For $T_n$: $$ \mathbb{E}[T_n] = 0, \quad \mathrm{Var}(T_n) = \frac{n(n+1)(2n+1)}{6}. $$
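
These moments follow directly from the representation $W^+ = \sum_k k\,B_k$ with independent $B_k \sim \text{Bernoulli}(1/2)$, assuming no ties so that the ranks are exactly $1,\dots,n$. A short derivation:

$$ \mathbb{E}[W^+] = \sum_{k=1}^n k\,\mathbb{E}[B_k] = \frac{1}{2}\cdot\frac{n(n+1)}{2} = \frac{n(n+1)}{4}, \quad \mathrm{Var}(W^+) = \sum_{k=1}^n k^2\,\mathrm{Var}(B_k) = \frac{1}{4}\cdot\frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}. $$

Since $T_n = 2W^+ - n(n+1)/2$, it follows that $\mathrm{Var}(T_n) = 4\,\mathrm{Var}(W^+) = n(n+1)(2n+1)/6$.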


10. Why the test works (core theoretical reason)¶

Under symmetry: $$ d_i \stackrel{d}{=} -d_i. $$

Therefore:

  • signs $S_i$ are independent of magnitudes $|d_i|$
  • conditional on the ranks, $S_i$ are i.i.d. Rademacher variables

Thus the statistic reduces to a randomly signed sum of fixed ranks, yielding:

  • an exact permutation distribution
  • distribution-free inference

11. Asymptotic null distribution (CLT)¶

Conditionally on the ranks: $$ T_n = \sum_{i=1}^n R_i S_i $$ is a sum of independent, mean-zero random variables.

Let $$ \sigma_n^2 = \sum_{i=1}^n R_i^2 \sim \frac{n^3}{3}. $$

Since $$ \max_i \frac{R_i^2}{\sigma_n^2} \to 0, $$ the Lindeberg condition holds.

Hence: $$ \frac{T_n}{\sqrt{\sigma_n^2}} \xrightarrow{d} N(0,1). $$

Equivalently: $$ \frac{W^+ - \mathbb{E}[W^+]}{\sqrt{\mathrm{Var}(W^+)}} \xrightarrow{d} N(0,1). $$
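
The normal limit can be checked empirically by simulating random signs for fixed ranks. The sketch below is a Monte Carlo illustration, not a proof; the seed, $n = 30$ and the number of replications are arbitrary choices, and ties are assumed absent.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, chosen arbitrarily
n, reps = 30, 20000

ranks = np.arange(1, n + 1)                  # fixed ranks 1..n (no ties)
sigma_n = np.sqrt(np.sum(ranks**2))          # sqrt(n(n+1)(2n+1)/6)

# Random signs S_i in {-1, +1}, one row per simulated sample
signs = rng.choice([-1, 1], size=(reps, n))
z = (signs @ ranks) / sigma_n                # standardized T_n

# For a N(0,1) limit: mean ~ 0, variance ~ 1, P(|Z| > 1.96) ~ 0.05
tail = np.mean(np.abs(z) > 1.96)
print("mean:", z.mean())
print("var: ", z.var())
print("P(|z| > 1.96):", tail)
```

Already at $n = 30$ the simulated tail probability should be close to the nominal 5%.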


12. Interpretation¶

  • Tests for a location shift under symmetry
  • Uses both direction and magnitude
  • More powerful than the sign test
  • Robust to non-normality, but not to skewness

13. When the test fails¶

  • Strongly skewed distributions
  • Heavy ties
  • Discrete data
  • Dependence between observations

In these cases, the null distribution is distorted.
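
The effect of skewness can be made concrete with a simulation. In the sketch below the differences come from an Exp(1) distribution shifted by $\ln 2$, so the true median is exactly 0, yet the distribution is not symmetric. Both tests use their normal approximations at the nominal two-sided 5% level; the seed and sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)   # fixed seed, chosen arbitrarily
n, reps = 100, 2000

# Skewed differences with true median 0: Exp(1) shifted by its median ln(2)
d = rng.exponential(size=(reps, n)) - np.log(2)

# Signed-rank statistic W+ per row (continuous data, so no ties or zeros)
ranks = np.abs(d).argsort(axis=1).argsort(axis=1) + 1
w_plus = np.sum(ranks * (d > 0), axis=1)
z_w = (w_plus - n * (n + 1) / 4) / np.sqrt(n * (n + 1) * (2 * n + 1) / 24)

# Sign statistic S per row, with its own normal approximation
s = np.sum(d > 0, axis=1)
z_s = (s - n / 2) / np.sqrt(n / 4)

# Rejection rates of "median = 0" at the nominal two-sided 5% level
rate_wilcoxon = np.mean(np.abs(z_w) > 1.96)
rate_sign = np.mean(np.abs(z_s) > 1.96)
print("signed-rank:", rate_wilcoxon, " sign test:", rate_sign)
```

In this setting the sign test stays near its nominal level, while the signed-rank test rejects far more often than 5% although the median really is 0. It is reacting to the asymmetry, not to a shift in the median.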


14. One-sentence summary (exam-perfect)¶

The Wilcoxon signed-rank test is an exact, distribution-free test for symmetry-based location shifts, whose null distribution arises from random sign permutations of fixed ranks and converges asymptotically to a normal distribution.


Example 1¶

Let $X_i$ denote the length (in centimeters) of a randomly selected pygmy sunfish,
for $i = 1, 2, \dots, 10$.

Suppose we obtain the following sample:

$$ 5.0,\; 3.9,\; 5.2,\; 5.5,\; 2.8,\; 6.1,\; 6.4,\; 2.6,\; 1.7,\; 4.3 $$

Can we conclude that the median length of pygmy sunfish differs significantly from
$3.7$ centimeters?

Formulate and perform an appropriate statistical test at a suitable significance level.

In [32]:
import numpy as np
from math import sqrt
from scipy.stats import wilcoxon, norm, rankdata

def wilcoxon_signed_rank_one_sample(
    x,
    median_0=0,
    alpha=0.05,
    alternative="two-sided"
):
    """
    One-sample Wilcoxon signed-rank test.

    Reports:
    - W+ , W-
    - W_exact = min(W+, W-)  (classical table statistic)
    - p-value from scipy.stats.wilcoxon (exact for n <= 25,
      normal approximation otherwise)
    - asymptotic z and p-value (normal approximation)

    Note: alpha is not used inside the function; the decision is
    made by the caller.
    """

    x = np.asarray(x)
    d = x - median_0
    d = d[d != 0]
    n = len(d)

    if n == 0:
        raise ValueError("All observations equal the hypothesized median.")

    # ----- Signed ranks -----
    abs_d = np.abs(d)
    ranks = rankdata(abs_d, method="average")  # midranks handle tied |d_i|
    if np.all(ranks == np.round(ranks)):
        ranks = ranks.astype(int)  # keep integer dtype when all ranks are whole numbers

    W_plus = np.sum(ranks[d > 0])
    W_minus = np.sum(ranks[d < 0])
    W_exact = min(W_plus, W_minus)

    # ----- Exact p-value (SciPy) -----
    stat_scipy, p_exact = wilcoxon(
        d,
        alternative=alternative,
        mode="exact" if n <= 25 else "approx"
    )

    # ----- Asymptotic approximation -----
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)

    z = (W_plus - mu - 0.5 * np.sign(W_plus - mu)) / sigma

    if alternative == "two-sided":
        p_asym = 2 * (1 - norm.cdf(abs(z)))
    elif alternative == "greater":
        p_asym = 1 - norm.cdf(z)
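    elif alternative == "less":
        p_asym = norm.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")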

    return {
        "n": n,
        "W_plus": W_plus,
        "W_minus": W_minus,
        "W_exact": W_exact,
        "scipy_stat": stat_scipy,
        "p_value_exact": p_exact,
        "z": z,
        "p_value_asymptotic": p_asym
    }
In [33]:
# Example — One-sample Wilcoxon signed-rank test (table statistic)

# Input data
x = [5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3]
median_0 = 3.7
alpha = 0.05

# Run the test
result = wilcoxon_signed_rank_one_sample(
    x,
    median_0=median_0,
    alpha=alpha,
    alternative="two-sided"
)

# Output
print("One-sample Wilcoxon signed-rank test")
print("-----------------------------------")
print(f"n = {result['n']}")
print(f"W+ = {result['W_plus']}")
print(f"W- = {result['W_minus']}")
print(f"W (min(W+, W-)) = {result['W_exact']}")

print("\nExact test (SciPy)")
print(f"Exact p-value = {result['p_value_exact']:.4f}")

if result["p_value_exact"] < alpha:
    print("Exact decision: REJECT H0")
else:
    print("Exact decision: DO NOT reject H0")

print("\nAsymptotic normal approximation")
print(f"z = {result['z']:.3f}")
print(f"p-value ≈ {result['p_value_asymptotic']:.4f}")

if result["p_value_asymptotic"] < alpha:
    print("Asymptotic decision: REJECT H0")
else:
    print("Asymptotic decision: DO NOT reject H0")
One-sample Wilcoxon signed-rank test
-----------------------------------
n = 10
W+ = 40
W- = 15
W (min(W+, W-)) = 15

Exact test (SciPy)
Exact p-value = 0.2324
Exact decision: DO NOT reject H0

Asymptotic normal approximation
z = 1.223
p-value ≈ 0.2213
Asymptotic decision: DO NOT reject H0

Example 2¶

The median age of the onset of diabetes is thought to be $45$ years.
The ages at onset for a random sample of $30$ people with diabetes are:

$$ \begin{aligned} &35.5,\; 44.5,\; 39.8,\; 33.3,\; 51.4,\; 51.3,\; 30.5,\; 48.9,\; 42.1,\; 40.3,\\ &46.8,\; 38.0,\; 40.1,\; 36.8,\; 39.3,\; 65.4,\; 42.6,\; 42.8,\; 59.8,\; 52.4,\\ &26.2,\; 60.9,\; 45.6,\; 27.1,\; 47.3,\; 36.6,\; 55.6,\; 45.1,\; 52.2,\; 43.5 \end{aligned} $$

Assuming that the distribution of the age at the onset of diabetes is symmetric,
is there evidence to conclude that the median age of the onset of diabetes
differs significantly from $45$ years?

Formulate and perform an appropriate statistical test at a suitable significance level.

In [34]:
    
# Input data
x = [35.5, 44.5, 39.8, 33.3, 51.4, 51.3, 30.5, 48.9, 42.1, 40.3,
    46.8, 38.0, 40.1, 36.8, 39.3, 65.4, 42.6, 42.8, 59.8, 52.4,
    26.2, 60.9, 45.6, 27.1, 47.3, 36.6, 55.6, 45.1, 52.2, 43.5]
median_0 = 45
alpha = 0.05

# Run the test
result = wilcoxon_signed_rank_one_sample(
    x,
    median_0=median_0,
    alpha=alpha,
    alternative="two-sided"
)

# Output
print("One-sample Wilcoxon signed-rank test")
print("-----------------------------------")
print(f"n = {result['n']}")
print(f"W+ = {result['W_plus']}")
print(f"W- = {result['W_minus']}")
print(f"W (min(W+, W-)) = {result['W_exact']}")

print("\nExact test (SciPy)")
print(f"Exact p-value = {result['p_value_exact']:.4f}")

if result["p_value_exact"] < alpha:
    print("Exact decision: REJECT H0")
else:
    print("Exact decision: DO NOT reject H0")

print("\nAsymptotic normal approximation")
print(f"z = {result['z']:.3f}")
print(f"p-value ≈ {result['p_value_asymptotic']:.4f}")

if result["p_value_asymptotic"] < alpha:
    print("Asymptotic decision: REJECT H0")
else:
    print("Asymptotic decision: DO NOT reject H0")    
One-sample Wilcoxon signed-rank test
-----------------------------------
n = 30
W+ = 200
W- = 265
W (min(W+, W-)) = 200

Exact test (SciPy)
Exact p-value = 0.5038
Exact decision: DO NOT reject H0

Asymptotic normal approximation
z = -0.658
p-value ≈ 0.5104
Asymptotic decision: DO NOT reject H0

Example for paired data¶

Dental researchers have developed a new material for preventing cavities:
a plastic sealant that is applied to the chewing surfaces of teeth.

To determine whether the sealant is effective, it was applied to half of the teeth of each of 12 school-aged children. The remaining teeth for each child were left untreated. After two years, the number of cavities in the sealant-coated teeth and the uncoated teeth was recorded, resulting in the following data:

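| Child | Coated | Uncoated | Diff |
|-------|--------|----------|------|
| 1     | 3      | 3        | 0    |
| 2     | 1      | 3        | 2    |
| 3     | 0      | 2        | 2    |
| 4     | 4      | 5        | 1    |
| 5     | 1      | 0        | -1   |
| 6     | 0      | 1        | 1    |
| 7     | 1      | 5        | 4    |
| 8     | 2      | 0        | -2   |
| 9     | 1      | 6        | 5    |
| 10    | 0      | 0        | 0    |
| 11    | 0      | 3        | 3    |
| 12    | 4      | 3        | -1   |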

Here, the difference is defined as

$$ \text{Diff} = (\text{Uncoated}) - (\text{Coated}). $$

Is there sufficient evidence to conclude that sealant-coated teeth are less prone to cavities than untreated teeth?

Formulate and perform an appropriate statistical test at a suitable significance level.

In [41]:
import numpy as np
from math import sqrt
from scipy.stats import wilcoxon, norm
from scipy.stats import rankdata

def wilcoxon_signed_rank_paired(
    x,
    y,
    alpha=0.05,
    alternative="two-sided"
):
    """
    Paired Wilcoxon signed-rank test.

    Reports:
    - W+ , W-
    - W_exact = min(W+, W-)  (classical table statistic)
    - exact p-value from scipy.stats.wilcoxon
    - asymptotic z and p-value (normal approximation)
    """

    x = np.asarray(x)
    y = np.asarray(y)

    if len(x) != len(y):
        raise ValueError("x and y must have the same length.")

    # Paired differences
    d = x - y
    d = d[d != 0]   # remove zero differences
    n = len(d)

    if n == 0:
        raise ValueError("All paired differences are zero.")

    # ----- Signed ranks -----
    abs_d = np.abs(d)
    ranks = rankdata(abs_d, method="average")  # correct for ties
    W_plus = np.sum(ranks[d > 0])
    W_minus = np.sum(ranks[d < 0])
    W_exact = min(W_plus, W_minus)

    # ----- Exact p-value (SciPy) -----
    stat_scipy, p_exact = wilcoxon(
        d,
        alternative=alternative,
        mode="exact" if n <= 25 else "approx"
    )

    # ----- Asymptotic normal approximation (sigma has no tie correction) -----
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)

    z = (W_plus - mu - 0.5 * np.sign(W_plus - mu)) / sigma

    if alternative == "two-sided":
        p_asym = 2 * (1 - norm.cdf(abs(z)))
    elif alternative == "greater":
        p_asym = 1 - norm.cdf(z)
    elif alternative == "less":
        p_asym = norm.cdf(z)
    else:
        raise ValueError("Invalid alternative.")

    return {
        "n": n,
        "W_plus": W_plus,
        "W_minus": W_minus,
        "W_exact": W_exact,
        "scipy_stat": stat_scipy,   # W+ if one-sided; min(W+, W-) if two-sided
        "p_value_exact": p_exact,
        "z": z,
        "p_value_asymptotic": p_asym
    }
In [42]:
# Paired example data: number of cavities per child
# (sealant-coated teeth vs uncoated teeth)
coated = [3, 1, 0, 4, 1, 0, 1, 2, 1, 0, 0, 4]
uncoated = [3, 3, 2, 5, 0, 1, 5, 0, 6, 0, 3, 3]

alpha = 0.05

# Paired Wilcoxon signed-rank test
result = wilcoxon_signed_rank_paired(
    uncoated,
    coated,
    alpha=alpha,
    alternative="greater"   # H1: median(uncoated − coated) > 0
)

print("Paired Wilcoxon signed-rank test (Dental sealant study)")
print("------------------------------------------------------")
print(f"n = {result['n']}")
print(f"W+ = {result['W_plus']}")
print(f"W- = {result['W_minus']}")
print(f"W = min(W+, W-) = {result['W_exact']}")

print("\nExact test (SciPy)")
print(f"Test statistic (W+) = {result['scipy_stat']}")
print(f"Exact p-value = {result['p_value_exact']:.4f}")
print("Decision:", "REJECT H0" if result["p_value_exact"] < alpha else "DO NOT reject H0")

print("\nAsymptotic normal approximation")
print(f"z = {result['z']:.3f}")
print(f"p-value ≈ {result['p_value_asymptotic']:.4f}")
print("Decision:", "REJECT H0" if result["p_value_asymptotic"] < alpha else "DO NOT reject H0")
Paired Wilcoxon signed-rank test (Dental sealant study)
------------------------------------------------------
n = 10
W+ = 44.0
W- = 11.0
W = min(W+, W-) = 11.0

Exact test (SciPy)
Test statistic (W+) = 44.0
Exact p-value = 0.0527
Decision: DO NOT reject H0

Asymptotic normal approximation
z = 1.631
p-value ≈ 0.0515
Decision: DO NOT reject H0
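
The cavity counts contain many tied $|d_i|$, while the classical exact distribution assumes ranks $1,\dots,n$ without ties. One possible way to account for this, sketched below, is to enumerate all $2^n$ sign assignments for the observed midranks, which gives the exact conditional null distribution of $W^+$ for this particular tie pattern. This is an illustrative alternative, not the method used by the function above.

```python
from itertools import product

import numpy as np
from scipy.stats import rankdata

coated = np.array([3, 1, 0, 4, 1, 0, 1, 2, 1, 0, 0, 4])
uncoated = np.array([3, 3, 2, 5, 0, 1, 5, 0, 6, 0, 3, 3])

d = uncoated - coated
d = d[d != 0]                                  # drop zero differences
ranks = rankdata(np.abs(d), method="average")  # midranks for tied |d_i|
w_plus_obs = ranks[d > 0].sum()

# Enumerate all 2^n sign assignments for the fixed midranks
n = len(d)
count = 0
for signs in product([0, 1], repeat=n):
    if np.dot(signs, ranks) >= w_plus_obs:
        count += 1

p_greater = count / 2**n                       # P(W+ >= observed) under H0
print(w_plus_obs, p_greater)
```

With these data 56 of the 1024 sign assignments give $W^+ \ge 44$, so the tie-aware one-sided p-value is $56/1024 \approx 0.0547$. This is slightly larger than the value from the no-ties table and leads to the same decision at the 5% level.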