Seminar 10

Correlation Coefficients: Theory and Comparison¶

Correlation measures the strength and direction of association between two variables.

We study three main coefficients:

  • Pearson correlation (linear dependence)
  • Spearman correlation (monotonic dependence via ranks)
  • Kendall correlation (pairwise concordance)

1. General Idea of Correlation¶

Let $(X, Y)$ be two random variables.

A correlation coefficient $\rho$ is a number in $[-1,1]$ such that:

  • $\rho = 1$ → perfect positive association
  • $\rho = -1$ → perfect negative association
  • $\rho = 0$ → no association (not necessarily independence!)

2. Pearson Correlation Coefficient¶

Definition¶

The Pearson correlation coefficient is defined as:

$$ \rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} $$

where:

  • $\mathrm{Cov}(X,Y) = \mathbb{E}[(X - \mathbb{E}X)(Y - \mathbb{E}Y)]$
  • $\sigma_X^2 = \mathrm{Var}(X)$
  • $\sigma_Y^2 = \mathrm{Var}(Y)$

Sample Version¶

Given data $(x_1,y_1), \dots, (x_n,y_n)$:

$$ r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}} $$
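
The sample formula can be checked numerically. Here is a minimal sketch (the helper name `pearson_r` and the toy data are illustrative, not from the seminar), verified against `np.corrcoef`:

```python
import numpy as np

def pearson_r(x, y):
    # direct implementation of the sample formula above
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(pearson_r(x, y))  # agrees with np.corrcoef(x, y)[0, 1]
```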

Interpretation¶

  • Measures linear relationship
  • Sensitive to scale and outliers
  • Based on covariance (second-order moments)

Properties¶

  • $-1 \le r \le 1$
  • Invariant under affine transformations with positive slopes: $$ X' = aX + b, \quad Y' = cY + d, \qquad a, c > 0 $$ (if $ac < 0$, only the sign of $r$ flips)

Limitations¶

  • Only captures linear relationships
  • Highly sensitive to outliers
  • Requires finite variance

3. Spearman Rank Correlation¶

Definition¶

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a sample.

Denote:

  • $R_i = R(X_i)$ = rank of $X_i$
  • $S_i = R(Y_i)$ = rank of $Y_i$

Then the Spearman rank correlation coefficient is defined as:

$$ \rho_s = \frac{\sum_{i=1}^n (R_i - \bar{R})(S_i - \bar{S})} {\sqrt{\sum_{i=1}^n (R_i - \bar{R})^2 \sum_{i=1}^n (S_i - \bar{S})^2}} $$

where:

  • $\bar{R} = \frac{1}{n} \sum_{i=1}^n R_i$
  • $\bar{S} = \frac{1}{n} \sum_{i=1}^n S_i$

Interpretation¶

This is simply the Pearson correlation applied to the ranks:

$$ \rho_s = \mathrm{Corr}(R(X), R(Y)) $$

Theorem (No Ties Case)¶

If there are no ties (all ranks are distinct), then:

$$ \rho_s = 1 - \frac{6}{n^3 - n} \sum_{i=1}^n (R_i - S_i)^2 $$
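
The theorem is easy to verify numerically (a sketch with made-up permutation data): with no ties, the closed form agrees with the Pearson correlation of the ranks.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
n = 15
x = rng.permutation(n) + 1.0   # distinct values -> no ties
y = rng.permutation(n) + 1.0

R, S = rankdata(x), rankdata(y)
rho_ranks = np.corrcoef(R, S)[0, 1]              # Pearson on ranks
rho_formula = 1 - 6 * np.sum((R - S) ** 2) / (n**3 - n)
print(rho_ranks, rho_formula)  # the two values agree
```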

Mathematical Derivation¶

Step 1: Properties of ranks¶

When there are no ties, ranks are:

$$ R_i, S_i \in \{1, 2, \dots, n\} $$

So:

$$ \bar{R} = \bar{S} = \frac{n+1}{2} $$

Step 2: Variance of ranks¶

We compute:

$$ \sum_{i=1}^n (R_i - \bar{R})^2 = \sum_{i=1}^n \left(R_i - \frac{n+1}{2}\right)^2 $$

Using the known formula:

$$ \sum_{i=1}^n R_i^2 = \frac{n(n+1)(2n+1)}{6} $$

we obtain:

$$ \sum_{i=1}^n (R_i - \bar{R})^2 = \frac{n(n^2 - 1)}{12} $$

The same holds for $S_i$.
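
Both facts can be verified numerically (the choice $n = 10$ is arbitrary):

```python
import numpy as np

n = 10
R = np.arange(1, n + 1)                 # ranks 1, ..., n
print(R.mean())                         # (n + 1) / 2 = 5.5
print(np.sum((R - R.mean()) ** 2))      # n(n^2 - 1) / 12 = 82.5
```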


Step 3: Expand covariance¶

Consider:

$$ \sum_{i=1}^n (R_i - \bar{R})(S_i - \bar{S}) $$

Use the identity:

$$ (R_i - S_i)^2 = (R_i - \bar{R})^2 + (S_i - \bar{S})^2 - 2(R_i - \bar{R})(S_i - \bar{S}) $$

Summing over $i$:

$$ \sum (R_i - S_i)^2 = \sum (R_i - \bar{R})^2 + \sum (S_i - \bar{S})^2 - 2 \sum (R_i - \bar{R})(S_i - \bar{S}) $$

Step 4: Solve for covariance¶

Since both variances are equal:

$$ \sum (R_i - \bar{R})^2 = \sum (S_i - \bar{S})^2 = \frac{n(n^2 - 1)}{12} $$

we get:

$$ \sum (R_i - \bar{R})(S_i - \bar{S}) = \frac{1}{2} \left[ 2 \cdot \frac{n(n^2 - 1)}{12} - \sum (R_i - S_i)^2 \right] $$

Simplifying:

$$ = \frac{n(n^2 - 1)}{12} - \frac{1}{2} \sum (R_i - S_i)^2 $$

Step 5: Substitute into Pearson formula¶

Recall:

$$ \rho_s = \frac{\sum (R_i - \bar{R})(S_i - \bar{S})} {\sqrt{\sum (R_i - \bar{R})^2 \sum (S_i - \bar{S})^2}} $$

Denominator:

$$ \sqrt{ \left(\frac{n(n^2 - 1)}{12}\right)^2 } = \frac{n(n^2 - 1)}{12} $$

So:

$$ \rho_s = \frac{ \frac{n(n^2 - 1)}{12} - \frac{1}{2} \sum (R_i - S_i)^2 } { \frac{n(n^2 - 1)}{12} } $$

Step 6: Final simplification¶

Divide:

$$ \rho_s = 1 - \frac{6}{n(n^2 - 1)} \sum (R_i - S_i)^2 $$

or equivalently:

$$ \rho_s = 1 - \frac{6}{n^3 - n} \sum_{i=1}^n (R_i - S_i)^2 $$

Key Insight¶

  • Spearman correlation is fundamentally Pearson correlation on ranks
  • The simplified formula exists only because ranks are a permutation of $\{1,\dots,n\}$
  • This allows closed-form expressions for mean and variance

Important Remark¶

If ties exist, then:

  • ranks are no longer a permutation
  • variance formulas change
  • the simplified formula is invalid

In that case, always use:

$$ \rho_s = \mathrm{Corr}(R(X), R(Y)) $$
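
A sketch of this recommendation (the small dataset with ties is made up): compute $\rho_s$ as the Pearson correlation of the average ranks; `scipy.stats.spearmanr` does the same internally.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

x = np.array([1.0, 2.0, 2.0, 3.0, 4.0])   # tie in x
y = np.array([1.0, 3.0, 2.0, 3.0, 5.0])   # tie in y

# rankdata assigns average ranks to tied values
rho_s = np.corrcoef(rankdata(x), rankdata(y))[0, 1]
print(rho_s, spearmanr(x, y)[0])  # the two values agree
```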

Interpretation¶

  • Measures monotonic relationships
  • Captures nonlinear relationships if monotonic
  • More robust than Pearson

Properties¶

  • Invariant under any strictly increasing transformation
  • Uses only ordinal information

Limitations¶

  • Less efficient than Pearson for linear Gaussian data
  • Ties require correction

4. Kendall's Tau¶

Idea¶

Based on comparing pairs of observations.

For pairs $(i,j)$:

  • Concordant if: $$ (x_i - x_j)(y_i - y_j) > 0 $$
  • Discordant if: $$ (x_i - x_j)(y_i - y_j) < 0 $$

Definition¶

$$ \tau = \frac{C - D}{\binom{n}{2}} $$

where:

  • $C$ = number of concordant pairs
  • $D$ = number of discordant pairs
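
The definition can be implemented directly by counting pairs (an $O(n^2)$ sketch on made-up, tie-free data), and compared against `scipy.stats.kendalltau`:

```python
import numpy as np
from itertools import combinations
from scipy.stats import kendalltau

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

C = D = 0
for i, j in combinations(range(len(x)), 2):
    s = (x[i] - x[j]) * (y[i] - y[j])
    C += s > 0   # concordant pair
    D += s < 0   # discordant pair

n = len(x)
tau = (C - D) / (n * (n - 1) / 2)
print(tau, kendalltau(x, y)[0])  # the two values agree
```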

Interpretation¶

  • Probability of concordance minus discordance: $$ \tau = P(\text{concordant}) - P(\text{discordant}) $$

Properties¶

  • Range: $[-1,1]$
  • Strongly connected to probability theory
  • Robust and interpretable

Advantages¶

  • Most robust to noise and outliers
  • Strong probabilistic meaning

Limitations¶

  • Computationally more expensive (the naive pair count is $O(n^2)$; $O(n \log n)$ algorithms exist)
  • Less sensitive to fine structure than Pearson

5. Comparison of Pearson, Spearman, Kendall¶

| Property | Pearson | Spearman | Kendall |
| --- | --- | --- | --- |
| Measures | Linear dependence | Monotonic dependence | Pairwise agreement |
| Uses | Raw values | Ranks | Pair comparisons |
| Sensitive to outliers | Yes | Less | Very low |
| Captures nonlinear | No | Yes (monotonic) | Yes (monotonic) |
| Interpretation | Covariance-based | Rank correlation | Probability of concordance |
| Efficiency (normal data) | Highest | Medium | Lower |
| Robustness | Low | Medium | High |

6. When to Use Which?¶

Use Pearson if:¶

  • Relationship is linear
  • Data is Gaussian-like
  • No strong outliers

Use Spearman if:¶

  • Relationship is monotonic but nonlinear
  • Data contains outliers
  • Only order matters

Use Kendall if:¶

  • Small sample size
  • Need probabilistic interpretation
  • Strong robustness required

7. Key Insight¶

  • Pearson → geometry (angles, covariance)
  • Spearman → order (ranks)
  • Kendall → probability (pairwise comparisons)

8. Important Remark¶

Correlation does NOT imply causation.

Also: $$ \rho = 0 \nRightarrow X \text{ and } Y \text{ are independent} $$

Example: $Y = X^2$ with symmetric $X$ gives zero Pearson correlation but strong dependence.
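
This example is easy to reproduce numerically:

```python
import numpy as np

x = np.linspace(-3, 3, 101)   # symmetric around 0
y = x ** 2                    # y is a deterministic function of x
r = np.corrcoef(x, y)[0, 1]
print(r)                      # numerically ~0 despite full dependence
```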


Interactive Visualization of Correlation Coefficients¶

In this section we illustrate the difference between:

  • Pearson correlation: measures linear association
  • Spearman correlation: measures monotonic association using ranks
  • Kendall correlation: measures pairwise agreement through concordant and discordant pairs

The goal of these plots is to build intuition:

  • two variables may have strong dependence but low Pearson correlation,
  • Spearman and Kendall can detect monotonic nonlinear relationships,
  • outliers can severely affect Pearson correlation.

In [4]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import pearsonr, spearmanr, kendalltau, rankdata
from itertools import combinations

1. Utility function to compute the three correlations¶

In [5]:
def correlation_summary(x, y):
    pearson = pearsonr(x, y)[0]
    spearman = spearmanr(x, y)[0]
    kendall = kendalltau(x, y)[0]
    return pearson, spearman, kendall

2. Example 1: Linear relationship¶

Here all three coefficients should be large and positive. Pearson performs especially well because the relationship is linear.

In [6]:
np.random.seed(42)

n = 100
x = np.linspace(0, 10, n)
y = 2 * x + np.random.normal(scale=2, size=n)

pearson, spearman, kendall = correlation_summary(x, y)

df = pd.DataFrame({"x": x, "y": y})

fig = px.scatter(
    df, x="x", y="y", trendline="ols",
    title=f"Linear Relationship<br>Pearson={pearson:.3f}, Spearman={spearman:.3f}, Kendall={kendall:.3f}"
)
fig.show()

3. Example 2: Monotonic but nonlinear relationship¶

Here the relationship is increasing, but not linear. So:

  • Pearson may be high, but not perfect
  • Spearman and Kendall should still be very strong
In [7]:
np.random.seed(42)

x = np.linspace(0.1, 10, n)
y = np.log(x) + np.random.normal(scale=0.08, size=n)

pearson, spearman, kendall = correlation_summary(x, y)

df = pd.DataFrame({"x": x, "y": y})

fig = px.scatter(
    df, x="x", y="y",
    title=f"Monotonic but Nonlinear Relationship<br>Pearson={pearson:.3f}, Spearman={spearman:.3f}, Kendall={kendall:.3f}"
)
fig.show()

4. Example 3: Non-monotonic relationship¶

This is a key example.

Take something like $y = x^2 + \text{noise}$ with $x$ symmetric around $0$. Then there is a clear dependence, but it is not monotonic.

In such a case:

  • Pearson may be close to $0$
  • Spearman may also be close to $0$
  • Kendall may also be close to $0$

So all of them can fail to detect dependence if the dependence is not linear/monotonic.

In [8]:
np.random.seed(42)

x = np.linspace(-3, 3, n)
y = x**2 + np.random.normal(scale=0.8, size=n)

pearson, spearman, kendall = correlation_summary(x, y)

df = pd.DataFrame({"x": x, "y": y})

fig = px.scatter(
    df, x="x", y="y",
    title=f"Non-monotonic Relationship<br>Pearson={pearson:.3f}, Spearman={spearman:.3f}, Kendall={kendall:.3f}"
)
fig.show()

5. Example 4: Effect of an outlier¶

Pearson is very sensitive to outliers. Spearman and Kendall are typically much more stable.

In [9]:
np.random.seed(42)

x = np.linspace(0, 10, n)
y = x + np.random.normal(scale=1.0, size=n)

# add one extreme outlier
x_out = np.append(x, [10])
y_out = np.append(y, [40])

pearson, spearman, kendall = correlation_summary(x_out, y_out)

df = pd.DataFrame({"x": x_out, "y": y_out})

fig = px.scatter(
    df, x="x", y="y", trendline="ols",
    title=f"Linear Relationship with Outlier<br>Pearson={pearson:.3f}, Spearman={spearman:.3f}, Kendall={kendall:.3f}"
)
fig.show()

6. Interactive comparison on several datasets¶

This final plot compares the three coefficients across several common dependence structures.

In [12]:
np.random.seed(42)
n = 200

datasets = {}

# Linear
x1 = np.linspace(0, 10, n)
y1 = 3 * x1 + np.random.normal(scale=3, size=n)
datasets["Linear"] = (x1, y1)

# Monotonic nonlinear
x2 = np.linspace(0.1, 10, n)
y2 = np.sqrt(x2) + np.random.normal(scale=0.12, size=n)
datasets["Monotonic nonlinear"] = (x2, y2)

# Non-monotonic
x3 = np.linspace(-3, 3, n)
y3 = x3**2 + np.random.normal(scale=0.7, size=n)
datasets["Non-monotonic"] = (x3, y3)

# Linear with outlier
x4 = np.linspace(0, 10, n)
y4 = x4 + np.random.normal(scale=1.0, size=n)
x4 = np.append(x4, 10)
y4 = np.append(y4, 40)
datasets["With outlier"] = (x4, y4)

rows = []
for name, (xv, yv) in datasets.items():
    p, s, k = correlation_summary(xv, yv)
    rows.append({
        "Dataset": name,
        "Pearson": p,
        "Spearman": s,
        "Kendall": k
    })

corr_df = pd.DataFrame(rows)
corr_long = corr_df.melt(id_vars="Dataset", var_name="Coefficient", value_name="Value")

fig = px.bar(
    corr_long,
    x="Dataset",
    y="Value",
    color="Coefficient",
    barmode="group",
    title="Comparison of Pearson, Spearman, and Kendall Across Different Dependence Structures"
)
fig.show()

Statistical Tests for Correlation Coefficients¶

We now study how to test whether a correlation is statistically significant.

In all cases, the goal is to test:

$$ H_0: \text{no association} \quad \text{vs} \quad H_1: \text{association exists} $$

Depending on the coefficient, this translates into different mathematical hypotheses.


1. Pearson Correlation Test¶

Hypotheses¶

We test:

$$ H_0: \rho = 0 \quad \text{vs} \quad H_1: \rho \ne 0 $$

where $\rho$ is the population Pearson correlation.


Assumptions¶

  • $(X,Y)$ are i.i.d. samples
  • Joint distribution is bivariate normal
  • Finite variance

Test Statistic¶

Given sample correlation $r$, define:

$$ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} $$

Distribution Under $H_0$¶

$$ t \sim t_{n-2} $$

(Student's t-distribution with $n-2$ degrees of freedom)
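
A sketch of the full test on simulated data (the simulation parameters are arbitrary): the manually computed two-sided p-value matches the one returned by `scipy.stats.pearsonr`.

```python
import numpy as np
from scipy.stats import pearsonr, t as t_dist

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

r = np.corrcoef(x, y)[0, 1]
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_value = 2 * t_dist.sf(abs(t_stat), df=n - 2)   # two-sided p-value
print(p_value, pearsonr(x, y)[1])  # the two p-values agree
```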


Decision Rule¶

Reject $H_0$ if:

$$ |t| > t_{n-2, 1-\alpha/2} $$

or equivalently, if the p-value is small.


Interpretation¶

  • This test is equivalent to testing the slope in simple linear regression: $$ Y = \beta_0 + \beta_1 X + \varepsilon $$ with: $$ H_0: \beta_1 = 0 $$

2. Spearman Rank Correlation Test¶

Hypotheses¶

$$ H_0: \rho_s = 0 \quad \text{vs} \quad H_1: \rho_s \ne 0 $$

Exact Distribution (Small Samples)¶

For small $n$, the distribution of $\rho_s$ can be computed exactly (via permutations).


Asymptotic Approximation¶

For large $n$, we use:

$$ t = \frac{\rho_s \sqrt{n - 2}}{\sqrt{1 - \rho_s^2}} $$

and approximate:

$$ t \approx t_{n-2} $$

Alternative Approximation (Normal)¶

Another approximation:

$$ \sqrt{n-1} \, \rho_s \approx \mathcal{N}(0,1) $$
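
The two large-sample approximations can be compared on simulated data (a sketch; the parameters are arbitrary). For moderate $n$ they give similar, though not identical, p-values.

```python
import numpy as np
from scipy.stats import spearmanr, t as t_dist, norm

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = x + rng.normal(size=n)

rho_s = spearmanr(x, y)[0]

# t approximation
t_stat = rho_s * np.sqrt(n - 2) / np.sqrt(1 - rho_s**2)
p_t = 2 * t_dist.sf(abs(t_stat), df=n - 2)

# normal approximation
z = np.sqrt(n - 1) * rho_s
p_norm = 2 * norm.sf(abs(z))

print(p_t, p_norm)
```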

Key Idea¶

Since Spearman is Pearson on ranks:

$$ \rho_s = \mathrm{Corr}(R(X), R(Y)) $$

we are effectively testing linear correlation between ranks.


Advantages¶

  • No assumption of normality
  • Works for monotonic relationships

Limitations¶

  • Less efficient than Pearson under Gaussian assumptions
  • Ties complicate distribution

3. Kendall's Tau Test¶

Hypotheses¶

$$ H_0: \tau = 0 \quad \text{vs} \quad H_1: \tau \ne 0 $$

Definition Recall¶

$$ \tau = \frac{C - D}{\binom{n}{2}} $$

Asymptotic Distribution¶

Under $H_0$, for large $n$:

$$ \tau \approx \mathcal{N}(0, \sigma^2) $$

where:

$$ \sigma^2 = \frac{2(2n+5)}{9n(n-1)} $$

Test Statistic¶

$$ Z = \frac{\tau}{\sqrt{\sigma^2}} $$

Then:

$$ Z \sim \mathcal{N}(0,1) $$

Decision Rule¶

Reject $H_0$ if:

$$ |Z| > z_{1-\alpha/2} $$
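
A sketch of the whole Z-test on simulated data (parameters arbitrary); note that `scipy.stats.kendalltau` may use an exact or corrected method for small samples, so its p-value need not coincide exactly with this normal approximation.

```python
import numpy as np
from scipy.stats import kendalltau, norm

rng = np.random.default_rng(3)
n = 40
x = rng.normal(size=n)
y = x + rng.normal(size=n)

tau = kendalltau(x, y)[0]
sigma2 = 2 * (2 * n + 5) / (9 * n * (n - 1))   # variance under H0
Z = tau / np.sqrt(sigma2)
p_value = 2 * norm.sf(abs(Z))                  # two-sided p-value
print(Z, p_value)
```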

Interpretation¶

Recall:

$$ \tau = P(\text{concordant}) - P(\text{discordant}) $$

So testing $\tau = 0$ means:

$$ P(\text{concordant}) = P(\text{discordant}) $$

4. Comparison of Tests¶

| Feature | Pearson Test | Spearman Test | Kendall Test |
| --- | --- | --- | --- |
| Null hypothesis | $\rho=0$ | $\rho_s=0$ | $\tau=0$ |
| Distribution | $t_{n-2}$ | approx. $t$ or normal | normal |
| Assumptions | Normality | None | None |
| Measures | Linear dependence | Monotonic dependence | Pairwise concordance |
| Robustness | Low | Medium | High |

5. Efficiency Comparison¶

Under normality:

  • Pearson is most efficient
  • Spearman ≈ 91% efficiency of Pearson
  • Kendall ≈ 83% efficiency of Pearson

6. Summary¶

  • Pearson → parametric test, strongest assumptions
  • Spearman → semi-parametric, uses ranks
  • Kendall → nonparametric, uses pairwise comparisons

7. Final Warning¶

Failure to reject $H_0$ does NOT imply independence.

It only means:

  • no linear dependence (Pearson),
  • no monotonic dependence (Spearman/Kendall).

Problem¶

The GDP growth rates of Russia for the years 2006–2012
(in percent relative to 2005) are:

$$ 108.2,\ 117.4,\ 123.5,\ 113.9,\ 119.0,\ 124.1,\ 128.4,\ 108.5,\ 118.0,\ 124.2,\ 114.5,\ 119.6,\ 124.6,\ 128.7. $$

The corresponding indicators for Belarus are:

$$ 107,\ 116,\ 118,\ 101,\ 105,\ 111,\ 111,\ 108,\ 117,\ 121,\ 103,\ 108,\ 114,\ 115. $$

Test the hypothesis that these two samples are independent.

Use three rank/association measures:

  • Pearson correlation coefficient,
  • Spearman rank correlation coefficient,
  • Kendall's tau.

Interpret the results and state whether there is evidence against independence.

In [14]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr, kendalltau

# Data
russia = np.array([108.2, 117.4, 123.5, 113.9, 119.0, 124.1, 128.4,
                   108.5, 118.0, 124.2, 114.5, 119.6, 124.6, 128.7])

belarus = np.array([107, 116, 118, 101, 105, 111, 111,
                    108, 117, 121, 103, 108, 114, 115])

# Compute coefficients and p-values
pearson_corr, pearson_p = pearsonr(russia, belarus)
spearman_corr, spearman_p = spearmanr(russia, belarus)
kendall_corr, kendall_p = kendalltau(russia, belarus)

print("Pearson correlation:")
print(f"  coefficient = {pearson_corr:.6f}")
print(f"  p-value     = {pearson_p:.6f}\n")

print("Spearman correlation:")
print(f"  coefficient = {spearman_corr:.6f}")
print(f"  p-value     = {spearman_p:.6f}\n")

print("Kendall correlation:")
print(f"  coefficient = {kendall_corr:.6f}")
print(f"  p-value     = {kendall_p:.6f}")

# Scatter plot
plt.figure(figsize=(7, 5))
plt.scatter(russia, belarus)
plt.xlabel("Russia GDP growth rate")
plt.ylabel("Belarus GDP growth rate")
plt.title("Scatter Plot of GDP Growth Rates")
plt.grid(True)
plt.show()
Pearson correlation:
  coefficient = 0.554708
  p-value     = 0.039516

Spearman correlation:
  coefficient = 0.537446
  p-value     = 0.047474

Kendall correlation:
  coefficient = 0.411136
  p-value     = 0.042188

Full Solution¶

We are given two samples of equal size:

$$ X = \text{GDP growth rates of Russia}, \qquad Y = \text{GDP growth rates of Belarus}, $$

with sample size

$$ n = 14. $$

We want to test whether the two samples are independent.


1. Idea of the approach¶

If two variables are independent, then in particular there should be no systematic association between them.

To investigate this, we compute three different coefficients:

  1. Pearson correlation coefficient
  2. Spearman rank correlation coefficient
  3. Kendall's tau

They measure different kinds of association:

  • Pearson detects primarily linear association
  • Spearman detects monotone association
  • Kendall detects pairwise ordinal agreement

If all three coefficients are significantly positive or negative, this is evidence against independence.


2. Data¶

Russia:

$$ x = (108.2,117.4,123.5,113.9,119.0,124.1,128.4,108.5,118.0,124.2,114.5,119.6,124.6,128.7) $$

Belarus:

$$ y = (107,116,118,101,105,111,111,108,117,121,103,108,114,115) $$

3. Pearson correlation coefficient¶

Definition¶

The sample Pearson correlation coefficient is

$$ r = \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})} {\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n (y_i-\bar{y})^2}}. $$

It measures the strength of the linear relationship between the variables.


Hypotheses¶

For Pearson correlation we test:

$$ H_0: \rho = 0 \qquad \text{vs} \qquad H_1: \rho \ne 0. $$

Under independence, we must have $\rho=0$, so this is a natural test.


Test statistic¶

If the joint distribution is approximately bivariate normal, then under $H_0$:

$$ T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}. $$

Here $n=14$, so the number of degrees of freedom is

$$ n-2=12. $$

After computing $r$, we substitute it into this formula and obtain the test statistic.

If the corresponding p-value is small, we reject $H_0$.


4. Spearman rank correlation coefficient¶

Definition¶

Let $R_i$ be the rank of $x_i$ among $x_1,\dots,x_n$, and let $S_i$ be the rank of $y_i$ among $y_1,\dots,y_n$.

The Spearman correlation coefficient is defined by

$$ \rho_s = \frac{\sum_{i=1}^n (R_i-\bar{R})(S_i-\bar{S})} {\sqrt{\sum_{i=1}^n (R_i-\bar{R})^2}\sqrt{\sum_{i=1}^n (S_i-\bar{S})^2}}. $$

So Spearman correlation is simply Pearson correlation applied to the ranks.


Interpretation¶

Spearman correlation measures whether the relationship is monotone:

  • if large values of $X$ tend to correspond to large values of $Y$, then $\rho_s > 0$,
  • if large values of $X$ tend to correspond to small values of $Y$, then $\rho_s < 0$.

It is less sensitive to outliers than Pearson correlation.


Hypotheses¶

We test

$$ H_0: \rho_s = 0 \qquad \text{vs} \qquad H_1: \rho_s \ne 0. $$

For small samples one may use exact permutation distributions; in practice we often use the p-value returned by statistical software.


5. Kendall's tau¶

Definition¶

For every pair $(i,j)$ with $i<j$, compare the relative order of $x_i,x_j$ and $y_i,y_j$.

A pair is:

  • concordant if $$ (x_i-x_j)(y_i-y_j) > 0, $$
  • discordant if $$ (x_i-x_j)(y_i-y_j) < 0. $$

Kendall's tau is

$$ \tau = \frac{C-D}{\binom{n}{2}}, $$

where:

  • $C$ = number of concordant pairs,
  • $D$ = number of discordant pairs.

With ties, software typically applies a tie-corrected version of Kendall's tau (for example, tau-b, the default in SciPy's `kendalltau`).


Interpretation¶

Kendall's tau measures the tendency of the two variables to move in the same order.

It has the probabilistic meaning

$$ \tau = P(\text{concordance}) - P(\text{discordance}), $$

at least in the ideal no-tie case.


Hypotheses¶

We test

$$ H_0: \tau = 0 \qquad \text{vs} \qquad H_1: \tau \ne 0. $$

Again, the p-value can be computed directly using software.


6. Numerical results¶

After computing the three coefficients, we obtain:

  • Pearson correlation: $r \approx 0.55$
  • Spearman correlation: $\rho_s \approx 0.54$
  • Kendall's tau: $\tau \approx 0.41$

These values are all positive and moderately large.

So all three methods suggest a positive association between the GDP growth rates of Russia and Belarus in this dataset.


7. Statistical conclusion¶

Using the corresponding hypothesis tests, the p-values (approximately $0.040$ for Pearson, $0.047$ for Spearman, and $0.042$ for Kendall) all fall below the conventional significance level $\alpha = 0.05$, so we reject the null hypothesis of no association.

Hence, the data provide evidence that the two samples are not independent.


8. Final answer¶

The GDP growth rates of Russia and Belarus show a statistically significant positive association.

Therefore, based on Pearson, Spearman, and Kendall correlation analysis, we reject the hypothesis of independence of the two samples.


9. Remark¶

Strictly speaking:

  • zero correlation does not always imply independence,
  • but significant nonzero Pearson/Spearman/Kendall correlations provide evidence against independence.

Thus, in this problem, the observed positive correlations support the conclusion that the samples are not independent.