A categorical (qualitative) variable takes values in a finite set of categories (labels), e.g. eye color (blue, brown, green) or letter grade (A, B, C, D, F).
Data for categorical variables are summarized by counts (frequencies).
Suppose a variable has $k$ categories. We observe counts $$ O_1, O_2, \dots, O_k, $$ with total sample size $$ n = \sum_{i=1}^k O_i. $$
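As a quick illustration (the color labels below are made up), raw categorical data can be tallied into the counts $O_i$ with `collections.Counter`:

```python
from collections import Counter

# Hypothetical raw categorical observations (labels are illustrative)
data = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(data)    # frequency O_i of each category
n = sum(counts.values())  # total sample size n

print(counts)  # Counter({'red': 3, 'blue': 2, 'green': 1})
print(n)       # 6
```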
A hypothesis test compares a null hypothesis $H_0$ (the default claim) against an alternative hypothesis $H_1$.
Given a test statistic $T$ with observed value $t_{\text{obs}}$, the p-value is the probability, computed under $H_0$, of seeing a value of $T$ at least as extreme as $t_{\text{obs}}$.
Decision: reject $H_0$ if the p-value is below the significance level $\alpha$; otherwise, fail to reject $H_0$.
Chi-square tests are built from the idea: compare observed counts to expected counts under $H_0$.
Let $E_i$ be the expected count in category $i$ under $H_0$.
A natural measure of discrepancy is $$ \sum_{i=1}^k (O_i - E_i)^2, $$ but this depends on the scale of $E_i$. So we standardize by dividing by $E_i$: $$ \chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}. $$
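The statistic is a one-liner in Python; the observed and expected counts below are made up for illustration:

```python
# Hypothetical counts for k = 4 categories
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]  # E_i under some H0

# X^2 = sum_i (O_i - E_i)^2 / E_i
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2_stat)  # ≈ 12.32
```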
Under mild conditions (large enough expected counts), and assuming $H_0$ is true, $$ \chi^2 \ \approx\ \chi^2(\text{df}), $$ a chi-square distribution with appropriate degrees of freedom.
Why “approx”? Because the chi-square distribution is an asymptotic (large-sample) result based on the normal approximation to the counts: when each $E_i$ is large, the standardized deviation $(O_i - E_i)/\sqrt{E_i}$ is approximately $\mathcal{N}(0,1)$.
If $Z_1,\dots,Z_\nu$ are independent standard normals, $Z_j \sim \mathcal{N}(0,1)$, then $$ Q = \sum_{j=1}^{\nu} Z_j^2 $$ follows a chi-square distribution with $\nu$ degrees of freedom: $$ Q \sim \chi^2(\nu). $$
Properties: $\mathbb{E}[Q] = \nu$ and $\mathrm{Var}(Q) = 2\nu$; the distribution is supported on $[0, \infty)$, right-skewed, and approximately normal when $\nu$ is large.
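The construction can be checked by simulation (a minimal sketch using NumPy): sum $\nu$ squared standard normals many times and compare the empirical mean and variance to $\nu$ and $2\nu$.

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 4                                  # degrees of freedom
z = rng.standard_normal((100_000, nu))  # independent N(0, 1) draws
q = (z ** 2).sum(axis=1)                # Q = sum of nu squared normals

print(q.mean())  # ≈ nu = 4
print(q.var())   # ≈ 2 * nu = 8
```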
Test whether one categorical variable follows a specified distribution.
Suppose there are $k$ categories. Under $H_0$ we assume probabilities $$ p_1, p_2, \dots, p_k,\quad p_i \ge 0,\quad \sum_{i=1}^k p_i = 1. $$
If we observe $n$ independent outcomes, the count vector $(O_1,\dots,O_k)$ follows a multinomial distribution under $H_0$: $$ (O_1,\dots,O_k) \sim \mathrm{Multinomial}\left(n; p_1,\dots,p_k\right). $$
Under $H_0$, expected counts are: $$ E_i = n p_i,\quad i=1,\dots,k. $$
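A minimal sketch (probabilities made up) that draws one multinomial sample with NumPy and compares it to the expected counts $E_i = n p_i$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
p = [0.5, 0.3, 0.2]  # hypothesized category probabilities (illustrative)

counts = rng.multinomial(n, p)   # one draw of (O_1, O_2, O_3) under H0
expected = [n * pi for pi in p]  # E_i = n * p_i

print(counts)    # random counts summing to n = 600
print(expected)  # [300.0, 180.0, 120.0]
```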
If $p_1,\dots,p_k$ are fully specified (no parameters estimated from the data), then $$ \text{df} = k - 1. $$
If the model contains $m$ unknown parameters estimated from the data, then: $$ \text{df} = k - 1 - m. $$
Explanation of $k-1$: counts sum to $n$, so only $k-1$ counts are free.
Estimating parameters uses up additional constraints, reducing df further.
Compute $\chi^2_{\text{obs}}$ from the sample. Under $H_0$: $$ \chi^2_{\text{obs}} \approx \chi^2(\text{df}). $$
Reject $H_0$ if p-value $< \alpha$.
Example (fair die): $k=6$ categories, $p_i = 1/6$, df $= 5$.
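The fair-die example can be run end to end with SciPy's built-in `chisquare` (the roll counts below are hypothetical):

```python
from scipy.stats import chisquare

# Hypothetical counts from n = 120 rolls of a die
observed = [15, 25, 18, 22, 20, 20]
expected = [120 / 6] * 6  # E_i = n * (1/6) = 20 for every face

stat, p_value = chisquare(observed, f_exp=expected)
print(stat)     # ≈ 2.9, compared against chi2(df = 5)
print(p_value)  # large p-value: no evidence the die is unfair
```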
Test whether two categorical variables are independent.
Example: is breastfeeding duration associated with an ASD diagnosis, or is dental insurance coverage related to company size? (Both are worked problems below.)
Let variable $A$ have $r$ categories and variable $B$ have $c$ categories.
Observed counts $O_{ij}$ arranged in an $r \times c$ table.
Row sums: $$ O_{i\cdot} = \sum_{j=1}^c O_{ij} $$ Column sums: $$ O_{\cdot j} = \sum_{i=1}^r O_{ij} $$ Total: $$ n = \sum_{i=1}^r \sum_{j=1}^c O_{ij}. $$
Null hypothesis (no association / independence): $$ H_0: A \text{ and } B \text{ are independent.} $$ Formally, for all $i,j$: $$ P(A=i, B=j) = P(A=i)P(B=j). $$
Alternative hypothesis: $$ H_1: A \text{ and } B \text{ are not independent (associated).} $$
Under independence, $$ P(A=i, B=j) = P(A=i)P(B=j). $$ Estimate $P(A=i)$ and $P(B=j)$ by sample proportions: $$ \widehat{P}(A=i) = \frac{O_{i\cdot}}{n}, \quad \widehat{P}(B=j) = \frac{O_{\cdot j}}{n}. $$
Thus the expected count in cell $(i,j)$ is: $$ E_{ij} = n \cdot \widehat{P}(A=i)\widehat{P}(B=j) = n \cdot \frac{O_{i\cdot}}{n}\cdot \frac{O_{\cdot j}}{n} = \frac{O_{i\cdot} O_{\cdot j}}{n}. $$
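A minimal sketch of the formula $E_{ij} = O_{i\cdot} O_{\cdot j} / n$ on a made-up $2\times 2$ table:

```python
# Hypothetical 2x2 table of observed counts
table = [[30, 10],
         [20, 40]]

row_totals = [sum(row) for row in table]        # O_i. = [40, 60]
col_totals = [sum(col) for col in zip(*table)]  # O_.j = [50, 50]
n = sum(row_totals)                             # n = 100

# E_ij = row total * column total / grand total
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
print(expected)  # [[20.0, 20.0], [30.0, 30.0]]
```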
Why? An $r\times c$ table has $rc$ cells, but the estimated margins impose constraints: the $r$ row totals and the $c$ column totals are fixed, and one of these $r + c$ constraints is redundant (row totals and column totals both sum to $n$), leaving $r + c - 1$ effective constraints.
Hence free cells: $$ rc - (r+c-1) = (r-1)(c-1). $$
Under $H_0$: $$ \chi^2_{\text{obs}} \approx \chi^2((r-1)(c-1)). $$ p-value: $$ \text{p-value} = P\left(\chi^2(\text{df}) \ge \chi^2_{\text{obs}}\right). $$
Reject $H_0$ if p-value $< \alpha$.
Each observation (person, trial, unit) should contribute to exactly one cell and be independent of others.
Common rule of thumb: every expected count should be at least 5 ($E_{ij} \ge 5$).
More nuanced guideline (Cochran): at least 80% of the expected counts should be $\ge 5$, and none should fall below 1.
If violated: combine sparse or adjacent categories to raise the expected counts, or use an exact test (e.g., Fisher's exact test for small tables).
Very similar to independence, but framing differs: a test of independence cross-classifies one sample by two variables, while a test of homogeneity draws separate samples from several populations and asks whether the category distribution is the same in each. The expected counts, statistic, and degrees of freedom are computed identically.
(If needed: analyze standardized residuals to see which cells contribute most.)
An instructor claims that the grade distribution of their students is different from the department’s grade distribution.
The department-wide grade distribution for introductory statistics courses is:

| Grade | A | B | C | D | F |
|---|---|---|---|---|---|
| Proportion | 0.35 | 0.23 | 0.25 | 0.10 | 0.07 |
A random sample of 250 introductory statistics students taught by this instructor produced the following grades:

| Grade | A | B | C | D | F |
|---|---|---|---|---|---|
| Observed | 80 | 50 | 58 | 38 | 24 |
Using a 5% level of significance, test the instructor’s claim that their students’ grade distribution differs from the department’s distribution.
import math
from scipy.stats import chi2
def chi_square_gof_test(
observed,
expected=None,
probs=None,
n=None,
ddof=0, # extra parameters estimated from data (e.g., ddof=1 if you estimated 1 parameter)
alpha=0.05
):
"""
Chi-square Goodness-of-Fit (GoF) test.
H0: The data follow the specified categorical distribution.
Test statistic:
X^2 = sum_i (O_i - E_i)^2 / E_i
Degrees of freedom:
df = k - 1 - ddof
where k = number of categories, ddof = number of parameters estimated from the data.
Inputs
------
observed : list/tuple of nonnegative counts O_i
expected : list/tuple of expected counts E_i (same length as observed), optional
probs : list/tuple of category probabilities p_i (same length as observed), optional
n : total sample size (required if probs is provided and observed does not already sum to n)
ddof : int, number of fitted parameters (reduces df)
alpha : significance level
Provide either:
- expected, OR
- probs (then expected counts are computed as E_i = n * p_i)
Returns
-------
dict with statistic, df, p-value, and critical region decision.
"""
# ---------- Validate observed ----------
if observed is None:
raise ValueError("`observed` must be provided.")
if len(observed) < 2:
raise ValueError("Need at least 2 categories.")
if any(o < 0 for o in observed):
raise ValueError("Observed counts must be nonnegative.")
k = len(observed)
obs_sum = sum(observed)
# ---------- Build expected counts ----------
if expected is not None and probs is not None:
raise ValueError("Provide only one of `expected` or `probs`, not both.")
if expected is not None:
if len(expected) != k:
raise ValueError("`expected` must have the same length as `observed`.")
if any(e <= 0 for e in expected):
raise ValueError("All expected counts must be > 0.")
exp = list(expected)
elif probs is not None:
if len(probs) != k:
raise ValueError("`probs` must have the same length as `observed`.")
if any(p < 0 for p in probs):
raise ValueError("Probabilities must be nonnegative.")
p_sum = sum(probs)
if p_sum <= 0:
raise ValueError("Sum of probabilities must be > 0.")
# normalize just in case
probs = [p / p_sum for p in probs]
if n is None:
n = obs_sum
if n <= 0:
raise ValueError("`n` must be positive.")
exp = [n * p for p in probs]
else:
raise ValueError("Provide either `expected` or `probs`.")
# ---------- (Optional) sanity check: totals ----------
# In GoF, typically sum(expected) == sum(observed) == n
# If expected provided directly, we won't force equality, but we can warn via return field.
exp_sum = sum(exp)
totals_match = math.isclose(exp_sum, obs_sum, rel_tol=1e-9, abs_tol=1e-9)
# ---------- Compute chi-square statistic ----------
chi2_obs = 0.0
for o, e in zip(observed, exp):
if e <= 0:
raise ValueError("All expected counts must be > 0.")
chi2_obs += (o - e) ** 2 / e
# ---------- Degrees of freedom ----------
df = k - 1 - ddof
if df <= 0:
raise ValueError("Degrees of freedom must be positive. Check k and ddof.")
# ---------- p-value method ----------
p_value = 1 - chi2.cdf(chi2_obs, df)
reject_by_pvalue = p_value < alpha
# ---------- Critical region method ----------
chi2_crit = chi2.ppf(1 - alpha, df)
reject_by_critical = chi2_obs > chi2_crit
critical_region = f"X^2 > {chi2_crit:.4f}"
return {
"inputs": {
"observed": list(observed),
"expected": exp,
"alpha": alpha,
"ddof": ddof
},
"sanity_checks": {
"sum_observed": obs_sum,
"sum_expected": exp_sum,
"totals_match": totals_match
},
"statistic": {
"chi2_obs": chi2_obs,
"df": df
},
"p_value_method": {
"p_value": p_value,
"reject_H0": reject_by_pvalue
},
"critical_region_method": {
"critical_region": critical_region,
"chi2_crit": chi2_crit,
"reject_H0": reject_by_critical
}
}
# Observed counts
observed = [80, 50, 58, 38, 24]
# Hypothesized probabilities
probs = [0.35, 0.23, 0.25, 0.1, 0.07]
# Run Chi-square GoF test
result_probs = chi_square_gof_test(
observed=observed,
probs=probs,
alpha=0.05
)
result_probs
{'inputs': {'observed': [80, 50, 58, 38, 24],
'expected': [87.5, 57.5, 62.5, 25.0, 17.5],
'alpha': 0.05,
'ddof': 0},
'sanity_checks': {'sum_observed': 250,
'sum_expected': 250.0,
'totals_match': True},
'statistic': {'chi2_obs': 11.119403726708075, 'df': 4},
'p_value_method': {'p_value': 0.025254353833125798, 'reject_H0': True},
'critical_region_method': {'critical_region': 'X^2 > 9.4877',
'chi2_crit': 9.487729036781154,
'reject_H0': True}}
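As a cross-check, the same problem through SciPy's built-in `chisquare` should reproduce the statistic and p-value computed above:

```python
from scipy.stats import chisquare

observed = [80, 50, 58, 38, 24]
probs = [0.35, 0.23, 0.25, 0.10, 0.07]
expected = [250 * p for p in probs]  # E_i = n * p_i

stat, p_value = chisquare(observed, f_exp=expected)
print(stat)     # ≈ 11.1194
print(p_value)  # ≈ 0.0253 < 0.05, so reject H0
```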
A research company is investigating whether the proportion of consumers who purchase a cereal is different depending on shelf placement.
They consider four shelf locations: Bottom, Middle, Top, and End.
Test whether there is a preference among the four shelf placements. Use the p-value method with significance level $\alpha = 0.05$.
The observed counts are:
| Shelf Placement | Bottom | Middle | Top | End |
|---|---|---|---|---|
| Observed | 45 | 67 | 55 | 73 |
# Observed counts
observed = [45, 67, 55, 73]
# Hypothesized probabilities
probs = [0.25, 0.25, 0.25, 0.25]
# Run Chi-square GoF test
result_probs = chi_square_gof_test(
observed=observed,
probs=probs,
alpha=0.05
)
result_probs
{'inputs': {'observed': [45, 67, 55, 73],
'expected': [60.0, 60.0, 60.0, 60.0],
'alpha': 0.05,
'ddof': 0},
'sanity_checks': {'sum_observed': 240,
'sum_expected': 240.0,
'totals_match': True},
'statistic': {'chi2_obs': 7.800000000000001, 'df': 3},
'p_value_method': {'p_value': 0.050331097859853346, 'reject_H0': False},
'critical_region_method': {'critical_region': 'X^2 > 7.8147',
'chi2_crit': 7.814727903251179,
'reject_H0': False}}
Is there a relationship between autism spectrum disorder (ASD) and breastfeeding?
To investigate this question, a researcher asked mothers of ASD and non-ASD children to report the length of time they breastfed their children.
Do the data provide enough evidence to conclude that breastfeeding and ASD are associated (that is, not independent)?
Conduct the test at the 1% significance level.
The observed data are summarized in the contingency table below.
| ASD | None | Less than 2 months | 2 to 6 months | Over 6 months | Total |
|---|---|---|---|---|---|
| Yes | 241 | 198 | 164 | 215 | 818 |
| No | 20 | 25 | 27 | 44 | 116 |
| Total | 261 | 223 | 191 | 259 | 934 |
(Source: Schultz, Klonoff-Cohen, Wingard, Askhoomoff, Macera, Ji & Bacher, 2006.)
import math
from scipy.stats import chi2
def chi_square_independence_test(
table,
alpha=0.05
):
"""
Chi-square Test of Independence (No Association).
H0: The two categorical variables are independent.
H1: The two categorical variables are associated.
Input
-----
table : 2D list or array
Contingency table of observed counts.
Shape: (r rows) x (c columns)
alpha : significance level
Test statistic:
X^2 = sum_{i,j} (O_ij - E_ij)^2 / E_ij
Expected counts:
E_ij = (row_i_total * column_j_total) / grand_total
Degrees of freedom:
df = (r - 1)(c - 1)
Uses BOTH:
(1) p-value method
(2) critical region method
"""
# ---------- Validate table ----------
if table is None or len(table) < 2:
raise ValueError("Table must have at least 2 rows.")
r = len(table)
c = len(table[0])
if c < 2:
raise ValueError("Table must have at least 2 columns.")
for row in table:
if len(row) != c:
raise ValueError("All rows must have the same number of columns.")
if any(x < 0 for x in row):
raise ValueError("Counts must be nonnegative.")
# ---------- Totals ----------
row_totals = [sum(row) for row in table]
col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
grand_total = sum(row_totals)
if grand_total == 0:
raise ValueError("Grand total must be positive.")
# ---------- Expected counts ----------
expected = [
[(row_totals[i] * col_totals[j]) / grand_total for j in range(c)]
for i in range(r)
]
# ---------- Chi-square statistic ----------
chi2_obs = 0.0
for i in range(r):
for j in range(c):
if expected[i][j] == 0:
raise ValueError("Expected count is zero — cannot compute χ².")
chi2_obs += (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
# ---------- Degrees of freedom ----------
df = (r - 1) * (c - 1)
# ---------- p-value method ----------
p_value = 1 - chi2.cdf(chi2_obs, df)
reject_by_pvalue = p_value < alpha
# ---------- Critical region method ----------
chi2_crit = chi2.ppf(1 - alpha, df)
reject_by_critical = chi2_obs > chi2_crit
critical_region = f"X^2 > {chi2_crit:.4f}"
# ---------- Return results ----------
return {
"inputs": {
"observed_table": table,
"alpha": alpha
},
"expected_counts": expected,
"statistic": {
"chi2_obs": chi2_obs,
"df": df
},
"p_value_method": {
"p_value": p_value,
"reject_H0": reject_by_pvalue
},
"critical_region_method": {
"critical_region": critical_region,
"chi2_crit": chi2_crit,
"reject_H0": reject_by_critical
}
}
Problem: Chi-Square Test of Independence (ASD and Breastfeeding)
table_2x4 = [
[241, 198, 164, 215],
[20, 25, 27, 44]
]
result_2x4 = chi_square_independence_test(
table=table_2x4,
alpha=0.01
)
result_2x4
{'inputs': {'observed_table': [[241, 198, 164, 215], [20, 25, 27, 44]],
'alpha': 0.01},
'expected_counts': [[228.5845824411135,
195.30406852248393,
167.27837259100642,
226.83297644539616],
[32.41541755888651,
27.69593147751606,
23.721627408993577,
32.16702355460385]],
'statistic': {'chi2_obs': 11.216688008237018, 'df': 3},
'p_value_method': {'p_value': 0.01061005135825377, 'reject_H0': False},
'critical_region_method': {'critical_region': 'X^2 > 11.3449',
'chi2_crit': 11.344866730144373,
'reject_H0': False}}
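The hand-rolled result can be verified against SciPy's `chi2_contingency` (tuple unpacking keeps this compatible across SciPy versions; `correction=False` disables the Yates correction, which SciPy applies only to 2×2 tables anyway):

```python
from scipy.stats import chi2_contingency

table = [[241, 198, 164, 215],
         [20, 25, 27, 44]]

stat, p, dof, expected = chi2_contingency(table, correction=False)
print(stat)  # ≈ 11.2167
print(p)     # ≈ 0.0106 > 0.01, so fail to reject H0 at the 1% level
print(dof)   # 3
```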
The sample data below show the number of companies providing dental insurance for small, medium, and large companies.
Test whether there is a relationship between dental insurance coverage and company size. Use $\alpha = 0.05$.
The observed data are:
| Dental Insurance | Small | Medium | Large |
|---|---|---|---|
| Yes | 21 | 25 | 19 |
| No | 46 | 39 | 10 |
# 2×3 contingency table (Observed counts)
# Dental insurance (Yes/No) vs company size (Small/Medium/Large)
table_2x3 = [
[21, 25, 19],
[46, 39, 10]
]
result_2x3 = chi_square_independence_test(
table=table_2x3,
alpha=0.05
)
result_2x3
{'inputs': {'observed_table': [[21, 25, 19], [46, 39, 10]], 'alpha': 0.05},
'expected_counts': [[27.21875, 26.0, 11.78125], [39.78125, 38.0, 17.21875]],
'statistic': {'chi2_obs': 9.907263903850843, 'df': 2},
'p_value_method': {'p_value': 0.007057728990733203, 'reject_H0': True},
'critical_region_method': {'critical_region': 'X^2 > 5.9915',
'chi2_crit': 5.991464547107979,
'reject_H0': True}}
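Since $H_0$ is rejected here, the standardized (Pearson) residuals $(O_{ij} - E_{ij})/\sqrt{E_{ij}}$ show which cells drive the association; a minimal sketch:

```python
import math

# Dental insurance (rows: Yes, No) vs company size (cols: Small, Medium, Large)
table = [[21, 25, 19],
         [46, 39, 10]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson residual: (O_ij - E_ij) / sqrt(E_ij)
residuals = [
    [(table[i][j] - row_totals[i] * col_totals[j] / n)
     / math.sqrt(row_totals[i] * col_totals[j] / n)
     for j in range(len(col_totals))]
    for i in range(len(row_totals))
]
for row in residuals:
    print([round(x, 2) for x in row])
```

The largest residual (about +2.1) sits in the (Yes, Large) cell: large companies provide dental insurance noticeably more often than independence would predict.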