In data science, we constantly face questions of this form: is an observed effect real, or could it be explained by chance alone?
Statistical hypothesis testing provides a principled framework for answering such questions under uncertainty.
Let
$$
X = (X_1, \dots, X_n)
$$
be observed data, modeled as a random sample from a distribution
$$
X_i \sim P_\theta, \quad \theta \in \Theta,
$$
where $\theta$ is an unknown parameter and $\Theta$ is the set of possible parameter values (the parameter space).
Examples: the success probability $p$ of a Bernoulli model, or the mean $\mu$ of a normal model.
A hypothesis is a statement about the parameter $\theta$.
Formally: $$ H_0: \theta \in \Theta_0, \quad H_1: \theta \in \Theta_1, $$ where $$ \Theta_0 \cap \Theta_1 = \varnothing, \quad \Theta_0 \cup \Theta_1 \subseteq \Theta. $$
Two-sided $$ H_0: \theta = \theta_0, \quad H_1: \theta \neq \theta_0 $$
One-sided $$ H_0: \theta \le \theta_0, \quad H_1: \theta > \theta_0 $$
The alternative must be chosen before seeing the data.
A statistical test is a decision rule $$ \varphi(X) = \begin{cases} 1 & \text{reject } H_0 \\ 0 & \text{do not reject } H_0 \end{cases} $$
Equivalently, define a rejection region $\mathcal{R}$: $$ \varphi(X) = 1 \iff X \in \mathcal{R}. $$
| Decision \ Truth | $H_0$ true | $H_1$ true |
|---|---|---|
| Reject $H_0$ | Type I error | Correct |
| Do not reject $H_0$ | Correct | Type II error |
A test has significance level $\alpha$ if its Type I error probability is at most $\alpha$ over the whole null. Formally: $$ \sup_{\theta \in \Theta_0} \mathbb{P}_\theta(X \in \mathcal{R}) \le \alpha $$
The power of a test is $$ \pi(\theta) = \mathbb{P}_\theta(\text{reject } H_0) = 1 - \beta(\theta), \quad \theta \in \Theta_1 $$ where $\beta(\theta)$ is the probability of a Type II error at $\theta$.
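Power is rarely computed by hand in practice; a quick way to see what $\pi(\theta)$ means is to estimate it by simulation: generate many datasets at a fixed $\theta$ in the alternative and record how often $H_0$ is rejected. A minimal sketch for the right-tailed proportion test developed later in these notes (the sample size, true proportion, and simulation count are illustrative choices):

```python
import random
from math import sqrt

def simulated_power(p_true, p0=0.5, n=100, n_sim=10_000, seed=0):
    """Monte Carlo estimate of the power of the right-tailed z-test
    for a proportion at level alpha = 0.05."""
    rng = random.Random(seed)
    z_crit = 1.6449  # z_{0.95}, the 0.95 quantile of N(0,1)
    rejections = 0
    for _ in range(n_sim):
        s = sum(rng.random() < p_true for _ in range(n))  # successes
        p_hat = s / n
        z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
        rejections += (z >= z_crit)
    return rejections / n_sim
```

At $p_{\text{true}} = p_0$ the estimated rejection rate is close to $\alpha$, and it grows toward 1 as $p_{\text{true}}$ moves away from $p_0$, which is exactly the shape of the power function.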
A test statistic is a function $$ T = T(X) $$ summarizing evidence against $H_0$.
Examples: the z-statistic for a proportion and the t-statistic for a mean, both developed below.
Decision rule: $$ \text{Reject } H_0 \iff T \in \mathcal{C} $$
Key idea:
The distribution of the test statistic under $H_0$ is known or approximable.
Let $$ F_0(t) = \mathbb{P}_{H_0}(T \le t) $$
Critical value $c_\alpha$ satisfies $$ \mathbb{P}_{H_0}(T \ge c_\alpha) = \alpha $$
The p-value is $$ p = \mathbb{P}_{H_0}(T \ge T_{\text{obs}}) $$
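For a statistic that is standard normal under $H_0$ (the case used throughout this section), both the critical value and the p-value come straight from $\Phi$; the observed value below is illustrative:

```python
from scipy.stats import norm

alpha = 0.05
c_alpha = norm.ppf(1 - alpha)   # critical value: P_{H0}(T >= c_alpha) = alpha
t_obs = 1.8                      # illustrative observed statistic
p = 1 - norm.cdf(t_obs)          # p-value: P_{H0}(T >= t_obs)
print(round(c_alpha, 4), round(p, 4))  # 1.6449 0.0359
```

Since $p > \alpha$ here (equivalently, $T_{\text{obs}} < c_\alpha$), this illustrative test would not reject $H_0$.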
The one-sample test for proportions is used to test a claim about a single population proportion. Typical data science questions involve a single rate, for example whether a conversion or click-through rate differs from a target value.
Let
$$
X_1, \dots, X_n \sim \text{Bernoulli}(p),
$$
where $p \in (0,1)$ is the unknown probability of success in each trial.
The total number of successes: $$ S = \sum_{i=1}^n X_i \sim \text{Binomial}(n, p) $$
The sample proportion: $$ \hat{p} = \frac{S}{n} $$
We test: $$ H_0: p = p_0 $$
against one of the following alternatives:
Two-sided $$ H_1: p \neq p_0 $$
Right-tailed $$ H_1: p > p_0 $$
Left-tailed $$ H_1: p < p_0 $$
The alternative must be chosen before seeing the data.
We have: $$ \mathbb{E}[\hat{p}] = p, \qquad \mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n} $$
Under $H_0$: $$ \mathbb{E}[\hat{p}] = p_0, \qquad \mathrm{Var}(\hat{p}) = \frac{p_0(1-p_0)}{n} $$
By the Central Limit Theorem: $$ \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \;\xrightarrow{d}\; \mathcal{N}(0,1) $$
Under $H_0$: $$ Z = \frac{\hat{p} - p_0} {\sqrt{p_0(1-p_0)/n}} \;\approx\; \mathcal{N}(0,1) $$
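A quick simulation illustrates this approximation: standardized sample proportions generated under $H_0$ should have mean near 0 and standard deviation near 1 (the $n$, $p_0$, and simulation count below are illustrative):

```python
import random
from math import sqrt

def standardized_stats(p0=0.3, n=200, n_sim=5000, seed=1):
    """Simulate Z = (p_hat - p0)/sqrt(p0(1-p0)/n) under H0
    and return the empirical mean and standard deviation."""
    rng = random.Random(seed)
    se = sqrt(p0 * (1 - p0) / n)
    zs = []
    for _ in range(n_sim):
        p_hat = sum(rng.random() < p0 for _ in range(n)) / n
        zs.append((p_hat - p0) / se)
    mean = sum(zs) / n_sim
    var = sum((z - mean) ** 2 for z in zs) / (n_sim - 1)
    return mean, sqrt(var)
```

With moderate $n$ the empirical distribution of $Z$ is already close to $\mathcal{N}(0,1)$; for small $n$ or extreme $p_0$ the discreteness of $S$ becomes visible and the approximation degrades.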
The normal approximation is valid when: $$ np_0 \ge 5 \quad \text{and} \quad n(1-p_0) \ge 5 $$
(Thresholds like 5 or 10 are common rules of thumb.)
If these conditions fail, use the exact binomial test (described below) rather than the normal approximation.
The one-sample z-statistic for proportions is: $$ Z = \frac{\hat{p} - p_0} {\sqrt{p_0(1-p_0)/n}} $$
This statistic measures how many standard deviations $\hat{p}$ is away from $p_0$ under $H_0$.
Let $\alpha$ be the significance level.
Reject $H_0$ if: $$ |Z| \ge z_{1-\alpha/2} $$
Reject $H_0$ if: $$ Z \ge z_{1-\alpha} $$
Reject $H_0$ if: $$ Z \le -z_{1-\alpha} $$
Two-sided $$ p\text{-value} = 2\bigl(1 - \Phi(|Z_{\text{obs}}|)\bigr) $$
Right-tailed $$ p\text{-value} = 1 - \Phi(Z_{\text{obs}}) $$
Left-tailed $$ p\text{-value} = \Phi(Z_{\text{obs}}) $$
where $\Phi$ is the CDF of $\mathcal{N}(0,1)$.
When $n$ is small or $p_0$ is extreme, use the exact distribution of the number of successes under $H_0$: $$ S \sim \text{Binomial}(n, p_0) $$ and compute p-values exactly from the binomial distribution rather than a normal approximation.
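In Python this exact test is available as `scipy.stats.binomtest`; the counts below are illustrative:

```python
from scipy.stats import binomtest

# Exact test of H0: p = 0.5 with 7 successes out of 10 trials
res = binomtest(7, n=10, p=0.5, alternative="two-sided")
print(res.pvalue)  # exact binomial p-value, no normal approximation
```

For $p_0 = 0.5$ the two-sided exact p-value is simply $\mathbb{P}(S \ge 7) + \mathbb{P}(S \le 3) = 0.34375$ by the symmetry of the Binomial(10, 0.5) distribution.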
A $(1-\alpha)$ confidence interval for $p$: $$ \hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
Relationship:
$p_0$ is rejected by the two-sided test at level $\alpha$
(approximately) iff $p_0$ is outside the $(1-\alpha)$ confidence interval. The correspondence is only approximate here because the test standardizes with the null standard error $\sqrt{p_0(1-p_0)/n}$, while the interval uses the estimated standard error $\sqrt{\hat{p}(1-\hat{p})/n}$.
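A small numeric check of this relationship, computing the Wald interval directly (the counts are illustrative):

```python
from math import sqrt
from scipy.stats import norm

# Illustrative data: 60 successes in 100 trials, alpha = 0.05
n, p_hat, alpha = 100, 0.60, 0.05
z = norm.ppf(1 - alpha / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - margin, p_hat + margin
print(f"95% Wald CI: ({lo:.4f}, {hi:.4f})")  # 95% Wald CI: (0.5040, 0.6960)
```

Here $p_0 = 0.5$ falls just outside the interval, matching the two-sided z-test, which gives $Z = 0.10 / 0.05 = 2.0 > 1.96$.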
```python
import math
from scipy.stats import norm


def one_sample_proportion_test(
    data=None,
    p_hat=None,
    n=None,
    p0=0.5,
    alternative="two-sided",
    alpha=0.05,
    success_values=("1", "success", "yes", "true")
):
    """
    One-sample Z-test for proportions.

    Parameters
    ----------
    data : str, optional
        Raw data as a whitespace- or comma-separated string
        (e.g. "1 0 1 1 0", "success failure success").
    p_hat : float, optional
        Sample proportion (used if data is not provided).
    n : int, optional
        Sample size (required if p_hat is provided).
    p0 : float
        Null hypothesis proportion.
    alternative : {"two-sided", "less", "greater"}
        Type of alternative hypothesis.
    alpha : float
        Significance level.
    success_values : tuple
        Values interpreted as "success" in the data string.

    Returns
    -------
    dict
        Test results.
    """
    # ---- Case 1: raw data is given ----
    if data is not None:
        tokens = data.lower().replace(",", " ").split()
        n = len(tokens)
        x = sum(token in success_values for token in tokens)
        p_hat = x / n
    # ---- Case 2: p_hat and n are given ----
    elif p_hat is not None and n is not None:
        x = p_hat * n
    else:
        raise ValueError("Provide either `data` or both `p_hat` and `n`.")

    # ---- Z statistic ----
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z_obs = (p_hat - p0) / standard_error

    # ---- p-value ----
    if alternative == "two-sided":
        p_value = 2 * (1 - norm.cdf(abs(z_obs)))
    elif alternative == "greater":
        p_value = 1 - norm.cdf(z_obs)
    elif alternative == "less":
        p_value = norm.cdf(z_obs)
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")

    # ---- Decision ----
    reject = p_value < alpha
    return {
        "n": n,
        "x": x,
        "p_hat": p_hat,
        "z_obs": z_obs,
        "p_value": p_value,
        "alpha": alpha,
        "reject_H0": reject
    }
```
```python
result = one_sample_proportion_test(
    data="1 0 1 1 0 1 1 0 1 1",
    p0=0.5,
    alternative="two-sided"
)
print(result)
```

```
{'n': 10, 'x': 7, 'p_hat': 0.7, 'z_obs': 1.2649110640673513, 'p_value': 0.2059032107320684, 'alpha': 0.05, 'reject_H0': False}
```
```python
result = one_sample_proportion_test(
    data="success failure success success failure",
    p0=0.6,
    alternative="greater"
)
print(result)
```

```
{'n': 5, 'x': 3, 'p_hat': 0.6, 'z_obs': 0.0, 'p_value': 0.5, 'alpha': 0.05, 'reject_H0': False}
```
It has been found that 85.6% of all enrolled college and university students in the United States are undergraduates. A random sample of 500 enrolled college students in a particular state revealed that 420 of them were undergraduates. Is there sufficient evidence to conclude that the proportion differs from the national percentage? Use $\alpha= 0.05$.
```python
result = one_sample_proportion_test(
    n=500,
    p_hat=(420 / 500),
    p0=0.856,
    alternative="two-sided"
)
print(result)
```

```
{'n': 500, 'x': 420.0, 'p_hat': 0.84, 'z_obs': -1.0190297341929058, 'p_value': 0.30818885050252565, 'alpha': 0.05, 'reject_H0': False}
```
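As a sanity check, the reported statistic can be reproduced directly from the z formula:

```python
from math import sqrt

p_hat, p0, n = 0.84, 0.856, 500
se = sqrt(p0 * (1 - p0) / n)   # null standard error
z = (p_hat - p0) / se
print(round(z, 4))  # -1.019
```

Since $|Z| < 1.96$ and the p-value exceeds 0.05, we fail to reject $H_0$: the sample does not provide sufficient evidence that the state's proportion of undergraduates differs from the national 85.6%.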
The one-sample test for the mean is used to test a claim about the population mean based on a single sample. Typical data science questions involve a single average, for example whether a mean measurement differs from a specified target value.
Let $$ X_1, \dots, X_n \;\text{i.i.d.}\; \sim P $$ with: $$ \mathbb{E}[X_i] = \mu, \qquad \mathrm{Var}(X_i) = \sigma^2 $$
The parameter of interest is the population mean $\mu$.
The sample mean is $$ \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i $$
We test $$ H_0: \mu = \mu_0 $$
against one of the following alternatives:
Two-sided $$ H_1: \mu \neq \mu_0 $$
Right-tailed $$ H_1: \mu > \mu_0 $$
Left-tailed $$ H_1: \mu < \mu_0 $$
The alternative must be chosen before observing the data.
We have: $$ \mathbb{E}[\bar{X}] = \mu, \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} $$
Under $H_0$: $$ \bar{X} \sim \mathcal{N}\left(\mu_0, \frac{\sigma^2}{n}\right) $$
Define: $$ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} $$
Under $H_0$: $$ Z \sim \mathcal{N}(0,1) $$
Let $\alpha$ be the significance level.
Two-sided $$ |Z| \ge z_{1-\alpha/2} $$
Right-tailed $$ Z \ge z_{1-\alpha} $$
Left-tailed $$ Z \le -z_{1-\alpha} $$
Let $Z_{\text{obs}}$ be the observed value of the test statistic.
Two-sided $$ p\text{-value} = 2\bigl(1 - \Phi(|Z_{\text{obs}}|)\bigr) $$
Right-tailed $$ p\text{-value} = 1 - \Phi(Z_{\text{obs}}) $$
Left-tailed $$ p\text{-value} = \Phi(Z_{\text{obs}}) $$
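For completeness, the known-$\sigma$ case fits in a few lines. This is a sketch, with the illustrative convention that `alternative` takes the same three values used elsewhere in these notes:

```python
from math import sqrt
from scipy.stats import norm

def one_sample_ztest_mean(x_bar, sigma, n, mu0, alternative="two-sided"):
    """One-sample z-test for the mean with known sigma.
    Returns (z_obs, p_value)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    if alternative == "two-sided":
        p = 2 * (1 - norm.cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - norm.cdf(z)
    else:  # "less"
        p = norm.cdf(z)
    return z, p
```

For example, $\bar{X} = 52$, $\sigma = 10$, $n = 25$, $\mu_0 = 50$ gives $Z = 1.0$ and a right-tailed p-value of about 0.159.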
In practice the population variance $\sigma^2$ is usually unknown; this is the most common real-world situation.
The sample variance is: $$ S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 $$
Define: $$ T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} $$
Under $H_0$: $$ T \sim t_{n-1} $$ (Student’s t-distribution with $n-1$ degrees of freedom)
Replacing $\sigma$ with the random variable $S$ introduces extra uncertainty.
The t-distribution has heavier tails than the standard normal, reflecting this extra uncertainty, and it converges to $\mathcal{N}(0,1)$ as $n \to \infty$.
Let $t_{n-1,\,1-\alpha}$ denote the $(1-\alpha)$ quantile of $t_{n-1}$.
Two-sided $$ |T| \ge t_{n-1,\,1-\alpha/2} $$
Right-tailed $$ T \ge t_{n-1,\,1-\alpha} $$
Left-tailed $$ T \le -t_{n-1,\,1-\alpha} $$
Let $T_{\text{obs}}$ be the observed statistic.
Two-sided $$ p\text{-value} = 2\bigl(1 - F_{t_{n-1}}(|T_{\text{obs}}|)\bigr) $$
Right-tailed $$ p\text{-value} = 1 - F_{t_{n-1}}(T_{\text{obs}}) $$
Left-tailed $$ p\text{-value} = F_{t_{n-1}}(T_{\text{obs}}) $$
For both tests: $$ \text{Reject } H_0 \iff p\text{-value} \le \alpha $$
Relationship:
$H_0: \mu = \mu_0$ is rejected at level $\alpha$
iff $\mu_0$ is outside the $(1-\alpha)$ confidence interval $\bar{X} \pm t_{n-1,\,1-\alpha/2}\, S/\sqrt{n}$.
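A small helper for this interval, matching the formula $\bar{X} \pm t_{n-1,\,1-\alpha/2}\, S/\sqrt{n}$ (the numbers in the usage note are illustrative):

```python
from math import sqrt
from scipy.stats import t

def t_confidence_interval(x_bar, s, n, alpha=0.05):
    """(1 - alpha) t-based confidence interval for the mean."""
    margin = t.ppf(1 - alpha / 2, n - 1) * s / sqrt(n)
    return x_bar - margin, x_bar + margin
```

For example, `t_confidence_interval(10, 2, 16)` gives roughly $(8.93, 11.07)$; any $\mu_0$ inside that interval would not be rejected by the two-sided t-test at $\alpha = 0.05$.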
```python
import math
from scipy.stats import t


def one_sample_ttest(
    data=None,
    x_bar=None,
    s=None,
    n=None,
    mu0=0.0,
    alternative="two-sided",  # "two-sided", "greater", "less"
    alpha=0.05
):
    """
    One-sample t-test for the population mean.

    Uses both:
      (1) p-value method
      (2) critical region method
    """
    # ---------- Parse input ----------
    if data is not None:
        values = [float(x) for x in data.replace(",", " ").split()]
        n = len(values)
        if n < 2:
            raise ValueError("Sample size must be at least 2.")
        x_bar = sum(values) / n
        s = math.sqrt(
            sum((x - x_bar) ** 2 for x in values) / (n - 1)
        )
    elif x_bar is not None and s is not None and n is not None:
        if n < 2:
            raise ValueError("Sample size must be at least 2.")
    else:
        raise ValueError("Provide either `data` OR (`x_bar`, `s`, `n`).")

    # ---------- Test statistic ----------
    se = s / math.sqrt(n)
    t_obs = (x_bar - mu0) / se
    df = n - 1

    # ---------- p-value method ----------
    if alternative == "two-sided":
        p_value = 2 * (1 - t.cdf(abs(t_obs), df))
    elif alternative == "greater":
        p_value = 1 - t.cdf(t_obs, df)
    elif alternative == "less":
        p_value = t.cdf(t_obs, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'.")
    reject_by_pvalue = (p_value < alpha)

    # ---------- Critical region method ----------
    if alternative == "two-sided":
        t_crit = t.ppf(1 - alpha / 2, df)
        reject_by_critical = abs(t_obs) > t_crit
        critical_region = f"|T| > {t_crit:.4f}"
    elif alternative == "greater":
        t_crit = t.ppf(1 - alpha, df)
        reject_by_critical = t_obs > t_crit
        critical_region = f"T > {t_crit:.4f}"
    else:  # "less"
        t_crit = t.ppf(alpha, df)
        reject_by_critical = t_obs < t_crit
        critical_region = f"T < {t_crit:.4f}"

    # ---------- Return results ----------
    return {
        "inputs": {
            "n": n,
            "x_bar": x_bar,
            "s": s,
            "mu0": mu0,
            "alternative": alternative,
            "alpha": alpha
        },
        "statistic": {
            "t_obs": t_obs,
            "df": df,
            "se": se
        },
        "p_value_method": {
            "p_value": p_value,
            "reject_H0": reject_by_pvalue
        },
        "critical_region_method": {
            "critical_region": critical_region,
            "t_crit": t_crit,
            "reject_H0": reject_by_critical
        }
    }
```
The bumblebee bat (also known as Kitti's hog-nosed bat, Craseonycteris thonglongyai) is the world's smallest mammal; its weight is approximately normally distributed with a mean of 1.9 grams. Such bats are roughly the size of a large bumblebee. A chiropterologist believes that the Kitti's hog-nosed bats in a new geographical region under study have a different average weight than 1.9 grams. The weights (in grams) of a sample of 10 bats from the new region are shown below. Use the confidence interval method to test the claim that the mean weight for all bumblebee bats is not 1.9 g using a 10% level of significance.
```python
res = one_sample_ttest(
    data="1.9 2.24 2.13 2 1.54 1.96 1.79 2.18 1.81 2.3",
    mu0=1.9,
    alternative="two-sided",
    alpha=0.1
)
print(res)
```

```
{'inputs': {'n': 10, 'x_bar': 1.9849999999999999, 's': 0.23524219198283478, 'mu0': 1.9, 'alternative': 'two-sided', 'alpha': 0.1}, 'statistic': {'t_obs': 1.1426249638667096, 'df': 9, 'se': 0.07439011284363593}, 'p_value_method': {'p_value': 0.28267920117045664, 'reject_H0': False}, 'critical_region_method': {'critical_region': '|T| > 1.8331', 't_crit': 1.8331129326536333, 'reject_H0': False}}
```
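Since the problem asks for the confidence interval method, the 90% t-interval can be computed from the same summary values:

```python
from math import sqrt
from scipy.stats import t

x_bar, s, n, alpha = 1.985, 0.23524219198283478, 10, 0.10
margin = t.ppf(1 - alpha / 2, n - 1) * s / sqrt(n)
lo, hi = x_bar - margin, x_bar + margin
print(f"90% CI: ({lo:.4f}, {hi:.4f})")  # 90% CI: (1.8486, 2.1214)
```

Because 1.9 g lies inside the interval, we fail to reject $H_0$ at the 10% level: there is not sufficient evidence that the mean weight in the new region differs from 1.9 g.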
The label on a particular brand of cream of mushroom soup states that (on average) there is 870 mg of sodium per serving. A nutritionist would like to test whether the average is actually more than the stated value. To test this, 13 servings of this soup were randomly selected and the amount of sodium was measured. The sample mean was found to be 882.4 mg and the sample standard deviation was 24.3 mg. Assume that the amount of sodium per serving is normally distributed. Test this claim using the traditional method of hypothesis testing. Use the $\alpha = 0.05$ level of significance.
```python
res = one_sample_ttest(
    x_bar=882.4,
    s=24.3,
    n=13,
    mu0=870,
    alternative="greater",
    alpha=0.05
)
print(res)
```

```
{'inputs': {'n': 13, 'x_bar': 882.4, 's': 24.3, 'mu0': 870, 'alternative': 'greater', 'alpha': 0.05}, 'statistic': {'t_obs': 1.8398697866565177, 'df': 12, 'se': 6.7396073841365345}, 'p_value_method': {'p_value': 0.04532103678298238, 'reject_H0': True}, 'critical_region_method': {'critical_region': 'T > 1.7823', 't_crit': 1.7822875556491589, 'reject_H0': True}}
```
The two-sample test for proportions is used when we want to compare two population proportions based on independent samples.
Typical data science questions involve comparing a rate between two groups, for example whether variant B of a page converts better than variant A. This test is a core statistical tool behind A/B testing.
Let $$ X_1, \dots, X_{n_1} \sim \text{Bernoulli}(p_1), \qquad Y_1, \dots, Y_{n_2} \sim \text{Bernoulli}(p_2), $$ where $p_1$ and $p_2$ are the unknown success probabilities and the two samples are independent.
Define the numbers of successes: $$ S_1 = \sum_{i=1}^{n_1} X_i, \qquad S_2 = \sum_{j=1}^{n_2} Y_j $$
Sample proportions: $$ \hat{p}_1 = \frac{S_1}{n_1}, \qquad \hat{p}_2 = \frac{S_2}{n_2} $$
The quantity of interest is the difference of proportions: $$ \Delta = p_1 - p_2 $$
We test: $$ H_0: p_1 = p_2 \quad \text{(equivalently } \Delta = 0 \text{)} $$
Against one of the following alternatives:
Two-sided $$ H_1: p_1 \neq p_2 $$
Right-tailed $$ H_1: p_1 > p_2 $$
Left-tailed $$ H_1: p_1 < p_2 $$
The alternative must be chosen before observing the data.
We have: $$ \mathbb{E}[\hat{p}_1 - \hat{p}_2] = p_1 - p_2 $$ $$ \mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} $$
Under $H_0: p_1 = p_2 = p$, the common proportion is estimated by the pooled estimator: $$ \hat{p} = \frac{S_1 + S_2}{n_1 + n_2} $$
This pooling reflects the assumption that both samples come from the same population under $H_0$.
By the Central Limit Theorem, under $H_0$: $$ Z = \frac{(\hat{p}_1 - \hat{p}_2)} {\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \;\approx\; \mathcal{N}(0,1) $$
The normal approximation is valid when: $$ n_1\hat{p} \ge 5,\quad n_1(1-\hat{p}) \ge 5, $$ $$ n_2\hat{p} \ge 5,\quad n_2(1-\hat{p}) \ge 5 $$
If these conditions fail, use an exact method such as Fisher's exact test instead of the normal approximation.
The two-sample z-statistic for proportions is: $$ Z = \frac{\hat{p}_1 - \hat{p}_2} {\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
This measures how many standard deviations the observed difference is from zero under $H_0$.
Let $\alpha$ be the significance level.
Reject $H_0$ if: $$ |Z| \ge z_{1-\alpha/2} $$
Reject $H_0$ if: $$ Z \ge z_{1-\alpha} $$
Reject $H_0$ if: $$ Z \le -z_{1-\alpha} $$
Let $Z_{\text{obs}}$ be the observed value of the test statistic.
Two-sided $$ p\text{-value} = 2\bigl(1 - \Phi(|Z_{\text{obs}}|)\bigr) $$
Right-tailed $$ p\text{-value} = 1 - \Phi(Z_{\text{obs}}) $$
Left-tailed $$ p\text{-value} = \Phi(Z_{\text{obs}}) $$
A $(1-\alpha)$ confidence interval for $p_1 - p_2$: $$ (\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2} \sqrt{ \frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2} } $$
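A short sketch of this interval (the inputs in the usage note are illustrative):

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_ci(p1_hat, n1, p2_hat, n2, alpha=0.05):
    """(1 - alpha) CI for p1 - p2 using the unpooled standard error."""
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = norm.ppf(1 - alpha / 2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se
```

For example, `two_proportion_ci(0.5, 100, 0.4, 100)` gives roughly $(-0.037, 0.237)$; since the interval contains 0, it is consistent with failing to reject $H_0: p_1 = p_2$ at the 5% level.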
Important: this interval uses the unpooled standard error, while the test statistic uses the pooled one, so the test-interval duality is only approximate here.
In practice, the two-proportion z-test is often complemented or replaced by the chi-squared test on the 2x2 contingency table (equivalent to the two-sided z-test, since $Z^2 = \chi^2$) or by Fisher's exact test for small samples.
A vice principal wants to see whether there is a difference between the number of students who are late to the first class of the day and to the class right after lunch. To test the claim that the proportion of late students differs between the two periods, the vice principal randomly selects 200 students from the first class and records whether they are late, then randomly selects 200 students in their class after lunch and does the same. At the 0.05 level of significance, can a difference be concluded?

| | First Class | After Lunch Class |
|---|---|---|
| Sample size | 200 | 200 |
| Number of late students | 13 | 16 |
```python
import math
from scipy.stats import norm


def two_sample_proportion_ztest(
    data1=None,
    data2=None,
    p1_hat=None,
    n1=None,
    p2_hat=None,
    n2=None,
    diff0=0.0,                # H0: p1 - p2 = diff0 (usually 0)
    alternative="two-sided",  # "two-sided", "greater", "less"
    alpha=0.05,
    success_values=("1", "success", "yes", "true")
):
    """
    Two-sample Z-test for proportions.

    Works with either:
      (A) raw data strings (data1, data2)
      (B) summary inputs (p1_hat, n1, p2_hat, n2)

    Uses BOTH:
      (1) p-value method
      (2) critical region method

    Note: For the classical pooled two-proportion z-test (valid when diff0 = 0),
    we pool the proportions under H0. If diff0 != 0, we use the unpooled SE.
    """
    # ---------- Parse input ----------
    if data1 is not None and data2 is not None:
        tokens1 = data1.lower().replace(",", " ").split()
        tokens2 = data2.lower().replace(",", " ").split()
        n1 = len(tokens1)
        n2 = len(tokens2)
        x1 = sum(tok in success_values for tok in tokens1)
        x2 = sum(tok in success_values for tok in tokens2)
        p1_hat = x1 / n1
        p2_hat = x2 / n2
    elif (p1_hat is not None and n1 is not None and
          p2_hat is not None and n2 is not None):
        x1 = p1_hat * n1
        x2 = p2_hat * n2
    else:
        raise ValueError("Provide either (data1, data2) OR (p1_hat, n1, p2_hat, n2).")

    if n1 <= 0 or n2 <= 0:
        raise ValueError("n1 and n2 must be positive.")
    if not (0 <= p1_hat <= 1) or not (0 <= p2_hat <= 1):
        raise ValueError("p1_hat and p2_hat must be in [0,1].")

    # ---------- Test statistic ----------
    # If H0 is p1 - p2 = 0, use pooled SE (classical two-proportion z-test)
    if diff0 == 0.0:
        p_pool = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        z_obs = (p1_hat - p2_hat - diff0) / se
        se_type = "pooled (H0: p1-p2=0)"
    else:
        # General diff0 != 0: use unpooled SE (common practical approach)
        se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
        z_obs = (p1_hat - p2_hat - diff0) / se
        se_type = "unpooled (general diff0)"

    # ---------- p-value method ----------
    if alternative == "two-sided":
        p_value = 2 * (1 - norm.cdf(abs(z_obs)))
    elif alternative == "greater":
        # H1: p1 - p2 > diff0
        p_value = 1 - norm.cdf(z_obs)
    elif alternative == "less":
        # H1: p1 - p2 < diff0
        p_value = norm.cdf(z_obs)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'.")
    reject_by_pvalue = (p_value < alpha)

    # ---------- Critical region method ----------
    if alternative == "two-sided":
        z_crit = norm.ppf(1 - alpha / 2)
        reject_by_critical = abs(z_obs) > z_crit
        critical_region = f"|Z| > {z_crit:.4f}"
    elif alternative == "greater":
        z_crit = norm.ppf(1 - alpha)
        reject_by_critical = z_obs > z_crit
        critical_region = f"Z > {z_crit:.4f}"
    else:  # "less"
        z_crit = norm.ppf(alpha)
        reject_by_critical = z_obs < z_crit
        critical_region = f"Z < {z_crit:.4f}"

    # ---------- Return results ----------
    return {
        "inputs": {
            "n1": n1, "p1_hat": p1_hat, "x1": x1,
            "n2": n2, "p2_hat": p2_hat, "x2": x2,
            "diff0": diff0,
            "alternative": alternative,
            "alpha": alpha
        },
        "statistic": {
            "z_obs": z_obs,
            "se": se,
            "se_type": se_type
        },
        "p_value_method": {
            "p_value": p_value,
            "reject_H0": reject_by_pvalue
        },
        "critical_region_method": {
            "critical_region": critical_region,
            "z_crit": z_crit,
            "reject_H0": reject_by_critical
        }
    }
```
```python
# ------------------ Example usage ------------------
if __name__ == "__main__":
    # Example 1: raw data strings (1 = success, 0 = failure)
    res1 = two_sample_proportion_ztest(
        data1="1 0 1 1 0 1 1 0 1 1",
        data2="1 0 0 0 1 0 0 0 1 0",
        diff0=0.0,
        alternative="two-sided",
        alpha=0.05
    )
    print("Example 1 (data strings):")
    print(res1, "\n")

    # Example 2: summary inputs
    res2 = two_sample_proportion_ztest(
        p1_hat=13/200, n1=200,
        p2_hat=16/200, n2=200,
        diff0=0.0,
        alternative="two-sided",
        alpha=0.05
    )
    print("Example 2 (summary inputs):")
    print(res2)
```

```
Example 1 (data strings):
{'inputs': {'n1': 10, 'p1_hat': 0.7, 'x1': 7, 'n2': 10, 'p2_hat': 0.3, 'x2': 3, 'diff0': 0.0, 'alternative': 'two-sided', 'alpha': 0.05}, 'statistic': {'z_obs': 1.7888543819998317, 'se': 0.22360679774997896, 'se_type': 'pooled (H0: p1-p2=0)'}, 'p_value_method': {'p_value': 0.07363827012030266, 'reject_H0': False}, 'critical_region_method': {'critical_region': '|Z| > 1.9600', 'z_crit': 1.959963984540054, 'reject_H0': False}}

Example 2 (summary inputs):
{'inputs': {'n1': 200, 'p1_hat': 0.065, 'x1': 13.0, 'n2': 200, 'p2_hat': 0.08, 'x2': 16.0, 'diff0': 0.0, 'alternative': 'two-sided', 'alpha': 0.05}, 'statistic': {'z_obs': -0.5784492956984421, 'se': 0.025931399885081405, 'se_type': 'pooled (H0: p1-p2=0)'}, 'p_value_method': {'p_value': 0.5629608205677976, 'reject_H0': False}, 'critical_region_method': {'critical_region': '|Z| > 1.9600', 'z_crit': 1.959963984540054, 'reject_H0': False}}
```
The general United States adult population volunteers an average of 4.2 hours per week. A random sample of 18 undergraduate college students and 20 graduate college students gave the results below concerning the amount of time spent in volunteer service per week. At the $\alpha = 0.01$ level of significance, is there sufficient evidence to conclude that a difference exists between the mean number of volunteer hours per week for undergraduate and graduate college students? Assume that the number of volunteer hours per week is normally distributed.

| | Undergraduate | Graduate |
|---|---|---|
| Sample mean | 2.5 | 3.8 |
| Sample variance | 2.2 | 3.5 |
| Sample size | 18 | 20 |