IB Maths AI SL · Topic 4: Hypothesis Testing · Paper 2 · Pooled 2-sample

The t-test

The t-test compares the means of two independent samples to decide whether two populations have the same mean. In AI SL you’ll always use the pooled two-sample t-test (equal-variance assumption). The GDC computes the test statistic and p-value; you handle the setup (one-tailed vs two-tailed) and the conclusion in context.

📘 What you need to know

Compares two population means: μ_x and μ_y. You need a sample of values from each.
H₀: μ_x = μ_y (no difference).
H₁: depends on the wording — μ_x < μ_y, μ_x > μ_y, or μ_x ≠ μ_y.
Assumptions: each population is normally distributed; variances are equal (use the pooled option on the GDC).
GDC output: a t statistic and a p-value. You only need the p-value to decide.
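Decision: if p < α ⇒ reject H₀ (sufficient evidence that the means differ). Otherwise do not reject H₀ (often written “accept H₀”): there is insufficient evidence of a difference.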

One-tailed vs two-tailed — the wording test

The form of H₁ comes straight from the question:

• One-tailed (μ_x > μ_y): words like “higher / more / bigger / better”. You suspect a specific direction.
• One-tailed (μ_x < μ_y): words like “lower / less / smaller / reduces”.
• Two-tailed (μ_x ≠ μ_y): words like “different / changed / not the same”.
• Speed words depend on what is measured: “faster” means a smaller time (<) but a larger speed (>).
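
The wording test can be sketched as a lookup from cue words to the form of H₁. This is only an illustration: the function name `tail_from_wording` and the cue lists are assumptions built from the words quoted in this note, and a real exam question still needs reading in context.

```python
import re

# Cue words from the bullets above, plus the verb "differ". Speed words are
# left out because their direction depends on whether a time or a speed is
# measured. The lists are illustrative, not exhaustive.
CUES = {
    ">": ["higher", "more", "bigger", "better"],
    "<": ["lower", "less", "smaller", "reduces"],
    "≠": ["differ", "different", "changed", "not the same"],
}


def tail_from_wording(question: str) -> str:
    """Return '>', '<' or '≠' for H1, or 'unclear' if no single cue type is found."""
    text = question.lower()
    found = {
        symbol
        for symbol, words in CUES.items()
        for word in words
        if re.search(r"\b" + re.escape(word) + r"\b", text)
    }
    # Exactly one kind of cue gives a clear answer; none or mixed cues need a human.
    return found.pop() if len(found) == 1 else "unclear"


print(tail_from_wording("Athletes on the new routine have lower resting heart rates."))
print(tail_from_wording("Do mean exam scores differ between the two campuses?"))
print(tail_from_wording("Brand A has a higher mean caffeine content than Brand B."))
```

A question that mixes cues, or uses none of them, comes back as "unclear", which is the signal to reread the wording rather than guess a direction.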

The visual below shows the conceptual question: is the gap between the two sample means big enough, relative to the spread, to be unlikely under H₀?

Both panels compare two samples (teal X, orange Y) by their distributions. Left: bells overlap heavily — the small gap is consistent with noise, so the p-value is large and we accept H₀. Right: bells barely overlap — the gap is too big for chance, the p-value is tiny, and we reject H₀.

The pooled t-test framework

H₀: μ_x = μ_y · H₁: μ_x < μ_y, μ_x > μ_y, or μ_x ≠ μ_y
Decision: reject H₀ ⇔ p < α
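
For reference, the t value the GDC reports is the standard pooled two-sample statistic. The notation is introduced here and is not used elsewhere in this note: x̄ and ȳ are the sample means, s_x² and s_y² the sample variances (divisor n − 1), and n_x, n_y the sample sizes. The note relies on the GDC for this calculation, so treat the formula as background rather than something to reproduce.

```latex
s_p^2 = \frac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2},
\qquad
t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}},
\qquad
\text{degrees of freedom} = n_x + n_y - 2
```

A large |t| means the gap between the sample means is large relative to the pooled spread, which is what produces a small p-value.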

🧭 Recipe — carry out a pooled 2-sample t-test

Define μ_x and μ_y clearly (e.g. “let μ_N = mean caffeine of brand N”).
Write H₀ and H₁: pick <, >, or ≠ based on the wording.
Enter the two samples as separate lists on the GDC. Run “2-Sample t-Test”, select pooled, and the correct tail.
Read off the p-value from the GDC output.
Compare and conclude in context: reject H₀ if p < α, in the question’s wording.
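
The recipe can be mirrored outside the GDC. Below is a minimal sketch using only the Python standard library; the function names (`t_pdf`, `t_upper_tail`, `pooled_t_test`) are hypothetical, and the tail area is approximated with Simpson's rule rather than whatever method a calculator uses internally. The data are the caffeine samples from worked example 3.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0 via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def pooled_t_test(x, y, alternative="≠"):
    """Return (t, p) for H0: mu_x = mu_y against H1: mu_x <, > or ≠ mu_y."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    tail = t_upper_tail(abs(t), df)  # area beyond |t| in one tail
    if alternative == ">":
        p = tail if t >= 0 else 1 - tail
    elif alternative == "<":
        p = tail if t <= 0 else 1 - tail
    elif alternative == "≠":
        p = 2 * tail
    else:
        raise ValueError("alternative must be '<', '>' or '≠'")
    return t, p


brand_a = [80, 82, 78, 85, 81, 79, 83, 80]
brand_b = [75, 78, 72, 80, 76, 74, 77, 75]
t, p = pooled_t_test(brand_a, brand_b, alternative=">")
print(f"t = {t:.2f}, p = {p:.6f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

The printed values should agree with the GDC output quoted in worked example 3 to the precision shown there. Changing `alternative` plays the same role as choosing the tail on the calculator.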

Always pick “pooled”: the IB exam assumes equal variances. If you pick the non-pooled (Welch) option by mistake, the p-value will be slightly off — an easy lost mark.
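
To see where the pooled and non-pooled options part ways, the sketch below compares their degrees of freedom for the Method 1 / Method 2 scores from worked example 4. It uses the standard Welch-Satterthwaite approximation, which goes beyond what this note covers, and the variable names are illustrative. With equal sample sizes both options give the same t statistic, so any change in the p-value comes from the degrees of freedom.

```python
from statistics import variance

method_1 = [72, 78, 65, 80, 75, 70, 73, 68]
method_2 = [70, 75, 68, 77, 72, 71, 74, 69]

n1, n2 = len(method_1), len(method_2)
# Estimated variance of each sample mean (sample variance / sample size).
v1 = variance(method_1) / n1
v2 = variance(method_2) / n2

# Pooled option: degrees of freedom depend only on the sample sizes.
df_pooled = n1 + n2 - 2

# Non-pooled (Welch) option: Welch-Satterthwaite approximation.
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"pooled df = {df_pooled}, Welch df = {df_welch:.1f}")
```

Fewer degrees of freedom mean heavier tails, so the non-pooled p-value for the same t is a little larger than the pooled one.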

Worked examples

WE 1

Set up a one-tailed t-test

A coach claims that athletes using a new warm-up routine have lower resting heart rates than those using the old routine.

Write the null and alternative hypotheses.

Define population means
μ_N = mean heart rate for new routine
μ_O = mean heart rate for old routine

“Lower” → one-tailed test (less than)
H₀: μ_N = μ_O
H₁: μ_N < μ_O

Answer: H₀: μ_N = μ_O · H₁: μ_N < μ_O (one-tailed)

Always define the population means in context. Without a definition, “μ_N” is meaningless on the page.

WE 2

Set up a two-tailed t-test

A school has two campuses, A and B. The principal wants to investigate whether mean exam scores differ between the two campuses.

Write the null and alternative hypotheses.

Define population means
μ_A = mean exam score on campus A
μ_B = mean exam score on campus B

“Differ” → two-tailed
H₀: μ_A = μ_B
H₁: μ_A ≠ μ_B (two-tailed test)

“Differ” / “different” / “not the same” → always two-tailed. Don’t pick a direction unless the question gives one.

WE 3

Full one-tailed test — reject H₀

The caffeine content (mg) in 8 cans of each of two energy-drink brands is recorded:

Brand A: 80, 82, 78, 85, 81, 79, 83, 80
Brand B: 75, 78, 72, 80, 76, 74, 77, 75

It is claimed that Brand A has a higher mean caffeine content than Brand B. Perform a pooled t-test at the 5% significance level.

Define means
μ_A = mean caffeine, Brand A
μ_B = mean caffeine, Brand B

Hypotheses (one-tailed, “higher”)
H₀: μ_A = μ_B
H₁: μ_A > μ_B

GDC: 2-sample pooled t-test (one-tail)
t = 4.32, p ≈ 0.000354

Compare with α
p = 0.000354 < 0.05 → reject H₀

Conclusion: there is sufficient evidence that Brand A has a higher mean caffeine content.

A very small p-value (well under 0.05) means data like these would be very unlikely if H₀ were true, so there is strong evidence in favour of the claim.
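
As a check on the GDC output, the t value can be reproduced by hand from the two samples. The notation is introduced here for illustration: x̄ is a sample mean, s² a sample variance (divisor n − 1), and s_p² the pooled variance, a weighted average of the two sample variances.

```latex
\begin{aligned}
\bar{x}_A &= \tfrac{648}{8} = 81, & \bar{x}_B &= \tfrac{607}{8} = 75.875 \\
s_A^2 &= \tfrac{36}{7} \approx 5.143, & s_B^2 &= \tfrac{42.875}{7} = 6.125 \\
s_p^2 &= \frac{7\,s_A^2 + 7\,s_B^2}{14} = \frac{36 + 42.875}{14} \approx 5.634 \\
t &= \frac{81 - 75.875}{\sqrt{5.634\left(\tfrac{1}{8} + \tfrac{1}{8}\right)}} \approx \frac{5.125}{1.187} \approx 4.32
\end{aligned}
```

The gap of about 5.1 mg is more than four standard errors wide, which is why the one-tailed p-value is so small.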

WE 4

Full two-tailed test — accept H₀

Eight students are taught using Method 1 and eight using Method 2. Their exam scores (out of 100):

Method 1: 72, 78, 65, 80, 75, 70, 73, 68
Method 2: 70, 75, 68, 77, 72, 71, 74, 69

Test whether the methods produce different mean scores at the 5% level.

Define means
μ₁ = mean score, Method 1
μ₂ = mean score, Method 2

Hypotheses (two-tailed, “different”)
H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂

GDC: 2-sample pooled t-test (two-tail)
t = 0.300, p ≈ 0.769

Compare
p = 0.769 > 0.05 → accept H₀

Conclusion: there is insufficient evidence that the methods produce different mean scores.

A large p-value means the small difference (72.6 vs 72.0) is easily explained by random sampling, so there is no real evidence that the methods differ.

WE 5

Write a conclusion in context

A pooled one-tailed t-test is performed at the 5% significance level to test whether a new medication reduces mean blood pressure compared to a placebo. The GDC gives p = 0.034.

State the conclusion in context.

Compare p with α
p = 0.034 < α = 0.05 → reject H₀

Context: “reduces” → H₁ was μ_medication < μ_placebo

Conclusion: there is sufficient evidence that the medication reduces mean blood pressure.

Never write just “reject H₀”. Use the question’s wording: “reduces mean blood pressure” beats “μ_x < μ_y” every time.
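
The decision-plus-context pattern can be sketched as a small helper. The function name `conclude` and its parameters are hypothetical; the `claim` string stands for the question's own wording of H₁.

```python
def conclude(p_value: float, alpha: float, claim: str) -> str:
    """Turn a p-value comparison into a conclusion written in context."""
    if p_value < alpha:
        return (f"p = {p_value} < {alpha}, so reject H0: there is sufficient "
                f"evidence that {claim}.")
    # Not rejecting H0 is a statement about the evidence, not proof H0 is true.
    return (f"p = {p_value} >= {alpha}, so do not reject H0: there is "
            f"insufficient evidence that {claim}.")


print(conclude(0.034, 0.05, "the medication reduces mean blood pressure"))
```

Passing the claim in as words keeps the final sentence in the language of the question instead of in symbols.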

WE 6

Identify the type of test from the wording

For each scenario, decide whether a one-tailed or two-tailed test is appropriate and state the form of H₁.

(i) Researchers want to know whether a new study app makes students faster at completing tests.
(ii) A factory manager wants to test whether two production lines produce different mean weights of biscuits.
(iii) A botanist claims that Plant A grows to a higher mean height than Plant B.

(i) “faster” → smaller times → one-tailed, H₁: μ_app < μ_no-app
(ii) “different” → either direction → two-tailed, H₁: μ₁ ≠ μ₂
(iii) “higher” → bigger height → one-tailed, H₁: μ_A > μ_B

Answer: one-tailed (<) · two-tailed (≠) · one-tailed (>)

Always check direction words before running the GDC: selecting the wrong tail in the calculator gives a wrong p-value.

💡 Top tips

Always select “pooled” on the GDC — the IB exam assumes equal variances.
Select the correct tail: most GDCs let you choose <, >, or ≠. Picking the wrong one gives the wrong p-value.
Define μ_x, μ_y in context before writing hypotheses. Examiners want the variables named.
Conclusion = decision + context: “sufficient evidence Brand A has higher mean caffeine” beats “reject H₀”.
If “faster” / “reduces” ⇒ the direction is <; remember faster = smaller time, reduces = smaller value.

⚠ Common mistakes

Wrong tail: “faster” → one-tailed <, not >. Faster means shorter time.
Using non-pooled t-test (Welch) instead of pooled. Always pool unless told otherwise.
One-tailed when “differ”: “differ” is two-tailed. Don’t sneak in a direction.
Definitive conclusions: never say “H₀ is true” or “the means are definitely equal”. Tests give evidence, not proof.
Confusing the test statistic with the p-value: the decision uses p, not t.

This is the final note in the AI SL Hypothesis Testing chapter. You’ve now seen all three tests: χ² independence, χ² goodness of fit, and the t-test. The exam pattern is the same across all of them: set up H₀/H₁, get the p-value from the GDC, compare with α, conclude in context. Practise with past papers to lock in the workflow.
