IB Maths AI SL · Topic 4: Hypothesis Testing · Paper 2 · Pooled 2-sample

The t-test

The t-test compares the means of two independent samples to decide whether two populations have the same mean. In AI SL you’ll always use the pooled two-sample t-test (equal-variance assumption). The GDC computes the test statistic and p-value; you handle the setup (one-tailed vs two-tailed) and the conclusion in context.

📘 What you need to know

One-tailed vs two-tailed — the wording test

The form of H₁ comes straight from the question:

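One-tailed (μx > μy): words like “higher / faster / more / bigger / better”. You suspect a specific direction.
One-tailed (μx < μy): words like “lower / slower / less / smaller / reduces”.
Two-tailed (μx ≠ μy): words like “different / changed / not the same”.
Check what is being measured before choosing the sign: “faster” means a larger speed but a smaller time, so the same word can lead to > or <.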

The visual below shows the conceptual question: is the gap between the two sample means big enough, relative to the spread, to be unlikely under H₀?

[Figure: Is the gap between sample means significant? Left panel: means close relative to spread, small gap, p ≈ 0.77 → accept H₀. Right panel: means far apart relative to spread, large gap, p ≈ 0.0004 → reject H₀.]
Both panels compare two samples (teal X, orange Y) by their distributions. Left: bells overlap heavily — the small gap is consistent with noise, so the p-value is large and we accept H₀. Right: bells barely overlap — the gap is too big for chance, the p-value is tiny, and we reject H₀.
The pooled t-test framework

H₀: μx = μy
H₁: μx < μy,  μx > μy,  or  μx ≠ μy

Decision: reject H₀ ⇔ p < α
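
For reference only, since AI SL expects the GDC output rather than a hand calculation: the statistic behind the pooled test is usually written as below. The symbols are assumptions not defined in these notes: x̄ and ȳ are the sample means, s_x and s_y the sample standard deviations, n_x and n_y the sample sizes, and s_p the pooled standard deviation.

```latex
t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}},
\qquad
s_p^2 = \frac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2},
\qquad
\text{df} = n_x + n_y - 2
```

A large |t| means the gap between the sample means is big relative to the spread, which is what produces a small p-value.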

🧭 Recipe — carry out a pooled 2-sample t-test

  1. Define μx and μy clearly (e.g. “let μN = mean caffeine of brand N”).
  2. Write H₀ and H₁: pick <, >, or ≠ based on the wording.
  3. Enter the two samples as separate lists on the GDC. Run “2-Sample t-Test”, select pooled, and the correct tail.
  4. Read off the p-value from the GDC output.
  5. Compare and conclude in context: reject H₀ if p < α, in the question’s wording.
Always pick “pooled”: the IB exam assumes equal variances. If you pick the non-pooled (Welch) option by mistake, the p-value will be slightly off — an easy lost mark.
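
Steps 4 and 5 can be sketched in Python to show roughly what the GDC does when it turns a t statistic into a p-value for the chosen tail. This is an illustration under stated assumptions, not the calculator's actual algorithm: the function names and the Simpson's-rule integration are choices made here, and the inputs t = 4.32 and df = 8 + 8 − 2 = 14 come from WE 3.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def p_value(t, df, tail, steps=2000):
    """p-value for a t statistic; tail is 'less', 'greater' or 'two-sided'."""
    # Simpson's rule (steps must be even) for the area under the density
    # between 0 and |t|.
    h = abs(t) / steps
    area = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The distribution is symmetric about 0, so P(T > t) follows from the area.
    upper = 0.5 - area if t >= 0 else 0.5 + area
    if tail == "greater":
        return upper
    if tail == "less":
        return 1 - upper
    if tail == "two-sided":
        return 2 * min(upper, 1 - upper)
    raise ValueError("tail must be 'less', 'greater' or 'two-sided'")

alpha = 0.05
p = p_value(4.32, 14, "greater")   # one-tailed test, H1: mu_A > mu_B
print(p < alpha)                   # True, so reject H0
```

Choosing the wrong tail changes the result: with the same t and df, "less" would return a p-value close to 1.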

Worked examples

WE 1

Set up a one-tailed t-test

A coach claims that athletes using a new warm-up routine have lower resting heart rates than those using the old routine.

Write the null and alternative hypotheses.

Define population means:
μ_N = mean heart rate for new routine
μ_O = mean heart rate for old routine

“Lower” → one-tailed test (less than)
H₀: μ_N = μ_O
H₁: μ_N < μ_O

Answer: H₀: μ_N = μ_O · H₁: μ_N < μ_O (one-tailed)

Tip: always DEFINE the population means in context. Without a definition, “μ_N” is meaningless on the page.

WE 2

Set up a two-tailed t-test

A school has two campuses, A and B. The principal wants to investigate whether mean exam scores differ between the two campuses.

Write the null and alternative hypotheses.

Define population means:
μ_A = mean exam score on campus A
μ_B = mean exam score on campus B

“Differ” → two-tailed
H₀: μ_A = μ_B
H₁: μ_A ≠ μ_B

Answer: two-tailed test

Tip: “differ” / “different” / “not the same” → ALWAYS two-tailed. Don’t pick a direction unless the question gives one.

WE 3

Full one-tailed test — reject H₀

The caffeine content (mg) in 8 cans of each of two energy-drink brands is recorded:

Brand A:   80, 82, 78, 85, 81, 79, 83, 80
Brand B:   75, 78, 72, 80, 76, 74, 77, 75

It is claimed that Brand A has a higher mean caffeine content than Brand B. Perform a pooled t-test at the 5% significance level.

Define means:
μ_A = mean caffeine, Brand A
μ_B = mean caffeine, Brand B

Hypotheses (one-tailed, “higher”):
H₀: μ_A = μ_B
H₁: μ_A > μ_B

GDC: 2-sample pooled t-test (one-tail)
t = 4.32, p ≈ 0.000354

Compare with α:
p = 0.000354 < 0.05 → reject H₀

Answer: sufficient evidence that Brand A has a higher mean caffeine content

Tip: a very small p-value (well under 0.05) means data like this would be very unlikely if H₀ were true, which is strong evidence in favour of the claim.

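
As a cross-check on the GDC value of t, the pooled statistic for the two brands can be recomputed with Python's standard library. This is a minimal sketch: `pooled_t` is a name chosen here for illustration, and it covers the test statistic only, not the p-value.

```python
import math
import statistics

brand_a = [80, 82, 78, 85, 81, 79, 83, 80]
brand_b = [75, 78, 72, 80, 76, 74, 77, 75]

def pooled_t(x, y):
    """Return (t, df) for a pooled two-sample t-test of mean(x) - mean(y)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / df
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, df

t, df = pooled_t(brand_a, brand_b)
print(round(t, 2), df)  # 4.32 14
```

The sample means are 81 and 75.875, so the gap of about 5.1 mg is more than four standard errors wide, which is why the p-value is so small.
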
WE 4

Full two-tailed test — accept H₀

Eight students are taught using Method 1 and eight using Method 2. Their exam scores (out of 100):

Method 1:   72, 78, 65, 80, 75, 70, 73, 68
Method 2:   70, 75, 68, 77, 72, 71, 74, 69

Test whether the methods produce different mean scores at the 5% level.

Define means:
μ₁ = mean score, Method 1
μ₂ = mean score, Method 2

Hypotheses (two-tailed, “different”):
H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂

GDC: 2-sample pooled t-test (two-tail)
t = 0.300, p ≈ 0.769

Compare with α:
p = 0.769 > 0.05 → accept H₀

Answer: insufficient evidence that the methods produce different mean scores

Tip: a large p-value means the small difference in sample means (72.6 vs 72.0) is easily explained by random sampling, so there is no real evidence the methods differ.

WE 5

Write a conclusion in context

A pooled one-tailed t-test is performed at the 5% significance level to test whether a new medication reduces mean blood pressure compared to a placebo. The GDC gives p = 0.034.

State the conclusion in context.

Compare p with α:
p = 0.034 < α = 0.05 → reject H₀

Context: “reduces” → H₁ was μ_medication < μ_placebo

Answer: sufficient evidence the medication reduces mean blood pressure

Tip: never write just “reject H₀”. Use the question’s wording: “reduces mean blood pressure” beats “μ_x < μ_y” every time.

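
The compare-then-conclude pattern can also be written as a small template. This is only a sketch: `conclude` is a hypothetical helper, and the `claim` argument stands for H₁ phrased in the question's own words.

```python
def conclude(p, alpha, claim):
    """Return a conclusion in context; `claim` is H1 in the question's wording."""
    if p < alpha:
        return (f"p = {p} < {alpha}, so reject H0: "
                f"there is sufficient evidence that {claim}.")
    return (f"p = {p} is not less than {alpha}, so accept H0: "
            f"there is insufficient evidence that {claim}.")

print(conclude(0.034, 0.05, "the medication reduces mean blood pressure"))
```

The same call with p = 0.769 and the claim “the methods produce different mean scores” gives the WE 4 style of conclusion.
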
WE 6

Identify the type of test from the wording

For each scenario, decide whether a one-tailed or two-tailed test is appropriate and state the form of H₁.

(i) Researchers want to know whether a new study app makes students faster at completing tests.
(ii) A factory manager wants to test whether two production lines produce different mean weights of biscuits.
(iii) A botanist claims that Plant A grows to a higher mean height than Plant B.

(i) “faster” → smaller times → one-tailed → H₁: μ_app < μ_no-app
(ii) “different” → either direction → two-tailed → H₁: μ₁ ≠ μ₂
(iii) “higher” → bigger height → one-tailed → H₁: μ_A > μ_B

Answer: (i) one-tailed (<) · (ii) two-tailed (≠) · (iii) one-tailed (>)

Tip: always check direction words BEFORE running the GDC. Selecting the wrong tail in the calculator gives a wrong p-value.

💡 Top tips

⚠ Common mistakes

This is the final note in the AI SL Hypothesis Testing chapter. You’ve now seen all three tests: χ² independence, χ² goodness of fit, and the t-test. The exam pattern is the same across all of them: set up H₀/H₁, get the p-value from the GDC, compare with α, conclude in context. Practice with past papers to lock in the workflow.
