IB Maths AI SL · Topic 4: Hypothesis Testing · Paper 2 · Pooled 2-sample · ~8 min read
The t-test
The t-test compares the means of two independent samples to decide whether two populations have the same mean. In AI SL you’ll always use the pooled two-sample t-test (equal-variance assumption). The GDC computes the test statistic and p-value; you handle the setup (one-tailed vs two-tailed) and the conclusion in context.
📘 What you need to know
Compares two population means: μx and μy. You need a sample of values from each.
H₀: μx = μy (no difference).
H₁: depends on the wording — μx < μy, μx > μy, or μx ≠ μy.
Assumptions: each population is normally distributed; variances are equal (use the pooled option on the GDC).
GDC output: a t statistic and a p-value. You only need the p-value to decide.
Decision: if p < α ⇒ reject H₀ (sufficient evidence for H₁). Otherwise do not reject H₀ (often written as “accept H₀”): there is insufficient evidence for H₁.
One-tailed vs two-tailed — the wording test
The form of H₁ comes straight from the question:
• One-tailed (μx > μy): words like “higher / more / bigger / better”. You suspect a specific direction.
• One-tailed (μx < μy): words like “lower / less / smaller / reduces”.
• Two-tailed (μx ≠ μy): words like “different / changed / not the same”.
• “Faster / slower”: check what is being measured. A faster completion time is a smaller value (<); a faster speed is a larger value (>).
The visual below shows the conceptual question: is the gap between the two sample means big enough, relative to the spread, to be unlikely under H₀?
Both panels compare two samples (teal X, orange Y) by their distributions. Left: bells overlap heavily — the small gap is consistent with noise, so the p-value is large and we accept H₀. Right: bells barely overlap — the gap is too big for chance, the p-value is tiny, and we reject H₀.
The pooled t-test framework
H₀: μx = μy · H₁: μx < μy, μx > μy, or μx ≠ μy
decision: reject H₀ ⇔ p < α
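For reference, the statistic the GDC reports in a pooled test is the standard pooled two-sample t statistic. The notation below (sample means x̄ and ȳ, sample variances s_x² and s_y², sample sizes n_x and n_y) is introduced here for illustration; you are not expected to compute this by hand.

```latex
\[
t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}},
\qquad
s_p^2 = \frac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2},
\qquad
\text{df} = n_x + n_y - 2
\]
```

The numerator is the gap between the sample means and the denominator measures the spread, so a large |t| (and hence a small p-value) means the gap is big relative to the noise.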
🧭 Recipe — carry out a pooled 2-sample t-test
Define μx and μy clearly (e.g. “let μN = mean caffeine of brand N”).
Write H₀ and H₁: pick <, >, or ≠ based on the wording.
Enter the two samples as separate lists on the GDC. Run “2-Sample t-Test”, select pooled, and the correct tail.
Read off the p-value from the GDC output.
Compare and conclude in context: reject H₀ if p < α, in the question’s wording.
Always pick “pooled”: the IB exam assumes equal variances. If you pick the non-pooled (Welch) option by mistake, the p-value will be slightly off — an easy lost mark.
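What the GDC does when you select “pooled” can be sketched in a few lines of Python. This is a minimal illustration, not exam technique: the function name `pooled_t_statistic` and both data lists are hypothetical, invented here for the example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(xs, ys):
    """Return (t, df) for a pooled two-sample t-test on lists xs and ys."""
    n_x, n_y = len(xs), len(ys)
    df = n_x + n_y - 2
    # Pooled variance: weighted average of the two sample variances.
    # statistics.variance uses the n - 1 denominator (sample variance).
    s_p_sq = ((n_x - 1) * variance(xs) + (n_y - 1) * variance(ys)) / df
    standard_error = sqrt(s_p_sq * (1 / n_x + 1 / n_y))
    t = (mean(xs) - mean(ys)) / standard_error
    return t, df


# Hypothetical data, made up for illustration only (not from any exam question).
sample_x = [12, 14, 15, 11, 13]
sample_y = [10, 9, 12, 11, 8]

t, df = pooled_t_statistic(sample_x, sample_y)
print(f"t = {t:.3f}, df = {df}")
```

For these made-up lists both sample variances are 2.5, so the pooled variance is 2.5, the standard error is 1, and the gap of 3 between the means gives t = 3 with df = 8.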
Worked examples
WE 1
Set up a one-tailed t-test
A coach claims that athletes using a new warm-up routine have lower resting heart rates than those using the old routine.
Write the null and alternative hypotheses.
Define population means
μ_N = mean heart rate for new routine
μ_O = mean heart rate for old routine
“Lower” → one-tailed test (less than)
H₀: μ_N = μ_O
H₁: μ_N < μ_O
H₀: μ_N = μ_O · H₁: μ_N < μ_O (one-tailed)
Always DEFINE the population means in context. Without a definition, “μ_N” is meaningless on the page.
WE 2
Set up a two-tailed t-test
A school has two campuses, A and B. The principal wants to investigate whether mean exam scores differ between the two campuses.
Write the null and alternative hypotheses.
Define population means
μ_A = mean exam score on campus A
μ_B = mean exam score on campus B
“Differ” → two-tailed
H₀: μ_A = μ_B
H₁: μ_A ≠ μ_B
two-tailed test
“Differ” / “different” / “not the same” → ALWAYS two-tailed. Don’t pick a direction unless the question gives one.
WE 3
Full one-tailed test — reject H₀
The caffeine content (mg) in 8 cans of each of two energy-drink brands is recorded:
It is claimed that Brand A has a higher mean caffeine content than Brand B. Perform a pooled t-test at the 5% significance level.
Define means
μ_A = mean caffeine, Brand A
μ_B = mean caffeine, Brand B
Hypotheses (one-tailed, “higher”)
H₀: μ_A = μ_B
H₁: μ_A > μ_B
GDC: 2-sample pooled t-test (one-tail)
t = 4.32, p ≈ 0.000354
Compare with α
p = 0.000354 < 0.05 → reject H₀
sufficient evidence Brand A has higher mean caffeine
A very small p-value (well under 0.05) means the data is hugely unlikely under H₀: strong evidence in favour of the claim.
WE 4
Full two-tailed test — accept H₀
Eight students are taught using Method 1 and eight using Method 2. Their exam scores (out of 100):
Test whether the methods produce different mean scores at the 5% level.
Define means
μ₁ = mean score, Method 1
μ₂ = mean score, Method 2
Hypotheses (two-tailed, “different”)
H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂
GDC: 2-sample pooled t-test (two-tail)
t = 0.300, p ≈ 0.769
Compare
p = 0.769 > 0.05 → accept H₀
insufficient evidence that the methods produce different mean scores
A large p-value means the small difference (72.6 vs 72) is easily explained by random sampling: no real evidence the methods differ.
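As a cross-check on WE 3 and WE 4, a p-value can be recovered from the t statistic and the degrees of freedom. The sketch below assumes df = 8 + 8 - 2 = 14 (two samples of 8) and integrates the t-distribution density numerically with Simpson's rule. The helper names `t_pdf` and `upper_tail` are illustrative, not GDC functions. Because the quoted t values are rounded, the results should land close to, but not exactly on, 0.000354 and 0.769.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    # The distribution is symmetric, so half the area lies above 0.
    tail = 0.5 - area_0_to_a
    return tail if t >= 0 else 1 - tail


df = 8 + 8 - 2  # assumption: two samples of 8, pooled test

p_one_tailed = upper_tail(4.32, df)       # WE 3, H1: mu_A > mu_B
p_two_tailed = 2 * upper_tail(0.300, df)  # WE 4, H1: mu_1 != mu_2

print(f"one-tailed p = {p_one_tailed:.6f}")
print(f"two-tailed p = {p_two_tailed:.3f}")
```

Note that the two-tailed p-value is double the one-tailed area, which is why choosing the wrong tail on the GDC changes the p-value.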
WE 5
Write a conclusion in context
A pooled one-tailed t-test is performed at the 5% significance level to test whether a new medication reduces mean blood pressure compared to a placebo. The GDC gives p = 0.034.
State the conclusion in context.
Compare p with α
p = 0.034 < α = 0.05 → reject H₀
Context: “reduces” → H₁ was μ_medication < μ_placebo
sufficient evidence the medication reduces mean blood pressure
Never write just “reject H₀”. Use the question’s wording: “reduces mean blood pressure” beats “μ_x < μ_y” every time.
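The “decision + context” pattern can be written as a tiny helper. This is only a sketch of the logic: the function name `conclude` and its wording are hypothetical, and an examiner will still expect a sentence in your own words.

```python
def conclude(p_value, alpha, claim):
    """Return a decision plus a conclusion in context for a p-value test.

    `claim` is the alternative hypothesis phrased in the question's wording.
    """
    if p_value < alpha:
        return (f"p = {p_value} < {alpha}: reject H0. "
                f"Sufficient evidence that {claim}.")
    return (f"p = {p_value} >= {alpha}: do not reject H0. "
            f"Insufficient evidence that {claim}.")


# Values taken from WE 5.
print(conclude(0.034, 0.05, "the medication reduces mean blood pressure"))
```

Both branches end with the claim in the question's wording, which is the part of the answer that a bare “reject H₀” leaves out.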
WE 6
Identify the type of test from the wording
For each scenario, decide whether a one-tailed or two-tailed test is appropriate and state the form of H₁.
(i) Researchers want to know whether a new study app makes students faster at completing tests.
(ii) A factory manager wants to test whether two production lines produce different mean weights of biscuits.
(iii) A botanist claims that Plant A grows to a higher mean height than Plant B.
(i) “faster” → smaller times
one-tailed → H₁: μ_app < μ_no-app
(ii) “different” → either direction
two-tailed → H₁: μ₁ ≠ μ₂
(iii) “higher” → bigger height
one-tailed → H₁: μ_A > μ_B
one-tailed (<) · two-tailed (≠) · one-tailed (>)
Always check direction words BEFORE running the GDC: selecting the wrong tail in the calculator gives a wrong p-value.
💡 Top tips
Always select “pooled” on the GDC — the IB exam assumes equal variances.
Select the correct tail: most GDCs let you choose <, >, or ≠. Picking the wrong one gives the wrong p-value.
Define μx, μy in context before writing hypotheses. Examiners want the variables named.
Conclusion = decision + context: “sufficient evidence Brand A has higher mean caffeine” beats “reject H₀”.
“Faster” (for times) / “reduces” ⇒ the direction is <. Remember: faster = smaller time, reduces = smaller value.
⚠ Common mistakes
Wrong tail: “faster” → one-tailed <, not >. Faster means shorter time.
Using non-pooled t-test (Welch) instead of pooled. Always pool unless told otherwise.
One-tailed when “differ”: “differ” is two-tailed. Don’t sneak in a direction.
Definitive conclusions: never say “H₀ is true” or “the means are definitely equal”. Tests give evidence, not proof.
Confusing the test statistic with the p-value: the decision uses p, not t.
This is the final note in the AI SL Hypothesis Testing chapter. You’ve now seen all three tests: χ² independence, χ² goodness of fit, and the t-test. The exam pattern is the same across all of them: set up H₀/H₁, get the p-value from the GDC, compare with α, conclude in context. Practise with past papers to lock in the workflow.