IB Maths AI SL · Topic 4: Hypothesis Testing · Paper 2 · H₀, H₁, p-value
Introduction to Hypothesis Testing
A hypothesis test uses a sample to decide whether a claim about a population is plausible. You set up two competing statements — H₀ (no change) and H₁ (the change you suspect) — then use sample data to either reject or accept H₀. The decision comes down to a single comparison: p-value vs significance level.
📘 What you need to know
Null hypothesis H₀: the “no change” assumption about the population (e.g. μ = 500). Always assumed true at the start.
Alternative hypothesis H₁: what you suspect has changed (e.g. μ < 500, μ > 500, or μ ≠ 500).
One-tailed: H₁ uses < or > (suspect a specific direction). Two-tailed: H₁ uses ≠ (suspect any change).
Significance level α: the cut-off probability set before testing — usually 1%, 5%, or 10%.
p-value: probability of seeing data at least as extreme as the sample, assuming H₀ is true. The GDC gives this.
The decision rule: if p < α ⇒ reject H₀. If p ≥ α ⇒ accept H₀.
Setting up the hypotheses
Every test starts with two competing statements about a population parameter (usually a mean μ or a proportion).
• H₀ is the “status quo” — what is currently assumed true. Always written as an equality (e.g. μ = 500).
• H₁ reflects the claim being tested. The wording of the question tells you which inequality:
· “decreased / smaller / lower / less than” → H₁: μ < … (one-tailed, lower)
· “increased / bigger / higher / more than” → H₁: μ > … (one-tailed, upper)
· “changed / different / not the same” → H₁: μ ≠ … (two-tailed)
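The wording rule above can be sketched as a small lookup. This is only an illustration: the function name `alternative_form` and the placeholder `mu0` are hypothetical, and the naive word matching is no substitute for reading the question carefully.

```python
# Direction words taken from the list above; the helper itself is hypothetical.
LOWER_WORDS = ("decreased", "smaller", "lower", "less than")
UPPER_WORDS = ("increased", "bigger", "higher", "more than")
TWO_TAILED_WORDS = ("changed", "different", "not the same")


def alternative_form(question: str) -> str:
    """Return the form of H1 suggested by the wording of a question."""
    text = question.lower()
    # Check the two-tailed words first: "changed" gives no direction.
    if any(word in text for word in TWO_TAILED_WORDS):
        return "H1: mu != mu0 (two-tailed)"
    if any(word in text for word in LOWER_WORDS):
        return "H1: mu < mu0 (one-tailed, lower)"
    if any(word in text for word in UPPER_WORDS):
        return "H1: mu > mu0 (one-tailed, upper)"
    return "no direction word found; re-read the question"


print(alternative_form("Has the mean mark changed?"))
print(alternative_form("Is the mean weight less than 500 g?"))
```

Simple substring matching like this can misfire on longer texts (for example "slower" contains "lower"), so treat it as a memory aid rather than a rule.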
The decision: p-value vs significance level
Once the test is set up, your GDC computes a p-value: the probability of getting a sample at least as extreme as yours if H₀ were true. The comparison with α decides everything.
The p-value is a number between 0 and 1. The significance level α (orange) cuts off the leftmost slice. A p-value below α falls in the red reject region; anything above α falls in the teal accept region.
The decision rule
if p < α → reject H₀ (sufficient evidence)
if p ≥ α → accept H₀ (insufficient evidence to reject)
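As a minimal sketch, the decision rule is a single comparison. The function name `decide` is hypothetical, and treating p = α as "accept" is an assumption that follows the "otherwise accept H₀" wording used in the recipe.

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the decision rule: reject H0 only when p < alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must be between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("significance level must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # Assumption: p equal to alpha counts as "accept" (insufficient evidence).
    return "accept H0"


print(decide(0.023, 0.05))
print(decide(0.087, 0.05))
```

Note that α is entered as a decimal (5% is 0.05), which is the most common slip when comparing by hand.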
🧭 Recipe — carry out any hypothesis test
Identify the parameter being tested (usually a mean μ) and the claim wording.
Write H₀ (equality) and H₁ using <, > or ≠ depending on the wording.
State the significance level α (given in the question, e.g. 5%).
Use the GDC to compute the p-value from the sample data.
Compare and conclude in context: reject H₀ if p < α; otherwise accept H₀.
“Accepting” H₀ doesn’t mean it’s true — it means there’s not enough evidence to reject it. Always write “insufficient evidence to suggest…” rather than “H₀ is true”.
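The recipe can be followed end to end in a short script. This is a sketch under stated assumptions, not what your GDC does internally: it assumes a z-test with a known population standard deviation, and the sample figures (n = 36, sample mean 497.5 g, σ = 6 g) are invented for illustration. The helper name `z_test_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, tail):
    """p-value for a z-test of a mean; tail is 'lower', 'upper' or 'two'."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # P(Z <= z) under H0
    if tail == "lower":
        return cdf
    if tail == "upper":
        return 1 - cdf
    if tail == "two":
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


# Step 1: parameter is mu = population mean weight of bags (g).
# Step 2: H0: mu = 500, H1: mu < 500 (one-tailed, lower).
mu0 = 500
# Step 3: significance level.
alpha = 0.05
# Step 4: p-value from (invented) sample data.
p = z_test_p_value(sample_mean=497.5, mu0=mu0, sigma=6, n=36, tail="lower")
# Step 5: compare and conclude.
decision = "reject H0" if p < alpha else "accept H0"
print(f"p = {p:.4f}, decision: {decision}")
```

In the exam the p-value comes from the GDC's built-in test, which may not be a z-test; the point here is only the order of the steps.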
Worked examples
WE 1
Set up hypotheses — one-tailed
A factory advertises that its bags of rice have a mean weight of 500 g. A quality inspector suspects the bags are being under-filled and tests a sample.
Write the null and alternative hypotheses for this test.
Let μ = population mean weight of bags (g)
H₀: the claim is correct → mean = 500
H₀: μ = 500
H₁: “under-filled” → suspect mean is LESS than 500
H₁: μ < 500
Answer: H₀: μ = 500 · H₁: μ < 500 (one-tailed)
Note: “under-filled” / “less than” / “decreased” all point to the LOWER one-tailed test. Define what μ represents: examiners want the variable named in context.
WE 2
Set up hypotheses — two-tailed
A teacher previously found that students’ mean mark on a maths test was 62. After introducing a new revision method she wants to know whether the mean mark has changed.
Write the null and alternative hypotheses.
Let μ = population mean mark with the new method
H₀: no change → mean still 62
H₀: μ = 62
“Changed” → could be higher OR lower → two-tailed
H₁: μ ≠ 62
Answer: H₀: μ = 62 · H₁: μ ≠ 62 (two-tailed)
Note: “changed”, “different”, or “not the same” → ALWAYS two-tailed. Don’t pick a direction unless the question specifies one.
WE 3
Classify three scenarios as one- or two-tailed
For each scenario, decide whether a one-tailed or two-tailed test is appropriate, and state the form of H₁.
(i) A farmer believes a new fertilizer makes plants taller.
(ii) A coach wants to know if a new training programme has changed athletes’ 100 m times.
(iii) A doctor claims a new medication reduces blood pressure.
(i) “taller” → mean height has INCREASED
one-tailed → H₁: μ > …
(ii) “changed” → could go either way
two-tailed → H₁: μ ≠ …
(iii) “reduces” → mean has DECREASED
one-tailed → H₁: μ < …
Answer: (i) one-tailed (>) · (ii) two-tailed (≠) · (iii) one-tailed (<)
Note: Always look for direction words: “more/taller/higher/increased” → >; “less/smaller/decreased/reduced” → <; “changed/different” → ≠.
WE 4
Decision using a p-value
A test is carried out at a 5% significance level. The GDC gives a p-value of 0.023.
State the conclusion of the test.
Compare p with α
p = 0.023, α = 0.05
0.023 < 0.05 → p < α
Decision: reject H₀ (sufficient evidence against H₀)
Note: Whenever p < α, the data is “too unusual” to have happened by chance under H₀, so we reject it.
WE 5
Decision using a critical value
A two-tailed test at the 5% significance level has critical value 1.96. The test statistic from the sample is 1.45.
State the conclusion of the test.
Compare |test statistic| with critical value
|1.45| = 1.45 < 1.96
test statistic does NOT fall in the critical region
Decision: accept H₀ (insufficient evidence to reject)
Note: Two routes lead to the same place: compare p with α, OR compare the test statistic with the critical value. Pick whichever the question gives you.
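To see that both routes agree for this example, here is a short check. It assumes the test statistic follows a standard normal distribution, which is what a two-tailed critical value of 1.96 at the 5% level suggests; the question itself does not say which test was used.

```python
from statistics import NormalDist

std_normal = NormalDist()  # assumption: test statistic is a z-score
alpha = 0.05
test_statistic = 1.45

# Route 1: critical value for a two-tailed test (alpha split across both tails).
critical_value = std_normal.inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(test_statistic) > critical_value

# Route 2: two-tailed p-value for the same test statistic.
p_value = 2 * (1 - std_normal.cdf(abs(test_statistic)))
reject_by_p_value = p_value < alpha

print(f"critical value = {critical_value:.2f}")
print(f"p-value = {p_value:.3f}")
print(reject_by_critical_value, reject_by_p_value)
```

Under this assumption the critical value comes out at about 1.96 and the p-value at about 0.147, so neither route rejects H₀.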
WE 6
Write a conclusion in context
A farmer carries out a one-tailed test at the 5% significance level to test whether a new fertilizer increases the mean height of his plants. The GDC gives p = 0.087.
State the conclusion of the test in context.
Compare
p = 0.087, α = 0.05
0.087 > 0.05 → p > α
Decision: accept H₀
Write IN CONTEXT (not just “accept H₀”)
Answer: insufficient evidence to suggest the fertilizer increases the mean plant height
Note: Always frame the conclusion using the QUESTION’S wording: “fertilizer”, “plant height”, “increases”. A bare “accept H₀” is incomplete.
💡 Top tips
Decision is one comparison: p vs α. If p is smaller, reject.
Always define the parameter (e.g. “let μ = mean weight”). Don’t write H₀: μ = 500 without saying what μ means.
Conclusion in context: state the result IN THE QUESTION’S WORDING. “Sufficient evidence the fertilizer increases growth” beats “reject H₀”.
Significance level chosen BEFORE the test — you can’t adjust it after seeing the p-value.
⚠ Common mistakes
“Accept H₀” means H₀ is true: no — it means there is INSUFFICIENT evidence to reject it. Wording matters.
Wrong inequality in H₁: “reduced” → <, not >. Re-read the question.
Definitive conclusion: never say “H₀ is true” or “definitely false”. Hypothesis tests give evidence, not proof.
One-tailed when “changed”: “different” / “changed” forces two-tailed, even if you suspect a direction informally.
Forgetting context: “accept H₀” alone is incomplete; quote the variable from the question.
Next up: Chi-squared Test for Independence. The first specific hypothesis test you’ll meet — used when you have a contingency table of two categorical variables (e.g. eye colour vs hair colour) and want to know if they’re related. The GDC does the heavy lifting; your job is the set-up and the conclusion.