IB Maths AI SL · Topic 4 · Hypothesis Testing (Paper 2: H₀, H₁, p-value)

Introduction to Hypothesis Testing

A hypothesis test uses a sample to decide whether a claim about a population is plausible. You set up two competing statements — H₀ (no change) and H₁ (the change you suspect) — then use sample data to either reject or accept H₀. The decision comes down to a single comparison: p-value vs significance level.

📘 What you need to know

Null hypothesis H₀: the “no change” assumption about the population (e.g. μ = 500). Always assumed true at the start.
Alternative hypothesis H₁: what you suspect has changed (e.g. μ < 500, μ > 500, or μ ≠ 500).
One-tailed: H₁ uses < or > (suspect a specific direction). Two-tailed: H₁ uses ≠ (suspect any change).
Significance level α: the cut-off probability set before testing — usually 1%, 5%, or 10%.
p-value: probability of seeing data at least as extreme as the sample, assuming H₀ is true. The GDC gives this.
The decision rule: if p < α ⇒ reject H₀. If p ≥ α ⇒ accept H₀.
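To make "at least as extreme, assuming H₀ is true" concrete, here is a minimal simulation sketch. It draws many samples from a population in which H₀ holds and counts how often the sample mean is as low as the observed one (a lower-tailed test). The numbers (μ₀ = 500, σ = 5, n = 20, observed mean 498) and the assumption of a normal population with known σ are hypothetical and chosen for illustration only; in the exam the GDC does this calculation for you.

```python
import random
from statistics import NormalDist, fmean

def simulated_p_value(mu0, sigma, n, observed_mean, trials=20_000, seed=1):
    """Estimate P(sample mean <= observed_mean | H0: mu = mu0) by simulation.

    Assumes (for illustration) a normal population with known sd sigma.
    """
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(trials):
        # One sample of size n drawn from the population described by H0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if fmean(sample) <= observed_mean:
            as_extreme += 1
    return as_extreme / trials

# Hypothetical set-up: H0: mu = 500, H1: mu < 500
mu0, sigma, n, xbar = 500, 5, 20, 498
p_sim = simulated_p_value(mu0, sigma, n, xbar)

# Exact value for comparison: under H0 the sample mean ~ N(mu0, sigma / sqrt(n))
p_exact = NormalDist(mu0, sigma / n ** 0.5).cdf(xbar)
print(f"simulated p = {p_sim:.3f}, exact p = {p_exact:.3f}")
```

The simulated proportion should land close to the exact value, which is the p-value a calculator would report for this z-test.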

Setting up the hypotheses

Every test starts with two competing statements about a population parameter (usually a mean μ or a proportion).

• H₀ is the “status quo” — what is currently assumed true. Always written as an equality (e.g. μ = 500).
• H₁ reflects the claim being tested. The wording of the question tells you which inequality:

· “decreased / smaller / lower / less than” → H₁: μ < … (one-tailed, lower)
· “increased / bigger / higher / more than” → H₁: μ > … (one-tailed, upper)
· “changed / different / not the same” → H₁: μ ≠ … (two-tailed)
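The wording rules above can be written as a toy keyword lookup. This is a hypothetical sketch, not an exam method: the word lists are illustrative and not exhaustive. "Changed"-type words are checked first, matching the rule that they force a two-tailed test even if a direction is suspected informally.

```python
import re

# Illustrative word stems only; always read the whole question yourself.
# Order matters: two-tailed wording is checked first.
DIRECTION_WORDS = {
    "≠": ("change", "different", "not the same"),
    "<": ("decrease", "reduce", "smaller", "lower", "less than", "under-fill"),
    ">": ("increase", "bigger", "higher", "more than", "taller"),
}

def alternative_form(question: str) -> str:
    """Return the inequality symbol H1 should use, based on direction words."""
    text = question.lower()
    for symbol, stems in DIRECTION_WORDS.items():
        # \b stops e.g. "lower" matching inside "flower"
        if any(re.search(r"\b" + re.escape(stem), text) for stem in stems):
            return symbol
    raise ValueError("no direction word found; re-read the question")

print(alternative_form("A doctor claims a new medication reduces blood pressure."))
```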

The decision: p-value vs significance level

Once the test is set up, your GDC computes a p-value: the probability of getting a sample at least as extreme as yours if H₀ were true. The comparison with α decides everything.

Think of the p-value as a number between 0 and 1. The significance level α cuts off the leftmost slice, from 0 up to α. A p-value inside that slice means reject H₀; a p-value of α or above means accept H₀.

The decision rule:

if p < α → reject H₀ (sufficient evidence)

if p ≥ α → accept H₀ (insufficient evidence to reject)
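The rule is a single comparison, so it fits in a few lines. In this minimal sketch the name `decide` is hypothetical, and p = α is treated as "accept", matching the recipe's "otherwise accept H₀".

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the decision rule: reject H0 if p < alpha, otherwise accept H0."""
    if not (0 <= p_value <= 1 and 0 < alpha < 1):
        raise ValueError("p-value must be in [0, 1] and alpha in (0, 1)")
    if p_value < alpha:
        return "reject H0 (sufficient evidence)"
    return "accept H0 (insufficient evidence to reject)"

print(decide(0.023, 0.05))  # p below alpha
print(decide(0.087, 0.05))  # p above alpha
```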

🧭 Recipe — carry out any hypothesis test

Identify the parameter being tested (usually a mean μ) and the claim wording.
Write H₀ (equality) and H₁ using <, > or ≠ depending on the wording.
State the significance level α (given in the question, e.g. 5%).
Use the GDC to compute the p-value from the sample data.
Compare and conclude in context: reject H₀ if p < α; otherwise accept H₀.

“Accepting” H₀ doesn’t mean it’s true — it means there’s not enough evidence to reject it. Always write “insufficient evidence to suggest…” rather than “H₀ is true”.
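Here is the whole recipe end to end, as a sketch. One swap to flag plainly: in AI SL you would normally run a t-test on the GDC, but Python's standard library has no t-distribution, so this sketch uses a one-sample z-test and assumes the population standard deviation is known (σ = 3 g). The sample weights, σ and the helper name `z_test_p_value` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0, sigma, tail):
    """One-sample z-test, assuming a known population sd sigma.

    tail is "<", ">" or "≠", matching the form of H1.
    Returns (z, p).
    """
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    lower = NormalDist().cdf(z)  # P(Z <= z) under H0
    if tail == "<":
        return z, lower
    if tail == ">":
        return z, 1 - lower
    if tail == "≠":
        return z, 2 * min(lower, 1 - lower)
    raise ValueError("tail must be '<', '>' or '≠'")

# Steps 1-3: mu = mean bag weight (g); H0: mu = 500, H1: mu < 500; alpha = 5%
weights = [497, 499, 501, 495, 498, 500, 496, 499, 497, 498]  # hypothetical sample
alpha = 0.05

# Step 4: compute the p-value (sigma = 3 g is an assumed, known value)
z, p = z_test_p_value(weights, mu0=500, sigma=3, tail="<")

# Step 5: compare and conclude in context
if p < alpha:
    print(f"p = {p:.4f} < {alpha}: sufficient evidence that the mean bag weight is less than 500 g")
else:
    print(f"p = {p:.4f} >= {alpha}: insufficient evidence that the mean bag weight is less than 500 g")
```

Keeping the tail as the same symbol used in H₁ ties step 2 directly to the p-value calculation in step 4.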

Worked examples

WE 1

Set up hypotheses — one-tailed

A factory advertises that its bags of rice have a mean weight of 500 g. A quality inspector suspects the bags are being under-filled and tests a sample.

Write the null and alternative hypotheses for this test.

Let μ = population mean weight of bags (g).
H₀: the advertised claim is correct, so the mean is 500 → H₀: μ = 500
H₁: “under-filled” means we suspect the mean is LESS than 500 → H₁: μ < 500
Answer: H₀: μ = 500 · H₁: μ < 500 (one-tailed)
Tip: “under-filled”, “less than” and “decreased” all point to the LOWER one-tailed test. Define what μ represents; examiners want the variable named in context.

WE 2

Set up hypotheses — two-tailed

A teacher previously found that students’ mean mark on a maths test was 62. After introducing a new revision method she wants to know whether the mean mark has changed.

Write the null and alternative hypotheses.

Let μ = population mean mark with the new method.
H₀: no change, so the mean is still 62 → H₀: μ = 62
H₁: “changed” could mean higher OR lower, so the test is two-tailed → H₁: μ ≠ 62
Answer: H₀: μ = 62 · H₁: μ ≠ 62 (two-tailed)
Tip: “changed”, “different” and “not the same” ALWAYS mean a two-tailed test. Don’t pick a direction unless the question specifies one.

WE 3

Classify three scenarios as one- or two-tailed

For each scenario, decide whether a one-tailed or two-tailed test is appropriate, and state the form of H₁.

(i) A farmer believes a new fertilizer makes plants taller.
(ii) A coach wants to know if a new training programme has changed athletes’ 100 m times.
(iii) A doctor claims a new medication reduces blood pressure.

(i) “taller” → the mean height has INCREASED → one-tailed, H₁: μ > …
(ii) “changed” → could go either way → two-tailed, H₁: μ ≠ …
(iii) “reduces” → the mean has DECREASED → one-tailed, H₁: μ < …
Answer: (i) one-tailed (>) · (ii) two-tailed (≠) · (iii) one-tailed (<)
Tip: always look for direction words: “more / taller / higher / increased” → >; “less / smaller / decreased / reduced” → <; “changed / different” → ≠.

WE 4

Decision using a p-value

A test is carried out at a 5% significance level. The GDC gives a p-value of 0.023.

State the conclusion of the test.

Compare p with α: p = 0.023, α = 0.05, and 0.023 < 0.05, so p < α.
Decision: reject H₀; there is sufficient evidence against H₀.
Tip: whenever p < α, a sample at least this extreme would occur with probability less than α if H₀ were true. That is unusual enough for us to reject H₀.
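The same p-value can lead to different decisions at different significance levels. This short sketch checks p = 0.023 against the three common levels from the key-terms list (1%, 5%, 10%); only the 5% case is part of the original question.

```python
p = 0.023  # p-value from the GDC in this example

decisions = {}
for alpha in (0.01, 0.05, 0.10):
    decisions[alpha] = "reject H0" if p < alpha else "accept H0"
    print(f"alpha = {alpha:.0%}: {decisions[alpha]}")
```

At 1% the result is no longer significant, which is one reason α must be fixed before the test, not chosen after seeing p.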

WE 5

Decision using a critical value

A two-tailed test at the 5% significance level has critical value 1.96. The test statistic from the sample is 1.45.

State the conclusion of the test.

Compare |test statistic| with the critical value: |1.45| = 1.45 < 1.96, so the test statistic does NOT fall in the critical region (beyond ±1.96).
Decision: accept H₀; there is insufficient evidence to reject it.
Tip: both routes lead to the same decision. Compare p with α, OR compare the test statistic with the critical value; use whichever the question gives you.
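To see that the two routes agree, this sketch computes both for this example. It assumes the test statistic follows a standard normal distribution under H₀ (a z-test), which is what the critical value 1.96 suggests.

```python
from statistics import NormalDist

alpha = 0.05
z_stat = 1.45  # test statistic from the example

# Critical-value route: two-tailed, so alpha is split equally between the tails
critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
reject_by_cv = abs(z_stat) > critical

# p-value route: P(|Z| >= 1.45) under H0
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_by_p = p_two_tailed < alpha

print(f"critical value = {critical:.3f}, |z| = {abs(z_stat)}")
print(f"two-tailed p = {p_two_tailed:.3f}")
print("reject H0" if reject_by_cv else "accept H0")
```

Both comparisons give "accept H₀": the p-value is about 0.147, well above 0.05.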

WE 6

Write a conclusion in context

A farmer carries out a one-tailed test at the 5% significance level to test whether a new fertilizer increases the mean height of his plants. The GDC gives p = 0.087.

State the conclusion of the test in context.

Compare: p = 0.087, α = 0.05, and 0.087 > 0.05, so p > α.
Decision: accept H₀.
Write the conclusion IN CONTEXT (not just “accept H₀”): there is insufficient evidence to suggest that the fertilizer increases the mean plant height.
Tip: always frame the conclusion using the QUESTION’S wording (“fertilizer”, “plant height”, “increases”). A bare “accept H₀” is incomplete.
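An in-context conclusion follows a fixed template: the comparison, the decision, then the claim in the question's own words. The helper below is a hypothetical sketch of that template; the `claim` text must still come from you, taken from the question.

```python
def conclusion(p_value: float, alpha: float, claim: str) -> str:
    """Build an in-context conclusion; `claim` uses the question's wording."""
    if p_value < alpha:
        return (f"Since p = {p_value} < {alpha}, reject H0: there is "
                f"sufficient evidence to suggest {claim}.")
    return (f"Since p = {p_value} ≥ {alpha}, accept H0: there is "
            f"insufficient evidence to suggest {claim}.")

sentence = conclusion(0.087, 0.05, "that the fertilizer increases the mean plant height")
print(sentence)
```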

💡 Top tips

Decision is one comparison: p vs α. If p is smaller, reject.
Always define the parameter (e.g. “let μ = mean weight”). Don’t write H₀: μ = 500 without saying what μ means.
Direction words: “decreased / reduced / smaller” → <; “increased / bigger / taller” → >; “changed / different” → ≠.
Conclusion in context: state the result IN THE QUESTION’S WORDING. “Sufficient evidence that the fertilizer increases the mean plant height” beats “reject H₀”.
Significance level chosen BEFORE the test — you can’t adjust it after seeing the p-value.

⚠ Common mistakes

“Accept H₀” means H₀ is true: no — it means there is INSUFFICIENT evidence to reject it. Wording matters.
Wrong inequality in H₁: “reduced” → <, not >. Re-read the question.
Definitive conclusion: never say “H₀ is true” or “definitely false”. Hypothesis tests give evidence, not proof.
One-tailed when “changed”: “different” / “changed” forces two-tailed, even if you suspect a direction informally.
Forgetting context: “accept H₀” alone is incomplete; quote the variable from the question.

Next up: Chi-squared Test for Independence. The first specific hypothesis test you’ll meet — used when you have a contingency table of two categorical variables (e.g. eye colour vs hair colour) and want to know if they’re related. The GDC does the heavy lifting; your job is the set-up and the conclusion.

