IB Maths AI SL · Topic 4: Hypothesis Testing · Paper 2 · Uniform · Binomial · Normal
Goodness of Fit Test
A χ² goodness of fit (GOF) test checks whether observed data fits a specific distribution, usually uniform, binomial, or normal. It uses the same test statistic as the independence test, but instead of comparing two variables, you compare the data to a fixed model. Most of the work is computing the expected frequencies from the proposed distribution; the GDC handles everything afterwards.
📘 What you need to know
Idea: do the observed frequencies match what the proposed distribution predicts? If the gap is too big ⇒ reject H₀.
H₀: data follows the proposed distribution. H₁: data does NOT follow it.
Expected frequency = (probability under the model) × (total sample size N).
Uniform: E = N/k for each of the k outcomes.
Binomial / Normal: compute P(X = x) or P(a < X < b) on the GDC, then multiply by N.
Degrees of freedom: ν = k − 1, where k is the number of categories. Decision rule identical to the independence test.
Building the expected frequencies
The expected count for each category is what the proposed distribution would predict, scaled up to your sample size N:
• Uniform: every outcome has probability 1/k, so E = N/k for all categories.
• Binomial B(n, p): use the GDC to find P(X = 0), P(X = 1), … then multiply each by N.
• Normal N(μ, σ²): use the GDC’s normal-cdf to find P(a < X < b) for each class, then multiply by N. Watch out for unbounded end classes.
Figure: observed vs expected frequencies for each die face (expected = 20 per face if the die were fair). The bars are close on average, so χ² is small (2.9) and the p-value is large (0.715), which is consistent with a uniform distribution.
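The χ² goodness of fit statistic:

$$\chi^2_{\text{calc}} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where O_i and E_i are the observed and expected frequencies of category i.
Degrees of freedom: ν = k − 1 (k = number of categories).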
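As a worked instance of the statistic, take the die data used later in WE 5 (observed counts 25, 18, 22, 20, 15, 20 over 120 rolls, expected 20 per face). Expanding the sum term by term reproduces the value of 2.9 quoted for the die:

```latex
\chi^2_{\text{calc}}
  = \frac{(25-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(22-20)^2}{20}
  + \frac{(20-20)^2}{20} + \frac{(15-20)^2}{20} + \frac{(20-20)^2}{20}
  = \frac{25 + 4 + 4 + 0 + 25 + 0}{20}
  = \frac{58}{20} = 2.9
```

Only faces 1 and 5 contribute much; the faces that landed exactly on 20 contribute nothing.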
🧭 Recipe — full GOF test
State H₀ / H₁: data follows / does not follow the given distribution. Name the distribution clearly (e.g. “B(4, 0.5)” or “N(150, 20²)”).
Compute the expected frequencies: probability under the model × N.
Find ν = k − 1.
GDC: enter observed and expected lists, run the χ² GOF test, which returns χ²calc and p.
Compare and conclude in context: reject H₀ if χ²calc > critical value (or p < α).
Expected frequencies sum to N: always check. For normal-distribution problems with unbounded ends, the four (or however many) probabilities must add to 1, so the expected frequencies must add to N.
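The recipe can be sketched in Python as a way of checking GDC output. This is a minimal sketch under stated assumptions, not a description of what the GDC does internally: the function name `gof_test` is hypothetical, and the small table of 5% critical values is copied from standard χ² tables for illustration (in the exam the critical value is normally given).

```python
import math

# 5% critical values from standard chi-square tables, keyed by degrees of freedom.
# Illustrative subset only; extend it if you need more categories.
CRITICAL_VALUES_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def gof_test(observed, probabilities):
    """Chi-square goodness of fit test at the 5% level (hypothetical helper).

    observed      -- observed frequency for each category
    probabilities -- model probability for each category (must sum to 1)
    """
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per category")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-6):
        raise ValueError("model probabilities must sum to 1")

    n = sum(observed)
    # Expected frequency = probability under the model x sample size.
    expected = [p * n for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    critical = CRITICAL_VALUES_5_PERCENT[df]
    return {
        "expected": expected,
        "statistic": statistic,
        "df": df,
        "critical": critical,
        "reject_h0": statistic > critical,
    }


# Die data from WE 5, tested against a uniform model (each face 1/6).
result = gof_test([25, 18, 22, 20, 15, 20], [1 / 6] * 6)
print(result["df"], round(result["statistic"], 2), result["reject_h0"])  # 5 2.9 False
```

Passing probabilities rather than expected counts means the "probabilities sum to 1" check happens before any frequencies are computed, so a missed end class is caught early.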
Worked examples
WE 1
State hypotheses for a uniform GOF
A player suspects a six-sided die is biased. He rolls it many times and records the score on each roll. He plans to use a χ² test to decide whether the die is fair.
Write the null and alternative hypotheses.
“Fair die” means every face is equally likely.
H₀: the score follows a uniform distribution (each face has P = 1/6).
H₁: the score does NOT follow a uniform distribution.
Name the distribution explicitly, in context.
Note: “uniform” is the technical word for “all outcomes equally likely”. H₀ is always specific; H₁ is just its negation.
WE 2
Expected frequencies for a uniform distribution
A spinner has five equally-sized coloured sections. A student spins it 60 times.
(a) Find the expected frequency of each colour, assuming the spinner is fair. (b) Write down the degrees of freedom for the test.
(a) Uniform → each outcome has the same expected frequency.
E = N / k = 60 / 5 = 12
Answer: expected = 12 per colour.
(b) ν = k − 1
ν = 5 − 1 = 4
Answer: ν = 4.
Note: all expected frequencies should be ≥ 5 for the test to be valid in the IA. Here 12 ≥ 5, so this is fine.
WE 3
Expected frequency for a binomial GOF
It is claimed that the number of successful free throws made by basketball players in 4 attempts follows the distribution B(4, 0.5). A sample of 160 players is studied.
Find the expected number of players who make exactly 2 successful free throws.
Step 1: find P(X = 2) using B(4, 0.5).
P(X = 2) = C(4, 2) × 0.5² × 0.5² = 6 × 0.0625 = 0.375
Step 2: multiply by N.
Expected = 160 × 0.375 = 60
Answer: expected = 60 players.
Tip: use the GDC’s binomial PDF directly: binomPdf(4, 0.5, 2) = 0.375. Round only the FINAL expected frequency.
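A full GOF test needs the expected frequency for every outcome, not just X = 2. The sketch below computes all five for B(4, 0.5) with N = 160 using only the Python standard library; `binomial_pmf` is a hypothetical helper that plays the role of the GDC's binomial PDF function.

```python
from math import comb


def binomial_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


sample_size = 160  # number of players in the sample

# Expected frequency for each possible number of successes 0..4.
expected = [sample_size * binomial_pmf(4, 0.5, x) for x in range(5)]

print(expected)       # [10.0, 40.0, 60.0, 40.0, 10.0]
print(sum(expected))  # 160.0
```

The five expected frequencies add up to 160, which is the sum-to-N check from the recipe.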
WE 4
Expected frequencies for a normal GOF
The masses (g) of 100 fruits are claimed to follow the distribution N(150, 20²). The data is grouped into four classes.
Find the expected frequency for each class:
| class | mass m (g) |
|---|---|
| 1 | m < 130 |
| 2 | 130 ≤ m < 150 |
| 3 | 150 ≤ m < 170 |
| 4 | m ≥ 170 |
Use normal-cdf on the GDC with μ = 150, σ = 20.
Class 1: P(m < 130) = P(Z < −1) = 0.1587 → E = 100 × 0.1587 = 15.87
Class 2: P(130 ≤ m < 150) = 0.3413 → E = 34.13
Class 3: P(150 ≤ m < 170) = 0.3413 → E = 34.13
Class 4: P(m ≥ 170) = 0.1587 → E = 15.87
Check sum = N: 15.87 + 34.13 + 34.13 + 15.87 = 100 ✓
Answer: 15.87 · 34.13 · 34.13 · 15.87
Tip: end classes (< or ≥) use UNBOUNDED inputs on the GDC: −∞ and +∞ (or use very large numbers). Always check the four E’s sum to N.
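The same calculation can be checked in Python. This sketch assumes the standard library's `statistics.NormalDist` as a stand-in for the GDC's normal-cdf, and represents the unbounded end classes with `float("-inf")` and `float("inf")`.

```python
from statistics import NormalDist

model = NormalDist(mu=150, sigma=20)  # the claimed distribution N(150, 20^2)
sample_size = 100

# (lower, upper) bounds for each class; the end classes are unbounded.
classes = [
    (float("-inf"), 130),
    (130, 150),
    (150, 170),
    (170, float("inf")),
]

# P(lower <= m < upper) for each class, then scale by N.
probabilities = [model.cdf(upper) - model.cdf(lower) for lower, upper in classes]
expected = [sample_size * p for p in probabilities]

print([round(e, 2) for e in expected])  # [15.87, 34.13, 34.13, 15.87]
print(round(sum(expected), 2))          # 100.0
```

Rounding happens only when printing, so the unrounded `expected` list is what would be fed into the χ² calculation.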
WE 5
Full uniform GOF test using the p-value
The die from WE 1 is rolled 120 times with the following results:
| face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| obs. | 25 | 18 | 22 | 20 | 15 | 20 |
A χ² GOF test is performed at the 5% significance level. The GDC gives χ²calc ≈ 2.9 and p ≈ 0.715. State the conclusion.
Expected: 120/6 = 20 per face.
ν = 6 − 1 = 5
Compare p with α: p = 0.715, α = 0.05
0.715 > 0.05 → do not reject H₀
Conclusion: there is insufficient evidence that the die is biased.
Note: small variations (25, 15) on individual faces are expected by chance. The χ² test asks whether the OVERALL pattern is too far from uniform, and here it isn’t.
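The GDC's χ² and p-value for this example can be reproduced without a statistics package. The sketch below assumes the standard closed-form expressions for the upper-tail probability of a χ² distribution with a whole number of degrees of freedom; `chi2_upper_tail` is a hypothetical helper name, not a GDC or library function.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    total = 0.0
    term = math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


observed = [25, 18, 22, 20, 15, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # uniform: N / k = 20

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_upper_tail(statistic, df)

print(round(statistic, 2), round(p_value, 3))  # 2.9 0.715
print(p_value < 0.05)                          # False, so H0 is not rejected
```

As a cross-check on the helper, `chi2_upper_tail(7.815, 3)` comes out very close to 0.05, matching the 5% critical value for ν = 3 used in WE 6.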
WE 6
Normal GOF test using the critical value — reject H₀
For the fruit mass example (WE 4), suppose a sample of 400 fruits gives:
| mass (g) | m < 130 | 130 ≤ m < 150 | 150 ≤ m < 170 | m ≥ 170 |
|---|---|---|---|---|
| obs. | 25 | 130 | 145 | 100 |
Test whether mass follows N(150, 20²) at the 5% level. The critical value is 7.815 and the GDC gives χ²calc ≈ 45.2.
Hypotheses
H₀: mass follows N(150, 20²)
H₁: mass does NOT follow N(150, 20²)
Expected (for N = 400, scale WE 4 by 4): 63.48 · 136.52 · 136.52 · 63.48 (sum 400 ✓)
ν = 4 − 1 = 3
Compare: χ² = 45.2 > cv = 7.815 → reject H₀
Conclusion: there is sufficient evidence that mass does NOT follow N(150, 20²).
Note: χ² (45) is huge compared to the cv (7.8). The observed shifts (25 vs 63.48 in the low class, 100 vs 63.48 in the high class) drive the test statistic way above the threshold.
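To see where the statistic comes from, the sum can be written out class by class. This breakdown uses the expected frequencies rounded to 2 dp as quoted in the solution, so the last digit may differ slightly from a GDC working at full precision:

```latex
\chi^2_{\text{calc}}
  = \frac{(25-63.48)^2}{63.48} + \frac{(130-136.52)^2}{136.52}
  + \frac{(145-136.52)^2}{136.52} + \frac{(100-63.48)^2}{63.48}
  \approx 23.326 + 0.311 + 0.527 + 21.010
  \approx 45.17 \approx 45.2
```

The two end classes contribute about 44.3 of the total, while the two middle classes together contribute less than 1.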
💡 Top tips
Expected frequencies sum to N: a fast sanity check before plugging into the GDC.
Uniform expected = N/k: one line, no GDC needed.
For binomial/normal: use GDC’s binomPDF / normalCDF to get each probability, then multiply by N.
Don’t round probabilities early: carry 4–5 dp and round at the final expected frequency.
Unbounded end classes: m < 130 means use −∞ as the lower bound (0 also works for mass here, because P(m < 0) is negligible under N(150, 20²)); m ≥ 170 means use +∞ as the upper bound.
⚠ Common mistakes
Wrong ν: for GOF, ν = k − 1, NOT (rows−1)(cols−1). That’s the independence test.
Forgetting to multiply by N: probabilities are not frequencies; expected freq = P × N.
Using midpoint of class instead of class-interval probability for the normal: use P(a < X < b) per class.
Probabilities don’t sum to 1: a sign you’ve missed an unbounded end. Re-check the class boundaries.
Conclusion without context: “accept H₀” is incomplete — quote the distribution and the variable.
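The "probabilities don't sum to 1" check can be illustrated with the fruit-mass model N(150, 20²). In this sketch the bounded end classes 110 to 130 and 170 to 190 are hypothetical, chosen only to show what the mistake looks like; `total_probability` is likewise an illustrative helper.

```python
from statistics import NormalDist

model = NormalDist(mu=150, sigma=20)

# Mistake: end classes treated as bounded (110 and 190 are hypothetical bounds).
bounded = [(110, 130), (130, 150), (150, 170), (170, 190)]
# Correct: end classes run to -infinity and +infinity.
unbounded = [(float("-inf"), 130), (130, 150), (150, 170), (170, float("inf"))]


def total_probability(classes):
    """Sum of P(lower <= X < upper) over all classes."""
    return sum(model.cdf(upper) - model.cdf(lower) for lower, upper in classes)


print(round(total_probability(bounded), 4))    # 0.9545
print(round(total_probability(unbounded), 4))  # 1.0
```

With the bounded classes about 4.5% of the probability is missing, so the expected frequencies would no longer add up to N.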
Next up: The t-test. We move from categorical data and χ² to continuous data: compare the means of two samples (e.g. children vs adults solving a puzzle) using the pooled two-sample t-test. The workflow is the same (H₀/H₁, GDC, p-value), but now the test is about means rather than distributions.