IB Maths AA HL
Topic 4 — Statistics & Probability
Paper 1 & 2
Frequency Tables
Frequency tables compress repeated data into a tidy summary. The same statistical measures (mean, median, mode, SD) all generalise — you just weight each value by how often it occurs. For grouped data, you lose the exact values and have to use mid-interval estimates instead.
📘 What you need to know
- Ungrouped table: each row is a single value $x_i$ with frequency $f_i$.
- Grouped table: each row is a class interval (e.g., 10 ≤ x < 20) with frequency $f_i$.
- Mean (formula booklet): $\bar{x} = \dfrac{\sum f_i x_i}{n}$, where $n = \sum f_i$.
- Mode (ungrouped): value with the highest frequency. Modal class (grouped): class with the highest frequency.
- Median: use cumulative frequencies (running totals) to locate the middle value.
- Grouped data is approximate — mid-interval values estimate the mean and SD; cumulative frequency graphs estimate the median and quartiles.
- Use your GDC: enter values into one list and frequencies into another, then run 1-Var Stats.
Ungrouped frequency tables
Mean from a frequency table
$\bar{x} = \dfrac{\sum f_i x_i}{n}$, where $n = \sum f_i$
To find the median, build a cumulative frequency column. The median sits at position (n + 1)/2 in the ordered data: for odd n this is a single value; for even n it falls halfway between two positions, so average the n/2-th and (n/2 + 1)-th values. For each position, read off the first row whose cumulative frequency reaches it.
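The cumulative-frequency lookup can be sketched in Python. This is only an illustration of the idea, not a prescribed method; the function names and the sample data are assumptions made for this example.

```python
def value_at_position(values, freqs, position):
    """Return the data value at a 1-based position in the ordered data."""
    running_total = 0
    for value, freq in zip(values, freqs):
        running_total += freq          # cumulative frequency so far
        if running_total >= position:  # first row to reach the position
            return value
    raise ValueError("position exceeds total frequency")


def median_from_table(values, freqs):
    """Median of an ungrouped frequency table (values in ascending order)."""
    n = sum(freqs)
    if n % 2 == 1:
        return value_at_position(values, freqs, (n + 1) // 2)
    lower = value_at_position(values, freqs, n // 2)
    upper = value_at_position(values, freqs, n // 2 + 1)
    return (lower + upper) / 2


# Hypothetical table: values 1 to 4 with frequencies 3, 5, 2, 1 (n = 11).
# The 6th value lies in the second row, so the median is 2.
print(median_from_table([1, 2, 3, 4], [3, 5, 2, 1]))
```

For even n the function returns a float, because the two middle values are averaged.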
Grouped frequency tables
Trade-off: grouping gains tidiness but loses precision — you no longer know exact values, so all measures become estimates.
- Mid-interval value: $x_i = (\text{lower} + \text{upper})/2$, a stand-in for the actual values in that class.
- Modal class: the class with the highest frequency; this requires equal class widths to be meaningful.
The mean is estimated by summing $f_i x_i$ using the mid-interval values and dividing by n. The median, IQR, and quartiles are read off a cumulative frequency graph (covered in a later note).
🧭 Recipe — measures from a frequency table
- For grouped data, compute mid-interval values $x_i = (\text{lower} + \text{upper})/2$. (Skip for ungrouped.)
- Find $n = \sum f_i$ (total frequency).
- Compute $\sum f_i x_i$. Divide by n for the mean.
- For mode/modal class: pick the row with the highest frequency.
- For median: build cumulative totals; locate which row contains the middle value (or class).
- For SD: enter values + frequencies into the GDC’s 1-Var Stats.
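The recipe above can be collected into one small Python function. This is a rough sketch rather than a replacement for the GDC; the name `summarise` and the returned dictionary layout are assumptions for illustration. For grouped data, pass the mid-interval values as `values` and treat every result as an estimate.

```python
import math


def summarise(values, freqs):
    """Follow the recipe: n, mean, mode (or modal row) and population SD."""
    n = sum(freqs)                                           # n = sum of f
    sum_fx = sum(f * x for x, f in zip(values, freqs))       # sum of f*x
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))  # sum of f*x^2
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    mode = values[freqs.index(max(freqs))]  # first row with the highest frequency
    return {"n": n, "mean": mean, "mode": mode, "sd": math.sqrt(variance)}


# Phones data from WE 2 below: values 0 to 4, frequencies 2, 14, 18, 4, 2.
result = summarise([0, 1, 2, 3, 4], [2, 14, 18, 4, 2])
print(result["n"], result["mean"], result["mode"], round(result["sd"], 3))
```

If two rows tie for the highest frequency, this sketch reports only the first; a fuller version would return every tied row.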
Worked examples
WE 1: Mode, median, and mean from an ungrouped frequency table
The table shows the number of goals scored per match by a football team across 20 matches.
| Goals (x) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Frequency (f) | 4 | 7 | 6 | 2 | 1 |
Find (a) the mode, (b) the median, and (c) the mean.
Step 1: Total n = Σf
n = 4 + 7 + 6 + 2 + 1 = 20
(a) Mode = value with highest frequency
7 is highest → Mode = 1 goal
(b) Median: cumulative frequencies
cum: 4, 11, 17, 19, 20
For n = 20, median = avg of 10th and 11th values
cumulative reaches 11 at goals = 1 → both 10th and 11th are “1”
Median = 1 goal
(c) Mean = Σfx/n
Σfx = 0(4) + 1(7) + 2(6) + 3(2) + 4(1) = 29
Mean = 29/20 = 1.45
Mode = 1; Median = 1; Mean = 1.45 goals
all three are close together, so the distribution is centred near 1; the mean sits slightly above the median because a few matches had 3 or 4 goals (a mild right skew)
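One way to sanity-check these answers is to expand the table back into the 20 individual results and let Python's standard `statistics` module do the work. This is a cross-check sketch only; the variable names are assumptions for this example.

```python
import statistics

goals = [0, 1, 2, 3, 4]
freqs = [4, 7, 6, 2, 1]

# Expand the table into the raw data: four 0s, seven 1s, six 2s, and so on.
raw = [g for g, f in zip(goals, freqs) for _ in range(f)]

print(statistics.mode(raw))    # most frequent value
print(statistics.median(raw))  # average of the 10th and 11th values
print(statistics.mean(raw))    # total goals divided by number of matches
```

Expanding the table is only practical for small data sets; the frequency-weighted formulas give the same results without rebuilding the raw list.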
WE 2: Standard deviation from an ungrouped frequency table
The table shows the number of mobile phones owned by 40 people in a survey.
| Phones | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Frequency | 2 | 14 | 18 | 4 | 2 |
Find the mean and standard deviation. Give your standard deviation to 3 s.f.
Step 1: n = Σf
n = 2 + 14 + 18 + 4 + 2 = 40
Step 2: Σfx
0(2) + 1(14) + 2(18) + 3(4) + 4(2) = 0 + 14 + 36 + 12 + 8 = 70
Mean = 70/40 = 1.75
Step 3: Σfx²
0²(2) + 1²(14) + 2²(18) + 3²(4) + 4²(2) = 0 + 14 + 72 + 36 + 32 = 154
Step 4: Variance σ² = Σfx²/n − μ²
σ² = 154/40 − 1.75² = 3.85 − 3.0625 = 0.7875
Step 5: SD
σ = √0.7875 ≈ 0.8874
Mean = 1.75; SD ≈ 0.887 phones
faster method: use 1-Var Stats with two lists (values + frequencies)
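For reference, the shortcut in Step 4 follows from the definition of variance as the mean squared distance from μ. The derivation below is a sketch using the same symbols as the note ($f$, $x$, $n$, $\mu$), with $n = \sum f$ and $\mu = \sum fx / n$.

```latex
\begin{aligned}
\sigma^2 &= \frac{\sum f\,(x - \mu)^2}{n} \\
         &= \frac{\sum f x^2}{n} - 2\mu \cdot \frac{\sum f x}{n} + \mu^2 \cdot \frac{\sum f}{n} \\
         &= \frac{\sum f x^2}{n} - 2\mu^2 + \mu^2 \\
         &= \frac{\sum f x^2}{n} - \mu^2
\end{aligned}
```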
WE 3: Modal class and estimated mean from a grouped table
The table shows the times in minutes (t) that 35 students took to complete a puzzle.
| Time (min) | 0 ≤ t < 5 | 5 ≤ t < 10 | 10 ≤ t < 15 | 15 ≤ t < 20 | 20 ≤ t < 25 |
|---|---|---|---|---|---|
| Frequency | 4 | 9 | 12 | 7 | 3 |
(a) Write down the modal class. (b) Find the mid-interval value of the modal class. (c) Find an estimate for the mean.
(a) Modal class = highest frequency
freq 12 is highest → Modal class: 10 ≤ t < 15
(b) Mid-interval value
(10 + 15)/2 = 12.5 min
(c) Mid-interval values: 2.5, 7.5, 12.5, 17.5, 22.5
Σfx
4(2.5) + 9(7.5) + 12(12.5) + 7(17.5) + 3(22.5)
= 10 + 67.5 + 150 + 122.5 + 67.5 = 417.5
Estimated mean
417.5/35 ≈ 11.93
(a) 10 ≤ t < 15; (b) 12.5 min; (c) ≈ 11.9 min
the question asks for an “estimate”, so give a rounded decimal (3 s.f.) rather than an exact fraction
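The same working can be reproduced in a few lines of Python. Storing each class as a `(lower, upper)` pair is an assumption made for this sketch, which exists only to check the arithmetic above.

```python
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
freqs = [4, 9, 12, 7, 3]

# Mid-interval value of each class stands in for the unknown raw times.
mids = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freqs)
sum_fx = sum(f * m for f, m in zip(freqs, mids))
estimated_mean = sum_fx / n

# Modal class: the class paired with the largest frequency.
modal_class = classes[freqs.index(max(freqs))]

print(modal_class, mids, sum_fx, round(estimated_mean, 2))
```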
WE 4: Estimated mean and SD from a grouped frequency table
The heights, in cm, of 50 plants in a greenhouse are recorded.
| Height (cm) | 10 ≤ h < 20 | 20 ≤ h < 30 | 30 ≤ h < 40 | 40 ≤ h < 50 | 50 ≤ h < 60 |
|---|---|---|---|---|---|
| Frequency | 5 | 12 | 18 | 10 | 5 |
Find an estimate for the mean and the standard deviation. Give your SD to 3 s.f.
Step 1: Mid-intervals — 15, 25, 35, 45, 55
Step 2: Σfx
5(15) + 12(25) + 18(35) + 10(45) + 5(55)
= 75 + 300 + 630 + 450 + 275 = 1730
Estimated mean = 1730/50 = 34.6 cm
Step 3: Σfx²
5(225) + 12(625) + 18(1225) + 10(2025) + 5(3025)
= 1125 + 7500 + 22050 + 20250 + 15125 = 66050
Step 4: σ² = Σfx²/n − μ²
= 66050/50 − 34.6² = 1321 − 1197.16 = 123.84
Step 5: SD
σ = √123.84 ≈ 11.13
Estimated mean ≈ 34.6 cm; estimated SD ≈ 11.1 cm
word “estimated” must appear — exact values impossible without raw data
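As a numerical check, the shortcut Σfx²/n − μ² used in Step 4 can be compared with the definition of variance (the frequency-weighted mean of squared distances from the mean). The sketch below uses the mid-interval values from this example; the variable names are assumptions.

```python
import math

mids = [15, 25, 35, 45, 55]
freqs = [5, 12, 18, 10, 5]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n

# Definition: frequency-weighted average of squared distances from the mean.
var_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n

# Shortcut from Step 4: mean of the squares minus the square of the mean.
var_shortcut = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2

print(round(var_definition, 2), round(var_shortcut, 2))
print(round(math.sqrt(var_shortcut), 3))
```

Both routes agree up to floating-point rounding, so the choice between them is a matter of convenience; the shortcut needs only the two running totals Σfx and Σfx².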
WE 5: Find a missing frequency given the mean
The numbers of siblings of the students in a class are summarised below, where x is unknown.
| Siblings | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Frequency | 7 | x | 9 | 4 | 2 |
Given that the mean number of siblings is 1.5, find the value of x.
Step 1: Express n and Σfx in terms of x
n = 7 + x + 9 + 4 + 2 = 22 + x
Σfx = 0(7) + 1(x) + 2(9) + 3(4) + 4(2) = x + 38
Step 2: Apply mean = Σfx/n
(x + 38)/(22 + x) = 1.5
Step 3: Solve
x + 38 = 1.5(22 + x) = 33 + 1.5x
38 − 33 = 1.5x − x → 5 = 0.5x → x = 10
x = 10
verify: total = 32, Σfx = 48; mean = 48/32 = 1.5 ✓
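The algebra generalises to any table with a single unknown frequency. The sketch below rearranges mean = Σfx/n for that frequency and uses exact fractions to avoid rounding; the function name `missing_frequency` and the use of `None` to mark the unknown row are assumptions for illustration.

```python
from fractions import Fraction


def missing_frequency(values, freqs, target_mean):
    """Solve for the one unknown frequency (marked None) that gives target_mean."""
    unknown = freqs.index(None)
    known = [(x, f) for x, f in zip(values, freqs) if f is not None]
    n_known = sum(f for _, f in known)
    sum_fx_known = sum(x * f for x, f in known)
    m = Fraction(target_mean)
    v = values[unknown]
    if v == m:
        raise ValueError("unknown row equals the mean; frequency is not determined")
    # (sum_fx_known + v*x) / (n_known + x) = m
    # rearranges to x = (m*n_known - sum_fx_known) / (v - m)
    return (m * n_known - sum_fx_known) / (v - m)


x = missing_frequency([0, 1, 2, 3, 4], [7, None, 9, 4, 2], "1.5")
print(x)
```

A frequency must be a non-negative whole number, so a fractional or negative result would signal an inconsistent table or a slip in the working.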
WE 6: Modal class, median class, and estimated mean
The test scores of 40 students out of 100 are summarised below.
| Score | 30 ≤ s < 40 | 40 ≤ s < 50 | 50 ≤ s < 60 | 60 ≤ s < 70 | 70 ≤ s < 80 | 80 ≤ s < 90 |
|---|---|---|---|---|---|---|
| Frequency | 4 | 7 | 12 | 11 | 5 | 1 |
(a) Write down the modal class. (b) Find the class containing the median. (c) Estimate the mean score.
(a) Modal class
freq 12 is highest → 50 ≤ s < 60
(b) Median class — use cumulative frequency
cum: 4, 11, 23, 34, 39, 40
Median position = n/2 = 20 (for grouped data, the n/2 position is used)
cum reaches 23 at 50 ≤ s < 60 (first to reach or pass 20)
→ median class: 50 ≤ s < 60
(c) Mid-intervals: 35, 45, 55, 65, 75, 85
Σfx = 4(35) + 7(45) + 12(55) + 11(65) + 5(75) + 1(85)
= 140 + 315 + 660 + 715 + 375 + 85 = 2290
Estimated mean = 2290/40 = 57.25
(a) 50 ≤ s < 60; (b) 50 ≤ s < 60; (c) ≈ 57.3
when modal class = median class, the distribution is roughly centred there
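Locating the median class is a search through the running totals, which the standard library can do directly. This is a sketch under the n/2 convention used in part (b); the tuple representation of the classes is an assumption for this example.

```python
from bisect import bisect_left
from itertools import accumulate

classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90)]
freqs = [4, 7, 12, 11, 5, 1]

cumulative = list(accumulate(freqs))  # running totals: 4, 11, 23, ...
position = cumulative[-1] / 2         # n/2 convention for grouped data

# Index of the first class whose cumulative frequency reaches the position.
median_class = classes[bisect_left(cumulative, position)]

print(cumulative, position, median_class)
```

`bisect_left` returns the first index whose cumulative frequency is at least the target position, which matches the rule "first row to reach the position".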
💡 Top tips
- Build a cumulative frequency column on the side — speeds up median and quartile work.
- For grouped data, always say “estimate” or “≈” — exact values aren’t possible.
- Use your GDC’s 1-Var Stats: enter values into List 1, frequencies into List 2.
- Compute Σfx and Σfx² in one pass — keeps your work tidy and reduces arithmetic errors.
- Round sensibly: 3 s.f. is the IB default unless the question says otherwise.
⚠ Common mistakes
- Confusing mode with frequency — mode = the data value, not the count itself.
- Using class boundaries instead of mid-interval values when estimating the mean of grouped data.
- Forgetting the word “estimate” for grouped data measures — exam markers look for this.
- Treating modal class as the mode — the modal class is an interval, not a single value.
- Forgetting to multiply by frequency when summing — Σfx ≠ Σx.
Next: Linear Transformations of Data. If every value in a data set is multiplied by 2 and then increased by 5, what happens to the mean and standard deviation? Spoiler: they transform differently. The mean follows the rule literally (it is doubled, then increased by 5); the SD ignores the shift and simply doubles. The same idea is captured by E(aX+b) and Var(aX+b) in the formula booklet.