IB Maths AI HL · Statistics Toolkit · Paper 1 & 2 · ~9 min read
Frequency Tables
A list of 40 raw values is hard to read. A frequency table, which records how often each value (or each class of values) occurs, compresses the same information into a few lines. Once data is in a frequency table, every statistic you’ve met (mean, median, mode, range, IQR, variance, standard deviation) can still be calculated; you just need slightly different formulas. The IB splits this into two cases: ungrouped tables (every value listed individually) and grouped tables (values bundled into class intervals). Ungrouped tables give exact statistics. Grouped tables give estimates only, because the exact values are discarded when they are bundled into classes.
📘 What you need to know
Frequency table: a compact list showing each value (or class) and how many times it appears.
Ungrouped: every individual value appears with its frequency. Statistics are exact.
Grouped: values are bundled into class intervals (no gaps, e.g. 10 ≤ x < 20). Statistics are estimates.
Mean (ungrouped or grouped): x̄ = (Σ fᵢxᵢ) / n, where n = Σfᵢ. In the formula booklet.
For grouped data, replace xᵢ with the mid-interval value (midpoint of the class).
Mode (ungrouped) = value with highest frequency. Modal class (grouped) = class with highest frequency, when class widths are equal.
Median: use cumulative frequencies (running totals) to find the middle position.
Range, IQR, variance, SD: same idea as before, but use the frequencies. GDC is your friend.
Round grouped-data answers to 3 sf and note that they are estimates.
Ungrouped frequency tables
An ungrouped frequency table lists every individual value with its frequency. Because nothing is bundled, every statistic you compute is exact.
value x     | 0 | 1  | 2 | 3 | 4
frequency f | 7 | 18 | 9 | 4 | 2
This table shows how many siblings each of 40 students has. There are seven students with 0 siblings, eighteen with 1 sibling, and so on. The total is n = Σfᵢ = 7 + 18 + 9 + 4 + 2 = 40.
Mean from an ungrouped frequency table
Mean (frequency table): x̄ = (Σ fᵢxᵢ) / n, where n = Σfᵢ (in the formula booklet ✓)
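As a quick check of the formula, here is a minimal Python sketch that applies it to the siblings table above. The dictionary name `table` is just an illustrative choice, not part of the IB notation.

```python
# Siblings table from the text: value x -> frequency f
table = {0: 7, 1: 18, 2: 9, 3: 4, 4: 2}

n = sum(table.values())                        # n = sum of the frequencies
sum_fx = sum(x * f for x, f in table.items())  # sum of f * x
mean = sum_fx / n                              # mean = (sum of f * x) / n

print(n, sum_fx, mean)
```

Each value is weighted by its frequency, which is why the sum runs over f × x rather than over x alone.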
Mode and median
Mode = the value with the highest frequency. For the siblings table above, that’s x = 1 (frequency 18).
Median = the middle value. With 40 students sorted, the middle is the midpoint of the 20th and 21st values. Use cumulative frequencies (running totals) to find which value those positions fall into.
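The cumulative-frequency idea can be sketched in Python as follows. This is an illustration only: the helper name `value_at` is made up for this example, and on a tie for the highest frequency the code simply reports the first such value.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4]
freqs = [7, 18, 9, 4, 2]

# Mode: the value whose frequency is largest (first one if there is a tie)
mode = values[freqs.index(max(freqs))]

# Cumulative frequencies are running totals of the frequency row
cum_freqs = list(accumulate(freqs))
n = cum_freqs[-1]

def value_at(position):
    """Return the value sitting at a 1-based position in the sorted data."""
    for x, cf in zip(values, cum_freqs):
        if position <= cf:
            return x
    raise ValueError("position is larger than n")

# Even n: average the two middle values; odd n: take the single middle value
if n % 2 == 0:
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
else:
    median = value_at((n + 1) // 2)

print(cum_freqs, mode, median)
```

Positions 8 to 25 all hold the value 1, so the 20th and 21st values are both 1 and the median is 1.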
Grouped frequency tables
When a dataset has a wide range of distinct values (e.g. heights, times, masses), listing each one separately would be unwieldy. Instead, values are grouped into class intervals.
height h (cm) | 140 ≤ h < 150 | 150 ≤ h < 160 | 160 ≤ h < 170 | 170 ≤ h < 180 | 180 ≤ h < 190
frequency     | 4             | 12            | 20            | 11            | 3
You now have a problem: you’ve lost the exact heights. If a student is in the class 160 ≤ h < 170, you only know they are somewhere between 160 and 170 cm. The IB solution is to use the mid-interval value (midpoint) as a “representative” for each class.
Mid-interval value
mid-interval value = (lower boundary + upper boundary) / 2
For the class 160 ≤ h < 170, the mid-interval value is (160 + 170)/2 = 165.
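The same calculation for every class in the heights table can be sketched in a couple of lines of Python. Representing each class as a `(lower, upper)` pair is an assumption made for this example.

```python
# Each class interval stored as (lower boundary, upper boundary)
classes = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]

# Mid-interval value = (lower + upper) / 2 for each class
midpoints = [(lower + upper) / 2 for lower, upper in classes]

print(midpoints)
```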
Modal class
For grouped data with equal class widths, the modal class is the class with the highest frequency. In the heights table, the modal class is 160 ≤ h < 170 (frequency 20).
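A short Python sketch of this rule is given below. It checks the equal-width condition first, because the highest-frequency rule is only stated here for equal class widths; the variable names are illustrative.

```python
classes = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]
freqs = [4, 12, 20, 11, 3]

# The rule in the text applies only when every class has the same width
widths = {upper - lower for lower, upper in classes}
equal_widths = len(widths) == 1

# Modal class = class with the highest frequency
modal_class = classes[freqs.index(max(freqs))] if equal_widths else None

print(equal_widths, modal_class)
```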
Estimating the mean from grouped data
Use the mid-interval values in place of xᵢ:
Estimated mean (grouped): x̄ ≈ (Σ fᵢxᵢ) / n, where xᵢ = mid-interval value
same formula, with mid-interval values
grouped vs ungrouped — exact vs estimated
Ungrouped tables let you compute exact statistics. Grouped tables only let you estimate, because the original values inside each class have been thrown away.
🤔 Why are grouped-data answers only estimates?
When you collapse a range of values like { 161.3, 163.5, 165.1, 167.8, 169.2 } into the single class “160 ≤ h < 170 with frequency 5”, you lose the actual numbers. Using the midpoint 165 as a stand-in is a reasonable best guess, but it’s still just a guess. So any statistic you derive from a grouped table (mean, SD, even the median) is an estimate, not the true value. Always say “estimated mean”, write “≈”, and round to 3 sf to flag this.
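To see the size of the gap, the sketch below compares the true mean of those five heights with the midpoint stand-in. It is only an illustration using the numbers quoted above.

```python
# The five raw heights quoted in the text, all inside 160 <= h < 170
heights = [161.3, 163.5, 165.1, 167.8, 169.2]

true_mean = sum(heights) / len(heights)

# After grouping, every one of the five values is treated as the midpoint
midpoint = (160 + 170) / 2
estimated_mean = midpoint

error = true_mean - estimated_mean
print(round(true_mean, 2), estimated_mean, round(error, 2))
```

The true mean is about 165.38 while the grouped estimate is 165, so the estimate is close but not exact.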
Ungrouped recipe (exact)
Use the values directly. Mode = highest f. Median via cumulative frequencies. Mean = Σfx / n.
Grouped recipe (estimate)
First find mid-interval values. Then use the same formulas with those midpoints. Modal class = highest f (equal widths only).
🧭 Recipe — mean from a grouped frequency table
Find the mid-interval value for each class: (lower + upper) / 2.
Multiply each midpoint by its class frequency to get fx.
Sum these products to get Σfx.
Divide by n (the total frequency).
State as an estimate — round to 3 sf and write “≈”.
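The five steps above can be sketched as one small Python function. The name `estimate_grouped_mean` and the `(lower, upper)` representation of a class are assumptions made for this example, not IB notation.

```python
def estimate_grouped_mean(classes, freqs):
    """Estimate the mean of grouped data from class boundaries and frequencies."""
    # Step 1: mid-interval value of each class
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    # Steps 2 and 3: multiply each midpoint by its frequency, then sum
    sum_fx = sum(f * m for f, m in zip(freqs, midpoints))
    # Step 4: divide by the total frequency n
    n = sum(freqs)
    return sum_fx / n

classes = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]
freqs = [4, 12, 20, 11, 3]

mean_estimate = estimate_grouped_mean(classes, freqs)

# Step 5: report as an estimate, rounded to 3 significant figures
print(f"estimated mean is approximately {mean_estimate:.3g} cm")
```

For the heights table this gives 164.4 before rounding, which is reported as 164 cm to 3 sf.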
🧠 Memory aid — adding a third column
The fastest way to compute the mean from a frequency table is to add a third column for fx (or, for variance, a fourth column for fx²). Fill these in, sum the columns, then divide. It keeps the work organised, and the column totals are the same sums (Σfx and Σfx²) that your GDC reports.
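Here is a minimal Python sketch of the extra-column layout, using the siblings table from earlier. The variance line uses the shortcut form Σfx²/n − x̄², the same form used in the worked examples; the column formatting is purely illustrative.

```python
values = [0, 1, 2, 3, 4]
freqs = [7, 18, 9, 4, 2]

sum_f = sum_fx = sum_fx2 = 0
print(f"{'x':>3} {'f':>3} {'fx':>4} {'fx^2':>5}")
for x, f in zip(values, freqs):
    fx = f * x          # third column
    fx2 = f * x * x     # fourth column
    print(f"{x:>3} {f:>3} {fx:>4} {fx2:>5}")
    sum_f += f
    sum_fx += fx
    sum_fx2 += fx2

# Column totals feed straight into the formulas
mean = sum_fx / sum_f
variance = sum_fx2 / sum_f - mean ** 2
print(sum_f, sum_fx, sum_fx2, mean, round(variance, 2))
```

The totals are 40, 56 and 122, giving a mean of 1.4 and a variance of 1.09.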
Worked examples
WE 1
Mode, median, and mean from an ungrouped frequency table
A survey records the number of siblings of 40 students:
siblings x  | 0 | 1  | 2 | 3 | 4
frequency f | 7 | 18 | 9 | 4 | 2
Find (a) the mode, (b) the median, (c) the mean.
(a) Mode = value with highest frequency
The highest frequency is 18, at x = 1.
Answer: mode = 1
(b) Median via cumulative frequencies
CF: 7, 25, 34, 38, 40
n = 40 → midpoint of the 20th and 21st values
The 20th and 21st values both lie at x = 1.
Answer: median = 1
(c) Mean = Σfx / n
Σfx = 0×7 + 1×18 + 2×9 + 3×4 + 4×2 = 0 + 18 + 18 + 12 + 8 = 56
mean = 56 / 40 = 1.4
Answer: mean = 1.4
Note: cumulative frequencies are essential for the median; they tell you where each position falls.
WE 2
Variance and standard deviation from a frequency table
A teacher records the homework marks (out of 5) of 14 students:
WE 3
Modal class and mid-interval value
The table below shows the heights, in cm, of 50 students:
height h (cm) | 140 ≤ h < 150 | 150 ≤ h < 160 | 160 ≤ h < 170 | 170 ≤ h < 180 | 180 ≤ h < 190
frequency     | 4             | 12            | 20            | 11            | 3
(a) Write down the modal class. (b) State the mid-interval value of the modal class.
(a) Modal class = class with highest frequency
The highest frequency is 20 → 160 ≤ h < 170
Answer: modal class is 160 ≤ h < 170
(b) Mid-interval value = (lower + upper) / 2
= (160 + 170) / 2
Answer: mid-interval value = 165 cm
Note: write “modal class”, never “mode”, when data is grouped.
WE 4
Estimate the mean from grouped data
Using the heights table from WE 3, calculate an estimate for the mean height.
Step 1: mid-interval values
145, 155, 165, 175, 185
Step 2: Σfx using midpoints
4×145 + 12×155 + 20×165 + 11×175 + 3×185
= 580 + 1860 + 3300 + 1925 + 555
= 8220
Step 3: divide by n
n = 4 + 12 + 20 + 11 + 3 = 50
mean ≈ 8220 / 50 = 164.4
Answer: estimated mean ≈ 164 cm (3 sf)
Note: use ≈ and “estimate”, because these answers are not exact for grouped data.
WE 5
Estimate the standard deviation from grouped data
For the same heights table, estimate the standard deviation. Give the answer to 3 sf.
Step 1: μ ≈ 164.4 (from WE 4)
Step 2: Σfx² using midpoints squared
4×145² + 12×155² + 20×165² + 11×175² + 3×185²
= 4(21025) + 12(24025) + 20(27225) + 11(30625) + 3(34225)
= 84100 + 288300 + 544500 + 336875 + 102675
= 1 356 450
Step 3: variance
σ² ≈ 1356450/50 − 164.4²
= 27129 − 27027.36 = 101.64
Step 4: SD
σ ≈ √101.64 ≈ 10.08
Answer: SD ≈ 10.1 cm (3 sf)
Note: in the exam, just type the midpoints and frequencies into the GDC’s statistics mode; it gives σ directly.
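The arithmetic in this example can be checked with a short Python sketch. It uses the same shortcut formula as the working above, then cross-checks against `statistics.pstdev` (the population standard deviation in Python’s standard library) applied to a list in which each midpoint is repeated f times. That expanded list is a stand-in for the lost raw data, so the result is still an estimate.

```python
import math
import statistics

midpoints = [145, 155, 165, 175, 185]
freqs = [4, 12, 20, 11, 3]

n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
sum_fx2 = sum(f * x ** 2 for x, f in zip(midpoints, freqs))

# Shortcut form: variance = (sum of f * x^2) / n - mean^2
variance = sum_fx2 / n - mean ** 2
sd = math.sqrt(variance)

# Cross-check: repeat each midpoint f times and use the library routine
expanded = [x for x, f in zip(midpoints, freqs) for _ in range(f)]
sd_check = statistics.pstdev(expanded)

print(sum_fx2, round(variance, 2), round(sd, 1), round(sd_check, 1))
```

Both routes agree on a standard deviation of about 10.1 cm.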
WE 6
Find a missing frequency given the mean
The frequency table below has an unknown frequency k:
value x
1
2
3
4
frequency f
4
7
k
5
Given that the mean is 2.5, find the value of k.
Step 1: set up the mean equation
mean = Σfx / n = 2.5
Step 2: Σfx in terms of k
Σfx = 4 + 14 + 3k + 20 = 38 + 3k
Step 3: n in terms of k
n = 4 + 7 + k + 5 = 16 + k
Step 4: solve
(38 + 3k) / (16 + k) = 2.5
38 + 3k = 2.5(16 + k) = 40 + 2.5k
3k − 2.5k = 40 − 38
0.5k = 2 → k = 4
Answer: k = 4
Verify: frequencies 4, 7, 4, 5; n = 20; Σfx = 4 + 14 + 12 + 20 = 50; mean = 50/20 = 2.5 ✓
Note: “backwards” problems are common. Set up mean = Σfx/n and solve for the unknown.
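The same rearrangement can be sketched in Python. The function name `missing_frequency` and its arguments are hypothetical, chosen for this illustration; `Fraction` is used so the algebra stays exact. The sketch assumes the unknown’s value differs from the target mean, otherwise the equation has no unique solution.

```python
from fractions import Fraction

def missing_frequency(values, freqs, unknown_index, target_mean):
    """Solve mean = (sum of f * x) / n for the one unknown frequency."""
    target = Fraction(str(target_mean))
    # Sums over the rows whose frequencies are known
    known_fx = sum(x * f for i, (x, f) in enumerate(zip(values, freqs))
                   if i != unknown_index)
    known_n = sum(f for i, f in enumerate(freqs) if i != unknown_index)
    x_unknown = values[unknown_index]
    # (known_fx + x_unknown * k) / (known_n + k) = target
    # rearranges to k * (x_unknown - target) = target * known_n - known_fx
    return (target * known_n - known_fx) / (x_unknown - target)

values = [1, 2, 3, 4]
freqs = [4, 7, None, 5]   # None marks the unknown frequency k

k = missing_frequency(values, freqs, unknown_index=2, target_mean=2.5)
print(k)
```

With the table above, the known rows give Σfx = 38 and n = 16, so k = (40 − 38) / (3 − 2.5) = 4, matching the hand solution.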
💡 Top tips
Add a column for fx (and fx² if needed). It keeps work organised and matches what the GDC computes.
Always note “ungrouped” vs “grouped”. The first gives exact answers; the second gives estimates.
For grouped data, find the mid-interval values first — every subsequent step uses them.
Modal class, not “mode”, for grouped data. And only valid when class widths are equal.
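Cumulative frequencies are how you find the median in an ungrouped table. Build the running total and locate the middle position: the (n + 1)/2-th value when n is odd, or the midpoint of the n/2-th and (n/2 + 1)-th values when n is even.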
Round grouped-data answers to 3 sf and write “≈” to signal they’re estimates.
⚠ Common mistakes
Forgetting to multiply by frequency. The mean is Σfx/n, not Σx/n. Each value contributes f times.
Using class boundaries instead of midpoints for grouped means. Always use mid-interval values.
Saying “mode = 165” for grouped data — there’s no mode, only a modal class.
Reporting exact answers for grouped data. They’re estimates — round to 3 sf and use ≈.
Missing the median by forgetting cumulative frequencies. Without a running total, you can’t tell which value the middle position lands on.
Treating n as the number of distinct values. n is the total frequency Σfᵢ, not the row count.
Next up: Linear Transformations of Data. What happens to the mean, variance, and SD if you add a constant to every value, or multiply every value by a constant? You’ll learn the two key rules, E(aX+b) = aE(X) + b and Var(aX+b) = a²Var(X), both in the formula booklet and both essential for IB problems.