IB Maths AI HL Statistics Toolkit Paper 1 & 2

Frequency Tables

A list of 40 raw values is hard to read. A frequency table, which records how often each value (or each class of values) occurs, compresses the same information into a few lines. Once data is in a frequency table, every statistic you’ve met (mean, median, mode, range, IQR, variance, standard deviation) can still be calculated; you just need slightly different formulas. The IB splits this into two cases: ungrouped tables (every value listed individually) and grouped tables (values bundled into class intervals). Ungrouped tables give exact statistics. Grouped tables give estimates only, because the exact values are discarded when they are bundled into classes.

📘 What you need to know

Frequency table: a compact list showing each value (or class) and how many times it appears.
Ungrouped: every individual value appears with its frequency. Statistics are exact.
Grouped: values are bundled into class intervals (no gaps, e.g. 10 ≤ x < 20). Statistics are estimates.
Mean (ungrouped or grouped): $\bar{x} = \frac{\sum f_i x_i}{n}$, where $n = \sum f_i$. In the formula booklet.
For grouped data, replace x_i with the mid-interval value (midpoint of the class).
Mode (ungrouped) = value with highest frequency. Modal class (grouped) = class with highest frequency, when class widths are equal.
Median: use cumulative frequencies (running totals) to find the middle position.
Range, IQR, variance, SD: same idea as before, but use the frequencies. GDC is your friend.
Round grouped-data answers to 3 sf and note that they are estimates.

Ungrouped frequency tables

An ungrouped frequency table lists every individual value with its frequency. Because nothing is bundled, every statistic you compute is exact.

value x	0	1	2	3	4
frequency f	7	18	9	4	2

This table records how many siblings each of 40 students has. There are seven students with 0 siblings, eighteen with 1 sibling, and so on. The total is n = Σf_i = 7 + 18 + 9 + 4 + 2 = 40.

Mean from an ungrouped frequency table

Mean (frequency table): $\bar{x} = \frac{\sum f_i x_i}{n}$, where $n = \sum f_i$ (in the formula booklet ✓)
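As a quick illustration, the formula can be sketched in a few lines of Python. The function name `frequency_mean` and the variable names are made up for this example; the data is the siblings table above.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency table: sum of f*x divided by the total frequency."""
    n = sum(freqs)  # n is the total frequency, not the number of rows
    total = sum(f * x for x, f in zip(values, freqs))
    return total / n


siblings = [0, 1, 2, 3, 4]
counts = [7, 18, 9, 4, 2]
print(frequency_mean(siblings, counts))  # 56 / 40 = 1.4
```

Each value is weighted by its frequency, which is the same as listing all 40 raw values and averaging them.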

Mode and median

Mode = the value with the highest frequency. For the siblings table above, that’s x = 1 (frequency 18).

Median = the middle value. With 40 students sorted, the middle is the midpoint of the 20th and 21st values. Use cumulative frequencies (running totals) to find which value those positions fall into.
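The cumulative-frequency search can be sketched as below. This is a minimal illustration that assumes the values are listed in ascending order; `frequency_median` and `value_at` are hypothetical names, not anything from the syllabus or a GDC.

```python
from itertools import accumulate


def frequency_median(values, freqs):
    """Median of an ungrouped frequency table (values in ascending order)."""
    cum = list(accumulate(freqs))  # running totals
    n = cum[-1]

    def value_at(position):
        # Return the value sitting at a 1-based position in the sorted data.
        for x, cf in zip(values, cum):
            if position <= cf:
                return x

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    # Even n: average the two middle values.
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


siblings = [0, 1, 2, 3, 4]
counts = [7, 18, 9, 4, 2]
print(list(accumulate(counts)))            # [7, 25, 34, 38, 40]
print(frequency_median(siblings, counts))  # 20th and 21st values are both 1
```

The first running total that reaches the position tells you which value that position falls into.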

Variance and standard deviation

Variance (frequency table): $\sigma^2 = \frac{\sum f_i x_i^2}{n} - \mu^2$
SD: $\sigma = \sqrt{\sigma^2}$
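Here is a minimal sketch of the variance formula above, applied to the siblings table. The function name `frequency_variance` is an assumption made for this example.

```python
from math import sqrt


def frequency_variance(values, freqs):
    """Variance of a frequency table: (sum of f*x^2) / n minus the mean squared."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n
    return mean_of_squares - mean ** 2


siblings = [0, 1, 2, 3, 4]
counts = [7, 18, 9, 4, 2]
variance = frequency_variance(siblings, counts)
print(round(variance, 2))        # 122/40 - 1.4^2 = 3.05 - 1.96 = 1.09
print(round(sqrt(variance), 3))  # 1.044
```

Here Σfx² = 0 + 18 + 36 + 36 + 32 = 122, so the variance is 3.05 − 1.96 = 1.09.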

Grouped frequency tables

When a dataset has a wide range of distinct values (e.g. heights, times, masses), listing each one separately would be unwieldy. Instead, values are grouped into class intervals.

height h (cm)	140 ≤ h < 150	150 ≤ h < 160	160 ≤ h < 170	170 ≤ h < 180	180 ≤ h < 190
frequency	4	12	20	11	3

You now have a problem: you’ve lost the exact heights. If a student is in the class 160 ≤ h < 170, you only know they are somewhere between 160 and 170 cm. The IB solution is to use the mid-interval value (midpoint) as a “representative” for each class.

Mid-interval value: $\text{mid-interval value} = \frac{\text{lower boundary} + \text{upper boundary}}{2}$

For the class 160 ≤ h < 170, the mid-interval value is (160 + 170)/2 = 165.
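The same calculation for every class of the heights table can be sketched like this. Storing each class as a `(lower, upper)` pair is just one convenient choice for the example.

```python
# Each class is stored as a (lower boundary, upper boundary) pair.
classes = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]

midpoints = [(lower + upper) / 2 for lower, upper in classes]
print(midpoints)  # [145.0, 155.0, 165.0, 175.0, 185.0]
```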

Modal class

For grouped data with equal class widths, the modal class is the class with the highest frequency. In the heights table, the modal class is 160 ≤ h < 170 (frequency 20).
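A small sketch of the modal-class rule, including the equal-width condition, might look like this. The function name `modal_class` and the decision to raise an error for unequal widths are choices made for illustration only.

```python
def modal_class(classes, freqs):
    """Return the class with the highest frequency (equal class widths only)."""
    widths = {upper - lower for lower, upper in classes}
    if len(widths) != 1:
        raise ValueError("class widths are not equal")
    # If two classes tie for the highest frequency, this returns the first one.
    best = max(range(len(freqs)), key=lambda i: freqs[i])
    return classes[best]


classes = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]
freqs = [4, 12, 20, 11, 3]
print(modal_class(classes, freqs))  # (160, 170), i.e. 160 ≤ h < 170
```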

Estimating the mean from grouped data

Use the mid-interval values in place of x_i:

Estimated mean (grouped): $\bar{x} \approx \frac{\sum f_i x_i}{n}$, where $x_i$ is the mid-interval value of class $i$ (same formula, with mid-interval values)

grouped vs ungrouped — exact vs estimated

Ungrouped tables let you compute exact statistics. Grouped tables only let you estimate, because the original values inside each class have been thrown away.

🤔 Why are grouped-data answers only estimates?

When you collapse a range of values like { 161.3, 163.5, 165.1, 167.8, 169.2 } into the single class “160 ≤ h < 170 with frequency 5”, you lose the actual numbers. Using the midpoint 165 as a stand-in is a reasonable best guess, but it’s still just a guess. So any statistic you derive from a grouped table (mean, SD, even the median) is an estimate, not the true value. Always write “estimated mean”, use ≈, and round to 3 sf to flag this.
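To see the size of the error for these five heights, you can compare their true mean with the midpoint stand-in. This is only a numerical illustration of the paragraph above, using Python's standard `statistics` module.

```python
from statistics import mean

heights = [161.3, 163.5, 165.1, 167.8, 169.2]

true_mean = mean(heights)   # uses the actual values
midpoint = (160 + 170) / 2  # what the grouped table would use instead

print(round(true_mean, 2))             # 826.9 / 5 = 165.38
print(round(true_mean - midpoint, 2))  # the midpoint is off by 0.38 cm
```

The estimate is close but not exact, and the gap depends on where the values happen to sit inside the class.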

Ungrouped recipe

exact

Use the values directly. Mode = highest f. Median via cumulative frequencies. Mean = Σfx / n.

Grouped recipe

estimate

First find mid-interval values. Then use the same formulas with those midpoints. Modal class = highest f (equal widths only).

🧭 Recipe — mean from a grouped frequency table

Find the mid-interval value for each class: (lower + upper) / 2.
Multiply each midpoint by its class frequency to get fx.
Sum these products to get Σfx.
Divide by n (the total frequency).
State as an estimate — round to 3 sf and write “≈”.
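The five steps above can be sketched directly in Python using the heights table. The variable names are illustrative only.

```python
classes = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]
freqs = [4, 12, 20, 11, 3]

# Step 1: mid-interval value of each class
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Step 2: multiply each midpoint by its frequency
products = [f * m for f, m in zip(freqs, midpoints)]

# Step 3: sum the products
sum_fx = sum(products)

# Step 4: divide by the total frequency
n = sum(freqs)
estimated_mean = sum_fx / n

# Step 5: report to 3 sf, flagged as an estimate
print(f"estimated mean ≈ {estimated_mean:.3g} cm")  # estimated mean ≈ 164 cm
```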

🧠 Memory aid — adding a third column

The fastest way to compute the mean from a frequency table is to add a third column for fx (or for variance, a fourth column for fx²). Fill these in, sum the columns, then divide. It keeps the work organised — and it’s exactly what your GDC does internally.
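As a rough sketch of the extra-columns idea, the snippet below builds the fx and fx² columns for the siblings table and sums them. The layout is an assumption for illustration and says nothing about how any particular GDC stores its data.

```python
values = [0, 1, 2, 3, 4]
freqs = [7, 18, 9, 4, 2]

# One row per value: x, f, fx, fx²
rows = [(x, f, f * x, f * x * x) for x, f in zip(values, freqs)]

print(f"{'x':>3}{'f':>5}{'fx':>5}{'fx²':>6}")
for x, f, fx, fx2 in rows:
    print(f"{x:>3}{f:>5}{fx:>5}{fx2:>6}")

# Column totals (the x column itself is never summed)
sum_f = sum(row[1] for row in rows)
sum_fx = sum(row[2] for row in rows)
sum_fx2 = sum(row[3] for row in rows)
print(sum_f, sum_fx, sum_fx2)  # 40 56 122
```

From the totals, the mean is 56/40 and the variance is 122/40 minus the mean squared.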

Worked examples

WE 1

Mode, median, and mean from an ungrouped frequency table

A survey records the number of siblings of 40 students:

siblings x	0	1	2	3	4
frequency f	7	18	9	4	2

Find (a) the mode, (b) the median, (c) the mean.

(a) Mode = value with highest frequency
The highest frequency is 18, at x = 1.
mode = 1

(b) Median via cumulative frequencies
CF: 7, 25, 34, 38, 40
n = 40 → midpoint of the 20th and 21st values
The 20th and 21st values both lie at x = 1.
median = 1

(c) Mean = Σfx / n
Σfx = 0×7 + 1×18 + 2×9 + 3×4 + 4×2 = 0 + 18 + 18 + 12 + 8 = 56
mean = 56 / 40 = 1.4
mean = 1.4

Note: cumulative frequencies are essential for the median, because they tell you where each position falls.

WE 2

Variance and standard deviation from a frequency table

A teacher records the homework marks (out of 5) of 14 students:

mark x	1	2	3	4	5
frequency f	1	3	6	3	1

Find the variance and standard deviation by hand.

Step 1: mean
Σfx = 1 + 6 + 18 + 12 + 5 = 42
n = 14, μ = 42 / 14 = 3

Step 2: Σfx²
Σfx² = 1²×1 + 2²×3 + 3²×6 + 4²×3 + 5²×1 = 1 + 12 + 54 + 48 + 25 = 140

Step 3: variance
σ² = Σfx²/n − μ² = 140/14 − 3² = 10 − 9 = 1
variance σ² = 1

Step 4: SD
σ = √1 = 1
SD σ = 1

Note: a clean result here; usually the variance won’t be a perfect square. Use the GDC to confirm.
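If you have no GDC to hand, one way to confirm the result is to expand the table back into the 14 raw marks and use Python's `statistics` module. This is only a cross-check sketch. Note that `pvariance` and `pstdev` divide by n, matching the σ formula used here, whereas `variance` and `stdev` divide by n − 1.

```python
from statistics import pstdev, pvariance

marks = [1, 2, 3, 4, 5]
freqs = [1, 3, 6, 3, 1]

# Expand the table into raw data: 1, 2, 2, 2, 3, 3, ...
raw = [x for x, f in zip(marks, freqs) for _ in range(f)]

print(len(raw))        # 14 marks in total
print(pvariance(raw))  # population variance, agrees with the hand value of 1
print(pstdev(raw))     # population SD, also 1
```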

WE 3

Modal class and mid-interval value

The table below shows the heights, in cm, of 50 students:

height h (cm)	140 ≤ h < 150	150 ≤ h < 160	160 ≤ h < 170	170 ≤ h < 180	180 ≤ h < 190
frequency	4	12	20	11	3

(a) Write down the modal class.
(b) State the mid-interval value of the modal class.

(a) Modal class = class with highest frequency
The highest frequency is 20 → 160 ≤ h < 170
modal class: 160 ≤ h < 170

(b) Mid-interval value = (lower + upper) / 2 = (160 + 170) / 2
mid-interval value = 165 cm

Note: write “modal class”, never “mode”, when data is grouped.

WE 4

Estimate the mean from grouped data

Using the heights table from WE 3, calculate an estimate for the mean height.

Step 1: mid-interval values
145, 155, 165, 175, 185

Step 2: Σfx using midpoints
4×145 + 12×155 + 20×165 + 11×175 + 3×185 = 580 + 1860 + 3300 + 1925 + 555 = 8220

Step 3: divide by n
n = 4 + 12 + 20 + 11 + 3 = 50
mean ≈ 8220 / 50 = 164.4
estimated mean ≈ 164 cm (3 sf)

Note: use ≈ and “estimate”, because these answers are not exact for grouped data.

WE 5

Estimate the standard deviation from grouped data

For the same heights table, estimate the standard deviation. Give the answer to 3 sf.

Step 1: mean
μ ≈ 164.4 (from WE 4)

Step 2: Σfx² using midpoints squared
4×145² + 12×155² + 20×165² + 11×175² + 3×185²
= 4(21025) + 12(24025) + 20(27225) + 11(30625) + 3(34225)
= 84100 + 288300 + 544500 + 336875 + 102675 = 1 356 450

Step 3: variance
σ² ≈ 1356450/50 − 164.4² = 27129 − 27027.36 = 101.64

Step 4: SD
σ ≈ √101.64 ≈ 10.08
SD ≈ 10.1 cm (3 sf)

Note: in the exam, just type the midpoints and frequencies into the GDC’s statistics mode, which gives σ directly.
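As a cross-check on the arithmetic, the sketch below computes the same variance a second way, as the frequency-weighted average of squared deviations from the mean, Σf(x − μ)²/n. That form is not used in this worked example, but it is algebraically equivalent to Σfx²/n − μ², so both should agree.

```python
from math import sqrt

midpoints = [145, 155, 165, 175, 185]
freqs = [4, 12, 20, 11, 3]

n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Shortcut form used in the worked example
shortcut = sum(f * x * x for x, f in zip(midpoints, freqs)) / n - mean ** 2

# Deviation form: weighted average of squared distances from the mean
deviation = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n

print(round(shortcut, 2), round(deviation, 2))  # 101.64 101.64
print(round(sqrt(shortcut), 1))                 # 10.1
```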

WE 6

Find a missing frequency given the mean

The frequency table below has an unknown frequency k:

value x	1	2	3	4
frequency f	4	7	k	5

Given that the mean is 2.5, find the value of k.

Step 1: set up the mean equation
mean = Σfx / n = 2.5

Step 2: Σfx in terms of k
Σfx = 4 + 14 + 3k + 20 = 38 + 3k

Step 3: n in terms of k
n = 4 + 7 + k + 5 = 16 + k

Step 4: solve
(38 + 3k) / (16 + k) = 2.5
38 + 3k = 2.5(16 + k) = 40 + 2.5k
3k − 2.5k = 40 − 38
0.5k = 2 → k = 4
k = 4

Verify: frequencies 4, 7, 4, 5; n = 20
Σfx = 4 + 14 + 12 + 20 = 50; mean = 50/20 = 2.5 ✓

Note: “backwards” problems are common. Set up mean = Σfx/n and solve for the unknown.
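The same rearrangement works for any table with one unknown frequency. Writing S and N for Σfx and Σf over the known rows, (S + k·x) / (N + k) = m gives k = (mN − S) / (x − m). A minimal sketch, with a hypothetical function name and exact fractions to avoid rounding:

```python
from fractions import Fraction


def missing_frequency(values, freqs, unknown_value, target_mean):
    """Solve (S + k*x) / (N + k) = m for k, using only the known rows."""
    m = Fraction(str(target_mean))
    s = sum(f * x for x, f in zip(values, freqs))  # S: sum of f*x, known rows
    n = sum(freqs)                                 # N: total of known frequencies
    if unknown_value == m:
        raise ValueError("no unique solution when the unknown value equals the mean")
    return (m * n - s) / (unknown_value - m)


# Known rows of WE 6: x = 1, 2, 4 with f = 4, 7, 5; the unknown row is x = 3.
k = missing_frequency([1, 2, 4], [4, 7, 5], unknown_value=3, target_mean="2.5")
print(k)  # 4
```

A frequency must be a non-negative whole number, so a fractional or negative k would signal an error in the set-up.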

💡 Top tips

Add a column for fx (and fx² if needed). It keeps work organised and matches what the GDC computes.
Always note “ungrouped” vs “grouped”. The first gives exact answers; the second gives estimates.
For grouped data, find the mid-interval values first — every subsequent step uses them.
Modal class, not “mode”, for grouped data. And only valid when class widths are equal.
Cumulative frequencies are how you find the median in an ungrouped table. Build the running total and locate the middle position, (n + 1)/2 (for n = 40 that is 20.5, so the midpoint of the 20th and 21st values).
Round grouped-data answers to 3 sf and write “≈” to signal they’re estimates.

⚠ Common mistakes

Forgetting to multiply by frequency. The mean is Σfx/n, not Σx/n. Each value contributes f times.
Using class boundaries instead of midpoints for grouped means. Always use mid-interval values.
Saying “mode = 165” for grouped data — there’s no mode, only a modal class.
Reporting exact answers for grouped data. They’re estimates — round to 3 sf and use ≈.
Missing the median by forgetting cumulative frequencies. Without a running total, you can’t tell which value the middle position, (n + 1)/2, lands on.
Treating n as the number of distinct values. n is the total frequency Σf_i, not the row count.

Next up — Linear Transformations of Data. What happens to the mean, variance, and SD if you add a constant to every value, or multiply every value by a constant? You’ll learn the two key rules: E(aX+b) = aE(X) + b and Var(aX+b) = a²Var(X) — both in the formula booklet, both essential for IB problems.
