IB Maths AI HL Statistics Toolkit Paper 1 & 2

Frequency Tables

A list of 40 raw values is hard to read. A frequency table, which records how often each value (or each class of values) occurs, compresses the same information into a few lines. Once data is in a frequency table, every statistic you’ve met (mean, median, mode, range, IQR, variance, standard deviation) can still be calculated; you just need slightly different formulas. The IB splits this into two cases: ungrouped tables (every value listed individually) and grouped tables (values bundled into class intervals). Ungrouped tables give exact statistics. Grouped tables give estimates only, because the exact values are thrown away when they are bundled into classes.

📘 What you need to know

Ungrouped frequency tables

An ungrouped frequency table lists every individual value with its frequency. Because nothing is bundled, every statistic you compute is exact.

value x      | 0 | 1  | 2 | 3 | 4
frequency f  | 7 | 18 | 9 | 4 | 2

This table shows how many siblings each of 40 students has. Seven students have 0 siblings, eighteen have 1 sibling, and so on. The total is $n = \sum f_i = 7 + 18 + 9 + 4 + 2 = 40$.

Mean from an ungrouped frequency table

Mean (frequency table) $= \dfrac{\sum f_i x_i}{n}$, where $n = \sum f_i$ (in the formula booklet ✓)
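
As a rough illustration of the formula above, here is a minimal Python sketch applied to the siblings table. The dictionary layout and the function name `frequency_mean` are illustrative choices, not part of the IB syllabus or any GDC.

```python
# Siblings table from the text: value x -> frequency f.
siblings = {0: 7, 1: 18, 2: 9, 3: 4, 4: 2}

def frequency_mean(table):
    """Return (sum of f*x) / (sum of f) for a value -> frequency mapping."""
    n = sum(table.values())                          # n = sum of frequencies
    sum_fx = sum(x * f for x, f in table.items())    # sum of f*x
    return sum_fx / n

mean_siblings = frequency_mean(siblings)
print(mean_siblings)  # 1.4
```

Because every value is listed individually, this result is exact rather than an estimate.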

Mode and median

Mode = the value with the highest frequency. For the siblings table above, that’s x = 1 (frequency 18).

Median = the middle value. With 40 students sorted, the middle is the midpoint of the 20th and 21st values. Use cumulative frequencies (running totals) to find which value those positions fall into.
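
The cumulative-frequency idea can be sketched in Python as follows. This is only one way to organise the search; the helper names `value_at_position` and `frequency_median` are hypothetical.

```python
from itertools import accumulate

# Siblings table from the text: value x -> frequency f.
siblings = {0: 7, 1: 18, 2: 9, 3: 4, 4: 2}

def value_at_position(table, position):
    """Return the value sitting at a 1-based position in the sorted data."""
    values = sorted(table)
    running_totals = accumulate(table[x] for x in values)  # cumulative frequencies
    for x, cf in zip(values, running_totals):
        if position <= cf:
            return x
    raise ValueError("position is larger than the total frequency")

def frequency_median(table):
    """Median of an ungrouped frequency table."""
    n = sum(table.values())
    if n % 2 == 1:
        return value_at_position(table, (n + 1) // 2)
    # Even n: midpoint of the two middle positions.
    lower = value_at_position(table, n // 2)
    upper = value_at_position(table, n // 2 + 1)
    return (lower + upper) / 2

median_siblings = frequency_median(siblings)
print(median_siblings)  # 1.0
```

For the siblings table the cumulative frequencies are 7, 25, 34, 38, 40, so positions 20 and 21 both fall at x = 1.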

Variance and standard deviation

Variance (frequency table): $\sigma^2 = \dfrac{\sum f_i x_i^2}{n} - \mu^2$, where $\mu$ is the mean. SD: $\sigma = \sqrt{\text{variance}}$
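
A minimal sketch of the variance formula, applied to the siblings table from earlier in this section. The function name `frequency_variance` is an illustrative assumption; a GDC would normally do this for you.

```python
from math import sqrt

# Siblings table from the text: value x -> frequency f.
siblings = {0: 7, 1: 18, 2: 9, 3: 4, 4: 2}

def frequency_variance(table):
    """Population variance: (sum of f*x^2)/n minus the mean squared."""
    n = sum(table.values())
    mean = sum(x * f for x, f in table.items()) / n
    mean_of_squares = sum(x * x * f for x, f in table.items()) / n
    return mean_of_squares - mean ** 2

variance = frequency_variance(siblings)
sd = sqrt(variance)
print(round(variance, 3), round(sd, 3))  # 1.09 1.044
```

Here the sum of f·x² is 122, so the variance is 122/40 − 1.4² = 3.05 − 1.96 = 1.09.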

Grouped frequency tables

When a dataset has a wide range of distinct values (e.g. heights, times, masses), listing each one separately would be unwieldy. Instead, values are grouped into class intervals.

height h (cm) | 140 ≤ h < 150 | 150 ≤ h < 160 | 160 ≤ h < 170 | 170 ≤ h < 180 | 180 ≤ h < 190
frequency     | 4             | 12            | 20            | 11            | 3

You now have a problem: you’ve lost the exact heights. If a student is in the class 160 ≤ h < 170, you only know they are somewhere between 160 and 170 cm. The IB solution is to use the mid-interval value (midpoint) as a “representative” for each class.

Mid-interval value: $\text{mid-interval value} = \dfrac{\text{lower boundary} + \text{upper boundary}}{2}$

For the class 160 ≤ h < 170, the mid-interval value is (160 + 170)/2 = 165.
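
To see the same calculation for every class at once, here is a short Python sketch. Storing each class as a `((lower, upper), frequency)` pair is an assumption made for illustration.

```python
# Heights table from the text: ((lower, upper), frequency) for each class.
heights = [
    ((140, 150), 4),
    ((150, 160), 12),
    ((160, 170), 20),
    ((170, 180), 11),
    ((180, 190), 3),
]

def mid_interval(lower, upper):
    """Midpoint of a class interval."""
    return (lower + upper) / 2

midpoints = [mid_interval(lower, upper) for (lower, upper), _ in heights]
print(midpoints)  # [145.0, 155.0, 165.0, 175.0, 185.0]
```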

Modal class

For grouped data with equal class widths, the modal class is the class with the highest frequency. In the heights table, the modal class is 160 ≤ h < 170 (frequency 20).

Estimating the mean from grouped data

Use the mid-interval values in place of xi:

Estimated mean (grouped) $= \dfrac{\sum f_i x_i}{n}$, where $x_i$ is the mid-interval value of each class (same formula, with mid-interval values)
[Figure: ungrouped table (exact values) beside grouped table (class intervals), exact vs estimated statistics]
Ungrouped tables let you compute exact statistics. Grouped tables only let you estimate, because the original values inside each class have been thrown away.

🤔 Why are grouped-data answers only estimates?

When you collapse a range of values like { 161.3, 163.5, 165.1, 167.8, 169.2 } into the single class “160 ≤ h < 170 with frequency 5”, you lose the actual numbers. Using the midpoint 165 as a stand-in is a reasonable best guess, but it’s still just a guess. So any statistic you derive from a grouped table (mean, SD, even the median) is an estimate, not the true value. Always flag this: say “estimated mean”, write “≈”, and round to 3 sf.
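
The gap between the true value and the midpoint stand-in can be checked directly for the five heights quoted above. This is only a small numerical illustration; the variable names are arbitrary.

```python
# The five raw heights quoted in the text, all inside the class 160 <= h < 170.
values = [161.3, 163.5, 165.1, 167.8, 169.2]

true_mean = sum(values) / len(values)   # uses the exact values
midpoint = (160 + 170) / 2              # what the grouped table would use
error = true_mean - midpoint            # information lost by grouping

print(round(true_mean, 2), midpoint, round(error, 2))  # 165.38 165.0 0.38
```

The midpoint is close to the true mean here, but not equal to it, which is why grouped-data answers are reported as estimates.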

Ungrouped recipe (exact): Use the values directly. Mode = highest f. Median via cumulative frequencies. Mean = Σfx / n.
Grouped recipe (estimate): First find mid-interval values. Then use the same formulas with those midpoints. Modal class = highest f (equal widths only).

🧭 Recipe — mean from a grouped frequency table

  1. Find the mid-interval value for each class: (lower + upper) / 2.
  2. Multiply each midpoint by its class frequency to get fx.
  3. Sum these products to get Σfx.
  4. Divide by n (the total frequency).
  5. State as an estimate — round to 3 sf and write “≈”.
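
The five steps above can be sketched in Python, using the heights table from this section. The `((lower, upper), frequency)` layout and the name `estimated_mean` are illustrative assumptions.

```python
# Heights table from the text: ((lower, upper), frequency) for each class.
heights = [
    ((140, 150), 4),
    ((150, 160), 12),
    ((160, 170), 20),
    ((170, 180), 11),
    ((180, 190), 3),
]

def estimated_mean(classes):
    """Estimate the mean of grouped data using mid-interval values."""
    sum_fx = 0.0
    n = 0
    for (lower, upper), f in classes:
        midpoint = (lower + upper) / 2   # step 1: mid-interval value
        sum_fx += f * midpoint           # steps 2 and 3: accumulate f*x
        n += f
    return sum_fx / n                    # step 4: divide by total frequency

estimate = estimated_mean(heights)
# Step 5: report as an estimate, rounded to 3 significant figures.
print(f"estimated mean is approximately {estimate:.3g} cm")
```

The unrounded value is 8220 / 50 = 164.4, which is reported as approximately 164 cm to 3 sf.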

🧠 Memory aid — adding a third column

The fastest way to compute the mean from a frequency table is to add a third column for fx (and, for the variance, a fourth column for fx²). Fill these in, sum the columns, then divide by n. It keeps the work organised, and it is essentially the calculation your GDC carries out.
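
A small Python sketch of the extra-columns layout, using the siblings table. The column formatting is an arbitrary choice for illustration.

```python
# Siblings table from the text: value x -> frequency f.
siblings = {0: 7, 1: 18, 2: 9, 3: 4, 4: 2}

# Each row holds x, f, plus the extra columns f*x and f*x^2.
rows = [(x, f, f * x, f * x * x) for x, f in siblings.items()]

# Column totals, skipping the x column: (sum f, sum fx, sum fx^2).
n, sum_fx, sum_fx2 = [sum(column) for column in zip(*rows)][1:]

print(f"{'x':>4} {'f':>4} {'fx':>4} {'fx^2':>5}")
for x, f, fx, fx2 in rows:
    print(f"{x:>4} {f:>4} {fx:>4} {fx2:>5}")
print(f"{'sum':>4} {n:>4} {sum_fx:>4} {sum_fx2:>5}")

mean = sum_fx / n                       # 56 / 40
variance = sum_fx2 / n - mean ** 2      # 122 / 40 - mean^2
```

The column totals are 40, 56 and 122, which are exactly the sums the mean and variance formulas need.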

Worked examples

WE 1

Mode, median, and mean from an ungrouped frequency table

A survey records the number of siblings of 40 students:

siblings x   | 0 | 1  | 2 | 3 | 4
frequency f  | 7 | 18 | 9 | 4 | 2

Find (a) the mode, (b) the median, (c) the mean.

(a) Mode = value with the highest frequency.
The highest frequency is 18, at x = 1.
mode = 1

(b) Median via cumulative frequencies.
CF: 7, 25, 34, 38, 40
n = 40 → the median is the midpoint of the 20th and 21st values.
The 20th and 21st values both lie at x = 1.
median = 1

(c) mean = Σfx / n
Σfx = 0×7 + 1×18 + 2×9 + 3×4 + 4×2 = 0 + 18 + 18 + 12 + 8 = 56
mean = 56 / 40 = 1.4

Note: cumulative frequencies are essential for the median because they tell you where each position falls.

WE 2

Variance and standard deviation from a frequency table

A teacher records the homework marks (out of 5) of 14 students:

mark x       | 1 | 2 | 3 | 4 | 5
frequency f  | 1 | 3 | 6 | 3 | 1

Find the variance and standard deviation by hand.

Step 1: mean
Σfx = 1 + 6 + 18 + 12 + 5 = 42
n = 14, μ = 42 / 14 = 3

Step 2: Σfx²
Σfx² = 1²×1 + 2²×3 + 3²×6 + 4²×3 + 5²×1 = 1 + 12 + 54 + 48 + 25 = 140

Step 3: variance
σ² = Σfx²/n − μ² = 140/14 − 3² = 10 − 9 = 1
variance σ² = 1

Step 4: SD
σ = √1 = 1
SD σ = 1

Note: this is a clean result; usually the variance won’t be a perfect square. Use the GDC to confirm.
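
If no GDC is to hand, the same check can be sketched with Python's standard `statistics` module. Expanding the table back into a raw list of 14 marks is an assumption made here for convenience; it works only because the table is ungrouped.

```python
import statistics

# Homework marks from WE 2: mark x -> frequency f.
marks = {1: 1, 2: 3, 3: 6, 4: 3, 5: 1}

# Rebuild the raw list of marks by repeating each value f times.
raw_marks = [x for x, f in marks.items() for _ in range(f)]

mean = statistics.mean(raw_marks)
variance = statistics.pvariance(raw_marks)   # population variance
sd = statistics.pstdev(raw_marks)            # population standard deviation

print(mean, variance, sd)  # mean 3, variance 1, SD 1, matching the hand calculation
```

The population versions (`pvariance`, `pstdev`) are used because the formula above divides by n rather than n − 1.
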
WE 3

Modal class and mid-interval value

The table below shows the heights, in cm, of 50 students:

height h (cm) | 140 ≤ h < 150 | 150 ≤ h < 160 | 160 ≤ h < 170 | 170 ≤ h < 180 | 180 ≤ h < 190
frequency     | 4             | 12            | 20            | 11            | 3

(a) Write down the modal class.
(b) State the mid-interval value of the modal class.

(a) Modal class = class with the highest frequency.
The highest frequency is 20 → 160 ≤ h < 170
modal class: 160 ≤ h < 170

(b) mid-interval value = (lower + upper) / 2 = (160 + 170) / 2
mid-interval value = 165 cm

Note: write “modal class”, never “mode”, when data is grouped.

WE 4

Estimate the mean from grouped data

Using the heights table from WE 3, calculate an estimate for the mean height.

Step 1: mid-interval values
145, 155, 165, 175, 185

Step 2: Σfx using midpoints
4×145 + 12×155 + 20×165 + 11×175 + 3×185 = 580 + 1860 + 3300 + 1925 + 555 = 8220

Step 3: divide by n
n = 4 + 12 + 20 + 11 + 3 = 50
mean ≈ 8220 / 50 = 164.4
estimated mean ≈ 164 cm (3 sf)

Note: use ≈ and “estimate”, because these answers are not exact for grouped data.

WE 5

Estimate the standard deviation from grouped data

For the same heights table, estimate the standard deviation. Give the answer to 3 sf.

Step 1: μ ≈ 164.4 (from WE 4)

Step 2: Σfx² using midpoints squared
4×145² + 12×155² + 20×165² + 11×175² + 3×185²
= 4(21025) + 12(24025) + 20(27225) + 11(30625) + 3(34225)
= 84100 + 288300 + 544500 + 336875 + 102675 = 1 356 450

Step 3: variance
σ² ≈ 1356450/50 − 164.4² = 27129 − 27027.36 = 101.64

Step 4: SD
σ ≈ √101.64 ≈ 10.08
SD ≈ 10.1 cm (3 sf)

Note: in the exam, just type the midpoints and frequencies into the GDC’s statistics mode, which gives σ directly.
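
The arithmetic in this example can be reproduced with a short Python sketch. The `((lower, upper), frequency)` layout is an illustrative assumption, and the result is still only an estimate because it relies on midpoints.

```python
from math import sqrt

# Heights table from WE 3: ((lower, upper), frequency) for each class.
heights = [
    ((140, 150), 4),
    ((150, 160), 12),
    ((160, 170), 20),
    ((170, 180), 11),
    ((180, 190), 3),
]

# Replace each class by its mid-interval value.
mids = [((lower + upper) / 2, f) for (lower, upper), f in heights]

n = sum(f for _, f in mids)
mean = sum(m * f for m, f in mids) / n           # estimated mean
sum_fx2 = sum(m * m * f for m, f in mids)        # sum of f * midpoint^2
variance = sum_fx2 / n - mean ** 2               # estimated variance
sd = sqrt(variance)                              # estimated SD

print(f"{sd:.3g}")  # 10.1
```
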
WE 6

Find a missing frequency given the mean

The frequency table below has an unknown frequency k:

value x      | 1 | 2 | 3 | 4
frequency f  | 4 | 7 | k | 5

Given that the mean is 2.5, find the value of k.

Step 1: set up the mean equation
mean = Σfx / n = 2.5

Step 2: Σfx in terms of k
Σfx = 4 + 14 + 3k + 20 = 38 + 3k

Step 3: n in terms of k
n = 4 + 7 + k + 5 = 16 + k

Step 4: solve
(38 + 3k) / (16 + k) = 2.5
38 + 3k = 2.5(16 + k) = 40 + 2.5k
3k − 2.5k = 40 − 38
0.5k = 2 → k = 4
k = 4

Verify: frequencies 4, 7, 4, 5; n = 20
Σfx = 4 + 14 + 12 + 20 = 50; mean = 50/20 = 2.5 ✓

Note: “backwards” problems are common. Set up mean = Σfx/n and solve for the unknown.
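
Rearranging mean = Σfx / n for the unknown gives k = (mean × n_known − Σfx_known) / (x_k − mean), where the "known" sums leave out the missing row. A minimal Python sketch of that rearrangement follows; the function name `missing_frequency` and its parameters are hypothetical.

```python
from fractions import Fraction

def missing_frequency(known, unknown_value, mean):
    """Solve (S + x_k * k) / (N + k) = mean for k.

    known: value -> frequency for the rows whose frequency is given.
    unknown_value: the value x_k whose frequency k is missing.
    mean: the stated mean, passed as a string so Fraction keeps it exact.
    """
    mean = Fraction(mean)
    n_known = sum(known.values())                          # N
    sum_fx_known = sum(x * f for x, f in known.items())    # S
    return (mean * n_known - sum_fx_known) / (unknown_value - mean)

k = missing_frequency({1: 4, 2: 7, 4: 5}, unknown_value=3, mean="2.5")
print(k)  # 4

# Check: with k included, the mean should come back as 2.5.
check_mean = Fraction(4 + 14 + 3 * k + 20, 16 + k)
```

Exact fractions are used instead of floats so that a non-integer k, which would signal an error in the question or the working, shows up clearly.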

Next up: Linear Transformations of Data. What happens to the mean, variance, and SD if you add a constant to every value, or multiply every value by a constant? You’ll learn the two key rules, E(aX+b) = aE(X) + b and Var(aX+b) = a²Var(X). Both are in the formula booklet and both are essential for IB problems.
