IB Maths AA SL
Topic 4 — Statistics Toolkit
Paper 1 & 2
Frequency Tables
When you have lots of repeated values — or huge data sets — listing every single value gets messy fast. Frequency tables compress all that data into a tidy summary. This note shows you how to find every statistical measure from them, whether the data is grouped or not.
📘 What you need to know
- A frequency table shows how many times each value (or group of values) appears.
- Ungrouped tables list each individual value — you know exactly what the data is.
- Grouped tables organise data into class intervals (like 10 ≤ x < 20) — you don’t know the exact values.
- For ungrouped data, you find exact statistics. For grouped data, you can only estimate them using mid-interval values.
- Mean formula: x̄ = (Σ fi xi) ÷ n, where n = Σ fi (in your formula booklet).
- For grouped data, the modal class is the class with the highest frequency. The mid-interval value is the average of the lower and upper boundaries of a class.
The two types of frequency table
Frequency tables come in two flavours. The big difference is whether each row gives you an exact value or a range — and that completely changes what you can calculate.
Ungrouped frequency table
Each row holds one specific value and how many times it occurred. You know the exact data.
e.g. “Number of pets” → 0, 1, 2, 3
Grouped frequency table
Each row holds a range of values (a “class interval”) and how many fell into that range. You don’t know the exact values.
e.g. “Height” → 150 ≤ h < 155, 155 ≤ h < 160, …
If a question gives you a frequency table, the very first thing to check is: are these single values or class intervals? That tells you whether you’ll be finding exact answers or estimates.
Ungrouped frequency tables
An ungrouped table shows you the exact data — just compressed. Imagine writing out every value individually: that’s still possible from this table, you just don’t have to.
Example layout
| Number of pets (x) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Frequency (f) | 11 | 5 | 8 | 6 |
This is just shorthand for: eleven 0s, five 1s, eight 2s, and six 3s. Total of 30 students.
How to find each statistic
📍 Mode
The mode is the value with the highest frequency — easy to spot, just look for the biggest f.
📍 Median
The median is the middle value when all the data is in order. Use a cumulative frequency (running total) to find which row it lands in.
- Find the position: median position = (n + 1) ÷ 2. If n is even this gives a half-position (e.g. 15.5 when n = 30), so the median is the average of the two values either side of it (the 15th and 16th).
- Build a cumulative frequency row by adding running totals.
- Find which row contains your position — that’s the median value.
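If you want to check the three steps above away from the calculator, here is a minimal Python sketch of the same idea. The function name `median_from_table` is made up for this illustration, and it assumes the values are listed in ascending order, as in the pets table.

```python
from itertools import accumulate


def median_from_table(values, freqs):
    """Median of ungrouped data given as (value, frequency) rows, values ascending."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))  # running totals, one per row

    def value_at(position):
        # The first row whose cumulative frequency reaches the position (1-based).
        for value, running_total in zip(values, cumulative):
            if running_total >= position:
                return value
        raise ValueError("position is outside the data")

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    # Even n: average the two middle values.
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


pets_median = median_from_table([0, 1, 2, 3], [11, 5, 8, 6])
print(pets_median)  # 1.0
```

With n = 30 the two middle positions are 15 and 16, and both fall in the row where the running total first reaches 16, which is the value 1.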
📍 Mean
The formula is x̄ = (Σ fi xi) ÷ n, where n = Σ fi. In plain English:
- Multiply each value by its frequency: fi × xi.
- Add all those products together: that’s Σ fi xi (the total of every value, repeated by its frequency).
- Divide by the total frequency.
🤔 Why multiply value × frequency?
If the value 2 occurs 8 times, the contribution to the total is 2 + 2 + 2 + 2 + 2 + 2 + 2 + 2 = 8 × 2 = 16. So fi × xi is just a shortcut for “add this value f times”. Then dividing by the total frequency gives the mean.
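To see that the shortcut really is the same as adding every value individually, here is a small Python sketch using the pets table. The function name `mean_from_table` is an illustrative choice, not anything from the formula booklet.

```python
def mean_from_table(values, freqs):
    """Mean from a frequency table: sum of f*x divided by total frequency."""
    total = sum(f * x for x, f in zip(values, freqs))  # this is Σ f·x
    n = sum(freqs)                                     # this is Σ f
    return total / n


values = [0, 1, 2, 3]
freqs = [11, 5, 8, 6]

pets_mean = mean_from_table(values, freqs)

# The long way: write out all 30 values and average them.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
mean_long_way = sum(expanded) / len(expanded)

print(pets_mean)      # 1.3
print(mean_long_way)  # 1.3
```

Both routes give the same answer, because multiplying by the frequency is just repeated addition.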
📍 Standard deviation, range, IQR
Use your GDC. Enter the values in one list and the frequencies in another, run 1-Var Stats with frequency mode on. The calculator handles all the rest.
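If you are curious what the GDC is doing, the sketch below reproduces it in Python for the pets table. It assumes you want the population standard deviation (σx on the calculator), which is what `statistics.pstdev` from Python's standard library computes. It also shows what goes wrong if the frequencies are left out.

```python
from statistics import pstdev

values = [0, 1, 2, 3]
freqs = [11, 5, 8, 6]

# Expand the table back into the full data set: eleven 0s, five 1s, and so on.
data = [x for x, f in zip(values, freqs) for _ in range(f)]

sd_with_freq = pstdev(data)       # population SD of all 30 values
sd_without_freq = pstdev(values)  # the mistake: each row counted only once

print(round(sd_with_freq, 3))     # 1.159
print(round(sd_without_freq, 3))  # 1.118
```

The second number is the standard deviation of just the four values 0, 1, 2, 3, which is not the question being asked.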
📍 Always check your answer makes sense
The mean, median, and mode should always sit inside the range of your data. If the values go from 0 to 3 but you’ve calculated a mean of 5, you’ve made an arithmetic error somewhere.
Grouped frequency tables
Grouped tables look like this:
| Height, h (cm) | Frequency |
|---|---|
| 150 ≤ h < 155 | 3 |
| 155 ≤ h < 160 | 5 |
| 160 ≤ h < 165 | 9 |
| 165 ≤ h < 170 | 7 |
| 170 ≤ h < 175 | 1 |
You can see that 9 students have a height between 160 and 165 — but you don’t know if they’re 161 cm or 164.9 cm. That’s the cost of grouping the data. So instead of finding exact statistics, you find estimates.
The mid-interval value (the trick that makes it all work)
For each class, find the midpoint and use it as a “stand-in” for every value in that class. So for 160 ≤ h < 165, the mid-interval is (160 + 165) ÷ 2 = 162.5 cm — and we treat all 9 students as if they’re each 162.5 cm tall.
🧠 Memory trick: “Pretend they’re all in the middle”
You can’t know exactly where each value falls, so guess the safest spot — the middle of the class. Some will be a bit higher, some a bit lower, and the errors mostly cancel out.
What you can find from a grouped table
- Modal class — the class with the highest frequency. Note: this is a class, not a single value (you can’t say which exact height was most common).
- Estimated mean: use the same formula, x̄ = (Σ fi xi) ÷ n, but with the mid-interval value of each class as xi.
- Estimated standard deviation, variance — same idea, plug into your GDC using the mid-interval values and frequencies.
- Estimated median, quartiles, IQR — best found from a cumulative frequency graph (covered in the next note).
- Range — can’t be found exactly, since you don’t know the smallest and largest values.
📍 Show that you know they’re estimates
It’s good practice to round grouped-data answers (e.g. 162.1 cm to 4 s.f.) rather than leave them as exact fractions. This signals to the marker that you understand the answer is an approximation.
In the formula booklet, the mean formula looks the same for both ungrouped and grouped data — but the meaning of xi changes. For ungrouped, xi is the actual value. For grouped, it’s the mid-interval. Same formula, different interpretation.
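Here is a short Python sketch of that idea for the heights table: work out the mid-interval value of each class, then feed those into the usual mean formula. The variable names are illustrative only.

```python
# Each class is written as (lower boundary, upper boundary), in cm.
classes = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
freqs = [3, 5, 9, 7, 1]

# Mid-interval value = average of the two boundaries.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Same mean formula as before, with the midpoints standing in for x.
n = sum(freqs)
estimated_mean = sum(f * m for m, f in zip(midpoints, freqs)) / n

print(midpoints)                 # [152.5, 157.5, 162.5, 167.5, 172.5]
print(round(estimated_mean, 1))  # 162.1
```

The calculation is identical to the ungrouped case. Only the meaning of the x values has changed, which is why the result is an estimate.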
Worked examples
WE 1: Find statistics from an ungrouped frequency table
The frequency table below shows the number of pets owned by 30 students.
| Number of pets | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Frequency | 11 | 5 | 8 | 6 |
Find: (a) the mode (b) the median (c) the mean (d) the standard deviation.
Total frequency n = 11 + 5 + 8 + 6 = 30 students.
part (a): mode
Highest frequency = 11, which belongs to value 0.
Mode = 0
part (b): median
n = 30 → median = average of 15th and 16th values.
Build cumulative frequency:
0 → 11, 1 → 16, 2 → 24, 3 → 30
The 15th and 16th values both land in the “1” row, because the running total is 11 after the 0s and 16 after the 1s.
Median = 1
part (c): mean
Use Σfx ÷ n:
Σfx = 11×0 + 5×1 + 8×2 + 6×3 = 0+5+16+18 = 39
Mean = 39 ÷ 30 = 1.3
Mean = 1.3
part (d): sd
Enter values in L1, frequencies in L2, run 1-Var Stats.
σx ≈ 1.159…
SD = 1.16 (3 s.f.)
Always set the frequency list when running stats. Otherwise the calculator counts each row only once!
WE 2: Find statistics from a grouped frequency table
The table below shows the heights, in cm, of 25 students.
| Height, h | Frequency |
|---|---|
| 150 ≤ h < 155 | 3 |
| 155 ≤ h < 160 | 5 |
| 160 ≤ h < 165 | 9 |
| 165 ≤ h < 170 | 7 |
| 170 ≤ h < 175 | 1 |
(a) Write down the modal class. (b) Find the mid-interval value of the modal class. (c) Estimate the mean height.
Grouped data: answers will be estimates, not exact values.
part (a)
Highest frequency = 9, in the row 160 ≤ h < 165.
Modal class = 160 ≤ h < 165
part (b)
Mid-interval = (lower + upper) ÷ 2:
(160 + 165) ÷ 2 = 162.5
Mid-interval = 162.5 cm
part (c): estimated mean
Find each mid-interval, then Σfx:
Mid-intervals: 152.5, 157.5, 162.5, 167.5, 172.5
Σfx = 3×152.5 + 5×157.5 + 9×162.5 + 7×167.5 + 1×172.5
= 457.5 + 787.5 + 1462.5 + 1172.5 + 172.5 = 4052.5
Divide by n = 25: 4052.5 ÷ 25 = 162.1
Estimated mean ≈ 162.1 cm
Use mid-interval values as the “x”. They’re stand-ins for every value in the class.
WE 3: Find a missing frequency given the mean
The frequency table shows the number of goals scored in 20 football matches. Given that the mean is 1.5, find the value of k.
| Goals scored | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Frequency | 4 | k | 6 | 3 |
Total = 20, mean = 1.5
Two equations: total frequency = 20, and mean × n = Σfx.
Total frequency: 4 + k + 6 + 3 = 20
Solve for k: k = 20 − 13 = 7
Check using mean:
Σfx = 4×0 + 7×1 + 6×2 + 3×3 = 0 + 7 + 12 + 9 = 28
Mean = 28 ÷ 20 = 1.4 ❌ doesn’t match 1.5
So k can’t come from the total alone — use the mean equation.
Set up: Σfx = 1.5 × 20 = 30
0 + k + 12 + 9 = 30 → k = 9
Then total: 4 + 9 + 6 + 3 = 22, not 20…
There’s a conflict — only one constraint can be satisfied.
Re-read: “20 matches” is the total → must satisfy that. Use that to find k = 7.
k = 7 (and mean = 1.4, not exactly 1.5)
Always sanity-check both constraints. If they conflict, the question may have a typo or expect an approximation.
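The check against both constraints can be sketched in a few lines of Python. The frequencies 4, k, 6, 3 for 0, 1, 2, 3 goals are taken from the working above, and the helper name `summarise` is made up for this illustration.

```python
GOALS = [0, 1, 2, 3]
TOTAL_MATCHES = 20
STATED_MEAN = 1.5


def summarise(k):
    """Return (total frequency, mean) for the table with frequencies 4, k, 6, 3."""
    freqs = [4, k, 6, 3]
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(GOALS, freqs)) / n
    return n, mean


# Constraint 1, total frequency: 4 + k + 6 + 3 = 20
k_from_total = TOTAL_MATCHES - (4 + 6 + 3)

# Constraint 2, sum of f*x = mean * n with n = 20: 0 + k + 12 + 9 = 30
k_from_mean = round(STATED_MEAN * TOTAL_MATCHES) - (6 * 2 + 3 * 3)

for k in (k_from_total, k_from_mean):
    n, mean = summarise(k)
    print(f"k = {k}: total = {n}, mean = {mean:.2f}")
# k = 7: total = 20, mean = 1.40
# k = 9: total = 22, mean = 1.36
```

Neither value of k satisfies both conditions at once, which is exactly the conflict the worked example runs into.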
WE 4: Use cumulative frequency to find the median
The table shows the number of children per family in a survey of 40 families. Find the median.
| Children | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Frequency | 5 | 12 | 15 | 6 | 2 |
n = 40 → median is average of 20th and 21st values
Build cumulative frequencies (running totals) to locate the right row.
Cumulative frequencies:
0 → 5, 1 → 17, 2 → 32, 3 → 38, 4 → 40
20th value: first row where cum freq ≥ 20 is “2” (at 32)
21st value: also in row “2”
Median = average of two “2”s:
Median = 2 children
Cumulative frequency = running total. It’s the perfect tool for finding the median’s row.
WE 5: Estimate the standard deviation from grouped data
Using the heights table from WE 2 (25 students), estimate the standard deviation of the heights.
Grouped data → use mid-interval values in the GDC. SD will be an estimate.
Enter mid-intervals in L1: 152.5, 157.5, 162.5, 167.5, 172.5
Enter frequencies in L2: 3, 5, 9, 7, 1
Run 1-Var Stats with frequency list = L2:
σx ≈ 5.276…
Round to 3 s.f.:
Estimated SD ≈ 5.28 cm (3 s.f.)
Remember: this is an estimate because we treated everyone in a class as having the mid-interval height.
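If you want to cross-check a GDC result like this by hand, the sketch below computes the population standard deviation straight from its definition, using the mid-interval values and frequencies entered above. It assumes the population form (dividing by n, as σx does), and the variable names are illustrative.

```python
from math import sqrt

midpoints = [152.5, 157.5, 162.5, 167.5, 172.5]
freqs = [3, 5, 9, 7, 1]

n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Population variance: frequency-weighted average of squared distances from the mean.
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
estimated_sd = sqrt(variance)

print(round(estimated_sd, 2))
```

Running this alongside the calculator is a quick way to confirm that the frequency list was set correctly.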
💡 Top tips
- Always set the frequency list on your GDC. Otherwise the calculator treats each row as one data point — and your answers will be totally wrong.
- For ungrouped data, answers are exact. For grouped data, they’re estimates — always say “estimated” in your final answer.
- Build a cumulative frequency row when finding the median from any frequency table. It locates the median’s row instantly.
- Mid-interval value = (lower boundary + upper boundary) ÷ 2. Always work this out for every class before estimating the mean or SD.
- The modal class is a class, not a single value. Write “the modal class is 160 ≤ h < 165”, not “the mode is 162.5”.
- Sanity check by total frequency. Σfi should equal n — if not, you’ve miscounted.
- Round grouped-data answers to 3 s.f. or similar — this signals to the marker that you understand the answer is an estimate.
- For “find a missing frequency given the mean” questions, use the equation Σfi xi = mean × n.
⚠ Common mistakes
- Forgetting the frequency list on the GDC. If you enter just the values without the frequency column, the calculator gives the wrong mean and SD.
- Using just x instead of f × x in the mean. Don’t divide Σx by the number of rows in the table; divide Σfx by Σf (the total frequency).
- Saying “mode = 162.5” for grouped data. You can only give a modal class, not a single mode.
- Confusing exact with estimate. Grouped data → always estimates. Don’t write “the mean is exactly 162.1 cm”.
- Wrong mid-interval values. For 160 ≤ h < 165, the midpoint is 162.5 — not 160 or 162 or 165. Take the average of the two boundaries.
- Including the wrong row in cumulative frequency. The cumulative frequency at the end of a class is the running total up to and including that class.
- Forgetting units. Heights → cm. Time → seconds. Always include them in the final answer.
- Stopping at the modal class. Some questions ask for the mid-interval of the modal class as well — read the question carefully.
Frequency tables are the bridge between raw data and graphs. The next note covers linear transformations of data — what happens to the mean and SD when you scale or shift everything in your data set.