IB Maths AA SL
Topic 4: Statistics Toolkit
Paper 1 & 2
Measures of Central Tendency
Mean, median, and mode are three different ways of asking the same question: “where’s the middle of this data?” They each answer it slightly differently, and knowing which one to pick is half the skill.
What you need to know
- Mean, median, and mode are all called measures of central tendency; they’re all kinds of “average”.
- The mean is the total of all values divided by the number of values: $\bar{x} = \frac{\sum x}{n}$.
- The median is the middle value when the data is sorted in order.
- The mode is the value that appears most often. There can be more than one, or none at all.
- The mean has special symbols: $\bar{x}$ (sample mean) or $\mu$ (population mean).
- The units for any of these are the same as the units of the data.
What are they all for?
Imagine someone hands you a list of test scores, salaries, or shoe sizes, and asks: “what’s a typical value?” That single question has three different answers depending on which kind of “typical” you mean.
MEAN
$\bar{x} = \frac{\sum x}{n}$
The “fair share”. Add everything up and divide by how many.
MEDIAN
middle of sorted list
The “middle one”. Sort the data, then pick the value in the middle.
MODE
most frequent
The “popular kid”. Whichever value shows up most often.
In statistics class, people often just say “the average” when they mean the mean. In real exams, the IB always asks for one measure specifically, never just “average”. Read the question carefully.
Mode: the most popular value
The mode is the value that occurs most often in the data set. To find it, just count which value appears the most times.
Special cases: when the mode gets weird
- More than one mode: if two values are tied for “most often”, you have two modes, so list them both.
- No mode: if every value appears only once, there is no mode. Don’t write “mode = 0”, because that’s a number, not “no mode”. Write “no mode” in words.
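The counting rule and its two special cases can be sketched in Python. This is only an illustration: the function name `find_modes` and the sample lists are made up for this sketch, and returning an empty list to stand for “no mode” is a convention chosen here, not something the IB prescribes.

```python
from collections import Counter


def find_modes(values):
    """Return every mode as a sorted list; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)            # tally how often each value appears
    highest = max(counts.values())      # the largest frequency in the data
    if highest == 1:
        # Every value appears exactly once, so (following this note) no mode.
        return []
    # Keep every value tied for the highest frequency.
    return sorted(v for v, c in counts.items() if c == highest)


print(find_modes([2, 0, 1, 3, 2, 4, 2]))  # one mode
print(find_modes([1, 1, 5, 5, 9]))        # two modes
print(find_modes([4, 8, 15]))             # no mode
```

Because the function only counts and never adds, it also works on labels such as colours.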
The mode is the only “average” that works for words
If your data is qualitative (like favourite colours, blood types, or eye colours), you can’t add the values up or sort them by size. The mode is the only measure that makes sense.
Median: the middle value
The median is the value sitting right in the middle of the data after you’ve sorted it from smallest to largest. It’s the “halfway point”: half the values are below it, half are above.
How to find the median
- Sort the data in order from smallest to largest.
- Count how many values you have; call this n.
- If n is odd, the median is the single middle value.
- If n is even, the median is the average of the two middle values.
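The four steps above can be sketched as a short Python function. Treat it as a rough illustration: the name `median` and the unsorted sample lists are chosen for this sketch only.

```python
def median(values):
    """Find the median by following the four steps: sort, count, odd/even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)   # step 1: sort from smallest to largest
    n = len(ordered)           # step 2: count the values
    mid = n // 2
    if n % 2 == 1:
        # step 3: odd n, so there is a single middle value
        return ordered[mid]
    # step 4: even n, so average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([22, 3, 15, 7, 18, 9, 12]))      # 7 values
print(median([25, 3, 15, 7, 18, 9, 12, 22]))  # 8 values
```

The lists are deliberately given out of order to show that sorting happens inside the function before anything else.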
Visual example: odd n
3 · 7 · 9 · 12 · 15 · 18 · 22
7 values, so the middle one is the 4th. Median = 12.
Visual example: even n
3 · 7 · 9 · 12 · 15 · 18 · 22 · 25
8 values, so take the average of the 4th and 5th. Median = (12 + 15) ÷ 2 = 13.5.
The (n+1)/2 trick: finding the position
The median sits at position (n + 1) ÷ 2 in your sorted list.
Example: n = 7 → position (7+1)/2 = 4, so the 4th value is the median.
Example: n = 8 → position (8+1)/2 = 4.5, meaning “halfway between the 4th and 5th values”, so average them.
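The position trick can also be written out in code. The sketch below is illustrative only; `median_position` and `median_by_position` are hypothetical helper names, and positions are counted from 1 as in the examples above (Python lists count from 0, hence the `- 1`).

```python
def median_position(n):
    """Position of the median in a sorted list of n values (counting from 1)."""
    return (n + 1) / 2


def median_by_position(values):
    ordered = sorted(values)
    pos = median_position(len(ordered))
    if pos.is_integer():
        # Whole-number position: that value is the median.
        return ordered[int(pos) - 1]
    # Half-number position: average the values on either side.
    lower = ordered[int(pos - 0.5) - 1]
    upper = ordered[int(pos + 0.5) - 1]
    return (lower + upper) / 2


print(median_position(7))   # position for 7 values
print(median_position(8))   # position for 8 values
print(median_by_position([3, 7, 9, 12, 15, 18, 22, 25]))
```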
The single biggest mistake students make with the median? Forgetting to sort the data first. The median is the middle of the sorted list, not the middle of the original list.
Mean: the fair share
The mean is what most people call “the average”. It’s the total of all the values, divided by how many values you have.
The $\Sigma$ (capital sigma) symbol just means “add them all up”. So $\sum x_i$ is shorthand for $x_1 + x_2 + x_3 + \dots + x_n$.
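In code, “add them all up, then divide by how many” is a two-step calculation. The sketch below is a minimal illustration, with the function name `mean` and the sample list chosen for this example; Python’s built-in `sum` plays the role of sigma.

```python
def mean(values):
    """Total of all values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    total = sum(values)   # the sigma step: x1 + x2 + ... + xn
    n = len(values)       # how many values there are
    return total / n


print(mean([43, 29, 70, 51, 64, 43]))
```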
Symbols you’ll see
- $\bar{x}$ (“x-bar”): the mean of a sample.
- $\mu$ (Greek “mu”): the mean of a population.
- Both formulas are the same; only the symbol changes depending on whether you’re talking about all the data, or just a sample of it.
Memory trick: “Mean = Mountain divided by Mob”
Pile up all your numbers (the mountain), then divide by how many people are in the mob. Sum ÷ count = mean.
Which one should I use?
Each measure has its own strengths and weaknesses. Here’s a quick comparison so you can pick the right tool:
| | Mean | Median | Mode |
|---|---|---|---|
| Uses every value? | Yes | No | No |
| Affected by outliers? | Yes (badly) | Hardly | No |
| Works on words/labels? | No | No | Yes |
| Always exists? | Yes | Yes | Sometimes |
| Easy to calculate? | Yes | Sort first | Yes |
Outliers wreck the mean but barely touch the median
Imagine a small company where most workers earn $40k a year, but the CEO earns $2 million. The mean salary will look huge, but the median will sit comfortably at $40k. That’s why news articles about “average house prices” often quote the median: it gives a fairer picture of the typical case.
Worked examples
WE 1: Find the mode, median and mean
Find the mode, median and mean for the data set below.
43, 29, 70, 51, 64, 43
Data: 43, 29, 70, 51, 64, 43
Mode
Most common value: 43 appears twice, the others once
Mode = 43
Median
Sort in order: 29, 43, 43, 51, 64, 70
n = 6 (even), so take the average of the 3rd and 4th values:
$\frac{43 + 51}{2} = 47$
Median = 47
Mean
Add up all values: $\sum x = 43+29+70+51+64+43 = 300$
Divide by n = 6: $\frac{300}{6} = 50$
Mean = 50
Note: always sort the data before finding the median!
WE 2: All three averages, odd number of values
The number of goals scored by a football team in their last 7 matches is given below. Find the mode, median and mean.
2, 0, 1, 3, 2, 4, 2
Data: 2, 0, 1, 3, 2, 4, 2
Mode
2 appears 3 times, which is the most often:
Mode = 2
Median
Sort: 0, 1, 2, 2, 2, 3, 4
n = 7 (odd), so take the middle (4th) value:
Median = 2
Mean
$\sum x = 0+1+2+2+2+3+4 = 14$
$\frac{14}{7} = 2$
Mean = 2
Note: all three are equal here, which is what you expect when the data is symmetrical
WE 3: The effect of an outlier on the mean and median
The salaries (in thousands of dollars) of 5 employees at a small company are: 35, 38, 40, 42, 45. The owner, who earns $250,000, is added to the data set.
(a) Find the mean and median before adding the owner. (b) Find the mean and median after. (c) Comment.
Salaries are in $1000s; the owner adds an extreme value (an outlier).
Part (a): before
Sort: 35, 38, 40, 42, 45
n = 5, so take the middle (3rd) value: Median = 40
Mean: (35+38+40+42+45)/5 = 200/5 = 40
Median = $40k, Mean = $40k
Part (b): after
Sort: 35, 38, 40, 42, 45, 250
n = 6, so take the average of the 3rd and 4th: (40+42)/2 = 41
Mean: (200 + 250)/6 = 450/6 = 75
Median = $41k, Mean = $75k
Part (c)
Median moved by only $1k; it barely budged.
Mean jumped from $40k to $75k, nearly doubling!
Mean is heavily affected by outliers; median is not
Note: this is exactly why news articles use median house prices, not mean
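The before-and-after comparison can be checked with Python’s standard `statistics` module. This is a quick sketch using the salary figures from the example; the variable names are chosen for illustration.

```python
from statistics import mean, median

# Salaries in thousands of dollars, taken from the worked example.
before = [35, 38, 40, 42, 45]
after = before + [250]          # add the owner (the outlier)

mean_before, median_before = mean(before), median(before)
mean_after, median_after = mean(after), median(after)

# How far each measure moved once the outlier was added.
mean_shift = mean_after - mean_before
median_shift = median_after - median_before

print("mean:", mean_before, "->", mean_after, "shift", mean_shift)
print("median:", median_before, "->", median_after, "shift", median_shift)
```

The mean shifts by 35 (thousand dollars) while the median shifts by only 1, matching parts (a) to (c).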
WE 4: Find a missing value given the mean
The mean of the five values 8, 12, x, 15, 20 is 14. Find the value of x.
Mean = 14, n = 5
Work backwards from the mean formula. Total = mean × n.
Use mean × n = total: $\sum x = 14 \times 5 = 70$
Sum the known values: 8 + 12 + 15 + 20 = 55
Find x: $x = 70 - 55 = 15$
x = 15
Note: “reverse the mean formula” is a classic IB question type
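The “work backwards” method is short enough to express as a function. The sketch below is illustrative; `missing_value` and its parameter names are hypothetical, and it assumes exactly one value is unknown.

```python
def missing_value(known_values, target_mean, n):
    """Find the one unknown value, given the mean and the total count n."""
    total = target_mean * n            # total = mean x n
    return total - sum(known_values)   # subtract what is already known


x = missing_value([8, 12, 15, 20], target_mean=14, n=5)
print(x)
```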
WE 5: Find the modal category for qualitative data
20 students were asked their favourite ice cream flavour. The results were:
Vanilla: 5, Chocolate: 8, Strawberry: 4, Mint: 3
(a) State the modal flavour. (b) Explain why the mean and median can’t be calculated.
Flavours are words, so this is qualitative data. Only the mode works here.
Part (a)
Highest frequency: Chocolate with 8 students
Modal flavour = Chocolate
Part (b)
You can’t add or sort flavour names: “vanilla + chocolate” makes no mathematical sense.
Mean and median only work with numbers
Note: mode is the only “average” for qualitative data
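When the data already comes as a frequency table, finding the modal category means looking for the largest frequency. Here is a minimal sketch using the flavour counts from the example; the dictionary layout and the name `modal_categories` are assumptions made for this illustration.

```python
def modal_categories(frequencies):
    """Return every category tied for the highest frequency."""
    highest = max(frequencies.values())
    return [name for name, count in frequencies.items() if count == highest]


flavours = {"Vanilla": 5, "Chocolate": 8, "Strawberry": 4, "Mint": 3}

print(sum(flavours.values()))      # check the total number of students
print(modal_categories(flavours))  # the modal flavour(s)
```

Returning a list rather than a single name means a tie between two flavours would report both, in line with the “more than one mode” rule.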
Top tips
- Always sort the data first when finding the median. Skipping this step is the single most common mark-loss in this topic.
- Use the (n+1)/2 trick to find the position of the median. A whole number means that position is the median. A half number means you average the two values on either side.
- Remember “no mode” if every value appears just once. Don’t write “mode = 0”.
- For qualitative data, only mode works. Don’t try to find a mean of words; it doesn’t exist.
- Check whether outliers are affecting the mean. If the question hints at a lopsided distribution, the median is usually a fairer answer.
- Use your GDC’s stats mode to find these quickly. Type the data into a list, then run “1-Var Stats”: mean, median, sum, and n all appear at once.
- “Reverse the mean” questions: total = mean × n. Memorise this; it shows up a lot.
- For symmetrical data, mean ≈ median ≈ mode. If they’re very different, the distribution is probably skewed.
Common mistakes
- Forgetting to sort the data before finding the median. The middle of the original list is meaningless; you have to sort first.
- Saying “mode = 0” when there’s no mode. Zero is a number; “no mode” is a description. They’re not the same thing.
- Picking only one mode when there are two. If two values tie for most frequent, both are the mode.
- Picking the wrong middle value when n is even. With 6 values, the median is the average of the 3rd and 4th, not just one of them.
- Using the mean when the data has outliers. A single huge or tiny value can drag the mean way off; the median is fairer here.
- Finding the mean of qualitative data. You can’t take a “mean of colours”. Only numbers go in the mean formula.
- Forgetting units. If the data is in metres, the mean is also in metres. Always state units in your final answer.
- Counting frequencies wrong. When data is in a list (not a frequency table), tally every occurrence carefully; it is easy to miscount under exam pressure.
Now you can describe where the centre of any data set is. The next note covers measures of dispersion: how spread out the data is around that centre. The two together give you the full picture.