IB Maths AA SL
Topic 4: Statistics Toolkit
Paper 1 & 2
Outliers
Sometimes a data set has a value that just doesn’t fit: a 29-year-old at a kids’ birthday party, or a $250,000 salary in a list of regular jobs. These are outliers. This note shows you how to spot them with a precise rule, and when (and when not) to remove them.
What you need to know
- An outlier is an extreme value that doesn’t fit with the rest of the data.
- The IB rule: x is an outlier if x < Q1 − 1.5 × IQR or x > Q3 + 1.5 × IQR.
- So you need three things first: Q1, Q3, and IQR (get them from your GDC).
- Outliers should be removed if they’re errors (e.g. a 71 typed instead of 17).
- Outliers should NOT be removed if they’re real data, even if extreme.
- Outliers heavily affect the mean, range, and SD, but barely affect the median and IQR.
What is an outlier?
An outlier is a data value that’s much bigger or much smaller than the rest. Think of it as the odd one out: the kid in the photo who doesn’t quite belong.
Some real-world examples:
- A 29-year-old at a children’s birthday party (most kids are 5).
- A $250,000 CEO salary in a list of regular employee wages.
- A 200 cm-tall basketball player in a sample of 12-year-olds.
The problem? Outliers can wreck statistics like the mean, the range, and the standard deviation. So you need a precise way to detect them, and a careful approach to deciding what to do about them.
“Outlier” doesn’t mean “bad data”. It just means “unusual”. Sometimes outliers are real and important, like rare medical conditions or top-performing students. Removing them blindly is a serious statistical sin.
The 1.5 × IQR rule: how to detect them
The IB uses a precise mathematical rule for spotting outliers, based on the quartiles. The idea: anything sitting too far from the middle 50% of the data is suspicious.
Figure: Anything outside the safe zone is an outlier. A number line runs lower fence (Q1 − 1.5×IQR), Q1, Q3, upper fence (Q3 + 1.5×IQR); values between the two fences are safe (not outliers), values beyond either fence are outliers.
The 4-step method
- Find Q1, Q3, and IQR from your GDC.
- Calculate the lower fence: Q1 − 1.5 × IQR.
- Calculate the upper fence: Q3 + 1.5 × IQR.
- Any data value outside these fences is an outlier.
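The four steps translate directly into a few lines of code. This is a minimal Python sketch, assuming you already have Q1 and Q3 from your GDC; the function names `fences` and `is_outlier` are hypothetical, chosen just for this note.

```python
def fences(q1: float, q3: float) -> tuple[float, float]:
    """Return (lower fence, upper fence) using the 1.5 × IQR rule."""
    iqr = q3 - q1              # step 1: IQR = Q3 - Q1
    lower = q1 - 1.5 * iqr     # step 2: lower fence
    upper = q3 + 1.5 * iqr     # step 3: upper fence
    return lower, upper


def is_outlier(x: float, q1: float, q3: float) -> bool:
    """Step 4: x is an outlier only if it lies strictly outside the fences."""
    lower, upper = fences(q1, q3)
    return x < lower or x > upper


# Example with made-up quartiles: Q1 = 10, Q3 = 20, so IQR = 10
print(fences(10, 20))          # (-5.0, 35.0)
print(is_outlier(36, 10, 20))  # True
print(is_outlier(35, 10, 20))  # False (on the fence, not beyond it)
```

The strict `<` and `>` comparisons mirror the IB rule: a value sitting exactly on a fence is not an outlier.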
Why 1.5? Why not just “really far”?
The 1.5 × IQR rule comes from John Tukey, the statistician who introduced the box plot. For typical (roughly normal) data, fences at 1.5 × IQR sit about 2.7 standard deviations from the centre, so fewer than 1% of values (about 0.7%) land outside them by chance. If a value is outside, it’s worth flagging.
It’s a useful rule of thumb, not a law of nature. It just gives a consistent, reproducible way to identify “extreme” values.
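You can check the “under 1%” figure yourself. The sketch below uses Python’s standard-library `statistics.NormalDist` and assumes the data are exactly normal (a simplification): it finds the quartiles of a standard normal distribution, builds the fences, and adds up the probability of landing beyond either one.

```python
from statistics import NormalDist

z = NormalDist()              # standard normal: mean 0, SD 1
q1 = z.inv_cdf(0.25)          # lower quartile, about -0.674
q3 = z.inv_cdf(0.75)          # upper quartile, about +0.674
iqr = q3 - q1                 # about 1.349

upper = q3 + 1.5 * iqr        # upper fence, about 2.698 SDs above the mean
lower = q1 - 1.5 * iqr        # lower fence, the mirror image below the mean

# Probability of falling below the lower fence or above the upper fence
p_outside = z.cdf(lower) + (1 - z.cdf(upper))
print(f"fences at ±{upper:.3f} SD, P(outside) = {p_outside:.4f}")
```

It comes out at roughly 0.7% of values, i.e. under 1%. Real data sets are rarely exactly normal, so treat this as a guide to why 1.5 is a sensible choice, not a guarantee.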
Memory trick: “fences sit 1.5 IQRs beyond the nearest quartile”
The IQR is the spread of the middle 50%. Stepping 1.5 × IQR out from each quartile gives you a “fence”; anything beyond a fence is unusual. Lower fence = Q1 − 1.5×IQR, upper fence = Q3 + 1.5×IQR.
Should I remove an outlier?
This is where students often go wrong. Just because a value is an outlier doesn’t mean you should delete it. The decision depends entirely on context.
REMOVE if it’s an error
If the data was clearly recorded wrong (a typo, a duplicated entry, or a measurement mistake), remove it.
e.g. a height of 1700 cm, clearly meant to be 170 cm; or a “71” typed where “17” was meant.
KEEP if it’s real data
If the value is genuinely part of the population, keep it, even if it’s extreme. The outlier is meaningful.
e.g. the CEO’s salary in a company wage list, a 13-year-old at a 5-year-old’s birthday party.
Always justify your decision
If an exam asks “should this outlier be removed?”, you have to give a reason, not just “yes” or “no”. Say whether the value seems plausible (real) or implausible (an error). The justification is where the marks live.
How outliers affect the statistics
Outliers don’t affect every statistical measure equally: some get hit hard, others barely notice.
Heavily affected by outliers
- Mean: uses every value, so one extreme number drags the average up or down.
- Range: defined by the largest and smallest values, so a single outlier blows it up.
- Standard deviation: squares the deviations, so extreme values count even more.
Hardly affected by outliers
- Median: only looks at the middle position, so it doesn’t care how extreme the ends are.
- Mode: counts frequency and ignores magnitude.
- IQR: only uses Q1 and Q3, ignoring the bottom and top 25%.
If your data has outliers, prefer the median and IQR over the mean and SD. They give a more “honest” picture of typical values.
Worked examples
WE 1: Identify outliers and decide which to remove
The ages, in years, of children attending a birthday party are:
2, 7, 5, 4, 8, 4, 6, 5, 5, 29, 2, 5, 13
(a) Identify any outliers. (b) Suggest which value(s) should be removed and justify.
Data: 2, 7, 5, 4, 8, 4, 6, 5, 5, 29, 2, 5, 13
Part (a): outliers
From GDC: Q1 = 4, Q3 = 7.5
IQR = Q3 − Q1: 7.5 − 4 = 3.5
Lower fence: Q1 − 1.5 × IQR = 4 − 1.5×3.5 = −1.25
Upper fence: Q3 + 1.5 × IQR = 7.5 + 1.5×3.5 = 12.75
Anything outside [−1.25, 12.75] is an outlier:
Outliers are 13 and 29
Part (b): keep or remove?
13 is a valid age for a child โ keep it.
29 is too old for a children’s party: likely an error, or an adult.
Remove 29; keep 13
always give a reason: “remove because it doesn’t fit a children’s party” is what scores marks
WE 2: Find the outlier boundaries and check a value
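To double-check part (a), the sketch below recomputes the quartiles, fences and outliers in Python. It assumes your GDC finds Q1 and Q3 as the medians of the lower and upper halves, leaving out the middle value when n is odd; other quartile methods can give slightly different fences, so in the exam always use your calculator’s values. The helper names are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Medians of the lower and upper halves (middle value excluded when n is odd)."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(upper)


def find_outliers(values):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Strict inequalities: a value exactly on a fence is not an outlier
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)
    return (q1, q3, iqr), (lower_fence, upper_fence), outliers


ages = [2, 7, 5, 4, 8, 4, 6, 5, 5, 29, 2, 5, 13]
stats, fence_values, outliers = find_outliers(ages)
print(stats)         # (4.0, 7.5, 3.5)
print(fence_values)  # (-1.25, 12.75)
print(outliers)      # [13, 29]
```

The code only flags values; deciding to keep the 13 and remove the 29 still needs the context argument from part (b).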
A data set has Q1 = 12, Q3 = 22, and a single value of 45. Determine whether 45 is an outlier.
Q1 = 12, Q3 = 22, check x = 45
Find the upper fence first since 45 is large.
IQR: 22 − 12 = 10
Upper fence: Q3 + 1.5 × IQR = 22 + 15 = 37
Compare: 45 > 37, so 45 lies beyond the upper fence
Yes, 45 is an outlier
always quote the fence value alongside the comparison
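A quick code check of this example. Only the upper fence matters here because 45 is large, but the sketch computes both fences for completeness. Plain Python, no libraries; the variable names are just illustrative.

```python
q1, q3, x = 12, 22, 45

iqr = q3 - q1                 # 22 - 12 = 10
lower_fence = q1 - 1.5 * iqr  # 12 - 15 = -3.0
upper_fence = q3 + 1.5 * iqr  # 22 + 15 = 37.0

is_outlier = x < lower_fence or x > upper_fence
print(f"fences: [{lower_fence}, {upper_fence}], {x} is an outlier: {is_outlier}")
```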
WE 3: Effect of removing an outlier on mean and median
5 employees earn (in $1000s): 35, 38, 40, 42, 45. The owner ($250k) is added, making 6 values. Then the owner is identified as an outlier and removed.
(a) Mean and median with the owner. (b) Mean and median after removing the owner. (c) Comment.
Same data with and without the outlier. We’ll see how each measure responds.
Part (a): with owner
Sorted: 35, 38, 40, 42, 45, 250
Median = avg of 3rd and 4th: (40+42)/2 = 41
Mean = total รท 6: (35+38+40+42+45+250)/6 = 450/6 = 75
Median = $41k, Mean = $75k
Part (b): without owner
Sorted: 35, 38, 40, 42, 45
Median = 3rd value: 40
Mean: 200/5 = 40
Median = $40k, Mean = $40k
Part (c)
Median: 41 → 40 (changed by only $1k).
Mean: 75 → 40 (dropped by $35k!).
Mean is heavily affected; median barely is
if data has outliers, the median gives a fairer picture of “typical”
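The sketch below reproduces parts (a) and (b) and also checks the claim that the owner’s salary really is an outlier. Taking Q1 and Q3 as the medians of the lower and upper halves is an assumption about how a GDC computes them; with these six values it gives Q1 = 38 and Q3 = 45.

```python
from statistics import mean, median

staff = [35, 38, 40, 42, 45]   # salaries in $1000s
with_owner = staff + [250]

# (a) and (b): how the mean and median respond
print("with owner:   ", mean(with_owner), median(with_owner))   # mean 75, median 41
print("without owner:", mean(staff), median(staff))             # mean 40, median 40

# Outlier check on the 6 values, using the median of each half for Q1 and Q3
s = sorted(with_owner)
q1, q3 = median(s[:3]), median(s[3:])   # 38 and 45
upper_fence = q3 + 1.5 * (q3 - q1)      # 45 + 10.5 = 55.5
print(250 > upper_fence)                # True
```

The owner’s $250k sits far beyond the $55.5k fence, which is why removing it moves the mean so dramatically.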
WE 4: Check both ends for outliers
A data set has Q1 = 50, Q3 = 80. Check whether 8, 100, and 130 are outliers.
Q1 = 50, Q3 = 80
First find both fences, then check each value.
IQR: 80 − 50 = 30
Lower fence: 50 − 1.5×30 = 50 − 45 = 5
Upper fence: 80 + 1.5×30 = 80 + 45 = 125
Safe zone: [5, 125]. Check each value:
8: 8 ≥ 5, so not an outlier
100: 100 ≤ 125, so not an outlier
130: 130 > 125, so an outlier
Only 130 is an outlier
always check both fences โ outliers can be too small OR too large
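Checking several values against both fences fits naturally into a loop. A minimal sketch, where `classify` is a hypothetical helper that reports which fence, if any, a value breaks:

```python
def classify(x, q1, q3):
    """Label x as a low outlier, high outlier, or not an outlier (1.5 × IQR rule)."""
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    if x < lower_fence:
        return "low outlier"
    if x > upper_fence:
        return "high outlier"
    return "not an outlier"


for value in (8, 100, 130):
    print(value, classify(value, q1=50, q3=80))
# 8 not an outlier
# 100 not an outlier
# 130 high outlier
```

Naming the broken fence makes it harder to forget that the lower one exists.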
WE 5: Decide whether to remove outliers in different contexts
A scientist measures the lengths of 30 fish in a lake. The data has two outliers identified: a length of 5 cm (very small) and 90 cm (very large). For each scenario, decide whether the outlier should be removed and justify.
(a) The 5 cm fish was measured incorrectly โ the actual length was 50 cm. (b) The 90 cm fish is a healthy adult of a less common species in the lake.
Decision depends on whether the value is an error or a real (but unusual) data point.
Part (a): 5 cm fish
This is a clear measurement error: the real value is 50 cm.
Either correct it or remove the wrong record.
Remove (or correct to 50 cm)
Part (b): 90 cm fish
A healthy adult fish is real data, even if rare.
Removing it would hide a true feature of the population.
Keep the 90 cm fish
“outlier” doesn’t mean “wrong”: sometimes outliers are the most interesting data
Top tips
- Always use the GDC for Q1 and Q3. Different by-hand methods give different quartiles, but the IB expects the calculator’s values.
- Always show your fence calculations. Even if 29 is “obviously” an outlier, you need to write Q1 − 1.5×IQR = … and compare.
- Both ends matter. Don’t only check the upper fence. Always compute Q1 − 1.5×IQR too.
- Justify removal decisions in context. “Remove the 29 because the data is from a children’s party” scores way more than just “remove”.
- Median and IQR are outlier-resistant. If your data has outliers and you want a “typical” measure, use them instead of the mean and SD.
- The lower fence can be negative, and that’s fine. For data that can’t be negative (ages, lengths), it just means no value can be a low-end outlier.
- If a value sits exactly on the fence (e.g. value = 12.75 with upper fence = 12.75), it’s not an outlier. The rule uses strict inequality.
- Mention units in your final answer when the data has them: outliers with units like cm, kg, or $ are easier to interpret.
Common mistakes
- Removing every outlier without thinking. Outliers can be real, important data. Always justify removal in context, never automatically.
- Forgetting to check the lower fence. An outlier can be too small as well as too large. Always check both ends.
- Using just “1 × IQR” instead of “1.5 × IQR”. The IB rule is specifically 1.5: that’s the multiplier you need.
- Confusing Q1 and Q3. Q1 is the lower quartile (about 25% of the data lies below it). Q3 is the upper quartile. Mix them up and your fences are flipped.
- Computing IQR with the wrong sign. IQR = Q3 − Q1, which is never negative. If you get a negative number, you’ve subtracted the wrong way.
- Saying “this value is too far from the mean”. The IB outlier rule is based on quartiles, not the mean. Use Q1 and Q3.
- Not justifying. The exam wants reasoning: why should this outlier be removed? “Because it’s a measurement error” is a valid reason; “because it’s big” is not.
- Removing an outlier that’s clearly a real, important value. A high salary, an unusual height, or a rare species: these are often the most informative parts of your data.
Outliers are especially important for the next note, box & whisker diagrams, where outliers get drawn as separate crosses outside the whiskers. Once you can spot them, the box plot rules become easy.