IB Maths AA HL
Topic 4 — Statistics & Probability
Paper 1 & 2
Outliers
An outlier is a value that sits unusually far from the rest of the data. The IB rule defines “unusually far” as more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3. Whether you should remove an outlier is a separate question: that depends on the context, not the formula.
📘 What you need to know
- Definition: x is an outlier if x < Q1 − 1.5 × IQR or x > Q3 + 1.5 × IQR.
- Outlier boundaries: lower bound = Q1 − 1.5 × IQR; upper bound = Q3 + 1.5 × IQR.
- Mean is sensitive to outliers; median is resistant.
- Standard deviation is sensitive; IQR is resistant.
- Decide whether to remove based on context — not the formula alone.
- Remove if it’s an error: typo, instrument failure, miscoded entry.
- Don’t remove if genuine: a CEO’s salary, a record-breaking athlete’s time, an unusual but valid observation.
The 1.5 × IQR rule
Outlier boundaries
Lower: Q1 − 1.5 × IQR | Upper: Q3 + 1.5 × IQR
Any value strictly less than the lower bound or strictly greater than the upper bound is classified as an outlier. The bounds are computed once per data set and tested against every value.
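The rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official IB implementation: the function names `outlier_bounds` and `is_outlier` are made up for this example, and the quartiles are assumed to be known already.

```python
def outlier_bounds(q1, q3):
    """Return (lower, upper) boundaries from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True only if x lies strictly outside the boundaries."""
    lower, upper = outlier_bounds(q1, q3)
    return x < lower or x > upper


# Illustrative quartiles chosen for this sketch: Q1 = 20, Q3 = 30
print(outlier_bounds(20, 30))   # (5.0, 45.0)
print(is_outlier(45, 20, 30))   # False: exactly on the upper boundary
print(is_outlier(46, 20, 30))   # True
```

The strict `<` and `>` comparisons are what keep a value sitting exactly on a boundary from being flagged.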
Should you remove an outlier?
Remove if…
it’s clearly an error: typo, mis-recorded value, equipment fault, miscoded category
Don’t remove if…
it’s a genuine extreme: a CEO’s salary, a record performance, a rare-but-real event
The default: keep the outlier and report both with-and-without summaries (or use median + IQR, which are robust). Verify the value first if possible — only remove after confirming it’s an error.
🧭 Recipe — find and classify outliers
- Sort the data in ascending order.
- Find the quartiles Q1 and Q3 (by hand or via 1-Var Stats).
- Compute the IQR: Q3 − Q1.
- Find the boundaries: lower = Q1 − 1.5 × IQR; upper = Q3 + 1.5 × IQR.
- List any values below the lower bound or above the upper bound.
- Discuss removal in context — only remove if there’s reason to believe it’s an error.
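The first five steps of the recipe can be sketched in Python as a way to check hand calculations. The helper names `quartiles` and `find_outliers` are hypothetical, and the quartile convention is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves, with the middle value left out of both halves when n is odd. Other software may use a different quartile convention and give slightly different bounds. The final step, deciding on removal, is a judgement about context and is deliberately not coded.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: when the number of values is odd, the middle
    value is excluded from both halves.
    """
    values = sorted(data)                 # step 1: sort
    if len(values) < 2:
        raise ValueError("need at least two values")
    half = len(values) // 2
    return median(values[:half]), median(values[-half:])   # step 2


def find_outliers(data):
    """Return the values strictly outside the 1.5 x IQR boundaries."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1                         # step 3
    lower = q1 - 1.5 * iqr                # step 4
    upper = q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < lower or x > upper]   # step 5


print(find_outliers([3, 12, 14, 15, 16, 17, 18, 19, 20, 40]))
```

The data in the last line is made up for this sketch; any list of numbers can be passed in the same way.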
Worked examples
WE 1: Identify the outlier in a data set
Identify any outliers in the following data set: 14, 18, 20, 22, 24, 26, 28, 30, 32, 60.
Step 1: Already sorted, n = 10
Step 2: Quartiles (split into halves of 5)
Lower half: 14, 18, 20, 22, 24 → Q₁ = 20
Upper half: 26, 28, 30, 32, 60 → Q₃ = 30
IQR = 30 − 20 = 10
Step 3: Boundaries
Lower = 20 − 1.5(10) = 20 − 15 = 5
Upper = 30 + 1.5(10) = 30 + 15 = 45
Step 4: Compare each value to the bounds
All values between 14 and 32 are inside [5, 45]; 60 is above 45
Outlier: 60
only one value lies outside [5, 45], so 60 is the sole outlier
WE 2: Data set with both a low and a high outlier
Identify any outliers in: 2, 18, 20, 22, 24, 26, 28, 30, 32, 70.
Step 1: n = 10
Step 2: Quartiles
Lower half: 2, 18, 20, 22, 24 → Q₁ = 20
Upper half: 26, 28, 30, 32, 70 → Q₃ = 30
IQR = 10
Step 3: Boundaries
Lower = 20 − 15 = 5; Upper = 30 + 15 = 45
Step 4: Check each value
2 < 5 → outlier (low)
70 > 45 → outlier (high)
Outliers: 2 and 70
a single data set can have outliers at both ends
WE 3: Find boundaries given quartiles, then test specific values
For a data set, Q1 = 24 and Q3 = 36. (a) Find the lower and upper outlier boundaries. (b) Determine whether each of the values 3, 5, 50, and 60 is an outlier.
(a) IQR = 36 − 24 = 12
Lower = 24 − 1.5(12) = 24 − 18 = 6
Upper = 36 + 1.5(12) = 36 + 18 = 54
(b) Test each value against [6, 54]
3 < 6 → outlier
5 < 6 → outlier
50 is in [6, 54] → not outlier
60 > 54 → outlier
Boundaries: [6, 54]; outliers: 3, 5, and 60
“strictly less” or “strictly greater” — values exactly on the boundary are not outliers
WE 4: Effect of removing the outlier on mean and median
The data set is: 4, 9, 11, 13, 15, 17, 19, 22, 80. (a) Find the mean and median. (b) Identify any outlier and recompute the mean and median without it. (c) Comment on which measure is more affected.
(a) Original (n = 9)
Mean = (4+9+11+13+15+17+19+22+80)/9 = 190/9 ≈ 21.1
Median (5th value) = 15
(b) Find the outlier (n is odd, so the median 15 is left out of both halves)
Lower half: 4, 9, 11, 13 → Q₁ = (9+11)/2 = 10
Upper half: 17, 19, 22, 80 → Q₃ = (19+22)/2 = 20.5
IQR = 20.5 − 10 = 10.5; 1.5 × IQR = 15.75
Lower bound = 10 − 15.75 = −5.75; Upper bound = 20.5 + 15.75 = 36.25
No value is below −5.75; 80 > 36.25 → outlier
After removing 80 (n = 8)
New mean = 110/8 = 13.75
New median = (13+15)/2 = 14
(c) Compare changes
Mean: 21.1 → 13.75 (changed by ≈ 7.4)
Median: 15 → 14 (changed by 1)
Mean is much more affected; median is resistant to outliers
classic illustration: prefer median when outliers are present
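The with-and-without comparison in WE 4 can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; the helper name `summary` is made up for this example, and the outlier 80 is removed by hand rather than detected automatically.

```python
from statistics import mean, median

data = [4, 9, 11, 13, 15, 17, 19, 22, 80]
trimmed = [x for x in data if x != 80]   # 80 is above the upper bound of 36.25


def summary(values):
    """Mean and median of a list of numbers."""
    return {"mean": mean(values), "median": median(values)}


with_outlier = summary(data)
without_outlier = summary(trimmed)

for name in ("mean", "median"):
    before = with_outlier[name]
    after = without_outlier[name]
    print(f"{name}: {before:.2f} -> {after:.2f} (change {before - after:.2f})")
```

Reporting both rows side by side is one way to follow the "keep it, but show both summaries" default.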
WE 5: Salary scenario: should the outlier be removed?
The annual salaries (in $1000s) of 10 employees at a small company are: 35, 38, 40, 42, 44, 45, 47, 48, 50, 180. (a) Identify any outlier. (b) Discuss whether the outlier should be removed.
(a) Quartiles (n = 10)
Lower half: 35, 38, 40, 42, 44 → Q₁ = 40
Upper half: 45, 47, 48, 50, 180 → Q₃ = 48
IQR = 48 − 40 = 8; 1.5 × IQR = 12
Lower bound = 40 − 12 = 28; Upper bound = 48 + 12 = 60
No value is below 28; 180 > 60 → outlier
(b) Discussion
$180k is plausibly the salary of a CEO or director
→ a valid (if extreme) data point, not an error
→ should NOT be removed
Outlier: $180k; should NOT be removed (genuine high earner)
when reporting a “typical” salary, use the median ($44.5k) since the mean is pulled up by the outlier
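To see how far the mean is pulled up in WE 5, the short Python sketch below compares the mean and the median of the salary data and counts how many employees earn less than the mean. The variable names are illustrative only.

```python
from statistics import mean, median

salaries = [35, 38, 40, 42, 44, 45, 47, 48, 50, 180]   # in $1000s

mean_salary = mean(salaries)
median_salary = median(salaries)

# How many employees earn less than the mean?
below_mean = sum(1 for s in salaries if s < mean_salary)

print(f"mean = {mean_salary:.1f}, median = {median_salary:.1f}")
print(f"{below_mean} of {len(salaries)} employees earn less than the mean")
```

With the outlier kept in the data, nine of the ten employees fall below the mean, which is why the median is the better description of a typical salary here.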
WE 6: Hours revised: identify and decide
A teacher records the hours that 10 students spent revising for an exam: 1, 4, 6, 7, 8, 8, 9, 10, 12, 35. (a) Find Q1, Q3, and the IQR. (b) Determine whether 35 is an outlier. (c) Suggest whether the value should be removed and justify.
(a) Quartiles (n = 10)
Lower half: 1, 4, 6, 7, 8 → Q₁ = 6
Upper half: 8, 9, 10, 12, 35 → Q₃ = 10
IQR = 4
(b) Boundaries
Lower = 6 − 6 = 0; Upper = 10 + 6 = 16
35 > 16 → outlier ✓
(c) Discussion
35 hours is unusually high but plausible for a very dedicated student
→ verify with the student before removing; if confirmed valid, keep it
35 is an outlier; verify accuracy first — remove only if it’s a recording error
don’t reflexively remove outliers — they often carry the most interesting information
💡 Top tips
- Memorise the rule: an outlier is more than 1.5 × IQR beyond the nearest quartile (below Q1 or above Q3).
- Always state both bounds, then check each suspect value against them.
- Use your GDC’s box plot for an instant visual check: it marks outliers with an asterisk or cross.
- Mean & SD are sensitive; median & IQR are robust. Pick robust measures for outlier-prone data.
- Justify removal in context — “outlier formula says so” is not enough; you need a reason like “data entry error” or “recording fault”.
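The sensitivity claim in the tips can be checked numerically. The sketch below uses the WE 1 data with and without the value 60 and compares the standard deviation with the IQR. Two assumptions are made: `pstdev` (the population standard deviation) stands in for the SD, and the hypothetical helper `iqr` uses the median-of-halves quartile convention from the worked examples, leaving the middle value out when n is odd.

```python
from statistics import median, pstdev


def iqr(data):
    """IQR with Q1 and Q3 taken as the medians of the two halves."""
    values = sorted(data)
    half = len(values) // 2
    return median(values[-half:]) - median(values[:half])


data = [14, 18, 20, 22, 24, 26, 28, 30, 32, 60]
trimmed = data[:-1]                      # same data without the outlier 60

sd_with, sd_without = pstdev(data), pstdev(trimmed)
iqr_with, iqr_without = iqr(data), iqr(trimmed)

print(f"SD:  {sd_with:.2f} with 60, {sd_without:.2f} without")
print(f"IQR: {iqr_with} with 60, {iqr_without} without")
```

For this data set the standard deviation more than halves when 60 is dropped, while the IQR stays at 10.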
⚠ Common mistakes
- Scaling a quartile (1.5 × Q1 or 1.5 × Q3) instead of the IQR: it’s the IQR that gets multiplied by 1.5, not the quartile itself.
- Adding to or subtracting from the wrong quartile — lower bound uses Q1, upper uses Q3.
- Calling values exactly on the boundary outliers — the rule uses strict inequalities.
- Removing outliers automatically without checking whether they’re genuine.
- Quoting the IQR instead of the bounds as the threshold: the thresholds are Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, not the IQR itself.
Next: Box & Whisker Diagrams. The five-number summary (min, Q1, median, Q3, max) becomes a visual: a box for the middle 50%, whiskers reaching out to the extremes, and crosses for outliers. Two box plots side-by-side reveal differences in centre and spread at a glance.