IB Maths AA HL | Topic 4: Statistics & Probability | Paper 1 & 2

Outliers

An outlier is a value that sits unusually far from the rest of the data. The IB rule defines “unusually far” as more than 1.5 × IQR beyond either quartile. Whether you should remove an outlier is a separate question — that depends on the context, not the formula.

📘 What you need to know

Definition: x is an outlier if x < Q₁ − 1.5 × IQR or x > Q₃ + 1.5 × IQR.
Outlier boundaries: lower bound = Q₁ − 1.5 × IQR; upper bound = Q₃ + 1.5 × IQR.
Mean is sensitive to outliers; median is resistant.
Standard deviation is sensitive; IQR is resistant.
Decide whether to remove based on context — not the formula alone.
Remove if it’s an error: typo, instrument failure, miscoded entry.
Don’t remove if genuine: a CEO’s salary, a record-breaking athlete’s time, an unusual but valid observation.

The 1.5 × IQR rule

Outlier boundaries
Lower: Q₁ − 1.5 × IQR
Upper: Q₃ + 1.5 × IQR

Any value strictly less than the lower bound or strictly greater than the upper bound is classified as an outlier. The bounds are computed once per data set, and every value is then compared against them.
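If it helps to see the rule as a procedure, here is a minimal Python sketch. The function names `outlier_bounds` and `is_outlier` are made up for this illustration, and the quartiles Q₁ = 20 and Q₃ = 30 are simply assumed as inputs.

```python
def outlier_bounds(q1, q3):
    """Return (lower, upper) outlier boundaries from the two quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True only if x lies strictly outside the boundaries."""
    lower, upper = outlier_bounds(q1, q3)
    return x < lower or x > upper


# Illustrative quartiles (assumed for this sketch): Q1 = 20, Q3 = 30
print(outlier_bounds(20, 30))   # (5.0, 45.0)
print(is_outlier(45, 20, 30))   # False: exactly on the boundary
print(is_outlier(60, 20, 30))   # True
```

The comparisons use `<` and `>` rather than `<=` and `>=`, which is why a value sitting exactly on a boundary is not flagged.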

Should you remove an outlier?

Remove if it’s clearly an error: typo, mis-recorded value, equipment fault, miscoded category.

Don’t remove if it’s a genuine extreme: a CEO’s salary, a record performance, a rare-but-real event.

The default: keep the outlier and report summaries both with and without it (or use the median and IQR, which are robust). Verify the value first if possible, and remove it only after confirming it’s an error.

🧭 Recipe — find and classify outliers

Sort the data in ascending order.
Find the quartiles Q₁ and Q₃ (by hand or via 1-Var Stats).
Compute the IQR: Q₃ − Q₁.
Find the boundaries: lower = Q₁ − 1.5 × IQR; upper = Q₃ + 1.5 × IQR.
List any values below the lower bound or above the upper bound.
Discuss removal in context — only remove if there’s reason to believe it’s an error.
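Steps 1 to 5 of the recipe can be sketched in Python as below. This is one possible implementation, not the GDC’s own algorithm. The helper names are hypothetical, and it assumes quartiles are found as the medians of the lower and upper halves, with the middle value left out of both halves when n is odd, as in the worked examples on this page. Step 6 is a judgement about context, so it stays outside the code.

```python
from statistics import median


def quartiles_by_halves(data):
    """Q1 and Q3 as the medians of the lower and upper halves.

    Assumption: for an odd number of values, the middle value is
    excluded from both halves.
    """
    values = sorted(data)                    # step 1: sort
    n = len(values)
    lower_half = values[: n // 2]
    upper_half = values[(n + 1) // 2 :]
    return median(lower_half), median(upper_half)   # step 2: quartiles


def find_outliers(data):
    """Return (lower bound, upper bound, list of outliers)."""
    q1, q3 = quartiles_by_halves(data)
    iqr = q3 - q1                            # step 3: IQR
    lower = q1 - 1.5 * iqr                   # step 4: boundaries
    upper = q3 + 1.5 * iqr
    flagged = [x for x in sorted(data) if x < lower or x > upper]  # step 5
    return lower, upper, flagged


# Data from WE 1
print(find_outliers([14, 18, 20, 22, 24, 26, 28, 30, 32, 60]))
# (5.0, 45.0, [60])
```

Other quartile conventions exist (software packages often interpolate), so a different tool may give slightly different bounds for the same data.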

Worked examples

WE 1

Identify the outlier in a data set

Identify any outliers in the following data set: 14, 18, 20, 22, 24, 26, 28, 30, 32, 60.

Step 1: The data are already sorted; n = 10.
Step 2: Quartiles (split into two halves of 5).
Lower half: 14, 18, 20, 22, 24 → Q₁ = 20
Upper half: 26, 28, 30, 32, 60 → Q₃ = 30
IQR = 30 − 20 = 10
Step 3: Boundaries.
Lower = 20 − 1.5(10) = 20 − 15 = 5
Upper = 30 + 1.5(10) = 30 + 15 = 45
Step 4: Compare each value to the bounds. All values from 14 to 32 lie inside [5, 45]; 60 is above 45.

Outlier: 60
Note: only one value lies outside [5, 45], so 60 is the sole outlier.

WE 2

Data set with both a low and a high outlier

Identify any outliers in: 2, 18, 20, 22, 24, 26, 28, 30, 32, 70.

Step 1: The data are already sorted; n = 10.
Step 2: Quartiles.
Lower half: 2, 18, 20, 22, 24 → Q₁ = 20
Upper half: 26, 28, 30, 32, 70 → Q₃ = 30
IQR = 30 − 20 = 10
Step 3: Boundaries.
Lower = 20 − 15 = 5
Upper = 30 + 15 = 45
Step 4: Check each value.
2 < 5 → outlier (low)
70 > 45 → outlier (high)

Outliers: 2 and 70
Note: a single data set can have outliers at both ends.

WE 3

Find boundaries given quartiles, then test specific values

For a data set, Q₁ = 24 and Q₃ = 36. (a) Find the lower and upper outlier boundaries. (b) Determine whether each of the values 3, 5, 50, and 60 is an outlier.

(a) IQR = 36 − 24 = 12
Lower = 24 − 1.5(12) = 24 − 18 = 6
Upper = 36 + 1.5(12) = 36 + 18 = 54
(b) Test each value against [6, 54].
3 < 6 → outlier
5 < 6 → outlier
50 is in [6, 54] → not an outlier
60 > 54 → outlier

Boundaries: lower 6, upper 54; outliers: 3, 5, and 60
Note: the rule says “strictly less” or “strictly greater”, so values exactly on a boundary are not outliers.

WE 4

Effect of removing the outlier on mean and median

The data set is: 4, 9, 11, 13, 15, 17, 19, 22, 80. (a) Find the mean and median. (b) Identify any outlier and recompute the mean and median without it. (c) Comment on which measure is more affected.

(a) Original data (n = 9)
Mean = (4 + 9 + 11 + 13 + 15 + 17 + 19 + 22 + 80)/9 = 190/9 ≈ 21.1
Median (5th value) = 15
(b) Find the outlier. With n odd, the median (15) is left out of both halves.
Lower half: 4, 9, 11, 13 → Q₁ = (9 + 11)/2 = 10
Upper half: 17, 19, 22, 80 → Q₃ = (19 + 22)/2 = 20.5
IQR = 20.5 − 10 = 10.5, so 1.5 × IQR = 15.75
Lower bound = 10 − 15.75 = −5.75; no value is below it.
Upper bound = 20.5 + 15.75 = 36.25; 80 > 36.25 → outlier
After removing 80 (n = 8):
New mean = 110/8 = 13.75
New median = (13 + 15)/2 = 14
(c) Compare the changes.
Mean: 21.1 → 13.75 (changed by ≈ 7.4)
Median: 15 → 14 (changed by 1)

The mean is much more affected; the median is resistant to outliers.
Note: this is the classic illustration of why the median is preferred when outliers are present.
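The arithmetic in WE 4 can be checked with a few lines of Python using the standard `statistics` module. This is only a quick verification sketch; the variable names are arbitrary.

```python
from statistics import mean, median

data = [4, 9, 11, 13, 15, 17, 19, 22, 80]
trimmed = [x for x in data if x != 80]   # drop the flagged value

print(round(mean(data), 1), median(data))   # 21.1 15
print(mean(trimmed), median(trimmed))       # 13.75 14.0
```

Removing one value moves the mean by about 7.4 but the median by only 1, which matches part (c).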

WE 5

Salary scenario — should the outlier be removed?

The annual salaries (in $1000s) of 10 employees at a small company are: 35, 38, 40, 42, 44, 45, 47, 48, 50, 180. (a) Identify any outlier. (b) Discuss whether the outlier should be removed.

(a) Quartiles (n = 10)
Lower half: 35, 38, 40, 42, 44 → Q₁ = 40
Upper half: 45, 47, 48, 50, 180 → Q₃ = 48
IQR = 48 − 40 = 8, so 1.5 × IQR = 12
Lower bound = 40 − 12 = 28; no value is below it.
Upper bound = 48 + 12 = 60; 180 > 60 → outlier
(b) Discussion
$180k is plausibly the salary of a CEO or director → a valid (if extreme) data point, not an error → should NOT be removed.

Outlier: $180k; should NOT be removed (genuine high earner)
Note: when reporting a “typical” salary, use the median ($44.5k), since the mean is pulled up by the outlier.

WE 6

Hours revised — identify and decide

A teacher records the hours that 10 students spent revising for an exam: 1, 4, 6, 7, 8, 8, 9, 10, 12, 35. (a) Find Q₁, Q₃, and the IQR. (b) Determine whether 35 is an outlier. (c) Suggest whether the value should be removed and justify.

(a) Quartiles (n = 10)
Lower half: 1, 4, 6, 7, 8 → Q₁ = 6
Upper half: 8, 9, 10, 12, 35 → Q₃ = 10
IQR = 10 − 6 = 4
(b) Boundaries
Lower = 6 − 6 = 0
Upper = 10 + 6 = 16
35 > 16 → outlier ✓
(c) Discussion
35 hours is unusually high but plausible for a very dedicated student → verify with the student before removing; if confirmed valid, keep it.

35 is an outlier; verify accuracy first and remove it only if it’s a recording error.
Note: don’t reflexively remove outliers; they often carry the most interesting information.

💡 Top tips

Memorise the rule: 1.5 × IQR from the nearest quartile.
Always state both bounds, then check each suspect value against them.
Use your GDC’s box plot: it marks outliers with an asterisk or cross, giving an instant visual check.
Mean & SD are sensitive; median & IQR are robust. Pick robust measures for outlier-prone data.
Justify removal in context — “outlier formula says so” is not enough; you need a reason like “data entry error” or “recording fault”.
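To see the “sensitive vs robust” tip in numbers, the sketch below compares the standard deviation and the IQR of the WE 6 data with and without the value 35. It assumes the population standard deviation (`statistics.pstdev`, taken here to correspond to σx on a GDC) and the median-of-halves quartile method used on this page. `iqr_by_halves` is a hypothetical helper written for this example.

```python
from statistics import median, pstdev


def iqr_by_halves(data):
    """IQR using medians of the lower and upper halves
    (middle value excluded from both halves when n is odd)."""
    values = sorted(data)
    n = len(values)
    return median(values[(n + 1) // 2 :]) - median(values[: n // 2])


hours = [1, 4, 6, 7, 8, 8, 9, 10, 12, 35]    # data from WE 6
without = [x for x in hours if x != 35]      # same data, 35 removed

print(round(pstdev(hours), 2), iqr_by_halves(hours))       # 8.83 4
print(round(pstdev(without), 2), iqr_by_halves(without))   # 3.08 4.5
```

The standard deviation falls from about 8.83 to about 3.08 when the single extreme value is dropped, while the IQR barely moves (4 to 4.5).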

⚠ Common mistakes

Scaling a quartile instead of the IQR: the 1.5 multiplies the IQR, not Q₁ or Q₃.
Adding to or subtracting from the wrong quartile — lower bound uses Q₁, upper uses Q₃.
Calling values exactly on the boundary outliers — the rule uses strict inequalities.
Removing outliers automatically without checking whether they’re genuine.
Quoting the IQR instead of the bounds as the threshold: the thresholds are Q₁ − 1.5 × IQR and Q₃ + 1.5 × IQR, not the IQR itself.

Next: Box & Whisker Diagrams. The five-number summary (min, Q₁, median, Q₃, max) becomes a visual: a box for the middle 50%, whiskers reaching out to the extremes, and crosses for outliers. Two box plots side-by-side reveal differences in centre and spread at a glance.
