IB Maths AA HL Topic 4 — Statistics & Probability Paper 1 & 2 ~6 min read

Outliers

An outlier is a value that sits unusually far from the rest of the data. The IB rule defines “unusually far” as more than 1.5 × IQR beyond either quartile. Whether you should remove an outlier is a separate question — that depends on the context, not the formula.

📘 What you need to know

The 1.5 × IQR rule

Outlier boundaries
Lower: Q1 − 1.5 × IQR
Upper: Q3 + 1.5 × IQR
where IQR = Q3 − Q1

Any value strictly less than the lower bound or strictly greater than the upper bound is classified as an outlier. The bounds are computed once per data set and tested against every value.
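The rule can be sketched in a few lines of Python. The function names `outlier_bounds` and `is_outlier` are hypothetical helpers chosen for this illustration, and the quartiles 20 and 30 are just sample values. The point to notice is the strict inequalities: a value exactly on a boundary is not flagged.

```python
def outlier_bounds(q1, q3):
    """Return (lower, upper) outlier boundaries from the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True only if x lies strictly outside the boundaries."""
    lower, upper = outlier_bounds(q1, q3)
    return x < lower or x > upper


# Illustrative quartiles (hypothetical values, Q1 = 20 and Q3 = 30).
lower, upper = outlier_bounds(20, 30)
print(lower, upper)            # 5.0 45.0
print(is_outlier(45, 20, 30))  # False: exactly on the upper boundary
print(is_outlier(46, 20, 30))  # True: strictly above the upper boundary
```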

Should you remove an outlier?

Remove if…
it’s clearly an error: typo, mis-recorded value, equipment fault, miscoded category
Don’t remove if…
it’s a genuine extreme: a CEO’s salary, a record performance, a rare-but-real event
The default: keep the outlier and report summaries both with and without it (or use the median and IQR, which are robust to extreme values). Verify the value first if possible, and remove it only after confirming it is an error.

🧭 Recipe — find and classify outliers

  1. Sort the data in ascending order.
  2. Find the quartiles Q1 and Q3 (by hand or via 1-Var Stats).
  3. Compute the IQR: Q3 − Q1.
  4. Find the boundaries: lower = Q1 − 1.5 × IQR; upper = Q3 + 1.5 × IQR.
  5. List any values below the lower bound or above the upper bound.
  6. Discuss removal in context — only remove if there’s reason to believe it’s an error.
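Steps 1 to 5 of the recipe can be sketched in Python as a way to check hand working. This is a minimal illustration, not a replacement for the calculator method: `quartiles` and `find_outliers` are hypothetical names, and the quartiles are taken as the medians of the lower and upper halves (leaving out the middle value when n is odd), which is the convention used in the worked examples in these notes. Other software may use a different quartile convention and give slightly different values. Step 6, the decision about removal, still needs human judgement.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values the overall median is left
    out of both halves. Needs at least two values.
    """
    values = sorted(data)                 # step 1: sort ascending
    half = len(values) // 2
    lower_half = values[:half]
    upper_half = values[-half:]
    return median(lower_half), median(upper_half)   # step 2: Q1 and Q3


def find_outliers(data):
    """Return the values strictly outside the 1.5 x IQR boundaries."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1                         # step 3: IQR
    lower = q1 - 1.5 * iqr                # step 4: boundaries
    upper = q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < lower or x > upper]  # step 5


# Data sets from worked examples 1 and 2.
print(find_outliers([14, 18, 20, 22, 24, 26, 28, 30, 32, 60]))  # [60]
print(find_outliers([2, 18, 20, 22, 24, 26, 28, 30, 32, 70]))   # [2, 70]
```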

Worked examples

WE 1

Identify the outlier in a data set

Identify any outliers in the following data set:   14, 18, 20, 22, 24, 26, 28, 30, 32, 60.

Step 1: Already sorted, n = 10.
Step 2: Quartiles (split into halves of 5).
  Lower half: 14, 18, 20, 22, 24 → Q₁ = 20
  Upper half: 26, 28, 30, 32, 60 → Q₃ = 30
  IQR = 30 − 20 = 10
Step 3: Boundaries.
  Lower = 20 − 1.5(10) = 20 − 15 = 5
  Upper = 30 + 1.5(10) = 30 + 15 = 45
Step 4: Compare each value to the bounds.
  All values from 14 to 32 lie inside [5, 45]; 60 is above 45.
Outlier: 60
Note: only one value lies outside [5, 45], so 60 is the sole outlier.
WE 2

Data set with both a low and a high outlier

Identify any outliers in:   2, 18, 20, 22, 24, 26, 28, 30, 32, 70.

Step 1: Already sorted, n = 10.
Step 2: Quartiles.
  Lower half: 2, 18, 20, 22, 24 → Q₁ = 20
  Upper half: 26, 28, 30, 32, 70 → Q₃ = 30
  IQR = 30 − 20 = 10
Step 3: Boundaries.
  Lower = 20 − 15 = 5
  Upper = 30 + 15 = 45
Step 4: Check each value.
  2 < 5 → outlier (low)
  70 > 45 → outlier (high)
Outliers: 2 and 70
Note: a single data set can have outliers at both ends.
WE 3

Find boundaries given quartiles, then test specific values

For a data set, Q1 = 24 and Q3 = 36. (a) Find the lower and upper outlier boundaries. (b) Determine whether each of the values 3, 5, 50, and 60 is an outlier.

(a) Boundaries.
  IQR = 36 − 24 = 12
  Lower = 24 − 1.5(12) = 24 − 18 = 6
  Upper = 36 + 1.5(12) = 36 + 18 = 54
(b) Test each value against [6, 54].
  3 < 6 → outlier
  5 < 6 → outlier
  50 is in [6, 54] → not an outlier
  60 > 54 → outlier
Boundaries: 6 and 54; outliers: 3, 5, and 60
Note: the test is "strictly less" or "strictly greater", so values exactly on a boundary are not outliers.
WE 4

Effect of removing the outlier on mean and median

The data set is:   4, 9, 11, 13, 15, 17, 19, 22, 80. (a) Find the mean and median. (b) Identify any outlier and recompute the mean and median without it. (c) Comment on which measure is more affected.

(a) Original data (n = 9).
  Mean = (4 + 9 + 11 + 13 + 15 + 17 + 19 + 22 + 80)/9 = 190/9 ≈ 21.1
  Median (5th value) = 15
(b) Find the outlier.
  Lower half: 4, 9, 11, 13 → Q₁ = (9 + 11)/2 = 10
  Upper half: 17, 19, 22, 80 → Q₃ = (19 + 22)/2 = 20.5
  IQR = 20.5 − 10 = 10.5, so 1.5 × IQR = 15.75
  Lower bound = 10 − 15.75 = −5.75; no value is below this
  Upper bound = 20.5 + 15.75 = 36.25
  80 > 36.25 → outlier
  After removing 80 (n = 8):
  New mean = 110/8 = 13.75
  New median = (13 + 15)/2 = 14
(c) Compare the changes.
  Mean: 21.1 → 13.75 (changed by ≈ 7.4)
  Median: 15 → 14 (changed by 1)
The mean is much more affected; the median is resistant to outliers.
Note: this is the classic illustration of why the median is preferred when outliers are present.
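The comparison in this example can be checked with Python's standard `statistics` module. This is a small sketch for verifying the arithmetic; the variable names are chosen for the illustration, and the outlier 80 is removed by hand because it was already identified in part (b).

```python
from statistics import mean, median

data = [4, 9, 11, 13, 15, 17, 19, 22, 80]
trimmed = [x for x in data if x != 80]   # 80 was flagged as the outlier

print(round(mean(data), 1), median(data))   # 21.1 15
print(mean(trimmed), median(trimmed))       # 13.75 14.0

# How far each measure moves when the outlier is removed.
mean_shift = mean(data) - mean(trimmed)
median_shift = median(data) - median(trimmed)
print(round(mean_shift, 1), median_shift)   # 7.4 1.0
```

The mean moves by about 7.4 while the median moves by only 1, matching the conclusion in part (c).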
WE 5

Salary scenario — should the outlier be removed?

The annual salaries (in $1000s) of 10 employees at a small company are:   35, 38, 40, 42, 44, 45, 47, 48, 50, 180. (a) Identify any outlier. (b) Discuss whether the outlier should be removed.

(a) Quartiles (n = 10).
  Lower half: 35, 38, 40, 42, 44 → Q₁ = 40
  Upper half: 45, 47, 48, 50, 180 → Q₃ = 48
  IQR = 48 − 40 = 8, so 1.5 × IQR = 12
  Lower bound = 40 − 12 = 28; no salary is below this
  Upper bound = 48 + 12 = 60
  180 > 60 → outlier
(b) Discussion.
  $180k is plausibly the salary of a CEO or director, so it is a valid (if extreme) data point rather than an error. It should NOT be removed.
Outlier: $180k; should NOT be removed (genuine high earner)
Note: when reporting a "typical" salary, use the median ($44.5k), since the mean is pulled up by the outlier.
WE 6

Hours revised — identify and decide

A teacher records the hours that 10 students spent revising for an exam:   1, 4, 6, 7, 8, 8, 9, 10, 12, 35. (a) Find Q1, Q3, and the IQR. (b) Determine whether 35 is an outlier. (c) Suggest whether the value should be removed and justify.

(a) Quartiles (n = 10).
  Lower half: 1, 4, 6, 7, 8 → Q₁ = 6
  Upper half: 8, 9, 10, 12, 35 → Q₃ = 10
  IQR = 10 − 6 = 4
(b) Boundaries.
  Lower = 6 − 6 = 0
  Upper = 10 + 6 = 16
  35 > 16 → outlier ✓
(c) Discussion.
  35 hours is unusually high but plausible for a very dedicated student. Verify with the student before removing; if confirmed valid, keep it.
35 is an outlier; verify its accuracy first and remove it only if it is a recording error.
Note: don't reflexively remove outliers, since they often carry the most interesting information.

Next: Box & Whisker Diagrams. The five-number summary (min, Q1, median, Q3, max) becomes a visual: a box for the middle 50%, whiskers reaching out to the extremes, and crosses for outliers. Two box plots side-by-side reveal differences in centre and spread at a glance.
