IB Maths AA SL · Topic 4: Statistics Toolkit · Paper 1 & 2 · ~8 min read

Outliers

Sometimes a data set has a value that just doesn’t fit: a 29-year-old at a kids’ birthday party, or a $250,000 salary in a list of regular jobs. These are outliers. This note shows you how to spot them with a precise rule, and when (and when not) to remove them.

📘 What you need to know

What is an outlier?

An outlier is a data value that’s far bigger or smaller than the rest of the data. Think of it as the odd one out: the kid in the photo who doesn’t quite belong.

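Some real-world examples: a 29-year-old in a list of ages at a children’s birthday party, an owner’s $250,000 salary in a company wage list, or a height recorded as 1700 cm.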

The problem? Outliers can wreck statistics like the mean, the range, and the standard deviation. So you need a precise way to detect them, and a careful approach to deciding what to do about them.

“Outlier” doesn’t mean “bad data”. It just means “unusual”. Sometimes outliers are real and important, like rare medical conditions or top-performing students. Removing them blindly is a serious statistical sin.

The 1.5 × IQR rule: how to detect them

The IB uses a precise mathematical rule for spotting outliers, based on the quartiles. The idea: anything sitting too far from the middle 50% of the data is suspicious.

Outlier rule
x is an outlier if   x < Q1 − 1.5 × IQR   or   x > Q3 + 1.5 × IQR
[Diagram: a number line with the lower fence at Q1 − 1.5 × IQR and the upper fence at Q3 + 1.5 × IQR. Values between the fences are in the safe zone (not outliers); anything outside the fences is an outlier.]

The 4-step method

  1. Find Q1, Q3, and IQR from your GDC.
  2. Calculate the lower fence: Q1 − 1.5 × IQR.
  3. Calculate the upper fence: Q3 + 1.5 × IQR.
  4. Any data value outside these fences is an outlier.
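
The four steps above can be sketched in Python. This is a minimal sketch, not your GDC's exact algorithm: the helper names `quartiles`, `fences`, and `find_outliers` are made up for illustration, and the quartiles use the median-of-halves convention (the middle value is left out of both halves when the number of values is odd), which is assumed to match what a GDC reports. Other software may use a different quartile convention and give slightly different fences.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention (an assumption)."""
    s = sorted(values)
    half = len(s) // 2  # size of each half; the middle value is skipped when n is odd
    if half == 0:
        raise ValueError("need at least two values")
    return median(s[:half]), median(s[-half:])


def fences(values):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return every value strictly outside the fences, in ascending order."""
    lower, upper = fences(values)
    return sorted(x for x in values if x < lower or x > upper)


# Ages at the birthday party used in this note
ages = [2, 7, 5, 4, 8, 4, 6, 5, 5, 29, 2, 5, 13]
print(quartiles(ages))      # (4.0, 7.5)
print(fences(ages))         # (-1.25, 12.75)
print(find_outliers(ages))  # [13, 29]
```

Note that the comparison is strict: a value sitting exactly on a fence is not flagged.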

🤔 Why 1.5? Why not just “really far”?

The 1.5 × IQR rule comes from John Tukey, the statistician who invented the box plot. For typical (roughly normal) data, only about 0.7% of values, fewer than 1 in 100, land outside these fences by chance. So if a value is outside, it’s worth flagging.

It’s a useful rule of thumb, not a law of nature. It just gives a consistent, reproducible way to identify “extreme” values.
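
For the curious, the share of normally distributed values that fall beyond the fences can be checked directly. The sketch below assumes a perfect standard normal distribution (mean 0, standard deviation 1), which real data only approximates, and uses Python's built-in `statistics.NormalDist`.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1 (an idealised assumption)

q1 = z.inv_cdf(0.25)  # lower quartile of the distribution
q3 = z.inv_cdf(0.75)  # upper quartile of the distribution
iqr = q3 - q1

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Probability of landing below the lower fence or above the upper fence
outside = z.cdf(lower) + (1 - z.cdf(upper))

print(round(upper, 2))          # roughly 2.7 standard deviations above the mean
print(round(100 * outside, 2))  # roughly 0.7 (percent), i.e. under 1 in 100
```

So for well-behaved data the fences sit about 2.7 standard deviations from the mean, which is why a value beyond them is rare enough to be worth a second look.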

🧠 Memory trick: “1.5 IQRs beyond the nearest quartile”

The IQR is the spread of the middle 50%. Going 1.5 × IQR beyond each quartile gives you a “fence”, and anything outside the fences is unusual. Lower fence = Q1 − 1.5 × IQR,   Upper fence = Q3 + 1.5 × IQR.

Should I remove an outlier?

This is where students often go wrong. Just because a value is an outlier doesn’t mean you should delete it. The decision depends entirely on context.

๐Ÿ—‘๏ธ REMOVE if it’s an error

If the data was clearly recorded wrong โ€” a typo, a duplicated entry, or a measurement mistake โ€” remove it.
e.g. height of 1700 cm โ€” clearly meant to be 170 cm. A “71” typed where “17” was meant.

✓ KEEP if it’s real data

If the value is genuinely part of the population, even if extreme, keep it. The outlier is meaningful.
e.g. the CEO’s salary in a company wage list, a 13-year-old at a 5-year-old’s birthday party.

Always justify your decision

If an exam asks “should this outlier be removed?”, you have to give a reason, not just “yes” or “no”. Mention whether the value seems plausible (real) or implausible (an error). The justification is where the marks live.

How outliers affect the statistics

Outliers don’t affect every statistical measure equally: some get hit hard, others barely notice.

Heavily affected by outliers: mean, range, standard deviation

Hardly affected by outliers: median, IQR

If your data has outliers, prefer the median and IQR over the mean and SD. They give a more “honest” picture of typical values.
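
To see the difference in sensitivity, here is a small Python sketch using the salary figures from the worked examples in this note: five wages of 35 to 45 (in $1000s) plus an owner on 250. The function name `summary` is made up for illustration, and `pstdev` (population standard deviation) is assumed to be the SD your course intends; a sample SD would give slightly different numbers but the same conclusion.

```python
from statistics import mean, median, pstdev


def summary(values):
    """Collect a few summary statistics so two data sets can be compared."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "sd": pstdev(values),  # population SD (an assumption about the convention)
    }


staff = [35, 38, 40, 42, 45]  # salaries in $1000s
with_owner = staff + [250]    # same list with one extreme value added

before = summary(staff)
after = summary(with_owner)

for name in before:
    print(name, before[name], "->", after[name])
```

One extra value moves the mean from 40 to 75 and the range from 10 to 215, and the SD grows many times over, while the median only shifts from 40 to 41.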

Worked examples

WE 1

Identify outliers and decide which to remove

The ages, in years, of children attending a birthday party are:

2,   7,   5,   4,   8,   4,   6,   5,   5,   29,   2,   5,   13

(a) Identify any outliers.   (b) Suggest which value(s) should be removed and justify.

Data: 2, 7, 5, 4, 8, 4, 6, 5, 5, 29, 2, 5, 13

Part (a): outliers
From GDC: Q1 = 4,   Q3 = 7.5
IQR = Q3 − Q1 = 7.5 − 4 = 3.5
Lower fence: Q1 − 1.5 × IQR = 4 − 1.5 × 3.5 = −1.25
Upper fence: Q3 + 1.5 × IQR = 7.5 + 1.5 × 3.5 = 12.75
Anything outside [−1.25, 12.75] is an outlier.
Answer: the outliers are 13 and 29.

Part (b): keep or remove?
13 is a valid age for a child → keep it.
29 is too old for a children’s party → likely an error or an adult.
Answer: remove 29; keep 13.
Tip: always give a reason. “Remove because it doesn’t fit a children’s party” is what scores marks.
WE 2

Find the outlier boundaries and check a value

A data set has Q1 = 12, Q3 = 22, and a single value of 45. Determine whether 45 is an outlier.

Given: Q1 = 12,   Q3 = 22,   check x = 45
Find the upper fence first, since 45 is large.
IQR: 22 − 12 = 10
Upper fence: Q3 + 1.5 × IQR = 22 + 15 = 37
Compare: 45 > 37 ✓
Answer: yes, 45 is an outlier.
Tip: always quote the fence value alongside the comparison.
WE 3

Effect of removing an outlier on mean and median

Five employees earn (in $1000s): 35, 38, 40, 42, 45. The owner ($250k) is added, making 6 values. Then the owner is identified as an outlier and removed.

(a) Mean and median with the owner.   (b) Mean and median after removing the owner.   (c) Comment.

Same data with and without the outlier. We’ll see how each measure responds.

Part (a): with owner
Sorted: 35, 38, 40, 42, 45, 250
Median = average of 3rd and 4th values: (40 + 42)/2 = 41
Mean = total ÷ 6: (35 + 38 + 40 + 42 + 45 + 250)/6 = 450/6 = 75
Answer: Median = $41k, Mean = $75k

Part (b): without owner
Sorted: 35, 38, 40, 42, 45
Median = 3rd value: 40
Mean: 200/5 = 40
Answer: Median = $40k, Mean = $40k

Part (c): comment
Median: 41 → 40 (changed by $1k).
Mean: 75 → 40 (dropped by $35k!).
Answer: the mean is heavily affected; the median barely is.
Tip: if data has outliers, the median gives a fairer picture of “typical”.
WE 4

Check both ends for outliers

A data set has Q1 = 50, Q3 = 80. Check whether 8, 100, and 130 are outliers.

Given: Q1 = 50,   Q3 = 80
First find both fences, then check each value.
IQR: 80 − 50 = 30
Lower fence: 50 − 1.5 × 30 = 50 − 45 = 5
Upper fence: 80 + 1.5 × 30 = 80 + 45 = 125
Safe zone: [5, 125]. Check each value:
8:   8 ≥ 5 → not an outlier ✓
100:   100 ≤ 125 → not an outlier ✓
130:   130 > 125 → outlier ❌
Answer: only 130 is an outlier.
Tip: always check both fences. Outliers can be too small OR too large.
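
When a question gives you the quartiles rather than the raw data, the same check can be sketched as a short Python function. The name `classify` and its return labels are made up for illustration; the logic is just the 1.5 × IQR rule applied to one value at a time.

```python
def classify(x, q1, q3):
    """Label a single value using the 1.5 x IQR rule, given the quartiles."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    if x < lower:
        return "low outlier"
    if x > upper:
        return "high outlier"
    return "not an outlier"  # includes values exactly on a fence


# Quartiles Q1 = 50, Q3 = 80 give fences at 5 and 125
for x in (8, 100, 130):
    print(x, classify(x, q1=50, q3=80))
# 8 not an outlier
# 100 not an outlier
# 130 high outlier
```

Checking the lower fence first and the upper fence second mirrors the habit of testing both ends every time.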
WE 5

Decide whether to remove outliers in different contexts

A scientist measures the lengths of 30 fish in a lake. The data has two outliers identified: a length of 5 cm (very small) and 90 cm (very large). For each scenario, decide whether the outlier should be removed and justify.

(a) The 5 cm fish was measured incorrectly: the actual length was 50 cm.   (b) The 90 cm fish is a healthy adult of a less common species in the lake.

The decision depends on whether the value is an error or a real (but unusual) data point.

Part (a): 5 cm fish
This is a clear measurement error: the real value is 50 cm. Either correct it or remove the wrong record.
Answer: remove (or correct to 50 cm).

Part (b): 90 cm fish
A healthy adult fish is real data, even if rare. Removing it would hide a true feature of the population.
Answer: keep the 90 cm fish.
Tip: “outlier” doesn’t mean “wrong”. Sometimes outliers are the most interesting data.

💡 Top tips

⚠ Common mistakes

Outliers are especially important for the next note, box & whisker diagrams, where outliers get drawn as separate crosses outside the whiskers. Once you can spot them, the box plot rules become easy.
