IB Maths AI HL · Statistics Toolkit · Paper 1 & 2
Measures of Dispersion
A measure of central tendency only tells you where the centre of the data is; it doesn't say whether the values are tightly clustered around the centre or spread out across a wide range. Two classes with the same mean of 60% can have wildly different distributions: in one class everyone scores between 55% and 65%, while in the other scores range from 30% to 90%. That's where measures of dispersion come in. You'll meet four of them here: the range, the interquartile range (IQR), the variance, and the standard deviation. Each one captures spread differently, and choosing the right one matters when the data has outliers.
What you need to know
Quartiles are measures of location: they divide an ordered dataset into four equal parts. Q1 = lower quartile (25%), Q2 = median (50%), Q3 = upper quartile (75%).
Use your GDC to find quartiles: different by-hand methods give slightly different answers, and the IB expects calculator values.
Range = largest value − smallest value. Quick, but very sensitive to outliers.
Interquartile range (IQR) = Q3 − Q1. Captures the middle 50% of the data and ignores extreme values. In the formula booklet.
Variance σ² measures the average squared distance from the mean. Units: squared.
Standard deviation σ = √variance. Same units as the original data. The most common spread measure.
The IB expects you to use a GDC to find variance and standard deviation. Formulas are shown to deepen understanding only.
Bigger spread → data is more variable. Smaller spread → data is more consistent.
Range and quartiles
The simplest measure of spread is the range: just the gap between the largest and smallest values. It's quick to compute, but a single outlier can blow it up.
Range
$\text{range} = \text{maximum value} - \text{minimum value}$
To get a more robust spread, ignore the extreme 25% at each end. The middle 50% sits between the lower quartile Q1 and the upper quartile Q3, and the gap between them is the interquartile range.
Interquartile range
$\text{IQR} = Q_3 - Q_1$ (in the formula booklet)
[Diagram: quartiles divide an ordered dataset into four equal parts]
Each quartile section contains 25% of the data. The IQR captures only the middle 50% and ignores the outer extremes, which is what makes it robust to outliers.
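The Python sketch below shows the by-hand quartile method step by step. It assumes the median-of-halves convention that this section's worked examples use, which is also assumed to match the GDC. The method is to sort the data and split it into a lower and an upper half, leaving the middle value out of both halves when n is odd. The names `quartiles`, `data_range` and `iqr` are illustrative helpers, not library functions.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: the middle value is left out of both halves when n is odd,
    as in this section's worked examples. Needs at least 2 values.
    """
    xs = sorted(data)            # quartiles only make sense on ordered data
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + n % 2:]    # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)


def data_range(data):
    """Range = maximum value - minimum value."""
    return max(data) - min(data)


def iqr(data):
    """Interquartile range = Q3 - Q1."""
    q1, _, q3 = quartiles(data)
    return q3 - q1


values = [7, 1, 9, 3, 5, 8, 2, 6, 4]   # hypothetical data, n = 9
print(quartiles(values), data_range(values), iqr(values))
```

Python's built-in `statistics.quantiles` interpolates between values by default. It can therefore disagree with the median-of-halves quartiles, which is the same reason the IB asks for calculator values.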
Variance and standard deviation
The range and IQR are quick spread measures, but they don't use every value in the data. Standard deviation does: it measures the typical distance of the values from the mean (strictly, the square root of the average squared distance). Variance is simply its square, so its units are squared. That makes it less intuitive, but useful in further calculations.
Variance (definition form): $\sigma^2 = \dfrac{\sum f_i (x_i - \mu)^2}{n}$, where $n = \sum f_i$ (not in the formula booklet)
Variance (computational form, easier by hand): $\sigma^2 = \dfrac{\sum f_i x_i^2}{n} - \mu^2$, i.e. “mean of the squares minus the square of the mean”
Standard deviation: $\sigma = \sqrt{\sigma^2}$
You're not expected to memorise these formulas; the IB expects you to use your GDC. But the computational form (mean of squares minus square of the mean) is far easier than the definition form when working by hand, so it's worth knowing.
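To see why the two variance forms give the same answer, expand the square in the definition form. Then use $\sum f_i x_i = n\mu$ and $\sum f_i = n$, both of which follow from the definition of the mean:

```latex
\begin{aligned}
\sigma^2 &= \frac{\sum f_i (x_i - \mu)^2}{n}
          = \frac{\sum f_i x_i^2 - 2\mu \sum f_i x_i + \mu^2 \sum f_i}{n} \\
         &= \frac{\sum f_i x_i^2}{n} - 2\mu \cdot \mu + \mu^2
          = \frac{\sum f_i x_i^2}{n} - \mu^2
\end{aligned}
```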
Why are variance units squared but SD units aren't?
The variance formula uses squared deviations, (xᵢ − μ)². Squaring removes negative signs (so deviations don't cancel out) and emphasises larger deviations. But it also squares the units: if your data is in cm, the variance is in cm². Taking the square root undoes this and brings the spread measure back to cm, which is why standard deviation is the one you report in real-world contexts.
Recipe: variance and standard deviation by hand
Compute the mean μ = Σx/n.
Compute Σx², the sum of the squares of every value.
Variance: σ² = Σx²/n − μ².
Standard deviation: take the square root of the variance.
Always verify with your GDC: type the data into statistics mode and confirm.
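The four recipe steps translate almost line for line into code. This is a minimal Python sketch. The function name `variance_sd` is illustrative, and it assumes a plain list of raw values with no frequency column.

```python
from math import sqrt


def variance_sd(data):
    """Population variance and SD via 'mean of squares minus square of mean'.

    Illustrative helper: assumes a plain list of raw values (no frequencies).
    """
    n = len(data)
    mu = sum(data) / n                  # Step 1: the mean
    sum_sq = sum(x * x for x in data)   # Step 2: sum of the squares
    var = sum_sq / n - mu ** 2          # Step 3: mean of squares - square of mean
    sd = sqrt(var)                      # Step 4: SD is the square root
    return var, sd


var, sd = variance_sd([42, 28, 67, 51, 64, 42])
print(var, round(sd, 3))
```

The computational form can lose precision in floating point when the values are very large. Python's `statistics.pvariance`, which works with exact fractions internally, is a safer cross-check in that case.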
Range vs IQR vs SD: when to use which
| Measure | Uses all data? | Affected by outliers? | Best for |
|---|---|---|---|
| Range | No (just two values) | Yes, heavily | Quick rough check |
| IQR | No (middle 50%) | No, robust | Skewed data with outliers |
| Variance | Yes | Yes | Further calculations |
| Standard deviation | Yes | Yes | Symmetric data, no outliers |
Memory aid: pairing centre with spread
Statisticians usually pair a centre measure with a spread measure of the same “family”. Median goes with IQR: both ignore extremes. Mean goes with standard deviation: both use every value. Mixing them (e.g. quoting the mean with the IQR) is technically OK but unusual.
Worked examples
WE 1
Range and IQR
Find the range and interquartile range for the data set below.
42 28 67 51 64 42
Step 1: sort the data: 28, 42, 42, 51, 64, 67
Step 2: range = max − min = 67 − 28 = 39
range = 39
Step 3: quartiles using GDC (or by hand)
lower half {28, 42, 42} → Q₁ = 42
upper half {51, 64, 67} → Q₃ = 64
Step 4: IQR = Q₃ − Q₁ = 64 − 42 = 22
IQR = 22
IB exams expect GDC values for quartiles, because different by-hand methods give slightly different answers.
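As a cross-check on WE 1 (not a replacement for the GDC), the same median-of-halves quartile method fits in a few lines of Python. The variable names are illustrative.

```python
from statistics import median

data = [42, 28, 67, 51, 64, 42]

xs = sorted(data)                        # Step 1: sort
half = len(xs) // 2
q1 = median(xs[:half])                   # median of the lower half
q3 = median(xs[half + len(xs) % 2:])     # median of the upper half (n is even here)

print("range =", xs[-1] - xs[0])         # Step 2: max - min
print("Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)
```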
WE 2
Variance and standard deviation by hand
Using the same data set as WE 1, find the variance and standard deviation. Give your answers to 3 significant figures where appropriate.
42 28 67 51 64 42
Step 1: mean (from WE 1 of the previous topic)
μ = 294/6 = 49
Step 2: Σx² = sum of squares
42² + 28² + 67² + 51² + 64² + 42²
= 1764 + 784 + 4489 + 2601 + 4096 + 1764
= 15498
Step 3: variance
σ² = Σx²/n − μ² = 15498/6 − 49² = 2583 − 2401 = 182
variance σ² = 182
Step 4: SD = √variance
σ = √182 ≈ 13.491
SD σ ≈ 13.5 (3 sf)
GDC confirms: σₓ² = 182, σₓ ≈ 13.491. Always use the GDC in the actual exam.
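Python's standard `statistics` module has population versions of both measures, `pvariance` and `pstdev`. Like σₓ on the GDC, they divide by n. A quick check of WE 2:

```python
from statistics import pstdev, pvariance

data = [42, 28, 67, 51, 64, 42]

var = pvariance(data)   # population variance: divides by n
sd = pstdev(data)       # population SD: square root of the above
print(var, round(sd, 3))
```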
WE 3
All four measures together
The midday temperatures (°C) on 8 consecutive days at a weather station are:
15, 18, 22, 19, 25, 20, 17, 24
Find (a) the range, (b) the IQR, (c) the variance, and (d) the standard deviation. Give answers to 3 sf.
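The Python sketch below computes all four measures for the WE 3 temperatures, so you can check your own answers. It assumes the median-of-halves quartile convention used in the other worked examples.

```python
from statistics import median, pstdev, pvariance

temps = [15, 18, 22, 19, 25, 20, 17, 24]   # midday temperatures in °C

xs = sorted(temps)
half = len(xs) // 2
q1 = median(xs[:half])                     # lower half: 15, 17, 18, 19
q3 = median(xs[half + len(xs) % 2:])       # upper half: 20, 22, 24, 25

print("(a) range    =", xs[-1] - xs[0])
print("(b) IQR      =", q3 - q1)
print("(c) variance =", pvariance(temps))
print("(d) SD       =", round(pstdev(temps), 2))
```

With this data it reports a range of 10 °C, an IQR of 5.5 °C, a variance of 10.5 °C² and an SD of about 3.24 °C.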
WE 4
Two basketball teams record their scores (points) in 5 games each:
| Game | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Team A | 40 | 45 | 50 | 55 | 60 |
| Team B | 48 | 49 | 50 | 51 | 52 |
Compare the spread of scores using both range and standard deviation. Which team is more consistent?
Team A
mean = 250/5 = 50
range = 60 − 40 = 20
Σx² = 1600 + 2025 + 2500 + 3025 + 3600 = 12750
σ² = 12750/5 − 50² = 2550 − 2500 = 50
σ = √50 ≈ 7.07
Team B
mean = 250/5 = 50 (same!)
range = 52 − 48 = 4
Σx² = 2304 + 2401 + 2500 + 2601 + 2704 = 12510
σ² = 12510/5 − 2500 = 2502 − 2500 = 2
σ = √2 ≈ 1.41
Compare
A: range 20, σ ≈ 7.07
B: range 4, σ ≈ 1.41
Team B is much more consistent.
Same mean, very different spread: exactly why a centre measure alone is never enough.
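A short loop can check the WE 4 comparison. Here `pstdev` is the population SD from Python's standard library, and the dictionary layout is just one convenient choice.

```python
from statistics import mean, pstdev

teams = {
    "Team A": [40, 45, 50, 55, 60],
    "Team B": [48, 49, 50, 51, 52],
}

for name, scores in teams.items():
    spread = max(scores) - min(scores)
    print(name, "mean:", mean(scores), "range:", spread,
          "SD:", round(pstdev(scores), 2))

# Lower SD = more consistent
most_consistent = min(teams, key=lambda name: pstdev(teams[name]))
print("More consistent:", most_consistent)
```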
WE 5
Outlier effect β range vs IQR
The wait times (in minutes) at a clinic on Monday are:
12, 14, 15, 16, 18, 20, 22, 23
On Tuesday, one extra patient with a complex case had a wait of 60 minutes, making the dataset:
12, 14, 15, 16, 18, 20, 22, 23, 60
Compare the range and IQR for both days, and comment on which is more robust to the outlier.
Monday (n = 8)
range = 23 − 12 = 11
Q₁ = (14 + 15)/2 = 14.5; Q₃ = (20 + 22)/2 = 21
IQR = 21 − 14.5 = 6.5
Tuesday (n = 9, includes 60)
range = 60 − 12 = 48 (jumped from 11!)
with 9 values, the median is the 5th value (18), which is excluded from both halves
lower half {12, 14, 15, 16}: Q₁ = 14.5
upper half {20, 22, 23, 60}: Q₃ = (22 + 23)/2 = 22.5
IQR = 22.5 − 14.5 = 8
Compare
range: 11 → 48 (more than quadrupled by one outlier)
IQR: 6.5 → 8 (only a small change)
The IQR is much more robust to the outlier.
When data has outliers, report the IQR rather than the range.
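The Python sketch below reproduces the WE 5 comparison. It again assumes the median-of-halves rule, with the middle value excluded when n is odd. `iqr` is an illustrative helper, not a library function.

```python
from statistics import median


def iqr(data):
    """IQR via medians of the lower and upper halves (illustrative helper).

    Assumption: when n is odd, the middle value is excluded from both halves.
    """
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[half + len(xs) % 2:]) - median(xs[:half])


monday = [12, 14, 15, 16, 18, 20, 22, 23]
tuesday = monday + [60]            # one extra patient with a 60-minute wait

for day, waits in [("Monday", monday), ("Tuesday", tuesday)]:
    print(day, "range:", max(waits) - min(waits), "IQR:", iqr(waits))
```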
WE 6
Compare two players β same mean, different SD
Two students each throw 5 darts at a target. Their distances (in cm) from the bullseye are:
Alice: 2, 3, 4, 5, 6
Bob: 1, 3, 4, 5, 7
(a) Show that both have the same mean distance.
(b) Calculate the standard deviation for each.
(c) State which is the more consistent player.
(a) means
Alice: (2 + 3 + 4 + 5 + 6)/5 = 20/5 = 4
Bob: (1 + 3 + 4 + 5 + 7)/5 = 20/5 = 4, so both means are 4
(b) Alice's SD
Σx² = 4 + 9 + 16 + 25 + 36 = 90
σ² = 90/5 − 4² = 18 − 16 = 2
σ = √2 ≈ 1.41 cm
Bob's SD
Σx² = 1 + 9 + 16 + 25 + 49 = 100
σ² = 100/5 − 16 = 20 − 16 = 4
σ = √4 = 2 cm
(c) Alice: σ ≈ 1.41 cm; Bob: σ = 2 cm. A lower SD means more consistent throws.
Alice is the more consistent player.
Same mean, different spread: the SD reveals what the mean hides.
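WE 6 can be checked with Python's standard `statistics` module. `pstdev` is the population SD, matching the divide-by-n convention used throughout this section.

```python
from statistics import mean, pstdev

alice = [2, 3, 4, 5, 6]   # distances from the bullseye in cm
bob = [1, 3, 4, 5, 7]

print("means:", mean(alice), mean(bob))                        # part (a)
print("SDs:", round(pstdev(alice), 2), round(pstdev(bob), 2))  # part (b)

# Part (c): the smaller SD belongs to the more consistent player
print("More consistent:", "Alice" if pstdev(alice) < pstdev(bob) else "Bob")
```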
Top tips
Always use the GDC for quartiles, variance, and SD in the exam. By-hand work is for understanding, not exam answers.
Range/IQR formulas: IQR = Q₃ − Q₁ (in the formula booklet). Range = max − min (not in the booklet, but trivial).
Computational form for variance is “mean of squares minus square of mean”. Memorise this, not the deviation form.
Pair them up: report mean with SD, or median with IQR. Don’t mix families in a comparison.
Units: variance squared, SD same as data. Always include units in your final answer where appropriate.
For comparisons: lower spread → more consistent / reliable. Higher spread → more variable.
Common mistakes
Forgetting to sort first. Quartiles and the median both rely on sorted data.
Confusing variance with SD. SD is the square root of variance. Both are “spread” but in different units.
Using deviation form by hand when the computational form would be much faster. Mean of squares minus square of mean is almost always easier.
Reporting the range for data with outliers. One extreme value can multiply the range several times over β use IQR instead.
Saying “high standard deviation means high mean”. SD measures spread, not centre. Two datasets can share the same mean with completely different SDs (see WE 4).
Mixing up sample and population formulas. For descriptive statistics like these in AI HL, use the population versions (divide by n, not n − 1). The GDC labels the population SD σₓ; sₓ is the sample version, which divides by n − 1.
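Python's `statistics` module provides both versions, which makes the sample/population difference easy to see. `pvariance` divides by n, while `variance` divides by n − 1. On the WE 1 data:

```python
from statistics import pvariance, variance

data = [42, 28, 67, 51, 64, 42]

pop = pvariance(data)    # divides by n: the sigma-x version used in this section
samp = variance(data)    # divides by n - 1: the s-x version
print(pop, samp)
```

Here the population variance is 182, as in WE 2, while the sample version is 182 × 6/5 = 218.4.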
Next up: Frequency Tables. Real datasets are rarely small enough to write out one by one. Frequency tables let you compress repeated values into compact summaries, and you'll learn how to recover the mean, median, mode, range, IQR, variance, and standard deviation directly from a table, for both ungrouped and grouped data.