IB Maths AA SL
Topic 4: Statistics Toolkit
Paper 1 & 2
Measures of Dispersion
Two data sets can have the exact same mean but look completely different: one tightly bunched, the other wildly spread. Dispersion is the second half of the picture: how spread out the data is. This note covers all four measures: range, IQR, variance, and standard deviation.
What you need to know
- The range = largest value − smallest value. Quick to find, but ruined by outliers.
- The quartiles Q1, Q2, Q3 split sorted data into four equal sections (each holding 25% of the values).
- The interquartile range (IQR) = Q3 − Q1, the spread of the middle 50%. Not affected by outliers.
- The variance (σ²) measures the average squared distance from the mean.
- The standard deviation (σ) = √variance. It’s the most important measure of spread.
- For variance and standard deviation, use your GDC; don’t waste exam time calculating by hand.
- The IQR formula is in the formula booklet. Variance and SD aren’t, but your GDC handles them.
Why “spread” matters
Imagine two students, both with a mean test score of 70. Student A scored 68, 70, 72, 70, 70, which is really consistent. Student B scored 30, 50, 70, 90, 110, with wild swings. Same mean, totally different stories.
Dispersion measures answer the question: “how spread out is the data?” A small dispersion means the data is tightly bunched around the centre. A big dispersion means the data is all over the place.
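To make the two-student comparison concrete, here is a minimal Python sketch. Python is only standing in for the GDC here and is not part of the IB course. It runs the standard library's `statistics.mean` and `statistics.pstdev` (population standard deviation) on the two score lists above.

```python
import statistics

# Scores from the two students described above.
student_a = [68, 70, 72, 70, 70]
student_b = [30, 50, 70, 90, 110]

for name, scores in [("A", student_a), ("B", student_b)]:
    mean = statistics.mean(scores)
    # pstdev divides by n (population standard deviation).
    spread = statistics.pstdev(scores)
    print(f"Student {name}: mean = {mean}, standard deviation = {spread:.2f}")
```

Both means come out as 70, while the standard deviations are roughly 1.26 and 28.28, which is the "same mean, different story" idea in numbers.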
| Measure | Formula | What it tells you | Units |
|---|---|---|---|
| Range | Max − Min | Difference between the biggest and smallest values. Quick but rough. | Same units as the data |
| Interquartile range | IQR = Q3 − Q1 | Spread of the middle 50% of the data. Ignores outliers. | Same units as the data |
| Variance | σ² | Mean of the squared distances from the mean. Never negative. | Squared units (a bit awkward) |
| Standard deviation | σ = √variance | Square root of the variance. The most useful measure of spread. | Same units as the data |
If you’re stuck on which measure to use: standard deviation is the safest default. It uses every value, has the right units, and is what most exam questions expect.
The range: quickest but roughest
The range is the difference between the largest and smallest values. That’s it.
Example: for the data 4, 7, 2, 9, 5: max = 9, min = 2, so range = 7.
Why is the range “rough”?
The range only looks at the two most extreme values and ignores everything else. If your data has a single outlier (say, one freakishly tall person in a sample of average heights), the range jumps massively, even though the rest of the data is perfectly normal.
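A short Python sketch of the same idea follows. The helper name `data_range` and the height values are made up purely for illustration; only the first list comes from the example above.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Made-up heights in cm, purely for illustration.
heights = [165, 168, 170, 171, 173]
heights_with_outlier = heights + [215]

print(data_range([4, 7, 2, 9, 5]))       # the example data above
print(data_range(heights))               # typical heights only
print(data_range(heights_with_outlier))  # one very tall person added
```

With these illustrative numbers the range goes from 8 cm to 50 cm after adding a single value, even though the other five heights have not changed.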
Quartiles and the IQR
Quartiles are three special values that split your sorted data into four equal-sized groups. Each group holds 25% of the data.
- Q1 (lower quartile): splits the lowest 25% from the rest.
- Q2 (median): splits the data in half.
- Q3 (upper quartile): splits the highest 25% from the rest.
[Figure: a number line marked Min, Q1, Q2 (median), Q3, Max. Quartiles split the sorted data into four equal pieces, each holding 25% of the data. The bracket from Q1 to Q3 marks the middle 50%, which is the IQR.]
The IQR formula
The IQR is the upper quartile minus the lower quartile: IQR = Q3 − Q1. It measures how spread out the middle 50% of the data is. Because it ignores the lowest 25% and the highest 25%, it isn’t affected by outliers, which is why it’s so useful.
Why is the IQR better than the range for skewed data?
Imagine 99 people earning around $50k per year, plus one billionaire earning $5 billion. The range would be about $5 billion, a meaningless number for describing the typical spread. But the IQR would still be a sensible figure (maybe $20k), because it ignores the top 25% and the bottom 25% of the values.
That’s why income distributions are often described with quartiles and the IQR rather than the range.
Always use your GDC for quartiles
Different methods of finding quartiles by hand give slightly different answers. The IB expects you to use your calculator’s “1-Var Stats” function, which gives you Q1, Q2, Q3, the mean, and standard deviation all in one go.
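The sketch below illustrates why methods disagree, using the data set from this note's worked examples. The function `quartiles_median_of_halves` is a hypothetical helper written for this illustration. It takes Q1 and Q3 as the medians of the lower and upper halves, which reproduces the GDC values quoted in the worked examples here, though your calculator's exact rule is an assumption. It is compared with Python's `statistics.quantiles`, which interpolates by default.

```python
import statistics

def quartiles_median_of_halves(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves.

    For an odd number of values the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

data = [43, 29, 70, 51, 64, 43]

q1, q2, q3 = quartiles_median_of_halves(data)
print("median of halves:", q1, q2, q3, "IQR =", q3 - q1)

# statistics.quantiles interpolates between values (default method="exclusive").
p1, p2, p3 = statistics.quantiles(data, n=4)
print("interpolated:    ", p1, p2, p3, "IQR =", p3 - p1)
```

The first method gives Q1 = 43 and Q3 = 64 (IQR = 21), while the interpolating method gives Q1 = 39.5 and Q3 = 65.5 (IQR = 26). Both are legitimate conventions, which is exactly why sticking to the GDC output is the safest choice in an exam.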
Variance and standard deviation
The range and IQR only use a few values from your data set. The standard deviation uses every value, which is what makes it powerful.
The idea: for each data point, find how far it is from the mean (this is the “deviation”). Square each deviation, then average the squares. That average is the variance, and its square root is the standard deviation. Big deviations mean the data is spread far from the centre; small deviations mean the data is clustered tightly.
The formulas (don’t memorise; use your GDC)
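For reference, these are the standard population formulas, written with x_i for each data value and μ for the mean as elsewhere in this note. The symbol n for the number of data values is notation assumed here.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\sigma^2}
```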
Why square the differences?
If you just took (xi − μ) and averaged it, the positive and negative differences would cancel out, giving you 0 every time. Squaring kills the negatives, so all the deviations contribute properly.
The downside? Squared units are weird (m² when you wanted m). That’s why we square-root at the end to get the standard deviation back into normal units.
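A quick Python check of the cancelling effect, using the data set from this note's worked examples. This is only a sketch for building intuition, not an exam method.

```python
data = [43, 29, 70, 51, 64, 43]
mean = sum(data) / len(data)

# Signed distances from the mean: some positive, some negative.
deviations = [x - mean for x in data]
squared = [d ** 2 for d in deviations]

print(sum(deviations))             # the signed deviations cancel out
variance = sum(squared) / len(data)
print(variance, variance ** 0.5)   # variance, then standard deviation
```

The signed deviations add up to 0, whereas the squared deviations average to about 189.3, and the square root of that (about 13.8) is back in the original units.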
Memory trick: “Variance is squared, SD wears the clothes”
Variance has the squared units (kg², s², m²); it’s the “naked” version. The standard deviation puts the original units back on (kg, s, m) by square-rooting. That’s why SD is the one you actually quote in answers.
How to find them on your GDC
- Enter the data into a list (List 1 or L1, depending on your model).
- Run “1-Var Stats” or “Statistics Calculation” on that list.
- Look for σx (or “stdDev”): that’s your standard deviation.
- Square it to get the variance, σx².
- Some calculators show variance directly; if they do, just read it off.
There’s also a thing called “sample standard deviation” (sx) which divides by n − 1 instead of n. For IB AA SL, always use σ (population standard deviation) unless the question says otherwise. They’re slightly different numbers!
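If you want to see the σx versus sx difference away from the calculator, Python's standard library has both: `statistics.pstdev` divides by n and `statistics.stdev` divides by n − 1. The sketch below uses the data set from this note's worked examples. Treat it as an illustration of the two definitions, not as a replacement for the GDC steps.

```python
import statistics

data = [43, 29, 70, 51, 64, 43]

sigma = statistics.pstdev(data)   # population SD: divides by n
s = statistics.stdev(data)        # sample SD: divides by n - 1

print(f"population SD = {sigma:.3f}")
print(f"sample SD     = {s:.3f}")
print(f"variance      = {sigma ** 2:.3f}")
```

For this data the population SD is about 13.76 and the sample SD is about 15.07, so picking the wrong one changes the answer at 3 s.f.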
Which measure should I use?
| | Range | IQR | Standard deviation |
|---|---|---|---|
| Quick to calculate? | Yes | Need GDC | Need GDC |
| Uses every value? | No (only 2) | No (middle 50%) | Yes |
| Affected by outliers? | Yes (badly) | No | Yes |
| Same units as data? | Yes | Yes | Yes |
| Best for… | Quick checks | Skewed data | Symmetric data |
Rule of thumb: if the data is roughly symmetric and outlier-free, use standard deviation. If it’s skewed or has outliers, use the IQR. The range is mostly a quick check.
Worked examples
WE 1: Find the range and IQR
Find the range and interquartile range for the data set below.
43, 29, 70, 51, 64, 43
Range
Max − Min: 70 − 29 = 41
Range = 41
IQR
Find Q1 and Q3 using GDC:
Q1 = 43, Q3 = 64
IQR = Q3 − Q1: 64 − 43 = 21
IQR = 21
Note: always use the GDC for quartiles; different by-hand methods give different answers!
WE 2: Find the variance and standard deviation
Find the variance and standard deviation for the data set below. Give answers to 3 s.f.
43, 29, 70, 51, 64, 43
Use the GDC’s 1-Var Stats; never calculate by hand in an exam!
Enter data in list, run 1-Var Stats:
σx = 13.759… (this is SD)
Square it for variance: σx² = 13.759…² = 189.333…
Round to 3 s.f.:
Variance = 189, SD = 13.8 (3 s.f.)
Note: σ has the same units as the data; σ² has squared units
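For insight only (the GDC is still the exam method), here is the by-hand calculation behind those values, assuming the population formulas with n = 6.

```latex
\mu = \frac{43 + 29 + 70 + 51 + 64 + 43}{6} = \frac{300}{6} = 50

\sigma^2 = \frac{(-7)^2 + (-21)^2 + 20^2 + 1^2 + 14^2 + (-7)^2}{6}
         = \frac{1136}{6} \approx 189.3

\sigma = \sqrt{\frac{1136}{6}} \approx 13.8
```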
WE 3: Compare two data sets using SD
Two students each took 5 maths tests. Their scores are:
Aisha: 78, 80, 82, 79, 81 | Ben: 60, 95, 70, 100, 75
Find the mean and SD for each. Comment on consistency.
Use GDC for both. Same mean is possible, but SD shows real differences.
Aisha
From GDC: mean = 80, σ ≈ 1.41
Ben
From GDC: mean = 80, σ ≈ 15.2
Comment
Same mean (80) but very different spreads.
Aisha’s σ is small, so her scores are consistent.
Ben’s σ is large, so his scores vary wildly.
Aisha is more consistent (smaller SD)
Note: two data sets can share a mean but have totally different shapes; that’s why SD matters
WE 4: How an outlier affects range vs IQR
The number of books read by 7 students last summer: 3, 4, 5, 6, 7, 8, 9. An eighth student says they read 50 books.
(a) Find the range and IQR for the original 7. (b) Find the new range and IQR with all 8. (c) Comment.
The 50 is a clear outlier. We’ll see how each measure responds.
Part (a)
Original 7: 3, 4, 5, 6, 7, 8, 9
Range = 9 − 3 = 6
From GDC: Q1 = 4, Q3 = 8, so IQR = 4
Range = 6, IQR = 4
Part (b)
All 8: 3, 4, 5, 6, 7, 8, 9, 50
Range = 50 − 3 = 47
From GDC: Q1 = 4.5, Q3 = 8.5, so IQR = 4
Range = 47, IQR = 4
Part (c)
Range jumped from 6 to 47 (almost 8× bigger!).
IQR didn’t change at all.
IQR is much more reliable when outliers exist
Note: range is affected by outliers; IQR ignores the top and bottom 25%
WE 5: Real-world interpretation of SD
The masses of 100 newborn babies have a mean of 3.4 kg and a standard deviation of 0.5 kg. The masses of 100 chocolate bars have a mean of 50 g and a standard deviation of 0.5 g.
(a) Compare the absolute spread. (b) Comment on which data is more “consistent” relative to its mean.
Both have the same SD numerically (0.5), but their scales are completely different.
Part (a)
Babies SD = 0.5 kg = 500 g
Chocolate SD = 0.5 g
Babies have 1000× more absolute spread
Part (b)
Compare SD as a fraction of the mean:
Babies: 0.5 ÷ 3.4 ≈ 0.147 = 14.7%
Chocolate: 0.5 ÷ 50 = 0.01 = 1%
Chocolate is much more consistent (only 1% variation)
Note: always interpret SD relative to the size of the data; context matters!
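The "SD as a fraction of the mean" comparison in part (b) can be sketched in a few lines of Python. The function name `relative_spread` is a hypothetical helper chosen for this illustration. This ratio is often called the coefficient of variation, a term this note does not otherwise use.

```python
def relative_spread(sd, mean):
    """Standard deviation as a fraction of the mean (both in the same units)."""
    return sd / mean

babies = relative_spread(0.5, 3.4)     # both values in kg
chocolate = relative_spread(0.5, 50)   # both values in g

print(f"babies:    {babies:.1%}")
print(f"chocolate: {chocolate:.1%}")
```

Because the units cancel in the division, the two results (about 14.7% and 1.0%) can be compared directly even though one data set is in kilograms and the other in grams.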
Top tips
- Use your GDC. Don’t waste time computing variance or SD by hand. Enter data in a list, run 1-Var Stats, and read off everything you need.
- Use σx not sx. The IB AA SL syllabus uses population SD (σ), which divides by n. The “sample SD” (sx) divides by n − 1, which is close but not the same answer.
- Square SD to get variance. Variance = (SD)². Many calculators only display SD directly, so just square it if asked for variance.
- Match units carefully. Range, IQR, and SD all share units with the data. Variance has squared units (e.g. kg²). Don’t put kg² when the question wants kg.
- Round to 3 s.f. in the final answer unless told otherwise. But keep more decimals in your working to avoid rounding errors.
- For comparing data sets, always quote both a measure of centre and a measure of spread. Mean alone misses half the story.
- For skewed data or data with outliers, use the IQR. For symmetric data, use the standard deviation.
- “Consistent” or “reliable” in a question usually points to the smaller SD or IQR.
Common mistakes
- Using sx instead of Οx. They’re both on your calculator β pick the right one. AA SL wants Ο, not s.
- Confusing variance with standard deviation. Variance is the squared version. SD is the square root. They are not the same number!
- Forgetting to square root. If a question asks for “standard deviation” but you give the variance, you’ve done half the work.
- Forgetting to square. If a question asks for “variance” but you give the SD, same problem.
- Using the range when there’s a clear outlier. The range is destroyed by extreme values. Use the IQR instead.
- Trying to find quartiles by hand. Different methods give different answers, and the IB expects GDC values. Don’t second-guess your calculator.
- Quoting variance with the wrong units. If your data is in metres, variance is in m², not m.
- Comparing SDs of different-scale data. An SD of 0.5 means very different things for babies’ weights vs. chocolate bars. Always think about relative spread when comparing across contexts.
You can now describe both where the centre of any data set is, and how spread out it is around that centre. The next note covers frequency tables: how to apply all of these formulas when you have lots of repeated values.