IB Maths AI SL · Topic 4: Statistics Toolkit · Paper 1 & 2 · 5-number summary · ~7 min read
Box & Whisker Diagrams
A box plot displays the five-number summary — min, Q1, median, Q3, max — on a single axis. The box shows the middle 50% of data; the whiskers show the rest. Outliers are drawn as × marks outside. Two box plots stacked on the same axis are the fastest way to compare two data sets in an exam.
📘 What you need to know
Five-number summary: min, Q1, median, Q3, max. These are everything a box plot shows.
Box spans Q1 to Q3 — the middle 50%. Median is a vertical line inside the box.
Whiskers extend from each end of the box to the smallest/largest non-outlier value.
Outliers shown as × outside the whiskers (apply the 1.5 × IQR rule from the previous note).
Quartiles split data into quarters: 25% of values lie in each section (min→Q1, Q1→med, med→Q3, Q3→max).
Symmetry: if the median sits in the middle of the box and the whiskers are the same length, the distribution is roughly symmetric (could be modelled by a normal distribution).
Anatomy & comparison
The diagram below shows two box plots on the same scale — the standard exam setup for “compare two distributions”. Always compare a measure of centre (medians) and a measure of spread (IQR or range), in context.
Figure: two box plots on a single scale. Class A (teal) has a higher median (72 vs 60) and a wider box (IQR 24 vs 20): higher scores on average, but more spread out. Class B (orange) is more consistent but with a lower centre.
The five-number summary
min · Q1 · median (Q2) · Q3 · max
range = max − min · IQR = Q3 − Q1
🧭 Recipe — draw or read a box plot
Order the data and read off (or compute) min, Q1, median, Q3, max.
Check for outliers using the 1.5 × IQR rule. Any outliers are drawn separately as ×.
Draw the box from Q1 to Q3 with a vertical line at the median.
Draw the whiskers — left to the smallest non-outlier, right to the largest non-outlier.
To compare two box plots: state a measure of centre (medians) AND a measure of spread (IQR or range), in context.
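The four equal sections: each section between min, Q1, median, Q3 and max contains about 25% of the data. The sections can look very different in width, but each one, whether a whisker or half of the box, still represents a quarter of all values (count any outlier × marks together with the whisker on that side).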
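The recipe above can be sketched in Python. This is a minimal sketch, not an exam requirement: the function names `five_number_summary` and `box_plot_values` are made up for illustration, and the quartiles are assumed to be the medians of the lower and upper halves (with the middle value left out when n is odd), which is the method the worked examples in this note use. Other software may use a different quartile rule and give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two data values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # When n is odd, the middle value is excluded from both halves.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def box_plot_values(data):
    """Return the values needed to draw the plot, with outliers separated."""
    _, q1, med, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Values strictly beyond a fence are treated as outliers (drawn as ×).
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return {
        "whisker_low": min(inside),   # smallest non-outlier
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": max(inside),  # largest non-outlier
        "outliers": outliers,
    }


# Daily step counts (thousands) over 11 days, one unusually large value.
steps = [4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 28]
print(five_number_summary(steps))  # (4, 7, 10, 13, 28)
print(box_plot_values(steps))
```

Note that the whisker ends come from the data values that lie inside the fences, not from the fences themselves.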
Worked examples
WE 1
Read values from a given box plot
A box plot shows maximum daily temperatures (°C) recorded in a city for 30 days. From the diagram: minimum = 18, Q1 = 22, median = 26, Q3 = 30, maximum = 35.
Find (a) the range (b) the IQR.
(a) Range = max − min
35 − 18 = 17
range = 17 °C
(b) IQR = Q₃ − Q₁
30 − 22 = 8
IQR = 8 °C
Note: range uses the WHISKER ends; IQR uses the BOX edges. Both have data units (°C here).
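WE 2
Data: 152, 156, 159, 161, 163, 165, 167, 168, 170, 172, 175
Data already ordered. n = 11.
min = 152, max = 175
Median = 6th value
median = 165
Quartiles: each half has 5 values
lower 5: 152, 156, 159, 161, 163 → Q₁ = 159
upper 5: 167, 168, 170, 172, 175 → Q₃ = 170
Five-number summary: 152 · 159 · 165 · 170 · 175
Note: these five numbers are all you need to draw the box plot. Check 1.5 × IQR to confirm there are no outliers: IQR = 11, so the fences are 142.5 and 186.5, and every value lies inside them.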
WE 3
Build a box plot from raw data — with outlier
A walker records her daily step count (in thousands) over 11 days:
4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 28
State the values needed to draw the box plot, including any outliers.
Five-number summary
min = 4, max = 28
median = 6th value = 10
Q₁ = 7 (median of lower 5)
Q₃ = 13 (median of upper 5)
Outlier check
IQR = 6, upper fence = 13 + 9 = 22
28 > 22 → 28 is an outlier
Whisker ends at largest non-outlier
right whisker ends at 14 (not 28)
box plot: 4 · 7 · 10 · 13 · 14, with × at 28
Note: whiskers always stop at the most extreme NON-outlier. The outlier 28 is a separate × mark beyond the whisker.
WE 4
Compare two box plots in context
Two classes’ end-of-term test scores (out of 100):
Class A: min = 40, Q₁ = 58, median = 72, Q₃ = 82, max = 95
Class B: min = 35, Q₁ = 50, median = 60, Q₃ = 70, max = 80
Compare the two distributions in context.
Centre: compare medians
A: 72, B: 60
→ Class A scored higher on average
Spread: compare IQRs
A IQR = 82 − 58 = 24
B IQR = 70 − 50 = 20
→ Class A’s middle 50% is more spread out
Class A: higher centre, more variable. Class B: lower but more consistent.
Note: always make TWO comparisons (centre + spread) IN CONTEXT. “Class A is higher” alone is not enough for full marks.
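The centre-plus-spread comparison can also be checked with a few lines of Python. This is only an illustrative sketch: `centre_and_spread` and `compare` are hypothetical helper names, each summary is assumed to be a tuple in the order (min, Q1, median, Q3, max), and ties are not handled specially. The context ("scored higher", "more consistent") still has to come from you.

```python
def centre_and_spread(summary):
    """Pull the comparison measures out of a five-number summary."""
    low, q1, med, q3, high = summary
    return {"median": med, "iqr": q3 - q1, "range": high - low}


def compare(name_a, summary_a, name_b, summary_b):
    """State which data set has the higher median and which has the larger IQR."""
    a = centre_and_spread(summary_a)
    b = centre_and_spread(summary_b)
    higher = name_a if a["median"] > b["median"] else name_b
    wider = name_a if a["iqr"] > b["iqr"] else name_b
    return (
        f"Centre: {higher} has the higher median ({a['median']} vs {b['median']}). "
        f"Spread: {wider} has the larger IQR ({a['iqr']} vs {b['iqr']})."
    )


class_a = (40, 58, 72, 82, 95)
class_b = (35, 50, 60, 70, 80)
print(compare("Class A", class_a, "Class B", class_b))
```

The output gives the two facts an examiner looks for; turning them into "Class A scored higher on the test but its scores were less consistent" is the in-context step.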
WE 5
Shape: symmetric or skewed?
A box plot has Q1 = 30, median = 50, Q3 = 80.
Comment on the symmetry of the distribution and state whether the data could plausibly be modelled by a normal distribution.
Compare the two halves of the box
Q₁ to median: 50 − 30 = 20
median to Q₃: 80 − 50 = 30
Median is closer to Q₁ → right-skewed
positively skewed; NOT plausibly normal
Note: if the median were equidistant from Q₁ and Q₃ (and the whiskers were equal in length), the data would be roughly symmetric and could be modelled by a normal distribution.
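The box-halves comparison is simple enough to write as a rule. The sketch below uses a hypothetical function name, `box_skew`, and looks only at the box, as this example does. With real data the two halves are rarely exactly equal, so in an exam "roughly symmetric" is a judgement that should also take the whiskers into account.

```python
def box_skew(q1, med, q3):
    """Classify the shape of the box by comparing its two halves."""
    lower_half = med - q1   # width of the box below the median
    upper_half = q3 - med   # width of the box above the median
    if upper_half > lower_half:
        return "right-skewed (positive)"
    if upper_half < lower_half:
        return "left-skewed (negative)"
    return "box is symmetric about the median"


print(box_skew(30, 50, 80))  # right-skewed (positive)
```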
WE 6
Interpret percentages of data from a box plot
The temperatures box plot from WE 1 summarises data from 30 days.
(a) Approximately what percentage of days had a maximum temperature above 30 °C?
(b) Approximately how many days had a maximum temperature between 22 °C and 30 °C?
(a) 30 °C = Q₃. Each quarter holds 25% of data.
days above Q₃ ≈ 25%
approximately 25%
(b) 22 to 30 = Q₁ to Q₃ = middle 50%
days = 50% × 30 = 15
approximately 15 days
Note: box plots split data into FOUR equal-count sections, 25% each. The IQR holds the middle 50% by definition. Exact at quartile positions; approximate elsewhere.
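The "25% per section" reasoning can be sketched as a short calculation. The helper names `share_between` and `approx_count` are invented for this illustration, and the sketch assumes both boundaries are values from the five-number summary and that the five values are all different. For boundaries that fall between quartiles, a box plot gives only a rough estimate, so the sketch does not attempt it.

```python
def share_between(summary, low, high):
    """Approximate fraction of the data between two five-number-summary values."""
    i = summary.index(low)    # position of the lower boundary (0 = min, ..., 4 = max)
    j = summary.index(high)   # position of the upper boundary
    return (j - i) * 0.25     # each section holds about a quarter of the data


def approx_count(summary, n, low, high):
    """Approximate number of data values between the two boundaries."""
    return share_between(summary, low, high) * n


temps = (18, 22, 26, 30, 35)               # min, Q1, median, Q3, max in °C
print(share_between(temps, 30, 35))        # 0.25
print(approx_count(temps, 30, 22, 30))     # 15.0
```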
💡 Top tips
Each quartile section = 25% of the data. Use this for “approximately how many” questions.
Compare BOTH centre and spread when asked to compare two box plots; one alone loses marks.
Comment in context: “Class A scored higher” beats “the median is higher”.
Symmetric box plot (median centred, whiskers equal) suggests the data could be normal.
Whisker stops at the most extreme non-outlier: outliers sit as separate × marks.
⚠ Common mistakes
Drawing the whisker to the outlier: it must stop at the previous (non-outlier) value, with the outlier as ×.
Forgetting the median line in the box. Without it the plot is incomplete.
Comparing only the medians: you need a spread comparison too (IQR or range).
Saying “Class A is better” from medians alone — what does “better” mean? Use the data’s units in your comparison.
Treating box width as frequency: a wider box doesn’t mean MORE data — both boxes still contain 50% of values. It just means the middle 50% are more spread out.
Next up: Cumulative Frequency Graphs. Box plots show the five-number summary but lose detail in between. Cumulative frequency curves fill in every value — useful for grouped data where you don’t have the raw values. You’ll read off the median and quartiles by drawing horizontal lines at 25%, 50% and 75%.