IB Maths AI SL Topic 4 — Statistics Toolkit Paper 1 & 2 5-number summary ~7 min read

Box & Whisker Diagrams

A box plot displays the five-number summary — min, Q1, median, Q3, max — on a single axis. The box shows the middle 50% of data; the whiskers show the rest. Outliers are drawn as × marks outside. Two box plots stacked on the same axis are the fastest way to compare two data sets in an exam.

📘 What you need to know

Anatomy & comparison

The diagram below shows two box plots on the same scale — the standard exam setup for “compare two distributions”. Always compare a measure of centre (medians) and a measure of spread (IQR or range), in context.

[Figure: Comparing two classes’ test scores. Horizontal axis: test score, 30 to 100. Class A: 40, 58, 72 (median), 82, 95. Class B: 35, 50, 60 (median), 70, 80. Annotation: 12-point gap in medians.]
Two box plots on a single scale. Class A (teal) has a higher median (72 vs 60) and a wider box (IQR 24 vs 20) — on average higher scores but more spread out. Class B (orange) is more consistent but with a lower centre.
The five-number summary: min  ·  Q1  ·  median (Q2)  ·  Q3  ·  max

range = max − min  ·  IQR = Q3 − Q1

🧭 Recipe — draw or read a box plot

  1. Order the data and read off (or compute) min, Q1, median, Q3, max.
  2. Check for outliers using the 1.5 × IQR rule: a value is an outlier if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. Any outliers are drawn separately as ×.
  3. Draw the box from Q1 to Q3 with a vertical line at the median.
  4. Draw the whiskers — left to the smallest non-outlier, right to the largest non-outlier.
  5. To compare two box plots: state a measure of centre (medians) AND a measure of spread (IQR or range), in context.
The four equal sections: each chunk between min, Q1, median, Q3, max contains 25% of the data. Boxes can look very different in width, but each whisker still represents a quarter of all values.

Worked examples

WE 1

Read values from a given box plot

A box plot shows maximum daily temperatures (°C) recorded in a city for 30 days. From the diagram: minimum = 18, Q1 = 22, median = 26, Q3 = 30, maximum = 35.

Find (a) the range   (b) the IQR.

(a) Range = max − min = 35 − 18 = 17, so range = 17 °C
(b) IQR = Q₃ − Q₁ = 30 − 22 = 8, so IQR = 8 °C

Note: the range uses the WHISKER ends; the IQR uses the BOX edges. Both have data units (°C here).
WE 2

Five-number summary from raw data

Heights (cm) of 11 students:

152,  156,  159,  161,  163,  165,  167,  168,  170,  172,  175

Find the five-number summary.

The data are already ordered, with n = 11, so min = 152 and max = 175.
Median = 6th value, so median = 165.
Quartiles: each half has 5 values.
  lower 5: 152, 156, 159, 161, 163 → Q₁ = 159
  upper 5: 167, 168, 170, 172, 175 → Q₃ = 170
Five-number summary: 152 · 159 · 165 · 170 · 175

Note: these five numbers are all you need to draw the box plot. Check 1.5 × IQR to confirm there are no outliers.
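The steps in WE 2 can be sketched in Python. This is a minimal illustration, assuming the same quartile convention as the worked solution (Q₁ and Q₃ are the medians of the lower and upper halves, leaving out the middle value when n is odd). The function name `five_number_summary` is made up for this example, and a calculator or spreadsheet may use a different quartile convention and give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two values; the middle value is left out of both
    halves when the number of values is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # values above it (skips the middle value if n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])


heights = [152, 156, 159, 161, 163, 165, 167, 168, 170, 172, 175]
print(five_number_summary(heights))  # (152, 159, 165, 170, 175)
```
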
WE 3

Build a box plot from raw data — with outlier

A walker records her daily step count (in thousands) over 11 days:

4,  6,  7,  8,  9,  10,  11,  12,  13,  14,  28

State the values needed to draw the box plot, including any outliers.

Five-number summary: min = 4, max = 28, median = 6th value = 10, Q₁ = 7 (median of lower 5), Q₃ = 13 (median of upper 5).
Outlier check: IQR = 13 − 7 = 6, so 1.5 × IQR = 9. Upper fence = 13 + 9 = 22; lower fence = 7 − 9 = −2.
28 > 22, so 28 is an outlier. No value lies below −2, so there are no low outliers.
Whiskers: the right whisker ends at the largest non-outlier, 14 (not 28); the left whisker ends at 4.
Box plot: 4 · 7 · 10 · 13 · 14, with × at 28.

Note: whiskers always stop at the most extreme NON-outlier. The outlier 28 is a separate × mark beyond the whisker.
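The outlier check and whisker ends in WE 3 can be reproduced with a short Python sketch. The helper names `fences` and `whiskers_and_outliers` are hypothetical, and the quartiles are passed in by hand (Q₁ = 7, Q₃ = 13 from the solution above) rather than recomputed. The sketch assumes a value exactly on a fence is not an outlier.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def whiskers_and_outliers(values, q1, q3):
    """Return (left whisker end, right whisker end, list of outliers)."""
    low, high = fences(q1, q3)
    inside = [v for v in values if low <= v <= high]   # non-outliers
    outliers = [v for v in values if v < low or v > high]
    return min(inside), max(inside), outliers


steps = [4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 28]
print(fences(7, 13))                        # (-2.0, 22.0)
print(whiskers_and_outliers(steps, 7, 13))  # (4, 14, [28])
```
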
WE 4

Compare two box plots in context

Two classes’ end-of-term test scores (out of 100):

Class A: min = 40, Q₁ = 58, median = 72, Q₃ = 82, max = 95
Class B: min = 35, Q₁ = 50, median = 60, Q₃ = 70, max = 80

Compare the two distributions in context.

Centre (compare medians): A: 72, B: 60 → Class A scored higher on average.
Spread (compare IQRs): A: IQR = 82 − 58 = 24; B: IQR = 70 − 50 = 20 → Class A’s middle 50% is more spread out.
Conclusion: Class A has the higher centre but is more variable. Class B is lower but more consistent.

Note: always make TWO comparisons (centre + spread) IN CONTEXT. “Class A is higher” alone is not enough for full marks.
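As a rough illustration of the "centre plus spread" habit, the sketch below builds one sentence for each comparison from two five-number summaries. The `Summary` type and `compare` function are invented for this example, and the sentence wording is only a template; in an exam the context must come from the question.

```python
from typing import NamedTuple


class Summary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self):
        return self.q3 - self.q1


def compare(name_a, a, name_b, b):
    """Return (centre sentence, spread sentence) for two summaries."""
    def pick(label, value_a, value_b, meaning):
        if value_a == value_b:
            return f"Both classes have the same {label} ({value_a})."
        top, low = (name_a, name_b) if value_a > value_b else (name_b, name_a)
        big, small = max(value_a, value_b), min(value_a, value_b)
        detail = meaning.format(top=top, low=low)
        return f"{top} has the higher {label} ({big} vs {small}), so {detail}."

    centre = pick("median", a.median, b.median,
                  "{top} typically scored higher than {low}")
    spread = pick("IQR", a.iqr, b.iqr,
                  "the middle 50% of {top} scores is more spread out")
    return centre, spread


class_a = Summary(40, 58, 72, 82, 95)
class_b = Summary(35, 50, 60, 70, 80)
for sentence in compare("Class A", class_a, "Class B", class_b):
    print(sentence)
```
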
WE 5

Shape: symmetric or skewed?

A box plot has Q1 = 30, median = 50, Q3 = 80.

Comment on the symmetry of the distribution and state whether the data could plausibly be modelled by a normal distribution.

Compare the two halves of the box:
  Q₁ to median: 50 − 30 = 20
  median to Q₃: 80 − 50 = 30
The median is closer to Q₁, so the distribution is right-skewed (positively skewed) and NOT plausibly normal.

Note: if the median were equidistant from Q₁ and Q₃ (and the whiskers were of equal length), the data would be roughly symmetric and could be modelled by a normal distribution.
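The comparison of the two halves of the box can be written as a tiny Python check. The function `box_skew` is a hypothetical helper. It looks only at the box, not the whiskers, and it treats the halves as equal only when they match exactly, which is stricter than the "roughly symmetric" judgement you would make by eye.

```python
def box_skew(q1, median, q3):
    """Classify skew from the two halves of the box (whiskers are ignored)."""
    lower_half = median - q1   # width of the box below the median
    upper_half = q3 - median   # width of the box above the median
    if upper_half > lower_half:
        return "positively skewed (right-skewed)"
    if upper_half < lower_half:
        return "negatively skewed (left-skewed)"
    return "roughly symmetric"


print(box_skew(30, 50, 80))  # positively skewed (right-skewed)
```
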
WE 6

Interpret percentages of data from a box plot

The temperatures box plot from WE 1 summarises data from 30 days.

(a) Approximately what percentage of days had a maximum temperature above 30 °C?
(b) Approximately how many days had a maximum temperature between 22 °C and 30 °C?

(a) 30 °C = Q₃. Each quarter holds 25% of the data, so days above Q₃ ≈ 25%. Answer: approximately 25%.
(b) 22 to 30 = Q₁ to Q₃ = middle 50%, so days = 50% × 30 = 15. Answer: approximately 15 days.

Note: box plots split data into FOUR equal-count sections of 25% each. The IQR holds the middle 50% by definition. Percentages are exact at quartile positions and approximate elsewhere.
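The "25% per section" arithmetic can be sketched as a one-line helper. The name `approx_count` is made up for this example, and the result is only an estimate: a count such as 7.5 days shows why answers read from a box plot are stated as approximate.

```python
def approx_count(n, sections):
    """Estimate how many of n values fall in the given number of quarter-sections."""
    return n * sections / 4


print(approx_count(30, 2))  # 15.0 -> about 15 days between Q1 and Q3
print(approx_count(30, 1))  # 7.5  -> about 25% of the 30 days are above Q3
```
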

💡 Top tips

⚠ Common mistakes

Next up: Cumulative Frequency Graphs. Box plots show the five-number summary but lose detail in between. Cumulative frequency curves fill in every value — useful for grouped data where you don’t have the raw values. You’ll read off the median and quartiles by drawing horizontal lines at 25%, 50% and 75%.
