IB Maths AI SL · Statistics Toolkit · Paper 1 & 2 · Chapter synthesis

Interpreting Data

This is the synthesis note for the whole chapter. Once you know how to calculate the statistics (mean, median, IQR, standard deviation) and how to draw the diagrams (box plots, histograms, cumulative frequency), you have to decide which to use and interpret what they say. Two big skills: pick the right measure / diagram for the data, and compare two data sets with one measure of centre + one measure of spread + a sentence in context.

📘 What you need to know

Picking the right measure

Every statistical measure has a job. The choice depends on (i) the type of data and (ii) whether there are outliers.

Mean uses every value, so it’s sensitive to outliers. One extreme value pulls it strongly toward the extreme. Best for symmetric, outlier-free data.

Median just picks the middle value. Outliers can’t shift it — whether the largest value is 50 or 50 000, the middle stays put. Best when there are outliers or skew.

Mode = the most common value. Only really useful for qualitative data, or for discrete data with a clear winner.

Range uses only the min and max — one outlier wrecks it. Quick-and-dirty only.

IQR = spread of the middle 50%. Outliers don’t affect it. Pair it with the median.

Standard deviation uses every value — the most informative measure of spread, but pulled by outliers. Pair it with the mean.

The pairing rule: mean with standard deviation; median with IQR. Don’t cross them — saying “the mean is 65 and the IQR is 10” is mixing two different philosophies.
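The pairing rule can be sketched in Python. This is a minimal illustration, not an exam method: it assumes the 1.5 × IQR fences as the outlier test, quartiles taken as the medians of the lower and upper halves (your GDC may use a slightly different convention), and the population standard deviation. The function names `quartiles`, `has_outlier` and `pick_measures` and the two sample lists are made up for the example.

```python
from statistics import mean, median, pstdev


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2  # the middle value is left out when n is odd
    return median(data[:half]), median(data[len(data) - half:])


def has_outlier(values):
    """True if any value lies beyond the 1.5 x IQR fences (assumed test)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return any(v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr for v in values)


def pick_measures(values):
    """Pairing rule: median + IQR if there are outliers, else mean + SD."""
    if has_outlier(values):
        q1, q3 = quartiles(values)
        return {"centre": ("median", median(values)), "spread": ("IQR", q3 - q1)}
    return {"centre": ("mean", mean(values)), "spread": ("SD", pstdev(values))}


# Made-up illustrative data, not taken from the note
symmetric = [3, 4, 5, 6, 7, 8, 9]
with_outlier = [3, 4, 5, 5, 6, 6, 7, 8, 30]

print(pick_measures(symmetric))     # mean paired with SD
print(pick_measures(with_outlier))  # median paired with IQR
```

Note that the function never returns a crossed pair such as mean with IQR: the outlier check decides the whole pair at once.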

Comparing two data sets

The exam loves comparison questions: “compare the two distributions in context”. The reliable formula:

The comparison method:
ONE comment on centre (mean OR median)
+ ONE comment on spread (SD OR IQR)
+ a sentence each in CONTEXT (use real-world units)

Two box plots stacked on the same axis make this easy: you can read off both medians and both IQRs at a glance.

[Figure: Comparing two box plots, bus route times. Horizontal axis: journey time (minutes), 0 to 40. Route A: median = 18, IQR = 12. Route B: median = 16, IQR = 6, with an outlier at 35 min.]
Two box plots on the same scale make comparison instant: read off both medians and both IQRs, note any outliers. Route B is faster on average (lower median) and more consistent (smaller IQR), but has one anomalously long journey.

🧭 Recipe — compare two distributions

  1. Check for outliers: any extreme values? If yes ⇒ use median + IQR. If no (roughly symmetric) ⇒ use mean + standard deviation.
  2. Compare the centres: state which is bigger and by how much. Add a sentence in context (e.g. “Route B has a lower average journey time”).
  3. Compare the spreads: state which is bigger and by how much. Add a sentence in context (e.g. “Route B is more consistent”).
  4. Mention any unusual features: outliers, gaps, skew, bimodality. Explain in context what they might mean.
  5. Conclude: which set is “better” or more suitable for the question being asked? Tie it back to the real-world decision being made.
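Steps 2 and 3 of the recipe can be sketched as a small Python helper that turns two (centre, spread) pairs into the two comparison sentences. This is only an illustration: the function name `compare` and its parameters are hypothetical, it assumes lower is better for the centre (as with journey times), and the numbers are the bus-route medians and IQRs from the box plot figure above.

```python
def compare(name_a, a, name_b, b, unit, centre_label="median", spread_label="IQR"):
    """Build one sentence on centre and one on spread.

    a and b are (centre, spread) pairs already read from the data or diagram.
    Ties are not treated specially in this sketch.
    """
    (centre_a, spread_a), (centre_b, spread_b) = a, b

    # Step 2: which centre is lower, and by how much
    low_name = name_a if centre_a <= centre_b else name_b
    centre_line = (
        f"{low_name} has the lower {centre_label} "
        f"({min(centre_a, centre_b)} vs {max(centre_a, centre_b)} {unit}), "
        f"a difference of {abs(centre_a - centre_b)} {unit}."
    )

    # Step 3: which spread is smaller
    if spread_a <= spread_b:
        tight_name, wide_name = name_a, name_b
    else:
        tight_name, wide_name = name_b, name_a
    spread_line = (
        f"{tight_name} is more consistent: its {spread_label} is "
        f"{min(spread_a, spread_b)} {unit} against "
        f"{max(spread_a, spread_b)} {unit} for {wide_name}."
    )
    return [centre_line, spread_line]


# Route A: median 18, IQR 12. Route B: median 16, IQR 6 (from the figure).
for line in compare("Route A", (18, 12), "Route B", (16, 6), "min"):
    print(line)
```

The helper only produces the numerical half of the answer. Steps 4 and 5 (unusual features and the conclusion in context) still need a human sentence.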

Worked examples

WE 1

Mean vs median — choose the better average

The annual salaries (in $) of eight employees at a small company are:

28 000, 32 000, 35 000, 36 000, 38 000, 40 000, 42 000, 250 000

(a) Find the mean and the median. (b) State, with a reason, which is the better representative average.

(a) Compute both.
mean = (28 + 32 + 35 + 36 + 38 + 40 + 42 + 250) × 1000 / 8 = 501 000 / 8 = $62 625
median = midpoint of 4th and 5th values = (36 + 38)/2 × 1000 = $37 000

(b) Check for an outlier.
Q1 = 33 500, Q3 = 41 000, IQR = 7 500
Q3 + 1.5 × IQR = 41 000 + 11 250 = 52 250
250 000 > 52 250 ⇒ OUTLIER

Answer: the median ($37 000) is the better average.

The $250k salary (probably the director) drags the mean up to $62.6k, higher than ANY of the other 7 employees actually earn. The median ignores the outlier and gives a number that actually represents a typical employee.
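If you want to check the arithmetic in WE 1 away from the calculator, a short Python sketch follows. It assumes quartiles are the medians of the lower and upper halves of the sorted data, which is the convention that reproduces the Q1 and Q3 quoted in the solution. The variable names are illustrative.

```python
from statistics import mean, median

salaries = [28_000, 32_000, 35_000, 36_000, 38_000, 40_000, 42_000, 250_000]

data = sorted(salaries)
half = len(data) // 2

# Quartiles as medians of the lower and upper halves (assumed convention)
q1 = median(data[:half])
q3 = median(data[len(data) - half:])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# Any value above the upper fence counts as an outlier
outliers = [x for x in data if x > upper_fence]

print("mean        =", mean(data))
print("median      =", median(data))
print("Q1, Q3, IQR =", q1, q3, iqr)
print("upper fence =", upper_fence)
print("outliers    =", outliers)
```

Python's built-in `statistics.quantiles` uses interpolation methods that give different quartiles for this data set, which is why the halves are split by hand here.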
WE 2

Choose the right diagram

For each data set, state the most appropriate diagram and justify briefly.

(a) Lifespans (hours) of 200 lightbulbs.
(b) Favourite phone brand of 80 students.
(c) Marathon finish times for 5000 runners, with the question “what time was the 90th percentile?”.
(d) Comparing test scores from two classes.

(a) continuous, grouped data ⇒ histogram (or cumulative frequency graph)
(b) qualitative ⇒ bar chart (gaps between bars)
(c) continuous, want a percentile ⇒ cumulative frequency graph
(d) comparing two sets ⇒ two box plots on the same axis

Answer: (a) histogram · (b) bar chart · (c) CF graph · (d) box plots

“Percentile” is the giveaway for cumulative frequency. “Compare two sets” is the giveaway for parallel box plots. Match the diagram to the data type AND to the question being asked.
WE 3

Compare two box plots in context

The diagram in this note shows journey times for two bus routes. Route A: min 8, Q1 12, median 18, Q3 24, max 32. Route B: min 10, Q1 14, median 16, Q3 20, with an outlier at 35.

Compare the two distributions of journey times in context.

Step 1: centres
median A = 18 min, median B = 16 min
B is faster on average by 2 min

Step 2: spreads
IQR A = 24 − 12 = 12 min
IQR B = 20 − 14 = 6 min
B is more consistent (half the IQR of A)

Step 3: unusual features
B has an outlier at 35 min: a one-off long journey, perhaps traffic

Conclude: B is preferable for a typical commute (faster and more reliable), but be aware of occasional very long delays.

Answer: B is faster (median 16 vs 18) AND more consistent (IQR 6 vs 12).

Good comparison answers always have THREE parts: centre, spread, context. Mentioning the outlier separately scores extra. The phrase “more consistent” is what examiners want when the IQR is smaller.
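The question simply states that 35 min is an outlier on Route B. As a quick check, the sketch below applies the same Q3 + 1.5 × IQR fence used in WE 1 to the quartiles given in the question. The dictionary layout and the helper names `iqr` and `upper_fence` are illustrative choices.

```python
# Quartiles and medians as given in the question
route_a = {"q1": 12, "median": 18, "q3": 24}
route_b = {"q1": 14, "median": 16, "q3": 20}


def iqr(summary):
    """Interquartile range from a summary dictionary."""
    return summary["q3"] - summary["q1"]


def upper_fence(summary):
    """Values above Q3 + 1.5 x IQR are treated as outliers."""
    return summary["q3"] + 1.5 * iqr(summary)


print("IQR A =", iqr(route_a), " IQR B =", iqr(route_b))
print("upper fence B =", upper_fence(route_b))
print("35 min is an outlier on B:", 35 > upper_fence(route_b))
print("32 min is an outlier on A:", 32 > upper_fence(route_a))
```

Route B's fence is 20 + 1.5 × 6 = 29 min, so the 35 min journey sits beyond it. Route A's maximum of 32 min is inside its fence of 24 + 1.5 × 12 = 42 min, so it is not flagged.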
WE 4

Compare using mean and standard deviation

Two classes took the same maths test. The results were roughly symmetric with no outliers.

Class A: mean = 72, σ = 5  ·  Class B: mean = 68, σ = 12

Compare the two classes’ performance. Which class did better overall? Which class was more consistent?

No outliers, symmetric: use mean and σ.

Step 1: centres
mean A = 72, mean B = 68
A scored higher on average (by 4 marks)

Step 2: spreads
σ A = 5, σ B = 12
A had the smaller standard deviation: results clustered close to the mean
B had the larger spread: some students did much better, some much worse

Conclude: A did better overall AND was more consistent.

A smaller standard deviation means results are tightly clustered around the mean; that’s “consistency”. A larger SD spreads results out: high variability in performance.
WE 5

The effect of an outlier on mean vs median

An athlete’s 100m sprint times (in seconds) over six races were:

11.2, 11.5, 11.6, 11.7, 11.8, 18.5

(The 18.5 s race was one where the athlete fell.) (a) Find the mean and median including all six times. (b) Find the mean and median if the 18.5 s race is excluded. (c) Comment on which measure better represents the athlete’s typical performance.

(a) Include all 6.
sum = 11.2 + 11.5 + 11.6 + 11.7 + 11.8 + 18.5 = 76.3
mean = 76.3 / 6 = 12.72 s (2 dp)
median = (11.6 + 11.7)/2 = 11.65 s

(b) Exclude the 18.5.
sum = 57.8; mean = 57.8 / 5 = 11.56 s
median = 11.6 s (middle of 5)

(c) Compare.
mean: jumped from 11.56 to 12.72 (a HUGE shift)
median: barely changed (11.6 to 11.65)

Answer: the median better represents typical performance.

One freak race shifts the mean by over a second. The median is resistant: it stayed within 0.05 s of the “clean” value. This is exactly why the median is preferred when outliers are present.
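The same comparison can be reproduced in a few lines of Python. This is just a sketch using the six times from the question; the variable names `times`, `clean`, `mean_shift` and `median_shift` are illustrative.

```python
from statistics import mean, median

times = [11.2, 11.5, 11.6, 11.7, 11.8, 18.5]
clean = [t for t in times if t != 18.5]  # drop the race with the fall

for label, data in (("all six races", times), ("fall excluded", clean)):
    print(f"{label}: mean = {mean(data):.2f} s, median = {median(data):.2f} s")

# How far does each measure move when the outlier is included?
mean_shift = mean(times) - mean(clean)
median_shift = median(times) - median(clean)
print(f"mean shift = {mean_shift:.2f} s, median shift = {median_shift:.2f} s")
```

Putting the two shifts side by side makes the point of part (c) directly: the mean moves by more than a second, the median by about five hundredths of a second.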
WE 6

Make a recommendation from a comparison

Two cafés in town have been measuring how long customers wait to be served (in minutes):

Café X: median = 5,  IQR = 2  ·  Café Y: median = 6,  IQR = 1

A customer values speed but also dislikes unpredictable wait times. Compare the two cafés and recommend one, justifying your answer.

Step 1: centre
median X = 5, median Y = 6
X is faster on average by 1 min

Step 2: spread
IQR X = 2, IQR Y = 1
Y is more consistent (half the spread)

Step 3: the customer’s priorities
speed: X wins · predictability: Y wins
It is a trade-off that depends on which they weight more.

Recommend Y: only 1 min slower on average, but the wait is much more predictable.

There is no single “right” answer here: the question is asking you to JUSTIFY. As long as you reference both the centre AND the spread AND link it back to the customer’s stated priorities, you’ll get full marks. Either Y (consistency) or X (speed) is defensible.

💡 Top tips

⚠ Common mistakes

This was the final note of the Statistics Toolkit chapter. You now have the complete kit: collect data (sampling) ⇒ summarise it (measures of centre and spread) ⇒ display it (box plot / histogram / cumulative frequency) ⇒ interpret it (this note). Every IB AI SL stats question on Paper 1 or 2 is built from these blocks. With practice, the “what to do” becomes automatic, and you can focus all your effort on getting the calculation right and writing the context sentence.
