IB Maths AI SL · Statistics Toolkit · Paper 1 & 2 · Chapter synthesis · ~8 min read
Interpreting Data
This is the synthesis note for the whole chapter. Once you know how to calculate the statistics (mean, median, IQR, standard deviation) and how to draw the diagrams (box plots, histograms, cumulative frequency), you have to decide which to use and interpret what they say. Two big skills: pick the right measure / diagram for the data, and compare two data sets with one measure of centre + one measure of spread + a sentence in context.
📘 What you need to know
Outliers? Use median & IQR — both are resistant to extreme values. Mean and standard deviation get dragged by outliers.
Roughly symmetric data? Use mean & standard deviation — they use every value, so they’re the most informative when nothing extreme distorts them.
Mode is for qualitative data (favourite colour, brand). For numbers, prefer median or mean.
Diagram by data type: bar chart (qualitative or discrete) · histogram (continuous, grouped) · cumulative frequency graph (continuous, for percentiles) · box plot (5-number summary, great for comparing).
Compare = centre + spread + context: pick one measure of centre, one of spread, comment in real-world units. Never just quote numbers.
Smaller spread = more consistent. Whether that’s “better” depends on context (consistent delivery times = good; consistent low test scores = bad).
Picking the right measure
Every statistical measure has a job. The choice depends on (i) the type of data and (ii) whether there are outliers.
Mean uses every value, so it’s sensitive to outliers. One extreme value pulls it strongly toward the extreme. Best for symmetric, outlier-free data.
Median just picks the middle value. Outliers barely shift it: whether the largest value is 50 or 50 000, the middle stays put. Best when there are outliers or skew.
Mode = the most common value. Only really useful for qualitative data, or for discrete data with a clear winner.
Range uses only the min and max — one outlier wrecks it. Quick-and-dirty only.
IQR = spread of the middle 50%. Outliers don’t affect it. Pair it with the median.
Standard deviation uses every value — the most informative measure of spread, but pulled by outliers. Pair it with the mean.
The pairing rule: mean with standard deviation; median with IQR. Don’t cross them — saying “the mean is 65 and the IQR is 10” is mixing two different philosophies.
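One way to see why the pairings matter is to compute every measure on a small data set, then swap the largest value for an extreme one. The sketch below is illustrative only: the data values and the helper names `quartiles` and `summary` are made up, the quartiles are taken as the medians of the lower and upper halves (assumed here to match typical calculator output), and the standard deviation is the population version.

```python
from statistics import mean, median, pstdev

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2          # the middle value is left out when n is odd
    return median(data[:half]), median(data[-half:])

def summary(values):
    """All the measures from this section for one data set."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": pstdev(values),      # population standard deviation
    }

# Hypothetical data: identical except for the largest value
clean = [10, 12, 13, 14, 15, 16, 18]
with_outlier = [10, 12, 13, 14, 15, 16, 60]

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    rounded = {name: round(value, 2) for name, value in summary(data).items()}
    print(label, rounded)
```

In this made-up example the median and IQR come out the same for both lists, while the mean, range and standard deviation all move with the extreme value.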
Comparing two data sets
The exam loves comparison questions: “compare the two distributions in context”. The reliable formula:
The comparison method
ONE comment on centre (mean OR median)
+ ONE comment on spread (SD OR IQR)
+ a sentence each in CONTEXT (use real-world units)
Two box plots drawn on the same scale make this easy: you can read off both medians and both IQRs at a glance, and note any outliers.
[Figure: box plots of journey times for Route A and Route B. Route B is faster on average (lower median) and more consistent (smaller IQR), but has one anomalously long journey.]
🧭 Recipe — compare two distributions
Check for outliers: any extreme values? If yes ⇒ use median + IQR. If no (roughly symmetric) ⇒ use mean + standard deviation.
Compare the centres: state which is bigger and by how much. Add a sentence in context (e.g. “Route B has a lower average journey time”).
Compare the spreads: state which is bigger and by how much. Add a sentence in context (e.g. “Route B is more consistent”).
Mention any unusual features: outliers, gaps, skew, bimodality. Explain in context what they might mean.
Conclude: which set is “better” or more suitable for the question being asked? Tie it back to the real-world decision being made.
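The first three steps of the recipe can be sketched as a small function. This is a simplified illustration, not a marking scheme: it uses the 1.5×IQR outlier check as the only trigger (it does not test for skew), the function names are hypothetical, and the journey times are made-up values. The context sentences and the conclusion still have to be written by hand.

```python
from statistics import mean, median, pstdev

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2          # the middle value is left out when n is odd
    return median(data[:half]), median(data[-half:])

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1

def outliers(values):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    step = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - step or v > q3 + step]

def compare(a, b):
    """Pick one matched pair of measures and evaluate it for both data sets."""
    if outliers(a) or outliers(b):
        return {"centre": ("median", median(a), median(b)),
                "spread": ("IQR", iqr(a), iqr(b))}
    return {"centre": ("mean", mean(a), mean(b)),
            "spread": ("SD", pstdev(a), pstdev(b))}

# Hypothetical journey times in minutes
route_a = [8, 12, 15, 18, 21, 24, 32]
route_b = [10, 14, 15, 16, 18, 20, 35]

result = compare(route_a, route_b)
for name, value_a, value_b in result.values():
    print(f"{name}: Route A = {value_a}, Route B = {value_b}")
```

Because the same pair of measures is applied to both sets, the sketch cannot mix a mean for one route with a median for the other.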
Worked examples
WE 1
Mean vs median — choose the better average
The annual salaries (in $) of eight employees at a small company are:
28 000, 32 000, 35 000, 36 000, 38 000, 40 000, 42 000, 250 000
(a) Find the mean and the median. (b) State, with a reason, which is the better representative average.
(a) Compute both.
mean = (28 + 32 + 35 + 36 + 38 + 40 + 42 + 250) × 1000 / 8 = 501 000 / 8 = $62 625
median = midpoint of 4th & 5th values = (36 + 38)/2 × 1000 = $37 000
(b) Check for an outlier.
Q1 = 33 500, Q3 = 41 000, IQR = 7 500
Q3 + 1.5 × IQR = 41 000 + 11 250 = 52 250
250 000 > 52 250 ⇒ OUTLIER
Answer: the median ($37 000) is the better average.
Note: the $250k salary (probably the director) drags the mean up to $62.6k, higher than ANY of the other 7 employees actually earn. The median ignores the outlier and gives a number that actually represents a typical employee.
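The working for this example can be checked with a few lines of Python. This is only a sketch: the salary values are the ones used in the solution, the variable names are made up, and the quartiles are taken as the medians of the lower and upper four values.

```python
from statistics import mean, median

salaries = [28_000, 32_000, 35_000, 36_000, 38_000, 40_000, 42_000, 250_000]

data = sorted(salaries)
half = len(data) // 2
q1 = median(data[:half])        # median of the lower four salaries
q3 = median(data[-half:])       # median of the upper four salaries
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr    # anything above this counts as an outlier

flagged = [x for x in data if x > upper_fence]

print("mean:", mean(data), "median:", median(data))
print("Q1:", q1, "Q3:", q3, "IQR:", iqr, "upper fence:", upper_fence)
print("outliers:", flagged)
```

Only the $250 000 salary lies above the fence, which agrees with the conclusion that the median is the better average here.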
WE 2
Choose the right diagram
For each data set, state the most appropriate diagram and justify briefly.
(a) Lifespans (hours) of 200 lightbulbs. (b) Favourite phone brand of 80 students. (c) Marathon finish times for 5000 runners, with the question “what time was the 90th percentile?”. (d) Comparing test scores from two classes.
(a) Continuous, grouped data ⇒ histogram (or cumulative frequency graph)
(b) Qualitative ⇒ bar chart (gaps between bars)
(c) Continuous, want a percentile ⇒ cumulative frequency graph
(d) Comparing two sets ⇒ two box plots on the same axis
Answer: (a) histogram · (b) bar chart · (c) CF graph · (d) box plots
Note: “percentile” is the giveaway for cumulative frequency. “Compare two sets” is the giveaway for parallel box plots. Match diagram to data type AND to the question being asked.
WE 3
Compare two box plots in context
The diagram in this note shows journey times for two bus routes. Route A: min 8, Q1 12, median 18, Q3 24, max 32. Route B: min 10, Q1 14, median 16, Q3 20, with an outlier at 35.
Compare the two distributions of journey times in context.
Step 1: centres
median A = 18 min, median B = 16 min
B is faster on average by 2 min
Step 2: spreads
IQR A = 24 − 12 = 12 min
IQR B = 20 − 14 = 6 min
B is more consistent (half the IQR of A)
Step 3: unusual features
B has an outlier at 35 min, a one-off long journey, perhaps traffic
Conclude
B is preferable for a typical commute (faster and more reliable), but be aware of occasional very long delays
Answer: B is faster (median 16 vs 18) AND more consistent (IQR 6 vs 12)
Note: good comparison answers always have THREE parts: centre, spread, context. Mentioning the outlier separately scores extra. The phrase “more consistent” is what examiners want when IQR is smaller.
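The solution takes the 35-minute journey as an outlier because the diagram marks it as one. As a hedged check, the sketch below applies the 1.5×IQR rule to the quartiles quoted in the question; the dictionary layout and function names are made up for illustration.

```python
# Quartiles quoted in the question (minutes)
route_a = {"q1": 12, "median": 18, "q3": 24}
route_b = {"q1": 14, "median": 16, "q3": 20}

def iqr(summary):
    return summary["q3"] - summary["q1"]

def upper_fence(summary):
    """Largest value that is NOT an outlier under the 1.5 x IQR rule."""
    return summary["q3"] + 1.5 * iqr(summary)

for name, summary in [("A", route_a), ("B", route_b)]:
    print(f"Route {name}: median {summary['median']}, IQR {iqr(summary)}, "
          f"upper fence {upper_fence(summary)}")

print("35 is an outlier on Route B:", 35 > upper_fence(route_b))
print("32 is an outlier on Route A:", 32 > upper_fence(route_a))
```

Route B's fence is 20 + 1.5 × 6 = 29 minutes, so 35 lies beyond it, while Route A's maximum of 32 sits inside its fence of 42.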
WE 4
Compare using mean and standard deviation
Two classes took the same maths test. The results were roughly symmetric with no outliers.
Class A: mean = 72, σ = 5 · Class B: mean = 68, σ = 12
Compare the two classes’ performance. Which class did better overall? Which class was more consistent?
No outliers, symmetric ⇒ use mean and σ
Step 1: centres
mean A = 72, mean B = 68
A scored higher on average (by 4 marks)
Step 2: spreads
σ A = 5, σ B = 12
A had the smaller standard deviation: results clustered close to the mean
B had the larger spread: some students did much better, some much worse
Conclude
A did better overall AND was more consistent
Note: a smaller standard deviation means results are tightly clustered around the mean; that’s “consistency”. A larger SD spreads results out, meaning high variability in performance.
WE 5
The effect of an outlier on mean vs median
An athlete’s 100m sprint times (in seconds) over six races were:
11.2, 11.5, 11.6, 11.7, 11.8, 18.5
(The 18.5 s race was one where the athlete fell.) (a) Find the mean and median including all six times. (b) Find the mean and median if the 18.5 s race is excluded. (c) Comment on which measure better represents the athlete’s typical performance.
(a) Include all 6.
sum = 11.2 + 11.5 + 11.6 + 11.7 + 11.8 + 18.5 = 76.3
mean = 76.3 / 6 = 12.72 s (2 dp)
median = (11.6 + 11.7)/2 = 11.65 s
(b) Exclude the 18.5.
sum = 57.8; mean = 57.8 / 5 = 11.56 s
median = 11.6 s (middle of 5)
(c) Compare.
mean: jumped from 11.56 to 12.72 (a HUGE shift)
median: barely changed (11.6 to 11.65)
Answer: the median better represents typical performance.
Note: one freak race shifts the mean by over a second. The median is resistant: it stayed within 0.05 s of the “clean” value. This is exactly why median is preferred when outliers are present.
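The same comparison can be reproduced in Python as a quick check of the arithmetic. The variable names below are made up; the times are the six given in the question.

```python
from statistics import mean, median

times = [11.2, 11.5, 11.6, 11.7, 11.8, 18.5]
clean = [t for t in times if t != 18.5]     # drop the race with the fall

mean_shift = mean(times) - mean(clean)
median_shift = median(times) - median(clean)

print("all six:   mean", round(mean(times), 2), "median", round(median(times), 2))
print("excluding: mean", round(mean(clean), 2), "median", round(median(clean), 2))
print("mean shift", round(mean_shift, 2), "median shift", round(median_shift, 2))
```

The mean moves by more than a second when the 18.5 s race is included, while the median moves by about 0.05 s.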
WE 6
Make a recommendation from a comparison
Two cafés in town have been measuring how long customers wait to be served (in minutes):
Café X: median = 5, IQR = 2 · Café Y: median = 6, IQR = 1
A customer values speed but also dislikes unpredictable wait times. Compare the two cafés and recommend one, justifying your answer.
Step 1: centre
median X = 5, median Y = 6
X is faster on average by 1 min
Step 2: spread
IQR X = 2, IQR Y = 1
Y is more consistent (half the spread)
Step 3: the customer’s priorities
speed: X wins · predictability: Y wins
This is a trade-off; it depends which they weight more
Answer: recommend Y. It is only 1 min slower on average, but the wait is much more predictable.
Note: there is no single “right” answer here; the question is asking you to JUSTIFY. As long as you reference both the centre AND the spread AND link it back to the customer’s stated priorities, you’ll get full marks. Either Y (consistency) or X (speed) is defensible.
💡 Top tips
Always pair correctly: mean ↔ standard deviation; median ↔ IQR. Don’t mix.
Comparison phrasing: “On average X is faster/higher/larger… AND X is more/less consistent…”. Two clauses, both in context.
Quote actual numbers when comparing — “median 16 vs 18 min” beats “B has a lower median”.
Spot outliers first, before you pick measures. The 1.5×IQR rule decides for you.
“More consistent” or “less variable” — these phrases match a smaller IQR or SD. Examiners love them.
Recommendation questions need a JUSTIFICATION: link your stats back to what the customer/teacher/user values.
⚠ Common mistakes
Quoting numbers without context: “Median is 16” earns nothing. “Median journey time on Route B is 16 minutes, 2 min less than Route A” earns the marks.
Forgetting to check for outliers before choosing mean vs median — you can’t pick the right measure without that step.
Mixing measures: comparing mean of one set with median of another. Use the SAME measure for both.
Treating “smaller spread” as always better: in some contexts (e.g. exam scores) bigger spread might mean some students did really well. Context decides.
One-line answers: comparison questions usually carry 3–4 marks and need at least 3 sentences. Centre + spread + context, minimum.
This was the final note of the Statistics Toolkit chapter. You now have the complete kit: collect data (sampling) ⇒ summarise it (measures of centre and spread) ⇒ display it (box plot / histogram / cumulative frequency) ⇒ interpret it (this note). Every IB AI SL stats question on Paper 1 or 2 is built from these blocks. With practice, the “what to do” becomes automatic, and you can focus all your effort on getting the calculation right and writing the context sentence.