Histograms
A frequency histogram is the right chart for continuous grouped data: heights, weights, times, lengths. Bars touch (no gaps), class widths are equal, and the height of each bar = the frequency of that class. From a histogram you read off the modal class, count how many values fall above or below a threshold, and estimate the mean using mid-interval values.
📘 What you need to know
Histogram = continuous data only: the x-axis is a number line, so bars must touch — no gaps between classes.
Bar chart vs histogram: bar chart for qualitative (colours, names) or discrete (number of pets); histogram for continuous (mass, height, time).
Equal class widths at AI SL — so the bar height directly shows the frequency.
Modal class = the class with the tallest bar.
Read frequencies straight off the height — you can count how many values lie below or above any class boundary by summing the relevant bars.
Shape tells the story: roughly symmetric and bell-shaped ⇒ data could be modelled by a normal distribution. Long tail to one side ⇒ skewed.
The anatomy of a histogram
A histogram is just a frequency table drawn as bars. The x-axis carries the variable (a number line with class boundaries marked) and the y-axis carries the frequency. Bars sit between class boundaries with no gap, because the variable is continuous — every value between, say, 240 and 280 ms is possible.
Histogram of reaction times: bars touch (no gaps), all class widths are 40 ms, bar height = frequency. The tallest bar (280 ≤ t < 320) is the modal class.
Histogram or bar chart? Don’t mix them up
The choice depends on the type of data, not on how it looks. Use this quick test:
Bar chart — gaps between bars — for qualitative data (colour, brand, country) or discrete data (number of pets, goals scored). The categories are separate, so the gaps show that.
Histogram — bars touch — for continuous data (height, time, mass). The variable lives on a continuous number line, so the bars must too.
Mental shortcut: if it makes sense to ask “what’s halfway between two values?” it’s continuous ⇒ histogram. “Halfway between brown eyes and blue eyes” makes no sense ⇒ bar chart.
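The quick test above can be written as a tiny decision rule. This is only a sketch: the function name `choose_chart` and the three data-type labels are made up for illustration and are not part of the syllabus.

```python
def choose_chart(data_type: str) -> str:
    """Return the chart suggested by the quick test for a given data type.

    The labels 'qualitative', 'discrete' and 'continuous' are illustrative
    names chosen for this sketch.
    """
    if data_type == "continuous":
        return "histogram"   # measured values on a number line: bars touch
    if data_type in ("qualitative", "discrete"):
        return "bar chart"   # separate categories or counts: gaps between bars
    raise ValueError(f"unknown data type: {data_type!r}")


print(choose_chart("continuous"))   # height, time, mass
print(choose_chart("discrete"))     # number of pets, goals scored
print(choose_chart("qualitative"))  # colour, brand, country
```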
What histograms are good for
1. Spotting the modal class at a glance — tallest bar.
2. Seeing the shape — symmetric? skewed? bell-shaped? This tells you whether a normal model is plausible.
3. Estimating the mean — multiply each class midpoint by its frequency, sum, and divide by total frequency. This uses the formula below.
Estimating the mean from a histogram (or grouped table)
$\bar{x} = \dfrac{\sum f_i x_i}{n}$
where $x_i$ = mid-interval value, $f_i$ = bar height (frequency), $n = \sum f_i$
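As a rough illustration of the formula, the sketch below computes the estimated mean for the reaction-time histogram used on this page (classes of width 40 ms from 200 to 400 ms, bar heights 6, 11, 14, 7, 2). The function name `estimate_mean` is hypothetical. In an exam the same calculation would normally be done on the GDC.

```python
def estimate_mean(boundaries, frequencies):
    """Estimate the mean of grouped data from class boundaries and bar heights."""
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need one more boundary than frequencies")
    # Mid-interval value x_i of each class = average of its two boundaries.
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    n = sum(frequencies)                                        # n = sum of f_i
    total = sum(f * x for f, x in zip(frequencies, midpoints))  # sum of f_i * x_i
    return total / n


# Reaction-time histogram: classes of width 40 ms from 200 to 400 ms.
boundaries = [200, 240, 280, 320, 360, 400]
frequencies = [6, 11, 14, 7, 2]
print(estimate_mean(boundaries, frequencies))  # 288.0
```

The result, 288 ms, is an estimate because each student is treated as if their time sat exactly at the midpoint of its class.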
🧠Recipe — drawing or reading a histogram
Check the data is continuous with equal class widths. If yes ⇒ histogram. If discrete/qualitative ⇒ bar chart instead.
Set up axes: x-axis = the variable (a number line with class boundaries); y-axis = frequency. Label both with units.
Draw bars touching, with height = frequency for each class. Equal widths.
Read off what’s asked: modal class (tallest bar); count above/below a value (sum the bar heights); total = sum of all heights.
If asked for the mean, use mid-interval values: $\bar{x} = \dfrac{\sum f_i x_i}{n}$ (GDC: enter midpoints and frequencies in 1-Var Stats).
Worked examples
WE 1
Draw a histogram from a frequency table
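Reaction times (t, in ms) of 40 students were recorded:
Reaction time t (ms) | Frequency
200 ≤ t < 240 | 6
240 ≤ t < 280 | 11
280 ≤ t < 320 | 14
320 ≤ t < 360 | 7
360 ≤ t < 400 | 2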
(a) Draw a frequency histogram. (b) State the modal class.
(a) Draw the histogram
x-axis: reaction time (ms), from 200 to 400
y-axis: frequency, from 0 to 15 (need to fit 14)
Five bars, equal width 40, no gaps
Heights = 6, 11, 14, 7, 2 (see diagram above)
(b) Modal class = tallest bar
Tallest = bar with frequency 14
Answer: (b) 280 ≤ t < 320
Tip: always check that the frequencies add back to the given total: 6 + 11 + 14 + 7 + 2 = 40 ✓. Quick sanity check.
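For a quick check of this example away from paper, the bars can be sketched as rows of text. This is only an illustrative stand-in for the drawn diagram (the bars run sideways, and the helper name `render_histogram` is hypothetical), but it shows the same heights, the total check and the modal class.

```python
def render_histogram(boundaries, frequencies, symbol="#"):
    """Return one text line per class: the class interval, a bar, and its frequency."""
    lines = []
    for lo, hi, f in zip(boundaries, boundaries[1:], frequencies):
        lines.append(f"{lo} <= t < {hi} | {symbol * f} {f}")
    return lines


boundaries = [200, 240, 280, 320, 360, 400]   # equal class width of 40 ms
frequencies = [6, 11, 14, 7, 2]

for line in render_histogram(boundaries, frequencies):
    print(line)

# Sanity check from the worked example: the bar heights add up to 40 students.
assert sum(frequencies) == 40

# Modal class = the class with the tallest bar.
i = frequencies.index(max(frequencies))
modal_class = (boundaries[i], boundaries[i + 1])
print("modal class:", modal_class)  # (280, 320)
```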
WE 2
Read frequencies straight off a histogram
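A histogram shows daily screen time (t, hours) for 50 students. The bar heights are:
Screen time t (hours) | Bar height
0 ≤ t < 2 | 5
2 ≤ t < 4 | 12
4 ≤ t < 6 | 18
6 ≤ t < 8 | 10
8 ≤ t < 10 | 5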
(a) State the modal class. (b) Find how many students used screens for less than 4 hours. (c) Find the percentage who used screens for 6 hours or more.
(a) Tallest bar
Heights: 5, 12, 18, 10, 5 ⇒ 18 is biggest
Answer: (a) 4 ≤ t < 6
(b) Add the bars below 4
5 + 12 = 17 students
(c) Add the bars from 6 upward, then convert to %
10 + 5 = 15 students
15/50 × 100 = 30%
Answers: (b) 17 students · (c) 30%
Tip: “less than 4” means everything to the left of the 4 boundary, so sum the bars whose right edge is ≤ 4. Same idea for “6 or more”: sum from the bar starting at 6 onward.
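The counting rule in this example can be sketched in a few lines. The class boundaries 0, 2, 4, 6, 8, 10 are an assumption inferred from the answers (five equal classes, with 4 ≤ t < 6 as the third bar). The helper names are hypothetical, and both helpers only make sense when the threshold is a class boundary.

```python
boundaries = [0, 2, 4, 6, 8, 10]   # ASSUMED class boundaries (hours)
heights = [5, 12, 18, 10, 5]       # bar heights from the worked example


def count_below(threshold):
    """Sum the bars whose right edge is <= threshold (a class boundary)."""
    return sum(f for hi, f in zip(boundaries[1:], heights) if hi <= threshold)


def count_at_least(threshold):
    """Sum the bars whose left edge is >= threshold (a class boundary)."""
    return sum(f for lo, f in zip(boundaries, heights) if lo >= threshold)


total = sum(heights)                    # 50 students
below_4 = count_below(4)                # 5 + 12 = 17
six_or_more = count_at_least(6)         # 10 + 5 = 15
percentage = 100 * six_or_more / total  # 30.0

print(below_4, six_or_more, percentage)
```

If the threshold fell inside a class (say 5 hours), the histogram alone could not give an exact count, which is why exam questions of this type use class boundaries.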
WE 3
Estimate the mean using mid-interval values
The lengths (cm) of 30 cucumbers are shown in a histogram with these bars:
Length (cm) | Bar height
10 to 14 | 3
14 to 18 | 8
18 to 22 | 12
22 to 26 | 6
26 to 30 | 1
Estimate the mean length.
Step 1: mid-interval values
Midpoints: 12, 16, 20, 24, 28
Step 2: sum of f × x
3(12) + 8(16) + 12(20) + 6(24) + 1(28)
= 36 + 128 + 240 + 144 + 28
= 576
Step 3: divide by total frequency
x̄ = 576/30 = 19.2 cm
Answer: estimated mean = 19.2 cm
Note: it’s an ESTIMATE because we don’t know the exact length of each cucumber, only its class. Always say “estimate” when the data is grouped.
WE 4
Which diagram: bar chart or histogram?
For each data set, state whether a bar chart or histogram is more appropriate. Justify each choice.
(a) Favourite holiday destination of 100 tourists. (b) Weights (kg) of 80 watermelons. (c) Number of siblings of 60 students.
(a) “Destination” is a category, not a number
Qualitative data ⇒ BAR CHART (with gaps)
(b) Weight is measured and takes any value in a range
Continuous data ⇒ HISTOGRAM (no gaps)
(c) Number of siblings: 0, 1, 2, 3, …
Discrete data ⇒ BAR CHART (with gaps)
Answers: (a) bar chart · (b) histogram · (c) bar chart
Tip: careful with (c). “Number of” things is COUNTED, not measured, so it’s discrete. Bar chart, not histogram. You can’t have 2.7 siblings.
WE 5
Describe the shape of a distribution
A histogram of exam scores (out of 100) for a class of 60 students shows bars of heights (in order of increasing score): 3, 10, 18, 17, 9, 3.
Comment on the shape of the distribution. Is a normal model appropriate?
Describe what you see
Heights rise: 3, 10, 18 … then fall: 17, 9, 3
Single peak around 50−70
Left half roughly mirrors right half
Conclude
Distribution is roughly symmetric and bell-shaped
Answer: YES, a normal model is appropriate
Tip: a normal distribution looks like a symmetric bell. Two warning signs to reject: a long tail on one side (skewed) or two peaks (bimodal). Neither here, so normal is reasonable.
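The two checks used in this example (one peak, rough mirror symmetry) can be sketched as below. This is an informal heuristic, not a statistical test for normality. It assumes the six heights quoted in the solution are all the bars (they do sum to 60), and both helper names are hypothetical.

```python
heights = [3, 10, 18, 17, 9, 3]   # bar heights from the worked example


def has_single_peak(h):
    """True if the heights rise (or stay level) to one peak and then fall (or stay level)."""
    peak = h.index(max(h))
    rising = all(a <= b for a, b in zip(h[:peak], h[1:peak + 1]))
    falling = all(a >= b for a, b in zip(h[peak:], h[peak + 1:]))
    return rising and falling


def mirror_gaps(h):
    """Absolute difference between each bar and its mirror image about the centre."""
    return [abs(a - b) for a, b in zip(h, reversed(h))]


print(has_single_peak(heights))   # True
print(mirror_gaps(heights))       # [0, 1, 1, 1, 1, 0]
```

Small mirror gaps relative to the bar heights support “roughly symmetric”. How small is small enough remains a judgement call that should be stated in words in the answer.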
WE 6
Find a missing bar height
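A histogram shows the masses (w, in kg) of 60 puppies in five equal classes from 50 to 75. Four of the bar heights are visible:
Mass w (kg) | Bar height
50 ≤ w < 55 | 8
55 ≤ w < 60 | 15
60 ≤ w < 65 | ?
65 ≤ w < 70 | 12
70 ≤ w < 75 | 5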
(a) Find the missing frequency. (b) Hence state the modal class.
(a) Frequencies sum to the total
8 + 15 + ? + 12 + 5 = 60
40 + ? = 60
? = 20
(b) Compare all five
8, 15, 20, 12, 5 ⇒ 20 is biggest
Answers: (a) missing freq = 20 · (b) modal class 60 ≤ w < 65
Tip: missing-bar problems are pure arithmetic: total frequency MINUS the known bar heights. Then re-rank to find the modal class.
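The arithmetic in this example is short enough to check in a few lines. The sketch assumes the hidden bar is the third one, as the working 8 + 15 + ? + 12 + 5 suggests. The variable names are illustrative only.

```python
total = 60
known = {0: 8, 1: 15, 3: 12, 4: 5}      # visible bars, keyed by class position
boundaries = [50, 55, 60, 65, 70, 75]   # five equal classes from 50 to 75 kg

# Missing frequency = total frequency minus the known bar heights.
missing = total - sum(known.values())   # 60 - 40 = 20

# Rebuild all five heights, then find the tallest bar for the modal class.
heights = [known.get(i, missing) for i in range(5)]
i = heights.index(max(heights))
modal_class = (boundaries[i], boundaries[i + 1])

print(missing, modal_class)   # 20 (60, 65)
```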
💡 Top tips
Always label both axes with the variable name AND units. Easy mark, easy to forget.
Bars must touch: a gap between bars on a histogram usually means continuous data has been treated as discrete.
Read “less than” / “at least” carefully: “less than 4” counts every bar up to and including the one ending at 4; “at least 4” starts FROM the bar opening at 4.
For the mean, use mid-interval values on the GDC: enter midpoints as L1, frequencies as L2, then run 1-Var Stats.
Comment on shape: “roughly symmetric, single peak” supports a normal model; “long right tail” means skewed.
⚠ Common mistakes
Drawing gaps between bars — that’s a bar chart. Histograms have bars that touch.
Using a histogram for discrete data — “number of pets” or “goals scored” needs a bar chart, not a histogram.
Estimating the mean from class boundaries instead of midpoints — always use the midpoint of each class as the xi.
Saying the mean is exact — from grouped data the mean is an ESTIMATE. State that explicitly.
Forgetting the totals must add up — a quick check that all bar heights sum to n catches arithmetic slips.
Next up: Interpreting Data — the chapter wrap-up. You’ll see how to compare two data sets using a combination of the diagrams (box plots, histograms, cumulative frequency) and the statistical measures (mean, median, IQR, standard deviation) from the previous notes. The pattern: always compare a measure of centre + a measure of spread + a comment in context.