
Histograms

A frequency histogram is the right chart for continuous grouped data: heights, weights, times, lengths. Bars touch (no gaps), class widths are equal, and the height of each bar = the frequency of that class. From a histogram you read off the modal class, count how many values fall above or below a threshold, and estimate the mean using mid-interval values.

📘 What you need to know

The anatomy of a histogram

A histogram is just a frequency table drawn as bars. The x-axis carries the variable (a number line with class boundaries marked) and the y-axis carries the frequency. Bars sit between class boundaries with no gap, because the variable is continuous — every value between, say, 240 and 280 ms is possible.

[Figure: histogram of the reaction times of 40 students. x-axis: reaction time, t (ms), from 200 to 400; y-axis: frequency, from 0 to 15; bar heights 6, 11, 14, 7, 2; labels mark “no gaps between bars” and the “modal class”.]
Histogram of reaction times: bars touch (no gaps), all class widths are 40 ms, bar height = frequency. The tallest bar (280 ≤ t < 320) is the modal class.

Histogram or bar chart? Don’t mix them up

The choice depends on the type of data, not on how it looks. Use this quick test:

Bar chart — gaps between bars — for qualitative data (colour, brand, country) or discrete data (number of pets, goals scored). The categories are separate, so the gaps show that.

Histogram — bars touch — for continuous data (height, time, mass). The variable lives on a continuous number line, so the bars must too.

Mental shortcut: if it makes sense to ask “what’s halfway between two values?” it’s continuous ⇒ histogram. “Halfway between brown eyes and blue eyes” makes no sense ⇒ bar chart.

What histograms are good for

1. Spotting the modal class at a glance — tallest bar.

2. Seeing the shape — symmetric? skewed? bell-shaped? This tells you whether a normal model is plausible.

3. Estimating the mean — multiply each class midpoint by its frequency, sum, and divide by total frequency. This uses the formula below.

Estimating the mean from a histogram (or grouped table): $\bar{x} = \frac{\sum f_i x_i}{n}$, where $x_i$ = mid-interval value, $f_i$ = bar height, $n = \sum f_i$
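
The formula translates directly into a few lines of Python. This is a minimal sketch, assuming each class is stored as a (lower, upper) boundary pair alongside its frequency; `grouped_mean` is a hypothetical helper name, not a GDC or library function. The data is the reaction-time table from WE 1.

```python
def grouped_mean(classes, freqs):
    """Estimate the mean of grouped data: x̄ = Σ f_i x_i / n.

    classes: list of (lower, upper) class boundaries
    freqs:   list of frequencies (bar heights) in the same order
    """
    if len(classes) != len(freqs):
        raise ValueError("need exactly one frequency per class")
    n = sum(freqs)                                          # n = Σ f_i
    midpoints = [(lo + hi) / 2 for lo, hi in classes]       # x_i = mid-interval values
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # Σ f_i x_i
    return sum_fx / n


# Reaction times (ms) of 40 students, from WE 1
classes = [(200, 240), (240, 280), (280, 320), (320, 360), (360, 400)]
freqs = [6, 11, 14, 7, 2]
print(grouped_mean(classes, freqs))  # 288.0
```

As with the GDC, the answer is an estimate: every value is treated as if it sat at its class midpoint.
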

🧭 Recipe — drawing or reading a histogram

  1. Check the data is continuous with equal class widths. If yes ⇒ histogram. If discrete/qualitative ⇒ bar chart instead.
  2. Set up axes: x-axis = the variable (a number line with class boundaries); y-axis = frequency. Label both with units.
  3. Draw bars touching, with height = frequency for each class. Equal widths.
  4. Read off what’s asked: modal class (tallest bar); count above/below a value (sum the heights of the bars on the relevant side of that boundary); total = sum of all heights.
  5. If asked for the mean, use mid-interval values: $\bar{x} = \sum f_i x_i / n$ (GDC: enter midpoints and frequencies in 1-Var Stats).
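
Steps 1 and 4 of the recipe can also be checked mechanically. A minimal sketch, assuming classes are (lower, upper) pairs listed in order; the helper names `check_classes` and `modal_class` are hypothetical. Deciding whether the data is continuous in the first place still needs your judgement.

```python
def check_classes(classes):
    """Step 1: confirm every class has the same width and neighbouring bars touch.

    Returns the common class width; raises ValueError otherwise.
    """
    widths = {hi - lo for lo, hi in classes}
    if len(widths) != 1:
        raise ValueError(f"unequal class widths: {sorted(widths)}")
    # Each class must start exactly where the previous one ends (no gaps, no overlaps).
    for (_, prev_hi), (next_lo, _) in zip(classes, classes[1:]):
        if prev_hi != next_lo:
            raise ValueError(f"classes do not touch at {prev_hi} / {next_lo}")
    return widths.pop()


def modal_class(classes, freqs):
    """Step 4: the modal class is the class with the tallest bar.

    If two bars tie, max() returns the first one, so check for ties by eye.
    """
    tallest = max(range(len(freqs)), key=lambda i: freqs[i])
    return classes[tallest]


classes = [(200, 240), (240, 280), (280, 320), (320, 360), (360, 400)]
freqs = [6, 11, 14, 7, 2]
print(check_classes(classes))       # 40
print(modal_class(classes, freqs))  # (280, 320)
print(sum(freqs))                   # 40, the total frequency
```
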

Worked examples

WE 1

Draw a histogram from a frequency table

Reaction times (t, in ms) of 40 students were recorded:

200≤t<240: 6  ·  240≤t<280: 11  ·  280≤t<320: 14  ·  320≤t<360: 7  ·  360≤t<400: 2

(a) Draw a frequency histogram. (b) State the modal class.

(a) Draw the histogram
x-axis: reaction time (ms), from 200 to 400
y-axis: frequency, from 0 to 15 (needs to fit 14)
five bars, equal width 40, no gaps
heights = 6, 11, 14, 7, 2 (see diagram above)
(b) Modal class = tallest bar
tallest = bar with frequency 14
Answer: (b) 280 ≤ t < 320
Note: always check the total frequency adds back to the given count: 6 + 11 + 14 + 7 + 2 = 40 ✓. A quick sanity check.

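
If you want to check the bar heights before drawing on paper, a quick text sketch works. This is a minimal sketch, assuming the WE 1 table; `text_histogram` is a hypothetical helper, and its bars run sideways (one row per class) rather than upright as on an exam diagram.

```python
def text_histogram(classes, freqs):
    """Print one row per class with a bar of '#' whose length equals the frequency.

    Returns the total frequency so it can be checked against the given count.
    """
    for (lo, hi), f in zip(classes, freqs):
        print(f"{lo:>4} to {hi:<4}| {'#' * f} {f}")
    return sum(freqs)


classes = [(200, 240), (240, 280), (280, 320), (320, 360), (360, 400)]
freqs = [6, 11, 14, 7, 2]
total = text_histogram(classes, freqs)
# Sanity check from the worked example: the bars must add back to 40 students.
assert total == 40
```
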
WE 2

Read frequencies straight off a histogram

A histogram shows daily screen time (t, hours) for 50 students. The bar heights are:

0≤t<2: 5  ·  2≤t<4: 12  ·  4≤t<6: 18  ·  6≤t<8: 10  ·  8≤t<10: 5

(a) State the modal class. (b) Find how many students used screens for less than 4 hours. (c) Find the percentage who used screens for 6 hours or more.

(a) Tallest bar
heights: 5, 12, 18, 10, 5 ⇒ 18 is the biggest
(b) Add the bars below 4
5 + 12 = 17 students
(c) Add the bars from 6 upward, then convert to %
10 + 5 = 15 students
15/50 × 100 = 30%
Answer: (a) 4 ≤ t < 6 · (b) 17 students · (c) 30%
Note: “less than 4” means everything to the left of the 4 boundary, so sum the bars whose right edge is ≤ 4. Same idea for “6 or more”: sum from the bar starting at 6 onward.

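
The two counting rules from the note can be written as filters on the class boundaries. A minimal sketch, assuming classes are (lower, upper) pairs with the lower boundary included; the function names are hypothetical. It refuses thresholds that fall inside a class, because a histogram alone cannot tell you how that bar's values are split.

```python
def _check_boundary(classes, boundary):
    """The count is exact only when the threshold is one of the class boundaries."""
    edges = {lo for lo, _ in classes} | {hi for _, hi in classes}
    if boundary not in edges:
        raise ValueError(f"{boundary} falls inside a class; exact count unknown")


def count_below(classes, freqs, boundary):
    """Number of values < boundary: sum the bars whose right edge is <= boundary."""
    _check_boundary(classes, boundary)
    return sum(f for (lo, hi), f in zip(classes, freqs) if hi <= boundary)


def count_at_or_above(classes, freqs, boundary):
    """Number of values >= boundary: sum the bars whose left edge is >= boundary."""
    _check_boundary(classes, boundary)
    return sum(f for (lo, hi), f in zip(classes, freqs) if lo >= boundary)


# Daily screen time (hours) of 50 students, from WE 2
classes = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
freqs = [5, 12, 18, 10, 5]
n = sum(freqs)
below_4 = count_below(classes, freqs, 4)         # 17
six_plus = count_at_or_above(classes, freqs, 6)  # 15
print(below_4, 100 * six_plus / n)               # 17 30.0
```
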
WE 3

Estimate the mean using mid-interval values

The lengths (cm) of 30 cucumbers are shown in a histogram with these bars:

10≤L<14: 3  ·  14≤L<18: 8  ·  18≤L<22: 12  ·  22≤L<26: 6  ·  26≤L<30: 1

Estimate the mean length.

Step 1: mid-interval values
midpoints: 12, 16, 20, 24, 28
Step 2: sum of f × x
3(12) + 8(16) + 12(20) + 6(24) + 1(28) = 36 + 128 + 240 + 144 + 28 = 576
Step 3: divide by total frequency
x̄ = 576/30 = 19.2 cm
Answer: estimated mean = 19.2 cm
Note: it’s an ESTIMATE because we don’t know the exact length of each cucumber, only its class. Always say “estimate” when the data is grouped.

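
One way to see why the answer is only an estimate: the true mean must lie between the mean you would get if every cucumber sat at its class's lower boundary and the mean if every cucumber sat just under its upper boundary. A minimal sketch, assuming the WE 3 table; `mean_bounds` is a hypothetical helper.

```python
def mean_bounds(classes, freqs):
    """Smallest and largest values the true mean could take, given only the classes.

    Low end: every value at its class's lower boundary.
    High end: every value at its upper boundary (the true mean stays below this,
    because each class excludes its upper boundary).
    """
    n = sum(freqs)
    low = sum(f * lo for (lo, _), f in zip(classes, freqs)) / n
    high = sum(f * hi for (_, hi), f in zip(classes, freqs)) / n
    return low, high


# Cucumber lengths (cm), from WE 3
classes = [(10, 14), (14, 18), (18, 22), (22, 26), (26, 30)]
freqs = [3, 8, 12, 6, 1]
low, high = mean_bounds(classes, freqs)
print(round(low, 2), round(high, 2))  # 17.2 21.2
```

With equal class widths, the mid-interval estimate sits exactly halfway between the two ends: (17.2 + 21.2)/2 = 19.2 cm.
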
WE 4

Which diagram: bar chart or histogram?

For each data set, state whether a bar chart or histogram is more appropriate. Justify each choice.

(a) Favourite holiday destination of 100 tourists.   (b) Weights (kg) of 80 watermelons.   (c) Number of siblings of 60 students.

(a) “destination” is a category, not a number
qualitative data ⇒ BAR CHART (with gaps)
(b) weight is measured and can take any value in a range
continuous data ⇒ HISTOGRAM (no gaps)
(c) number of siblings: 0, 1, 2, 3, …
discrete data ⇒ BAR CHART (with gaps)
Answer: (a) bar chart · (b) histogram · (c) bar chart
Note: careful with (c). “Number of” things are COUNTED, not measured, so the data is discrete: bar chart, not histogram. You can’t have 2.7 siblings.

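
The decision rule can be written down as a lookup, but only after you have decided the type of data, which is the part the exam is really testing. A minimal sketch; `choose_chart` and its three type labels are hypothetical.

```python
def choose_chart(data_type):
    """Map a data type to the appropriate diagram (the rule used in WE 4)."""
    charts = {
        "qualitative": "bar chart (with gaps)",  # categories, e.g. holiday destination
        "discrete": "bar chart (with gaps)",     # counted, e.g. number of siblings
        "continuous": "histogram (no gaps)",     # measured, e.g. weight in kg
    }
    if data_type not in charts:
        raise ValueError(f"unknown data type: {data_type!r}")
    return charts[data_type]


for item, kind in [("destination", "qualitative"),
                   ("watermelon weight", "continuous"),
                   ("number of siblings", "discrete")]:
    print(f"{item}: {choose_chart(kind)}")
```
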
WE 5

Describe the shape of a distribution

A histogram of exam scores (out of 100) for a class of 60 students shows bars of heights:

30≤s<40: 3  ·  40≤s<50: 10  ·  50≤s<60: 18  ·  60≤s<70: 17  ·  70≤s<80: 9  ·  80≤s<90: 3

Comment on the shape of the distribution. Is a normal model appropriate?

Describe what you see
heights rise: 3, 10, 18 … then fall: 17, 9, 3
single peak around 50–70
left half roughly mirrors right half
Conclude
the distribution is roughly symmetric and bell-shaped
Answer: YES, a normal model is appropriate.
Note: a normal distribution looks like a symmetric bell. Two warning signs that should make you reject it: a long tail on one side (skewed) or two peaks (bimodal). Neither appears here, so a normal model is reasonable.

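
The two warning signs can be checked roughly in code: count the peaks, and pair each bar with its mirror image from the other end to see whether the halves match. A minimal sketch; `count_peaks` and `mirror_pairs` are hypothetical helpers, and the final judgement of “roughly symmetric” is still made by eye, not by a fixed cut-off.

```python
def count_peaks(heights):
    """Count bars taller than both neighbours (zero-height bars assumed beyond each end).

    A flat top made of equal tallest bars is not counted, so inspect those by eye.
    """
    padded = [0] + list(heights) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


def mirror_pairs(heights):
    """Pair the first bar with the last, the second with the second-to-last, and so on."""
    half = len(heights) // 2
    return [(heights[i], heights[-1 - i]) for i in range(half)]


# Exam scores of 60 students, from WE 5 (classes 30 ≤ s < 40 up to 80 ≤ s < 90)
heights = [3, 10, 18, 17, 9, 3]
print(count_peaks(heights))   # 1  -> one peak, not bimodal
print(mirror_pairs(heights))  # [(3, 3), (10, 9), (18, 17)] -> halves nearly match
```
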
WE 6

Find a missing bar height

A histogram shows the masses (kg) of 60 puppies in five equal classes from 50 to 75. Four of the bar heights are visible:

50≤w<55: 8  ·  55≤w<60: 15  ·  60≤w<65: ?  ·  65≤w<70: 12  ·  70≤w<75: 5

(a) Find the missing frequency. (b) Hence state the modal class.

(a) Frequencies sum to the total
8 + 15 + ? + 12 + 5 = 60
40 + ? = 60
? = 20
(b) Compare all five
8, 15, 20, 12, 5 ⇒ 20 is the biggest
Answer: (a) missing frequency = 20 · (b) modal class 60 ≤ w < 65
Note: missing-bar problems are pure arithmetic: total frequency MINUS the known bar heights. Then re-rank to find the modal class.
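
Missing-bar problems reduce to one subtraction, then a re-rank. A minimal sketch, assuming the hidden bar is marked with `None`; `missing_frequency` is a hypothetical helper.

```python
def missing_frequency(total, known):
    """Missing bar height = total frequency minus the sum of the known bar heights."""
    missing = total - sum(known)
    if missing < 0:
        raise ValueError("the known bars already exceed the total frequency")
    return missing


# Puppy masses (kg), from WE 6; None marks the bar that cannot be read
classes = [(50, 55), (55, 60), (60, 65), (65, 70), (70, 75)]
freqs = [8, 15, None, 12, 5]

missing = missing_frequency(60, [f for f in freqs if f is not None])
freqs = [missing if f is None else f for f in freqs]  # fill in the hidden bar
modal = classes[freqs.index(max(freqs))]              # re-rank: tallest bar wins
print(missing, modal)  # 20 (60, 65)
```
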


Next up: Interpreting Data — the chapter wrap-up. You’ll see how to compare two data sets using a combination of the diagrams (box plots, histograms, cumulative frequency) and the statistical measures (mean, median, IQR, standard deviation) from the previous notes. The pattern: always compare a measure of centre + a measure of spread + a comment in context.
