IB Maths AA HL · Topic 4: Statistics & Probability · Paper 1 & 2
Cumulative Frequency Graphs
When data is grouped, you lose the individual values — but the running total of frequencies, plotted as a smooth S-curve, lets you read off the median, quartiles, percentiles, and “how many are above/below” without needing the raw data. Plot points at upper class boundaries, then join with a smooth increasing curve.
📘 What you need to know
Cumulative frequency at x = total count of values ≤ x (running sum of frequencies).
For grouped data, plot points (upper class boundary, cumulative frequency).
Connect with a smooth, increasing curve — never a stepped or jagged line.
Median: read the x-value at cumulative frequency = n/2.
Quartiles: read the x-values at cumulative frequency = n/4 (Q1) and 3n/4 (Q3).
p-th percentile: read the x-value where cumulative frequency = (p/100) × n.
“How many below value v”: read up from v on the x-axis to the curve, then across to the y-axis.
“How many above value v”: read the “below” count as before, then subtract it from n.
Anatomy of a cumulative frequency graph
Cumulative frequency curve: an “S” shape rising from 0 to n. Draw horizontal lines at n/4, n/2, 3n/4 across to the curve, then drop down to the x-axis to find Q1, median, Q3.
Two ways to use the curve
Find a position → value (y → x): draw a horizontal line from the cumulative frequency across to the curve, then drop down to the x-axis.
Find a value → position (x → y): draw a vertical line from the data value up to the curve, then read across to the y-axis.
Without a graph, you can estimate the same readings by linear interpolation between known cumulative-frequency points. This treats the curve as a straight line between neighbouring points, so it closely approximates what you would read off a smooth curve.
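The interpolation can be written out explicitly. As a sketch, assume two neighbouring cumulative-frequency points (x_1, c_1) and (x_2, c_2); these symbols are introduced here for illustration only and do not appear elsewhere in these notes.

```latex
% Position -> value: data value x at a target cumulative frequency c, where c_1 \le c \le c_2
x \approx x_1 + \frac{c - c_1}{c_2 - c_1}\,(x_2 - x_1)

% Value -> position: cumulative frequency c at a data value v, where x_1 \le v \le x_2
c \approx c_1 + \frac{v - x_1}{x_2 - x_1}\,(c_2 - c_1)
```

The first form matches the percentile calculation in the worked examples; the second matches the "how many below v" calculation.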
🧭 Recipe — using cumulative frequency
Build the cumulative frequency table: running totals of the frequencies, indexed at upper class boundaries.
Plot points (upper boundary, cumulative frequency) and connect with a smooth curve from (lowest lower bound, 0) upward.
For median/quartiles/percentiles: find the target cumulative position (n/2, n/4, etc.), draw horizontal to the curve, drop vertical to read the data value.
For “how many ≤ v”: draw a vertical line from v up to the curve, then read across to the y-axis.
For “how many > v”: subtract the answer above from n.
Worked examples
WE 1
Construct a cumulative frequency table
The times in minutes (t) taken by 50 students to complete an exam are summarised below.
Time (min)     Frequency
30 ≤ t < 40    5
40 ≤ t < 50    12
50 ≤ t < 60    18
60 ≤ t < 70    10
70 ≤ t < 80    5
Construct a cumulative frequency table.
Step 1: Running totals (cumulative frequency at the upper boundary of each class)
≤ 40: 5
≤ 50: 5 + 12 = 17
≤ 60: 17 + 18 = 35
≤ 70: 35 + 10 = 45
≤ 80: 45 + 5 = 50
CF table: (40, 5), (50, 17), (60, 35), (70, 45), (80, 50)
Check: the cumulative frequency reaches n = 50 at the highest upper boundary.
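The running totals can also be checked with a few lines of Python. This is a minimal sketch using the WE 1 data; the variable names are illustrative choices, not part of the syllabus.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from WE 1 (exam times in minutes).
upper_bounds = [40, 50, 60, 70, 80]
frequencies = [5, 12, 18, 10, 5]

# Running totals: each entry counts every value up to that upper boundary.
cumulative = list(accumulate(frequencies))

# Points to plot: (upper class boundary, cumulative frequency).
cf_points = list(zip(upper_bounds, cumulative))

for bound, cf in cf_points:
    print(f"up to {bound}: {cf}")
```

The last cumulative value should equal n, which is a quick check that no class was missed.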
WE 2
Estimate median, quartiles, and IQR
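Using the cumulative frequency from WE 1 (n = 50), estimate the median, Q1, Q3, and the interquartile range.
Step 1: Target cumulative positions
Q1: n/4 = 12.5; median: n/2 = 25; Q3: 3n/4 = 37.5
Step 2: Interpolate between the CF points on either side of each target
Q1, between (40, 5) and (50, 17): 40 + (12.5 − 5)/(17 − 5) × 10 = 40 + 6.25 = 46.25 ≈ 46.3
Median, between (50, 17) and (60, 35): 50 + (25 − 17)/(35 − 17) × 10 ≈ 50 + 4.4 = 54.4
Q3, between (60, 35) and (70, 45): 60 + (37.5 − 35)/(45 − 35) × 10 = 60 + 2.5 = 62.5
Step 3: IQR = Q3 − Q1 ≈ 62.5 − 46.25 = 16.25
Q1 ≈ 46.3 min; median ≈ 54.4 min; Q3 ≈ 62.5 min; IQR ≈ 16.25 min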
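The quartile estimates for this example can be reproduced by straight-line interpolation between the WE 1 points. The sketch below assumes the curve starts at (30, 0), the lowest lower bound; `value_at_position` is a hypothetical helper name, and a reading from a hand-drawn smooth curve may differ slightly from these figures.

```python
def value_at_position(points, target):
    """Estimate the data value where the cumulative frequency equals target.

    points: (boundary, cumulative frequency) pairs in increasing order.
    Uses straight-line interpolation between neighbouring points.
    """
    for (x1, c1), (x2, c2) in zip(points, points[1:]):
        if c1 <= target <= c2 and c2 > c1:
            return x1 + (target - c1) / (c2 - c1) * (x2 - x1)
    raise ValueError("target is outside the range of the cumulative frequencies")


# WE 1 points, with (30, 0) added as the assumed start of the curve.
points = [(30, 0), (40, 5), (50, 17), (60, 35), (70, 45), (80, 50)]
n = points[-1][1]

q1 = value_at_position(points, n / 4)        # position 12.5
median = value_at_position(points, n / 2)    # position 25
q3 = value_at_position(points, 3 * n / 4)    # position 37.5
iqr = q3 - q1

print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
# prints: Q1 = 46.25, median = 54.44, Q3 = 62.50, IQR = 16.25
```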
WE 3
Estimate the number/percentage above or below a value
The heights, in cm, of 40 plants are summarised below (cumulative frequency given at upper bounds).
Height (cm)    Cum. freq.
≤ 10           3
≤ 20           11
≤ 30           25
≤ 40           35
≤ 50           40
(a) Estimate the number of plants no taller than 25 cm. (b) Estimate the percentage of plants taller than 35 cm.
(a) CF at 25, between (20, 11) and (30, 25)
CF ≈ 11 + (25 − 20)/(30 − 20) × (25 − 11) = 11 + 0.5 × 14 = 18
→ about 18 plants ≤ 25 cm
(b) CF at 35, between (30, 25) and (40, 35)
CF ≈ 25 + (35 − 30)/(40 − 30) × (35 − 25) = 25 + 0.5 × 10 = 30
Plants > 35 cm: 40 − 30 = 10
Percentage: 10/40 × 100 = 25%
Answer: (a) ≈ 18 plants; (b) 25%
Note: “above v” = n − (CF at v); express the result as a percentage of n when asked.
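The same value → position reading can be sketched in Python using the plant-height table. `count_at_value` is a hypothetical helper name, and the straight-line interpolation is an approximation to reading a smooth curve.

```python
def count_at_value(points, v):
    """Estimate the cumulative frequency at data value v.

    points: (upper bound, cumulative frequency) pairs in increasing order.
    Uses straight-line interpolation between neighbouring points.
    """
    for (x1, c1), (x2, c2) in zip(points, points[1:]):
        if x1 <= v <= x2:
            return c1 + (v - x1) / (x2 - x1) * (c2 - c1)
    raise ValueError("v is outside the range of the tabulated bounds")


# Plant-height data: (upper bound in cm, cumulative frequency).
points = [(10, 3), (20, 11), (30, 25), (40, 35), (50, 40)]
n = points[-1][1]

below_25 = count_at_value(points, 25)        # plants no taller than 25 cm
above_35 = n - count_at_value(points, 35)    # "above" = n minus "below"
percent_above_35 = above_35 / n * 100

print(below_25, above_35, percent_above_35)
# prints: 18.0 10.0 25.0
```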
WE 4
Estimate a percentile
For the same plant data (n = 40), estimate the 80th percentile.
Step 1: Target cumulative position
80% of 40 = 0.8 × 40 = 32
Step 2: CF reaches 32 between (30, 25) and (40, 35)
Estimate: 30 + (32 − 25)/(35 − 25) × 10 = 30 + 7/10 × 10 = 30 + 7 = 37
80th percentile ≈ 37 cm
This means about 80% of plants are 37 cm or shorter.
WE 5
Construct a box plot from cumulative frequency data
Using the CF data from WE 1 (exam times, n = 50), state the five-number summary you would use to draw a box plot.
Step 1: Min and max from class boundaries
Lowest class is 30 ≤ t < 40 → Min ≈ 30
Highest class is 70 ≤ t < 80 → Max ≈ 80
Step 2: Quartiles (from WE 2)
Q₁ ≈ 46.3; Median ≈ 54.4; Q₃ ≈ 62.5
Five-number summary: (30, 46.3, 54.4, 62.5, 80), all in minutes
Note: box plots for grouped data use class boundaries for min and max, because the exact values aren’t known.
WE 6
Find class frequencies and modal class from a cumulative frequency table
A cumulative frequency table for 50 measurements is given below.
Upper bound    Cum. freq.
≤ 10           4
≤ 20           13
≤ 30           25
≤ 40           38
≤ 50           45
≤ 60           50
(a) Find the frequency in each class. (b) Identify the modal class. (c) Identify the class containing the median.
(a) Class frequencies = differences of consecutive cumulative values
0 ≤ x < 10: 4
10 ≤ x < 20: 13 − 4 = 9
20 ≤ x < 30: 25 − 13 = 12
30 ≤ x < 40: 38 − 25 = 13
40 ≤ x < 50: 45 − 38 = 7
50 ≤ x < 60: 50 − 45 = 5
(b) Modal class = highest frequency
freq 13 → modal class is 30 ≤ x < 40
(c) Median position = n/2 = 25
CF first reaches 25 at upper bound 30
→ median is in 20 ≤ x < 30
Answer: modal class 30 ≤ x < 40; median class 20 ≤ x < 30
Note: individual class frequencies are differences of consecutive CFs. It is the same idea as building the table, working backwards.
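Working backwards from a cumulative table is easy to check in code. The sketch below uses the WE 6 table and assumes equal-width classes of 10 starting at 0, as in the worked solution; the variable names are illustrative.

```python
# WE 6 cumulative frequency table.
upper_bounds = [10, 20, 30, 40, 50, 60]
cumulative = [4, 13, 25, 38, 45, 50]
class_width = 10  # assumption: equal-width classes, first class starting at 0

# Class frequency = this cumulative value minus the previous one (0 before the first class).
frequencies = [c - prev for prev, c in zip([0] + cumulative[:-1], cumulative)]

# Modal class: the class with the highest frequency.
modal_index = frequencies.index(max(frequencies))
modal_class = (upper_bounds[modal_index] - class_width, upper_bounds[modal_index])

# Median class: the first class whose cumulative frequency reaches n/2.
n = cumulative[-1]
median_index = next(i for i, c in enumerate(cumulative) if c >= n / 2)
median_class = (upper_bounds[median_index] - class_width, upper_bounds[median_index])

print(frequencies)    # [4, 9, 12, 13, 7, 5]
print(modal_class)    # (30, 40)
print(median_class)   # (20, 30)
```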
💡 Top tips
Always plot at upper class boundaries — never lower bounds or midpoints.
Connect with a smooth curve, not a staircase or zig-zag.
Show your reading lines on the graph (dashed horizontal then vertical) — examiners want to see the method.
For “how many above v”, subtract from n; don’t try to read from above directly.
Linear interpolation between known cumulative-frequency points gives a close approximation to a reading from a smooth curve, which is useful when no graph is provided.
⚠ Common mistakes
Plotting at midpoints or lower bounds — the cumulative count “≤ x” only reaches that count at the upper boundary.
Drawing a stepped line instead of a smooth curve.
Using (n + 1)/2 instead of n/2 for the median position. The (n + 1)/2 rule is for raw discrete data, not for grouped cumulative frequency.
Forgetting to convert to a percentage when the question asks for a percentage.
Reading “more than v” directly from the curve — read the “less than or equal to v” then subtract from n.
Next: Histograms. While cumulative frequency graphs answer “how many up to here?”, histograms answer “how is the data distributed?”. Same grouped data, different lens — bars whose heights show frequency directly. The shape reveals modal classes, skewness, and whether the data could be modelled by a normal distribution.