IB Maths AA HL Topic 4 — Statistics & Probability Paper 1 & 2 ~8 min read

Cumulative Frequency Graphs

When data is grouped, you lose the individual values — but the running total of frequencies, plotted as a smooth S-curve, lets you read off the median, quartiles, percentiles, and “how many are above/below” without needing the raw data. Plot points at upper class boundaries, then join with a smooth increasing curve.

📘 What you need to know

Anatomy of a cumulative frequency graph

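[Figure: cumulative frequency curve. Vertical axis: cumulative frequency, marked at 0, n/4, n/2, 3n/4 and n. Horizontal axis: data values (upper class boundaries), with Q₁, median and Q₃ marked.]
Cumulative frequency curve: an “S” shape rising from 0 to n. Draw horizontal lines at n/4, n/2 and 3n/4 across to the curve, then drop down to the x-axis to read Q₁, the median and Q₃.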

Two ways to use the curve

Find a position → value (y → x): draw a horizontal line at the cumulative frequency across to the curve, then drop down to the x-axis.
Find a value → position (x → y): draw a vertical line up from the data value to the curve, then read across to the y-axis.
Without a graph, you can also estimate using linear interpolation between known cumulative-frequency points (the calculation is exactly what reading off a smooth curve approximates).
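
The two directions of reading can be sketched in Python as a pair of interpolation helpers. This is a minimal illustration, not part of the syllabus: the function names `value_at_position` and `position_at_value` are made up for this example, and the points are the exam-time data from WE 1 with the starting point (30, 0) added.

```python
# Cumulative frequency points (upper boundary, cumulative frequency) from WE 1,
# with the starting point (30, 0) at the lowest lower bound.
points = [(30, 0), (40, 5), (50, 17), (60, 35), (70, 45), (80, 50)]


def value_at_position(points, target_cf):
    """Position -> value (y -> x): data value where the CF reaches target_cf."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target_cf <= y1 and y1 > y0:
            return x0 + (target_cf - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target_cf is outside the range of the table")


def position_at_value(points, value):
    """Value -> position (x -> y): estimated CF at the given data value."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("value is outside the range of the table")


print(round(value_at_position(points, 25), 1))  # 54.4
print(position_at_value(points, 55))            # 26.0
```

Both helpers treat the curve as straight-line segments between the plotted points, so their output is an estimate in the same sense as a reading taken from a hand-drawn curve.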

🧭 Recipe — using cumulative frequency

  1. Build the cumulative frequency table: running totals of the frequencies, indexed at upper class boundaries.
  2. Plot points (upper boundary, cumulative frequency) and connect with a smooth curve from (lowest lower bound, 0) upward.
  3. For median/quartiles/percentiles: find the target cumulative position (n/2, n/4, etc.), draw horizontal to the curve, drop vertical to read the data value.
  4. For “how many ≤ v?”: draw a vertical line up from v to the curve, then read across to the y-axis.
  5. For “how many > v?”: subtract the answer to step 4 from n.

Worked examples

WE 1

Construct a cumulative frequency table

The times in minutes (t) taken by 50 students to complete an exam are summarised below.

Time (min)   30 ≤ t < 40   40 ≤ t < 50   50 ≤ t < 60   60 ≤ t < 70   70 ≤ t < 80
Frequency    5             12            18            10            5

Construct a cumulative frequency table.

Step 1: Running totals (cumulative frequency at the upper boundary of each class)
  ≤ 40: 5
  ≤ 50: 5 + 12 = 17
  ≤ 60: 17 + 18 = 35
  ≤ 70: 35 + 10 = 45
  ≤ 80: 45 + 5 = 50
Answer: CF table: (40, 5), (50, 17), (60, 35), (70, 45), (80, 50)
Check: the cumulative frequency reaches n = 50 at the highest upper boundary.
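
The running totals can be checked with a short Python sketch. It uses `itertools.accumulate` from the standard library; the variable names are chosen for this example only.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from the exam-time table
upper_bounds = [40, 50, 60, 70, 80]
frequencies = [5, 12, 18, 10, 5]

# Pair each upper boundary with the running total of frequencies up to it
cf_table = list(zip(upper_bounds, accumulate(frequencies)))

print(cf_table)  # [(40, 5), (50, 17), (60, 35), (70, 45), (80, 50)]
```

The last cumulative value should always equal the total frequency n, which is a quick way to catch an arithmetic slip.
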
WE 2

Estimate median, quartiles, and IQR

Using the cumulative frequency from WE 1 (n = 50), estimate the median, Q1, Q3, and the interquartile range.

Step 1: Target cumulative positions
  Median: n/2 = 25; Q₁: n/4 = 12.5; Q₃: 3n/4 = 37.5
Step 2: Median: the cumulative frequency reaches 25 between (50, 17) and (60, 35)
  Median ≈ 50 + (25 − 17)/(35 − 17) × 10 = 50 + 8/18 × 10 ≈ 54.4
Step 3: Q₁: the cumulative frequency reaches 12.5 between (40, 5) and (50, 17)
  Q₁ ≈ 40 + (12.5 − 5)/(17 − 5) × 10 = 40 + 7.5/12 × 10 = 46.25
Step 4: Q₃: the cumulative frequency reaches 37.5 between (60, 35) and (70, 45)
  Q₃ ≈ 60 + (37.5 − 35)/(45 − 35) × 10 = 60 + 2.5/10 × 10 = 62.5
Step 5: IQR
  IQR ≈ 62.5 − 46.25 = 16.25
Answer: Median ≈ 54.4 min; Q₁ ≈ 46.3 min; Q₃ ≈ 62.5 min; IQR ≈ 16.3 min
Note: linear interpolation closely approximates what you would read off a smooth curve; the two estimates should agree to within reading accuracy but need not be identical.
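
As a cross-check of the arithmetic above, here is a small Python sketch that locates the class containing each target position with `bisect_left` and then interpolates. The helper name `value_at` is hypothetical, and it assumes the target lies between 0 and n.

```python
from bisect import bisect_left

# CF points from WE 1, with the starting point (30, 0) added
bounds = [30, 40, 50, 60, 70, 80]
cum_freq = [0, 5, 17, 35, 45, 50]


def value_at(target):
    """Estimate the data value at a cumulative position (assumes 0 <= target <= n)."""
    i = bisect_left(cum_freq, target)  # first index with cum_freq[i] >= target
    if i == 0:
        return bounds[0]
    x0, x1 = bounds[i - 1], bounds[i]
    y0, y1 = cum_freq[i - 1], cum_freq[i]
    return x0 + (target - y0) / (y1 - y0) * (x1 - x0)


n = cum_freq[-1]
q1, median, q3 = value_at(n / 4), value_at(n / 2), value_at(3 * n / 4)
iqr = q3 - q1

print(round(q1, 2), round(median, 1), round(q3, 2), round(iqr, 2))
# 46.25 54.4 62.5 16.25
```
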
WE 3

Estimate the number/percentage above or below a value

The heights, in cm, of 40 plants are summarised below (cumulative frequency given at upper bounds).

Height (cm)   ≤ 10   ≤ 20   ≤ 30   ≤ 40   ≤ 50
Cum. freq.    3      11     25     35     40

(a) Estimate the number of plants no taller than 25 cm. (b) Estimate the percentage of plants taller than 35 cm.

(a) CF at 25: between (20, 11) and (30, 25)
  CF ≈ 11 + (25 − 20)/(30 − 20) × (25 − 11) = 11 + 0.5 × 14 = 18
  → about 18 plants ≤ 25 cm
(b) CF at 35: between (30, 25) and (40, 35)
  CF ≈ 25 + (35 − 30)/(40 − 30) × (35 − 25) = 25 + 0.5 × 10 = 30
  Plants > 35 cm: 40 − 30 = 10
  Percentage: 10/40 × 100 = 25%
Answer: (a) ≈ 18 plants; (b) 25%
Note: “above v” = n − (CF at v); express it as a percentage of n.
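
The same value → position calculation can be sketched in Python. The helper `cf_at` is a name invented for this example, and it only covers heights from 10 cm to 50 cm, because the table does not give the lower bound of the first class.

```python
# Plant heights: upper bounds and cumulative frequencies from the table
heights = [10, 20, 30, 40, 50]
cum_freq = [3, 11, 25, 35, 40]


def cf_at(value):
    """Estimate the cumulative frequency at a data value by linear interpolation."""
    pts = list(zip(heights, cum_freq))
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("value lies outside the tabulated bounds")


n = cum_freq[-1]
at_most_25 = cf_at(25)          # part (a): plants no taller than 25 cm
above_35 = n - cf_at(35)        # part (b): plants taller than 35 cm
percent_above_35 = above_35 / n * 100

print(at_most_25, above_35, percent_above_35)  # 18.0 10.0 25.0
```
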
WE 4

Estimate a percentile

For the same plant data (n = 40), estimate the 80th percentile.

Step 1: Target cumulative position
  80% of 40 = 0.8 × 40 = 32
Step 2: The cumulative frequency reaches 32 between (30, 25) and (40, 35)
  Estimate: 30 + (32 − 25)/(35 − 25) × 10 = 30 + 7/10 × 10 = 30 + 7 = 37
Answer: 80th percentile ≈ 37 cm
Note: this means about 80% of the plants are 37 cm or shorter.
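
The percentile steps generalise to any percentage p. The sketch below wraps them in a hypothetical `percentile` function for the plant data; it assumes the target position lies above the first tabulated cumulative frequency (3), since the table gives no point below that.

```python
# Plant heights: upper bounds and cumulative frequencies from the table
heights = [10, 20, 30, 40, 50]
cum_freq = [3, 11, 25, 35, 40]


def percentile(p):
    """Estimate the p-th percentile: convert p to a position, then interpolate."""
    n = cum_freq[-1]
    target = p / 100 * n  # Step 1: target cumulative position
    pts = list(zip(heights, cum_freq))
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 < target <= y1:  # Step 2: class in which the CF reaches the target
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target position is outside the tabulated range")


print(round(percentile(80), 2))  # 37.0
```
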
WE 5

Construct a box plot from cumulative frequency data

Using the CF data from WE 1 (exam times, n = 50), state the five-number summary you would use to draw a box plot.

Step 1: Min and max from class boundaries
  Lowest class is 30 ≤ t < 40 → Min ≈ 30
  Highest class is 70 ≤ t < 80 → Max ≈ 80
Step 2: Quartiles (from WE 2)
  Q₁ ≈ 46.3; Median ≈ 54.4; Q₃ ≈ 62.5
Answer: five-number summary (30, 46.3, 54.4, 62.5, 80), all in minutes
Note: box plots drawn from grouped data use class boundaries for the min and max, because the exact extreme values aren’t known.
WE 6

Find class frequencies and modal class from a cumulative frequency table

A cumulative frequency table for 50 measurements is given below.

Upper bound   ≤ 10   ≤ 20   ≤ 30   ≤ 40   ≤ 50   ≤ 60
Cum. freq.    4      13     25     38     45     50

(a) Find the frequency in each class. (b) Identify the modal class. (c) Identify the class containing the median.

(a) Class frequencies = differences of consecutive cumulative values
  0 ≤ x < 10: 4
  10 ≤ x < 20: 13 − 4 = 9
  20 ≤ x < 30: 25 − 13 = 12
  30 ≤ x < 40: 38 − 25 = 13
  40 ≤ x < 50: 45 − 38 = 7
  50 ≤ x < 60: 50 − 45 = 5
(b) Modal class = class with the highest frequency
  Highest frequency is 13 → modal class is 30 ≤ x < 40
(c) Median position = n/2 = 25
  CF first reaches 25 at upper bound 30 → median is in 20 ≤ x < 30
Answer: modal class: 30 ≤ x < 40; median class: 20 ≤ x < 30
Note: individual class frequencies are differences of consecutive CFs; this is the construction of a CF table worked backwards.
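
Working backwards from cumulative frequencies can be sketched in Python as follows. The variable names are illustrative, and the lowest lower bound is taken as 0, as in the worked solution above.

```python
# Upper bounds and cumulative frequencies from the table
upper_bounds = [10, 20, 30, 40, 50, 60]
cum_freq = [4, 13, 25, 38, 45, 50]

# Lower bound of each class (lowest class assumed to start at 0)
lower_bounds = [0] + upper_bounds[:-1]

# (a) Class frequency = this cumulative value minus the previous one
frequencies = [b - a for a, b in zip([0] + cum_freq, cum_freq)]

# (b) Modal class = class with the largest frequency
m = frequencies.index(max(frequencies))
modal_class = (lower_bounds[m], upper_bounds[m])

# (c) Median class = first class whose cumulative frequency reaches n/2
n = cum_freq[-1]
k = next(i for i, cf in enumerate(cum_freq) if cf >= n / 2)
median_class = (lower_bounds[k], upper_bounds[k])

print(frequencies)   # [4, 9, 12, 13, 7, 5]
print(modal_class)   # (30, 40)
print(median_class)  # (20, 30)
```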

💡 Top tips

⚠ Common mistakes

Next: Histograms. While cumulative frequency graphs answer “how many up to here?”, histograms answer “how is the data distributed?”. Same grouped data, different lens — bars whose heights show frequency directly. The shape reveals modal classes, skewness, and whether the data could be modelled by a normal distribution.
