IB Maths AI SL Topic 4 — Statistics Toolkit Paper 1 & 2 CF curves & percentiles ~8 min read

Cumulative Frequency Graphs

A box plot collapses grouped data to five numbers. A cumulative frequency graph keeps every class — you plot the running total at each upper boundary and join the points with a smooth curve. From the curve you can read the median, quartiles, any percentile, and the number of values above or below any threshold.

📘 What you need to know

Reading a cumulative frequency graph

The graph below shows homework times (min) for 40 students. The dashed orange lines show the three quartile reads.

[Figure: Cumulative frequency: homework times for 40 students. Horizontal axis: time t (min), 0 to 100. Vertical axis: cumulative frequency, 0 to 40. Marked reads: Q₁ = 35, median = 50, Q₃ = 65, at heights ¼n, ½n, ¾n.]
The S-shaped CF curve for 40 students’ homework times. Dashed orange lines at y = 10, 20, 30 (= n/4, n/2, 3n/4) hit the curve and drop down to x = 35, 50, 65: the three quartiles.
Reading values from a CF graph
To find a percentile p%: draw a horizontal line at y = (p/100) × n, then drop to the x-axis.
To find the number of values below a threshold t: go up at x = t, then read off y.
Number of values above t = n − (value read off).

🧭 Recipe — build & use a CF graph

  1. Build the CF column in the frequency table — running total of frequencies.
  2. List plotting points: pair each upper boundary with its cumulative frequency. Add (lower bound of class 1, 0).
  3. Draw the curve: plot the points and join smoothly — always increasing, never going down.
  4. For a percentile p: horizontal at y = pn/100, then vertical drop — the x-value is your answer.
  5. For “above / below a threshold”: vertical at the threshold, horizontal to the y-axis — read the CF, then subtract from n if “above”.
The curve must be increasing: cumulative frequency can never go down. If yours decreases, you’ve mis-added.
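
The "never going down" rule can be turned into a quick self-check on a hand-computed CF column. The sketch below is one possible way to do it, not part of the syllabus: the function name `check_cf_column` is hypothetical, and the frequencies in the demo are made up purely for illustration.

```python
def check_cf_column(freqs, cum_freqs):
    """Return a list of problems found in a hand-computed CF column."""
    problems = []
    running = 0    # the correct running total so far
    previous = 0   # the previous entry in the column being checked
    for row, (f, cf) in enumerate(zip(freqs, cum_freqs), start=1):
        running += f
        if cf < previous:
            problems.append(f"row {row}: CF drops from {previous} to {cf}")
        if cf != running:
            problems.append(f"row {row}: running total should be {running}, not {cf}")
        previous = cf
    return problems


# Hypothetical frequencies, for illustration only.
demo_freqs = [3, 7, 6, 4]
print(check_cf_column(demo_freqs, [3, 10, 16, 20]))  # correct column: no problems
print(check_cf_column(demo_freqs, [3, 10, 6, 10]))   # row 3 copied f instead of adding it
```

In the second call the column dips at row 3, which is exactly the symptom of a mis-added running total.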

Worked examples

WE 1

Build the cumulative frequency table

The number of hours per week 30 students spend watching TV:

hours h    0 ≤ h < 2    2 ≤ h < 4    4 ≤ h < 6    6 ≤ h < 8    8 ≤ h < 10
freq f     4            8            11           5            2

Complete the cumulative frequency column and list the points used to plot the CF graph.

Running totals at each upper boundary
at 2: 4
at 4: 4 + 8 = 12
at 6: 12 + 11 = 23
at 8: 23 + 5 = 28
at 10: 28 + 2 = 30
Plotting points (include (0, 0))
(0, 0), (2, 4), (4, 12), (6, 23), (8, 28), (10, 30)
CF: 4, 12, 23, 28, 30 · plot 6 points
Always include (lower bound of first class, 0). The curve has to start somewhere; otherwise the lower portion of the graph isn’t anchored.
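
The same table can be rebuilt in a few lines of Python as a check on the running totals. This is a minimal sketch using the class boundaries and frequencies from the TV-hours table; the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies from the TV-hours table.
boundaries = [0, 2, 4, 6, 8, 10]
freqs = [4, 8, 11, 5, 2]

# Running total of the frequencies gives the CF column.
cum_freqs = list(accumulate(freqs))

# Pair each upper boundary with its CF, and anchor the curve at
# (lower bound of the first class, 0).
points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cum_freqs))

print(cum_freqs)  # [4, 12, 23, 28, 30]
print(points)     # [(0, 0), (2, 4), (4, 12), (6, 23), (8, 28), (10, 30)]
```

The last cumulative frequency should equal the total number of students (30 here), which is a useful final check.
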
WE 2

Find the median from a CF curve

The CF curve for the commute times of 50 employees passes through these points:

(0, 0), (10, 5), (20, 15), (30, 35), (40, 45), (50, 50)

Estimate the median commute time.

n = 50, so the median is at cum = n/2 = 25
Find where the curve has y = 25
25 lies between (20, 15) and (30, 35)
it sits exactly halfway between cum = 15 and 35
so x is halfway: (20 + 30)/2 = 25
median ≈ 25 min
When the half-line lands midway between two plotted points, the read is midway in x too. Otherwise interpolate or just read off the graph.
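
For levels that do not land midway, the "interpolate" option can be sketched in Python. The function below assumes the curve is a straight line between neighbouring plotted points, which only approximates a hand-drawn smooth curve, so treat the result as an estimate. The name `x_at_cf` is hypothetical.

```python
def x_at_cf(points, level):
    """Return the x-value where the CF polygon reaches the given level.

    points: (x, cumulative frequency) pairs in increasing order.
    Assumes straight lines between neighbouring points.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= level <= y1:
            if y1 == y0:          # flat segment: take its left end
                return x0
            fraction = (level - y0) / (y1 - y0)
            return x0 + fraction * (x1 - x0)
    raise ValueError("level is outside the range of the curve")


commute_points = [(0, 0), (10, 5), (20, 15), (30, 35), (40, 45), (50, 50)]
n = commute_points[-1][1]               # total frequency, 50
median = x_at_cf(commute_points, n / 2)
print(median)  # 25.0
```
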
WE 3

Find Q1, Q3 and the IQR

Using the cumulative frequency graph shown above (homework times for 40 students), find Q1, Q3 and the IQR.

n = 40
Q₁: horizontal at y = n/4 = 10
read off Q₁ ≈ 35 min
Q₃: horizontal at y = 3n/4 = 30
read off Q₃ ≈ 65 min
IQR = Q₃ − Q₁
IQR = 65 − 35 = 30
Q₁ ≈ 35 · Q₃ ≈ 65 · IQR ≈ 30 min
Always state CF graph reads with “≈”: they come from a hand-drawn curve, not exact arithmetic.
WE 4

Find a percentile (P₈₀)

Using the same CF graph (40 students, homework times), estimate the 80th percentile.

P₈₀ → horizontal at y = 80% × n
y = 0.8 × 40 = 32
Read off x where the curve has y = 32
32 lies between (60, 28) and (80, 36)
midway between 28 and 36 → x midway = 70
P₈₀ ≈ 70 min
80% of students spend less than about 70 min on homework; equivalently, the top 20% spend more.
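
As a check on this read, here is a minimal Python sketch of the same calculation. It assumes the curve is a straight line between the two bracketing points (60, 28) and (80, 36), so it is an approximation to the drawn curve; the function name `percentile_read` is hypothetical.

```python
def percentile_read(p, n, lower_point, upper_point):
    """Estimate the p-th percentile from the two plotted points that bracket it.

    Assumes the CF curve is a straight line between the two points.
    """
    level = p * n / 100                   # height of the horizontal line
    (x0, y0), (x1, y1) = lower_point, upper_point
    if y0 == y1 or not (y0 <= level <= y1):
        raise ValueError("the points do not bracket this percentile")
    fraction = (level - y0) / (y1 - y0)   # how far up the segment the level sits
    return x0 + fraction * (x1 - x0)


p80 = percentile_read(80, 40, (60, 28), (80, 36))
print(p80)  # 70.0
```
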
WE 5

Percentage above a threshold

Using the same CF graph, estimate the percentage of students who spent more than 70 minutes on homework.

Read CF at x = 70 (number below 70)
at x = 70: y ≈ 32
Above 70 = n − below
above = 40 − 32 = 8 students
As a percentage
8 / 40 = 0.2 → 20%
≈ 20%
CF graphs read “less than” directly. For “more than” or “above”, subtract from n. This matches WE 4: the 80th percentile is 70, so 100 − 80 = 20% lie above it.
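
The threshold read runs the other way: from an x-value up to the curve. A small Python sketch of that direction follows, again assuming a straight line between the plotted points (60, 28) and (80, 36) quoted in WE 4; the function name `cf_at` is hypothetical.

```python
def cf_at(t, lower_point, upper_point):
    """Estimate the cumulative frequency at x = t between two plotted points.

    Assumes the CF curve is a straight line between the two points.
    """
    (x0, y0), (x1, y1) = lower_point, upper_point
    if x0 == x1 or not (x0 <= t <= x1):
        raise ValueError("the points do not bracket this threshold")
    fraction = (t - x0) / (x1 - x0)
    return y0 + fraction * (y1 - y0)


n = 40
below = cf_at(70, (60, 28), (80, 36))   # students at or under 70 min
above = n - below                       # subtract from n for "more than"
percent_above = 100 * above / n

print(below, above, percent_above)  # 32.0 8.0 20.0
```
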
WE 6

Construct a box plot from the CF graph

Using the CF graph for 40 students’ homework times, state the five-number summary and hence describe the box plot you would draw.

Read quartiles (Q₁ and Q₃ from WE 3, median from the graph)
Q₁ ≈ 35, median ≈ 50, Q₃ ≈ 65
Min and max: use the class boundaries
lowest class is 0 ≤ t < 20 → min ≈ 0
highest class is 80 ≤ t < 100 → max ≈ 100
Five-number summary
0, 35, 50, 65, 100
box plot: 0 · 35 · 50 · 65 · 100
For grouped data we don’t know the exact min and max; the best estimates are the outer class boundaries. CF graphs are the bridge between grouped frequency tables and box plots.

💡 Top tips

⚠ Common mistakes

Next up: Histograms. Frequency tables get a visual partner — bars with no gaps, equal class widths, frequency on the y-axis. Useful for spotting the shape of the distribution (symmetric vs skewed) at a glance and confirming whether a normal-distribution model is plausible.
