IB Maths AI SL · Topic 4: Statistics Toolkit · Paper 1 & 2 · CF curves & percentiles · ~8 min read
Cumulative Frequency Graphs
A box plot collapses grouped data to five numbers. A cumulative frequency graph keeps every class — you plot the running total at each upper boundary and join the points with a smooth curve. From the curve you can read the median, quartiles, any percentile, and the number of values above or below any threshold.
What you need to know
Cumulative frequency = running total of the frequencies up to the upper boundary of each class.
Plotting points: (upper boundary, cumulative frequency). Also plot (lower bound of first class, 0).
Curve shape: a smooth, increasing S-curve (sometimes called an ogive) joining the points.
Read across for a given y, then down for x: draw a horizontal line at the target cumulative frequency until it meets the curve, then drop down to the x-axis.
Quartiles: Q1 at n/4 · median at n/2 · Q3 at 3n/4. The pth percentile is at pn/100.
For “how many are above x?”: go up from x, read the CF, then subtract from n.
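As a quick check on these read levels, here is a minimal Python sketch. The function names `quartile_levels` and `percentile_level` are illustrative only; the values n = 40 and p = 80 are the ones used in the worked examples on this page.

```python
def quartile_levels(n):
    """Return the cumulative-frequency levels for Q1, the median and Q3."""
    return n / 4, n / 2, 3 * n / 4


def percentile_level(p, n):
    """Return the cumulative-frequency level for the p-th percentile."""
    return p * n / 100


# n = 40 students, as in the homework-times graph
q1_level, median_level, q3_level = quartile_levels(40)
p80_level = percentile_level(80, 40)
print(q1_level, median_level, q3_level, p80_level)
```

These are y-axis levels only. The quartile or percentile itself is the x-value you read off the curve at that level.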
Reading a cumulative frequency graph
The graph below shows homework times (min) for 40 students. The dashed orange lines show the three quartile reads.
The S-shaped CF curve for 40 students’ homework times. Dashed orange lines at y = 10, 20, 30 (= n/4, n/2, 3n/4) hit the curve and drop down to x = 35, 50, 65, the three quartiles.
Reading values from a CF graph
to find a percentile p%: draw horizontal at y = pn/100, drop to x-axis
to find # values below threshold t: go up at x = t, read off y
# values above t = n − (value read off)
Recipe: build & use a CF graph
Build the CF column in the frequency table — running total of frequencies.
List plotting points: pair each upper boundary with its cumulative frequency. Add (lower bound of class 1, 0).
Draw the curve: plot the points and join smoothly — always increasing, never going down.
For a percentile p: horizontal at y = pn/100, then vertical drop — the x-value is your answer.
For “above / below a threshold”: vertical at the threshold, horizontal to the y-axis — read the CF, then subtract from n if “above”.
The curve must be increasing: cumulative frequency can never go down. If yours decreases, you’ve mis-added.
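The build steps of the recipe can be sketched in Python. This is a minimal illustration, not an exam method: the helper names `cf_points` and `is_non_decreasing` are made up for this sketch, and the data are the TV-hours frequencies from WE 1.

```python
from itertools import accumulate


def cf_points(boundaries, frequencies):
    """Pair each upper boundary with its running total, starting at (lower bound, 0).

    boundaries: class boundaries in order, one more entry than frequencies.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need one more boundary than frequencies")
    if any(f < 0 for f in frequencies):
        raise ValueError("frequencies cannot be negative")
    running_totals = list(accumulate(frequencies))
    # The leading 0 anchors the curve at the lower bound of the first class.
    return list(zip(boundaries, [0] + running_totals))


def is_non_decreasing(points):
    """Check that cumulative frequency never goes down from one point to the next."""
    cfs = [cf for _, cf in points]
    return all(a <= b for a, b in zip(cfs, cfs[1:]))


# TV-hours table from WE 1: classes 0-2, 2-4, 4-6, 6-8, 8-10
points = cf_points([0, 2, 4, 6, 8, 10], [4, 8, 11, 5, 2])
print(points)
print(is_non_decreasing(points))
```

The final cumulative frequency should equal n (here 30), which is a handy check that nothing was mis-added.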
Worked examples
WE 1
Build the cumulative frequency table
The number of hours per week 30 students spend watching TV:
hours h | 0 ≤ h < 2 | 2 ≤ h < 4 | 4 ≤ h < 6 | 6 ≤ h < 8 | 8 ≤ h < 10
freq f | 4 | 8 | 11 | 5 | 2
Complete the cumulative frequency column and list the points used to plot the CF graph.
Running totals at each upper boundary
at 2: 4
at 4: 4 + 8 = 12
at 6: 12 + 11 = 23
at 8: 23 + 5 = 28
at 10: 28 + 2 = 30
Plotting points (include (0, 0))
(0, 0), (2, 4), (4, 12), (6, 23), (8, 28), (10, 30)
CF: 4, 12, 23, 28, 30 · plot 6 points
Always include (lower bound of first class, 0). The curve has to start somewhere; otherwise the lower portion of the graph isn’t anchored.
WE 2
Find the median from a CF curve
The CF curve for the commute times of 50 employees passes through these points:
n = 50, so median at cum = n/2 = 25
Find where the curve has y = 25
25 lies between (20, 15) and (30, 35)
it sits exactly halfway between cum = 15 and 35
so x is halfway: (20 + 30)/2 = 25
median ≈ 25 min
When the half-line lands midway between two plotted points, the read is midway in x too. Otherwise interpolate or just read off the graph.
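The midpoint shortcut is a special case of linear interpolation. As a sketch, assuming the curve is close to a straight line between the two plotted points (x_0, y_0) = (20, 15) and (x_1, y_1) = (30, 35), the read at y = 25 works out as below. The straight-line assumption is a simplification, so a hand-drawn smooth curve may give a slightly different value.

```latex
x = x_0 + \frac{y - y_0}{y_1 - y_0}\,(x_1 - x_0)
  = 20 + \frac{25 - 15}{35 - 15}\,(30 - 20)
  = 20 + \frac{10}{20} \times 10
  = 25
```

The same formula handles reads that do not land exactly midway: only the fraction changes.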
WE 3
Find Q1, Q3 and the IQR
Using the cumulative frequency graph shown above (homework times for 40 students), find Q1, Q3 and the IQR.
n = 40
Q₁: horizontal at y = n/4 = 10
read off Q₁ ≈ 35 min
Q₃: horizontal at y = 3n/4 = 30
read off Q₃ ≈ 65 min
IQR = Q₃ − Q₁
IQR = 65 − 35 = 30
Q₁ ≈ 35 · Q₃ ≈ 65 · IQR ≈ 30 min
Always state CF graph reads with “≈”: they come from a hand-drawn curve, not exact arithmetic.
WE 4
Find a percentile (P₈₀)
Using the same CF graph (40 students, homework times), estimate the 80th percentile.
P₈₀: horizontal at y = 80% × n
y = 0.8 × 40 = 32
Read off x where the curve has y = 32
32 lies between (60, 28) and (80, 36)
midway between 28 and 36, so x is midway: 70
P₈₀ ≈ 70 min
80% of students spend less than about 70 min on homework; equivalently, the top 20% spend more.
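The percentile read can be mimicked numerically. The sketch below assumes a straight line between the two plotted points that bracket the target level, which only approximates the smooth curve; `read_x_at` is a hypothetical helper name, not a standard function.

```python
def read_x_at(level, p0, p1):
    """Estimate x where the CF curve reaches `level`, between points p0 and p1.

    Assumes a straight line between the two plotted points (an approximation
    to the smooth hand-drawn curve).
    """
    (x0, y0), (x1, y1) = p0, p1
    if not y0 <= level <= y1:
        raise ValueError("level must lie between the two cumulative frequencies")
    if y1 == y0:
        return x0  # flat segment: report the left end of the class
    return x0 + (level - y0) / (y1 - y0) * (x1 - x0)


n = 40
level = 80 * n / 100  # cumulative-frequency level for the 80th percentile
p80 = read_x_at(level, (60, 28), (80, 36))
print(level, p80)
```

With the points (60, 28) and (80, 36), the level 32 sits halfway up the segment, so the estimate lands at 70, in line with the graph read.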
WE 5
Percentage above a threshold
Using the same CF graph, estimate the percentage of students who spent more than 70 minutes on homework.
Read CF at x = 70 (number below 70)
at x = 70: y ≈ 32
Above 70 = n − below
above = 40 − 32 = 8 students
As a percentage
8 / 40 = 0.2 → 20%
≈ 20%
CF graphs read “less than” directly. For “more than” or “above”, subtract from n. This matches WE 4: the 80th percentile is 70, so 100 − 80 = 20% lie above it.
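The "read up, then subtract from n" step can be sketched the same way. The helper names `read_cf_at` and `percent_above` are illustrative, and the straight line between the plotted points (60, 28) and (80, 36) is an assumption standing in for the hand-drawn curve.

```python
def read_cf_at(x, p0, p1):
    """Estimate the cumulative frequency at x, between plotted points p0 and p1.

    Assumes a straight line between the two points.
    """
    (x0, y0), (x1, y1) = p0, p1
    if not x0 <= x <= x1:
        raise ValueError("x must lie between the two boundaries")
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


def percent_above(x, n, p0, p1):
    """Percentage of the n values estimated to lie above x."""
    below = read_cf_at(x, p0, p1)
    return (n - below) * 100 / n


below = read_cf_at(70, (60, 28), (80, 36))
above = 40 - below
print(below, above, percent_above(70, 40, (60, 28), (80, 36)))
```

The curve gives the count below the threshold; the subtraction from n is what turns it into "above".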
WE 6
Construct a box plot from the CF graph
Using the CF graph for 40 students’ homework times, state the five-number summary and hence describe the box plot you would draw.
Read quartiles (from WE 3)
Q₁ ≈ 35, median ≈ 50, Q₃ ≈ 65
Min and max: use the class boundaries
lowest class is 0 ≤ t < 20 → min ≈ 0
highest class is 80 ≤ t < 100 → max ≈ 100
Five-number summary
0, 35, 50, 65, 100
box plot: whiskers from 0 to 35 and from 65 to 100, box from 35 to 65, median line at 50
For grouped data we don’t know the exact min and max; the best estimates are the outer class boundaries. CF graphs are the bridge between grouped frequency tables and box plots.
Top tips
Always plot at the UPPER boundary, not the midpoint. CF means “≤ this value”.
Compute n/2, n/4, 3n/4 first: these are the y-axis levels you’ll read at.
Draw faint construction lines for every read — horizontal then vertical — so the examiner can see your method.
“Less than”: read down; “more than”: subtract from n.
CF graphs are for grouped data, so answers are estimates. Write “≈” and round sensibly.
Common mistakes
Plotting at the midpoint of each class. Use the UPPER boundary — cumulative is “up to this point”.
Forgetting (0, 0): the curve must start at the lower bound of the first class with zero cumulative frequency.
Connecting points with straight lines: IB wants a smooth curve (a polyline is OK only if the question specifies it).
Reading from the wrong axis: “median” is an x-value; “how many above 70” is a y-related value.
Treating “more than” as the direct read: the CF graph gives “less than or equal to”. Subtract from n for the complement.
Next up: Histograms. Frequency tables get a visual partner — bars with no gaps, equal class widths, frequency on the y-axis. Useful for spotting the shape of the distribution (symmetric vs skewed) at a glance and confirming whether a normal-distribution model is plausible.