IB Maths AI SL Topic 4 — Statistics Toolkit Paper 1 & 2 Sampling & data ~7 min read

Sampling & Data Collection

Studying every member of a population is rarely possible — we take a sample instead. How the sample is chosen decides whether it represents the population fairly. This note covers data types, the five sampling techniques in the syllabus, and the two formulae you must memorise.

📘 What you need to know

Data types: qualitative (words) vs quantitative (numbers); quantitative splits into discrete (counted) vs continuous (measured).
Population N = whole group of interest. Sample n = subset studied. Sampling frame = a list of every member.
Simple random: number every member, pick n at random — equal chance for all.
Systematic: pick every k-th member, where k = N / n, starting at a random number from 1 to k.
Stratified: split into groups; from each, sample (group size / N) × n members at random.
Quota & Convenience: used when no sampling frame exists. Both risk bias — convenience the most.
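
The simple random method described above can be sketched in a few lines of Python. This is only an illustration, not part of the syllabus: the function name `simple_random_sample` and the `seed` parameter are assumptions made here so the demo is repeatable, and the population of 15 with a sample of 5 mirrors the figure in this note.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Pick n distinct member numbers from 1..N, each equally likely."""
    rng = random.Random(seed)  # the seed is only there to make the demo repeatable
    # Members are numbered 1 to N; sample() draws without replacement.
    return sorted(rng.sample(range(1, N + 1), n))

chosen = simple_random_sample(N=15, n=5, seed=1)
print(chosen)
```

Sampling without replacement matters here: no member can be chosen twice, so the sample always contains n different members.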

Types of data

Qualitative = words (eye colour, brand). Quantitative = numbers. Quantitative splits further into:

• Discrete: specific separate values, usually counted. Examples: number of siblings, shoe size (sizes come in fixed steps).
• Continuous: any value in a range, measured. Examples: height (cm), time (s), mass (kg).

Quick test: can the value have “half” of itself? Half a sibling doesn’t exist (discrete). Half a centimetre does (continuous).

Sampling techniques

Five methods, differing in how members are picked and how much bias they let in.

Figure: four ways of choosing 5 members from a population of 15 (teal = selected, grey = not selected). Random and systematic selections are spread across the population; stratified picks proportionally from each group; convenience picks cluster together, which produces bias.

The two formulae you must know

Systematic interval: k = N / n

Stratified count: number from group = (group size / N) × n

🧭 Recipe — tackle any sampling question

Read carefully: identify the technique, calculate sample sizes, or criticise / improve a method?
Check for a list: no list ⇒ can’t use random / systematic / stratified ⇒ must use quota or convenience.
For stratified: compute (group ÷ N) × n for each group; check the parts sum to n.
For systematic: divide N by n to get k, then add k repeatedly to the random start.
To improve: usually “increase the sample size” or “use a more random technique”.

Worked examples

WE 1

Classify the types of data

A biologist records the following about a group of birds. Classify each as qualitative, discrete, or continuous.

(i) feather colour (ii) number of eggs in a nest (iii) wing length in cm (iv) species name

(i) feather colour: words → qualitative
(ii) number of eggs: counted whole numbers → discrete
(iii) wing length in cm: measured, any value → continuous
(iv) species name: words → qualitative

Answer: qualitative · discrete · continuous · qualitative

Tip: test whether the value can have “half” of itself. Half an egg makes no sense → discrete. Half a cm does → continuous.

WE 2

Stratified sample from a school

A school has 360 students in Year 11, 240 in Year 12, and 200 in Year 13. The principal wants a stratified sample of 40 students. How many from each year group?

Step 1: total N = 360 + 240 + 200 = 800
Step 2: (group/N) × n for each year
Y11: (360/800) × 40 = 18
Y12: (240/800) × 40 = 12
Y13: (200/800) × 40 = 10
Check: 18 + 12 + 10 = 40 ✓

Answer: 18 from Y11 · 12 from Y12 · 10 from Y13

Tip: always check that the parts sum to n. If they don’t, it’s an arithmetic slip.
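
The same calculation can be checked with a short Python sketch. The function name `stratified_counts` is hypothetical; the numbers are the ones from this worked example. The code applies (group/N) × n to each year group and then runs the sum check.

```python
def stratified_counts(group_sizes, n):
    """Apply (group / N) * n to every group and check the parts sum to n."""
    N = sum(group_sizes.values())  # population size
    # Multiply before dividing so exact answers stay exact.
    counts = {name: size * n / N for name, size in group_sizes.items()}
    # The sum check from the tip: the parts must add up to n.
    assert abs(sum(counts.values()) - n) < 1e-9
    return counts

school = {"Y11": 360, "Y12": 240, "Y13": 200}
counts = stratified_counts(school, n=40)
print(counts)  # {'Y11': 18.0, 'Y12': 12.0, 'Y13': 10.0}
```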

WE 3

Systematic — interval and first members

A factory produces 1200 lightbulbs in a day, numbered 1 to 1200. A quality inspector wants a systematic sample of 30 bulbs.

(a) Find the interval k. (b) Given the random start is 7, list the first four bulbs sampled.

(a) k = N / n
k = 1200 / 30 = 40

(b) Start at 7 and add k = 40 each time:
7 → 47 → 87 → 127

Answer: k = 40 · first four bulbs: 7, 47, 87, 127

Tip: the random start lies between 1 and k. After that, keep adding k until you have n members.
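
As a rough check, the systematic method can be written as a small Python function. The name `systematic_sample` is hypothetical, and the sketch assumes n divides N exactly, as it does in this example.

```python
def systematic_sample(N, n, start):
    """List the n member numbers chosen by systematic sampling."""
    k = N // n  # interval k = N / n (assumes n divides N exactly)
    if not 1 <= start <= k:
        raise ValueError("the random start must lie between 1 and k")
    # Keep adding k to the start until there are n members.
    return [start + i * k for i in range(n)]

bulbs = systematic_sample(N=1200, n=30, start=7)
print(bulbs[:4])   # [7, 47, 87, 127]
print(len(bulbs))  # 30
```

Because the start is at most k, the last bulb chosen (here 7 + 29 × 40 = 1167) can never exceed N.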

WE 4

Stratified sample from a library

A library has 1500 books: 600 fiction, 540 non-fiction, and 360 reference. A librarian takes a stratified sample of 25 books. How many of each type?

N = 1500, n = 25
Fiction: (600/1500) × 25 = 10
Non-fiction: (540/1500) × 25 = 9
Reference: (360/1500) × 25 = 6
Check: 10 + 9 + 6 = 25 ✓

Answer: 10 fiction · 9 non-fiction · 6 reference

Tip: if a calculation gives a decimal, round to the nearest integer and adjust the largest group so the parts still sum to n.
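
The rounding advice in this example can be sketched in Python as follows. The function name `rounded_stratified_counts` and the second data set (groups A, B, C) are made up purely to show a case where rounding breaks the sum; they are not from the syllabus or any exam paper.

```python
import math

def rounded_stratified_counts(group_sizes, n):
    """Round each (group / N) * n, then adjust the largest group to restore the sum."""
    N = sum(group_sizes.values())
    # floor(x + 0.5) rounds halves up; Python's round() rounds halves to even.
    counts = {name: math.floor(size * n / N + 0.5)
              for name, size in group_sizes.items()}
    largest = max(group_sizes, key=group_sizes.get)  # first one wins on a tie
    counts[largest] += n - sum(counts.values())      # add or remove the shortfall
    return counts

library = {"fiction": 600, "non-fiction": 540, "reference": 360}
print(rounded_stratified_counts(library, n=25))
# {'fiction': 10, 'non-fiction': 9, 'reference': 6}

made_up = {"A": 160, "B": 70, "C": 70}  # hypothetical groups, N = 300
print(rounded_stratified_counts(made_up, n=10))
# raw values 5.33, 2.33, 2.33 round to 5, 2, 2 (sum 9), so A becomes 6
```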

WE 5

Identify the technique and its weakness

A coffee-shop manager surveys the next 30 customers who walk through the door on Monday morning.

(a) Name the sampling technique. (b) Give one reason it may not be reliable.

(a) “Easiest available” → convenience

Answer: convenience sampling

(b) Monday-morning customers ≠ all customers
Excludes evening and weekend regulars
Not representative → biased

Tip: convenience is the “lazy” method: easy to do, rarely fair. Day, time and location all introduce bias.

WE 6

Hospital sample — calc + alternative when no list

A hospital employs 480 nurses and 120 doctors. The board wants a sample of 25 staff, with both professions represented in proportion.

(a) Find how many nurses and doctors should be sampled. (b) The board has no list of all 600 staff. Suggest a sampling technique they could use instead, and explain why.

(a) Stratified, N = 600
Nurses: (480/600) × 25 = 20
Doctors: (120/600) × 25 = 5
Check: 20 + 5 = 25 ✓

Answer: 20 nurses · 5 doctors

(b) No list → can’t sample randomly within strata
Use quota sampling
Set quotas: 20 nurses, 5 doctors
Fill with the next available staff

Answer: quota sampling, which keeps the same proportions with no list required

Tip: the trigger phrase “no list / no register” → quota or convenience. Quota when proportions matter; convenience when speed dominates.
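
Part (b) can be illustrated with a minimal Python sketch of quota filling. Everything about the arrivals is invented for the demo: the function name `quota_sample`, the staff labels, and the pattern in which every third person to arrive is a doctor. Only the quotas (20 nurses, 5 doctors) come from the example.

```python
def quota_sample(arrivals, quotas):
    """Take people in the order they turn up until every quota is full."""
    remaining = dict(quotas)  # copy so the caller's quotas are not changed
    chosen = []
    for person, role in arrivals:
        if remaining.get(role, 0) > 0:  # still room in this group's quota
            chosen.append((person, role))
            remaining[role] -= 1
        if not any(remaining.values()):  # every quota is filled
            break
    return chosen

# Hypothetical arrival order: every third person is a doctor.
arrivals = [(f"staff{i}", "doctor" if i % 3 == 0 else "nurse")
            for i in range(1, 61)]
sample = quota_sample(arrivals, {"nurse": 20, "doctor": 5})
print(len(sample))  # 25
```

Nothing here is random: whoever arrives first is taken. That is why quota sampling needs no list, and also why it can still be biased.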

💡 Top tips

“No sampling frame” is the key phrase: forces quota or convenience — never random / systematic / stratified.
Stratified parts must sum to n: always check.
Random start for systematic: between 1 and k, not 1 and N.
To improve any method: a safe answer is “increase the sample size”.
Bias = systematic favouring of some members. Random methods minimise it; convenience maximises it.

⚠ Common mistakes

Confusing population and sample: N = whole group, n = chosen subset. Mixing them flips the formula.
Calling stratified “quota”: both split into groups, but stratified picks RANDOMLY within each; quota does not.
Forgetting the sum check: easy mark lost when stratified parts don’t add to n.
Using k = N/n without a list: systematic needs an ordered list, just like random and stratified.
Suggesting only “more people”: bigger sample + better method (e.g. random instead of convenience) is the stronger answer.

Next up: Measures of Central Tendency — mean, median, mode. Once you have a sample, the next question is “what’s the average?”. Each measure behaves differently with outliers and skewed data, and the GDC computes them all directly from raw or grouped data.
