IB Maths AI SL Topic 4 — Statistics Toolkit Paper 1 & 2 Sampling & data ~7 min read

Sampling & Data Collection

Studying every member of a population is rarely possible — we take a sample instead. How the sample is chosen decides whether it represents the population fairly. This note covers data types, the five sampling techniques in the syllabus, and the two formulae you must memorise.

📘 What you need to know

Types of data

Qualitative = words (eye colour, brand). Quantitative = numbers. Quantitative splits further into:

Discrete: only specific, separate values, usually counted. Examples: number of siblings, shoe size.
Continuous: any value in a range, measured. Examples: height (cm), time (s), mass (kg).

Quick test: can the value have “half” of itself? Half a sibling doesn’t exist (discrete). Half a centimetre does (continuous).

Sampling techniques

Five methods (simple random, systematic, stratified, quota and convenience), differing in how members are picked and how much bias they let in.

[Figure: Four sampling techniques on the same population. Panels: Simple random (all equal chance); Systematic (every k = 3: start at 2, then +3 each time); Stratified (proportional: stratum 1, stratum 2, stratum 3); Convenience (biased: only easy-to-reach members, which cluster).]
Four ways of choosing 5 members from a population of 15. Teal = selected, grey = not. Random and systematic spread evenly; stratified picks proportionally from each group; convenience clusters — producing bias.
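
The four panels can be imitated in a few lines of Python. This is only a minimal sketch: the fixed seed, the three strata of sizes 6, 6 and 3 (chosen so the proportional counts come out as whole numbers), and treating the first five members as the "easy-to-reach" ones are all assumptions made for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable (an assumption)

population = list(range(1, 16))  # members numbered 1 to 15
n = 5                            # sample size

# Simple random: every member has an equal chance of selection.
simple_random = sorted(random.sample(population, n))

# Systematic: interval k = N / n = 3, start at member 2, then +3 each time.
k = len(population) // n
start = 2
systematic = [start + i * k for i in range(n)]

# Stratified: hypothetical strata of sizes 6, 6 and 3.
strata = [population[0:6], population[6:12], population[12:15]]
stratified = []
for stratum in strata:
    # (group size / N) x n, which is a whole number for these sizes
    count = len(stratum) * n // len(population)
    stratified += sorted(random.sample(stratum, count))

# Convenience: just take the first members you can reach.
convenience = population[:n]

print("simple random:", simple_random)
print("systematic:   ", systematic)
print("stratified:   ", stratified)
print("convenience:  ", convenience)
```

The systematic and convenience lines are fully determined (2, 5, 8, 11, 14 and 1 to 5); the random and stratified lines depend on the seed.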
The two formulae you must know (N = population size, n = sample size)

Systematic interval:  k = N / n

Stratified count:  number from group = (group size / N) × n
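
As a quick check of the two formulae, here is a small Python sketch. The function names `systematic_interval` and `stratified_count` are illustrative, not standard, and the numbers are borrowed from the lightbulb and school worked examples in this note.

```python
def systematic_interval(N, n):
    """Interval k = N / n for a systematic sample."""
    return N / n


def stratified_count(group_size, N, n):
    """Number to take from one group: (group size / N) x n.

    Multiplying before dividing gives the same value and avoids
    small floating-point rounding errors.
    """
    return group_size * n / N


print(systematic_interval(1200, 30))   # 40.0
print(stratified_count(360, 800, 40))  # 18.0
```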

🧭 Recipe: tackle any sampling question

  1. Read carefully: are you asked to identify the technique, calculate sample sizes, or criticise / improve a method?
  2. Check for a list: no list ⇒ can’t use random / systematic / stratified ⇒ must use quota or convenience.
  3. For stratified: compute (group ÷ N) × n for each group; check the parts sum to n.
  4. For systematic: divide N by n to get k, then add k repeatedly to the random start.
  5. To improve: usually “increase the sample size” or “use a more random technique”.

Worked examples

WE 1

Classify the types of data

A biologist records the following about a group of birds. Classify each as qualitative, discrete, or continuous.

(i) feather colour   (ii) number of eggs in a nest   (iii) wing length in cm   (iv) species name

(i) colour: words → qualitative
(ii) eggs: counted whole numbers → discrete
(iii) wing length in cm: measured, any value → continuous
(iv) species name: words → qualitative

Answer: qualitative · discrete · continuous · qualitative

Test: can the value have “half” of itself? Half an egg makes no sense → discrete. Half a cm does → continuous.

WE 2

Stratified sample from a school

A school has 360 students in Year 11, 240 in Year 12, and 200 in Year 13. The principal wants a stratified sample of 40 students. How many from each year group?

Step 1: total N = 360 + 240 + 200 = 800
Step 2: (group/N) × n for each year
Y11: (360/800) × 40 = 18
Y12: (240/800) × 40 = 12
Y13: (200/800) × 40 = 10
Check: 18 + 12 + 10 = 40 ✓

Answer: 18 Y11 · 12 Y12 · 10 Y13

Tip: always check that the parts sum to n. If they don’t, it’s an arithmetic slip.

WE 3

Systematic — interval and first members

A factory produces 1200 lightbulbs in a day, numbered 1 to 1200. A quality inspector wants a systematic sample of 30 bulbs.

(a) Find the interval k.   (b) Given the random start is 7, list the first four bulbs sampled.

(a) k = N / n
k = 1200 / 30 = 40
(b) start at 7, add k = 40 each time
7 → 47 → 87 → 127

Answer: k = 40 · first four: 7, 47, 87, 127

Tip: the random start lies between 1 and k. After that, keep adding k until you have n members.

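
The same procedure can be sketched in Python. The function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of n, as it is here.

```python
import random


def systematic_sample(N, n, start=None):
    """Return the n member numbers of a systematic sample from 1..N."""
    k = N // n  # interval; assumes N is a multiple of n
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k inclusive
    return [start + i * k for i in range(n)]


sample = systematic_sample(1200, 30, start=7)
print(sample[:4])   # [7, 47, 87, 127]
print(sample[-1])   # 1167
```

Because the start never exceeds k, the last member (start + 29 × 40) always stays within the 1200 bulbs.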
WE 4

Stratified sample from a library

A library has 1500 books: 600 fiction, 540 non-fiction, and 360 reference. A librarian takes a stratified sample of 25 books. How many of each type?

N = 1500, n = 25
Fiction: (600/1500) × 25 = 10
Non-fiction: (540/1500) × 25 = 9
Reference: (360/1500) × 25 = 6
Check: 10 + 9 + 6 = 25 ✓

Answer: 10 fic · 9 non-fic · 6 ref

Tip: if a calculation gives a decimal, round to the nearest integer and adjust the largest group so the parts still sum to n.

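
The rounding rule in the tip can be sketched as follows. The function name `stratified_counts` and the second set of group sizes (440, 340, 220) are made up for illustration. Note that Python's built-in `round` sends exact halves to the nearest even integer, which may differ from the rounding you do by hand.

```python
def stratified_counts(groups, n):
    """groups: dict of name -> size. Returns dict of name -> sample count."""
    N = sum(groups.values())
    counts = {name: round(size * n / N) for name, size in groups.items()}
    # If rounding made the total drift away from n, correct the largest group.
    largest = max(groups, key=groups.get)
    counts[largest] += n - sum(counts.values())
    return counts


library = {"fiction": 600, "non-fiction": 540, "reference": 360}
print(stratified_counts(library, 25))
# {'fiction': 10, 'non-fiction': 9, 'reference': 6}

# Hypothetical groups where rounding alone gives 4 + 3 + 2 = 9, not 10.
print(stratified_counts({"A": 440, "B": 340, "C": 220}, 10))
# {'A': 5, 'B': 3, 'C': 2}
```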
WE 5

Identify the technique and its weakness

A coffee-shop manager surveys the next 30 customers who walk through the door on Monday morning.

(a) Name the sampling technique.   (b) Give one reason it may not be reliable.

(a) “easiest available” → convenience

Answer: convenience sampling

(b) Monday-morning customers ≠ all customers
excludes evening & weekend regulars
not representative → biased

Tip: convenience is the “lazy” method: easy to do, rarely fair. Day, time and location all introduce bias.

WE 6

Hospital sample — calc + alternative when no list

A hospital employs 480 nurses and 120 doctors. The board wants a sample of 25 staff, with both professions represented in proportion.

(a) Find how many nurses and doctors should be sampled.   (b) The board has no list of all 600 staff. Suggest a sampling technique they could use instead, and explain why.

(a) Stratified, N = 600
Nurses: (480/600) × 25 = 20
Doctors: (120/600) × 25 = 5
Check: 20 + 5 = 25 ✓

Answer: 20 nurses · 5 doctors

(b) No list → can’t sample randomly within strata
use quota sampling
set quotas: 20 nurses, 5 doctors
fill with next available staff

Answer: quota sampling (same proportions, no list required)

Tip: trigger phrase “no list / no register” → quota or convenience. Quota when proportions matter; convenience when speed dominates.
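
Filling quotas with the next available staff might look like the sketch below. The function name `quota_sample` and the stream of arrivals (every fourth person met is a doctor) are invented purely to show the mechanism; no real staff list is involved.

```python
def quota_sample(arrivals, quotas):
    """Fill each quota with the first people met; no full list is needed.

    arrivals: iterable of (name, profession) in the order people are met.
    quotas:   dict of profession -> number required.
    """
    remaining = dict(quotas)
    chosen = []
    for name, profession in arrivals:
        if remaining.get(profession, 0) > 0:
            chosen.append((name, profession))
            remaining[profession] -= 1
        if not any(remaining.values()):
            break  # every quota is full
    return chosen


# Hypothetical arrivals: every fourth person met is a doctor.
arrivals = [(f"staff{i}", "doctor" if i % 4 == 0 else "nurse")
            for i in range(1, 61)]

chosen = quota_sample(arrivals, {"nurse": 20, "doctor": 5})
print(len(chosen))  # 25
```

The sample has the right proportions, but who ends up in it depends entirely on who happens to be met first, which is why quota sampling is not random.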

💡 Top tips

⚠ Common mistakes

Next up: Measures of Central Tendency — mean, median, mode. Once you have a sample, the next question is “what’s the average?”. Each measure behaves differently with outliers and skewed data, and the GDC computes them all directly from raw or grouped data.
