IB Maths AI SL · Topic 4: Statistics Toolkit · Paper 1 & 2 · Sampling & data · ~7 min read
Sampling & Data Collection
Studying every member of a population is rarely possible — we take a sample instead. How the sample is chosen decides whether it represents the population fairly. This note covers data types, the five sampling techniques in the syllabus, and the two formulae you must memorise.
What you need to know
Data types: qualitative (words) vs quantitative (numbers); quantitative splits into discrete (counted) vs continuous (measured).
Population (N) = whole group of interest. Sample (n) = subset studied. Sampling frame = a list of every member.
Simple random: number every member, pick n at random — equal chance for all.
Systematic: pick every kth member, where k = N / n, starting at a random number from 1 to k.
Stratified: split into groups; from each, sample (group size / N) × n members at random.
Quota & Convenience: used when no sampling frame exists. Both risk bias — convenience the most.
Types of data
Qualitative = words (eye colour, brand). Quantitative = numbers. Quantitative splits further into:
• Discrete: specific separate values, usually counted. Examples: number of siblings, shoe size.
• Continuous: any value in a range, measured. Examples: height (cm), time (s), mass (kg).
Quick test: can the value have “half” of itself? Half a sibling doesn’t exist (discrete). Half a centimetre does (continuous). Shoe size is still discrete even though half sizes exist, because only fixed steps are possible.
Sampling techniques
Five methods, differing in how members are picked and how much bias they let in.
Figure: four ways of choosing 5 members from a population of 15 (teal = selected, grey = not). Random and systematic selections are spread across the population; stratified picks proportionally from each group; convenience clusters, producing bias.
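As a rough illustration of the contrast between random and convenience selection, the Python sketch below picks 5 members from a population numbered 1 to 15. The numbering, the fixed seed, and the "first five on the list" stand-in for convenience are assumptions made for illustration only.

```python
import random

population = list(range(1, 16))   # members numbered 1 to 15 (N = 15)
n = 5                             # sample size

# Simple random: every member has an equal chance of selection.
rng = random.Random(2024)         # fixed seed only so the run is repeatable
simple_random = sorted(rng.sample(population, n))

# Convenience: take whoever is easiest to reach, modelled here
# (as an assumption) by the first n members on the list.
convenience = population[:n]

print("simple random:", simple_random)
print("convenience:  ", convenience)
```

The convenience sample can only ever contain members 1 to 5, which is the clustering the figure describes; the random sample can land anywhere in the population.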
The two formulae you must know
Systematic interval: k = N / n
Stratified count: number from group = (group size / N) × n
Recipe: tackle any sampling question
Read carefully: is the question asking you to identify the technique, calculate sample sizes, or criticise / improve a method?
Check for a list: no list ⇒ can’t use random / systematic / stratified ⇒ must use quota or convenience.
For stratified: compute (group ÷ N) × n for each group; check the parts sum to n.
For systematic: divide N by n to get k, then add k repeatedly to the random start.
To improve: usually “increase the sample size” or “use a more random technique”.
Worked examples
WE 1
Classify the types of data
A biologist records the following about a group of birds. Classify each as qualitative, discrete, or continuous.
(i) feather colour (ii) number of eggs in a nest (iii) wing length in cm (iv) species name
(i) colour: words → qualitative
(ii) eggs: counted whole numbers → discrete
(iii) wing length in cm: measured, any value → continuous
(iv) species name: words → qualitative
Answer: qualitative · discrete · continuous · qualitative
Tip: test whether the value can have “half” of itself. Half an egg makes no sense → discrete. Half a cm does → continuous.
WE 2
Stratified sample from a school
A school has 360 students in Year 11, 240 in Year 12, and 200 in Year 13. The principal wants a stratified sample of 40 students. How many from each year group?
Step 1: total, N = 360 + 240 + 200 = 800
Step 2: (group/N) × n for each year
Y11: (360/800) × 40 = 18
Y12: (240/800) × 40 = 12
Y13: (200/800) × 40 = 10
Check: 18 + 12 + 10 = 40 ✓
Answer: 18 Y11 · 12 Y12 · 10 Y13
Tip: always check the parts sum to n. If they don’t, it’s an arithmetic slip.
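The school calculation can be reproduced with a short Python sketch. The function name `stratified_counts` is made up for this note, and `Fraction` is used only so the arithmetic stays exact.

```python
from fractions import Fraction

def stratified_counts(group_sizes, n):
    """Return {group: (group size / N) * n} as exact fractions."""
    N = sum(group_sizes.values())
    return {name: Fraction(size, N) * n for name, size in group_sizes.items()}

school = {"Y11": 360, "Y12": 240, "Y13": 200}
counts = stratified_counts(school, 40)

for year, count in counts.items():
    print(year, count)

# The sum check from the worked example.
assert sum(counts.values()) == 40
```

Keeping the sum check as an assertion mirrors the habit the exam rewards: the parts must add back to n.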
WE 3
Systematic — interval and first members
A factory produces 1200 lightbulbs in a day, numbered 1 to 1200. A quality inspector wants a systematic sample of 30 bulbs.
(a) Find the interval k. (b) Given the random start is 7, list the first four bulbs sampled.
(a) k = N / n = 1200 / 30 = 40
(b) Start at 7 and add k = 40 each time: 7 → 47 → 87 → 127
Answer: k = 40 · first four: 7, 47, 87, 127
Tip: the random start lies between 1 and k. After that, keep adding k until you have n members.
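As a sketch of the same procedure in Python, the hypothetical helper below builds the whole systematic sample for the lightbulb example. It assumes n divides N exactly, as it does here.

```python
def systematic_sample(N, n, start):
    """Members chosen by taking every k-th item, beginning at `start`."""
    k = N // n                      # interval k = N / n (assumes n divides N)
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(n)]

bulbs = systematic_sample(N=1200, n=30, start=7)
print(bulbs[:4])    # first four bulbs sampled
print(bulbs[-1])    # last bulb sampled
```

Because the start is at most k, the last member is at most k + (n − 1)k = N, so the sample never runs past the end of the list.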
WE 4
Stratified sample from a library
A library has 1500 books: 600 fiction, 540 non-fiction, and 360 reference. A librarian takes a stratified sample of 25 books. How many of each type?
N = 1500, n = 25
Fiction: (600/1500) × 25 = 10
Non-fiction: (540/1500) × 25 = 9
Reference: (360/1500) × 25 = 6
Check: 10 + 9 + 6 = 25 ✓
Answer: 10 fic · 9 non-fic · 6 ref
Tip: if a calculation gives a decimal, round to the nearest integer and adjust the largest group so the parts still sum to n.
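The rounding tip can be sketched in Python as follows. The function name and the second collection (220 / 190 / 190 books, sample of 20) are hypothetical, chosen only so that the raw counts come out as decimals.

```python
import math

def stratified_rounded(group_sizes, n):
    """Round each (group size / N) * n to the nearest integer, then
    adjust the largest group so the counts sum to n."""
    N = sum(group_sizes.values())
    counts = {name: math.floor(size * n / N + 0.5)   # round half up
              for name, size in group_sizes.items()}
    largest = max(group_sizes, key=group_sizes.get)
    counts[largest] += n - sum(counts.values())      # absorb any shortfall or excess
    return counts

library = {"fiction": 600, "non-fiction": 540, "reference": 360}
print(stratified_rounded(library, 25))

# Hypothetical collection, chosen so the raw counts are decimals.
hypothetical = {"fiction": 220, "non-fiction": 190, "reference": 190}
print(stratified_rounded(hypothetical, 20))
```

In the hypothetical case the raw counts are about 7.33, 6.33 and 6.33, which round to 7, 6 and 6 (total 19), so the largest group is raised to 8 to restore the total of 20.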
WE 5
Identify the technique and its weakness
A coffee-shop manager surveys the next 30 customers who walk through the door on Monday morning.
(a) Name the sampling technique. (b) Give one reason it may not be reliable.
(a) “easiest available” → convenience
Answer: convenience sampling
(b) Monday-morning customers ≠ all customers: the sample excludes evening and weekend regulars
Answer: not representative → biased
Tip: convenience is the “lazy” method: easy to do, rarely fair. Day, time and location all introduce bias.
WE 6
Hospital sample — calc + alternative when no list
A hospital employs 480 nurses and 120 doctors. The board wants a sample of 25 staff, with both professions represented in proportion.
(a) Find how many nurses and doctors should be sampled. (b) The board has no list of all 600 staff. Suggest a sampling technique they could use instead, and explain why.
(a) Stratified, N = 600
Nurses: (480/600) × 25 = 20
Doctors: (120/600) × 25 = 5
Check: 20 + 5 = 25 ✓
Answer: 20 nurses · 5 doctors
(b) No list → can’t sample randomly within strata, so use quota sampling: set quotas of 20 nurses and 5 doctors, then fill them with the next available staff.
Answer: quota → same proportions, no list required
Tip: the trigger phrase “no list / no register” → quota or convenience. Quota when proportions matter; convenience when speed dominates.
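A minimal Python sketch of part (b) is given below: staff are taken in the order they turn up until each quota is full. The arrival order (two nurses, then one doctor, repeated) and the staff labels are invented for illustration; a real arrival pattern would differ.

```python
def fill_quotas(arrivals, quotas):
    """Take staff in the order they turn up until every quota is full."""
    remaining = dict(quotas)
    chosen = []
    for person, role in arrivals:
        if remaining.get(role, 0) > 0:
            chosen.append((person, role))
            remaining[role] -= 1
        if not any(remaining.values()):
            break                     # all quotas met, stop sampling
    return chosen

# Hypothetical arrival order: two nurses, then one doctor, repeated.
roles = ["nurse", "nurse", "doctor"] * 20
arrivals = [(f"staff{i}", role) for i, role in enumerate(roles, start=1)]

sample = fill_quotas(arrivals, {"nurse": 20, "doctor": 5})
nurses = sum(1 for _, role in sample if role == "nurse")
doctors = sum(1 for _, role in sample if role == "doctor")
print(nurses, doctors)
```

No list of all 600 staff is needed, but nothing here is random: whoever arrives first is chosen, which is why quota sampling can still be biased.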
Top tips
“No sampling frame” is the key phrase: forces quota or convenience — never random / systematic / stratified.
Stratified parts must sum to n: always check.
Random start for systematic: between 1 and k, not 1 and N.
To improve any method: a safe answer is “increase the sample size”; a stronger one also names a more random technique.
Bias = systematic favouring of some members. Random methods minimise it; convenience is the most prone to it.
Common mistakes
Confusing population and sample: N = whole group, n = chosen subset. Mixing them flips the formula.
Calling stratified “quota”: both split into groups, but stratified picks RANDOMLY within each; quota does not.
Forgetting the sum check: easy mark lost when stratified parts don’t add to n.
Using k = N/n without a list: systematic needs a list of every member (a sampling frame), just like simple random and stratified.
Suggesting only “more people”: bigger sample + better method (e.g. random instead of convenience) is the stronger answer.
Next up: Measures of Central Tendency — mean, median, mode. Once you have a sample, the next question is “what’s the average?”. Each measure behaves differently with outliers and skewed data, and the GDC computes them all directly from raw or grouped data.