IB Maths AI HL · Statistics Toolkit · Paper 1 & 2
Sampling
Whenever you want to know something about a huge group — every voter in a country, every fish in a lake, every car rolling off a production line — you almost never have the time or money to measure all of it. So you take a sample: a smaller subset that, picked carefully enough, says something honest about the whole. The art is in how you pick. Get it wrong and your data is biased; get it right and a handful of measurements can speak for millions of cases. The IB asks you to know five sampling techniques — simple random, systematic, stratified, quota, and convenience — and to spot which is the right one for any given situation.
📘 What you need to know
Population: the entire set you want information about. Sample: a subset of the population from which you actually collect data. Sampling frame: a complete list of every member of the population.
Qualitative data is descriptive (eye colour, favourite sport). Quantitative data is numerical — either discrete (counted, only specific values: pets, goals) or continuous (measured, any value in a range: height, time).
Random sample: every member has an equal chance of being chosen. Biased sample: one that systematically over- or under-represents part of the population, which leads to misleading conclusions about the whole.
Simple random — number every member, pick n at random. Fair, but needs a sampling frame.
Systematic — pick every kth member after a random start, where the interval is $k = \frac{N}{n}$.
Stratified — split the population into groups (strata) and sample randomly from each, in proportion: number from group $= \frac{n}{N} \times$ group size.
Quota — like stratified but selections inside each group are not random. Used when no sampling frame exists.
Convenience — pick whoever is easiest to reach. Quickest but most biased.
Bigger sample ⇒ generally more reliable. More random ⇒ less biased.
Types of data
Before you pick a sampling technique, classify what you’re collecting. The IB recognises three categories: qualitative, discrete, and continuous. The distinction matters because each tool suits particular data: histograms are for continuous data, the mean and standard deviation need numerical (discrete or continuous) data, and the mode is the one average that also works for qualitative data.
Quantitative — Discrete (counted)
Specific values only: number of pets, goals scored, heads in 10 coin flips. You’d never say “2.7 pets”.
Quantitative — Continuous (measured)
Any value in a range: height, time, mass. You can always be more precise (17 cm → 17.3 cm → 17.34 cm…).
🧠 Memory aid — the age trick
Age can be either discrete or continuous, depending on how it’s used. “I am 17 years old” is discrete (no one says 17.43 years in conversation). But “I have been alive for 17.43 years” is continuous. Read the context carefully — IB questions sometimes test this exact distinction.
Population vs sample
The population is the entire group you’re interested in. The sample is the bit you actually measure. A vet who wants to know how long French bulldogs sleep can’t ask every French bulldog on Earth (the population), so she takes a sample from a few cities.
🤔 Why sample at all instead of using the whole population?
Sampling is quicker and cheaper, and it leaves you with less data to process. The trade-off: a sample might not perfectly represent the population, and a poorly chosen sample can introduce bias. But with care (a big enough sample, picked without favouritism), the sample’s mean, spread, and proportions are usually close to the population’s. That idea is the foundation of statistical inference.
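To see this idea in action, here is a small Python simulation. It is a sketch under stated assumptions, not part of the syllabus: the population of 10,000 values spread evenly between 0 and 100 is made up, and the sample size of 500 is arbitrary. It takes a simple random sample and compares the sample mean with the population mean.

```python
import random
import statistics

# ASSUMPTION: a made-up population of 10,000 values spread evenly
# between 0 and 100 (think test scores). This is not real data.
random.seed(1)  # fixed seed so the run is repeatable
population = [random.uniform(0, 100) for _ in range(10_000)]

# Simple random sample of 500 members, drawn without replacement
sample = random.sample(population, 500)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Change the seed and the sample mean shifts slightly from run to run, but it typically stays within a few units of the population mean, even though only 5% of the population was measured.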
The five sampling techniques
[Figure: the five sampling techniques at a glance, with risk of bias rising from left to right]
① Simple random sampling
Every group of n members of the population has an equal chance of being chosen. Number every member from 1 to N, then use a random number generator (or pull slips from a hat) to pick n different numbers.
Use when: you have a small population with a complete list. Avoid when: you can’t number every member — e.g. fish in a lake, birds in a forest.
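The “number every member, then pick n different numbers” procedure maps directly onto Python’s standard `random.sample`, which draws without replacement. This is a minimal sketch; the function name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(N: int, n: int) -> list[int]:
    """Number the members 1..N and pick n different numbers at random."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    # random.sample draws without replacement, so every group of n
    # members is equally likely to be the group chosen.
    return sorted(random.sample(range(1, N + 1), n))

# Example: choose 10 of the 30 students in a class
print(simple_random_sample(30, 10))
```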
② Systematic sampling
Take every kth member of the population after a random starting point. The interval k is the sampling interval.
Sampling interval: $k = \dfrac{\text{size of population } N}{\text{size of sample } n}$
Pick a random start between 1 and k, then every kth member after that.
Use when: there’s a natural order (production line, alphabetical roll). Avoid when: no list is available.
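A sketch of systematic sampling in Python, assuming N is an exact multiple of n so that k is a whole number (the function raises an error otherwise rather than guessing a rounding rule). The name `systematic_sample` is hypothetical.

```python
import random
from typing import Optional

def systematic_sample(N: int, n: int, start: Optional[int] = None) -> list[int]:
    """Positions (numbered 1..N) chosen by systematic sampling."""
    # ASSUMPTION: N is a multiple of n, so k = N/n is a whole number.
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is a positive multiple of n")
    k = N // n                          # sampling interval k = N/n
    if start is None:
        start = random.randint(1, k)    # random start, 1 to k inclusive
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # start, start + k, start + 2k, ... gives exactly n positions
    return list(range(start, N + 1, k))

print(systematic_sample(100, 10, start=4))
# [4, 14, 24, 34, 44, 54, 64, 74, 84, 94]
```

Only the starting point is random. Once it is fixed, every other choice follows from it.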
③ Stratified sampling
Split the population into disjoint groups (strata) — e.g. by year group, gender, or region. Then take a random sample from each stratum, with the size of each sub-sample proportional to the stratum’s size.
Number from each group (stratum)
$\text{number sampled} = \dfrac{\text{size of sample } n}{\text{size of population } N} \times \text{size of group}$
Then pick a random sample of the calculated size from each group.
Use when: the population has clearly different sub-groups whose proportions matter. Avoid when: the groups overlap or can’t be defined cleanly.
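Here is a minimal sketch of proportional stratified sampling in Python, assuming each stratum is given as a list of member IDs. The function name and the student IDs are hypothetical. If the rounded sizes fail to sum to n, the sketch stops instead of adjusting them silently.

```python
import random

def stratified_sample(strata: dict[str, list], n: int) -> dict[str, list]:
    """Proportional stratified sample: a random sample from every stratum."""
    N = sum(len(members) for members in strata.values())
    # number sampled = n/N × size of group, rounded to the nearest whole
    # number (Python's round() sends exact halves to the nearest even number)
    sizes = {name: round(n * len(members) / N) for name, members in strata.items()}
    if sum(sizes.values()) != n:
        raise ValueError("rounded sizes do not sum to n; adjust them first")
    # Random selection inside each stratum is what makes this stratified, not quota
    return {name: random.sample(strata[name], sizes[name]) for name in strata}

# Hypothetical student IDs, using the year-group sizes from WE 1
school = {
    "Year 9": [f"Y9-{i}" for i in range(1, 481)],
    "Year 10": [f"Y10-{i}" for i in range(1, 361)],
    "Year 11": [f"Y11-{i}" for i in range(1, 241)],
    "Year 12": [f"Y12-{i}" for i in range(1, 121)],
}
sample = stratified_sample(school, 50)
print({year: len(chosen) for year, chosen in sample.items()})
# {'Year 9': 20, 'Year 10': 15, 'Year 11': 10, 'Year 12': 5}
```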
④ Quota sampling
Same as stratified — split by groups in proportion — but selections within each group are not random. Pick whoever turns up until each group’s quota is full.
Use when: no sampling frame is available (so stratified is impossible) but you still want representative group proportions. Beware: the person collecting the data decides who fills each quota, and some members may refuse, so bias can creep in.
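To make the contrast with stratified sampling concrete, here is a hypothetical Python sketch of quota filling. People are represented as (name, group) pairs in the order they are met, and each is accepted only while their group’s quota still has room. The names and quotas are invented for illustration.

```python
def quota_sample(arrivals, quotas):
    """Quota sampling sketch: accept whoever turns up until each quota is full.

    arrivals: (person, group) pairs in the order people are met
    quotas:   dict mapping each group to the number of people it needs
    """
    remaining = dict(quotas)
    chosen = {group: [] for group in quotas}
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:   # NOT random: first come, first chosen
            chosen[group].append(person)
            remaining[group] -= 1
        if not any(remaining.values()):   # every quota filled, so stop
            break
    return chosen

# Hypothetical street survey needing 2 adults and 1 child
people = [("Ana", "adult"), ("Ben", "adult"), ("Cal", "adult"), ("Dee", "child")]
print(quota_sample(people, {"adult": 2, "child": 1}))
# {'adult': ['Ana', 'Ben'], 'child': ['Dee']}
```

Nothing in the loop is random. The result depends entirely on who turns up first, and that is where quota sampling’s bias comes from.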
⑤ Convenience sampling
Pick whoever is easiest to reach — the first 10 people who walk past, or your friends. Quickest method, but the most prone to bias because the selection isn’t representative.
Use when: you need quick, informal data and no list exists. Beware: results unlikely to represent the population.
🧭 Recipe — calculating a stratified sample
Identify the groups and their sizes within the population.
Compute the sampling ratio n/N.
Multiply each group size by the ratio.
Round sensibly (usually to the nearest whole number) and check the total equals n; adjust if it doesn’t.
Take a random sample of the calculated size from each group.
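Steps 2 to 4 of the recipe can be sketched in Python using exact fractions. One assumption needs flagging. For the “adjust if it doesn’t” step, this sketch rounds every share down and then gives the leftover places to the groups with the largest fractional parts (the “largest remainder” rule). That is one sensible way to force the total to equal n, not a rule the IB prescribes. The function name is hypothetical.

```python
import math
from fractions import Fraction

def stratified_allocation(group_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Sub-sample size for each group, proportional to group size, summing to n."""
    N = sum(group_sizes.values())
    ratio = Fraction(n, N)                      # sampling ratio n/N (exact)
    exact = {g: ratio * size for g, size in group_sizes.items()}
    alloc = {g: math.floor(q) for g, q in exact.items()}  # round everything down
    shortfall = n - sum(alloc.values())         # places still to hand out
    # ASSUMPTION: leftover places go to the groups with the largest
    # fractional parts (largest remainder rule); ties keep input order.
    by_remainder = sorted(exact, key=lambda g: exact[g] - alloc[g], reverse=True)
    for g in by_remainder[:shortfall]:
        alloc[g] += 1
    return alloc

# Three equal groups: each exact share is 10/3 = 3.33..., and rounding each
# to the nearest whole number would give 3 + 3 + 3 = 9, one short of n = 10.
print(stratified_allocation({"A": 10, "B": 10, "C": 10}, 10))
# {'A': 4, 'B': 3, 'C': 3}
```

In an exam you would justify which group gets the extra place. The code breaks the tie by input order only because it needs some fixed rule.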
Worked examples
WE 1
Stratified sample from a school
A school has 1200 students: 480 in Year 9, 360 in Year 10, 240 in Year 11, and 120 in Year 12. A stratified sample of 50 students is to be taken. Calculate how many should be selected from each year.
Step 1: sampling ratio
n/N = 50/1200 = 1/24
Step 2: multiply each group size by the ratio
Year 9: 480 × 50/1200 = 20
Year 10: 360 × 50/1200 = 15
Year 11: 240 × 50/1200 = 10
Year 12: 120 × 50/1200 = 5
Step 3: check total
20 + 15 + 10 + 5 = 50 ✓
Answer: 20, 15, 10, 5 students respectively.
Note: always check the sub-sample sizes sum to n.
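If you want to double-check arithmetic like WE 1, Python’s standard `fractions.Fraction` keeps the ratio exact, so no decimal rounding creeps in. This is a verification sketch only; the variable names are arbitrary.

```python
from fractions import Fraction

years = {"Year 9": 480, "Year 10": 360, "Year 11": 240, "Year 12": 120}
n = 50
N = sum(years.values())              # 1200
ratio = Fraction(n, N)               # 50/1200 simplifies to 1/24
allocation = {year: ratio * size for year, size in years.items()}

print("ratio:", ratio)               # ratio: 1/24
for year, count in allocation.items():
    print(year, count)               # Year 9 20, Year 10 15, ...
print("total:", sum(allocation.values()))   # total: 50
```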
WE 2
Systematic sampling on a production line
A factory produces 800 phone cases per day. A quality inspector wants a systematic sample of 40 cases. Find the sampling interval k, and describe which cases would be inspected if the random starting number is 7.
Step 1: sampling interval
k = N/n = 800/40 = 20
Step 2: start at 7, then take every 20th case
7, 27, 47, 67, …
Step 3: find the last case picked (the 40th)
last = 7 + 39 × 20 = 787
Answer: k = 20; inspect cases 7, 27, 47, …, 787.
Note: the random starting number must lie between 1 and k inclusive (here, 1 to 20).
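The WE 2 answer can be checked with a Python `range` that starts at 7 and steps by k. This is a quick verification sketch, not a required exam method.

```python
N, n, start = 800, 40, 7
k = N // n                              # sampling interval k = N/n
picked = list(range(start, N + 1, k))   # 7, 27, 47, ... (every 20th case)

print("k =", k)                         # k = 20
print("first four:", picked[:4])        # first four: [7, 27, 47, 67]
print("last:", picked[-1], "| how many:", len(picked))   # last: 787 | how many: 40
```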
WE 3
Stratified with awkward rounding
A hospital employs 8 doctors, 22 nurses, and 15 admin staff. A stratified sample of 12 is required. Calculate the number from each group, rounding sensibly.
Step 1: totals
N = 8 + 22 + 15 = 45; n = 12; ratio = 12/45 = 4/15
Step 2: multiply
Doctors: 8 × 12/45 = 2.133… → 2
Nurses: 22 × 12/45 = 5.867… → 6
Admin: 15 × 12/45 = 4.000 → 4
Step 3: check total
2 + 6 + 4 = 12 ✓
Answer: 2 doctors, 6 nurses, 4 admin.
Note: here the rounded values already sum to 12, but rounding sometimes needs a small adjustment to keep the total equal to n.
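A short Python check of WE 3 prints each exact share next to its rounded value and confirms that the total still equals n. It uses the built-in `round()`, which sends exact halves to the nearest even number. No share here ends in exactly .5, so that behaviour does not matter in this case.

```python
staff = {"Doctors": 8, "Nurses": 22, "Admin": 15}
n = 12
N = sum(staff.values())                      # 45

for group, size in staff.items():
    exact = size * n / N                     # size × n/N
    print(f"{group}: {exact:.3f} -> {round(exact)}")
# Doctors: 2.133 -> 2
# Nurses: 5.867 -> 6
# Admin: 4.000 -> 4

rounded = {group: round(size * n / N) for group, size in staff.items()}
print("total:", sum(rounded.values()))       # total: 12, equal to n
```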
WE 4
Identify the sampling technique
For each scenario, name the most appropriate sampling technique:
(a) A teacher randomly draws 10 names from a hat containing all 30 students’ names.
(b) A researcher stops the first 50 people exiting a train station.
(c) A magazine includes a postage-paid survey card in every 25th magazine on the production line.
(a) names drawn from a hat
every member equally likely to be picked
(a) simple random sampling
(b) first 50 exiting
picked by ease of access
(b) convenience sampling
(c) every 25th magazine
regular interval along the production line
(c) systematic sampling
Note: “every kth” → systematic; “easiest to reach” → convenience; “random from a full list” → simple random.
WE 5
Quality control across machines
A factory has three machines: A produces 3000 widgets/day, B produces 2000, C produces 1000. A stratified sample of 30 widgets is required. How many from each machine?
Step 1: totals
N = 6000, n = 30; ratio = 30/6000 = 1/200
Step 2: multiply
Machine A: 3000 × 30/6000 = 15
Machine B: 2000 × 30/6000 = 10
Machine C: 1000 × 30/6000 = 5
Step 3: check total
15 + 10 + 5 = 30 ✓
Answer: 15 from A, 10 from B, 5 from C.
Note: the ratio divides every group size exactly, so no rounding is needed here.
WE 6
Choose and justify a technique
A biologist studies birds in a forest. They can’t list every bird in the population. They want a sample of 50 that reflects the proportions of 4 known species in the area.
(a) State an appropriate sampling technique.
(b) Suggest one way to make the sample more reliable.
(a) no sampling frame ⇒ simple random, systematic, and stratified are ruled out
but the species proportions matter ⇒ quota rather than convenience
(a) quota sampling
(b) ways to improve reliability
increase n, or observe at different times of day
(b) e.g. make observations at several times across the day and increase n
Note: “no list” + “proportional” → quota. If a list existed, stratified would be better.
💡 Top tips
Match keywords to techniques. “Every kth” → systematic; “random from groups” → stratified; “non-random from groups” → quota; “easiest to reach” → convenience.
No sampling frame rules out simple random, systematic, and stratified — only quota or convenience remain.
For stratified, always check the sub-sample sizes sum to n; adjust rounding if they don’t.
To improve reliability: increase the sample size, make the selection more random, or vary conditions (times, locations) to reduce one-off bias.
State the method AND justify it with the context — examiners reward both parts.
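The keyword tips above can be condensed into a rough decision rule, sketched below as a hypothetical Python helper. It is a simplification that assumes three yes/no questions are enough. Real exam scenarios need the context-based justification the last tip asks for, and no function can supply that.

```python
def suggest_technique(has_frame: bool, need_group_proportions: bool,
                      natural_order: bool = False) -> str:
    """Rough rule of thumb based on the tips above (a simplification)."""
    if not has_frame:
        # No list: simple random, systematic, and stratified are ruled out
        return "quota" if need_group_proportions else "convenience"
    if need_group_proportions:
        return "stratified"     # random selection within each group
    if natural_order:
        return "systematic"     # every kth member after a random start
    return "simple random"

# WE 6: birds in a forest, no list, species proportions matter
print(suggest_technique(has_frame=False, need_group_proportions=True))  # quota
```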
⚠ Common mistakes
Confusing stratified with quota. Stratified requires random selection within each group; quota does not.
Wrong interval formula in systematic. k = N/n, not n/N.
Forgetting the random start in systematic: the first member is chosen at random from positions 1 to k, not automatically position 1.
Picking stratified when no list exists. If the population can’t be listed, stratified is impossible — use quota.
Treating convenience as random. “First 10 people” is not random — it’s biased toward whoever happens to be in that location at that time.
Once you’ve chosen a sampling technique, the next question is whether your data collection process is reliable (repeatable) and valid (measuring the right thing). The next sub-topic — Reliability & Validity of Data Collection Methods — covers the test–retest and parallel-forms checks for reliability, plus content-related and criterion-related checks for validity.