IB Maths AA SL | Topic 4: Statistics Toolkit | Paper 1 & 2

Sampling & Data Collection

Before you can analyse anything, you have to collect data, and how you collect it matters more than most students realise. This note covers what data even is, how to grab a fair slice of it, and why a bad sample can make even perfect maths give you the wrong answer.

📘 What you need to know

The four types of data

Every piece of data fits into one of four boxes. Knowing which box yours sits in tells you what graphs to draw and what calculations make sense.

Qualitative

Words / categories
Describes something, usually a colour, name, type, or label. Not a number.
e.g. eye colour, favourite subject, blood type

Quantitative

Numbers (counted or measured)
A number that you can do maths with: you count it or measure it.
e.g. number of pets, height, exam score, time taken

Discrete (a type of quantitative)

Counted: only specific values
Whole or fixed values. Nothing between them.
e.g. number of students (3 or 4, never 3.7), shoe size

Continuous (a type of quantitative)

Measured: any value in a range
Can take any value within a range, limited only by how precise your tool is.
e.g. height (172.3 cm, 172.34 cm…), mass, time, temperature

The data type tree

DATA
  QUALITATIVE (words / categories), e.g. eye colour, type
  QUANTITATIVE (numbers)
    DISCRETE (counted), e.g. number of pets
    CONTINUOUS (measured), e.g. height, time

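A trick for telling discrete from continuous: ask “could you ever get half of one?” If yes, it’s continuous (1.5 cm is fine). If no, it’s discrete (1.5 children is impossible: you can’t have half a kid!). Treat this as a rule of thumb: shoe sizes come in halves, but they are still discrete because only fixed sizes exist, with nothing in between.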

What about age?

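Age is a sneaky one: it depends on how the question phrases it. Age in completed years (16, 17, 18) is treated as discrete, while exact age (16.4 years) is continuous.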

Population vs sample

Imagine a vet wants to know the average sleep time of French bulldogs. The population would be every French bulldog in the world. That’s clearly impossible to measure, so instead the vet would take a sample: maybe 50 dogs from a few different cities, used to estimate what the population looks like.

Definitions
Population = every member you care about
Sample = a smaller group taken from the population
Sampling frame = a list of every member of the population

Why use a sample at all?

👍 Pros of sampling

  • Quicker and cheaper than measuring everyone.
  • Less data to handle and analyse.
  • Sometimes the only practical option (you can’t measure every fish in the ocean!).

👎 Cons of sampling

  • The sample might not fully represent the population.
  • Bias can creep in if the method isn’t fair.
  • Different samples can lead to different conclusions.

The five sampling techniques

The IB exam expects you to know five different sampling techniques. Each one suits a different situation. Here’s a tour:

1. Simple random sampling

How: Number every member of the population, then use a random number generator (or pull names from a hat) to pick n of them. Every member has the same chance of being picked.
✓ Truly random and unbiased. Best choice when you have a small population.
✗ Slow if the population is huge. Impossible if you can’t list every member (e.g. fish in a lake).
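
As a minimal sketch of the “number everyone, then pick at random” idea, here is how it could look in Python. The helper name `simple_random_sample` and the class of 30 numbered students are made up for illustration; `random.Random(...).sample` is the standard-library call doing the actual picking.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n distinct members so that every member is equally likely.

    `seed` is optional and only there to make a run repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: students numbered 1 to 30
students = list(range(1, 31))
chosen = simple_random_sample(students, 5, seed=1)
print(sorted(chosen))
```

Note that this only works because `students` is a complete list of the population, which is exactly the limitation mentioned above.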

2. Systematic sampling

How: Calculate k = N ÷ n (population size ÷ sample size). Pick a random start between 1 and k, then take every kth member after that.
✓ Quick and easy. Great when there’s a natural order: a list of names, a conveyor belt, etc.
✗ Can’t use if you can’t list members. Risk of bias if the order has a hidden pattern.
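
The recipe above can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes N divides exactly by n (otherwise you would need to decide how to round k). The figures N = 600, n = 30 and a start of 7 are borrowed from the chocolate-bar worked example later in this note.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the positions (1-based) picked by systematic sampling.

    Assumes N is an exact multiple of n, so the interval k is a whole number.
    """
    k = N // n  # sampling interval
    if start is None:
        # random start between 1 and k inclusive
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

picks = systematic_sample(600, 30, start=7)
print(picks[:5])   # [7, 27, 47, 67, 87]
print(picks[-1])   # 587, the 30th and last bar sampled
```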

3. Stratified sampling

How: Split the population into disjoint groups (called strata), like males/females or different age bands. From each group, take a random sample, sized so the proportions match the population.
Formula: sample from group = (n ÷ N) × number in group, where n is the total sample size and N is the population size.
✓ Sample reflects the population structure. Good when groups within the population are very different.
✗ Can’t be used if the population can’t be split into groups, or if groups overlap.
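
Here is a rough sketch of stratified sampling in Python: work out each group’s share with the formula, then take a simple random sample inside each group. The function name `stratified_sample` and the two year groups (60 and 40 students) are invented for illustration, and the sketch assumes the rounded group sizes add up to n.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps a group name to the list of its members.

    Each group contributes round(n / N * group size) randomly chosen members.
    """
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / N)  # sample from group = n/N x number in group
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical school: 60 students in Year 12, 40 in Year 13
strata = {
    "Year 12": [f"Y12-{i}" for i in range(1, 61)],
    "Year 13": [f"Y13-{i}" for i in range(1, 41)],
}
sample = stratified_sample(strata, 10, seed=0)
print({name: len(chosen) for name, chosen in sample.items()})
# {'Year 12': 6, 'Year 13': 4}
```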

4. Quota sampling

How: Like stratified, you split into groups and decide how many to pick from each, but you don’t pick randomly. Just keep selecting until each quota is filled.
✓ Useful when no list of the population exists. Common in street surveys.
✗ Can be biased: people who refuse to take part skew the results.
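
To see how quota sampling differs from stratified, here is a small sketch where people are taken in the order they turn up, with no random selection at all. The function name `quota_sample`, the passers-by A to F and the quotas are all made up for illustration.

```python
def quota_sample(arrivals, quotas):
    """arrivals: (person, group) pairs in the order they are encountered.

    People are accepted in arrival order until every group's quota is full.
    """
    chosen = {group: [] for group in quotas}
    for person, group in arrivals:
        if group in chosen and len(chosen[group]) < quotas[group]:
            chosen[group].append(person)
        if all(len(chosen[g]) == quotas[g] for g in quotas):
            break  # every quota is filled, stop surveying
    return chosen

# Hypothetical street survey: 2 adults and 1 child needed
arrivals = [("A", "adult"), ("B", "adult"), ("C", "child"),
            ("D", "adult"), ("E", "child"), ("F", "child")]
result = quota_sample(arrivals, {"adult": 2, "child": 1})
print(result)  # {'adult': ['A', 'B'], 'child': ['C']}
```

Notice that D, E and F never had a chance of being picked, which is where the bias risk comes from.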

5. Convenience sampling

How: Just pick whoever is easiest to reach: friends, classmates, the first 20 people who walk past.
✓ Fast and free. Used when no list of the population is available.
✗ Almost always biased. Sample probably won’t represent the wider population.

🧠 Memory trick: “Random vs Roughly Right vs Whoever Shows Up”

Think of the methods on a “random scale”. Simple random is fully random, and systematic is random in its starting point. Stratified is random within each group. Quota picks the right numbers from each group but not randomly. Convenience just grabs whoever is around. The further down the list, the higher the bias risk.

Bias and reliability

A biased sample is one that gives a misleading picture of the population. The whole point of using a careful sampling method is to fight bias.

What makes data reliable?

A sample is reliable if you’d get similar results from a different sample of the same population. The sample needs to be representative (the right mix) and big enough. Tiny samples, even random ones, can fluctuate a lot.
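
A quick simulation can make the “tiny samples fluctuate a lot” point concrete. This is only a sketch: the population of 10,000 heights (roughly 170 cm, give or take 10 cm) is simulated, and the helper `spread_of_sample_means` is a made-up name. It takes many random samples of a given size and measures how much their means vary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated (not real) population: 10,000 heights in cm
population = [rng.gauss(170, 10) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=200):
    """Take `repeats` random samples and return the standard deviation of their means."""
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(5)
large = spread_of_sample_means(100)
print(f"spread of means with n = 5:   {small:.2f} cm")
print(f"spread of means with n = 100: {large:.2f} cm")
```

The means from samples of 100 should cluster far more tightly than the means from samples of 5, which is what “more reliable” means here.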

What causes data to be unreliable?

If a question asks you to “suggest one improvement” to a sampling method, the safest answers are nearly always: increase the sample size, or use a more random method. A bigger sample makes the results more reliable, and a more random method attacks bias head-on.

Worked examples

WE 1

Identify the type of data

For each of the following, state whether the data is qualitative or quantitative. If quantitative, also state whether it’s discrete or continuous.

(a) Eye colour of students    (b) Number of pets owned    (c) Time to run 100 m    (d) Mass of an apple    (e) Shoe size

Qualitative = words. Quantitative = numbers. Discrete = counted, continuous = measured.
(a) Eye colour: words → Qualitative
(b) Number of pets: numbers, counted → Quantitative, Discrete
(c) Time to run 100 m: numbers, measured → Quantitative, Continuous
(d) Mass of an apple: numbers, measured → Quantitative, Continuous
(e) Shoe size: numbers, fixed values → Quantitative, Discrete
Answer: (a) Qualitative | (b), (e) Quantitative, Discrete | (c), (d) Quantitative, Continuous
Tip: ask “could I have half of this?” If not, it’s discrete.

WE 2

Stratified sampling: Mike’s mice

Mike is a biologist studying mice in an open enclosure. He has approximately 540 field mice and 260 harvest mice. Mike wants to sample 10 mice and he wants the proportions to match the population.

(a) Calculate how many of each type Mike should include.   (b) Given that Mike has no list of the mice, name the sampling method.   (c) Suggest one way to improve the method.

Total: 540 + 260 = 800 mice
Part (a)
Field mice: (540 ÷ 800) × 10 = 6.75
Harvest mice: (260 ÷ 800) × 10 = 3.25
Round to whole mice (you can’t sample half a mouse!):
Answer: 7 field mice, 3 harvest mice
Part (b)
No list of the mice → can’t be random or stratified. Mike picks until each group quota is filled.
Answer: Quota sampling
Part (c)
The simplest improvement:
Answer: Increase the sample size
Tip: a bigger sample is more representative and more reliable.

WE 3

Systematic sampling on a production line

A factory produces 600 chocolate bars per hour. The quality controller wants to sample 30 bars using systematic sampling.

(a) Calculate the interval k.   (b) If the controller starts at bar number 7, list the next 4 bars she would sample.

N = 600, n = 30
Part (a)
Use k = N ÷ n:
k = 600 ÷ 30 = 20
Answer: k = 20
Part (b)
Starting bar = 7. Then add 20 each time:
7, 27, 47, 67, 87
Answer: the next 4 bars are 27, 47, 67, 87
Tip: just keep adding k = 20 until you reach the end of the run.

WE 4

Stratified sampling at a school

A school has 480 boys and 320 girls. The headteacher wants to survey 40 students using stratified sampling. Calculate how many boys and girls should be in the sample.

N = 480 + 320 = 800, n = 40
Use the stratified formula for each group separately.
Sample fraction: n ÷ N = 40 ÷ 800 = 1/20
Boys: 1/20 × 480 = 24
Girls: 1/20 × 320 = 16
Check: 24 + 16 = 40 ✓
Answer: 24 boys, 16 girls
Tip: always check your group totals add up to the sample size.

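
If you want to double-check this kind of arithmetic, a few lines of Python will do it. This is just a sketch: the function name `stratified_allocation` is made up, and it simply applies (n ÷ N) × group size to each group, rounds, and confirms the totals add up. It is run here on the numbers from WE 4 and WE 2.

```python
def stratified_allocation(group_sizes, n):
    """Return (exact, rounded) sample sizes per group for a total sample of n."""
    N = sum(group_sizes.values())
    exact = {g: n * size / N for g, size in group_sizes.items()}
    rounded = {g: round(x) for g, x in exact.items()}
    if sum(rounded.values()) != n:
        # Rounding can occasionally leave you one over or under; adjust by hand.
        raise ValueError("rounded group sizes do not add up to n")
    return exact, rounded

# WE 4: 480 boys and 320 girls, sample of 40
exact_school, rounded_school = stratified_allocation({"boys": 480, "girls": 320}, 40)
print(rounded_school)  # {'boys': 24, 'girls': 16}

# WE 2: 540 field mice and 260 harvest mice, sample of 10
exact_mice, rounded_mice = stratified_allocation({"field": 540, "harvest": 260}, 10)
print(exact_mice)    # {'field': 6.75, 'harvest': 3.25}
print(rounded_mice)  # {'field': 7, 'harvest': 3}
```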
WE 5

Identify and evaluate a sampling method

A market researcher stands outside a supermarket from 10 am to 12 pm and asks the first 100 people who walk past about their shopping preferences.

(a) Identify the sampling method.   (b) Give one disadvantage.   (c) Suggest a better method.

She picks whoever is easiest to reach: no list, no proportions, no random selection.
Part (a)
Answer: Convenience sampling
Part (b)
10 am–12 pm misses people at work. Sample is biased toward stay-at-home shoppers.
Answer: Sample is not representative of all shoppers
Part (c)
Survey at varied times across the week, or use stratified sampling on a customer database.
Answer: Use stratified sampling across different times of day
Tip: naming a specific better method scores higher than just saying “use a bigger sample”.

💡 Top tips

⚠ Common mistakes

Welcome to Topic 4! The Statistics Toolkit is mostly about gathering, summarising, and visualising data. The next note covers measures of central tendency (mean, median, and mode), which is where the actual number-crunching kicks in.
