IB Maths AI HL | Normal Distribution | Paper 1 & 2 | ~7 min read
The Normal Distribution
The normal distribution is the famous bell curve: a continuous probability distribution, unlike the discrete binomial. It models heaps of real-life measurements (heights, weights, times) that cluster symmetrically around a central value. It’s written X ∼ N(μ, σ²), and because it’s continuous, the probability of any single exact value is zero, so you always work with areas under the curve.
📘 What you need to know
Continuous: X can take any value in a range; P(X = k) = 0, so strict and weak inequalities are the same.
Notation: X ∼ N(μ, σ²), where μ is the mean and σ² is the variance (so σ = √variance is the standard deviation).
Shape: symmetric, bell-shaped, total area under the curve = 1.
Centre: symmetric about x = μ, so mean = median = mode = μ.
Probability = area: P(a ≤ X ≤ b) is the area under the curve between a and b.
68–95–99.7 rule: ~68% within μ ± σ, ~95% within μ ± 2σ, ~99.7% within μ ± 3σ.
Changing parameters: changing μ shifts the curve sideways; increasing σ² flattens and widens it.
Continuous variables & area
A continuous random variable measures something — height, weight, time — and can take any value in a range. For a continuous distribution the probability of an exact value is always zero, so we measure probability as area under the probability density curve.
Probability as area
P(a ≤ X ≤ b) = area under the curve from a to b | total area = 1
P(X = k) = 0, so ≤ and < give the same answer
🤔 Why is P(X = k) = 0 for a continuous variable?
Probability is the area under the curve. A single point has no width, so its “area” — and therefore its probability — is zero. This is why, for the normal distribution, it never matters whether an inequality is strict (<, >) or weak (≤, ≥): P(X < k) = P(X ≤ k).
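The shrinking-area argument can be checked numerically. The sketch below is illustrative only: it assumes the example distribution X ∼ N(40, 100) and the point k = 45, and uses Python's standard-library `statistics.NormalDist` rather than a GDC. The helper name `interval_probability` is made up for this example.

```python
from statistics import NormalDist

# Assumed example distribution: X ~ N(40, 100), so sigma = sqrt(100) = 10.
X = NormalDist(mu=40, sigma=10)


def interval_probability(dist, a, b):
    """P(a <= X <= b), found as the area between a and b via the CDF."""
    return dist.cdf(b) - dist.cdf(a)


k = 45  # assumed point of interest
# Squeeze the interval around k: the area gets smaller each time.
for half_width in (1, 0.1, 0.01, 0.001):
    p = interval_probability(X, k - half_width, k + half_width)
    print(f"half-width {half_width}: probability = {p:.6f}")

# An interval with no width has no area at all.
print(interval_probability(X, k, k))  # 0.0
```

As the interval narrows towards the single point k, the probability heads to zero, which is why P(X < k) and P(X ≤ k) agree.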
Shape & the 68–95–99.7 rule
The normal curve is symmetric and bell-shaped, centred on μ. The standard deviation σ controls its width, and a fixed share of the data falls within each band of σ from the mean.
The 68–95–99.7 (empirical) rule
~68% within ±σ, ~95% within ±2σ, ~99.7% within ±3σ of the mean.
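The three percentages are rounded values of areas under the standard normal curve. As a quick check (a sketch using Python's standard-library `statistics.NormalDist`; the helper `within_k_sd` is a name invented here), the area within k standard deviations of the mean can be computed directly:

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
# The proportions are the same for any normal distribution.
Z = NormalDist()


def within_k_sd(k):
    """Proportion of a normal distribution lying within k sd of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

The rule of thumb rounds these to 68%, 95% and 99.7%, which is accurate enough for the "roughly what proportion" style of question.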
How μ and σ change the curve
A small variance gives a tall, narrow curve; a large variance gives a short, wide one (same area).
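To see "tall and narrow" versus "short and wide" in numbers, the sketch below compares the height of the curve at its peak for three assumed standard deviations (5, 10 and 20) around an assumed mean of 40. It uses the density function from Python's `statistics.NormalDist`, which is a tool outside this guide; `peak_height` is a hypothetical helper name.

```python
from statistics import NormalDist


def peak_height(mu, sigma):
    """Height of the normal density curve at its centre x = mu."""
    return NormalDist(mu=mu, sigma=sigma).pdf(mu)


mu = 40  # assumed mean; changing it only slides the curve sideways
for sigma in (5, 10, 20):
    curve = NormalDist(mu=mu, sigma=sigma)
    # The share of the area within one sd is the same for every sigma.
    area_1sd = curve.cdf(mu + sigma) - curve.cdf(mu - sigma)
    print(f"sigma = {sigma}: peak = {peak_height(mu, sigma):.4f}, "
          f"area within 1 sd = {area_1sd:.4f}")
```

Doubling σ halves the peak height, while the area within one standard deviation stays at about 0.68, consistent with the total area remaining 1.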
🧠 Memory aid — variance vs standard deviation
The notation N(μ, σ²) gives you the variance as the second number. If a question wants the standard deviation, square-root it. So N(40, 100) means μ = 40 and σ² = 100, giving σ = √100 = 10, not 100.
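The same trap appears in software: many tools ask for the standard deviation, not the variance. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` (whose second argument is σ) and a made-up helper `from_notation`:

```python
import math
from statistics import NormalDist


def from_notation(mu, variance):
    """Build a distribution from N(mu, sigma^2) notation (hypothetical helper).

    NormalDist expects the standard deviation, so the variance is rooted first.
    """
    return NormalDist(mu=mu, sigma=math.sqrt(variance))


S = from_notation(40, 100)  # S ~ N(40, 100)
print(S.mean, S.stdev, S.variance)  # 40.0 10.0 100.0
```

Passing 100 straight in as the standard deviation would describe N(40, 10000), a much wider curve than the question intends.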
Modelling with the normal
Many continuous, symmetric, single-peaked variables can be modelled normally if the population is large enough. Although a normal variable can technically take any real value, values more than about 4 standard deviations from the mean are so unlikely that they can be ignored. That is why it can model things like height that can’t truly be negative: provided the mean is several standard deviations above zero, the model gives negative values practically zero probability.
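To put a number on how little lies far from the mean, the sketch below computes the two-tail proportion beyond 2, 3 and 4 standard deviations. It assumes Python's standard-library `statistics.NormalDist`; `outside_k_sd` is a name invented for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1


def outside_k_sd(k):
    """Proportion lying more than k sd from the mean (both tails together)."""
    return 2 * (1 - Z.cdf(k))


for k in (2, 3, 4):
    print(f"more than {k} sd from the mean: {outside_k_sd(k):.6f}")
```

Beyond 4 standard deviations the proportion is below 0.0001, so impossible values that far out have a negligible effect on the model.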
Can model ✓ (symmetric, 1 mode): heights, weights, times, running speeds. Each has a large population, a bell shape and a single peak.
Cannot model ✗ (skewed / multimodal): human lifespan (not symmetric); a random number generator (no single mode).
Exam habit: when a question mixes distributions, state clearly which one each variable follows (e.g. S ∼ N(40, 100)). To justify a normal model, the usual assumptions are that the variable is symmetrical and bell-shaped.
Worked examples
WE 1
Mean and standard deviation from notation
The speeds (mph) of a cheetah subspecies are modelled by S ∼ N(40, 100). Write down the mean and standard deviation.
Read off μ and σ²: μ = 40, σ² = 100
Square-root for σ: σ = √100 = 10
Answer: mean = 40 mph, sd = 10 mph
Note: the second number in N(μ, σ²) is the variance, so don’t forget to root it.
WE 2
State the assumptions
State two assumptions needed to model the cheetah speeds with a normal distribution.
The distribution of speeds is…
1. symmetrical
2. bell-shaped
Answer: symmetrical AND bell-shaped
Note: a large population with one mode also supports the model.
WE 3
Apply the 68% rule
For the cheetahs S ∼ N(40, 100), roughly what proportion run between 30 and 50 mph?
Find how many σ from the mean: 30 = 40 − 10 = μ − σ; 50 = 40 + 10 = μ + σ
Apply the empirical rule: within μ ± σ → about 68%.
Answer: ≈ 68%
Note: 30 to 50 is exactly one sd either side of the mean.
WE 4
Apply the 95% rule
For S ∼ N(40, 100), roughly what proportion run faster than 60 mph?
60 = 40 + 20 = μ + 2σ: about 95% lie within μ ± 2σ, so about 5% lie outside.
Split the 5% by symmetry: 5% ÷ 2 = 2.5% in each tail
Answer: ≈ 2.5%
Note: “faster than μ + 2σ” is just the upper tail.
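The rule-of-thumb answers to WE 3 and WE 4 can be compared with values computed from the model itself. This is only a sketch using Python's standard-library `statistics.NormalDist` in place of a GDC; the variable names are chosen for this example.

```python
from statistics import NormalDist

# S ~ N(40, 100): the variance is 100, so the standard deviation is 10.
S = NormalDist(mu=40, sigma=10)

# WE 3: proportion between 30 and 50 mph (within one sd of the mean).
p_between = S.cdf(50) - S.cdf(30)

# WE 4: proportion faster than 60 mph (upper tail beyond mu + 2 sd).
p_upper = 1 - S.cdf(60)

print(f"P(30 < S < 50) = {p_between:.4f}")  # 0.6827
print(f"P(S > 60)      = {p_upper:.4f}")    # 0.0228
```

The computed 68.27% and 2.28% sit close to the 68% and 2.5% estimates; the small gap in the tail comes from rounding 95.45% down to 95%.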
WE 5
Is the normal a good model?
Explain whether human lifespan can be well modelled by a normal distribution.
Check symmetry: lifespans are NOT symmetric. Most people live to old age, with a long left tail of early deaths.
Answer: No, not symmetrical
Note: a normal model needs a symmetric, bell-shaped, single-mode variable.
💡 Top tips
N(μ, σ²) gives the variance: square-root it for the standard deviation.
Mean = median = mode = μ, and the curve is symmetric about μ.
Strict vs weak doesn’t matter for a continuous variable: P(X < k) = P(X ≤ k).
Memorise 68–95–99.7 for quick estimates within 1, 2, 3 sd.
Use symmetry to split tail areas (e.g. half of the outer 5%).
To justify a model, say the variable is symmetric and bell-shaped with one mode.
⚠ Common mistakes
Treating the second number as σ: in N(μ, σ²) it’s the variance.
Forgetting to square-root the variance when the standard deviation is needed.
Worrying about < vs ≤ for a continuous variable — they’re equal here.
Mixing up the bands: they are 68 / 95 / 99.7 for 1, 2 and 3 sd, in that order.
Using a normal model for skewed data (lifespan) or data with no single mode.
Confusing probability with density — probability is the area, not the height.
Next up — Calculations with the Normal Distribution. You’ll use your GDC’s normal CD function to find probabilities like P(a < X < b), handle one-sided tails with “very big” bounds, and run the inverse normal to go from a probability back to a value of x.