IB Maths AI HL Normal Distribution Paper 1 & 2 ~7 min read

The Normal Distribution

The normal distribution is the famous bell curve — a continuous probability distribution, unlike the discrete binomial. It models heaps of real-life measurements (heights, weights, times) that cluster symmetrically around a central value. It’s written X ∼ N(μ, σ²), and because it’s continuous, the probability of any single exact value is zero — you always work with areas under the curve.

📘 What you need to know

Continuous: X can take any value in a range; P(X = k) = 0, so strict and weak inequalities give the same probability.
Notation: X ∼ N(μ, σ²) — μ is the mean, σ² is the variance (so σ = √variance is the standard deviation).
Shape: symmetric, bell-shaped, total area under the curve = 1.
Centre: symmetric about x = μ, so mean = median = mode = μ.
Probability = area: P(a ≤ X ≤ b) is the area under the curve between a and b.
68–95–99.7 rule: ~68% within μ ± σ, ~95% within μ ± 2σ, ~99.7% within μ ± 3σ.
Changing parameters: changing μ shifts the curve sideways; increasing σ² flattens and widens it.

Continuous variables & area

A continuous random variable measures something — height, weight, time — and can take any value in a range. For a continuous distribution the probability of an exact value is always zero, so we measure probability as area under the probability density curve.

Probability as area
P(a ≤ X ≤ b) = area under the curve from a to b
Total area under the curve = 1
P(X = k) = 0, so ≤ and < give the same answer

🤔 Why is P(X = k) = 0 for a continuous variable?

Probability is the area under the curve. A single point has no width, so its “area” — and therefore its probability — is zero. This is why, for the normal distribution, it never matters whether an inequality is strict (<, >) or weak (≤, ≥): P(X < k) = P(X ≤ k).
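A quick numerical check of this idea, as a minimal sketch. It assumes a hypothetical variable X ∼ N(40, 100) and uses `NormalDist` from Python's standard library, which takes the standard deviation (here 10) rather than the variance. The helper name `prob_between` is made up for illustration. As the interval around 45 shrinks, its area shrinks with it.

```python
from statistics import NormalDist

# Hypothetical example: X ~ N(40, 100), so sigma = 10 (assumed for illustration).
X = NormalDist(mu=40, sigma=10)

def prob_between(dist, a, b):
    """Area under the density curve between a and b, i.e. P(a <= X <= b)."""
    return dist.cdf(b) - dist.cdf(a)

# Shrink an interval centred on 45 and watch the area fall towards zero.
widths = [1, 0.1, 0.001]
areas = [prob_between(X, 45 - w / 2, 45 + w / 2) for w in widths]

# An interval of zero width has zero area, so P(X = 45) = 0.
point_probability = prob_between(X, 45, 45)

for w, area in zip(widths, areas):
    print(f"width {w}: area = {area:.6f}")
print("P(X = 45) =", point_probability)
```

Because the single point contributes no area, including or excluding the endpoint of an interval cannot change the probability.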

Shape & the 68–95–99.7 rule

The normal curve is symmetric and bell-shaped, centred on μ. The standard deviation σ controls its width, and a fixed share of the data falls within each band of σ from the mean.

The 68–95–99.7 (empirical) rule

~68% within ±σ, ~95% within ±2σ, ~99.7% within ±3σ of the mean.
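The three percentages can be checked numerically. This is a small sketch using the standard normal (μ = 0, σ = 1) from Python's standard library; the function name `within_k_sd` is an illustrative choice, not part of any syllabus notation.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

def within_k_sd(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

bands = {k: within_k_sd(k) for k in (1, 2, 3)}

for k, p in bands.items():
    print(f"within {k} sd: {p:.2%}")
```

The values come out at about 68.27%, 95.45% and 99.73%, which is why the rule is quoted with the "~" sign: it is a rounded estimate, not an exact result.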

How μ and σ change the curve

A small variance gives a tall, narrow curve; a large variance gives a short, wide one (same area).
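To see the effect of each parameter in numbers, here is a rough sketch comparing peak heights. The three distributions are hypothetical, chosen only for illustration; the peak of a normal curve sits at x = μ and has height 1/(σ√(2π)).

```python
from math import pi, sqrt
from statistics import NormalDist

# Hypothetical curves chosen for illustration.
narrow = NormalDist(mu=0, sigma=1)   # small variance
wide = NormalDist(mu=0, sigma=2)     # larger variance, same mean
shifted = NormalDist(mu=5, sigma=1)  # same variance as `narrow`, different mean

# The height of each curve at its own mean (the peak).
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)
peak_shifted = shifted.pdf(5)

print(f"narrow peak:  {peak_narrow:.4f}")
print(f"wide peak:    {peak_wide:.4f}")
print(f"shifted peak: {peak_shifted:.4f}")
```

Doubling σ halves the peak height, which keeps the total area at 1, while changing μ moves the peak sideways without altering its height.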

🧠 Memory aid — variance vs standard deviation

The notation N(μ, σ²) gives you the variance as the second number. If a question wants the standard deviation, square-root it. So N(40, 100) means μ = 40 and σ² = 100, giving σ = √100 = 10 — not 100.
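The same square-root step matters when you enter a distribution into software. As a minimal sketch, Python's standard-library `NormalDist` expects the standard deviation as its second argument, so N(40, 100) has to be entered with σ = 10.

```python
from math import sqrt
from statistics import NormalDist

# N(40, 100): the second number in the notation is the variance.
mu, variance = 40, 100
sigma = sqrt(variance)  # standard deviation = square root of the variance

# NormalDist takes (mean, standard deviation), not (mean, variance).
S = NormalDist(mu, sigma)

print("mean:", S.mean)
print("standard deviation:", S.stdev)
print("variance:", S.variance)
```

Passing 100 as the second argument by mistake would describe N(40, 100²), a much wider curve.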

Modelling with the normal

Many continuous, symmetric, single-peaked variables can be modelled by a normal distribution when the population is large. Although a normal variable can technically take any real value, values more than about 4 standard deviations from the mean have practically zero density. That is why it can model things like height that can’t truly be negative, provided the mean sits several standard deviations above zero.
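A short sketch of how little probability lies far from the mean. The height model below (mean 170 cm, standard deviation 10 cm) is a made-up example for illustration, not data from a real population.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Probability of landing more than 4 standard deviations from the mean (both tails).
beyond_4_sd = 2 * Z.cdf(-4)

# Hypothetical height model: mean 170 cm, sd 10 cm (assumed values).
H = NormalDist(mu=170, sigma=10)

# Probability the model assigns to an impossible negative height.
p_negative = H.cdf(0)

print(f"P(more than 4 sd from the mean) = {beyond_4_sd:.6f}")
print(f"P(height < 0) = {p_negative}")
```

Under these assumed values, zero is 17 standard deviations below the mean, so the model's probability of a negative height is negligible.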

Can model ✓ (symmetric, 1 mode)

Heights, weights, times, running speeds: large population, bell-shaped, single peak.

Cannot model ✗ (skewed / multimodal)

Human lifespan (not symmetric); a random number generator (no single mode).

Exam habit: when a question mixes distributions, state clearly which one each variable follows (e.g. S ∼ N(40, 100)). To justify a normal model, the usual assumptions are that the variable is symmetrical and bell-shaped.

Worked examples

WE 1

Mean and standard deviation from notation

The speeds (mph) of a cheetah subspecies are modelled by S ∼ N(40, 100). Write down the mean and standard deviation.

Read off μ and σ²: μ = 40, σ² = 100.
Square-root for σ: σ = √100 = 10.
Answer: mean = 40 mph, sd = 10 mph.
Note: the second number in N(μ, σ²) is the variance, so don’t forget to root it.

WE 2

State the assumptions

State two assumptions needed to model the cheetah speeds with a normal distribution.

The distribution of speeds is:
1. symmetrical
2. bell-shaped
Answer: symmetrical AND bell-shaped.
Note: a large population with one mode also supports the model.

WE 3

Apply the 68% rule

For the cheetahs S ∼ N(40, 100), roughly what proportion run between 30 and 50 mph?

Find how many σ from the mean: 30 = 40 − 10 = μ − σ; 50 = 40 + 10 = μ + σ.
Apply the empirical rule: within μ ± σ → about 68%.
Answer: ≈ 68%.
Note: 30 to 50 is exactly one sd either side of the mean.

WE 4

Apply the 95% rule

For S ∼ N(40, 100), roughly what proportion run faster than 60 mph?

Find how many σ from the mean: 60 = 40 + 20 = μ + 2σ.
Apply the empirical rule: about 95% lie within μ ± 2σ, so about 5% lie outside.
Split the 5% by symmetry: 5% ÷ 2 = 2.5% in each tail.
Answer: ≈ 2.5%.
Note: “faster than μ + 2σ” is just the upper tail.
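The two empirical-rule estimates for the cheetah model can be compared with computed areas. This is a sketch using Python's standard-library `NormalDist`, not the calculator method the exam expects; the variable names are illustrative.

```python
from statistics import NormalDist

# S ~ N(40, 100): variance 100, so sigma = sqrt(100) = 10.
S = NormalDist(mu=40, sigma=10)

# P(30 < S < 50): area between one sd below and one sd above the mean.
between_30_and_50 = S.cdf(50) - S.cdf(30)

# P(S > 60): upper-tail area beyond two sd above the mean.
faster_than_60 = 1 - S.cdf(60)

print(f"P(30 < S < 50) = {between_30_and_50:.4f}")
print(f"P(S > 60)      = {faster_than_60:.4f}")
```

The computed areas are close to 68.3% and 2.3%, so the rule-of-thumb answers of 68% and 2.5% are reasonable quick estimates.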

WE 5

Is the normal a good model?

Explain whether human lifespan can be well modelled by a normal distribution.

Check symmetry: lifespans are NOT symmetric. Most people live to old age, with a long left tail of early deaths.
Answer: No, not symmetrical.
Note: a normal model needs a symmetric, bell-shaped, single-mode variable.

💡 Top tips

N(μ, σ²) gives the variance — square-root it for the standard deviation.
Mean = median = mode = μ, and the curve is symmetric about μ.
Strict vs weak doesn’t matter for a continuous variable: P(X < k) = P(X ≤ k).
Memorise 68–95–99.7 for quick estimates within 1, 2, 3 sd.
Use symmetry to split tail areas (e.g. half of the outer 5%).
To justify a model, say the variable is symmetric and bell-shaped with one mode.

⚠ Common mistakes

Treating the second number as σ — in N(μ, σ²) it’s the variance.
Forgetting to square-root the variance when the standard deviation is needed.
Worrying about < vs ≤ for a continuous variable — they’re equal here.
Misremembering the bands: in order they are 68 / 95 / 99.7 for 1, 2 and 3 standard deviations.
Using a normal model for skewed data (lifespan) or data with no single mode.
Confusing probability with density — probability is the area, not the height.

Next up — Calculations with the Normal Distribution. You’ll use your GDC’s normal CD function to find probabilities like P(a < X < b), handle one-sided tails with “very big” bounds, and run the inverse normal to go from a probability back to a value of x.

