IB Maths AI SL · Normal Distribution · Paper 1 & 2 · X ~ N(μ, σ²) · ~6 min read

The Normal Distribution

Many real-world measurements — heights, exam scores, the error in a manufactured part — cluster symmetrically around an average. The normal distribution is the bell-shaped curve that models them. Written X ~ N(μ, σ²), it is fixed by just its mean and variance, and the 68–95–99.7 rule tells you how the data spreads.

📘 What you need to know

Continuous distribution: X can take any value in a range, so P(X = k) = 0 — probability is an area under the curve.
Bell-shaped and symmetric, written X ~ N(μ, σ²): μ is the mean, σ² the variance, σ the standard deviation.
Symmetric about x = μ, so mean = median = mode = μ.
The 68–95–99.7 rule: ≈68% of data lies within μ ± σ, ≈95% within μ ± 2σ, ≈99.7% within μ ± 3σ.
μ shifts the curve, σ sets its width: a small σ gives a tall, narrow curve; a large σ a short, wide one. Total area is always 1.
Model with it when a variable is continuous, symmetric and single-peaked — not when data is skewed or multi-modal.

A continuous, bell-shaped curve

A continuous random variable can take any value in a range — things you measure, like height, mass or time. With infinitely many possible values, the probability of any exact value is zero; probability is the area under the curve between two points.

A normal distribution is a continuous distribution whose graph is symmetric and bell-shaped, written X ~ N(μ, σ²).

Normal distribution notation
X ~ N(μ, σ²)
μ = mean · σ² = variance · σ = standard deviation · symmetric about x = μ

Those two numbers are all you need. μ fixes the centre — change it and the curve slides left or right. σ fixes the spread — a small σ gives a tall, narrow peak, a large σ a low, wide one, with the area always 1.
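To see the "small σ is tall, large σ is short" idea in numbers, here is a minimal sketch using Python's standard-library NormalDist. The two models (both with mean 0, with σ = 1 and σ = 2) are made-up values chosen only for illustration.

```python
from statistics import NormalDist

# Two illustrative models: same mean, different standard deviations.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# The highest point of a normal curve is at x = mu.
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)

print(round(peak_narrow, 4))  # 0.3989
print(round(peak_wide, 4))    # 0.1995

# Changing mu only slides the curve: the peak height stays the same.
shifted = NormalDist(mu=5, sigma=1)
print(round(shifted.pdf(5), 4))  # 0.3989
```

Doubling σ halves the peak height, which is what keeps the total area equal to 1.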

The 68–95–99.7 rule

For every normal distribution, the proportion of data within a fixed number of standard deviations of the mean is the same. This is the empirical rule — worth memorising.

The 68–95–99.7 rule
P(μ − σ < X < μ + σ) ≈ 0.68
P(μ − 2σ < X < μ + 2σ) ≈ 0.95
P(μ − 3σ < X < μ + 3σ) ≈ 0.997

For any normal distribution, about 68% of the data lies within one standard deviation of the mean, 95% within two, and 99.7% within three — whatever the values of μ and σ.

The rule gives clean answers only at exactly 1, 2 or 3 standard deviations. Its symmetry is the useful part: 68% inside μ ± σ leaves 32% outside, split equally — so 16% lies below μ − σ and 16% above μ + σ.
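If you want to convince yourself that the rule does not depend on μ and σ, a short check like the one below may help. It is only a sketch: the helper name within_k_sd is invented here, and the two models are simply convenient examples.

```python
from statistics import NormalDist

def within_k_sd(mu, variance, k):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, variance)."""
    sigma = variance ** 0.5            # the model lists the variance, so square-root it
    model = NormalDist(mu, sigma)
    return model.cdf(mu + k * sigma) - model.cdf(mu - k * sigma)

# Two different models give the same three proportions.
for k in (1, 2, 3):
    print(k, round(within_k_sd(165, 49, k), 4), round(within_k_sd(20, 4, k), 4))
# 1 0.6827 0.6827
# 2 0.9545 0.9545
# 3 0.9973 0.9973
```

The printed values show where 68, 95 and 99.7 come from: they are rounded versions of 68.27%, 95.45% and 99.73%.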

Modelling with the normal distribution

A normal model is a good fit for a continuous variable when a few conditions hold:

Use a normal model when the variable is — continuous (measured, not counted) · roughly symmetric about its mean · single-peaked (one mode) · drawn from a large population.

It is not suitable for skewed data, such as human lifespans, which trail off to one side, or for data with no single peak, such as the output of a uniform random number generator. A normal variable can in theory take any real value, including negative values that make no sense for a height, mass or time. In practice, values more than about 4 standard deviations from the mean are so unlikely that these quantities are still modelled well, as long as zero lies several standard deviations below the mean.
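How unlikely is "beyond about 4 standard deviations"? The sketch below puts a number on it using Python's NormalDist, and also checks the chance of a negative height under the model N(165, 49) that appears in the worked examples. Treat it as an illustration rather than something you would do in the exam.

```python
from statistics import NormalDist

z = NormalDist()                 # standard normal: mu = 0, sigma = 1

# Chance of landing more than 4 standard deviations from the mean (both tails).
beyond_4sd = 2 * z.cdf(-4)
print(f"{beyond_4sd:.6f}")       # 0.000063

# Height model X ~ N(165, 49), so sigma = 7: chance of a negative height.
heights = NormalDist(mu=165, sigma=7)
negative_height = heights.cdf(0)
print(negative_height < 1e-50)   # True
```

So only about 0.006% of values fall more than 4σ from the mean, and a negative height is effectively impossible under this model.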

🧭 Recipe — any normal distribution question

Check it suits a normal model: the variable should be continuous, roughly symmetric and single-peaked.
State the model: write X ~ N(μ, σ²) and say in words what X measures.
Find σ: the second number is the variance — take its square root to get the standard deviation.
For “within k standard deviations”, apply the 68–95–99.7 rule with k = 1, 2 or 3.
Sketch the bell, mark μ and the σ-points, and read the answer in context.
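The recipe steps can be sketched as a small function. Everything named here (EMPIRICAL_RULE, rule_interval, and the example model N(50, 16)) is invented for illustration; the point is only that σ comes from square-rooting the variance before the rule is applied.

```python
from math import sqrt

# Approximate proportions from the 68-95-99.7 rule, keyed by k.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def rule_interval(mu, variance, k):
    """Return (lower, upper, proportion) for 'within k standard deviations'."""
    if k not in EMPIRICAL_RULE:
        raise ValueError("the rule only covers k = 1, 2 or 3")
    sigma = sqrt(variance)       # the second number in N(mu, variance) is the variance
    return mu - k * sigma, mu + k * sigma, EMPIRICAL_RULE[k]

# Made-up model X ~ N(50, 16): sigma = 4, so mu ± sigma is 46 to 54.
print(rule_interval(50, 16, 1))  # (46.0, 54.0, 0.68)
```

Asking for any other k raises an error on purpose, because the rule says nothing about distances such as 1.5σ.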

Worked examples

WE 1

Read off μ, σ² and σ

The mass of an apple from an orchard is modelled by X ~ N(150, 64), where X is measured in grams. Write down (a) the mean mass, (b) the variance, (c) the standard deviation.

model: X ~ N(150, 64)
(a) the mean is the first number: μ = 150 g
(b) the variance is the second number: σ² = 64
(c) standard deviation = √variance: σ = √64 = 8 g
Answer: (a) μ = 150 g · (b) σ² = 64 · (c) σ = 8 g
N(μ, σ²) lists the variance second, not the SD. Square-root it to get σ = 8 g.

WE 2

Assumptions and limits of the model

A coach models the times of runners in a 100 m race with a normal distribution. (a) State two conditions the data should satisfy for this to be reasonable. (b) Explain why the score shown on a rolled fair die could not be modelled by a normal distribution.

(a) a normal model needs the data to be:
symmetric about the mean and bell-shaped (one mode)
continuous, from a large population
(b) look at the die score:
it is discrete, and every value 1–6 is equally likely
the distribution is flat with no single peak
Answer: (a) continuous, symmetric, single-peaked · (b) discrete and uniform, not bell-shaped
“State the assumptions” is a common exam ask. A normal model needs a continuous, symmetric, single-peaked variable.

WE 3

68% within μ ± σ

The heights of adult women in a country are modelled by X ~ N(165, 49), with X in centimetres. (a) Find the standard deviation. (b) Estimate the range of heights in which about 68% of women lie.

model: X ~ N(165, 49), so μ = 165, σ² = 49
(a) standard deviation: σ = √49 = 7 cm
(b) 68% lies within μ ± σ:
μ − σ = 165 − 7 = 158
μ + σ = 165 + 7 = 172
Answer: (a) σ = 7 cm · (b) about 68% lie between 158 cm and 172 cm
“Within one standard deviation” is the interval μ ± σ: find σ first, then add and subtract.

WE 4

95% within μ ± 2σ, and a tail

The lifetime of a certain battery is modelled by X ~ N(20, 4), measured in hours. (a) Between which two lifetimes do about 95% of batteries lie? (b) Estimate the percentage of batteries that last more than 24 hours.

model: X ~ N(20, 4), so μ = 20, σ = √4 = 2
(a) 95% lies within μ ± 2σ:
20 ± 2×2 = 20 ± 4 ⇒ 16 to 24 hours
(b) 24 = μ + 2σ, so use the leftover:
outside μ ± 2σ = 100% − 95% = 5%
by symmetry, half is above 24 h: 5% ÷ 2 = 2.5%
Answer: (a) 16 to 24 hours · (b) about 2.5%
The 5% outside the 95% band splits equally, so one tail is 2.5%.
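For the curious, the tail in part (b) can be compared with the value a calculator would give. This sketch uses Python's NormalDist in place of the GDC; it is a check on the estimate, not the method the question expects.

```python
from statistics import NormalDist

battery = NormalDist(mu=20, sigma=2)   # X ~ N(20, 4), so sigma = sqrt(4) = 2

rule_estimate = (1 - 0.95) / 2         # half of the 5% outside mu ± 2 sigma
exact_tail = 1 - battery.cdf(24)       # P(X > 24)

print(round(rule_estimate, 3))  # 0.025
print(round(exact_tail, 4))     # 0.0228
```

The rule's 2.5% is slightly above the calculated 2.28% because 95% is itself a rounded figure, which is why the answer says "about".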

WE 5

99.7%, and a tail below μ − σ

A machine fills cartons with juice. The volume is modelled by X ~ N(330, 9), in millilitres. (a) Nearly all cartons lie between which two volumes? (b) Estimate the percentage of cartons containing less than 327 ml.

model: X ~ N(330, 9), so μ = 330, σ = √9 = 3
(a) 99.7% lies within μ ± 3σ:
330 ± 3×3 = 330 ± 9 ⇒ 321 to 339 ml
(b) spot that 327 = 330 − 3 = μ − σ:
68% within μ ± σ ⇒ 32% outside
below μ − σ is half of that: 32% ÷ 2 = 16%
Answer: (a) 321 to 339 ml · (b) about 16%
Recognising 327 as exactly μ − σ is the key step. Then take half of the 32% left over.
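Another way to read the 16% is as a proportion of actual cartons. The sketch below simulates a large batch from the model and counts how many fall below 327 ml. The batch size and the seed are arbitrary choices, so the exact figure printed will depend on them.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 100_000              # number of simulated cartons (an arbitrary choice)

# Draw volumes from X ~ N(330, 9): gauss takes the standard deviation, which is 3.
volumes = [rng.gauss(330, 3) for _ in range(n)]

# Proportion of simulated cartons holding less than 327 ml.
below = sum(v < 327 for v in volumes) / n
print(round(below, 3))
```

The result should come out close to 0.16, in line with the estimate from the rule.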

WE 6

Full question: compare two normal models

Two machines fill 1 kg bags of flour. Machine A’s masses follow X ~ N(1000, 25) and machine B’s follow Y ~ N(1000, 100), both in grams. (a) State the mean and standard deviation for each machine. (b) State, with a reason, which machine fills bags more consistently. (c) For machine A, find the range of masses in which about 95% of bags lie.

(a) read μ and σ = √variance:
machine A: μ = 1000 g, σ = √25 = 5 g
machine B: μ = 1000 g, σ = √100 = 10 g
(b) compare the spreads:
same mean, but A has the smaller σ
smaller σ ⇒ narrower, taller curve ⇒ less spread
(c) machine A: 95% within μ ± 2σ:
1000 ± (2 × 5) = 1000 ± 10 ⇒ 990 g to 1010 g
Answer: (a) A: μ = 1000 g, σ = 5 g · B: μ = 1000 g, σ = 10 g · (b) A is more consistent · (c) 990 g to 1010 g
Equal means, different variances: the smaller σ gives the more consistent output.
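The comparison can also be laid out side by side. In this sketch the helper band_95 is a made-up name; it simply applies μ ± 2σ to each machine so the difference in spread is easy to see.

```python
from math import sqrt

def band_95(mu, variance):
    """Return (sigma, lower, upper) for the approximate 95% band mu ± 2 sigma."""
    sigma = sqrt(variance)
    return sigma, mu - 2 * sigma, mu + 2 * sigma

# name -> (mean, variance), both in grams, as given in the question
machines = {"A": (1000, 25), "B": (1000, 100)}

for name, (mu, variance) in machines.items():
    print(name, band_95(mu, variance))
# A (5.0, 990.0, 1010.0)
# B (10.0, 980.0, 1020.0)
```

Machine B's 95% band is twice as wide as machine A's, which is the numerical version of "A is more consistent".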

💡 Top tips

The second number is the variance — N(μ, σ²) gives σ², not σ. Square-root it before doing anything else.
Sketch the bell every time: mark μ in the middle and the σ-points either side — “between”, “more than” and tail questions become obvious.
68–95–99.7 applies only at 1, 2 and 3 σ, and even there the percentages are rounded. For any other distance you need the GDC (the next topic).
Use symmetry: a tail below μ − σ is half of the 32% left over — 16%, not 32%.
P(X = k) = 0 for a normal distribution, so strict and weak inequalities are the same: P(X < a) = P(X ≤ a).

⚠ Common mistakes

Variance vs standard deviation: treating the second number in N(μ, σ²) as σ — it is σ². The most common slip in the topic.
Using the rule at the wrong distance: 68–95–99.7 applies only at exactly 1, 2 or 3 standard deviations from the mean.
Forgetting to halve a tail: outside μ ± 2σ is 5%, but a single tail is 2.5%.
Modelling skewed or counted data as normal: the variable must be continuous, symmetric and single-peaked.
Thinking a larger σ makes a taller curve — a larger σ makes it shorter and wider; the area stays fixed at 1.

Next up: Calculations with Normal Distribution — using the GDC’s Normal CD function for P(a < X < b) and tail probabilities, and the Inverse Normal function to work back from a probability to a value. For now: write down μ and σ, sketch the bell, and never confuse the variance with the SD.
