IB Maths AI SL · Normal Distribution · Paper 1 & 2 · X ~ N(μ, σ²) · ~6 min read
The Normal Distribution
Many real-world measurements (heights, exam scores, the error in a manufactured part) cluster symmetrically around an average. The normal distribution is the bell-shaped curve that models them. Written X ~ N(μ, σ²), it is fixed by just its mean and variance, and the 68–95–99.7 rule tells you how the data spreads.
📘 What you need to know
Continuous distribution: X can take any value in a range, so P(X = k) = 0 — probability is an area under the curve.
Bell-shaped and symmetric, written X ~ N(μ, σ²): μ is the mean, σ² the variance, σ the standard deviation.
Symmetric about x = μ, so mean = median = mode = μ.
The 68–95–99.7 rule: ≈68% of data lies within μ ± σ, ≈95% within μ ± 2σ, ≈99.7% within μ ± 3σ.
μ shifts the curve, σ sets its width: a small σ gives a tall, narrow curve; a large σ a short, wide one. Total area is always 1.
Model with it when a variable is continuous, symmetric and single-peaked — not when data is skewed or multi-modal.
A continuous, bell-shaped curve
A continuous random variable can take any value in a range — things you measure, like height, mass or time. With infinitely many possible values, the probability of any exact value is zero; probability is the area under the curve between two points.
A normal distribution is a continuous distribution whose graph is symmetric and bell-shaped, written X ~ N(μ, σ²).
Normal distribution notation: X ~ N(μ, σ²)
μ = mean · σ² = variance · σ = standard deviation · symmetric about x = μ
Those two numbers are all you need. μ fixes the centre — change it and the curve slides left or right. σ fixes the spread — a small σ gives a tall, narrow peak, a large σ a low, wide one, with the area always 1.
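The effect of σ on the shape can be checked numerically. The sketch below assumes the standard formula for the normal density, f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)), which these notes do not state and which is not needed for the exam. The function names are illustrative only.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x (standard density formula, assumed here)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, steps=20000):
    """Approximate the total area with the trapezium rule over mu ± 8 sigma."""
    a, b = mu - 8 * sigma, mu + 8 * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

for sigma in (5, 10):
    peak = normal_pdf(1000, 1000, sigma)   # the curve is highest at x = mu
    area = area_under_curve(1000, sigma)
    print(f"sigma = {sigma}: peak height = {peak:.4f}, area = {area:.4f}")
```

Doubling σ from 5 to 10 halves the peak height (about 0.0798 down to about 0.0399), while the area stays at 1 to four decimal places in both cases.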
The 68–95–99.7 rule
For every normal distribution, the proportion of data within a fixed number of standard deviations of the mean is the same. This is the empirical rule — worth memorising.
The 68–95–99.7 rule
P(μ − σ < X < μ + σ) ≈ 0.68
P(μ − 2σ < X < μ + 2σ) ≈ 0.95
P(μ − 3σ < X < μ + 3σ) ≈ 0.997
For any normal distribution, about 68% of the data lies within one standard deviation of the mean, 95% within two, and 99.7% within three — whatever the values of μ and σ.
The rule gives clean answers only at exactly 1, 2 or 3 standard deviations. Its symmetry is the useful part: 68% inside μ ± σ leaves 32% outside, split equally, so 16% lies below μ − σ and 16% above μ + σ.
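Where the three percentages come from can be seen with a few lines of Python. This is a sketch that relies on a standard identity not covered in these notes: for any normal variable, the probability of lying within k standard deviations of the mean equals erf(k / √2). The function name is illustrative.

```python
import math

def within_k_sd(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X; mu and sigma cancel out."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = within_k_sd(k)
    one_tail = (1 - inside) / 2   # symmetry: the leftover splits equally
    print(f"k = {k}: inside = {inside:.4f}, each tail = {one_tail:.4f}")
```

The values inside come out at roughly 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%. Neither μ nor σ appears in the calculation, which is why the rule holds for every normal distribution.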
Modelling with the normal distribution
A normal model is a good fit for a continuous variable when a few conditions hold:
Use a normal model when the variable is: continuous (measured, not counted) · roughly symmetric about its mean · single-peaked (one mode) · drawn from a large population.
It is not suitable for skewed data — like human lifespans, which trail off to one side — or data with no single peak, like the number from a random number generator. A normal variable can technically take any real value, but values beyond about 4 standard deviations from the mean are so unlikely that heights, masses and times are still modelled well.
🧭 Recipe — any normal distribution question
Check it suits a normal model: the variable should be continuous, roughly symmetric and single-peaked.
State the model: write X ~ N(μ, σ²) and say in words what X measures.
Find σ: the second number is the variance — take its square root to get the standard deviation.
For “within k standard deviations”, apply the 68–95–99.7 rule with k = 1, 2 or 3.
Sketch the bell, mark μ and the σ-points, and read the answer in context.
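The arithmetic in the recipe above can be sketched as a small helper. The function and constant names are hypothetical, chosen for illustration. It takes the variance as its second argument, as in N(μ, σ²), and uses only the rounded 68–95–99.7 percentages.

```python
import math

# Percentages from the 68–95–99.7 rule, keyed by k standard deviations.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_interval(mu, variance, k):
    """Return (lower, upper, percent_inside, percent_each_tail) for mu ± k*sigma."""
    if k not in EMPIRICAL_RULE:
        raise ValueError("the rule only covers k = 1, 2 or 3")
    sigma = math.sqrt(variance)          # the second parameter is the variance
    inside = EMPIRICAL_RULE[k]
    tail = (100.0 - inside) / 2          # symmetry: equal tails
    return mu - k * sigma, mu + k * sigma, inside, tail

# Heights X ~ N(165, 49): about 68% lie within one standard deviation.
print(empirical_interval(165, 49, 1))   # (158.0, 172.0, 68.0, 16.0)
```

Rejecting any k other than 1, 2 or 3 mirrors the limit of the rule itself: other distances need the GDC.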
Worked examples
WE 1
Read off μ, σ² and σ
The mass of an apple from an orchard is modelled by X ~ N(150, 64), where X is measured in grams. Write down (a) the mean mass, (b) the variance, (c) the standard deviation.
model: X ~ N(150, 64)
(a) the mean is the first number: μ = 150 g
(b) the variance is the second number: σ² = 64
(c) standard deviation = √variance: σ = √64 = 8 g
Answer: (a) μ = 150 g · (b) σ² = 64 · (c) σ = 8 g
Note: N(μ, σ²) lists the variance second, not the SD. Square-root it to get σ = 8 g.
WE 2
Assumptions and limits of the model
A coach models the times of runners in a 100 m race with a normal distribution. (a) State two conditions the data should satisfy for this to be reasonable. (b) Explain why the score shown on a rolled fair die could not be modelled by a normal distribution.
(a) a normal model needs the data to be:
symmetric about the mean and bell-shaped (one mode)
continuous, from a large population
(b) look at the die score:
it is discrete, and every value 1–6 is equally likely
the distribution is flat with no single peak
Answer: (a) continuous, symmetric, single-peaked · (b) discrete and uniform, not bell-shaped
Note: “State the assumptions” is a common exam ask. A normal model needs a continuous, symmetric, single-peaked variable.
WE 3
68% within μ ± σ
The heights of adult women in a country are modelled by X ~ N(165, 49), with X in centimetres. (a) Find the standard deviation. (b) Estimate the range of heights in which about 68% of women lie.
model: X ~ N(165, 49), so μ = 165, σ² = 49
(a) standard deviation: σ = √49 = 7 cm
(b) 68% lies within μ ± σ:
μ − σ = 165 − 7 = 158
μ + σ = 165 + 7 = 172
Answer: (a) σ = 7 cm · (b) about 68% lie between 158 cm and 172 cm
Note: “Within one standard deviation” is the interval μ ± σ. Find σ first, then add and subtract.
WE 4
95% within μ ± 2σ, and a tail
The lifetime of a certain battery is modelled by X ~ N(20, 4), measured in hours. (a) Between which two lifetimes do about 95% of batteries lie? (b) Estimate the percentage of batteries that last more than 24 hours.
model: X ~ N(20, 4), so μ = 20, σ = √4 = 2
(a) 95% lies within μ ± 2σ:
20 ± 2×2 = 20 ± 4 ⇒ 16 to 24 hours
(b) 24 = μ + 2σ, so use the leftover:
outside μ ± 2σ = 100% − 95% = 5%
by symmetry, half is above 24 h: 5% ÷ 2 = 2.5%
Answer: (a) 16 to 24 hours · (b) about 2.5%
Note: the 5% outside the 95% band splits equally, so one tail is 2.5%.
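As an optional check on the battery model X ~ N(20, 4), Python's standard-library statistics.NormalDist can stand in for the GDC. This is a sketch for comparison only, and the variable names are illustrative. Note that NormalDist takes the standard deviation, not the variance.

```python
from statistics import NormalDist

battery = NormalDist(mu=20, sigma=2)   # sigma = sqrt(4); NormalDist expects sigma

inside = battery.cdf(24) - battery.cdf(16)   # P(16 < X < 24)
upper_tail = 1 - battery.cdf(24)             # P(X > 24)

print(f"P(16 < X < 24) = {inside:.4f}")
print(f"P(X > 24)      = {upper_tail:.4f}")
```

The calculated values are close to 0.9545 and 0.0228, so the rule's 95% and 2.5% are sensible rounded estimates. When a question asks you to estimate using the 68–95–99.7 rule, 2.5% is the answer to give.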
WE 5
99.7%, and a tail below μ − σ
A machine fills cartons with juice. The volume is modelled by X ~ N(330, 9), in millilitres. (a) Nearly all cartons lie between which two volumes? (b) Estimate the percentage of cartons containing less than 327 ml.
model: X ~ N(330, 9), so μ = 330, σ = √9 = 3
(a) 99.7% lies within μ ± 3σ:
330 ± 3×3 = 330 ± 9 ⇒ 321 to 339 ml
(b) spot that 327 = 330 − 3 = μ − σ:
68% within μ ± σ ⇒ 32% outside
below μ − σ is half of that: 32% ÷ 2 = 16%
Answer: (a) 321 to 339 ml · (b) about 16%
Note: recognising 327 as exactly μ − σ is the key step. Then take half of the 32% left over.
WE 6
Full question: compare two normal models
Two machines fill 1 kg bags of flour. Machine A’s masses follow X ~ N(1000, 25) and machine B’s follow Y ~ N(1000, 100), both in grams. (a) State the mean and standard deviation for each machine. (b) State, with a reason, which machine fills bags more consistently. (c) For machine A, find the range of masses in which about 95% of bags lie.
(a) read μ and σ = √variance:
machine A: μ = 1000 g, σ = √25 = 5 g
machine B: μ = 1000 g, σ = √100 = 10 g
(b) compare the spreads:
same mean, but A has the smaller σ
smaller σ ⇒ narrower, taller curve ⇒ less spread
(c) machine A: 95% within μ ± 2σ:
1000 ± (2 × 5) = 1000 ± 10
Answer: (a) A: μ = 1000 g, σ = 5 g · B: μ = 1000 g, σ = 10 g · (b) A is more consistent · (c) 990 g to 1010 g
Note: equal means, different variances. The smaller σ always gives the more consistent output.
💡 Top tips
The second number is the variance: N(μ, σ²) gives σ², not σ. Square-root it before doing anything else.
Sketch the bell every time: mark μ in the middle and the σ-points either side — “between”, “more than” and tail questions become obvious.
68–95–99.7 applies only at 1, 2 and 3 σ, and even there the percentages are rounded approximations. For any other distance you need the GDC (the next topic).
Use symmetry: a tail below μ − σ is half of the 32% left over — 16%, not 32%.
P(X = k) = 0 for a normal distribution, so strict and weak inequalities are the same: P(X < a) = P(X ≤ a).
⚠ Common mistakes
Variance vs standard deviation: treating the second number in N(μ, σ²) as σ when it is σ². The most common slip in the topic.
Using the rule at the wrong distance: 68–95–99.7 applies only at exactly 1, 2 or 3 standard deviations from the mean.
Forgetting to halve a tail: outside μ ± 2σ is 5%, but a single tail is 2.5%.
Modelling skewed or counted data as normal: the variable must be continuous, symmetric and single-peaked.
Thinking a larger σ makes a taller curve — a larger σ makes it shorter and wider; the area stays fixed at 1.
Next up: Calculations with Normal Distribution — using the GDC’s Normal CD function for P(a < X < b) and tail probabilities, and the Inverse Normal function to work back from a probability to a value. For now: write down μ and σ, sketch the bell, and never confuse the variance with the SD.