IB Maths AA SL · Topic 4: Probability Distributions · Paper 1 & 2
The Normal Distribution
Heights of students, the weight of apples in a crate, marks on a maths test — most natural measurements cluster around an average, with fewer values appearing as you move further away. That’s a normal distribution: the famous bell curve. It’s the most important continuous distribution in statistics, and you’ll meet it on almost every Paper 2.
📘 What you need to know
The normal distribution is continuous and bell-shaped — symmetric about its mean.
Notation: X ∼ N(μ, σ²). That’s mean, then variance (σ squared, NOT σ).
For a normal distribution, mean = median = mode (all sit at the centre).
The total area under the curve = 1.
68-95-99.7 rule: ~68% of values lie within 1σ of the mean, ~95% within 2σ, ~99.7% within 3σ.
Probabilities are areas under the curve. P(X = a single value) = 0 for any specific value.
Because it’s continuous, P(X < a) = P(X ≤ a) — strict and weak inequalities give the same answer (unlike the binomial!).
What does a normal distribution look like?
Imagine you measured the height of every 16-year-old in your country. Most would be near the average (say 168 cm). Some would be a bit taller, some a bit shorter. Very few would be extremely tall or extremely short. If you plotted all those heights on a graph, you’d get a smooth, symmetric bell shape — that’s a normal distribution.
The classic bell curve
What kinds of things are normally distributed?
Lots of natural and human-measured quantities follow this shape, especially when you’re measuring something where most values cluster around an average:
📏 Heights: of people, plants, animals
⚖️ Weights: of fruit, packages, newborns
📝 Test scores: IQ, exam marks, IB scores
⏱️ Times: to run a race, react to a signal
A normal distribution doesn’t always fit perfectly — but if a real-world measurement is roughly symmetric around an average and very extreme values are rare, the normal model is usually a good description.
The bell curve’s key properties
Every normal distribution has the same characteristic shape, no matter the scale. Here’s what’s always true about it:
It’s symmetric — fold the curve down the middle and the two halves match exactly.
The mean, median, and mode are all equal — they all sit at the centre, where the peak is.
It’s bell-shaped — the curve rises smoothly to the mean, then falls away symmetrically.
It never touches the x-axis — the tails just keep getting closer (we say they’re asymptotic).
The total area under the curve is 1 — because all probabilities together must add to 1.
🤔 Why does the area equal 1?
The area under any probability distribution curve represents total probability. Since every value of X must lie somewhere along the horizontal axis, and probabilities have to sum to 1, the entire area beneath the curve adds up to 1. This is a rule for any continuous distribution, not just the normal one.
The notation: X ∼ N(μ, σ²)
If X is normally distributed, we write it like this. Two numbers tell you everything about the curve — where it’s centred, and how spread out it is.
X ∼ N(μ, σ²)
X: the random variable (the measurement)
μ: the mean, where the curve is centred
σ²: the variance, how spread out the curve is
For example, if students’ heights have mean 168 cm and standard deviation 6 cm, then variance = 6² = 36, so:
X ∼ N(168, 36)
⚠️
The classic mistake — variance, not standard deviation!
In the bracket, the second number is variance (σ²), not standard deviation (σ). If you’re told σ = 6, the notation uses 36, not 6. Get this wrong and your whole answer falls apart.
When you put numbers into your GDC for a normal distribution, it usually asks for standard deviation directly (σ, not σ²). So in the notation you write 36, but in the calculator you’d type 6. Read carefully both times.
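If you have Python to hand, the same "notation uses 36, calculator uses 6" split shows up there too. This is only a sketch using the standard library's `statistics.NormalDist` (not something the IB course requires); the variable names `mu`, `sigma`, and `heights` are just labels chosen for this example.

```python
from statistics import NormalDist

# Heights example from the text: mean 168 cm, standard deviation 6 cm.
mu = 168
sigma = 6

# Like most GDCs, NormalDist takes the standard deviation, not the variance.
heights = NormalDist(mu, sigma)

# The variance is what goes in the written notation X ~ N(168, 36).
print(heights.variance)              # 36.0
print(f"X ~ N({mu}, {sigma ** 2})")  # X ~ N(168, 36)
```

Typing `NormalDist(168, 36)` by mistake would describe a much wider curve, which is exactly the slip the warning above is about.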
The 68-95-99.7 rule (the most useful thing on this page)
Here’s a fact you’ll use over and over. For any normal distribution, the same rough percentages of values fall within 1, 2, and 3 standard deviations of the mean:
The empirical rule — 68 / 95 / 99.7
So if test scores are normally distributed with mean 70 and standard deviation 10:
About 68% of students score between 60 and 80 (within 1σ).
About 95% score between 50 and 90 (within 2σ).
About 99.7% score between 40 and 100 (within 3σ).
🧠
Remember the rule as 68 → 95 → 99.7
One sigma each side gets you most students (68%), two sigmas covers nearly everyone (95%), three sigmas covers almost all of them (99.7%). The numbers always grow as you move outward — never the other way.
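To see where 68, 95, and 99.7 come from, you can ask a computer for the exact areas. The sketch below assumes Python's `statistics.NormalDist`; the helper name `within` is made up for this example.

```python
from statistics import NormalDist

scores = NormalDist(70, 10)  # mean 70, standard deviation 10


def within(dist, k):
    """Probability of landing within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(k, round(within(scores, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The rule's percentages are these values rounded, which is why answers found with it should be stated as approximate.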
Half on each side
Because the curve is symmetric, you can split these percentages in two. For example:
About 34% of values lie between μ and μ + σ (half of 68%).
About 13.5% lie between μ + σ and μ + 2σ (that’s (95% − 68%) ÷ 2).
Only about 2.5% lie above μ + 2σ (half of the leftover 5%).
If a question gives you nice “round σ” values like μ + σ or μ − 2σ, you can solve it with this rule alone — no GDC needed. That’s a Paper 1 favourite.
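The split percentages can be checked the same way. Because the rule holds for any normal distribution, this sketch uses the simplest one, with mean 0 and standard deviation 1 (the default for Python's `statistics.NormalDist`); that choice and the variable names are assumptions made for the example.

```python
from statistics import NormalDist

z = NormalDist()  # mean 0, standard deviation 1

mean_to_1sd = z.cdf(1) - z.cdf(0)   # between mu and mu + sigma
one_to_2sd = z.cdf(2) - z.cdf(1)    # between mu + sigma and mu + 2 sigma
above_2sd = 1 - z.cdf(2)            # above mu + 2 sigma

print(round(mean_to_1sd, 4))  # 0.3413
print(round(one_to_2sd, 4))   # 0.1359
print(round(above_2sd, 4))    # 0.0228
```

The exact tail above μ + 2σ is closer to 2.3% than 2.5%. The rule rounds "within 2σ" to 95%, so in a rule-based answer 2.5% is the value to quote.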
Probabilities are areas, not heights
This is the biggest mental shift coming from the binomial distribution. With binomial, P(X = 5) was a real positive number — the height of one bar. With normal, it’s zero.
Why? Because the normal distribution is continuous: X can take infinitely many values, so the probability of hitting any one exact value (like exactly 168.000000… cm) is exactly zero. What we measure instead is the area under the curve over a range:
P(X < a) = area to the left of a
P(X > a) = area to the right of a
P(a < X < b) = area between a and b
Probabilities as areas under the curve
📍
Strict and weak inequalities mean the same thing here
Because P(X = exact value) = 0 for any normal distribution, we get a nice bonus: P(X < a) and P(X ≤ a) are equal. So unlike with the binomial, you don’t have to worry about converting < into ≤ here. They’re the same.
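One way to convince yourself that probability really is area is to add up thin rectangles under the curve. The sketch below does that for the heights example, assuming Python's `statistics.NormalDist`; the helper `area` and its `steps` parameter are invented for this illustration.

```python
from statistics import NormalDist

heights = NormalDist(168, 6)


def area(dist, a, b, steps=10_000):
    """Approximate the area under the curve between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width


# Area over a range matches the built-in cumulative probabilities.
print(round(area(heights, 162, 174), 4))              # 0.6827
print(round(heights.cdf(174) - heights.cdf(162), 4))  # 0.6827

# Shrinking the range around 168 makes the area shrink towards 0.
for half_width in (1, 0.1, 0.001):
    print(half_width, area(heights, 168 - half_width, 168 + half_width))
```

As the range closes in on the single value 168, the area keeps shrinking, which is the picture behind P(X = 168) = 0.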
How μ and σ change the shape
Both numbers in N(μ, σ²) tell you something different about the curve:
Changing μ shifts the whole curve left or right — the shape stays exactly the same.
Changing σ changes the width and height. A small σ → tall, narrow curve (values close to mean). A large σ → short, wide curve (values more spread out).
Effect of changing the standard deviation
🤔 Why does a smaller σ make the curve taller?
The total area under the curve has to stay equal to 1 (because total probability = 1). So if the curve gets narrower, it has to get taller to keep the same area. Same total amount of “stuff” — just packed differently.
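The "narrower means taller" trade-off can be seen in numbers. This sketch assumes Python's `statistics.NormalDist` and three arbitrary values of σ; it prints the height of the curve at the mean, and that height multiplied by σ.

```python
from statistics import NormalDist

for sigma in (0.5, 1, 2):
    curve = NormalDist(0, sigma)
    peak = curve.pdf(curve.mean)  # height of the curve at the mean
    print(sigma, round(peak, 4), round(peak * sigma, 4))
# 0.5 0.7979 0.3989
# 1 0.3989 0.3989
# 2 0.1995 0.3989
```

Doubling σ halves the peak height, so height × σ stays constant. That is the curve keeping its total area at 1.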
Quick comparison: binomial vs normal
Both are probability distributions, but they work in different ways. Here’s how they line up:
📊 BINOMIAL
Discrete — only whole-number values
Counts successes in n trials
Bar graph (separate vertical lines)
P(X = 5) is a real probability
P(X < 5) ≠ P(X ≤ 5) — careful!
Defined by n and p
🔔 NORMAL
Continuous — any real value
Models measurements (heights, times, etc.)
Smooth bell-shaped curve
P(X = 5) = 0 for any single value
P(X < 5) = P(X ≤ 5) — they’re equal!
Defined by μ and σ²
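The strict-versus-weak inequality difference in the comparison above is easy to see with numbers. In this sketch the binomial values n = 10, p = 0.5 and the normal values μ = 5, σ = 1.5 are made up purely for illustration, and `binom_pmf` is a helper written for this example using `math.comb`.

```python
from math import comb
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # illustrative values, not from the text

p_less_5 = sum(binom_pmf(k, n, p) for k in range(0, 5))     # P(X < 5)
p_at_most_5 = sum(binom_pmf(k, n, p) for k in range(0, 6))  # P(X <= 5)
print(round(p_less_5, 4), round(p_at_most_5, 4))  # 0.377 0.623

# Normal: one cumulative value serves as both P(X < 5) and P(X <= 5).
x = NormalDist(5, 1.5)  # illustrative values, not from the text
print(x.cdf(5))  # 0.5
```

The binomial answers differ by exactly P(X = 5), the height of one bar. The normal has no such bar to add, so there is only one answer.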
Worked examples
WE 1
Set up the notation
The mass of a particular brand of apple is normally distributed with mean 150 g and standard deviation 12 g. Let X be the mass of a randomly chosen apple, in grams. Write down the distribution of X.
Watch out: the bracket takes variance, not standard deviation.
Identify: μ = 150, σ = 12
Variance: σ² = 12² = 144
Answer: X ∼ N(150, 144)
Note: writing N(150, 12) instead of N(150, 144) is a classic SL slip. Square the SD!
WE 2
Use the 68-95-99.7 rule
Test scores are normally distributed with mean 70 and standard deviation 8. Roughly what percentage of students scored between 62 and 78?
62 = μ − σ and 78 = μ + σ, so this is “within 1 standard deviation”.
Check: 70 − 8 = 62 ✓ and 70 + 8 = 78 ✓
Within 1σ: ≈ 68%
Answer: ≈ 68% of students
Tip: always check first whether the bounds are at μ ± σ, μ ± 2σ, or μ ± 3σ. It saves you the GDC!
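The "check the bounds first" habit is just one subtraction and one division. As a rough sketch, the hypothetical helper below (the name `sds_from_mean` is invented here) reports how many standard deviations a value sits from the mean.

```python
def sds_from_mean(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


# Bounds from this example: mean 70, standard deviation 8.
print(sds_from_mean(62, 70, 8), sds_from_mean(78, 70, 8))  # -1.0 1.0

# IQ-style numbers: mean 100, standard deviation 15.
print(sds_from_mean(130, 100, 15))  # 2.0
```

If the results are whole numbers like −1, 1, 2, or 3, the 68-95-99.7 rule applies directly. Anything else needs the GDC.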
WE 3
Use symmetry
The heights of seedlings are normally distributed with mean 12 cm. The probability that a seedling is shorter than 9 cm is 0.18.
(a) Find P(seedling is taller than 15 cm). (b) Find P(seedling is between 9 and 15 cm).
9 is 3 below the mean and 15 is 3 above, so they are symmetric about μ = 12.
Part (a): taller than 15
By symmetry: P(X > 15) = P(X < 9) = 0.18
Answer: P(X > 15) = 0.18
Part (b): between 9 and 15
Total area = 1. Subtract the two tails: P(9 < X < 15) = 1 − 0.18 − 0.18 = 0.64
Answer: P(9 < X < 15) = 0.64
Tip: always sketch the curve! Symmetry questions become obvious once you see the picture.
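The question never gives σ, and you don't need it. Purely as a check, though, the sketch below works backwards to a σ that fits the given 0.18 and confirms both answers. It assumes Python's `statistics.NormalDist` and its `inv_cdf` method, which goes beyond what this note covers.

```python
from statistics import NormalDist

mu = 12
p_below_9 = 0.18  # given in the question

# For checking only: recover sigma from the given lower tail.
z = NormalDist().inv_cdf(p_below_9)  # cut-off with 18% of the area to its left
sigma = (9 - mu) / z

seedlings = NormalDist(mu, sigma)
p_above_15 = 1 - seedlings.cdf(15)
p_between = seedlings.cdf(15) - seedlings.cdf(9)

print(round(p_above_15, 2))  # 0.18
print(round(p_between, 2))   # 0.64
```

The symmetry argument reaches the same numbers with no calculator at all, which is why it is the intended method.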
WE 4
Find a tail probability using the rule
The weights of newborn babies at a hospital are normally distributed with mean 3.4 kg and standard deviation 0.4 kg. Approximately what percentage of babies weigh more than 4.2 kg?
4.2 = 3.4 + 2(0.4), so this is “more than μ + 2σ”: the upper 2σ tail.
95% lie within 2σ → 5% lie outside (in both tails).
By symmetry, half is in each tail: P(X > μ + 2σ) = 5% ÷ 2 = 2.5%
Answer: ≈ 2.5% of babies
Tip: “outside 2σ” = 5%, then split evenly because of symmetry. Easy mark on Paper 1!
WE 5
Real exam-style multi-part
The IQ scores of a population are normally distributed with mean 100 and standard deviation 15. Let X be the IQ score of a randomly chosen person.
(a) State the distribution of X. (b) Roughly what percentage of people have an IQ between 85 and 115? (c) Roughly what percentage have an IQ above 130? (d) P(X < 70) = 0.0228. Find P(70 < X < 130).
μ = 100, σ = 15. Use the 68-95-99.7 rule and symmetry.
Part (a)
σ² = 15² = 225
Answer: X ∼ N(100, 225)
Part (b): between 85 and 115
85 = μ − σ, 115 = μ + σ → within 1σ
Answer: ≈ 68%
Part (c): above 130
130 = μ + 2σ → upper 2σ tail
5% ÷ 2 = 2.5%
Answer: ≈ 2.5%
Part (d): between 70 and 130
70 = μ − 2σ, 130 = μ + 2σ → within 2σ
By symmetry: P(X > 130) = P(X < 70) = 0.0228
P(70 < X < 130) = 1 − 2(0.0228) = 0.9544
Answer: P(70 < X < 130) = 0.9544
Note: 0.9544 ≈ 95%. The given tail value turns the rough 95% rule into a more precise figure.
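Parts (b) to (d) can be checked against exact areas. This is a sketch assuming Python's `statistics.NormalDist`; the variable names simply mirror the parts of the question.

```python
from statistics import NormalDist

iq = NormalDist(100, 15)  # X ~ N(100, 225): pass the standard deviation, 15

part_b = iq.cdf(115) - iq.cdf(85)  # within 1 sigma
part_c = 1 - iq.cdf(130)           # upper 2 sigma tail
part_d = iq.cdf(130) - iq.cdf(70)  # within 2 sigma

print(round(part_b, 4))  # 0.6827
print(round(part_c, 4))  # 0.0228
print(round(part_d, 4))  # 0.9545
```

The hand answer to part (d) is 0.9544 rather than 0.9545 because it starts from the rounded value 0.0228. Both are fine to three significant figures.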
💡 Top tips
Sketch the curve for every question — even a rough drawing. Symmetry and tail problems become obvious once you see the picture.
Check if bounds are nice (μ ± σ, μ ± 2σ, μ ± 3σ) before reaching for the GDC. The 68-95-99.7 rule might be all you need.
Remember the bracket takes σ², not σ. Always square it before writing N(μ, σ²).
Total area = 1. When you find a probability, the rest of the curve adds to (1 − that probability). Use this constantly.
Symmetry is your best friend. P(X < μ − a) = P(X > μ + a) — same area on each side.
Strict vs weak inequalities don’t matter here. P(X < a) = P(X ≤ a).
Half of each rule. 68% inside 1σ → 34% on each side of μ. Useful for one-tailed regions.
Always include units if the question is about a real measurement (cm, kg, marks).
⚠ Common mistakes
Writing N(μ, σ) instead of N(μ, σ²). The bracket takes variance, always squared.
Mixing up which percentage goes with which σ. Always: 1σ → 68%, 2σ → 95%, 3σ → 99.7%.
Forgetting to halve when doing one tail. “Above μ + 2σ” is half of (100% − 95%) = 2.5%, not 5%.
Treating P(X = 5) as a non-zero probability. For continuous distributions, single-value probabilities are always 0.
Assuming any data is normal. Skewed or bounded data (like income or scores capped at 100) usually isn’t.
Mixing up standard deviation and variance when reading a problem. Re-read carefully: the question may give σ or σ², and you often need to convert between them.
Not using symmetry. Many questions can be solved without a calculator if you spot that the bounds are mirror images of the mean.
Forgetting the curve never touches the axis. P(X < very large value) is always slightly less than 1, never equal.
You can now spot a normal distribution in a real-world problem, write the notation properly, sketch the curve, and use symmetry plus the 68-95-99.7 rule to answer Paper 1 questions without a calculator. The next note shows you how to calculate any normal probability using your GDC — including the trickier ones where the bounds aren’t nice multiples of σ.