IB Maths AA SL Topic 4 — Probability Distributions Paper 1 & 2 ~10 min read

The Normal Distribution

Heights of students, the weight of apples in a crate, marks on a maths test — most natural measurements cluster around an average, with fewer values appearing as you move further away. That’s a normal distribution: the famous bell curve. It’s the most important continuous distribution in statistics, and you’ll meet it on almost every Paper 2.

📘 What you need to know

What does a normal distribution look like?

Imagine you measured the height of every 16-year-old in your country. Most would be near the average (say 168 cm). Some would be a bit taller, some a bit shorter. Very few would be extremely tall or extremely short. If you plotted all those heights on a graph, you’d get a smooth, symmetric bell shape — that’s a normal distribution.

The classic bell curve
[Figure: bell curve plotted against x, with its peak at the mean μ. The two halves mirror each other, and each tail tapers off, approaching but never touching the axis.]

What kinds of things are normally distributed?

Lots of natural and human-measured quantities follow this shape, especially when you’re measuring something where most values cluster around an average:

  • 📏 Heights of people, plants, animals
  • ⚖️ Weights of fruit, packages, newborns
  • 📝 Test scores: IQ, exam marks, IB scores
  • ⏱️ Times to run a race, react to a signal

A normal distribution doesn’t always fit perfectly — but if a real-world measurement is roughly symmetric around an average and very extreme values are rare, the normal model is usually a good description.

The bell curve’s key properties

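Every normal distribution has the same characteristic shape, no matter the scale. Here’s what’s always true about it:

  • The curve is symmetric about the mean μ: the left half mirrors the right half.
  • The peak is at the mean.
  • The tails taper off, approaching the axis but never touching it.
  • The total area under the curve is 1.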

🤔 Why does the area equal 1?

The area under any probability distribution curve represents total probability. Since every value of X must lie somewhere on the curve, and probabilities have to sum to 1, the entire shaded area beneath the curve adds up to 1. This is a rule for any continuous distribution, not just the normal one.
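As a rough numerical check of the "area = 1" idea, here is a minimal Python sketch. It assumes the heights example used in this note (mean 168, standard deviation 6) and adds up thin rectangles under the curve. The helper name `area_under_curve` and the choice of 8 standard deviations either side are illustrative assumptions, not part of the syllabus.

```python
from statistics import NormalDist

# Heights example: mean 168 cm, standard deviation 6 cm.
X = NormalDist(168, 6)

def area_under_curve(dist, lo, hi, steps=10_000):
    """Approximate the area under dist's curve between lo and hi
    by summing thin rectangles (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

# Go 8 standard deviations either side of the mean (an arbitrary, generous range).
total_area = area_under_curve(X, 168 - 8 * 6, 168 + 8 * 6)
print(round(total_area, 6))  # very close to 1
```

The tiny sliver of area further out than 8 standard deviations is far too small to affect the rounded result.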

The notation: X ∼ N(μ, σ²)

If X is normally distributed, we write it like this. Two numbers tell you everything about the curve — where it’s centred, and how spread out it is.

X ∼ N(μ, σ²)

  • X: the random variable (the measurement)
  • μ: the mean, where the curve is centred
  • σ²: the variance, which measures how spread out the curve is

For example, if students’ heights have mean 168 cm and standard deviation 6 cm, then variance = 6² = 36, so:

X ∼ N(168, 36)

⚠️

The classic mistake — variance, not standard deviation!

In the bracket, the second number is variance (σ²), not standard deviation (σ). If you’re told σ = 6, the notation uses 36, not 6. Get this wrong and your whole answer falls apart.

When you put numbers into your GDC for a normal distribution, it usually asks for standard deviation directly (σ, not σ²). So in the notation you write 36, but in the calculator you’d type 6. Read carefully both times.
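The same "notation uses variance, tool uses standard deviation" split shows up in software. Here is a small sketch using Python's standard-library `statistics.NormalDist`, which (like most GDCs) is given σ rather than σ². The variable names are just for illustration.

```python
from statistics import NormalDist

# Heights example from the text: mean 168 cm, standard deviation 6 cm.
mu = 168
sigma = 6

# Like a GDC, NormalDist is given the standard deviation, not the variance.
X = NormalDist(mu, sigma)

# The written notation N(mu, sigma^2) uses the variance instead.
variance = X.variance
print(f"X ~ N({mu}, {variance:g})")  # prints: X ~ N(168, 36)
```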

The 68-95-99.7 rule (the most useful thing on this page)

Here’s a fact you’ll use over and over. For any normal distribution, the same rough percentages of values fall within 1, 2, and 3 standard deviations of the mean:

The empirical rule — 68 / 95 / 99.7
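[Figure: bell curve with the axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ and μ + 3σ. About 68% of the area lies within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ.]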

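So if test scores are normally distributed with mean 70 and standard deviation 10:

  • about 68% of scores lie between 60 and 80 (μ ± σ)
  • about 95% lie between 50 and 90 (μ ± 2σ)
  • about 99.7% lie between 40 and 100 (μ ± 3σ)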

🧠

Remember the rule as 68 → 95 → 99.7

One sigma each side gets you most students (68%), two sigmas covers nearly everyone (95%), three sigmas covers almost all of them (99.7%). The numbers always grow as you move outward — never the other way.
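If you want to see where 68, 95 and 99.7 come from, the sketch below computes the exact areas for the test-score example (mean 70, standard deviation 10). The helper `within` is a made-up name for this illustration. You would not do this in an exam; it is only a check that the rule is a rounded version of the true values.

```python
from statistics import NormalDist

# Test scores: mean 70, standard deviation 10.
scores = NormalDist(70, 10)

def within(dist, k):
    """Probability of landing within k standard deviations of the mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

for k in (1, 2, 3):
    print(f"within {k} sd: {within(scores, k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

Changing the mean or standard deviation in `NormalDist(70, 10)` leaves the three results unchanged, which is why the rule works for any normal distribution.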

Half on each side

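Because the curve is symmetric, you can split these percentages in two. For example:

  • About 34% of values lie between μ and μ + σ (half of 68%).
  • About 47.5% lie between μ and μ + 2σ (half of 95%).
  • About 2.5% lie above μ + 2σ (half of the 5% outside 2σ).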

If a question gives you nice “round σ” values like μ + σ or μ − 2σ, you can solve it with this rule alone — no GDC needed. That’s a Paper 1 favourite.

Probabilities are areas, not heights

This is the biggest mental shift coming from the binomial distribution. With binomial, P(X = 5) was a real positive number — the height of one bar. With normal, it’s zero.

Why? Because the normal distribution is continuous: X can take infinitely many values. The probability of hitting any one exact value (like exactly 168.000000… cm) is zero. What we measure instead is the area under the curve over a range:

Probabilities as areas under the curve
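[Figure: three bell curves with shaded areas: everything to the left of a for P(X < a), everything to the right of a for P(X > a), and the region between a and b for P(a < X < b).]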
📍

Strict and weak inequalities mean the same thing here

Because P(X = exact value) = 0 for any normal distribution, we get a nice bonus: P(X < a) and P(X ≤ a) are equal. So unlike with the binomial, you don’t have to worry about converting < into ≤ here. They’re the same.
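The three kinds of area can all be built from one ingredient: the area to the left of a value. Here is a sketch using the heights example (mean 168, standard deviation 6), with a = 162 and b = 174 chosen purely for illustration. `cdf(a)` gives the area to the left of a.

```python
from statistics import NormalDist

# Heights example: mean 168 cm, standard deviation 6 cm.
X = NormalDist(168, 6)
a, b = 162, 174  # illustrative bounds: mu - sigma and mu + sigma

p_less = X.cdf(a)                # P(X < a): area to the left of a
p_greater = 1 - X.cdf(a)         # P(X > a): the rest of the total area of 1
p_between = X.cdf(b) - X.cdf(a)  # P(a < X < b): left of b minus left of a

print(round(p_less, 4), round(p_greater, 4), round(p_between, 4))
```

Because a single point carries no area, the same three lines also give P(X ≤ a), P(X ≥ a) and P(a ≤ X ≤ b).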

How μ and σ change the shape

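Both numbers in N(μ, σ²) tell you something different about the curve:

  • μ sets where the curve is centred. Changing μ slides the whole curve left or right without changing its shape.
  • σ sets how spread out the curve is. Changing σ makes it narrower or wider.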

Effect of changing the standard deviation
[Figure: two bell curves centred at the same μ. Small σ gives a tall, narrow curve; large σ gives a short, wide curve.]

🤔 Why does a smaller σ make the curve taller?

The total area under the curve has to stay equal to 1 (because total probability = 1). So if the curve gets narrower, it has to get taller to keep the same area. Same total amount of “stuff” — just packed differently.
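To see the "narrower means taller" trade-off in numbers, this sketch evaluates the height of the curve at its peak for three standard deviations. The mean of 0 and the σ values are arbitrary choices for illustration.

```python
from statistics import NormalDist

# Height of the curve at its peak (x = mean) for different spreads.
peaks = {}
for sigma in (0.5, 1, 2):
    peaks[sigma] = NormalDist(0, sigma).pdf(0)
    print(f"sigma = {sigma}: peak height = {peaks[sigma]:.4f}")
# sigma = 0.5: peak height = 0.7979
# sigma = 1: peak height = 0.3989
# sigma = 2: peak height = 0.1995
```

Doubling σ halves the peak height, which is what keeps the total area fixed at 1.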

Quick comparison: binomial vs normal

Both are probability distributions, but they work in different ways. Here’s how they line up:

📊 BINOMIAL

  • Discrete — only whole-number values
  • Counts successes in n trials
  • Bar graph (separate vertical lines)
  • P(X = 5) is a real probability
  • P(X < 5) ≠ P(X ≤ 5) — careful!
  • Defined by n and p

🔔 NORMAL

  • Continuous — any real value
  • Models measurements (heights, times, etc.)
  • Smooth bell-shaped curve
  • P(X = 5) = 0 for any single value
  • P(X < 5) = P(X ≤ 5) — they’re equal!
  • Defined by μ and σ²
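The two contrasting lines about P(X = 5) and the inequalities can be checked with a short sketch. The binomial parameters (n = 10, p = 0.5) and the normal parameters (mean 5, standard deviation 2) are assumed values chosen only for this illustration.

```python
from math import comb
from statistics import NormalDist

# Binomial: n = 10 trials, p = 0.5 (assumed for illustration only).
n, p = 10, 0.5

def binom_pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

binom_equal_5 = binom_pmf(5)
binom_less_5 = sum(binom_pmf(k) for k in range(5))     # k = 0..4
binom_at_most_5 = sum(binom_pmf(k) for k in range(6))  # k = 0..5

# Normal: mean 5, standard deviation 2 (also assumed for illustration).
Y = NormalDist(5, 2)
normal_equal_5 = Y.cdf(5) - Y.cdf(5)  # area over a range of zero width

print(binom_equal_5, binom_less_5, binom_at_most_5)
print(normal_equal_5)  # 0.0
```

For the binomial, the gap between P(X < 5) and P(X ≤ 5) is exactly P(X = 5). For the normal, that gap is zero.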

Worked examples

WE 1

Set up the notation

The mass of a particular brand of apple is normally distributed with mean 150 g and standard deviation 12 g. Let X be the mass of a randomly chosen apple, in grams. Write down the distribution of X.

Watch out: the bracket takes variance, not standard deviation.

Identify: μ = 150, σ = 12
Variance: σ² = 12² = 144

X ∼ N(150, 144)

Writing N(150, 12) instead of N(150, 144) is a classic SL slip. Square the SD!

WE 2

Use the 68-95-99.7 rule

Test scores are normally distributed with mean 70 and standard deviation 8. Roughly what percentage of students scored between 62 and 78?

62 = μ − σ and 78 = μ + σ, so this is “within 1 standard deviation”.

Check: 70 − 8 = 62 ✓ and 70 + 8 = 78 ✓
Within 1σ: ≈ 68%

≈ 68% of students

Always check first whether the bounds are at μ ± σ, μ ± 2σ, or μ ± 3σ. It saves you the GDC!

WE 3

Use symmetry

The heights of seedlings are normally distributed with mean 12 cm. The probability that a seedling is shorter than 9 cm is 0.18.

(a) Find P(seedling is taller than 15 cm).
(b) Find P(seedling is between 9 and 15 cm).

9 is 3 below the mean and 15 is 3 above, so they are symmetric about μ = 12.

Part (a): taller than 15
By symmetry: P(X > 15) = P(X < 9) = 0.18

P(X > 15) = 0.18

Part (b): between 9 and 15
Total area = 1. Subtract the two tails:
P(9 < X < 15) = 1 − 0.18 − 0.18 = 0.64

P(9 < X < 15) = 0.64

Always sketch the curve! Symmetry questions become obvious once you see the picture.

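WE 3 never gives σ, and the symmetry argument does not need it. Purely as a check, the sketch below backs out a σ consistent with P(X < 9) = 0.18 and confirms both answers. It uses the inverse normal (`inv_cdf`), which goes beyond what this note covers, so treat it as an optional extra rather than the intended method.

```python
from statistics import NormalDist

mu = 12

# Standard normal value z with P(Z < z) = 0.18 (negative, since 0.18 < 0.5).
z = NormalDist().inv_cdf(0.18)

# 9 sits z standard deviations from the mean, so sigma = (9 - mu) / z.
sigma = (9 - mu) / z
X = NormalDist(mu, sigma)

p_taller_15 = 1 - X.cdf(15)       # part (a)
p_between = X.cdf(15) - X.cdf(9)  # part (b)

print(round(p_taller_15, 2), round(p_between, 2))  # 0.18 0.64
```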
WE 4

Find a tail probability using the rule

The weights of newborn babies at a hospital are normally distributed with mean 3.4 kg and standard deviation 0.4 kg. Approximately what percentage of babies weigh more than 4.2 kg?

4.2 = 3.4 + 2(0.4), so this is “more than μ + 2σ”: the upper 2σ tail.

95% lie within 2σ → 5% lie outside (in both tails).
By symmetry, half is in each tail: P(X > μ + 2σ) = 5% ÷ 2 = 2.5%

≈ 2.5% of babies

“Outside 2σ” = 5%, then split evenly because of symmetry. Easy mark on Paper 1!

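The 2.5% in WE 4 comes from the rounded 95% figure. As a comparison, here is a sketch that computes the same tail directly for the newborn weights (mean 3.4 kg, standard deviation 0.4 kg).

```python
from statistics import NormalDist

# Newborn weights: mean 3.4 kg, standard deviation 0.4 kg.
weights = NormalDist(3.4, 0.4)

# P(X > 4.2) = 1 - (area to the left of 4.2)
p_heavier = 1 - weights.cdf(4.2)
print(f"{p_heavier:.4f}")  # 0.0228, i.e. about 2.3%
```

The rule-based answer of 2.5% is the expected one when the question says "approximately"; the computed value is slightly smaller because 95% is itself a rounded figure.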
WE 5

Real exam-style multi-part

The IQ scores of a population are normally distributed with mean 100 and standard deviation 15. Let X be the IQ score of a randomly chosen person.

(a) State the distribution of X.
(b) Roughly what percentage of people have an IQ between 85 and 115?
(c) Roughly what percentage have an IQ above 130?
(d) P(X < 70) = 0.0228. Find P(70 < X < 130).

μ = 100, σ = 15. Use the 68-95-99.7 rule and symmetry.

Part (a)
σ² = 15² = 225

X ∼ N(100, 225)

Part (b): between 85 and 115
85 = μ − σ and 115 = μ + σ → within 1σ

≈ 68%

Part (c): above 130
130 = μ + 2σ → upper 2σ tail
5% ÷ 2 = 2.5%

≈ 2.5%

Part (d): between 70 and 130
70 = μ − 2σ and 130 = μ + 2σ → within 2σ
By symmetry: P(X > 130) = P(X < 70) = 0.0228
P(70 < X < 130) = 1 − 2(0.0228) = 0.9544

P(70 < X < 130) = 0.9544

Notice 0.9544 ≈ 95%: the given tail probability leads to a more precise value than the rounded 95% rule.
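All four parts of WE 5 can be cross-checked with a few lines. This is a sketch of a calculator-style check, not the method the question expects; the variable names are illustrative.

```python
from statistics import NormalDist

# IQ scores: standard deviation 15, so X ~ N(100, 225).
iq = NormalDist(100, 15)

p_b = iq.cdf(115) - iq.cdf(85)  # part (b): within 1 sd
p_c = 1 - iq.cdf(130)           # part (c): above mu + 2 sd
p_below_70 = iq.cdf(70)         # the tail value quoted in part (d)
p_d = iq.cdf(130) - iq.cdf(70)  # part (d): within 2 sd

print(f"{p_b:.4f} {p_c:.4f} {p_below_70:.4f} {p_d:.4f}")
# 0.6827 0.0228 0.0228 0.9545
```

The direct value for part (d) is 0.9545 rather than 0.9544 because the question's 0.0228 is already rounded; either is consistent with the working shown.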

💡 Top tips

⚠ Common mistakes

You can now spot a normal distribution in a real-world problem, write the notation properly, sketch the curve, and use symmetry plus the 68-95-99.7 rule to answer Paper 1 questions without a calculator. The next note shows you how to calculate any normal probability using your GDC — including the trickier ones where the bounds aren’t nice multiples of σ.
