IB Maths AA SL Topic 4 — Probability Distributions Paper 1 & 2 ~10 min read

The Normal Distribution

Heights of students, the weight of apples in a crate, marks on a maths test: many natural measurements cluster around an average, with fewer values appearing as you move further away. That’s a normal distribution, the famous bell curve. It’s the most important continuous distribution in statistics, and you’ll meet it on almost every Paper 2.

📘 What you need to know

The normal distribution is continuous and bell-shaped — symmetric about its mean.
Notation: X ∼ N(μ, σ²) — that’s mean, then variance (σ squared, NOT σ).
For a normal distribution, mean = median = mode (all sit at the centre).
The total area under the curve = 1.
68-95-99.7 rule: ~68% of values lie within 1σ of the mean, ~95% within 2σ, ~99.7% within 3σ.
Probabilities are areas under the curve. P(X = a single value) = 0 for any specific value.
Because it’s continuous, P(X < a) = P(X ≤ a) — strict and weak inequalities give the same answer (unlike the binomial!).

What does a normal distribution look like?

Imagine you measured the height of every 16-year-old in your country. Most would be near the average (say 168 cm). Some would be a bit taller, some a bit shorter. Very few would be extremely tall or extremely short. If you plotted all those heights on a graph, you’d get a smooth, symmetric bell shape — that’s a normal distribution.

The classic bell curve

What kinds of things are normally distributed?

Lots of natural and human-measured quantities follow this shape, especially when you’re measuring something where most values cluster around an average:

📏 Heights: of people, plants, animals
⚖️ Weights: of fruit, packages, newborns
📝 Test scores: IQ, exam marks, IB scores
⏱️ Times: to run a race, react to a signal

A normal distribution doesn’t always fit perfectly — but if a real-world measurement is roughly symmetric around an average and very extreme values are rare, the normal model is usually a good description.

The bell curve’s key properties

Every normal distribution has the same characteristic shape, no matter the scale. Here’s what’s always true about it:

It’s symmetric — fold the curve down the middle and the two halves match exactly.
The mean, median, and mode are all equal — they all sit at the centre, where the peak is.
It’s bell-shaped — the curve rises smoothly to the mean, then falls away symmetrically.
It never touches the x-axis — the tails just keep getting closer (we say they’re asymptotic).
The total area under the curve is 1 — because all probabilities together must add to 1.

🤔 Why does the area equal 1?

The area under any probability density curve represents total probability. Every value of X must land somewhere along the horizontal axis, and the probabilities of all possible outcomes have to total 1, so the entire area beneath the curve is 1. This is true of any continuous distribution, not just the normal one.
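If you’d like to see this numerically, here is a rough sketch in Python. It uses `NormalDist` from Python’s standard `statistics` module (not something you need for the exam) and a hypothetical helper, `area_under_pdf`, that adds up thin rectangles under the curve. The heights model N(168, 36) is borrowed from the example used later in this note.

```python
from statistics import NormalDist

def area_under_pdf(dist, lo, hi, steps=10_000):
    """Approximate the area under dist.pdf between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

# Heights with mean 168 cm and standard deviation 6 cm
heights = NormalDist(mu=168, sigma=6)

# Go 10 standard deviations either side: the tails beyond that are negligible
total_area = area_under_pdf(heights, 168 - 10 * 6, 168 + 10 * 6)
print(round(total_area, 6))
```

The printed value should come out as 1 to six decimal places, which is the “total area = 1” property in action.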

The notation: X ∼ N(μ, σ²)

If X is normally distributed, we write it like this. Two numbers tell you everything about the curve — where it’s centred, and how spread out it is.

X ∼ N(μ, σ²)

X: the random variable (the measurement)

μ: the mean, where the curve is centred

σ²: the variance, how spread out the curve is

For example, if students’ heights have mean 168 cm and standard deviation 6 cm, then variance = 6² = 36, so:

X ∼ N(168, 36)

⚠️ The classic mistake: variance, not standard deviation!

In the bracket, the second number is variance (σ²), not standard deviation (σ). If you’re told σ = 6, the notation uses 36, not 6. Get this wrong and your whole answer falls apart.

When you put numbers into your GDC for a normal distribution, it usually asks for standard deviation directly (σ, not σ²). So in the notation you write 36, but in the calculator you’d type 6. Read carefully both times.
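Software behaves like the GDC here. As a small illustration (using Python’s standard-library `NormalDist`, which is an assumption of this sketch rather than an exam tool), the constructor asks for σ, and the variance is something you read off afterwards:

```python
from statistics import NormalDist

sigma = 6                                # standard deviation, as given in the question
X = NormalDist(mu=168, sigma=sigma)      # like a GDC, this takes sigma, not sigma squared

# In the notation you would write X ~ N(168, 36)
print(X.mean, X.stdev, X.variance)
```

Typing 36 in place of 6 here would quietly model a much wider curve, which is exactly the slip to avoid.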

The 68-95-99.7 rule (the most useful thing on this page)

Here’s a fact you’ll use over and over. For any normal distribution, the same rough percentages of values fall within 1, 2, and 3 standard deviations of the mean:

The empirical rule — 68 / 95 / 99.7

So if test scores are normally distributed with mean 70 and standard deviation 10:

About 68% of students score between 60 and 80 (within 1σ).
About 95% score between 50 and 90 (within 2σ).
About 99.7% score between 40 and 100 (within 3σ).
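To see where 68, 95 and 99.7 come from, you can compute the exact areas. The sketch below uses Python’s standard-library `NormalDist` with the test-score model above; the helper name `within` is made up for this example.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=10)

def within(dist, k):
    """Probability of landing within k standard deviations of the mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * within(scores, k):.2f}%")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```

Changing `mu` or `sigma` leaves these three percentages unchanged, which is why the rule works for any normal distribution.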

🧠 Remember the rule as 68 → 95 → 99.7

One sigma each side gets you most students (68%), two sigmas covers nearly everyone (95%), three sigmas covers almost all of them (99.7%). The numbers always grow as you move outward — never the other way.

Half on each side

Because the curve is symmetric, you can split these percentages in two. For example:

About 34% of values lie between μ and μ + σ (half of 68%).
About 13.5% lie between μ + σ and μ + 2σ (95% − 68% = 27%, then halved).
Only about 2.5% lie above μ + 2σ (half of the leftover 5%).
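The three bands above can be checked against exact areas. This is only a sketch, using the standard normal (μ = 0, σ = 1) from Python’s `statistics` module; the variable names are chosen for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

mean_to_1 = Z.cdf(1) - Z.cdf(0)   # between mu and mu + sigma
one_to_2 = Z.cdf(2) - Z.cdf(1)    # between mu + sigma and mu + 2 sigma
above_2 = 1 - Z.cdf(2)            # above mu + 2 sigma

print(round(mean_to_1, 4), round(one_to_2, 4), round(above_2, 4))
# 0.3413 0.1359 0.0228
```

The exact upper tail is about 2.28%, so the rule’s 2.5% is a rounded figure. On a “roughly what percentage” question, the rule’s answer is the one intended.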

If a question gives you nice “round σ” values like μ + σ or μ − 2σ, you can solve it with this rule alone — no GDC needed. That’s a Paper 1 favourite.

Probabilities are areas, not heights

This is the biggest mental shift coming from the binomial distribution. With binomial, P(X = 5) was a real positive number — the height of one bar. With normal, it’s zero.

Why? Because the normal distribution is continuous: X can take infinitely many values, so the probability of hitting any one exact value (like exactly 168.000000… cm) is zero. What we measure instead is the area under the curve over a range:

P(X < a) = area to the left of a
P(X > a) = area to the right of a
P(a < X < b) = area between a and b

Probabilities as areas under the curve

📍 Strict and weak inequalities mean the same thing here

Because P(X = exact value) = 0 for any normal distribution, we get a nice bonus: P(X < a) and P(X ≤ a) are equal. So unlike with the binomial, you don’t have to worry about converting < into ≤ here. They’re the same.
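Here is how the three kinds of area look in code. It is a sketch using Python’s standard-library `NormalDist`, with the heights model N(168, 36) and bounds a = 162, b = 174 picked purely for illustration. The `cdf` method gives the area to the left of a value.

```python
from statistics import NormalDist

X = NormalDist(mu=168, sigma=6)
a, b = 162, 174   # illustrative bounds, one sigma either side of the mean

left = X.cdf(a)                 # P(X < a): area to the left of a
right = 1 - X.cdf(b)            # P(X > b): area to the right of b
between = X.cdf(b) - X.cdf(a)   # P(a < X < b): area between a and b

print(round(left, 4), round(between, 4), round(right, 4))
# 0.1587 0.6827 0.1587
```

There is only one `cdf`, with no separate version for ≤, because for a continuous variable P(X < a) and P(X ≤ a) are the same area. The three pieces add up to 1.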

How μ and σ change the shape

Both numbers in N(μ, σ²) tell you something different about the curve:

Changing μ shifts the whole curve left or right — the shape stays exactly the same.
Changing σ changes the width and height. A small σ → tall, narrow curve (values close to mean). A large σ → short, wide curve (values more spread out).

Effect of changing the standard deviation

🤔 Why does a smaller σ make the curve taller?

The total area under the curve has to stay equal to 1 (because total probability = 1). So if the curve gets narrower, it has to get taller to keep the same area. Same total amount of “stuff” — just packed differently.
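You can see the trade-off by comparing peak heights. The sketch below relies on a standard fact that goes beyond this note: the height of the curve at the mean is 1 / (σ√(2π)). The σ values are chosen for illustration, and `NormalDist` is from Python’s standard library.

```python
from math import pi, sqrt
from statistics import NormalDist

peaks = {}
for sigma in (0.5, 1, 2):
    curve = NormalDist(mu=0, sigma=sigma)
    peaks[sigma] = curve.pdf(0)            # height of the curve at the mean
    formula = 1 / (sigma * sqrt(2 * pi))   # the same height from the formula
    print(sigma, round(peaks[sigma], 4), round(formula, 4))
```

Doubling σ halves the peak height: the curve is twice as wide, so it must be half as tall to keep the area at 1.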

Quick comparison: binomial vs normal

Both are probability distributions, but they work in different ways. Here’s how they line up:

📊 BINOMIAL

Discrete — only whole-number values
Counts successes in n trials
Bar graph (separate vertical lines)
P(X = 5) is a real probability
P(X < 5) ≠ P(X ≤ 5) — careful!
Defined by n and p

🔔 NORMAL

Continuous — any real value
Models measurements (heights, times, etc.)
Smooth bell-shaped curve
P(X = 5) = 0 for any single value
P(X < 5) = P(X ≤ 5) — they’re equal!
Defined by μ and σ²
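The inequality difference is easy to demonstrate. In this sketch the binomial B(10, 0.5) and the normal with mean 5 and standard deviation 1.5 are hypothetical examples, and `binom_pmf` is a helper written just for this illustration.

```python
from math import comb
from statistics import NormalDist

# Hypothetical binomial: n = 10 trials, success probability p = 0.5
n, p = 10, 0.5

def binom_pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

binom_eq_5 = binom_pmf(5)                          # a real, positive probability
binom_lt_5 = sum(binom_pmf(k) for k in range(5))   # P(X < 5), i.e. k = 0..4
binom_le_5 = binom_lt_5 + binom_eq_5               # P(X <= 5)
print(round(binom_eq_5, 4), round(binom_lt_5, 4), round(binom_le_5, 4))

# Hypothetical normal: mean 5, standard deviation 1.5
Y = NormalDist(mu=5, sigma=1.5)
normal_lt_5 = Y.cdf(5)   # P(Y < 5) and P(Y <= 5) are the same area
print(normal_lt_5)
```

For the binomial, moving from < to ≤ adds the whole bar at 5. For the normal, there is no bar to add, so one number answers both questions.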

Worked examples

WE 1

Set up the notation

The mass of a particular brand of apple is normally distributed with mean 150 g and standard deviation 12 g. Let X be the mass of a randomly chosen apple, in grams. Write down the distribution of X.

Watch out: the bracket takes variance, not standard deviation.
Identify: μ = 150, σ = 12
Variance: σ² = 12² = 144
X ∼ N(150, 144)
Writing N(150, 12) instead of N(150, 144) is a classic SL slip. Square the SD!

WE 2

Use the 68-95-99.7 rule

Test scores are normally distributed with mean 70 and standard deviation 8. Roughly what percentage of students scored between 62 and 78?

62 = μ − σ and 78 = μ + σ, so this is “within 1 standard deviation”.
Check: 70 − 8 = 62 ✓ and 70 + 8 = 78 ✓
Within 1σ: ≈ 68%
Answer: ≈ 68% of students
Always check first whether the bounds are at μ ± σ, μ ± 2σ, or μ ± 3σ. It saves you the GDC!

WE 3

Use symmetry

The heights of seedlings are normally distributed with mean 12 cm. The probability that a seedling is shorter than 9 cm is 0.18.

(a) Find P(seedling is taller than 15 cm).
(b) Find P(seedling is between 9 and 15 cm).

9 is 3 below the mean and 15 is 3 above, so they’re symmetric about μ = 12.
Part (a): taller than 15
By symmetry: P(X > 15) = P(X < 9) = 0.18
Part (b): between 9 and 15
Total area = 1. Subtract the two tails: P(9 < X < 15) = 1 − 0.18 − 0.18 = 0.64
Always sketch the curve! Symmetry questions become obvious once you see the picture.

WE 4

Find a tail probability using the rule

The weights of newborn babies at a hospital are normally distributed with mean 3.4 kg and standard deviation 0.4 kg. Approximately what percentage of babies weigh more than 4.2 kg?

4.2 = 3.4 + 2(0.4), so this is “more than μ + 2σ”: the upper 2σ tail.
95% lie within 2σ → 5% lie outside (in both tails).
By symmetry, half is in each tail: P(X > μ + 2σ) = 5% ÷ 2 = 2.5%
Answer: ≈ 2.5% of babies
“Outside 2σ” = 5%, then split evenly because of symmetry. Easy mark on Paper 1!

WE 5

Real exam-style multi-part

The IQ scores of a population are normally distributed with mean 100 and standard deviation 15. Let X be the IQ score of a randomly chosen person.

(a) State the distribution of X.
(b) Roughly what percentage of people have an IQ between 85 and 115?
(c) Roughly what percentage have an IQ above 130?
(d) P(X < 70) = 0.0228. Find P(70 < X < 130).

μ = 100, σ = 15. Use the 68-95-99.7 rule and symmetry.
Part (a): σ² = 15² = 225, so X ∼ N(100, 225)
Part (b): between 85 and 115. 85 = μ − σ, 115 = μ + σ → within 1σ, so ≈ 68%
Part (c): above 130. 130 = μ + 2σ → upper 2σ tail, so 5% ÷ 2 = 2.5%
Part (d): between 70 and 130. 70 = μ − 2σ, 130 = μ + 2σ → within 2σ. By symmetry P(X > 130) = P(X < 70) = 0.0228, so P(70 < X < 130) = 1 − 2(0.0228) = 0.9544
Notice 0.9544 ≈ 95%: that’s the 95% rule, with a more precise value!
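If you want to sanity-check parts (b) to (d) with software, a sketch like the one below works. It uses Python’s standard-library `NormalDist` as a stand-in for a GDC; the variable names are just labels for this example.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)    # sigma = 15, so the notation is N(100, 225)

part_b = iq.cdf(115) - iq.cdf(85)    # P(85 < X < 115)
part_c = 1 - iq.cdf(130)             # P(X > 130)
part_d = iq.cdf(130) - iq.cdf(70)    # P(70 < X < 130)

print(round(part_b, 4), round(part_c, 4), round(part_d, 4))
# 0.6827 0.0228 0.9545
```

Part (d) prints 0.9545 rather than 0.9544 because the question’s 0.0228 is itself a rounded value. In the exam, work from the number you are given.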

💡 Top tips

Sketch the curve for every question — even a rough drawing. Symmetry and tail problems become obvious once you see the picture.
Check if bounds are nice (μ ± σ, μ ± 2σ, μ ± 3σ) before reaching for the GDC. The 68-95-99.7 rule might be all you need.
Remember the bracket takes σ², not σ. Always square it before writing N(μ, σ²).
Total area = 1. When you find a probability, the rest of the curve adds to (1 − that probability). Use this constantly.
Symmetry is your best friend. P(X < μ − a) = P(X > μ + a) — same area on each side.
Strict vs weak inequalities don’t matter here. P(X < a) = P(X ≤ a).
Half of each rule. 68% inside 1σ → 34% on each side of μ. Useful for one-tailed regions.
Always include units if the question is about a real measurement (cm, kg, marks).

⚠ Common mistakes

Writing N(μ, σ) instead of N(μ, σ²). The bracket takes variance, always squared.
Mixing up which percentage goes with which σ. Always: 1σ → 68%, 2σ → 95%, 3σ → 99.7%.
Forgetting to halve when doing one tail. “Above μ + 2σ” is half of (100% − 95%) = 2.5%, not 5%.
Treating P(X = 5) as a non-zero probability. For continuous distributions, single-value probabilities are always 0.
Assuming any data is normal. Skewed or bounded data (like income or scores capped at 100) usually isn’t.
Mixing up standard deviation and variance when reading a problem. Re-read carefully: the question may give σ or σ², and rarely both.
Not using symmetry. Many questions can be solved without a calculator if you spot that the bounds are mirror images of the mean.
Forgetting the curve never touches the axis. P(X < very large value) is always slightly less than 1, never equal.

You can now spot a normal distribution in a real-world problem, write the notation properly, sketch the curve, and use symmetry plus the 68-95-99.7 rule to answer Paper 1 questions without a calculator. The next note shows you how to calculate any normal probability using your GDC — including the trickier ones where the bounds aren’t nice multiples of σ.
