IB Maths AA SL · Topic 4: Correlation & Regression · Paper 1 & 2

Pearson’s Product-Moment Correlation Coefficient

“Strong” or “weak” is a bit subjective when looking at a scatter diagram. Pearson’s PMCC (or just r) is the maths fix — a single number between −1 and 1 that tells you exactly how strong the linear correlation is.

📘 What you need to know

What is r?

The PMCC takes a scatter diagram and squashes the entire pattern down into one number. That one number tells you both the direction and the strength of the linear relationship at the same time.

The r scale — from −1 to +1

[Figure: the PMCC scale, a number line marked at −1, −0.5, 0, +0.5 and +1. −1 is perfect negative, 0 is no correlation, +1 is perfect positive; the correlation gets stronger toward either end.]

Three rules for reading r

  1. Sign tells direction: r > 0 → positive correlation. r < 0 → negative correlation.
  2. Distance from 0 tells strength: The closer to ±1, the stronger. The closer to 0, the weaker.
  3. Special values:  r = 1 → perfect positive.  r = −1 → perfect negative.  r = 0 → no linear correlation.
🧠 Memory trick: “Sign for direction, size for strength”

The sign (+ or −) tells you whether the correlation is positive or negative: that’s the direction. The size (how close |r| is to 1) tells you how strong it is. Two separate things, both packed into one number.

Translating r into words

You’ll often need to describe the correlation based on the value of r. Here’s a rough guide for IB exams (use the absolute value |r|):

Perfect

|r| = 1
All points lie exactly on a straight line.

Strong

0.7 ≤ |r| < 1
Points lie close to a straight line.

Weak

|r| < 0.5
Points are scattered, but a slope may be visible.

Values of |r| between 0.5 and 0.7 are sometimes called “moderate” correlation. The IB doesn’t always require this band; answers such as “moderately strong” or “moderately weak” are usually accepted. When in doubt, quote the actual value of r in your answer as well.
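
If it helps to see the rough guide as a rule, here is a minimal Python sketch of it. The function name describe_r is made up for illustration, and the cut-offs (0.5 and 0.7) are simply the bands from this note, not an official IB mark scheme.

```python
def describe_r(r: float) -> str:
    """Turn a PMCC value into words, using this note's rough bands."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends only on the distance from 0
    if size == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"  # sign gives direction
    if size == 1:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"  # assumed label for the 0.5 to 0.7 band
    else:
        strength = "weak"
    return f"{strength} {direction} linear correlation"


for value in (0.92, -0.85, 0.15, -0.41):
    print(value, "->", describe_r(value))
```

Notice that every description contains three pieces: the strength, the sign, and the word “linear”.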

How to find r on your GDC

The actual formula for r is messy, and the IB doesn’t want you to use it. Just enter the data into your calculator’s stats mode.

The GDC method (every exam, every time)

  1. Enter the x-values in List 1 (or L1).
  2. Enter the y-values in List 2 (or L2). Make sure the pairs line up correctly!
  3. Run “Linear Regression” (LinReg or “ax + b”) — set XList = L1, YList = L2.
  4. Read off r from the output. The calculator also gives you r², a, and b for free.
📍 Can’t see r on your calculator?

On Casio: enter LINK → SET UP → “STAT” → switch “Stat Wizard” or look for “Diagnostic On”. On TI: 2nd → CATALOG → DiagnosticOn → ENTER. Once it’s on, r will appear in the LinReg output.

What the formula looks like (just for understanding)

Pearson’s PMCC formula (you don’t need to memorise this!)
$$r = \frac{S_{xy}}{S_x S_y}$$

🤔 What does the formula actually do?

The top part (Sxy) measures how much x and y change together — called the covariance. The bottom part (SxSy) measures how much each variable changes on its own. Dividing them gives a “scaled” number that always ends up between −1 and 1.

You don’t need to calculate this — just appreciate that it’s measuring “how much do these two move together, relative to how much they move on their own?”
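
For the curious, here is a small Python sketch of that idea (not needed for the exam). It assumes Sxy is the covariance and Sx, Sy are the standard deviations, all calculated by dividing by n; dividing by n − 1 throughout would give the same r. The function name pearson_r and the five data points are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """PMCC as covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Top part: how much x and y move together (covariance).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Bottom part: how much each variable moves on its own.
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)


# Made-up data, purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 3))  # 0.775
```

Whatever data you feed in, the scaling on the bottom keeps the answer between −1 and 1.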

When does r suggest a linear relationship?

If you get r = 0.6, is that “good enough” to use a linear model? It depends on how many data points you have. The IB will give you a critical value in the question — you compare |r| to that value.

The critical value test
If  |r| > critical value  → a linear model is appropriate
If  |r| ≤ critical value  → a linear model is NOT appropriate

How does the critical value work?

Sample size n | Typical critical value (5%) | Meaning
n = 5 | ≈ 0.878 | Need very strong r to justify linear model
n = 10 | ≈ 0.632 | Moderately strong r is enough
n = 20 | ≈ 0.444 | Even moderate r counts
n = 50 | ≈ 0.279 | Even quite weak r becomes meaningful

Notice: with more data, you need a smaller r to reach significance. That’s because more data makes patterns more reliable.

You’ll never have to memorise critical values — the question always gives them. Just compare |r| with the critical value provided. Bigger? Linear model is good. Smaller? It’s not.
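
The comparison itself is a one-liner. The sketch below (with a made-up function name, linear_model_appropriate) runs the r = 0.6 question from earlier against the typical 5% critical values quoted in this note. In an exam you would use the critical value printed in the question.

```python
def linear_model_appropriate(r: float, critical_value: float) -> bool:
    """Critical value test: compare |r| with the value given in the question."""
    return abs(r) > critical_value


# Typical 5% critical values quoted in this note (sample size -> value).
typical_critical = {5: 0.878, 10: 0.632, 20: 0.444, 50: 0.279}

r = 0.6
for n, critical in typical_critical.items():
    ok = linear_model_appropriate(r, critical)
    verdict = "appropriate" if ok else "NOT appropriate"
    print(f"n = {n}: |r| = {abs(r)} vs {critical} -> linear model {verdict}")
```

The same r = 0.6 fails the test for n = 5 and n = 10 but passes for n = 20 and n = 50, which is exactly why the answer “depends on how many data points you have”.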

What r doesn’t tell you

Pearson’s r is powerful — but it has limits.

📍 The 3 things r won’t tell you

1. r only measures LINEAR relationships. A perfect curve (like a parabola) can give r ≈ 0 even though the variables are perfectly related.

2. r doesn’t prove causation. Same rule from the last note: a strong r doesn’t mean one variable causes the other.

3. r is sensitive to outliers. One extreme point can drag r way off — always glance at a scatter diagram before trusting r.
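
To get a feel for point 3, here is a rough Python sketch using invented data: six points lying almost on a line, then the same six plus one extreme point. The helper pearson_r and all the numbers are assumptions made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """PMCC from sums of deviations: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Invented data: six points close to a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r_clean = pearson_r(xs, ys)

# The same data with one extreme point (7, 0) added.
r_outlier = pearson_r(xs + [7], ys + [0])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

One extra point is enough to turn a very strong correlation into a weak one here, so a quick look at the scatter diagram is worth the few seconds.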

🤔 Why doesn’t r catch curves?

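Imagine the relationship y = x² for −5 ≤ x ≤ 5. The points form a perfect U-shape, totally predictable. But the left half of the U slopes down and the right half slopes up by the same amount, so the two halves cancel out in the calculation and r comes out as 0 (as long as the x-values are spread evenly either side of 0).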

That’s why you should always sketch a scatter diagram first. r alone can mislead you on curved data.
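
You can check the cancelling for yourself with a short Python sketch. It assumes whole-number x-values from −5 to 5, and the helper name pearson_r is made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """PMCC from sums of deviations: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = list(range(-5, 6))     # -5, -4, ..., 4, 5
ys = [x ** 2 for x in xs]   # a perfect parabola, y = x²

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")       # zero, up to rounding
```

Each point on the left of the U contributes a negative product to Sxy, and its mirror image on the right contributes the same amount with the opposite sign, so the total is zero.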

Worked examples

WE 1

Find r for two test scores

The table shows the scores of 8 students for a maths test (x) and an English test (y):

Maths (x) | 7 | 18 | 37 | 52 | 61 | 68 | 75 | 82
English (y) | 5 | 3 | 9 | 12 | 17 | 41 | 49 | 97

(a) Write down the value of Pearson’s PMCC, r.   (b) Comment on the value.

Use the GDC’s LinReg function. Never calculate by hand!

Part (a)
Enter Maths in L1, English in L2. Run LinReg(ax+b) with XList = L1, YList = L2:
r = 0.79433…
r = 0.794 (3 s.f.)

Part (b)
r is positive → positive correlation.
|r| = 0.794, close to 1 → strong correlation.
Strong positive linear correlation

Always describe the SIGN and the STRENGTH: that’s what the marks want.

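
If you want to double-check the GDC value away from the exam, a few lines of Python will do it. This is only a sketch: the helper name pearson_r is made up, and in the exam the calculator is still the way to go.

```python
from math import sqrt


def pearson_r(xs, ys):
    """PMCC from sums of deviations: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Scores of the 8 students in WE 1 (pairs must stay lined up).
maths = [7, 18, 37, 52, 61, 68, 75, 82]
english = [5, 3, 9, 12, 17, 41, 49, 97]

r = pearson_r(maths, english)
print(round(r, 3))  # 0.794
```
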
WE 2

Interpret different values of r

For each value of r, describe the correlation.

(a) r = 0.92    (b) r = −0.85    (c) r = 0.15    (d) r = −0.41

Sign tells direction. Distance from 0 tells strength.

Part (a)
Positive, |r| = 0.92 (very close to 1): Strong positive linear correlation

Part (b)
Negative, |r| = 0.85 (close to 1): Strong negative linear correlation

Part (c)
Positive, |r| = 0.15 (close to 0): Weak positive linear correlation

Part (d)
Negative, |r| = 0.41 (closer to 0 than 1): Weak negative linear correlation

Always include all three words: strength, sign, AND “linear”!

WE 3

Use a critical value to test for linear correlation

For a sample of 12 students, the PMCC between hours of revision and exam mark is r = 0.682. The critical value at the 5% level for n = 12 is 0.576. Test whether a linear model is appropriate.

r = 0.682,   critical value = 0.576

Compare |r| with the critical value.
Find |r|: |0.682| = 0.682
Compare: 0.682 > 0.576 ✓
Since |r| is bigger than the critical value:
A linear model is appropriate

Always state the comparison clearly: “0.682 > 0.576” gets the marks.

WE 4

Compare two correlation coefficients

Two studies measure how well two factors predict success in a marathon:

Study A: hours of training vs finish time, r = −0.88

Study B: shoe price vs finish time, r = −0.21

Which factor better predicts finish time? Justify your answer.

Compare the absolute values of r: bigger |r| means stronger linear link.

|r| for Study A: 0.88
|r| for Study B: 0.21
0.88 > 0.21, so Study A’s correlation is much stronger.
Both are negative: more training / more expensive shoes = faster (lower time).
Hours of training predict finish time much better than shoe price

Always compare |r|: the sign just says direction, not strength.

WE 5

When r can mislead

For data in the table below, the calculated value of r is approximately 0.

x | −3 | −2 | −1 | 0 | 1 | 2 | 3
y | 9 | 4 | 1 | 0 | 1 | 4 | 9

Does r ≈ 0 mean there’s no relationship between x and y? Explain.

r only measures LINEAR relationships. It can’t see curves.

Notice the pattern: y = x²
There IS a perfect relationship, but it’s a curve (parabola), not a straight line.
As x goes from negative to positive, y goes down then up, so the linear correlation cancels out.
No: there’s a strong (non-linear) relationship that r misses

Always sketch the data: r alone can mislead you on curves!

If r shows the data has a strong linear pattern, the next question is: what’s the equation of the best line through it? That’s what the next note — Linear Regression — is all about.
