IB Maths AA SL
Topic 4 — Correlation & Regression
Paper 1 & 2
Pearson’s Product-Moment Correlation Coefficient
“Strong” or “weak” is a bit subjective when looking at a scatter diagram. Pearson’s PMCC (or just r) is the maths fix — a single number between −1 and 1 that tells you exactly how strong the linear correlation is.
📘 What you need to know
- Pearson’s product-moment correlation coefficient (PMCC) is denoted r.
- r measures linear correlation only — it doesn’t detect curves.
- r is always between −1 and 1: −1 ≤ r ≤ 1.
- Sign of r: positive → positive correlation; negative → negative correlation.
- Closer to ±1 = stronger linear correlation. Closer to 0 = weaker.
- Use your GDC to find r — never calculate by hand in an exam.
- If |r| > the critical value given in the question, a linear model is appropriate.
What is r?
The PMCC takes a scatter diagram and squashes the entire pattern down into one number. That one number tells you both the direction and the strength of the linear relationship at the same time.
The r scale — from −1 to +1
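[Figure: the PMCC scale, a number line marked −1, −0.5, 0, +0.5 and +1. r = −1 is perfect negative correlation, r = 0 is no correlation, and r = +1 is perfect positive correlation. The correlation gets stronger towards either end of the scale.]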
Three rules for reading r
- Sign tells direction: r > 0 → positive correlation. r < 0 → negative correlation.
- Distance from 0 tells strength: The closer to ±1, the stronger. The closer to 0, the weaker.
- Special values: r = 1 → perfect positive. r = −1 → perfect negative. r = 0 → no linear correlation.
🧠 Memory trick: “Sign for direction, size for strength”
The sign (+ or −) tells you whether the correlation is positive or negative: that’s the direction. The size (how close |r| is to 1) tells you how strong it is. Two separate things, both packed into one number.
Translating r into words
You’ll often need to describe the correlation based on the value of r. Here’s a rough guide for IB exams (use the absolute value |r|):
- Perfect (|r| = 1): all points lie exactly on a straight line.
- Strong (0.7 ≤ |r| < 1): points lie close to a straight line.
- Weak (|r| < 0.5): points are scattered, but a slope may be visible.
Between 0.5 and 0.7 is sometimes called “moderate” correlation. The IB doesn’t always require this band — usually they accept “moderately strong” or “moderately weak”. When in doubt, also mention the actual value of r in your answer.
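The guide above can be sketched as a small Python function. This is only an illustration of the rough bands in this note: the function name `describe_correlation` is made up here, and the 0.5 to 0.7 band is labelled “moderate” as an assumption, since the IB wording for that band varies.

```python
def describe_correlation(r: float) -> str:
    """Turn a PMCC value into words using the rough bands from this note."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"

    # Sign for direction.
    direction = "positive" if r > 0 else "negative"

    # Size for strength (assumed bands: 1, 0.7 and above, 0.5 and above, below 0.5).
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"

    return f"{strength} {direction} linear correlation"


for value in (0.92, -0.85, 0.15, -0.41):
    print(value, "->", describe_correlation(value))
```

Note that the description always carries all three parts: strength, sign, and the word “linear”.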
How to find r on your GDC
The actual formula for r is messy, and the IB doesn’t want you to use it. Just enter the data into your calculator’s stats mode.
The GDC method (every exam, every time)
- Enter the x-values in List 1 (or L1).
- Enter the y-values in List 2 (or L2). Make sure the pairs line up correctly!
- Run “Linear Regression” (LinReg or “ax + b”) — set XList = L1, YList = L2.
- Read off r from the output. The calculator also gives you r², a, and b for free.
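For revision away from the calculator, the steps above can be mimicked in Python. This is a rough sketch, not what the GDC literally runs: the function name `lin_reg` and the five data pairs are invented for illustration, and the code assumes the usual least-squares line y = ax + b.

```python
from math import sqrt


def lin_reg(xs, ys):
    """Return (a, b, r, r_squared) for the least-squares line y = ax + b."""
    if len(xs) != len(ys):
        raise ValueError("each x-value needs a matching y-value")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared and cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    a = sxy / sxx                # gradient of the regression line
    b = mean_y - a * mean_x      # y-intercept
    r = sxy / sqrt(sxx * syy)    # Pearson's PMCC
    return a, b, r, r ** 2


# Made-up data, entered like List 1 and List 2 on the GDC.
list1 = [1, 2, 3, 4, 5]
list2 = [2, 4, 5, 4, 5]

a, b, r, r_squared = lin_reg(list1, list2)
print(f"a = {a:.3g}, b = {b:.3g}, r = {r:.3g}, r² = {r_squared:.3g}")
```

Swapping two values in one list changes r, which is why lining up the pairs correctly matters.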
📍 Can’t see r on your calculator?
On Casio: enter LINK → SET UP → “STAT” → switch “Stat Wizard” or look for “Diagnostic On”. On TI: 2nd → CATALOG → DiagnosticOn → ENTER. Once it’s on, r will appear in the LinReg output.
What the formula looks like (just for understanding)
🤔 What does the formula actually do?
The top part (Sxy) measures how much x and y change together — called the covariance. The bottom part (SxSy) measures how much each variable changes on its own. Dividing them gives a “scaled” number that always ends up between −1 and 1.
You don’t need to calculate this — just appreciate that it’s measuring “how much do these two move together, relative to how much they move on their own?”
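For reference, the formula described above can be written out as follows. This is a reconstruction from the description in this note, assuming S_xy is the covariance of x and y, and S_x and S_y are the standard deviations of x and y:

```latex
r = \frac{S_{xy}}{S_x S_y}
  = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}
```

The factors of 1/n in the covariance and the two standard deviations cancel, which is why the second form contains only sums.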
When does r suggest a linear relationship?
If you get r = 0.6, is that “good enough” to use a linear model? It depends on how many data points you have. The IB will give you a critical value in the question — you compare |r| to that value.
How does the critical value work?
| Sample size n | Typical critical value (5%) | Meaning |
|---|---|---|
| n = 5 | ≈ 0.878 | Need very strong r to justify linear model |
| n = 10 | ≈ 0.632 | Moderately strong r is enough |
| n = 20 | ≈ 0.444 | Even moderate r counts |
| n = 50 | ≈ 0.279 | Even quite weak r becomes meaningful |
Notice: with more data, you need a smaller r to reach significance. That’s because more data makes patterns more reliable.
You’ll never have to memorise critical values — the question always gives them. Just compare |r| with the critical value provided. Bigger? Linear model is good. Smaller? It’s not.
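The comparison can be sketched in a few lines of Python. The function name `linear_model_appropriate` is invented for this sketch, and the dictionary simply copies the typical 5% values from the table above.

```python
def linear_model_appropriate(r: float, critical_value: float) -> bool:
    """A linear model is judged appropriate when |r| exceeds the critical value."""
    return abs(r) > critical_value


# Typical 5% critical values from the table above, keyed by sample size n.
CRITICAL_5_PERCENT = {5: 0.878, 10: 0.632, 20: 0.444, 50: 0.279}

r = 0.6
for n, critical in CRITICAL_5_PERCENT.items():
    if linear_model_appropriate(r, critical):
        verdict = "linear model appropriate"
    else:
        verdict = "linear model not justified"
    print(f"n = {n}: |r| = {abs(r)} vs {critical} -> {verdict}")
```

The same r = 0.6 fails the test for n = 5 and n = 10 but passes it for n = 20 and n = 50, which is the sample-size effect described above.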
What r doesn’t tell you
Pearson’s r is powerful — but it has limits.
📍 The 3 things r won’t tell you
1. r only measures LINEAR relationships. A perfect curve (like a parabola) can give r ≈ 0 even though the variables are perfectly related.
2. r doesn’t prove causation. Same rule from the last note: a strong r doesn’t mean one variable causes the other.
3. r is sensitive to outliers. One extreme point can drag r way off — always glance at a scatter diagram before trusting r.
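To see point 3 in action, here is a small sketch with invented data: six points lying close to a straight line, then the same points with one extreme point added. The helper `pmcc` is a hypothetical name, and it uses the standard sum-of-deviations form of the formula.

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's r from paired lists, using sums of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data lying close to a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 6, 8, 9]

r_clean = pmcc(xs, ys)
# Add a single extreme point (7, 0) that sits far below the trend.
r_outlier = pmcc(xs + [7], ys + [0])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

One point out of seven is enough to pull a very strong correlation down to a weak one, so the scatter diagram is worth checking before quoting r.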
🤔 Why doesn’t r catch curves?
Imagine the relationship y = x² for −5 ≤ x ≤ 5. The points form a perfect U-shape — totally predictable. But because the U goes down on the left and up on the right, the linear correlation cancels out, and r ≈ 0.
That’s why you should always sketch a scatter diagram first. r alone can mislead you on curved data.
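The parabola example can be checked numerically. This sketch assumes integer x-values from −5 to 5 and uses a hypothetical helper `pmcc` built on the standard sum-of-deviations form of the formula.

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's r from paired lists, using sums of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = list(range(-5, 6))     # -5, -4, ..., 5
ys = [x ** 2 for x in xs]   # a perfect parabola: y is fully determined by x

r = pmcc(xs, ys)
print(f"r = {r:.3f}")       # the left and right halves of the U cancel out
```

Every y-value is completely determined by x, yet the positive and negative contributions to the top of the formula cancel, leaving r at 0.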
Worked examples
WE 1: Find r for two test scores
The table shows the scores of 8 students for a maths test (x) and an English test (y):
| Maths (x) | 7 | 18 | 37 | 52 | 61 | 68 | 75 | 82 |
|---|---|---|---|---|---|---|---|---|
| English (y) | 5 | 3 | 9 | 12 | 17 | 41 | 49 | 97 |
(a) Write down the value of Pearson’s PMCC, r. (b) Comment on the value.
Use the GDC’s LinReg function; never calculate by hand!
Part (a)
Enter Maths in L1, English in L2.
Run LinReg(ax+b) with XList = L1, YList = L2:
r = 0.79433…
r = 0.794 (3 s.f.)
Part (b)
r is positive → positive correlation.
|r| = 0.794, close to 1 → strong correlation.
Strong positive linear correlation
Always describe the SIGN and the STRENGTH: that’s what the marks want.
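When revising at home, the GDC value can be cross-checked with a short Python sketch. It uses the eight pairs from the table in this example and the standard sum-of-deviations form of the formula; the variable names are chosen here for illustration. In the exam, the GDC is still the method to use.

```python
from math import sqrt

# Scores from the table in this worked example.
maths = [7, 18, 37, 52, 61, 68, 75, 82]
english = [5, 3, 9, 12, 17, 41, 49, 97]

n = len(maths)
mean_x = sum(maths) / n
mean_y = sum(english) / n

# Sums of cross and squared deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(maths, english))
sxx = sum((x - mean_x) ** 2 for x in maths)
syy = sum((y - mean_y) ** 2 for y in english)

r = sxy / sqrt(sxx * syy)
print(f"r = {r:.3g}")  # prints r = 0.794, matching the GDC to 3 s.f.
```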
WE 2: Interpret different values of r
For each value of r, describe the correlation.
(a) r = 0.92 (b) r = −0.85 (c) r = 0.15 (d) r = −0.41
Sign tells direction. Distance from 0 tells strength.
Part (a)
Positive, |r| = 0.92 (very close to 1):
Strong positive linear correlation
Part (b)
Negative, |r| = 0.85 (close to 1):
Strong negative linear correlation
Part (c)
Positive, |r| = 0.15 (close to 0):
Weak positive linear correlation
Part (d)
Negative, |r| = 0.41 (closer to 0 than 1):
Weak negative linear correlation
Always include all three words: strength, sign, AND “linear”!
WE 3: Use a critical value to test for linear correlation
For a sample of 12 students, the PMCC between hours of revision and exam mark is r = 0.682. The critical value at the 5% level for n = 12 is 0.576. Test whether a linear model is appropriate.
r = 0.682, critical value = 0.576
Compare |r| with the critical value.
Find |r|: |0.682| = 0.682
Compare: 0.682 > 0.576 ✓
Since |r| is bigger than the critical value:
A linear model is appropriate
Always state the comparison clearly: “0.682 > 0.576” gets the marks.
WE 4: Compare two correlation coefficients
Two studies measure how well two factors predict success in a marathon:
Study A: hours of training vs finish time, r = −0.88
Study B: shoe price vs finish time, r = −0.21
Which factor better predicts finish time? Justify your answer.
Compare the absolute values of r — bigger |r| means stronger linear link.
|r| for Study A: 0.88
|r| for Study B: 0.21
0.88 > 0.21, so Study A’s linear correlation is much stronger.
Both values are negative: more training (or pricier shoes) goes with a lower, faster finish time.
Hours of training predict finish time much better than shoe price
Always compare |r|: the sign only gives direction, not strength.
For the data in the table below, the calculated value of r is approximately 0.
Does r ≈ 0 mean there’s no relationship between x and y? Explain.
r only measures LINEAR relationships — it can’t see curves.
Notice the pattern: y = x²
There IS a perfect relationship — but it’s a curve (parabola), not a straight line.
As x goes from negative to positive, y goes down and then up, so the linear correlation cancels out.
No: there is a strong (non-linear) relationship that r misses
Always sketch the data: r alone can mislead you on curves!
💡 Top tips
- Always use your GDC. Enter the x-values in L1 and the y-values in L2, run LinReg, and read off r.
- Check the data pairs. Make sure each x matches the right y when entering — one slip and your r is wrong.
- Sign for direction, size for strength. Always describe both when answering “comment on r”.
- Round r to 3 s.f. in your final answer unless told otherwise.
- If asked “is a linear model appropriate?”, compare |r| to the given critical value. State the comparison clearly: “0.682 > 0.576, so yes”.
- Diagnostic mode must be ON for some calculators to display r. Check this before the exam!
- Always check for outliers on a scatter diagram — they can dramatically change r.
- r only catches linear patterns. If a scatter looks curved, r ≈ 0 doesn’t mean “no relationship”.
⚠ Common mistakes
- Calculating r by hand. The formula is messy and a single sign error wrecks everything. The IB expects GDC use.
- Saying “r = 0 means no relationship”. It means no linear relationship. There could be a perfect curve.
- Saying “strong r” without specifying positive or negative. The IB wants both: “strong negative linear correlation”.
- Using r to claim causation. Even r = 0.99 doesn’t prove causation — just strong association.
- Forgetting “linear” in your description. The IB explicitly wants this word.
- Comparing r values without using absolute values. r = −0.9 and r = 0.9 are equally strong — sign just tells direction.
- Mixing up x and y on the GDC — always double-check which list is which.
- Not stating the comparison when testing against a critical value. Marks are for the comparison statement, not just the conclusion.
If r shows the data has a strong linear pattern, the next question is: what’s the equation of the best line through it? That’s what the next note — Linear Regression — is all about.