In the last note you described correlation in words (strong / weak / positive / negative). Pearson’s product-moment correlation coefficient (PMCC) turns that description into a single number, r, between −1 and +1. The sign gives the direction; the magnitude |r| gives the strength. You’ll get r from your GDC in seconds, so the real skill is reading what it tells you.
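For reference, this is the standard definition behind the number your GDC reports. The notation S_xy, S_xx, S_yy is assumed here (it is common in textbooks, but your course notes may write it differently):

```latex
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```

The least-squares gradient of y = ax + b is a = S_xy / S_xx, and S_xx > 0, so a always has the same sign as r. The Cauchy–Schwarz inequality gives |S_xy| ≤ √(S_xx S_yy), which is why r can never leave [−1, 1].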
The test: if |r| > critical value ⇒ a linear model is appropriate. If |r| ≤ critical value ⇒ not enough evidence for a linear model.
Three things PMCC will NOT tell you: (1) whether the relationship is curved — r only sees lines; (2) whether x causes y — correlation ≠ causation, always; (3) whether outliers are distorting the answer — r is highly sensitive to extreme points. Always sketch the scatter alongside computing r.
🧭 Recipe — any PMCC question
- Enter the bivariate data into the GDC (x in L1, y in L2).
- Run Linear Regression (model y = ax + b) and read off r.
- Quote to 3 sf. Check the sign matches the gradient sign of any line of best fit.
- Compare |r| to the critical value if given. If |r| > critical ⇒ a linear model is appropriate.
- Interpret in context: state strength (close to 1 / moderate / weak), direction (positive / negative), and what that means for the real variables.
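The recipe can be mirrored in a few lines of Python as a way to check GDC work. This is a minimal sketch, not how any particular calculator is implemented: the function names (`deviation_sums`, `to_3sf`, `pmcc_recipe`) are hypothetical, and it assumes plain lists of numbers with no missing values.

```python
import math

def deviation_sums(xs, ys):
    """Return S_xy, S_xx, S_yy: sums built from deviations from the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy, s_xx, s_yy

def to_3sf(value):
    """Round to 3 significant figures (step 3 of the recipe)."""
    return float(f"{value:.3g}")

def pmcc_recipe(xs, ys, critical=None):
    """Steps 2 to 4: r to 3 sf, the gradient a of y = ax + b, and the optional test."""
    s_xy, s_xx, s_yy = deviation_sums(xs, ys)
    r = s_xy / math.sqrt(s_xx * s_yy)  # raises ZeroDivisionError if x or y is constant
    a = s_xy / s_xx                    # same sign as r, because s_xx > 0
    linear_ok = None if critical is None else abs(r) > critical
    return to_3sf(r), a, linear_ok
```

For WE 1's data, `pmcc_recipe([2, 3, 4, 5, 6, 7, 8, 9], [42, 50, 58, 65, 72, 78, 85, 90])` gives r = 0.998 with a positive gradient, and no test result because no critical value was supplied. Since r and a always share a sign mathematically, a mismatch on the GDC points to a copying slip rather than to the data.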
Worked examples
WE 1: Compute r and interpret (positive case)
Eight piano students recorded weekly practice time (x, hours) and score (y, %) in a music exam.
x: 2, 3, 4, 5, 6, 7, 8, 9 | y: 42, 50, 58, 65, 72, 78, 85, 90
(a) Find Pearson’s r. (b) Comment on the result in context.
(a) GDC: enter x in L1, y in L2, run LinReg(ax+b)
r = 0.99814…
r = 0.998 (3 sf)
(b) interpret
r is very close to +1
very strong positive linear correlation
more practice ↔ higher exam score
(a) r = 0.998 · (b) very strong positive linear correlation
Always interpret in context: “very strong positive linear correlation” earns the strength, direction and linear marks, and “more practice is associated with a higher score” ties it back to the real-world variables.
WE 2: Compute r and interpret (negative case)
Seven months of weather and utility data: outdoor temperature (x, °C) and the household’s monthly heating bill (y, $).
x: 5, 10, 12, 15, 18, 22, 25 | y: 250, 180, 165, 130, 95, 60, 40
(a) Find r. (b) Comment on the result.
(a) enter data, run LinReg
r = −0.99448…
r = −0.994 (3 sf)
(b) interpret
|r| very close to 1, sign is negative
very strong negative linear correlation
higher temperatures ↔ lower heating bills
(a) r = −0.994 · (b) very strong negative linear correlation
A NEGATIVE r doesn’t mean “less correlation”: r = −0.994 is just as strong as r = +0.994, since both have |r| = 0.994. The sign only tells you the direction.
WE 3: Read four r values without calculating
For each value of r, state strength and direction. (a) r = −0.98 (b) r = 0.15 (c) r = 0.78 (d) r = −0.42
use |r| zones: < 0.4 weak, 0.4 to 0.7 moderate, > 0.7 strong
(a) |r| = 0.98, sign −
strong negative linear correlation
(b) |r| = 0.15, sign +
weak positive linear correlation
(c) |r| = 0.78, sign +
strong positive linear correlation
(d) |r| = 0.42, sign −
moderate negative linear correlation
(a) strong − · (b) weak + · (c) strong + · (d) moderate −
The zone boundaries (0.4, 0.7) are conventions, not exact rules. In an exam, “moderate” and “strong” are both reasonable near 0.7. Always include the word “linear”.
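The zone rule can be written as a small function. This is a sketch using the note's conventional cut-offs; the name `describe` is hypothetical, and treating exactly 0.7 as “moderate” is an arbitrary choice, since the boundaries are conventions rather than exact rules.

```python
def describe(r, weak_below=0.4, strong_above=0.7):
    """Strength and direction of r using the note's zones (boundaries are conventions)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)  # strength depends on |r| only
    if size < weak_below:
        strength = "weak"
    elif size <= strong_above:  # exactly 0.7 is treated as moderate here
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear correlation"

for r in (-0.98, 0.15, 0.78, -0.42):
    print(f"r = {r:+.2f}: {describe(r)}")
```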
WE 4: Test against a critical value
A researcher records n = 10 paired observations and computes r = 0.71. The critical value for n = 10 at the 5% level is 0.632. Decide whether the data supports a linear model.
compare |r| to critical value
|r| = 0.71
critical value = 0.632
0.71 > 0.632 ✓
conclude
since |r| exceeds the critical value,
a linear model IS appropriate
|r| > critical ⇒ linear model is appropriate
Always use the ABSOLUTE value of r when comparing to a critical value: the test is about strength, not direction. r = −0.71 would pass the test just as r = +0.71 does.
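The decision in WE 4 is a single comparison on |r|. A minimal sketch (the function name is hypothetical; the value 0.60 in the last line is an invented contrast case, not from the question):

```python
def linear_model_appropriate(r, critical_value):
    """Critical-value test: compare |r|, never r, with the tabulated value."""
    return abs(r) > critical_value

print(linear_model_appropriate(0.71, 0.632))   # True: WE 4
print(linear_model_appropriate(-0.71, 0.632))  # True: direction is irrelevant
print(linear_model_appropriate(0.60, 0.632))   # False: not enough evidence
```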
WE 5: The effect of one outlier on r
Six paired data points are recorded:
(1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 1)
(a) Find r using all six points. (b) The last point (6, 1) is identified as an outlier. Find r with this point removed. (c) Comment on the effect.
(a) GDC with all 6 points
r = 0.143 (3 sf)
looks like weak positive correlation
(b) remove (6, 1), use 5 points
the 5 remaining points lie exactly on y = 2x + 1
r = 1 (perfect positive linear correlation)
(c) removing the outlier moved r from 0.143 to 1
a single outlier completely DISGUISED a perfect line
PMCC is highly sensitive to outliers
(a) r = 0.143 · (b) r = 1 · (c) outlier hides a perfect linear pattern
This is why you always plot the scatter before believing r. If you saw r = 0.143, you’d assume weak or no correlation, but a quick sketch reveals 5 points on a line plus one stray point. Sketch FIRST, then trust r.
WE 6: Full interpretation with critical value, in context
A sleep researcher measures hours of sleep (x) and reaction time (y, ms) for 7 volunteers.
x: 4, 5, 6, 7, 8, 9, 10 | y: 410, 380, 350, 320, 305, 280, 260
For n = 7 the critical value at the 5% level is 0.754.
(a) Find r. (b) Test whether a linear model is appropriate. (c) Interpret the result in context.
(a) GDC LinReg
r = −0.99509…
r = −0.995 (3 sf)
(b) compare |r| to 0.754
|r| = 0.995 > 0.754 ✓
a linear model IS appropriate
(c) interpret
very strong negative linear correlation
more sleep is associated with FASTER reactions
(but correlation ≠ causation)
r = −0.995; linear model appropriate; more sleep ↔ shorter reaction time
Three-part answer for any PMCC-in-context question: NUMBER (r to 3 sf), TEST (|r| compared to the critical value), MEANING (a sentence in real-world terms). Mentioning “correlation does not imply causation” earns a soft mark.
💡 Top tips
- Always quote r to 3 significant figures unless told otherwise. Examiners check this.
- Match the sign of r to the gradient sign of any regression line — they must agree. If they don’t, recheck your GDC entries.
- Use |r|, not r, when comparing to a critical value. The test is two-sided.
- Sketch the scatter on the GDC before trusting r. Outliers and curves both fool the coefficient.
- Interpret in context: every PMCC answer needs a sentence in real-world units — “more X is associated with higher Y” — not just “strong positive”.
⚠ Common mistakes
- Writing r outside [−1, 1]: |r| > 1 is mathematically impossible. If you’ve written one down, you’ve probably copied the wrong number from the regression output (such as a or b).
- Confusing r with r²: both appear in GDC output. r² is the “coefficient of determination” and is always non-negative; r is the one with the sign.
- Claiming causation from a high r: |r| = 0.99 still doesn’t mean X causes Y. Always consider lurking variables.
- Trusting r ≈ 0 as “no relationship”: PMCC only detects LINEAR patterns. A perfect parabola can give r = 0.
- Ignoring outliers: as WE 5 showed, one freak point can hide a perfect line OR fake a strong correlation. Always plot first.
Next up: Spearman’s rank correlation coefficient (rₛ). When PMCC isn’t appropriate (non-linear monotonic relationships, ordinal data, or outlier-heavy samples), Spearman steps in. The trick: rank the data first, then compute PMCC on the RANKS. Same engine, more robust input.