
Pearson’s PMCC (r)

In the last note you described correlation in words (strong / weak / positive / negative). Pearson’s product-moment correlation coefficient turns that description into a single number, r, between −1 and +1. The sign gives the direction; the magnitude gives the strength. You’ll get r from your GDC in seconds. The real skill is reading what it tells you.

📘 What you need to know

The r scale — reading the number

Every PMCC question ends with one of two tasks: either compute r and interpret it, or be given r and interpret it. Both rest on the same intuition — this scale:

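[Figure: the PMCC scale, running from −1 to +1. Bands: STRONG − (−1 to −0.7), MOD − (−0.7 to −0.4), WEAK / NONE (−0.4 to +0.4), MOD + (+0.4 to +0.7), STRONG + (+0.7 to +1). Example scatters: r ≈ −0.95 (near the "perfect down line" end), r ≈ 0 (random cloud), r ≈ +0.95 (near the "perfect up line" end).]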
The PMCC scale. Sign of r tells you direction (left = negative, right = positive); magnitude tells you strength (closer to the edges = stronger). The middle band (|r| < 0.4) is where points look like a scattered cloud with no clear linear trend.

Getting r from the GDC

You’ll never compute r by hand — the GDC does it. The formula is in the booklet (handy for understanding, not for doing):

PMCC formula (your GDC handles it): $r = \dfrac{S_{xy}}{S_x \, S_y}$   (a “covariance-over-spread” ratio)
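To see what the GDC is doing behind the scenes, here is a minimal Python sketch of the ratio above. It assumes the usual reading of the formula (covariance on top, the two standard deviations underneath), written with sums of squares because the 1/n factors cancel. The function name `pearson_r` is just an illustrative choice, and the data is the piano set from WE 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (n times the covariance).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (n times each variance).
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# WE 1 data: weekly practice hours and exam score (%).
hours = [2, 3, 4, 5, 6, 7, 8, 9]
scores = [42, 50, 58, 65, 72, 78, 85, 90]

r = pearson_r(hours, scores)
print(f"r = {r:.3g}")  # r = 0.998
```

The `.3g` format rounds to 3 significant figures, matching the way r is quoted in an exam answer.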

GDC steps (model varies, but the idea is universal): enter the x-values as L1 and the y-values as L2, then run Linear Regression (ax + b). The output gives a, b, r and r². Read off r and quote it to 3 sf.

Critical values and limitations

Critical values tell you whether the computed r is “big enough” to call the relationship linear. The exam supplies them; they depend on the sample size n.

The test: if |r| > critical value ⇒ a linear model is appropriate. If |r| ≤ critical value ⇒ not enough evidence for a linear model.
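The decision rule is a single comparison, which a short Python sketch makes explicit. The function name `linear_model_appropriate` is hypothetical, and the numbers are the ones used in WE 4 (r = 0.71, critical value 0.632 for n = 10).

```python
def linear_model_appropriate(r, critical_value):
    """Return True when |r| is strictly greater than the critical value."""
    return abs(r) > critical_value

# Values from WE 4: n = 10, critical value 0.632 at the 5% level.
print(linear_model_appropriate(0.71, 0.632))   # True
print(linear_model_appropriate(-0.71, 0.632))  # True: only the size of r matters
print(linear_model_appropriate(0.50, 0.632))   # False: not enough evidence
```

Taking `abs(r)` first is the whole point: a strongly negative r passes the test just as a strongly positive one does.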

Three things PMCC will NOT tell you: (1) whether the relationship is curved — r only sees lines; (2) whether x causes y — correlation ≠ causation, always; (3) whether outliers are distorting the answer — r is highly sensitive to extreme points. Always sketch the scatter alongside computing r.

🧭 Recipe — any PMCC question

  1. Enter the bivariate data into the GDC (x in L1, y in L2).
  2. Run Linear Regression (model y = ax + b) and read off r.
  3. Quote to 3 sf. Check the sign matches the gradient sign of any line of best fit.
  4. Compare |r| to the critical value if given. If |r| > critical ⇒ a linear model is appropriate.
  5. Interpret in context: state strength (close to 1 / moderate / weak), direction (positive / negative), and what that means for the real variables.
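The recipe above can be sketched in Python as one function. This is an illustration under stated assumptions, not a replacement for the GDC: the name `pmcc_summary` and the dictionary keys are made up for this example, and the data and critical value are taken from WE 6.

```python
from math import sqrt

def pmcc_summary(xs, ys, critical_value=None):
    """Steps 1 to 4 of the recipe: compute r, the gradient a, and the test."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)

    r = s_xy / sqrt(s_xx * s_yy)
    gradient = s_xy / s_xx  # the a in y = ax + b

    summary = {
        "r_3sf": float(f"{r:.3g}"),
        "gradient": gradient,
        "direction": "positive" if r > 0 else "negative" if r < 0 else "none",
    }
    if critical_value is not None:
        summary["linear_model_appropriate"] = abs(r) > critical_value
    return summary

# WE 6 data: hours of sleep and reaction time (ms), critical value 0.754.
sleep = [4, 5, 6, 7, 8, 9, 10]
reaction = [410, 380, 350, 320, 305, 280, 260]

result = pmcc_summary(sleep, reaction, critical_value=0.754)
print(result["r_3sf"])                     # -0.995
print(result["direction"])                 # negative
print(result["linear_model_appropriate"])  # True
```

Both r and the gradient have S_xy as their numerator and a positive denominator, which is why their signs must agree in step 3. Step 5, the sentence in context, is still your job.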

Worked examples

WE 1

Compute r and interpret — positive case

Eight piano students recorded weekly practice time (x, hours) and score (y, %) in a music exam.

x: 2, 3, 4, 5, 6, 7, 8, 9  |  y: 42, 50, 58, 65, 72, 78, 85, 90

(a) Find Pearson’s r. (b) Comment on the result in context.

(a) GDC: enter x in L1, y in L2, run LinReg(ax+b).
r = 0.99814… so r = 0.998 (3 sf)

(b) Interpret: r is very close to +1, so there is a very strong positive linear correlation: more practice ↔ higher exam score.

Answer: (a) r = 0.998 · (b) very strong positive linear correlation

Tip: always interpret in context. “Very strong positive linear correlation” earns the strength + direction + linear marks, and “more practice means higher score” ties it back to the real-world variables.

WE 2

Compute r and interpret — negative case

Seven months of weather and utility data: outdoor temperature (x, °C) and the household’s monthly heating bill (y, $).

x: 5, 10, 12, 15, 18, 22, 25  |  y: 250, 180, 165, 130, 95, 60, 40

(a) Find r. (b) Comment on the result.

(a) Enter the data, run LinReg.
r = −0.99448… so r = −0.994 (3 sf)

(b) Interpret: |r| is very close to 1 and the sign is negative, so there is a very strong negative linear correlation: higher temperatures ↔ lower heating bills.

Answer: (a) r = −0.994 · (b) very strong negative linear correlation

Tip: a NEGATIVE r doesn’t mean “less correlation”. |r| = 0.994 is just as strong as +0.994. The sign only tells you the direction.

WE 3

Read four r values without calculating

For each value of r, state strength and direction. (a) r = −0.98   (b) r = 0.15   (c) r = 0.78   (d) r = −0.42

Use the |r| zones: < 0.4 weak, 0.4 to 0.7 moderate, > 0.7 strong.

(a) |r| = 0.98, sign −: strong negative linear correlation
(b) |r| = 0.15, sign +: weak positive linear correlation
(c) |r| = 0.78, sign +: strong positive linear correlation
(d) |r| = 0.42, sign −: moderate negative linear correlation

Answer: (a) strong − · (b) weak + · (c) strong + · (d) moderate −

Tip: the zone boundaries (0.4, 0.7) are conventions, not exact rules. In an exam, “moderate” and “strong” are both reasonable near 0.7. Always include the word “linear”.

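The zone lookup in WE 3 is mechanical enough to sketch in Python. The function name `describe_r` is hypothetical, and the cut-offs 0.4 and 0.7 are the conventions used in this note rather than fixed rules, so they are left as adjustable parameters.

```python
def describe_r(r, weak_below=0.4, strong_above=0.7):
    """Turn a value of r into a 'strength direction linear correlation' phrase."""
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size < weak_below:
        strength = "weak"
    elif size <= strong_above:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear correlation"

# The four values from WE 3.
for value in (-0.98, 0.15, 0.78, -0.42):
    print(value, "->", describe_r(value))
# -0.98 -> strong negative linear correlation
# 0.15 -> weak positive linear correlation
# 0.78 -> strong positive linear correlation
# -0.42 -> moderate negative linear correlation
```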
WE 4

Test against a critical value

A researcher records n = 10 paired observations and computes r = 0.71. The critical value for n = 10 at the 5% level is 0.632. Decide whether the data supports a linear model.

Compare |r| to the critical value: |r| = 0.71, critical value = 0.632, and 0.71 > 0.632 ✓

Conclude: since |r| exceeds the critical value, a linear model IS appropriate.

Answer: |r| > critical ⇒ linear model is appropriate

Tip: always use the ABSOLUTE value of r when comparing to a critical value. The test is about strength, not direction, so r = −0.71 would pass the test just the same as r = +0.71.

WE 5

The effect of one outlier on r

Six paired data points are recorded:

(1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 1)

(a) Find r using all six points. (b) The last point (6, 1) is identified as an outlier. Find r with this point removed. (c) Comment on the effect.

(a) GDC with all 6 points: r = 0.143 (3 sf). This looks like weak positive correlation.

(b) Remove (6, 1) and use the 5 remaining points. They lie exactly on y = 2x + 1, so r = 1.000 (perfect positive linear).

(c) Removing the outlier moves r from 0.143 to 1. A single outlier completely DISGUISED a perfect line: PMCC is highly sensitive to outliers.

Answer: (a) r = 0.143 · (b) r = 1 · (c) outlier hides a perfect linear pattern

Tip: this is why you always plot the scatter before believing r. If you saw r = 0.14, you’d assume “no correlation”, but a quick sketch reveals 5 points on a line plus one freak. Sketch FIRST, then trust r.

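The outlier effect in WE 5 can be reproduced with a short Python sketch. The helper `pearson_r` is an illustrative name that implements r as S_xy / sqrt(S_xx · S_yy); the data is exactly the six points from the question.

```python
from math import sqrt

def pearson_r(points):
    """Pearson's r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_yy = sum((y - mean_y) ** 2 for _, y in points)
    return s_xy / sqrt(s_xx * s_yy)

data = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 1)]
outlier = (6, 1)

r_all = pearson_r(data)
r_trimmed = pearson_r([p for p in data if p != outlier])

print(f"all six points:  r = {r_all:.3g}")      # r = 0.143
print(f"outlier removed: r = {r_trimmed:.3g}")  # r = 1
```

With all six points, S_xy = 5 and sqrt(S_xx · S_yy) = 35, so r = 5/35. One point is enough to drag a perfect line down to 0.143.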
WE 6

Full interpretation with critical value — in context

A sleep researcher measures hours of sleep (x) and reaction time (y, ms) for 7 volunteers.

x: 4, 5, 6, 7, 8, 9, 10  |  y: 410, 380, 350, 320, 305, 280, 260

For n = 7 the critical value at the 5% level is 0.754.

(a) Find r. (b) Test whether a linear model is appropriate. (c) Interpret the result in context.

(a) GDC LinReg: r = −0.99509… so r = −0.995 (3 sf)

(b) Compare |r| to 0.754: |r| = 0.995 > 0.754 ✓, so a linear model IS appropriate.

(c) Interpret: very strong negative linear correlation. More sleep is associated with FASTER reactions (but correlation ≠ causation).

Answer: r = −0.995; linear model appropriate; more sleep ↔ shorter reaction time

Tip: use a three-part answer for any PMCC-in-context question: NUMBER (r to 3 sf), TEST (compared to critical value), MEANING (sentence in real-world units). Mentioning “correlation does not imply causation” earns a soft mark.

💡 Top tips

  • Always quote r to 3 significant figures unless told otherwise. Examiners check this.
  • Match the sign of r to the gradient sign of any regression line — they must agree. If they don’t, recheck your GDC entries.
  • Use |r|, not r, when comparing to a critical value. The test is two-sided.
  • Sketch the scatter on the GDC before trusting r. Outliers and curves both fool the coefficient.
  • Interpret in context: every PMCC answer needs a sentence in real-world units — “more X is associated with higher Y” — not just “strong positive”.

⚠ Common mistakes

  • Writing r outside [−1, 1]: |r| > 1 is mathematically impossible. If your GDC shows that, you’ve entered data wrong.
  • Confusing r with r²: both appear in GDC output. r² is the “coefficient of determination”, which is always non-negative. r is the one with the sign.
  • Claiming causation from a high r: |r| = 0.99 still doesn’t mean X causes Y. Always consider lurking variables.
  • Trusting r ≈ 0 as “no relationship”: PMCC only detects LINEAR patterns. A perfect parabola can give r = 0.
  • Ignoring outliers: as WE 5 showed, one freak point can hide a perfect line. An outlier can just as easily fake a strong correlation. Always plot first.
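The “perfect parabola can give r = 0” mistake above is easy to check numerically. The sketch below uses made-up illustrative data (y = x² on x-values symmetric about 0, not taken from any exam question) and a hypothetical helper `pearson_r`.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Illustrative data: every point lies exactly on the curve y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(abs(r) < 1e-12)  # True: r is 0 even though y is completely determined by x
```

The positive and negative halves of the parabola cancel in S_xy, so r sees no linear trend at all. This only happens so cleanly because the x-values are symmetric about the turning point.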
Next up: Spearman’s rank correlation coefficient (rs). When PMCC isn’t appropriate — non-linear monotonic relationships, ordinal data, or outlier-heavy samples — Spearman steps in. The trick: rank the data first, then compute PMCC on the RANKS. Same engine, more robust input.
