IB Maths AI SL · Correlation & Regression · Paper 1 & 2 · Spearman’s rank

Spearman’s Rank (r_s)

PMCC measures linear correlation. Spearman’s rank correlation coefficient, r_s, measures monotonic correlation: whether the data always goes up (or always goes down), even if it follows a curve. The trick is to rank the data first, then compute the PMCC of the ranks: the same engine as Pearson’s, with a more robust input. Like r, r_s ranges from −1 to +1.

📘 What you need to know

The big idea — rank, then correlate

Spearman’s beauty: it turns a curved monotonic relationship into a straight one by replacing values with their positions in order. Look:

[Figure: Ranking straightens a monotonic curve. Left panel, “Original data”: x: 1, 2, 3, 4, 5, 6 and y: 2, 4, 8, 16, 32, 64, with PMCC r ≈ 0.906. Right panel, “After ranking” (each value replaced by its position 1, 2, 3, …): x-rank and y-rank both 1, 2, 3, 4, 5, 6, with Spearman r_s = 1.000.]
The same six data points before and after ranking. The original (left) follows a clear curve, so Pearson’s r ≈ 0.91: strong, but not perfect, because the relationship isn’t a line. After ranking (right), every rank pair lies on a perfect diagonal: r_s = 1, because the data is perfectly monotonic.

Ranking the data (and handling ties)

Ranking is mechanical. For each variable independently:

  1. Order the values from smallest to largest (or largest to smallest; just be consistent).
  2. Assign rank 1 to the first value, rank 2 to the next, and so on, up to rank n.
  3. For tied values, give each the average of the ranks they would occupy. Two values tied at positions 4 and 5 both get rank 4.5.

Why averaging for ties? If you arbitrarily gave one value rank 4 and the other rank 5, you’d be inventing an order that isn’t really there. Averaging keeps the sum of ranks the same (1+2+…+n) and is fair.
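The ranking rule, including the average-rank rule for ties, can be sketched in Python. This is a minimal illustration, assuming ranks run from smallest (rank 1) to largest; `average_ranks` is our own helper name, not a GDC command.

```python
def average_ranks(values):
    """Rank values with smallest = 1; tied values share the average of their positions."""
    # Indices of the values, sorted by value (the original list is left untouched)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values occupying sorted positions i..j
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions are 1-based, so the run covers ranks i+1 .. j+1; share their mean
        shared = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared  # write back at the value's ORIGINAL index
        i = j + 1
    return ranks


grades = [10, 12, 12, 15, 18, 20]             # y-values from WE 3
print(average_ranks(grades))                   # [1.0, 2.5, 2.5, 4.0, 5.0, 6.0]
print(sum(average_ranks(grades)))              # 21.0, the same as 1 + 2 + ... + 6
print(average_ranks([1, 2, 3, 4, 9, 9, 9]))    # [1.0, 2.0, 3.0, 4.0, 6.0, 6.0, 6.0]
```

The last line shows three values tied for 5th, 6th and 7th all receiving rank 6, and the sum check confirms that averaging leaves the total of the ranks unchanged.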

Spearman’s rank correlation coefficient: r_s = PMCC of (ranks of x, ranks of y)

GDC workflow: enter rank(x) as L1 and rank(y) as L2, then run linear regression. The r value the GDC reports is your r_s.

Pearson vs Spearman — choosing

Both give a number between −1 and +1; both have the same sign and strength interpretation. They differ on what they measure:

PMCC r: how close to a STRAIGHT LINE. Uses actual values. Sensitive to outliers and to curves.

Spearman r_s: how MONOTONIC. Uses only ranks. Robust to outliers; doesn’t care whether the relationship is straight or smoothly bent, only whether it always goes the same way.

Rule of thumb: if |r_s| is noticeably larger than |r|, your relationship is monotonic but not linear (or an outlier is distorting r), so consider a non-linear model.

🧭 Recipe: computing r_s

  1. List the x-values in their original order. Rank them 1 to n (smallest = 1). Average ranks for ties.
  2. Do the same for y-values, ranking independently. Keep each rank aligned with its original data point.
  3. Enter rank(x) into L1, rank(y) into L2 on the GDC.
  4. Run Linear Regression (ax + b). The r value reported = your r_s. Quote to 3 sf.
  5. Interpret: strength (strong if close to ±1, moderate, or weak), direction (positive = agreement; negative = disagreement), in the context of the original variables.
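The recipe can be mirrored in Python as a cross-check on GDC work. This is a sketch under stated assumptions: `average_ranks`, `pmcc` and `spearman` are our own illustrative helper names (not GDC commands), and `pmcc` computes Pearson’s r directly from its definition instead of running a regression. It reproduces the r_s values of Worked Examples 1 to 3.

```python
from math import sqrt


def average_ranks(values):
    """Steps 1-2: rank with smallest = 1, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of 1-based positions i+1 .. j+1
        i = j + 1
    return ranks


def pmcc(x, y):
    """Pearson's product-moment correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Steps 3-4: r_s is the PMCC of the two independently ranked lists."""
    return pmcc(average_ranks(x), average_ranks(y))


# Data from Worked Examples 1-3. round(..., 3) gives 3 d.p., which matches
# 3 sf for these values because 0.1 <= |r_s| < 1.
print(round(spearman([1, 3, 5, 7, 9, 11], [30, 28, 42, 55, 60, 85]), 3))          # 0.943
print(round(spearman([2, 4, 6, 8, 10, 12, 14], [25, 22, 20, 14, 16, 8, 5]), 3))  # -0.964
print(round(spearman([1, 2, 3, 4, 5, 6], [10, 12, 12, 15, 18, 20]), 3))          # 0.986
```

Keeping ranking and correlation as separate functions mirrors the GDC workflow: the only Spearman-specific step is the ranking.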

Worked examples

WE 1

Compute r_s: basic positive case, no ties

Six employees’ years of experience and annual salaries (in $1000s):

x (years): 1, 3, 5, 7, 9, 11  |  y ($k): 30, 28, 42, 55, 60, 85

(a) Rank both variables. (b) Find r_s. (c) Comment.

(a) x is already in order ⇒ x ranks: 1, 2, 3, 4, 5, 6
Rank y: sorted y is 28, 30, 42, 55, 60, 85 ⇒ ranks 1 to 6
y values: 30, 28, 42, 55, 60, 85 ⇒ y ranks: 2, 1, 3, 4, 5, 6
(b) GDC: PMCC of (1, 2, 3, 4, 5, 6) and (2, 1, 3, 4, 5, 6) ⇒ r_s = 0.943 (3 sf)
(c) r_s is close to +1 ⇒ strong agreement between the rankings: more experience means a higher salary, almost always.
Answer: (a) y ranks: 2, 1, 3, 4, 5, 6 · (b) r_s = 0.943 · (c) strong positive monotonic relationship
Notice that only ONE pair is out of order (30 comes before 28, breaking the perfect rise). That single swap drops r_s from 1 to 0.943.

WE 2

Compute r_s: negative case

Seven trainees’ hours of training and number of errors per task:

x (hours): 2, 4, 6, 8, 10, 12, 14  |  y (errors): 25, 22, 20, 14, 16, 8, 5

Find r_s and interpret.

Rank x: already in order ⇒ x ranks: 1, 2, 3, 4, 5, 6, 7
Rank y: sorted y is 5, 8, 14, 16, 20, 22, 25 ⇒ ranks 1 to 7
y values: 25, 22, 20, 14, 16, 8, 5 ⇒ y ranks: 7, 6, 5, 3, 4, 2, 1
GDC: PMCC of x-ranks and y-ranks ⇒ r_s = −0.964 (3 sf)
Interpret: r_s is close to −1 ⇒ strong disagreement between the rankings: more training means fewer errors.
Answer: r_s = −0.964 · strong negative monotonic relationship
Again only ONE pair is out of order: y rose from 14 to 16 when it “should” have kept falling. That single inversion takes r_s from −1 to −0.964.

WE 3

Ranking with ties — the average-rank rule

Six students’ weekly homework hours and grade points:

x (hours): 1, 2, 3, 4, 5, 6  |  y (grade pts): 10, 12, 12, 15, 18, 20

(a) Rank both variables, handling ties correctly. (b) Find r_s.

(a) Rank x (no ties) ⇒ x ranks: 1, 2, 3, 4, 5, 6
Rank y (12 appears twice): sorted y is 10, 12, 12, 15, 18, 20
The two 12s would take ranks 2 and 3 ⇒ average rank = (2 + 3)/2 = 2.5, so both get 2.5
y ranks in original order: 1, 2.5, 2.5, 4, 5, 6
(b) GDC: PMCC of x-ranks and y-ranks ⇒ r_s = 0.986 (3 sf)
Answer: (a) y ranks: 1, 2.5, 2.5, 4, 5, 6 · (b) r_s = 0.986
The tied values don’t cost you much: r_s stays very close to 1 because the overall trend is unbroken, and the averaging keeps the analysis fair.

WE 4

Interpret r_s values without calculation

For each value of r_s, describe the monotonic relationship.

(a) r_s = 1   (b) r_s = −0.85   (c) r_s = 0.30   (d) r_s = 0

Use the same strength zones as PMCC, plus the word “monotonic”.
(a) r_s = 1 ⇒ perfect monotonic INCREASING relationship (the rankings agree exactly)
(b) r_s = −0.85 ⇒ strong monotonic DECREASING relationship
(c) r_s = 0.30 ⇒ weak monotonic increasing relationship
(d) r_s = 0 ⇒ no monotonic relationship
Answer: (a) perfect + · (b) strong − · (c) weak + · (d) none
With Spearman, use the word “monotonic” instead of “linear”: that’s the technical difference from PMCC. The strength bands (weak / moderate / strong) work the same way.

WE 5

Why r_s can beat r: a curved relationship

A bacterial culture doubles roughly every hour. Measurements at x = 1, 2, 3, 4, 5, 6 hours gave colony counts y = 2, 4, 8, 16, 32, 64.

(a) Find r and r_s. (b) Compare the two values and explain what they tell you.

(a) GDC: r = 0.906 (3 sf)
The y-ranks are 1, 2, 3, 4, 5, 6, the same as the x-ranks ⇒ r_s = 1 (exactly)
(b) Compare: r < 1, so the relationship is NOT a straight line; r_s = 1, so it IS perfectly monotonic. The data follows a CURVE (exponential growth).
Answer: r = 0.906, r_s = 1 · the relationship is monotonic but non-linear
This is the signature case where Spearman beats Pearson: when r_s is noticeably higher than r, you’ve spotted a curve. The next step would be to try an exponential model instead of a straight line.

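A quick Python check of WE 5 shows the gap between the two coefficients. This is a sketch, not a GDC procedure: `pmcc` and `simple_ranks` are our own illustrative helpers, and `simple_ranks` assumes there are no tied values, which holds for this data.

```python
from math import sqrt


def pmcc(x, y):
    """Pearson's r computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_ranks(values):
    """Rank with smallest = 1. Assumes NO tied values (true for this data set)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


hours = [1, 2, 3, 4, 5, 6]
colonies = [2, 4, 8, 16, 32, 64]   # doubling every hour

r = pmcc(hours, colonies)
r_s = pmcc(simple_ranks(hours), simple_ranks(colonies))
print(round(r, 3), round(r_s, 3))  # 0.906 1.0
```

Both coefficients come from the same `pmcc` function; only the input changes, which is exactly the “same engine, ranked input” idea.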
WE 6

Outlier robustness: r_s shrugs it off

Six pairs were recorded; the last y-value is much larger than the others but the data is still strictly increasing:

x: 1, 2, 3, 4, 5, 6  |  y: 10, 20, 30, 40, 50, 1000

(a) Find r and r_s. (b) Comment on which coefficient is more useful here.

(a) GDC: r = 0.681 (3 sf)
x ranks: 1, 2, 3, 4, 5, 6; y ranks: 1, 2, 3, 4, 5, 6 ⇒ r_s = 1 (exactly)
(b) Compare: r is pulled DOWN by the huge value 1000, while r_s ignores magnitude (only POSITION matters), so r_s correctly says the data is perfectly monotonic.
Answer: (a) r = 0.681, r_s = 1 · (b) r_s gives the fairer picture
Spearman is the “outlier-proof” sibling of Pearson. As long as the outlier doesn’t break the order, its size makes no difference to r_s: any last value above 50 still ranks 6th and gives r_s = 1. PMCC, which uses the actual values, gets dragged.
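A similar sketch shows why only position matters. It compares the recorded last value, 1000, with a hypothetical last value of 60 (chosen because it also ranks 6th). The helpers `pmcc` and `simple_ranks` are our own, and `simple_ranks` assumes no tied values.

```python
from math import sqrt


def pmcc(x, y):
    """Pearson's r computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_ranks(values):
    """Rank with smallest = 1. Assumes NO tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


x = [1, 2, 3, 4, 5, 6]
for last in (60, 1000):            # 60 is hypothetical; both values rank 6th in y
    y = [10, 20, 30, 40, 50, last]
    r = pmcc(x, y)
    r_s = pmcc(simple_ranks(x), simple_ranks(y))
    print(last, round(r, 3), round(r_s, 3))
# 60 1.0 1.0
# 1000 0.681 1.0
```

The ranks are identical in both runs, so r_s cannot change; only r, which sees the actual size of the last value, moves.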

💡 Top tips

  • Rank the original ORDER, not the sorted order — keep each rank glued to its data pair.
  • Tied values get averaged ranks: 2 values tied for 3rd and 4th ⇒ both get 3.5. 3 values tied for 5th, 6th, 7th ⇒ all get 6.
  • Use the GDC’s PMCC function on the ranks — don’t memorise a separate Spearman formula. Same engine, ranked input.
  • Describe relationships as “monotonic”, not “linear”, when using r_s. Linear is a stronger claim.
  • If |r_s| is much larger than |r|, the data is likely curved (or an outlier is dragging r): you’ve found a non-linear model candidate.

⚠ Common mistakes

  • Ranking x and y together in one list. They must be ranked SEPARATELY (x against its own values, y against its own values).
  • Forgetting to average tied ranks: giving two tied values ranks 3 and 4 (instead of 3.5 each) skews the result.
  • Mixing rank directions: ranking x smallest-to-largest but y largest-to-smallest will flip the sign of r_s. Be consistent.
  • Calling the relationship “linear” from r_s: r_s = 1 only says monotonic, not linear. Use PMCC for linearity.
  • Treating r_s as immune to ALL outliers: it’s robust to magnitude outliers, but if the outlier breaks the order, r_s will drop too.
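Two of these mistakes can be demonstrated in a short Python sketch, using the WE 1 data and one hypothetical data set in which a huge value sits at the START, so it breaks the rising order. The helper names are our own, `simple_ranks` assumes no tied values, and its `descending` option exists only to mimic ranking largest-first.

```python
from math import sqrt


def pmcc(x, y):
    """Pearson's r computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_ranks(values, descending=False):
    """Rank 1 = smallest (or largest if descending=True). Assumes NO tied values."""
    ordered = sorted(values, reverse=descending)
    return [ordered.index(v) + 1 for v in values]


x = [1, 3, 5, 7, 9, 11]            # WE 1 data
y = [30, 28, 42, 55, 60, 85]

# Mixing rank directions: x smallest-first, y largest-first flips the sign
print(round(pmcc(simple_ranks(x), simple_ranks(y)), 3))                   # 0.943
print(round(pmcc(simple_ranks(x), simple_ranks(y, descending=True)), 3))  # -0.943

# Hypothetical data: an outlier that BREAKS the order does lower r_s
y_out = [1000, 20, 30, 40, 50, 60]
print(round(pmcc(simple_ranks([1, 2, 3, 4, 5, 6]), simple_ranks(y_out)), 3))  # 0.143
```

Reversing one ranking replaces each rank k by n + 1 − k, which negates every deviation from the mean rank, so the size of r_s is kept but its sign flips.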
Next up: Comparison of Correlation Coefficients — a short note pulling Pearson and Spearman together. When to use each, how to read both side-by-side, and what their disagreement tells you about the underlying relationship. After that comes the chapter finale: Linear Regression, where we actually fit a line and use it for predictions.
