
Spearman’s Rank (r_s)

PMCC measures linear correlation. Spearman’s rank r_s measures monotonic correlation — whether the data is always going up (or always going down), even if it’s a curve. The trick: rank the data first, then compute PMCC of the ranks. Same engine as Pearson’s, robust input. r_s ranges from −1 to +1 just like r.

📘 What you need to know

Monotonic = always increasing OR always decreasing (no peaks or valleys). Lines are monotonic, but so are exponential curves and many others.
r_s = PMCC of the rankings: rank each variable separately from 1 to n, then put the rank lists into the GDC’s linear regression.
Range: −1 ≤ r_s ≤ 1. Same scale, same sign convention, same strength bands as PMCC.
r_s = +1: rankings agree perfectly (every increase in x matches an increase in y). r_s = −1: rankings are exactly reversed (every increase in x matches a decrease in y).
Ties get average ranks: if two values are tied for 3rd and 4th, both get rank 3.5; if three values tie for 4th, 5th, 6th, all get rank 5.
Robust to outliers: a single freak value can wreck r, but as long as it doesn’t change the rank order, r_s is unchanged.

The big idea — rank, then correlate

Spearman’s beauty: it turns a curved monotonic relationship into a straight one by replacing values with their positions in order. Look:

The same six data points before and after ranking. The original (left) follows a clear curve, so Pearson’s r ≈ 0.91 — strong, but not perfect because the relationship isn’t a line. After ranking (right), every rank pair lies on a perfect diagonal: r_s = 1, because the data is perfectly monotonic.

Ranking the data (and handling ties)

Ranking is mechanical. For each variable independently:

1. Order the values from smallest to largest (or largest to smallest; just be consistent).
2. Assign rank 1 to the first, rank 2 to the next, and so on, up to rank n.
3. For tied values, give each the average of the ranks they’d occupy. Two tied values stuck at positions 4 and 5 both get rank 4.5.

Why averaging for ties? If you arbitrarily gave one value rank 4 and the other rank 5, you’d be inventing an order that isn’t really there. Averaging keeps the sum of ranks the same (1+2+…+n) and is fair.
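The three ranking steps can be sketched in Python. This is only an illustration and not part of the GDC workflow: `average_ranks` is a made-up helper name, and the sketch assumes rank 1 goes to the smallest value.

```python
def average_ranks(values):
    """Return the rank of each value (smallest = 1), in the original order.

    Tied values share the average of the positions they occupy.
    Hypothetical helper for illustration only.
    """
    ordered = sorted(values)  # step 1: smallest to largest
    ranks = []
    for v in values:
        first = ordered.index(v) + 1          # step 2: first 1-based position of v
        last = first + ordered.count(v) - 1   # last position taken by v's tie group
        ranks.append((first + last) / 2)      # step 3: average over the tie group
    return ranks


grades = [10, 12, 12, 15, 18, 20]
print(average_ranks(grades))  # [1.0, 2.5, 2.5, 4.0, 5.0, 6.0]

# Averaging keeps the rank total at 1 + 2 + ... + n
n = len(grades)
print(sum(average_ranks(grades)) == n * (n + 1) / 2)  # True
```

The ranks come back in the original data order, so each rank stays attached to its data point.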

Spearman’s rank correlation coefficient: r_s = PMCC of the ranks of x and the ranks of y

GDC workflow: enter rank(x) as L1 and rank(y) as L2, then run linear regression. The r value the GDC reports is your r_s.

Pearson vs Spearman — choosing

Both give a number between −1 and +1; both have the same sign and strength interpretation. They differ on what they measure:

PMCC r: how close to a STRAIGHT LINE. Uses actual values. Sensitive to outliers and to curves.

Spearman r_s: how MONOTONIC. Uses only ranks. Robust to outliers; doesn’t care if the curve is straight or smoothly bent — only if it always goes the same way.

Rule of thumb: if r_s is much higher than r, your relationship is monotonic but not linear — consider a non-linear model.

🧭 Recipe — computing r_s

List the x-values in their original order. Rank them 1 to n (smallest = 1). Average ranks for ties.
Do the same for y-values, ranking independently. Keep each rank aligned with its original data point.
Enter rank(x) into L1, rank(y) into L2 on the GDC.
Run Linear Regression (ax + b). The r value reported = your r_s. Quote to 3 sf.
Interpret: strength (close to 1 / moderate / weak), direction (positive = agreement; negative = disagreement), in context of the original variables.
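For readers who want to check a GDC answer on a computer, the recipe can be sketched in Python. The helper names `average_ranks`, `pmcc` and `spearman` are made up for this sketch, and the data is the training-hours and errors set from WE 2.

```python
from math import sqrt


def average_ranks(values):
    """Rank values (smallest = 1); tied values share their average position."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman(xs, ys):
    """r_s = PMCC of the ranks, with each variable ranked separately."""
    return pmcc(average_ranks(xs), average_ranks(ys))


hours = [2, 4, 6, 8, 10, 12, 14]
errors = [25, 22, 20, 14, 16, 8, 5]

print(average_ranks(errors))              # [7.0, 6.0, 5.0, 3.0, 4.0, 2.0, 1.0]
print(round(spearman(hours, errors), 3))  # -0.964
```

Keeping `pmcc` separate from the ranking step mirrors the GDC method: the correlation routine is the ordinary one, and only its input changes.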

Worked examples

WE 1

Compute r_s — basic positive case, no ties

Six employees’ years of experience and annual salaries (in $1000s):

x (years): 1, 3, 5, 7, 9, 11 | y ($k): 30, 28, 42, 55, 60, 85

(a) Rank both variables. (b) Find r_s. (c) Comment.

(a) x is already in order ⇒ x ranks: 1, 2, 3, 4, 5, 6.
Rank y: sorted y is 28, 30, 42, 55, 60, 85 (ranks 1 to 6).
y values: 30, 28, 42, 55, 60, 85 ⇒ y ranks: 2, 1, 3, 4, 5, 6.
(b) GDC: PMCC of (1, 2, 3, 4, 5, 6) and (2, 1, 3, 4, 5, 6) gives r_s = 0.943 (3 sf).
(c) Interpret: close to +1 ⇒ strong agreement in rankings: more experience ⇒ higher salary, almost always.
Answer: (a) y ranks: 2, 1, 3, 4, 5, 6 · (b) r_s = 0.943 · (c) strong positive monotonic.
Note: only ONE pair is out of order (30 came before 28, breaking the perfect rise). That single swap drops r_s from 1 to 0.943.
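As an optional cross-check, and not something these notes ask you to memorise, the standard shortcut formula for Spearman’s rank reproduces the GDC value. It is valid only when there are no tied ranks. The symbol d_i is introduced here for the difference between the x rank and the y rank of pair i; it is not used elsewhere in these notes.

```latex
d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i): \quad -1,\ 1,\ 0,\ 0,\ 0,\ 0
\qquad\Rightarrow\qquad \sum d_i^2 = 2

r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
    = 1 - \frac{6 \times 2}{6 \times 35}
    = 1 - \frac{12}{210}
    \approx 0.943
```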

WE 2

Compute r_s — negative case

Seven trainees’ hours of training and number of errors per task:

x (hours): 2, 4, 6, 8, 10, 12, 14 | y (errors): 25, 22, 20, 14, 16, 8, 5

Find r_s and interpret.

Rank x: already in order ⇒ x ranks: 1, 2, 3, 4, 5, 6, 7.
Rank y: sorted y is 5, 8, 14, 16, 20, 22, 25 (ranks 1 to 7).
y values: 25, 22, 20, 14, 16, 8, 5 ⇒ y ranks: 7, 6, 5, 3, 4, 2, 1.
GDC: PMCC of x-ranks and y-ranks gives r_s = −0.964 (3 sf).
Interpret: close to −1 ⇒ strong disagreement in rankings: more training ⇒ fewer errors.
Answer: r_s = −0.964 · strong negative monotonic.
Note: again only ONE pair is out of order: y rose from 14 to 16 (when it “should” have continued falling). That single inversion took r_s from −1 to −0.964.

WE 3

Ranking with ties — the average-rank rule

Six students’ weekly homework hours and grade points:

x (hours): 1, 2, 3, 4, 5, 6 | y (grade pts): 10, 12, 12, 15, 18, 20

(a) Rank both variables, handling ties correctly. (b) Find r_s.

(a) Rank x (no ties): x ranks: 1, 2, 3, 4, 5, 6.
Rank y (12 appears twice): sorted y is 10, 12, 12, 15, 18, 20.
The two 12’s would take ranks 2 and 3, so the average rank is (2+3)/2 = 2.5 and both get 2.5.
y ranks in original order: 1, 2.5, 2.5, 4, 5, 6.
(b) GDC: PMCC of x-ranks and y-ranks gives r_s = 0.986 (3 sf).
Answer: (a) y ranks: 1, 2.5, 2.5, 4, 5, 6 · (b) r_s = 0.986.
Note: the tied values don’t cost you much. r_s stayed very close to 1 because the overall trend is unbroken. The averaging keeps the analysis fair.
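To see what the average-rank rule changes, the sketch below compares the correct ranks with a version where the tie is split arbitrarily into 2 and 3. The helper `pmcc` and the variable names are made up for this illustration.

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x_ranks = [1, 2, 3, 4, 5, 6]
y_ranks_averaged = [1, 2.5, 2.5, 4, 5, 6]  # the two 12s share rank 2.5
y_ranks_forced = [1, 2, 3, 4, 5, 6]        # MISTAKE: tie split into ranks 2 and 3

print(round(pmcc(x_ranks, y_ranks_averaged), 3))  # 0.986
print(round(pmcc(x_ranks, y_ranks_forced), 3))    # 1.0
```

Splitting the tie reports a perfect r_s = 1, which claims an ordering between the two 12s that the data does not contain.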

WE 4

Interpret r_s values without calculation

For each value of r_s, describe the monotonic relationship.

(a) r_s = 1 (b) r_s = −0.85 (c) r_s = 0.30 (d) r_s = 0

Use the same zones as PMCC, plus the word “monotonic”.
(a) r_s = 1: perfect monotonic INCREASING (rankings agree exactly).
(b) r_s = −0.85: strong monotonic DECREASING.
(c) r_s = 0.30: weak monotonic increasing.
(d) r_s = 0: no monotonic relationship.
Answer: (a) perfect + · (b) strong − · (c) weak + · (d) none.
Note: use the word “monotonic” instead of “linear” with Spearman; that’s the technical difference from PMCC. The strength bands (weak / moderate / strong) work the same way.

WE 5

Why r_s can beat r — a curved relationship

A bacterial culture doubles roughly every hour. Measurements at x = 1, 2, 3, 4, 5, 6 hours gave colony counts y = 2, 4, 8, 16, 32, 64.

(a) Find r and r_s. (b) Compare the two values and explain what they tell you.

(a) The GDC computes both: r = 0.906 (3 sf). The y-ranks are 1, 2, 3, 4, 5, 6, the same as the x-ranks, so r_s = 1 exactly.
(b) Compare: r < 1, so the relationship is NOT a straight line; r_s = 1, so it IS perfectly monotonic. The data follows a CURVE (exponential growth).
Answer: r = 0.906, r_s = 1 · relationship is monotonic but non-linear.
Note: this is the signature case where Spearman beats Pearson. When r_s is noticeably higher than r, you’ve spotted a curve. The next step would be to try an exponential model instead of a straight line.
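Both coefficients for the doubling data can be reproduced with a short Python sketch. The helper names `average_ranks` and `pmcc` are made up for illustration; a GDC gives the same values.

```python
from math import sqrt


def average_ranks(values):
    """Rank values (smallest = 1); tied values share their average position."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5, 6]
counts = [2, 4, 8, 16, 32, 64]

r = pmcc(hours, counts)                                   # uses the actual values
r_s = pmcc(average_ranks(hours), average_ranks(counts))   # uses only the ranks

print(round(r, 3), round(r_s, 3))  # 0.906 1.0
```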

WE 6

Outlier robustness — r_s shrugs it off

Six pairs were recorded; the last y-value is much larger than the others but the data is still strictly increasing:

x: 1, 2, 3, 4, 5, 6 | y: 10, 20, 30, 40, 50, 1000

(a) Find r and r_s. (b) Comment on which coefficient is more useful here.

(a) GDC: r = 0.681 (3 sf). x ranks: 1, 2, 3, 4, 5, 6 and y ranks: 1, 2, 3, 4, 5, 6, so r_s = 1 exactly.
(b) Compare: r is pulled DOWN by the huge value 1000. r_s ignores magnitude; only POSITION matters. r_s correctly says the data is perfectly monotonic.
Answer: (a) r = 0.681, r_s = 1 · (b) r_s gives a fairer picture.
Note: Spearman is the “outlier-proof” sibling of Pearson. As long as the outlier doesn’t break the order, r_s stays exactly the same as it would without it. PMCC, which uses the actual values, gets dragged down.
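A short Python sketch makes the "with and without the outlier" comparison explicit. The helper names are made up for illustration, and dropping the last pair is just a way to show the effect, not a recommendation to delete data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values (smallest = 1); tied values share their average position."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman(xs, ys):
    """r_s = PMCC of the ranks, with each variable ranked separately."""
    return pmcc(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5, 6]
y = [10, 20, 30, 40, 50, 1000]

# All six pairs, outlier included
print(round(pmcc(x, y), 3), round(spearman(x, y), 3))                      # 0.681 1.0

# First five pairs only (outlier left out, purely for comparison)
print(round(pmcc(x[:-1], y[:-1]), 3), round(spearman(x[:-1], y[:-1]), 3))  # 1.0 1.0
```

Only r moves when the value 1000 is included; r_s is 1 in both cases because the order of the points never changes.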

💡 Top tips

Write the ranks in the ORIGINAL data order, not the sorted order, so each rank stays glued to its data pair.
Tied values get averaged ranks: 2 values tied for 3rd and 4th ⇒ both get 3.5. 3 values tied for 5th, 6th, 7th ⇒ all get 6.
Use the GDC’s PMCC function on the ranks — don’t memorise a separate Spearman formula. Same engine, ranked input.
Describe relationships as “monotonic” not “linear” when using r_s. Linear is a stronger claim.
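If r_s >> r, the relationship is monotonic but not linear: usually a curve, sometimes an outlier stretching the scale. Check the scatter diagram, then consider a non-linear model.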

⚠ Common mistakes

Ranking x and y together in one list. They must be ranked SEPARATELY (x against its own values, y against its own values).
Forgetting to average tied ranks: giving two tied values ranks 3 and 4 (instead of 3.5 each) skews the result.
Mixing rank directions: ranking x smallest-to-largest but y largest-to-smallest will flip the sign of r_s. Be consistent.
Calling the relationship “linear” from r_s: r_s = 1 only says monotonic, not linear. Use PMCC for linearity.
Treating r_s as immune to ALL outliers: it’s robust to magnitude outliers, but if the outlier breaks the order, r_s will drop too.
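Two of these mistakes are easy to demonstrate numerically. The sketch below uses made-up helper names and one made-up data set (`y_broken`, where the last value falls out of order); the salary data is from WE 1.

```python
from math import sqrt


def average_ranks(values):
    """Rank values (smallest = 1); tied values share their average position."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6]

# Mistake: assuming r_s ignores every outlier.
# Hypothetical data: the last value is an outlier that BREAKS the rising order.
y_broken = [10, 20, 30, 40, 50, 5]
print(average_ranks(y_broken))                                      # [2.0, 3.0, 4.0, 5.0, 6.0, 1.0]
print(round(pmcc(average_ranks(x), average_ranks(y_broken)), 3))    # 0.143

# Mistake: mixing rank directions (x ascending, y descending).
salary = [30, 28, 42, 55, 60, 85]
asc = average_ranks(salary)
desc = [len(salary) + 1 - r for r in asc]       # rank 1 = largest instead of smallest
print(round(pmcc(average_ranks(x), asc), 3))    # 0.943
print(round(pmcc(average_ranks(x), desc), 3))   # -0.943
```

In the first case a single out-of-order value pulls r_s from 1 down to about 0.14; in the second the strength is unchanged but the sign flips.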

Next up: Comparison of Correlation Coefficients — a short note pulling Pearson and Spearman together. When to use each, how to read both side-by-side, and what their disagreement tells you about the underlying relationship. After that comes the chapter finale: Linear Regression, where we actually fit a line and use it for predictions.


IB Demystified is a trusted online learning platform led by certified IB examiners and educators.
