The beauty of Spearman’s method: replacing each value with its position in order turns a curved monotonic relationship into a straight one.
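Ranking is mechanical. For each variable independently, give the smallest value rank 1, the next rank 2, and so on up to n; tied values share the average of the ranks they would otherwise occupy.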
Why average for ties? If you arbitrarily gave one value rank 4 and the other rank 5, you’d be inventing an order that isn’t really there. Averaging treats the tied values equally and keeps the sum of ranks the same (1 + 2 + … + n).
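A minimal Python sketch of the average-rank rule, assuming the smallest value gets rank 1. The function name `average_ranks` is made up for illustration; it is not a GDC or library function.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # first sorted position this value occupies
        count = ordered.count(v)               # how many times the value appears
        ranks.append(first + (count - 1) / 2)  # mean of positions first .. first+count-1
    return ranks

y = [10, 12, 12, 15, 18, 20]
ranks = average_ranks(y)
print(ranks)       # [1.0, 2.5, 2.5, 4.0, 5.0, 6.0]
print(sum(ranks))  # 21.0, the same as 1+2+...+6
```

The ranks come back in the original data order, so each one stays attached to its data point.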
Spearman’s rank correlation coefficient
r_s = PMCC of the ranks of x and the ranks of y
GDC workflow: enter rank(x) as L1 and rank(y) as L2, then run linear regression. The r value the GDC reports is your r_s.
Pearson vs Spearman — choosing
Both give a number between −1 and +1; both have the same sign and strength interpretation. They differ on what they measure:
PMCC r: how close to a STRAIGHT LINE. Uses actual values. Sensitive to outliers and to curves.
Spearman r_s: how MONOTONIC. Uses only ranks. Robust to outliers; it doesn’t care whether the relationship is straight or smoothly bent, only whether it always goes the same way.
Rule of thumb: if r_s is much closer to ±1 than r, the relationship is probably monotonic but not linear, so consider a non-linear model.
🧭 Recipe: computing r_s
- List the x-values in their original order. Rank them 1 to n (smallest = 1). Average ranks for ties.
- Do the same for y-values, ranking independently. Keep each rank aligned with its original data point.
- Enter rank(x) into L1, rank(y) into L2 on the GDC.
- Run Linear Regression (ax + b). The r value reported = your r_s. Quote to 3 sf.
- Interpret: strength (close to 1 / moderate / weak), direction (positive = agreement; negative = disagreement), in context of the original variables.
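If you want to check a GDC answer by computer, the recipe can be sketched in plain Python. The helper names (`average_ranks`, `pearson`, `spearman`) are illustrative assumptions, not GDC functions; the data is the salary table from WE 1.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest) to n, averaging the ranks of tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson(xs, ys):
    """PMCC of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman(xs, ys):
    # Rank each variable separately, then take the PMCC of the two rank lists.
    return pearson(average_ranks(xs), average_ranks(ys))

years = [1, 3, 5, 7, 9, 11]
salary = [30, 28, 42, 55, 60, 85]
r_s = spearman(years, salary)
print(round(r_s, 3))  # 0.943
```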
Worked examples
WE 1 Compute r_s: basic positive case, no ties
Six employees’ years of experience and annual salaries (in $1000s):
x (years): 1, 3, 5, 7, 9, 11 | y ($k): 30, 28, 42, 55, 60, 85
(a) Rank both variables. (b) Find r_s. (c) Comment.
(a) x already in order ⇒ ranks 1 to 6
x ranks: 1, 2, 3, 4, 5, 6
rank y: sort 28, 30, 42, 55, 60, 85 ⇒ ranks 1 to 6
y values: 30, 28, 42, 55, 60, 85
y ranks: 2, 1, 3, 4, 5, 6
(b) GDC: PMCC of (1,2,3,4,5,6) and (2,1,3,4,5,6)
r_s = 0.943 (3 sf)
(c) interpret
close to +1 ⇒ strong agreement in rankings
more experience ⇒ higher salary, almost always
(a) y ranks: 2,1,3,4,5,6 · (b) r_s = 0.943 · (c) strong positive monotonic
notice only ONE pair is out of order (30 came before 28, breaking the perfect rise). That single swap drops r_s from 1 to 0.943.
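As an optional cross-check on the 0.943 (not part of the GDC method): when there are no ties, the PMCC of the ranks reduces to a formula in the rank differences. The symbol d = rank(x) − rank(y) is introduced here only for this check.

```latex
\sum d^2 = (1-2)^2 + (2-1)^2 + 0 + 0 + 0 + 0 = 2
\qquad
r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}
    = 1 - \frac{6 \times 2}{6 \times 35}
    = 1 - \frac{12}{210} \approx 0.943
```

The single swap contributes the whole of the Σd² = 2, which is why it alone accounts for the drop from 1.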
WE 2 Compute r_s: negative case
Seven trainees’ hours of training and number of errors per task:
x (hours): 2, 4, 6, 8, 10, 12, 14 | y (errors): 25, 22, 20, 14, 16, 8, 5
Find r_s and interpret.
rank x: in order so ranks 1 to 7
x ranks: 1, 2, 3, 4, 5, 6, 7
rank y: sort 5,8,14,16,20,22,25 ⇒ ranks 1 to 7
y values: 25, 22, 20, 14, 16, 8, 5
y ranks: 7, 6, 5, 3, 4, 2, 1
GDC: PMCC of x-ranks and y-ranks
r_s = −0.964 (3 sf)
interpret
close to −1 ⇒ strong disagreement in rankings
more training ⇒ fewer errors
r_s = −0.964 · strong negative monotonic
again only ONE pair is out of order: y rose from 14 to 16 (when it “should” have continued falling). That single inversion took r_s from −1 to −0.964.
WE 3 Ranking with ties: the average-rank rule
Six students’ weekly homework hours and grade points:
x (hours): 1, 2, 3, 4, 5, 6 | y (grade pts): 10, 12, 12, 15, 18, 20
(a) Rank both variables, handling ties correctly. (b) Find r_s.
(a) rank x — no ties
x ranks: 1, 2, 3, 4, 5, 6
rank y — 12 appears twice
sorted y: 10, 12, 12, 15, 18, 20
the two 12’s would take ranks 2 and 3
avg rank = (2+3)/2 = 2.5 — both get 2.5
y ranks in original order: 1, 2.5, 2.5, 4, 5, 6
(b) GDC: PMCC of x-ranks and y-ranks
r_s = 0.986 (3 sf)
(a) y ranks: 1, 2.5, 2.5, 4, 5, 6 · (b) r_s = 0.986
the tied values don’t cost you much — r_s stayed very close to 1 because the overall trend is unbroken. The averaging keeps the analysis fair.
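To see why averaging matters, here is a small Python sketch comparing the averaged ranks from this example with the two arbitrary ways of splitting the tie. The variable names and the `pearson` helper are illustrative assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """PMCC of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x_ranks = [1, 2, 3, 4, 5, 6]
averaged = [1, 2.5, 2.5, 4, 5, 6]  # both 12s share rank 2.5
forced_a = [1, 2, 3, 4, 5, 6]      # pretend the first 12 is the smaller one
forced_b = [1, 3, 2, 4, 5, 6]      # pretend the second 12 is the smaller one

print(round(pearson(x_ranks, averaged), 3))  # 0.986
print(round(pearson(x_ranks, forced_a), 3))  # 1.0
print(round(pearson(x_ranks, forced_b), 3))  # 0.943
```

The two forced orderings give different answers from the same data, which is exactly the invented order that averaging avoids.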
WE 4 Interpret r_s values without calculation
For each value of r_s, describe the monotonic relationship.
(a) r_s = 1 (b) r_s = −0.85 (c) r_s = 0.30 (d) r_s = 0
use the same zones as PMCC, plus the word “monotonic”
(a) r_s = 1
perfect monotonic INCREASING (rankings agree exactly)
(b) r_s = −0.85
strong monotonic DECREASING
(c) r_s = 0.30
weak monotonic increasing
(d) r_s = 0
no monotonic relationship
(a) perfect + · (b) strong − · (c) weak + · (d) none
use the word “monotonic” instead of “linear” with Spearman — that’s the technical difference from PMCC. The strength bands (weak / moderate / strong) work the same way.
WE 5 Why r_s can beat r: a curved relationship
A bacterial culture doubles roughly every hour. Measurements at x = 1, 2, 3, 4, 5, 6 hours gave colony counts y = 2, 4, 8, 16, 32, 64.
(a) Find r and r_s. (b) Compare the two values and explain what they tell you.
(a) GDC computes both
r = 0.906 (3 sf)
y-ranks are 1,2,3,4,5,6 — same as x-ranks
r_s = 1.000 (3 sf)
(b) compare
r < 1: the relationship is NOT a straight line
r_s = 1: but it IS perfectly monotonic
the data follows a CURVE (exponential growth)
r = 0.906, r_s = 1.000 · relationship is monotonic but non-linear
this is the signature case where Spearman beats Pearson. When r_s is noticeably higher than r, you’ve spotted a curve. The next step would be to try an exponential model instead of a straight line.
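The comparison in this example can be reproduced with a short Python sketch. The helper names are illustrative assumptions, and taking log base 2 of y is just one possible way to probe whether an exponential model fits; it is not part of the worked solution.

```python
from math import sqrt, log2

def average_ranks(values):
    """Rank from 1 (smallest) to n, averaging the ranks of tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson(xs, ys):
    """PMCC of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5, 6]
colonies = [2, 4, 8, 16, 32, 64]

r = pearson(hours, colonies)
r_s = pearson(average_ranks(hours), average_ranks(colonies))
r_log = pearson(hours, [log2(c) for c in colonies])  # PMCC after straightening the curve

print(round(r, 3))      # 0.906
print(round(r_s, 3))    # 1.0
print(round(r_log, 3))  # 1.0
```

Once the doubling is undone by the logarithm, the PMCC also reaches 1, which supports the idea that the curve is exponential.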
WE 6 Outlier robustness: r_s shrugs it off
Six pairs were recorded; the last y-value is much larger than the others but the data is still strictly increasing:
x: 1, 2, 3, 4, 5, 6 | y: 10, 20, 30, 40, 50, 1000
(a) Find r and r_s. (b) Comment on which coefficient is more useful here.
(a) GDC
r = 0.681 (3 sf)
x ranks: 1,2,3,4,5,6 y ranks: 1,2,3,4,5,6
r_s = 1.000
(b) compare
r is pulled DOWN by the huge value 1000
r_s ignores magnitude — only POSITION matters
r_s correctly says the data is perfectly monotonic
(a) r = 0.681, r_s = 1.000 · (b) r_s gives a fairer picture
Spearman is the “outlier-proof” sibling of Pearson. As long as the outlier doesn’t break the order, r_s is unchanged however extreme the value is: replace 1000 with 60 and r_s is still 1. PMCC, which uses the actual values, gets dragged down.
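A quick Python sketch of that robustness, varying only the size of the last y-value. The values 60 and 100 are extra illustrative choices, not part of the question, and the helper names are assumptions.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest) to n, averaging the ranks of tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson(xs, ys):
    """PMCC of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5, 6]
results = {}
for last in (60, 100, 1000):
    y = [10, 20, 30, 40, 50, last]
    r = pearson(x, y)
    r_s = pearson(average_ranks(x), average_ranks(y))
    results[last] = (round(r, 3), round(r_s, 3))
    print(last, results[last])
# With last = 1000 this gives r = 0.681 and r_s = 1.0, matching the worked answer.
# r_s is 1.0 for all three values because the order of the data never changes.
```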
💡 Top tips
- Record the ranks in the ORIGINAL data order, not the sorted order, so each rank stays glued to its data pair.
- Tied values get averaged ranks: 2 values tied for 3rd and 4th ⇒ both get 3.5. 3 values tied for 5th, 6th, 7th ⇒ all get 6.
- Use the GDC’s PMCC function on the ranks — don’t memorise a separate Spearman formula. Same engine, ranked input.
- Describe relationships as “monotonic” not “linear” when using r_s. Linear is a stronger claim.
- If r_s is much closer to ±1 than r, the data is probably curved, so a non-linear model is worth trying.
⚠ Common mistakes
- Ranking x and y together in one list. They must be ranked SEPARATELY (x against its own values, y against its own values).
- Forgetting to average tied ranks: giving two tied values ranks 3 and 4 (instead of 3.5 each) skews the result.
- Mixing rank directions: ranking x smallest-to-largest but y largest-to-smallest will flip the sign of r_s. Be consistent.
- Calling the relationship “linear” from r_s: r_s = 1 only says monotonic, not linear. Use PMCC for linearity.
- Treating r_s as immune to ALL outliers: it’s robust to magnitude outliers, but if the outlier breaks the order, r_s will drop too.
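The rank-direction mistake is easy to demonstrate. This Python sketch reuses the WE 1 salary data; the `largest_first` parameter and the helper names are hypothetical, added only to show the effect.

```python
from math import sqrt

def average_ranks(values, largest_first=False):
    """Rank from 1 to n with averaged ties; rank 1 goes to the smallest value by default."""
    ordered = sorted(values, reverse=largest_first)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson(xs, ys):
    """PMCC of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

years = [1, 3, 5, 7, 9, 11]
salary = [30, 28, 42, 55, 60, 85]

# Both variables ranked smallest-to-largest
consistent = pearson(average_ranks(years), average_ranks(salary))
# x ranked smallest-to-largest, y ranked largest-to-smallest
mixed = pearson(average_ranks(years), average_ranks(salary, largest_first=True))

print(round(consistent, 3))  # 0.943
print(round(mixed, 3))       # -0.943
```

Same data, same strength, opposite sign: the mixed version would wrongly suggest that more experience goes with lower pay.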
Next up: Comparison of Correlation Coefficients — a short note pulling Pearson and Spearman together. When to use each, how to read both side-by-side, and what their disagreement tells you about the underlying relationship. After that comes the chapter finale: Linear Regression, where we actually fit a line and use it for predictions.