IB Maths AA HL Topic 4 — Statistics & Probability Paper 1 & 2 ~6 min read

Scatter Diagrams & Correlation

A scatter diagram plots paired (x, y) data, with the controlled (independent) variable on the x-axis and the response on the y-axis. The direction (positive/negative), strength (strong/weak), and presence of outliers are read straight off the plot. A line of best fit drawn by eye must pass through the mean point (x̄, ȳ).

📘 What you need to know

Types of correlation — visual

Five example plots: strong positive (r ≈ 0.95), weak positive (r ≈ 0.5), no correlation (r ≈ 0), weak negative (r ≈ −0.5), strong negative (r ≈ −0.95).
The closer the points cluster around a straight line, the stronger the correlation. Direction is set by the slope of the trend.
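
To connect the r values quoted above with a verbal description, here is a minimal Python sketch. It computes Pearson's r from the standard formula r = Sxy / √(Sxx·Syy) and attaches a rough label. The function names and the cut-offs (0.8 for "strong", 0.3 for "no clear correlation") are illustrative assumptions, not official IB boundaries.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def describe(r, strong=0.8, weak=0.3):
    """Rough verbal label; the cut-offs are assumptions for illustration."""
    if abs(r) < weak:
        return "no clear correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"
```

With these assumed cut-offs, `describe(0.95)` gives "strong positive" and `describe(-0.5)` gives "weak negative", matching the labels on the plots. The sketch does not handle data where every x (or every y) is identical, since r is undefined there.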

The mean point

Mean point: any line of best fit passes through it.
(x̄, ȳ) = (Σxᵢ / n, Σyᵢ / n)
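
The formula can be checked with a short Python sketch. The helper name `mean_point` is an assumption for illustration; the data are the practice and score values from WE 2.

```python
def mean_point(xs, ys):
    """Return (x_bar, y_bar) for paired data of equal length."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty lists of equal length")
    return sum(xs) / n, sum(ys) / n

# Data from WE 2: hours of practice (x) and exam score (y)
practice = [0, 2, 3, 5, 7, 8, 10, 12]
score = [5, 7, 9, 14, 17, 20, 25, 30]

print(mean_point(practice, score))  # (5.875, 15.875)
```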

Correlation ≠ causation

Likely causal: a real mechanism links the two.
e.g. training hours → improved fitness → faster race times
Spurious / coincidence: a third "lurking" variable drives both.
e.g. ice cream sales and shark attacks both rise in summer

🧭 Recipe — analyse a scatter diagram

  1. Identify the variables: which is independent (x), which is dependent (y).
  2. Plot the points with appropriate scale and units on each axis.
  3. Look for direction: are points trending up, down, or no clear pattern?
  4. Judge strength: how tightly do points cluster around the trend?
  5. Spot outliers: any points that don't follow the trend.
  6. Compute the mean point and draw the line of best fit through it (only if correlation is strong).
  7. Comment on causation — is there a plausible mechanism, or could it be coincidence?
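
Step 6 can be sketched in Python: once a gradient has been estimated by eye, forcing the line through the mean point fixes the intercept, since y − ȳ = m(x − x̄) rearranges to y = mx + (ȳ − m·x̄). The function name and the by-eye gradient of 2 are assumptions for illustration; the three points are taken from WE 5.

```python
def line_through_mean(xs, ys, gradient):
    """Return (gradient, intercept) of the line with the given gradient
    that passes through the mean point (x_bar, y_bar)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty lists of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # From y - y_bar = m (x - x_bar): intercept c = y_bar - m * x_bar
    intercept = y_bar - gradient * x_bar
    return gradient, intercept

# Three points from WE 5, with a gradient of 2 estimated by eye (assumption)
m, c = line_through_mean([10, 20, 30], [50, 70, 90], gradient=2)
print(m, c)  # 2 30.0
```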

Worked examples

WE 1

Identify variables and predict correlation type

A scientist measures the water temperature (in °C) of a lake at different depths and the dissolved oxygen content (in mg/L). (a) State which is the independent and which is the dependent variable. (b) Predict the type of correlation expected.

(a) Identify variables
Temperature is the controlled / explanatory variable → independent → x-axis
Oxygen is the measured response → dependent → y-axis
(b) Predict
Warmer water holds less dissolved oxygen (physical fact) → as temperature ↑, oxygen ↓ → expect negative correlation
Answer: x = temperature, y = oxygen; expect negative correlation.
Note: the independent variable always goes on the x-axis.
WE 2

Find the mean point and describe the correlation

Eight students record the hours of practice per week (x) and their score on a music exam (y):

Practice (h)   0   2   3   5    7    8    10   12
Score          5   7   9   14   17   20   25   30

(a) Find the mean point. (b) Describe the correlation.

(a) Mean point
Σx = 0+2+3+5+7+8+10+12 = 47; x̄ = 47/8 = 5.875
Σy = 5+7+9+14+17+20+25+30 = 127; ȳ = 127/8 = 15.875
(b) Correlation
As x increases (0 → 12), y consistently increases (5 → 30)
Points lie close to a straight line
Answer: mean point (5.875, 15.875); strong positive linear correlation.
Note: a line of best fit drawn through (5.875, 15.875) would slope upwards.
WE 3

Mean point and correlation — negative trend

The hours of TV watched per evening (x) and quiz score the next day (y) for 7 students:

TV hours   1    2    3    4    5    6    7
Score      90   82   75   68   60   52   45

(a) Find the mean point, giving ȳ to 3 s.f. (b) Describe the correlation.

(a) Mean point
Σx = 1+2+3+4+5+6+7 = 28; x̄ = 28/7 = 4
Σy = 90+82+75+68+60+52+45 = 472; ȳ = 472/7 ≈ 67.4
(b) Correlation
As x ↑ by 1, y drops by 7 or 8 each time (very consistent)
Answer: mean point (4, 67.4); strong negative linear correlation.
Note: a very consistent step-down in y indicates high strength.
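
The two checks in WE 3 (rounding ȳ to 3 s.f. and the size of each step-down) can be sketched in Python. The helper `round_sf` is a hypothetical name introduced here, not a built-in; the data are the TV hours and scores from the table above.

```python
from math import floor, log10

def round_sf(value, sig=3):
    """Round value to `sig` significant figures."""
    if value == 0:
        return 0.0
    # Number of decimal places needed for the requested significant figures
    digits = sig - 1 - floor(log10(abs(value)))
    return round(value, digits)

tv_hours = [1, 2, 3, 4, 5, 6, 7]
scores = [90, 82, 75, 68, 60, 52, 45]

x_bar = sum(tv_hours) / len(tv_hours)
y_bar = sum(scores) / len(scores)

# Change in score for each extra hour of TV
steps = [b - a for a, b in zip(scores, scores[1:])]

print(x_bar, round_sf(y_bar))  # 4.0 67.4
print(steps)                   # [-8, -7, -7, -8, -8, -7]
```

Every step is −7 or −8, which is the "very consistent" drop that justifies calling the correlation strong.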
WE 4

Correlation vs causation — three scenarios

For each pair of variables, state whether a strong positive correlation is likely to indicate causation or just spurious correlation, and justify briefly.
(a) Cigarettes smoked per day vs lung cancer rates.
(b) Sales of sunscreen vs number of shark attacks at beaches.
(c) Hours of sleep vs reaction time.

(a) Cigarettes vs lung cancer
Causal: direct biological mechanism (carcinogens damage lung tissue)
(b) Sunscreen sales vs shark attacks
Spurious: both rise in summer when more people go to the beach → the "lurking variable" is warm weather / beach-going
(c) Hours of sleep vs reaction time
Causal: fatigue physically slows neural processing
Answer: (a) causal; (b) spurious, explained by a third variable; (c) causal.
Note: always ask "is there a plausible mechanism, or could a third variable explain both?"
WE 5

Identify a bivariate outlier

The following data points were collected:   (10, 50), (15, 60), (20, 70), (25, 80), (30, 90), (35, 30), (40, 110). Identify any outlier and explain why.

Step 1: Look at the trend
Most points follow y ≈ 2x + 30:
(10, 50) → 2(10) + 30 = 50 ✓
(15, 60) → 60 ✓; (20, 70) → 70 ✓; (25, 80) → 80 ✓
(30, 90) → 90 ✓; (40, 110) → 110 ✓
Step 2: Check the suspect point
(35, 30): expected ≈ 2(35) + 30 = 100; actual y = 30 → deviation of 70 from the trend
Answer: (35, 30) is the outlier; it does not follow the linear pattern of the other points.
Note: x = 35 is well inside the range of x-values, so the point is unusual because the pair (x, y) does not fit the trend; y = 30 is also lower than every other y-value.
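
The deviation check can be sketched in Python using the trend line y = 2x + 30 from Step 1. The function names and the tolerance of 10 units are assumptions chosen for illustration; in an exam the outlier is identified by inspection.

```python
points = [(10, 50), (15, 60), (20, 70), (25, 80), (30, 90), (35, 30), (40, 110)]

def residuals(points, gradient, intercept):
    """Return (x, y, residual) where residual = actual y - predicted y."""
    return [(x, y, y - (gradient * x + intercept)) for x, y in points]

def flag_outliers(points, gradient, intercept, tolerance):
    """Points whose vertical distance from the line exceeds the tolerance."""
    return [(x, y)
            for x, y, res in residuals(points, gradient, intercept)
            if abs(res) > tolerance]

# Trend y = 2x + 30 from Step 1; the tolerance of 10 is an assumed cut-off
outliers = flag_outliers(points, gradient=2, intercept=30, tolerance=10)
print(outliers)  # [(35, 30)]
```

Every other point has a residual of 0, while (35, 30) has a residual of −70.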
WE 6

Sprint training — full analysis

A coach records the number of training sessions completed (x) and the 100m time in seconds (y) for 6 athletes:

Sessions   5      8      12     15     20     25
Time (s)   13.2   12.8   12.5   12.1   11.8   11.4

(a) State which is the independent variable. (b) Find the mean point. (c) Describe the correlation. (d) Comment on causation.

(a) Independent variable
Sessions are controlled by the coach → x = sessions
(b) Mean point
Σx = 5+8+12+15+20+25 = 85; x̄ = 85/6 ≈ 14.17
Σy = 13.2+12.8+12.5+12.1+11.8+11.4 = 73.8; ȳ = 73.8/6 = 12.30
(c) Correlation
As sessions ↑, time ↓ consistently → strong negative linear correlation
(d) Causation
Plausible: more training → improved fitness → faster times
Answer: (a) sessions; (b) (14.17, 12.30); (c) strong negative linear; (d) likely causal.
Note: the line of best fit (drawn through the mean point) would slope downwards.

💡 Top tips

⚠ Common mistakes

Next: Pearson's Product-Moment Correlation Coefficient. The PMCC turns "describe the correlation" into a single number r between −1 and 1. The closer r is to ±1, the stronger the correlation; the sign gives the direction. It is computed using your GDC's stats mode, and a critical-value check tells you whether a linear model is appropriate.
