IB Maths AA SL Topic 4 — Correlation & Regression Paper 1 & 2 ~10 min read

Scatter Diagrams & Correlation

Sometimes data comes in pairs — like a student’s height and weight, or hours studied vs test score. A scatter diagram lets you spot a pattern between them at a glance, and “correlation” is the maths word for how strongly two variables move together.

📘 What you need to know

What is bivariate data?

Bivariate just means “two variables”. You collect two pieces of information from each subject — like a student’s height and their weight, or the temperature outside and the number of ice creams sold that day.

Each pair (x, y) becomes one point on a scatter diagram. Then you can see at a glance whether the two variables are related.

Bivariate data is everywhere — exam scores vs hours studied, age vs reaction time, advertising spend vs sales. Anytime you’re comparing how one thing relates to another, you’re looking at bivariate data.

What is a scatter diagram?

A scatter diagram is a graph of bivariate data. Each pair becomes a single dot — and the pattern of dots shows you the relationship.

Which variable goes on which axis?

Independent (explanatory)

Goes on the x-axis
The variable you control or set. Time, hours studied, temperature.
e.g. “hours of study” — you decide how many hours.

Dependent (response)

Goes on the y-axis
The variable you measure. The one that changes because of the other.
e.g. “exam score” — depends on how much you studied.
🧠

Memory trick: “x is the cause, y is the effect”

The thing you change is on the x-axis (cause). The thing that changes as a result is on the y-axis (effect). Hours studied → score. Temperature → ice creams sold.

Correlation — describing the pattern

Correlation is just a fancy word for “how do the two variables move together?” When you describe correlation, you mention two things: the type and the strength.

The three types of correlation

Positive

x goes up → y goes up. Slopes up from left to right.

Negative

x goes up → y goes down. Slopes down from left to right.

None

No clear pattern. Points scattered randomly.
Five common scatter patterns
Strong positive: close to a line, sloping up
Weak positive: scattered, but slopes up
No correlation: no clear pattern
Weak negative: scattered, but slopes down
Strong negative: close to a line, sloping down
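One way to check the direction of a trend numerically is to look at the sign of the sum of products of deviations from the means. This is a minimal sketch, not a method this note relies on: the function names `sum_of_products` and `direction` are made up for illustration, and the sign only tells you the type, not the strength. The data is taken from WE 5 further down.

```python
def sum_of_products(xs, ys):
    """Return the sum of (x - mean_x) * (y - mean_y) over all data pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def direction(xs, ys):
    """Describe the direction of the trend from the sign of the sum."""
    s_xy = sum_of_products(xs, ys)
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "no linear trend"


# Data from WE 5: TV hours per week (x) and exam score (y).
tv_hours = [5, 10, 15, 20, 25]
scores = [92, 78, 70, 62, 48]

print(sum_of_products(tv_hours, scores))  # -520.0
print(direction(tv_hours, scores))        # negative
```

A positive sum means points tend to sit above the mean of y when x is above its mean, which matches "x goes up, y goes up". A negative sum matches "x goes up, y goes down".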

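The two strengths

Strong

Points lie close to a straight line.

Weak

Points are more scattered, but still follow a general trend.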

📍

Always describe BOTH the type AND the strength

Don’t just say “positive correlation”. Say “strong positive linear correlation” or “weak negative linear correlation”. The full description gets you both marks.

Line of best fit

If the data shows strong linear correlation, you can draw a line of best fit by eye through the points. This line follows the general trend.

Key fact about the line of best fit
The line of best fit passes through the mean point (x̄, ȳ).

How to draw a line of best fit

  1. Calculate the mean of the x values and the mean of the y values (use your GDC).
  2. Plot the mean point (x̄, ȳ) on the diagram.
  3. Draw a straight line through the mean point that follows the trend of the data — try to balance the points above and below the line.
“By eye” doesn’t mean “guess wildly”. The line should pass through the mean point and have roughly the same number of points above as below it.
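In symbols, the mean point and a line through it can be written as below. This is a sketch rather than part of the original note: n is the number of data pairs, and m is the gradient you judge by eye, a symbol introduced here for illustration.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
y - \bar{y} = m\,(x - \bar{x})
```

Setting x = x̄ gives y = ȳ, so any line of this form passes through the mean point whatever gradient you choose.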

Correlation ≠ Causation

This is the most important warning in statistics: just because two variables are correlated doesn’t mean one is causing the other.

🤔 The classic example

Ice cream sales and shark attacks both go up together every summer. Strong positive correlation! But ice cream doesn’t cause shark attacks (and the reverse seems unlikely too). The real cause is that warmer weather brings more swimmers AND more ice cream buyers — a hidden third factor.

When does correlation suggest causation?

You can usually tell by thinking about the context.

📍

“Causal relationship” — what to say in exams

If asked whether two variables have a causal relationship, look at the real-world context. Is there a sensible mechanism by which one would cause the other? If yes, it’s likely causal. If they’re just two unrelated trends, it’s probably a coincidence — or some hidden third factor.

Worked examples

WE 1

Draw a scatter diagram and describe the correlation

A teacher records the hours her 9 students spent on a phone and on a computer per day:

Phone (hrs):     7.6  7.0  8.9  3.0  3.0  7.5  2.1  1.3  5.8
Computer (hrs):  1.7  1.1  0.7  5.8  5.2  1.7  6.9  7.1  3.3

(a) Draw a scatter diagram.   (b) Describe the correlation.   (c) Plot the mean point and draw a line of best fit.

Phone hours = independent (x-axis). Computer hours = dependent (y-axis).

Part (a): scatter diagram
Plot each (phone, computer) pair as a single point.
[Scatter diagram: Phone (hours) on the x-axis, Computer (hours) on the y-axis]

Part (b)
As phone hours increase, computer hours decrease. Points lie close to a downward straight line.
Strong negative linear correlation

Part (c): line of best fit
Find the means using your GDC: x̄ ≈ 5.13, ȳ ≈ 3.72
Plot the mean point (5.13, 3.72) and draw a line through it sloping down.
[Scatter diagram: same axes, with the mean point (x̄, ȳ) ≈ (5.13, 3.72) marked and the line drawn through it]

Always plot the mean point first. It’s your anchor for the line.
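If you want to check the WE 1 mean point without a GDC, a short script can do it. This is a minimal sketch: the gradient of -1 is an assumption standing in for a line judged by eye, not a value from the question, and the helper `line` is hypothetical.

```python
from statistics import mean

# Data from WE 1: hours per day on a phone (x) and on a computer (y).
phone = [7.6, 7.0, 8.9, 3.0, 3.0, 7.5, 2.1, 1.3, 5.8]
computer = [1.7, 1.1, 0.7, 5.8, 5.2, 1.7, 6.9, 7.1, 3.3]

x_bar = mean(phone)
y_bar = mean(computer)
print(round(x_bar, 2), round(y_bar, 2))  # 5.13 3.72

# Assumption: a gradient of -1 represents the line drawn by eye.
gradient = -1.0


def line(x):
    """Height of the trial line through the mean point at a given x."""
    return y_bar + gradient * (x - x_bar)


# Count how many data points fall above and below the trial line.
above = sum(1 for x, y in zip(phone, computer) if y > line(x))
below = sum(1 for x, y in zip(phone, computer) if y < line(x))
print(above, below)  # 5 4
```

With 5 points above and 4 below, this trial line is about as balanced as 9 points allow, which is what "by eye" is aiming for.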
WE 2

Identify the variables and predict correlation

For each pair, identify (i) the independent variable, (ii) the dependent variable, and (iii) the type of correlation you’d expect.

(a) Hours of revision and final exam mark.

(b) Outdoor temperature and number of hot drinks sold at a café.

(c) Number of cats owned and student’s height.

Independent = the cause / what we control. Dependent = the effect.

Part (a)
Independent: hours of revision (we control study time)
Dependent: exam mark (depends on revision)
More revision → better mark.
Positive correlation expected

Part (b)
Independent: temperature
Dependent: hot drinks sold
Hotter day → fewer hot drinks.
Negative correlation expected

Part (c)
No mechanism linking cats to height!
No correlation expected

Always think about whether there’s a real-world reason for the link.
WE 3

Describe correlation from a scatter diagram

The scatter diagram below shows the height (cm) and weight (kg) of 12 students.

[Scatter diagram: Height (cm) on the x-axis from 150 to 190, Weight (kg) on the y-axis from 40 to 80]

Describe the correlation between height and weight.

Look at the type (slope) and strength (closeness to a line).
As height increases, weight increases. Points lie very close to a straight line sloping up.
Strong positive linear correlation

Use all three words: STRONG / POSITIVE / LINEAR
WE 4

Correlation vs causation — comment on a study

A researcher finds a strong positive correlation between the number of swimming pools in a town and the number of pizza shops in that town. Does this mean swimming pools cause pizza shops?

Strong correlation does NOT automatically mean causation. Look for a hidden third factor.
Both numbers likely depend on the size of the town. Bigger towns → more pools AND more pizza shops. Town population is the hidden third factor.
No. Correlation does not imply causation.

Always check for a hidden cause when two things move together.
WE 5

Find the mean point for a line of best fit

The data shows hours of TV watched per week and exam scores for 5 students:

TV hrs (x):  5   10  15  20  25
Score (y):   92  78  70  62  48

Find the mean point (x̄, ȳ) and state the type of correlation.

n = 5. Find the mean of the x’s and the mean of the y’s, then look at the trend.
Mean of x: (5 + 10 + 15 + 20 + 25) / 5 = 75 / 5 = 15
Mean of y: (92 + 78 + 70 + 62 + 48) / 5 = 350 / 5 = 70
Type: as TV hrs ↑, score ↓ → negative
Strength: the score drops by 8 to 14 marks for each extra 5 hrs, which is fairly consistent → strong
Mean point = (15, 70)  |  Strong negative linear correlation

The line of best fit must pass through (15, 70).
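The arithmetic in WE 5 can be double-checked with a few lines of code. This is only a sketch of the same calculation: it finds the mean point and the change in score between consecutive students, which is one informal way to judge how steady the trend is.

```python
# Data from WE 5: TV hours per week (x) and exam score (y).
tv_hours = [5, 10, 15, 20, 25]
scores = [92, 78, 70, 62, 48]

n = len(tv_hours)
mean_point = (sum(tv_hours) / n, sum(scores) / n)
print(mean_point)  # (15.0, 70.0)

# Change in score for each extra 5 hours of TV.
changes = [later - earlier for earlier, later in zip(scores, scores[1:])]
print(changes)  # [-14, -8, -8, -14]
```

Every change is negative and of similar size, which supports describing the correlation as strong and negative.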

💡 Top tips

⚠ Common mistakes

Scatter diagrams give you a visual feel for correlation, but “strong” or “weak” is subjective. The next note covers Pearson’s PMCC — a single number (between −1 and 1) that measures correlation precisely. No more guessing!
