
Strategy for Modelling Functions

Every function family in the chapter has a fingerprint. Reading the wording, the scatter plot or the differences in a data table tells you which model to reach for — before you let the GDC do the regression and the prediction.

📘 What you need to know

From context to model type

The wording usually tells you everything:
  “Constant rate per unit”, “fixed cost plus per-unit charge” ⇒ linear.
  “Maximum height”, “profit peaks then drops” ⇒ quadratic.
  “S-shaped”, “volume of a box cut from card” ⇒ cubic.
  “Doubles every”, “depreciates by 8%”, “half-life”, “cools toward room temperature” ⇒ exponential (the cooling case needs a constant added, because the curve levels off at room temperature rather than at zero).
  “Tide”, “Ferris wheel”, “daylight hours”, any annual cycle ⇒ sinusoidal.
  “Inversely proportional”, “force depends on the square of distance” ⇒ variation.
Stop and name the family before writing equations.

From data to model type

When you’re handed a table, three quick tests cover the bulk of cases. Take first differences Δy: if they’re (nearly) constant, the model is linear. If not, take second differences Δ²y: constant ⇒ quadratic. If neither, take ratios yₙ₊₁/yₙ: constant ratio ⇒ exponential. A scatter plot that oscillates is sinusoidal; one that passes through the origin as a power curve is direct variation; one that hugs both axes is inverse variation. With real data the differences are rarely exactly constant, so look for “approximately” constant and let the GDC’s r² value confirm the best fit.

The six fingerprints: visual cheat-sheet
  LINEAR: y = mx + c
  QUADRATIC: y = ax² + bx + c
  CUBIC: y = ax³ + bx² + cx + d
  EXPONENTIAL: y = k·aˣ
  SINUSOIDAL: y = a sin(b(x − c)) + d
  INVERSE: y = k / xⁿ
The six shapes you’ll meet in AI HL. Identify the family first, then fit parameters from data or context.

Diagnostic at a glance
  Δy constant ⇒ linear
  Δ²y constant ⇒ quadratic
  ratio constant ⇒ exponential
  periodic ⇒ sinusoidal
  power-law shape ⇒ variation
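The three table tests can be sketched in a few lines of Python. This is only an illustration, assuming the x-values are equally spaced; the function name `classify_table` and the tolerance `tol` are made up for this sketch, and real (noisy) data would need a much looser tolerance than the default.

```python
def classify_table(ys, tol=1e-6):
    """Suggest a model family for y-values taken at equally spaced x-values."""
    if len(ys) < 4:
        raise ValueError("need at least four y-values to test second differences")

    def nearly_constant(values):
        # Spread of the values, compared with a tolerance scaled to their size
        scale = max(1.0, max(abs(v) for v in values))
        return max(values) - min(values) <= tol * scale

    first = [q - p for p, q in zip(ys, ys[1:])]
    if nearly_constant(first):
        return "linear"

    second = [q - p for p, q in zip(first, first[1:])]
    if nearly_constant(second):
        return "quadratic"

    # Ratios only make sense when no denominator is zero
    if all(y != 0 for y in ys[:-1]):
        ratios = [q / p for p, q in zip(ys, ys[1:])]
        if nearly_constant(ratios):
            return "exponential"

    return "undetermined"


print(classify_table([5, 7, 11, 17, 25]))               # quadratic
print(classify_table([80, 96, 115.2, 138.24, 165.888]))  # exponential
```

The order of the tests mirrors the text: first differences, then second differences, then ratios. Anything that fails all three is left for a scatter plot to decide.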

Fitting, predicting, critiquing

Once the model type is set, the GDC handles the rest. Enter the data into two lists, run the matching regression (LinReg for linear, QuadReg for quadratic, ExpReg for exponential, SinReg for sinusoidal, PwrReg for power), and the coefficients drop out along with r²; the closer r² is to 1, the better the fit. Use the model only within the data range you fitted it on; extrapolating far beyond is risky. Real systems hit carrying capacities, regulations change, oscillations damp out, so a single-function model is always a simplification. Be ready to write a one-line critique alongside a numerical prediction.
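To see what an exponential regression might be doing behind the screen, here is a minimal sketch that fits y = k·aᵗ by drawing a least-squares straight line through (t, ln y). Whether a particular GDC uses exactly this method is an assumption, not something this guide states; the function name `fit_exponential` is hypothetical. The data are the sales figures from WE 2.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = k * a**t by a least-squares line through (t, ln y).

    Requires every y to be positive, since ln(y) is taken.
    """
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    slope = sxy / sxx                    # equals ln(a)
    intercept = l_mean - slope * t_mean  # equals ln(k)
    return math.exp(intercept), math.exp(slope)


k, a = fit_exponential([0, 1, 2, 3, 4], [80, 96, 115.2, 138.24, 165.888])
print(round(k, 4), round(a, 4))
```

Because the WE 2 data are exactly exponential, the fitted k and a should come out at 80 and 1.2 up to rounding error; with real data they would only be approximate.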

Sense-check every prediction. Does it have the right sign? Right order of magnitude? Within a physically possible range? If a population model says 4 million whales next year, something’s wrong.

🧭 Recipe — pick a model and use it

  1. Read the context: rate-per-unit, peak-and-fall, doubling-time, oscillation, asymptote — each cues a family.
  2. Diagnose the data: first differences, second differences, ratios, or scatter-plot shape.
  3. Fit: by hand for clean cases (linear from two points, quadratic from second differences), or by GDC regression for everything else.
  4. Apply: substitute to predict, or set equal to a target and solve.
  5. Critique: state the domain over which the model is valid; flag any extrapolation; note r² if available.
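Step 4's "set equal to a target and solve" is what the GDC's numerical solver does for you. A rough sketch of the idea using bisection follows; the helper name `solve_for_target`, the model and the target of 400 are illustrative assumptions, not taken from an exam question.

```python
def solve_for_target(model, target, lo, hi, tol=1e-9):
    """Find t in [lo, hi] with model(t) = target by bisection.

    Assumes model is continuous and that model(t) - target changes sign
    somewhere on [lo, hi].
    """
    f_lo = model(lo) - target
    f_hi = model(hi) - target
    if f_lo * f_hi > 0:
        raise ValueError("target is not bracketed on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = model(mid) - target
        if f_lo * f_mid <= 0:
            hi = mid               # sign change lies in the left half
        else:
            lo, f_lo = mid, f_mid  # sign change lies in the right half
    return (lo + hi) / 2


# Hypothetical example: when does 80(1.2)^t reach 400?
t_hit = solve_for_target(lambda t: 80 * 1.2 ** t, 400, 0, 20)
print(round(t_hit, 2))  # 8.83
```

For an exponential model the same answer comes from logarithms, t = ln(400/80) / ln(1.2); the bisection version is shown because it works for any continuous model.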

Worked examples

WE 1

Match scenario to model type

Name the most appropriate model family for each:
(a) Daily ferry passenger numbers vary between 200 in winter and 1200 in summer.
(b) £400 is invested at 5% compound interest per year.
(c) The stretch of a spring is measured against the weight hanging from it.
(d) The time to complete a job depends on the number of workers assigned.
(e) A small shop’s profit rises with price up to a peak, then falls.

Match the words:
(a) annual cycle, fixed max/min ⇒ sinusoidal
(b) constant % per period ⇒ exponential growth
(c) Hooke’s law: stretch ∝ weight ⇒ linear (direct)
(d) more workers → less time ⇒ inverse variation
(e) one peak then decline ⇒ quadratic
Answer: sinusoidal · exponential · linear · inverse · quadratic
WE 2

Identify model by constant ratios

A company’s monthly sales (units) are: t = 0 → 80, t = 1 → 96, t = 2 → 115.2, t = 3 → 138.24, t = 4 → 165.888. (a) Identify the model. (b) Write the equation. (c) Predict S(10).

(a) Test first differences: 16, 19.2, 23.04, 27.65 (not constant).
Test ratios S(t+1) / S(t): 96/80 = 1.2, 115.2/96 = 1.2, 138.24/115.2 = 1.2, 165.888/138.24 = 1.2.
Constant ratio 1.2 ⇒ exponential.
(b) Initial value 80, base 1.2:
S(t) = 80(1.2)ᵗ
(c) S(10) = 80(1.2)¹⁰ ≈ 80 × 6.1917
S(10) ≈ 495 units
WE 3

Identify model by second differences

Data: x = 0, 1, 2, 3, 4 with y = 5, 7, 11, 17, 25. (a) Determine the model type. (b) Find the equation. (c) Predict y(7).

(a) First differences: 2, 4, 6, 8 (not constant).
Second differences: 2, 2, 2 (constant).
Quadratic: y = ax² + bx + c
(b) y(0) = c = 5
y(1) = a + b + 5 = 7 ⇒ a + b = 2
y(2) = 4a + 2b + 5 = 11 ⇒ 2a + b = 3
Subtract: a = 1, b = 1
y = x² + x + 5
(c) y(7) = 49 + 7 + 5
y(7) = 61
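A quicker route to the same coefficients uses the differences directly: when x runs 0, 1, 2, … the constant second difference equals 2a, the first difference y(1) − y(0) equals a + b, and c is y(0). A minimal sketch, assuming exactly that x-spacing (the function name `quadratic_from_table` is made up for this example):

```python
def quadratic_from_table(ys):
    """Return (a, b, c) for y = a*x**2 + b*x + c, given y at x = 0, 1, 2, ...

    Uses exact equality on the second differences, so it suits clean
    textbook data rather than noisy measurements.
    """
    first = [q - p for p, q in zip(ys, ys[1:])]
    second = [q - p for p, q in zip(first, first[1:])]
    if len(set(second)) != 1:
        raise ValueError("second differences are not constant")
    a = second[0] / 2   # constant second difference = 2a
    b = first[0] - a    # y(1) - y(0) = a + b
    c = ys[0]           # y(0) = c
    return a, b, c


a, b, c = quadratic_from_table([5, 7, 11, 17, 25])
print(a, b, c)               # 1.0 1.0 5
print(a * 7**2 + b * 7 + c)  # 61.0
```

This reproduces y = x² + x + 5 and y(7) = 61 from the worked solution.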
WE 4

GDC linear regression

The number of textbooks sold (y) at a school book fair was recorded against hours open (x):
(1, 15), (2, 21), (3, 24), (4, 29), (5, 36). (a) Run a linear regression to find y = mx + c. (b) Predict y after 8 hours.

(a) GDC: LinReg(ax + b) on the data.
Σx = 15, Σy = 125, Σxy = 425, Σx² = 55
m = (5·425 − 15·125) / (5·55 − 225)
m = 250/50 = 5
c = (125 − 5·15) / 5 = 50/5 = 10
y = 5x + 10
(b) y(8) = 5(8) + 10
y(8) = 50 textbooks
r² on the GDC will be close to (but not exactly) 1.
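The sums in this solution are the standard least-squares formulas, and they can be checked with a short script. This is a sketch in plain Python rather than a description of what any particular GDC does internally; `linear_regression` is a name chosen for this example.

```python
def linear_regression(xs, ys):
    """Least-squares line y = m*x + c, plus the coefficient of determination."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    c = (sy - m * sx) / n
    y_mean = sy / n
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - y_mean) ** 2 for y in ys)                   # total
    return m, c, 1 - ss_res / ss_tot


m, c, r2 = linear_regression([1, 2, 3, 4, 5], [15, 21, 24, 29, 36])
print(m, c, round(r2, 4))  # 5.0 10.0 0.9843
print(m * 8 + c)           # 50.0
```

The r² of about 0.984 puts a number on the closing remark: the fit is good, but the points do not lie exactly on the line.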
WE 5

State a sensible domain

The value of a piece of office equipment t years after purchase is modelled by V(t) = 28000 − 2400t (dollars). (a) Find V(0) and V(5). (b) For what values of t does the model give a sensible answer? (c) When does the model fail?

(a) Substitute:
V(0) = 28000 ⇒ $28,000 (purchase price)
V(5) = 28000 − 12000 = 16000
V(0) = $28,000 · V(5) = $16,000
(b) Need V ≥ 0 and t ≥ 0:
28000 − 2400t ≥ 0
t ≤ 28000/2400 ≈ 11.67
Domain: 0 ≤ t ≤ 11.67 yr
(c) Beyond t ≈ 11.67 the model gives a negative value, which is physically meaningless. A piece of equipment can’t have negative value.
WE 6

Critique an extrapolation

A biologist fits an exponential model P(t) = 200e^(0.08t) (where t is years since 2010) to fish-population data from 2010 to 2020. She uses it to predict the population in 2060. (a) Compute P(50). (b) Give two reasons the prediction may be unreliable.

(a) P(50) = 200e^(0.08 × 50) = 200e⁴ ≈ 200 × 54.60
P(50) ≈ 10,920 fish
(b) Reasons:
1. Extrapolation 40 years beyond the fitted data; the model is based on only 10 years of growth and cannot be trusted that far out.
2. Real populations face carrying capacity, food limits, predation and disease, so exponential growth can’t continue indefinitely; a logistic model would be more realistic.
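Both the calculation and the size of the extrapolation can be checked in a few lines. This is a minimal sketch; the helper `years_outside_data` is a hypothetical name introduced only to make the "how far beyond the data?" question explicit.

```python
import math


def population(t):
    """P(t) = 200 * e^(0.08 t), with t in years since 2010 (model from WE 6)."""
    return 200 * math.exp(0.08 * t)


def years_outside_data(t, t_min, t_max):
    """How far t lies outside the fitted interval [t_min, t_max]; 0 if inside."""
    if t < t_min:
        return t_min - t
    if t > t_max:
        return t - t_max
    return 0


print(round(population(50)))          # 10920
print(years_outside_data(50, 0, 10))  # 40
```

A result of 40 against a fitted interval only 10 years long is the numerical version of reason 1 in the critique.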

💡 Top tips

⚠ Common mistakes

Chapter complete — you now have the full modelling toolkit (linear, quadratic, cubic, exponential, sinusoidal, variation) plus a strategy to choose between them. Next chapter: Geometry & Trigonometry.
