Non-linear Regression

Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)

Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University

1 Learning objectives

  1. Understand what non-linear regression is
  2. Understand how non-linear regression is performed
  3. Evaluate non-linear regression models
  4. Choose suitable non-linear models

2 Principle

2.1 Definition

Non-linear regression:

A regression model in which the relationship between the dependent variable(s) and the independent variable(s) is not a straight line but a curve.

Two categories:

1. One that can be transformed into a linear model

2. One that cannot be transformed into a linear model

Examples:

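  1. $y = a + b e^{x}$: Set $x' = e^{x}$, then $y = a + b x'$.
  2. $y = a + b_1 x + b_2 x^2 + \ldots + b_m x^m$: Set $x_i = x^i$, $i = 1, 2, \ldots, m$, then $y = a + b_1 x_1 + b_2 x_2 + \ldots + b_m x_m$.
  3. $y = a e^{bx}$: Take logarithms, $\log(y) = \log(a) + bx$, and set $y' = \log(y)$, then $y' = \log(a) + bx$.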
  4. $z = a + b e^{xy}$: cannot be transformed into a linear model.
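
As an illustration of example 3, the sketch below fits $y = a e^{bx}$ with an ordinary linear model after a log transformation. The data are simulated, and the parameter values (`a_true`, `b_true`) and the noise level are assumptions chosen only for the demonstration.

```r
# Simulated data (hypothetical values): y = a * exp(b * x) with multiplicative noise
set.seed(1)
a_true <- 2
b_true <- 0.5
x <- seq(0, 5, by = 0.25)
y <- a_true * exp(b_true * x) * exp(rnorm(length(x), sd = 0.05))

# Linearised form: log(y) = log(a) + b * x
fit_lin <- lm(log(y) ~ x)

# Back-transform the intercept to recover a; the slope estimates b directly
a_hat <- exp(coef(fit_lin)[["(Intercept)"]])
b_hat <- coef(fit_lin)[["x"]]
c(a = a_hat, b = b_hat)
```

Note that the log transformation also changes the error structure: least squares on $\log(y)$ assumes multiplicative errors on the original scale.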
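
| Category | Name | Equation |
|---|---|---|
| Asymptotic functions | Michaelis-Menten | $y = \frac{ax}{b + cx}$ or $y = \frac{ax}{b + x}$ or … |
| Asymptotic functions | 2-parameter asymptotic exponential | $y = a(1 - e^{-bx})$ |
| Asymptotic functions | 3-parameter asymptotic exponential | $y = a - b e^{-cx}$ |
| S-shaped functions | 2-parameter logistic | $y = \frac{e^{a + bx}}{1 + e^{a + bx}}$ |
| S-shaped functions | 3-parameter logistic | $y = \frac{a}{1 + b e^{-cx}}$ |
| S-shaped functions | 4-parameter logistic | $y = a + \frac{b - a}{1 + e^{(c - x)/d}}$ |
| S-shaped functions | Weibull | $y = a - b e^{-(c x^{d})}$ |
| S-shaped functions | Gompertz | $y = a e^{-b e^{-cx}}$ |
| Humped curves | Ricker curve | $y = a x e^{-bx}$ |
| Humped curves | First-order compartment | $y = k \exp(-\exp(a) x) - \exp(-\exp(b) x)$ |
| Humped curves | Bell-shaped | $y = a \exp(-\lvert bx \rvert^{2})$ |
| Humped curves | Biexponential | $y = a e^{bx} - c e^{-dx}$ |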
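
Example shapes (two asymptotic, one S-shaped, one humped):

par(mfrow = c(2, 2))                   # 2 x 2 grid of panels
curve(x / (1 + x), 0, 10)              # Michaelis-Menten with a = b = 1
curve(1 - exp(-x), 0, 10)              # 2-parameter asymptotic exponential
curve(exp(x) / (1 + exp(x)), -5, 5)    # 2-parameter logistic
curve(x * exp(-x), 0, 10)              # Ricker curve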

2.2 Model

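$$y_i = f(x_i, \theta) + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

  • $y_i$: The response variable.
  • $x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$: The explanatory variables.
  • $\theta = (\theta_0, \theta_1, \ldots, \theta_p)$: Unknown parameter vector.
  • $\varepsilon_i$: Random error term.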

Assumptions:

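  • $E(\varepsilon_i) = 0$, $i = 1, 2, \ldots, n$: The mean of the random error term is 0.
  • $\operatorname{cor}(\varepsilon_i, \varepsilon_j) = 0$, $i, j = 1, 2, \ldots, n$, where $i \neq j$: The errors are uncorrelated.
  • $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = \sigma^2$, $i, j = 1, 2, \ldots, n$, where $i = j$: The errors have equal variance.
  • The explanatory variable is a non-random variable.
  • $f(x_i, \theta)$ is assumed to be a continuously differentiable function.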
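
The first three assumptions can be checked informally on the residuals of a fitted model. The sketch below is one possible way to do this. It uses the treated subset of R's built-in `Puromycin` data set as a stand-in (an assumption; it is not the file used later in these notes) and a self-starting Michaelis-Menten fit.

```r
# Stand-in data: treated subset of the built-in Puromycin data set
dat <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ SSmicmen(conc, a, b), data = dat)

res <- residuals(fit)

# E(eps) = 0: the mean residual should be small relative to the spread
mean(res)

# Equal variance: the spread of residuals should not change with the fitted value
plot(fitted(fit), res, xlab = "Fitted rate", ylab = "Residual")
abline(h = 0, lty = 2)

# No correlation: lag-1 correlation of the residuals (in data order) should be near 0
cor(res[-1], res[-length(res)])
```

These are rough visual and numerical checks, not formal tests.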

2.3 Methods

Non-linear least squares (NLS):

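$$Q(\theta) = \sum_{i=1}^{n} \left( y_i - f(x_i, \theta) \right)^2$$

For the minimal $Q(\theta)$:

$$\frac{\partial Q}{\partial \theta_j} = -2 \sum_{i=1}^{n} \left( y_i - f(x_i, \theta) \right) \frac{\partial f}{\partial \theta_j} = 0, \quad j = 0, 1, \ldots, p$$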
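
To make the criterion concrete, the sketch below writes $Q(\theta)$ as an R function for a Michaelis-Menten curve and minimises it with the general-purpose optimiser `optim()`, then compares the result with `nls()`. The data are simulated; the true parameter values, noise level, starting guess and `parscale` setting are all assumptions made for the demonstration.

```r
# Simulated Michaelis-Menten data (hypothetical parameter values)
set.seed(42)
x <- rep(c(0.02, 0.06, 0.11, 0.22, 0.56, 1.10), each = 2)
y <- 200 * x / (0.06 + x) + rnorm(length(x), sd = 5)

# f(x, theta) and the least-squares criterion Q(theta)
f <- function(x, theta) theta[1] * x / (theta[2] + x)
Q <- function(theta) sum((y - f(x, theta))^2)

# Minimise Q numerically from a rough starting guess (Nelder-Mead by default);
# parscale puts the two parameters on comparable scales
opt <- optim(c(150, 0.1), Q, control = list(parscale = c(100, 0.1)))

# nls() solves the same problem, by default with a Gauss-Newton algorithm
fit <- nls(y ~ a * x / (b + x), start = list(a = 150, b = 0.1))

rbind(optim = opt$par, nls = coef(fit))
```

The two rows should agree closely, since both approaches minimise the same sum of squares.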

Partial coefficient of determination: measures the goodness of fit of a non-linear regression model.

$$R^2 = 1 - \frac{SSE}{SST}$$
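
For reference, the two sums of squares in this ratio are conventionally defined as below. The symbols $\hat{\theta}$ (the NLS estimate) and $\bar{y}$ (the mean response) are notation introduced here for the illustration.

```latex
\[
SSE = \sum_{i=1}^{n} \left( y_i - f(x_i, \hat{\theta}) \right)^2 = Q(\hat{\theta}),
\qquad
SST = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
\]
```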

3 Workflow

3.1 Data

Reaction rate ~ concentration

dtf <- read.table("data/mm.txt", header = TRUE)
library(ggplot2)
ggplot(dtf) + geom_point(aes(conc, rate))

3.2 Fit the Model

Michaelis-Menten model $y = \frac{ax}{b + x}$:

  • Biochemistry: the relationship between the reaction rate and the substrate concentration
  • Ecology: (Holling’s disc equation) the relationship between the feeding rate of predator and the prey density.

Features:

  1. The curve passes through the origin.
  2. The slope of the curve decreases as $x$ increases.
  3. The function has a horizontal asymptote $y = a$.
  4. $y = a/2$ when $x = b$.
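
Each of these features follows directly from the model equation. A short derivation, assuming $a > 0$, $b > 0$ and $x \ge 0$:

```latex
\begin{align*}
y(0) &= \frac{a \cdot 0}{b + 0} = 0
  && \text{(passes through the origin)} \\
\frac{dy}{dx} &= \frac{a(b + x) - ax}{(b + x)^2} = \frac{ab}{(b + x)^2} > 0
  && \text{(increasing, with a slope that shrinks as } x \text{ grows)} \\
\lim_{x \to \infty} \frac{ax}{b + x} &= \lim_{x \to \infty} \frac{a}{b/x + 1} = a
  && \text{(asymptote } y = a \text{)} \\
y(b) &= \frac{ab}{b + b} = \frac{a}{2}
  && \text{(half of the asymptote at } x = b \text{)}
\end{align*}
```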
m1 <- nls(rate ~ a * conc / (b + conc), data = dtf, start = list(a = 200, b = 0.03))
m1
summary(m1)

$$r = \frac{212.7\,c}{0.06412 + c}$$
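
The features of the curve suggest one way to pick starting values from the data: the largest observed rate approximates $a$, and the concentration at which the rate is about half of that approximates $b$. The sketch below applies this idea to the treated subset of R's built-in `Puromycin` data set, used here as a stand-in for the data file (an assumption). The names `a0` and `b0` are hypothetical.

```r
# Stand-in data: treated subset of the built-in Puromycin data set
dat <- subset(Puromycin, state == "treated")

# a is the asymptote, so the largest observed rate is a rough guess
a0 <- max(dat$rate)

# b is the concentration at which the rate reaches a / 2:
# take the observed concentration whose rate is closest to a0 / 2
b0 <- dat$conc[which.min(abs(dat$rate - a0 / 2))]

fit <- nls(rate ~ a * conc / (b + conc), data = dat,
           start = list(a = a0, b = b0))
coef(fit)
```

Data-driven guesses like these are usually close enough for `nls()` to converge, but they are a heuristic rather than a guarantee.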

3-parameter asymptotic exponential model $y = a - b e^{-cx}$:

m2 <- nls(rate ~ a - b * exp(-c * conc), data = dtf, start = list(a = 200, b = 150, c = 5))
m2
summary(m2)

$$r = 201 - 153 e^{-6.38 c}$$

xv <- seq(0, 1.2, .01)
y1 <- predict(m1, list(conc = xv))
y2 <- predict(m2, list(conc = xv))
dtf_predicted <- data.frame(xv, y1, y2)
ggplot(dtf) + 
  geom_point(aes(conc, rate)) +
  geom_line(aes(xv, y1), data = dtf_predicted, color = 'blue') + 
  geom_line(aes(xv, y2), data = dtf_predicted, color = 'red')

Partial coefficient of determination:

$$R^2 = 1 - \frac{SSE}{SST}$$

calc_R2 <- function(model, y = dtf$rate) {
  sse <- sum(residuals(model)^2)   # residual (error) sum of squares
  sst <- sum((y - mean(y))^2)      # total sum of squares about the mean
  1 - sse / sst
}

calc_R2(m1)
[1] 0.9612608
calc_R2(m2)
[1] 0.9672987

3.3 Results

The Michaelis-Menten model explains 96.1% of the total variation in the reaction rate, and the 3-parameter asymptotic exponential model explains 96.7%.

3.4 Automatic starting values

m3 <- nls(rate ~ SSmicmen(conc, a, b), data = dtf)
summary(m3)

m4 <- nls(rate ~ SSasympOff(conc, a, b, c), data = dtf)
summary(m4)

| Function | Model |
|---|---|
| SSasymp() | asymptotic regression model |
| SSasympOff() | asymptotic regression model with an offset |
| SSasympOrig() | asymptotic regression model through the origin |
| SSbiexp() | biexponential model |
| SSfol() | first-order compartment model |
| SSfpl() | four-parameter logistic model |
| SSgompertz() | Gompertz growth model |
| SSlogis() | logistic model |
| SSmicmen() | Michaelis–Menten model |
| SSweibull() | Weibull growth curve model |
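
To see which starting values a self-starting model supplies, `getInitial()` can be called directly. The sketch below does this for `SSmicmen()` and passes the result to an ordinary `nls()` call. It again uses the treated subset of the built-in `Puromycin` data set as a stand-in for the data file (an assumption).

```r
# Stand-in data: treated subset of the built-in Puromycin data set
dat <- subset(Puromycin, state == "treated")

# Starting values computed by the self-starting Michaelis-Menten model
start_vals <- getInitial(rate ~ SSmicmen(conc, a, b), data = dat)
start_vals

# The same values can be supplied explicitly to a hand-written model formula
fit <- nls(rate ~ a * conc / (b + conc), data = dat,
           start = as.list(start_vals))
coef(fit)
```

This can be useful when a self-starting function exists for a close relative of the model to be fitted, so that its initial values serve as a starting point.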

4 Further readings