Non-linear Regression

Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)

Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University

1 Learning objectives

  1. Understand what is non-linear regression
  2. Understand how non-linear regression is performed
  3. Evaluate non-linear regression models
  4. Choose possible non-linear models

2 Principle

2.1 Definition

Non-linear regression:

A model for the relationship between the dependent variable(s) and the independent variable(s) is not linear but a curve

Two categories:

1. One that can be transformed into a linear model

2. One that cannot be transformed into a linear model

Examples:

  1. \(y = a + be^x\): Set \(x' = e^x\), then \(y = a + bx'\).
  2. \(y = a + b_1 x + b_2 x^2 +...+ b_m x^m\): Set \(x_{i}=x^i,i=1,2,...,m\), then \(y=a+b_{1}x_{1}+b_{2}x_{2}+...+b_{m}x_{m}\)
  3. \(y=ae^{bx}\): Set \(log(y)=log(a)+bx\) and \(y'=log(y)\), then \(y'=log(a)+bx\).
  4. \(z=a+be^{xy}\): cannot be transformed into linear.
Name Equation
Asymptotic functions
Michaelis-Menten \(y=\frac{a x}{b+cx}\) or \(y=\frac{a x}{b+x}\) or…
2-parameter asymptotic exponential \(y=a\left(1-\mathrm{e}^{-b x}\right)\)
3-parameter asymptotic exponential \(y=a-b \mathrm{e}^{-c x}\)
S-shaped functions
2-parameter logistic \(y=\frac{\mathrm{e}^{a+b x}}{1+\mathrm{e}^{a+b x}}\)
3-parameter logistic \(y=\frac{a}{1+b \mathrm{e}^{-c x}}\)
4-parameter logistic \(y=a+\frac{b-a}{1+\mathrm{e}^{(c-x) / d}}\)
Weibull \(y=a-b \mathrm{e}^{-\left(c x^d\right)}\)
Gompertz \(y=a \mathrm{e}^{-b \mathrm{e}^{-c x}}\)
Humped curves
Ricker curve \(y=a x \mathrm{e}^{-b x}\)
First-order compartment \(y=k \exp (-\exp (a) x)-\exp (-\exp (b) x)\)
Bell-shaped \(y=a \exp \left(-|b x|^2\right)\)
Biexponential \(y=a \mathrm{e}^{b x}-c \mathrm{e}^{-d x}\)
par(mfrow = c(2,2))
curve(x/(1+x), 0, 10)
curve(1 - exp(-x), -100, -90)
curve(exp(x)/(1+exp(x)), -5, 5)
curve(x * exp(x), 95, 100)

2.2 Model

\[y_{i}=f(x_{i}, \theta)+\varepsilon_{i}, i=1,2,...,n\]

  • \(y_{i}\): The response variable.
  • \(x_{i}=(x_{i1},x_{i2},...,x_{ik})'\): The explanatory variable.
  • \(\theta=(\theta_{0},\theta_{1},...\theta_{p})'\): Unknown parameter vector.
  • \(\varepsilon_{i}\): Random error term.

Assumptions:

  • \(E(\varepsilon_{i})=0,i=1,2,...,n\): The mean value of random error term is 0
  • \(cor(\varepsilon_{i},\varepsilon_{j})=0,i,j=1,2,...,n\), where \(i\neq j\): No correlation.
  • \(cov(\varepsilon_{i},\varepsilon_{j})=\sigma^2, i,j=1,2,...,n\), where \(i=j\): Equal variance
  • The explanatory variable is non-random variable
  • \(f(x_{i},\theta)\) is assumed to be a continuous differentiable function

2.3 Methods

Non-linear least squares (NLS):

\[Q(\theta)=\sum_{i=1}^{n}(y_{i}-f(x_{i},\theta))^2\] For the minimal \(Q(\theta)\):

\[\frac{\partial Q}{\partial \theta_{j}}=-2\sum_{i=1}^{n}(y_{i}-f(x_{i},\theta))\frac{\partial f}{\partial \theta_{j}}=0\]

Partial coefficient of determination: Measure the fitting of non-linear regression model.

\[R^2=1-\frac{SSE}{SST}\]

3 Workflow

3.1 Data

Reaction rate ~ concentration

dtf <- read.table("data/mm.txt", header = TRUE)
library(ggplot2)
ggplot(dtf) + geom_point(aes(conc, rate))

3.2 Fit the Model

Michaelis-Menten model \(y=\frac{a x}{b+x}\):

  • Biochemistry: the relationship between the reaction rate and the substrate concentration
  • Ecology: (Holling’s disc equation) the relationship between the feeding rate of predator and the prey density.

Features:

  1. The curve passes through the origin.
  2. Rising rate of the curve diminishes with the increasing of x.
  3. Function has an asymptote \(y = a\).
  4. \(y = a/2\) when \(x = b\).
m1 <- nls(rate ~ a * conc / (b + conc), data = dtf, start = list(a = 200, b = 0.03))
m1
summary(m1)

\[r = \frac{212.7c}{0.06412+c}\]

3-parameter asymptotic exponential model \(y=a-b \mathrm{e}^{-c x}\):

m2 <- nls(rate ~ a - b * exp(-c * conc), data = dtf, start = list(a = 200, b = 150, c = 5))
m2
summary(m2)

\[r = 201 - 153 e^{-6.38c}\]

xv <- seq(0, 1.2, .01)
y1 <- predict(m1, list(conc = xv))
y2 <- predict(m2, list(conc = xv))
dtf_predicted <- data.frame(xv, y1, y2)
ggplot(dtf) + 
  geom_point(aes(conc, rate)) +
  geom_line(aes(xv, y1), data = dtf_predicted, color = 'blue') + 
  geom_line(aes(xv, y2), data = dtf_predicted, color = 'red')

Partial coefficient of determination:

\[ R^{2}=1-\frac{SSE}{SST} \]

calc_R2 <- function(model) {
  ms <- summary(model)
  sse <- as.vector((ms[[3]])^2 * ms$df[2])
  null <- lm(dtf$rate ~ 1)
  sst <- as.vector(unlist(summary.aov(null)[[1]][2]))
  1 - sse/sst
}

calc_R2(m1)
[1] 0.9612608
calc_R2(m2)
[1] 0.9672987

3.3 Results

The Michaelis-Menten model can explain 96.1% of the total variation in the reaction rate, while the 3-parameter asymptotic exponential model can explain 96.4%.

3.4 Automatic starting values

m3 <- nls(rate ~ SSmicmen(conc, a, b), data = dtf)
summary(m3)

m4 <- nls(rate ~ SSasympOff(conc, a, b, c), data = dtf)
summary(m4)
Function Model
SSasymp() asymptotic regression model
SSasympOff() asymptotic regression model with an offset
SSasympOrig() asymptotic regression model through the origin
SSbiexp() biexponential model
SSfol() first-order compartment model
SSfpl() four-parameter logistic model
SSgompertz() Gompertz growth model
SSlogis() logistic model
SSmicmen() Michaelis–Menten model
SSweibull() Weibull growth curve model

4 Further readings