par(mfrow = c(2,2))
curve(x/(1+x), 0, 10)
curve(1 - exp(-x), -100, -90)
curve(exp(x)/(1+exp(x)), -5, 5)
curve(x * exp(x), 95, 100)
Non-linear Regression
Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)
Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University
1 Learning objectives
- Understand what is non-linear regression
- Understand how non-linear regression is performed
- Evaluate non-linear regression models
- Choose possible non-linear models
2 Principle
2.1 Definition
- Non-linear regression:
-
A model for the relationship between the dependent variable(s) and the independent variable(s) is not linear but a curve
Two categories:
1. One that can be transformed into a linear model
2. One that cannot be transformed into a linear model
Examples:
- \(y = a + be^x\): Set \(x' = e^x\), then \(y = a + bx'\).
- \(y = a + b_1 x + b_2 x^2 +...+ b_m x^m\): Set \(x_{i}=x^i,i=1,2,...,m\), then \(y=a+b_{1}x_{1}+b_{2}x_{2}+...+b_{m}x_{m}\)
- \(y=ae^{bx}\): Set \(log(y)=log(a)+bx\) and \(y'=log(y)\), then \(y'=log(a)+bx\).
- \(z=a+be^{xy}\): cannot be transformed into linear.
Name | Equation |
---|---|
Asymptotic functions | |
Michaelis-Menten | \(y=\frac{a x}{b+cx}\) or \(y=\frac{a x}{b+x}\) or… |
2-parameter asymptotic exponential | \(y=a\left(1-\mathrm{e}^{-b x}\right)\) |
3-parameter asymptotic exponential | \(y=a-b \mathrm{e}^{-c x}\) |
S-shaped functions | |
2-parameter logistic | \(y=\frac{\mathrm{e}^{a+b x}}{1+\mathrm{e}^{a+b x}}\) |
3-parameter logistic | \(y=\frac{a}{1+b \mathrm{e}^{-c x}}\) |
4-parameter logistic | \(y=a+\frac{b-a}{1+\mathrm{e}^{(c-x) / d}}\) |
Weibull | \(y=a-b \mathrm{e}^{-\left(c x^d\right)}\) |
Gompertz | \(y=a \mathrm{e}^{-b \mathrm{e}^{-c x}}\) |
Humped curves | |
Ricker curve | \(y=a x \mathrm{e}^{-b x}\) |
First-order compartment | \(y=k \exp (-\exp (a) x)-\exp (-\exp (b) x)\) |
Bell-shaped | \(y=a \exp \left(-|b x|^2\right)\) |
Biexponential | \(y=a \mathrm{e}^{b x}-c \mathrm{e}^{-d x}\) |
2.2 Model
\[y_{i}=f(x_{i}, \theta)+\varepsilon_{i}, i=1,2,...,n\]
- \(y_{i}\): The response variable.
- \(x_{i}=(x_{i1},x_{i2},...,x_{ik})'\): The explanatory variable.
- \(\theta=(\theta_{0},\theta_{1},...\theta_{p})'\): Unknown parameter vector.
- \(\varepsilon_{i}\): Random error term.
Assumptions:
- \(E(\varepsilon_{i})=0,i=1,2,...,n\): The mean value of random error term is 0
- \(cor(\varepsilon_{i},\varepsilon_{j})=0,i,j=1,2,...,n\), where \(i\neq j\): No correlation.
- \(cov(\varepsilon_{i},\varepsilon_{j})=\sigma^2, i,j=1,2,...,n\), where \(i=j\): Equal variance
- The explanatory variable is non-random variable
- \(f(x_{i},\theta)\) is assumed to be a continuous differentiable function
2.3 Methods
Non-linear least squares (NLS):
\[Q(\theta)=\sum_{i=1}^{n}(y_{i}-f(x_{i},\theta))^2\] For the minimal \(Q(\theta)\):
\[\frac{\partial Q}{\partial \theta_{j}}=-2\sum_{i=1}^{n}(y_{i}-f(x_{i},\theta))\frac{\partial f}{\partial \theta_{j}}=0\]
Partial coefficient of determination: Measure the fitting of non-linear regression model.
\[R^2=1-\frac{SSE}{SST}\]
3 Workflow
3.1 Data
Reaction rate ~ concentration
<- read.table("data/mm.txt", header = TRUE)
dtf library(ggplot2)
ggplot(dtf) + geom_point(aes(conc, rate))
3.2 Fit the Model
Michaelis-Menten model \(y=\frac{a x}{b+x}\):
- Biochemistry: the relationship between the reaction rate and the substrate concentration
- Ecology: (Holling’s disc equation) the relationship between the feeding rate of predator and the prey density.
Features:
- The curve passes through the origin.
- Rising rate of the curve diminishes with the increasing of x.
- Function has an asymptote \(y = a\).
- \(y = a/2\) when \(x = b\).
<- nls(rate ~ a * conc / (b + conc), data = dtf, start = list(a = 200, b = 0.03))
m1
m1summary(m1)
\[r = \frac{212.7c}{0.06412+c}\]
3-parameter asymptotic exponential model \(y=a-b \mathrm{e}^{-c x}\):
<- nls(rate ~ a - b * exp(-c * conc), data = dtf, start = list(a = 200, b = 150, c = 5))
m2
m2summary(m2)
\[r = 201 - 153 e^{-6.38c}\]
<- seq(0, 1.2, .01)
xv <- predict(m1, list(conc = xv))
y1 <- predict(m2, list(conc = xv))
y2 <- data.frame(xv, y1, y2)
dtf_predicted ggplot(dtf) +
geom_point(aes(conc, rate)) +
geom_line(aes(xv, y1), data = dtf_predicted, color = 'blue') +
geom_line(aes(xv, y2), data = dtf_predicted, color = 'red')
Partial coefficient of determination:
\[ R^{2}=1-\frac{SSE}{SST} \]
<- function(model) {
calc_R2 <- summary(model)
ms <- as.vector((ms[[3]])^2 * ms$df[2])
sse <- lm(dtf$rate ~ 1)
null <- as.vector(unlist(summary.aov(null)[[1]][2]))
sst 1 - sse/sst
}
calc_R2(m1)
[1] 0.9612608
calc_R2(m2)
[1] 0.9672987
3.3 Results
The Michaelis-Menten model can explain 96.1% of the total variation in the reaction rate, while the 3-parameter asymptotic exponential model can explain 96.4%.
3.4 Automatic starting values
<- nls(rate ~ SSmicmen(conc, a, b), data = dtf)
m3 summary(m3)
<- nls(rate ~ SSasympOff(conc, a, b, c), data = dtf)
m4 summary(m4)
Function | Model |
---|---|
SSasymp() |
asymptotic regression model |
SSasympOff() |
asymptotic regression model with an offset |
SSasympOrig() |
asymptotic regression model through the origin |
SSbiexp() |
biexponential model |
SSfol() |
first-order compartment model |
SSfpl() |
four-parameter logistic model |
SSgompertz() |
Gompertz growth model |
SSlogis() |
logistic model |
SSmicmen() |
Michaelis–Menten model |
SSweibull() |
Weibull growth curve model |