Combining statistics

Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)

Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University

1 Learning objectives

Understand why we combine statistics.
Use appropriate equations for combining means and standard errors.

2 Combining means and standard errors

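When an experiment is repeated several times, each repetition gives its own mean and standard error, \(\bar x_1 \pm se_1\), \(\bar x_2 \pm se_2\), …, \(\bar x_k \pm se_k\). How can these be combined into a single overall \(\bar x \pm se\)?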

Example: two separate but similar experiments measure the rate of glucose production by liver cells.

Experiment 1: \(\bar x_1 = 4.177\), \(se_1 = 0.198\), \(n_1 = 5\);
Experiment 2: \(\bar x_2 = 5.023\), \(se_2 = 0.257\), \(n_2 = 6\);
Overall: \(n = ?\), \(\bar x = ?\), \(se = ?\)

If the raw data are available, we can pool them and compute the overall statistics directly:

# raw data from the two experiments
x1 <- c(4.802, 3.81, 4.004, 4.467, 3.8)
x2 <- c(5.404, 5.256, 4.145, 5.401, 5.622, 4.312)
n1 <- length(x1)
n2 <- length(x2)
# summary statistics of each experiment
dtf <- data.frame(n = c(n1, n2), mean = c(mean(x1), mean(x2)), sd = c(sd(x1), sd(x2)))
dtf$se <- dtf$sd / sqrt(dtf$n)

# pool the raw data and compute the overall statistics
x <- c(x1, x2)
x_bar <- mean(x)
n <- length(x)
se <- sd(x) / sqrt(n)

What if we do not know the raw data, but only the summary statistics \(n_i\), \(\bar x_i\) and \(se_i\) of each experiment? Then:

\[n = \Sigma n_i\]

\[\bar x = \frac{\Sigma n_i \bar x_i}{\Sigma n_i}\]

\[ se = \sqrt {\frac{\Sigma(n_i((n_i - 1)se_i^2 + \bar x_i^2)) - \frac{(\Sigma n_i \bar x_i)^2}{n}}{n (n-1)}}\]
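The standard-error formula follows from rebuilding the total sum of squares \(SS\) from the summary statistics. Since \(se_i = s_i/\sqrt{n_i}\), the sample variance of experiment \(i\) is \(s_i^2 = n_i se_i^2\), and:

```latex
\[
\begin{aligned}
\Sigma x^2 \text{ (experiment } i) &= (n_i - 1)s_i^2 + n_i \bar x_i^2 = n_i\left((n_i - 1)se_i^2 + \bar x_i^2\right)\\
SS &= \Sigma x^2 - \frac{(\Sigma x)^2}{n} = \Sigma\left(n_i((n_i - 1)se_i^2 + \bar x_i^2)\right) - \frac{(\Sigma n_i \bar x_i)^2}{n}\\
se^2 &= \frac{s^2}{n} = \frac{SS}{n(n - 1)}
\end{aligned}
\]
```

Taking the square root of the last line gives the combined \(se\).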

# combine the summary statistics without using the raw data
cmean <- sum(dtf$mean * dtf$n) / sum(dtf$n)
cse <- sqrt((sum(dtf$n * ((dtf$n - 1) * dtf$se^2 + dtf$mean^2)) -
               sum(dtf$n * dtf$mean)^2 / sum(dtf$n)) /
              (sum(dtf$n) * (sum(dtf$n) - 1)))
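The two formulas can be wrapped into a small reusable function for any number of experiments. The sketch below uses a hypothetical helper, `combine_stats()`, which is not part of the original notes; it takes vectors of sample sizes, means and standard errors and, as a check, is compared with the statistics of the pooled raw glucose data.

```r
# Hypothetical helper (not from the original notes): combine k experiments
# from their sample sizes, means and standard errors only.
combine_stats <- function(n, mean, se) {
  N <- sum(n)                                  # overall sample size
  m <- sum(n * mean) / N                       # weighted mean
  ss <- sum(n * ((n - 1) * se^2 + mean^2)) -   # total sum of squares
    sum(n * mean)^2 / N
  c(n = N, mean = m, se = sqrt(ss / (N * (N - 1))))
}

# Check against the pooled raw data of the glucose example
x1 <- c(4.802, 3.81, 4.004, 4.467, 3.8)
x2 <- c(5.404, 5.256, 4.145, 5.401, 5.622, 4.312)
grp <- list(x1, x2)
n <- sapply(grp, length)
m <- sapply(grp, mean)
se <- sapply(grp, function(g) sd(g) / sqrt(length(g)))
res <- combine_stats(n, m, se)
pooled <- c(x1, x2)
all.equal(unname(res["se"]), sd(pooled) / sqrt(length(pooled)))  # TRUE
```

For the glucose example it returns \(n = 11\), \(\bar x \approx 4.638\) and \(se \approx 0.207\), the same values as those obtained from the pooled raw data.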

3 Mean and standard error of sums and differences

Suppose:

Two independent variables \(p\) (mean \(\bar p\), standard error \(se_p\), sample size \(n_p\)) and \(q\) (mean \(\bar q\), standard error \(se_q\), sample size \(n_q\)).
The third variable \(x = p + q\).
The fourth variable \(y = p - q\).

Then:

\[\bar x = \bar p + \bar q\] \[\bar y = \bar p - \bar q\]

\[se_x = se_y = \sqrt{\frac{(n_p - 1)n_p se_p^2 + (n_q - 1)n_q se_q^2}{n_p + n_q - 2} \cdot \frac{n_p + n_q}{n_p n_q}} \]

Here \(n_p se_p^2 = s_p^2\) and \(n_q se_q^2 = s_q^2\) are the sample variances, so the first fraction is the pooled variance of \(p\) and \(q\).

When \(n_p = n_q = n\),

\[se_x = se_y = \sqrt{se_p^2 + se_q^2} \]

Example: A luciferase-based assay is used to quantify ATP and ADP in small tissue samples. The amount of ATP (\(q\)) is measured directly in 8 samples as \(3.25 \pm 0.14\) \(\mu\)mol g\(^{-1}\) (mean \(\pm\) se). A further 10 samples are treated with pyruvate kinase plus phosphoenolpyruvate to convert ADP quantitatively to ATP, and the total ATP (\(p\)) in these samples is \(4.56 \pm 0.29\) \(\mu\)mol g\(^{-1}\). The ADP content is therefore \(p - q\).

What are the mean and standard error of the ADP concentration?

# q (ATP) first, p (total ATP) second
x <- data.frame(mean = c(3.25, 4.56), n = c(8, 10), se = c(0.14, 0.29))
ADP <- diff(x$mean)  # p - q
# pooled standard error; n * se^2 recovers each sample variance
cse <- sqrt(sum(x$n * (x$n - 1) * x$se^2) / (sum(x$n) - 2) * sum(x$n) / prod(x$n))
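The pooled standard error of a sum or difference can be checked by simulation. The sketch below defines a hypothetical helper, `se_sum_diff()` (not part of the original notes), which converts each standard error back to a sample variance with \(s^2 = n\,se^2\), pools the two variances and multiplies by \((n_p + n_q)/(n_p n_q)\). The simulation assumes independent normal samples with equal population standard deviation; the population means, standard deviation and sample sizes are made-up values.

```r
# Hypothetical helper (not from the original notes): pooled standard error
# of p + q or p - q from sample sizes and standard errors only.
se_sum_diff <- function(n_p, se_p, n_q, se_q) {
  # convert each standard error back to a sample variance: s^2 = n * se^2
  s2_pooled <- ((n_p - 1) * n_p * se_p^2 + (n_q - 1) * n_q * se_q^2) /
    (n_p + n_q - 2)
  sqrt(s2_pooled * (n_p + n_q) / (n_p * n_q))
}

# Simulation with made-up population values (equal variances assumed)
set.seed(1)
n_p <- 10; n_q <- 8; sigma <- 0.5
reps <- 10000
d <- numeric(reps)       # simulated differences of means
se_hat <- numeric(reps)  # formula applied to each simulated pair of samples
for (r in seq_len(reps)) {
  p <- rnorm(n_p, mean = 4.5, sd = sigma)
  q <- rnorm(n_q, mean = 3.2, sd = sigma)
  d[r] <- mean(p) - mean(q)
  se_hat[r] <- se_sum_diff(n_p, sd(p) / sqrt(n_p), n_q, sd(q) / sqrt(n_q))
}
# observed spread, average formula value, and theoretical value (about 0.237)
c(simulated = sd(d), formula = mean(se_hat), theory = sigma * sqrt(1 / n_p + 1 / n_q))
```

The three numbers should agree closely. The same function can be applied directly to the ATP/ADP summary statistics.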

4 Mean and standard error of ratios and products

Suppose:

Two independent variables \(p\) (mean \(\bar p\), standard error \(se_p\), sample size \(n_p\)) and \(q\) (mean \(\bar q\), standard error \(se_q\), sample size \(n_q\)).
The third variable \(x = p \cdot q\).
The fourth variable \(y = p/q\).

Then:

\[\bar x = \bar p \cdot \bar q\]

\[se_x = \sqrt{\bar q^2 se_p^2 + \bar p^2 se_q^2 + se_p^2 se_q^2} \]

\[\bar y = \bar p / \bar q\]

\[se_y = \frac{1}{\bar q} \sqrt{se_p^2 + se_q^2\left(\frac{\bar p}{\bar q}\right)^2}\]

The expression for \(se_y\) is a first-order approximation; it works best when \(se_q\) is small relative to \(\bar q\).

Example: In the previous example, we obtained the concentrations of ATP and ADP in the tissue samples. What are the ratio [ATP]/[ADP] and its standard error?

# p (ATP) first, q (ADP) second; ADP and cse come from the previous example
x <- data.frame(mean = c(3.25, ADP), se = c(0.14, cse))
ratiox <- x$mean[1] / x$mean[2]
cse_ratio <- sqrt(x$se[1]^2 + x$se[2]^2 * (x$mean[1] / x$mean[2])^2) / x$mean[2]
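The standard errors of a product and a ratio of two independent means can also be checked by simulation. In the sketch below, `se_product()` and `se_ratio()` are hypothetical helpers (not part of the original notes); `se_ratio()` uses the first-order (delta-method) approximation, and the population means, standard deviations and sample sizes are made-up assumptions chosen so that \(se_q\) is small relative to \(\bar q\).

```r
# Hypothetical helpers (not from the original notes), assuming p and q are
# independent; se_ratio is the first-order (delta-method) approximation.
se_product <- function(p, se_p, q, se_q) {
  sqrt(q^2 * se_p^2 + p^2 * se_q^2 + se_p^2 * se_q^2)
}
se_ratio <- function(p, se_p, q, se_q) {
  sqrt(se_p^2 + se_q^2 * (p / q)^2) / q
}

# Simulation with made-up population values
set.seed(42)
n_p <- 10; n_q <- 8
mu_p <- 4; sd_p <- 0.4
mu_q <- 2; sd_q <- 0.2
reps <- 20000
pbar <- replicate(reps, mean(rnorm(n_p, mu_p, sd_p)))  # simulated means of p
qbar <- replicate(reps, mean(rnorm(n_q, mu_q, sd_q)))  # simulated means of q

# true standard errors of the two means
se_p <- sd_p / sqrt(n_p)
se_q <- sd_q / sqrt(n_q)
c(simulated = sd(pbar * qbar), formula = se_product(mu_p, se_p, mu_q, se_q))
c(simulated = sd(pbar / qbar), formula = se_ratio(mu_p, se_p, mu_q, se_q))
```

The simulated and formula values should agree to within a few percent here. When \(se_q\) is a large fraction of \(\bar q\), the first-order ratio approximation becomes less reliable.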