Statistical graphs (advanced)

Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)

Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University

1 Learning objectives

  • Criticize statistical graphs in public media and academic publications.
  • Display mathematical expressions in graphs with R functions.
  • Use proper layouts and cross-references for graphs in academic writing.

2 Bad graphs

2.1 Simple examples

What redundant information can you remove?

2.2 Cases in academic journals

  1. Roeder (1994) DNA fingerprinting: A review of the controversy (with discussion). Statistical Science 9:222-278, Figure 4
  2. Wittke-Thompson et al. (2005) Rational inferences about departures from Hardy-Weinberg equilibrium. American Journal of Human Genetics 76:967-986, Figure 1
  3. Kim et al. (2012) Higher levels of serum triglyceride and dietary carbohydrate intake are associated with smaller LDL particle size in healthy Korean women. Nutrition Research and Practice 6:120-125, Figure 1
  4. Cotter et al. (2004) Hematocrit was not validated as a surrogate endpoint for survival among epoetin-treated hemodialysis patients. Journal of Clinical Epidemiology 57:1086-1095, Figure 2

3 Graphs in writing

3.1 Maths

plot(CO2$conc, CO2$uptake, pch = 16, las = 1, 
     xlab = 'CO2 concentration', ylab = 'CO2 uptake')
help("CO2")
plot(CO2$conc, CO2$uptake, pch = 16, las = 1, 
     xlab = expression('CO'[2] * ' concentration (mL/L)'), 
     ylab = expression('CO'[2] * ' uptake (' * mu * 'mol m'^-2 * ' s'^-1 * ')'))

library(latex2exp)
plot(CO2$conc, CO2$uptake, pch = 16, las = 1, 
     xlab = TeX('CO$_2$ concentration (mL/L)'), 
     ylab = TeX('CO$_2$ uptake ($\\mu$mol m$^{-2}$ s$^{-1}$)'))
text(800, 30, expression(prod(plain(P)(X == x), x)))
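
The labels above are fixed at the time of writing. When a label should contain a value computed from the data, base R's bquote() can substitute it into a plotmath expression. The following is a minimal sketch, not part of the original examples; the names mean_uptake and mean_label are illustrative assumptions.

```r
# Sketch: embed a computed value in a plotmath label with bquote().
# `mean_uptake` and `mean_label` are illustrative names.
mean_uptake <- round(mean(CO2$uptake), 1)

plot(CO2$conc, CO2$uptake, pch = 16, las = 1,
     xlab = expression('CO'[2] * ' concentration (mL/L)'),
     ylab = expression('CO'[2] * ' uptake (' * mu * 'mol m'^-2 * ' s'^-1 * ')'))

# .(mean_uptake) is replaced by its value; the rest stays a plotmath expression.
mean_label <- bquote(bar(italic(y)) == .(mean_uptake) ~ mu * 'mol m'^-2 ~ 's'^-1)
abline(h = mean_uptake, lty = 2)   # dashed line at the mean uptake
text(800, 10, mean_label)          # label placed in the lower right of the plot
```

In plotmath, `*` joins pieces without a space and `~` joins them with a space.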

3.2 Themes with ggplot2

library(ggplot2)
ggplot(CO2) + geom_point(aes(conc, uptake)) + theme_bw()
ggplot(CO2) + geom_point(aes(conc, uptake)) + theme_test()
ggplot(CO2) + geom_point(aes(conc, uptake)) + theme_classic()
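
A built-in theme rarely matches a journal's requirements exactly. One possible approach, sketched below, is to start from a complete theme and override single elements with theme(). The particular choices (base size 12, black axis text) are assumptions for illustration, not recommendations from any specific journal.

```r
library(ggplot2)

# Start from a built-in theme and override single elements with theme().
p <- ggplot(CO2) +
  geom_point(aes(conc, uptake)) +
  theme_classic(base_size = 12) +
  theme(axis.text = element_text(colour = "black"),
        panel.grid = element_blank())
p

# Optionally make one theme the default for all later plots in the session.
theme_set(theme_bw())
```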

3.3 Size and layout

  • par()
par(mfrow = c(2, 3))
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
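
par(mfrow = ...) stays in effect for every later plot on the same device. A common pattern, sketched here with the same airquality data, is to save the previous settings and restore them afterwards; the name op is only a convention.

```r
# par() returns the previous values of the settings it changes.
op <- par(mfrow = c(1, 2))
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
par(op)  # restore the single-panel layout
```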
  • layout()
mymat <- matrix(1:6, nrow = 2)
layout(mymat)
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)

mymat <- matrix(c(1, 1:5), nrow = 2)
layout(mymat, widths = c(1, 1, 2), heights = c(1, 2))
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
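
Before drawing into a custom layout, it can help to preview where each panel will go. The sketch below reuses the layout matrix from the example above and calls layout.show() from the graphics package; n_panels is an illustrative name.

```r
# Same layout matrix as above: plot 1 spans both rows of the first column.
mymat <- matrix(c(1, 1:5), nrow = 2)
n_panels <- layout(mymat, widths = c(1, 1, 2), heights = c(1, 2))

# Draw the outlines and numbers of all panels to check the arrangement.
layout.show(n_panels)

# Return to a single-panel device before drawing other figures.
layout(1)
```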
  • The patchwork package
p1 <- ggplot(airquality) + geom_boxplot(aes(as.factor(Month), Ozone))
p2 <- ggplot(airquality) + geom_point(aes(Solar.R, Ozone))
p3 <- ggplot(airquality) + geom_histogram(aes(Ozone))
library(patchwork)
p1 + p2 + p3
p1 + p2 / p3
(p1 + p2) / p3
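
For a figure in a manuscript, the panels usually need tags and controlled relative sizes. A minimal sketch using patchwork's plot_layout() and plot_annotation() follows; the height ratio, the lowercase tags, and bins = 20 are assumptions chosen for illustration.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(airquality) + geom_boxplot(aes(as.factor(Month), Ozone))
p2 <- ggplot(airquality) + geom_point(aes(Solar.R, Ozone))
p3 <- ggplot(airquality) + geom_histogram(aes(Ozone), bins = 20)

# Two plots on top, one below; the top row is twice as tall as the bottom row.
combined <- (p1 + p2) / p3 +
  plot_layout(heights = c(2, 1)) +
  plot_annotation(tag_levels = 'a')  # tag the panels a, b, c
combined
```

Printing the combined plot reports rows removed because Ozone and Solar.R contain missing values.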

3.4 Numbering, caption, and cross-reference

  • Header:
output: 
  bookdown::pdf_book: default
  bookdown::html_document2: default
  bookdown::word_document2: default
  • Caption
```{r, fig.cap= 'A scatterplot.'}
plot(co2)
```
  • Numbering and cross-reference
```{r sctplt, fig.cap= 'A scatterplot.'}
plot(co2)
```

See Figure \@ref(fig:sctplt). 

4 Further reading

  • bookdown: Authoring Books and Technical Documents with R Markdown, Sections 2.4 and 2.6

  • 7 Tips to Combine Multiple ggplots Using Patchwork

  • Wainer H (1984) How to display data badly. The American Statistician 38:137-147

  • Carr DB, Nusser SM (1995) Converting tables to plots: A challenge from Iowa State. Statistical Computing & Statistical Graphics Newsletter 6:11-18

  • Gelman A, Pasarica C, Dodhia R (2002) Let’s practice what we preach: Turning tables into graphs. The American Statistician 56:121-130

  • Tufte E (2001) The visual display of quantitative information, 2nd edition. Graphics Press.

  • Robbins NB (2004) Creating more effective graphs. Wiley

5 Exercises

  1. Criticize the following graphs:

    1. Mykland et al. (1995) Regeneration in Markov chain samplers. Journal of the American Statistical Association 90:233-241, Figure 1

    2. Hummer et al. (2001) Role for p53 in gene induction by double-stranded RNA. J Virol 75:7774-7777, Figure 4

    3. Cawley et al. (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116:499-509, Figure 1

    4. Jorgenson et al. (2005) Ethnicity and human genetic linkage maps. American Journal of Human Genetics 76:276-290, Figure 2

  2. In an aerosol particle size distribution figure, the x-axis label is \(D_\mathrm p\ (\mu \mathrm m)\), and the y-axis label is \(\frac{\mathrm dN}{\mathrm {dlog}D_\mathrm p} (\mathrm {cm}^{-3})\):

    Write the R code for inserting the axis labels into a graph.

  3. How do you generate the following layout for plot(co2)?

  4. Insert the graph you created in Exercise 3 into an R Markdown document. Give it proper numbering and a caption, and cross-reference it in the text.