plot(CO2$conc, CO2$uptake, pch = 16, las = 1,
xlab = 'CO2 concentration', ylab = 'CO2 uptake')
help("CO2")
plot(CO2$conc, CO2$uptake, pch = 16, las = 1,
xlab = expression('CO'[2] * ' concentration (mL/L)'),
ylab = expression('CO'[2] * ' uptake (' *mu * 'mol m'^-2 * 's'^-1 * ')'))
library(latex2exp)
plot(CO2$conc, CO2$uptake, pch = 16, las = 1,
xlab = TeX('CO$_2$ concentration (mL/L)'),
ylab = TeX('CO$_2$ uptake ($\\mu$mol m$^{-2}$ s$^{-1}$)'))
text(800, 30, expression(prod(plain(P)(X == x), x)))
Statistical graphs (advanced)
Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)
Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University
1 Learning objectives
- Criticize statistical graphs in public media and academic publications.
- Display maths expressions in graphs with R functions.
- Use proper layouts and cross-references for graphs in academic writing.
2 Bad graphs
2.1 Simple examples
What redundant information can you remove?
2.2 Cases in academic journals
- Roeder (1994) DNA fingerprinting: A review of the controversy (with discussion). Statistical Science 9:222-278, Figure 4
- Wittke-Thompson et al. (2005) Rational inferences about departures from Hardy-Weinberg equilibrium. American Journal of Human Genetics 76:967-986, Figure 1
- Kim et al. (2012) Higher levels of serum triglyceride and dietary carbohydrate intake are associated with smaller LDL particle size in healthy Korean women. Nutrition Research and Practice 6:120-125, Figure 1
- Cotter et al. (2004) Hematocrit was not validated as a surrogate endpoint for survival among epoetin-treated hemodialysis patients. Journal of Clinical Epidemiology 57:1086-1095, Figure 2
3 Graphs in writing
3.1 Maths
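When a label needs a computed value next to mathematical notation, base R's bquote() can splice that value into a plotmath expression. The following is a minimal sketch rather than part of the original notes: the linear fit and the object names fit and r2 are assumptions chosen only to have a number to display.

```r
# Illustrative only: a straight line is a poor model for these data,
# but it provides a value to show in the annotation.
fit <- lm(uptake ~ conc, data = CO2)
r2 <- round(summary(fit)$r.squared, 2)

plot(CO2$conc, CO2$uptake, pch = 16, las = 1,
     xlab = expression('CO'[2] * ' concentration (mL/L)'),
     ylab = expression('CO'[2] * ' uptake (' * mu * 'mol m'^-2 * ' s'^-1 * ')'))
abline(fit)
# .(r2) is replaced by the value of r2; the rest stays a plotmath expression
text(800, 10, bquote(italic(R)^2 == .(r2)))
```

Inside bquote(), anything wrapped in .() is evaluated first, so the same pattern should work for other computed annotations such as sample sizes or slopes.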
3.2 Themes with ggplot2
library(ggplot2)
ggplot(CO2) + geom_point(aes(conc, uptake)) + theme_bw()
ggplot(CO2) + geom_point(aes(conc, uptake)) + theme_test()
ggplot(CO2) + geom_point(aes(conc, uptake)) + theme_classic()
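The complete themes above can be combined with mathematical axis labels and fine-tuned further. The sketch below is one possible arrangement, not the only one: the object name p and the base_size value of 14 are assumptions for illustration.

```r
library(ggplot2)

# Store the plot once so that different themes can be tried on it
p <- ggplot(CO2) +
  geom_point(aes(conc, uptake)) +
  labs(x = expression('CO'[2] * ' concentration (mL/L)'),
       y = expression('CO'[2] * ' uptake (' * mu * 'mol m'^-2 * ' s'^-1 * ')'))

# base_size sets the reference font size (in points) for the whole theme
p + theme_bw(base_size = 14)

# theme() adjusts individual elements on top of a complete theme
p + theme_bw() + theme(panel.grid.minor = element_blank())
```

Keeping the data layers in p and adding the theme last makes it easy to switch themes to suit a journal's style without touching the rest of the plot.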
3.3 Size and layout
- The par() function
par(mfrow = c(2, 3))
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
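Settings changed with par() stay in effect for later plots on the same device, so it is often convenient to save and restore them. A minimal sketch, assuming a two-panel figure; the name op and the margin values are illustrative choices.

```r
# par() returns the previous values of the settings it changes
op <- par(mfrow = c(1, 2), mar = c(4, 4, 1, 1))
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R, main = '')
# Restore the earlier settings so that later plots use a single panel again
par(op)
```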
- The layout() function
mymat <- matrix(1:6, nrow = 2)
layout(mymat)
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
mymat <- matrix(c(1, 1:5), nrow = 2)
layout(mymat, widths = c(1, 1, 2), heights = c(1, 2))
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
barplot(airquality$Month)
plot(airquality$Solar.R, airquality$Ozone)
hist(airquality$Solar.R)
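Before drawing, it can help to preview how a layout matrix divides the device. The sketch below reuses the matrix from the last example, in which panel 1 fills both rows of the first column; the name nf is an assumption for illustration.

```r
mymat <- matrix(c(1, 1:5), nrow = 2)
mymat  # the first column holds 1 in both rows, so panel 1 spans both rows

# layout() returns the number of panels it has defined
nf <- layout(mymat, widths = c(1, 1, 2), heights = c(1, 2))
# Draw the outline and the number of each panel
layout.show(nf)
# Return to a single panel
layout(1)
```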
- The patchwork package
p1 <- ggplot(airquality) + geom_boxplot(aes(as.factor(Month), Ozone))
p2 <- ggplot(airquality) + geom_point(aes(Solar.R, Ozone))
p3 <- ggplot(airquality) + geom_histogram(aes(Ozone))
library(patchwork)
p1 + p2 + p3
p1 + p2 / p3
(p1 + p2) / p3
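For a figure in a manuscript, the combined plot usually also needs panel tags and a fixed size. The following sketch uses patchwork's plot_layout() and plot_annotation() together with ggplot2's ggsave(); the object name combined, the file name, and the 7 by 5 inch size are assumptions for illustration.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(airquality) + geom_boxplot(aes(as.factor(Month), Ozone))
p2 <- ggplot(airquality) + geom_point(aes(Solar.R, Ozone))
p3 <- ggplot(airquality) + geom_histogram(aes(Ozone))

combined <- (p1 + p2) / p3 +
  plot_layout(heights = c(1, 2)) +    # bottom row twice as tall as the top row
  plot_annotation(tag_levels = 'a')   # tag the panels a, b, c

# Hypothetical file name; width and height are in inches by default
out_file <- file.path(tempdir(), 'airquality-panels.pdf')
ggsave(out_file, combined, width = 7, height = 5)
```

Fixing the width and height when saving keeps the font size predictable relative to the page, which is harder to control when a figure is resized afterwards in a word processor.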
4 Further readings
bookdown: Authoring Books and Technical Documents with R Markdown, Sections 2.4 and 2.6
Wainer H (1984) How to display data badly. The American Statistician 38:137-147
Carr DB, Nusser SM (1995) Converting tables to plots: A challenge from Iowa State. Statistical Computing & Statistical Graphics Newsletter 6:11-18
Gelman A, Pasarica C, Dodhia R (2002) Let’s practice what we preach: Turning tables into graphs. The American Statistician 56:121-130
Tufte E (2001) The visual display of quantitative information, 2nd edition. Graphics Press.
Robbins NB (2004) Creating more effective graphs. Wiley
5 Exercises
1. Criticize the following graphs:
   - Mykland et al. (1995) Regeneration in Markov chain samplers. Journal of the American Statistical Association 90:233-241, Figure 1
   - Hummer et al. (2001) Role for p53 in gene induction by double-stranded RNA. J Virol 75:7774-7777, Figure 4
   - Cawley et al. (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116:499-509, Figure 1
   - Jorgenson et al. (2005) Ethnicity and human genetic linkage maps. American Journal of Human Genetics 76:276-290, Figure 2
2. In an aerosol particle size distribution figure, the x-axis label is \(D_\mathrm{p}\ (\mu\mathrm{m})\), and the y-axis label is \(\frac{\mathrm{d}N}{\mathrm{d}\log D_\mathrm{p}}\ (\mathrm{cm}^{-3})\). Write the R code for inserting the axis labels into a graph.
3. How do you generate the following layout for plot(co2)?
4. Insert the graph you create in Exercise 3 into an R Markdown document. Give it proper numbering and a caption. Cross-reference it in the text.