17-3-7

ggplot2 is part of the tidyverse

library(ggplot2)

or

library(tidyverse)

Alternatives

  • base R
  • grid
  • lattice

Grammar of graphics

ggplot2 implements a grammar of graphics. By analogy with language:

  • Construct hundreds of different sentences by learning just a handful of verbs, nouns and adjectives without having to memorize each specific sentence.

Strength and limitation

  • One reason ggplot2 is easier for beginners is that its default behavior is carefully chosen to satisfy the great majority of cases and is visually pleasing.

  • One limitation is that ggplot2 is designed to work exclusively with data tables in which rows are observations and columns are variables.
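As a rough illustration of the table shape ggplot2 expects, the sketch below reshapes a small wide table into one row per observation. The data frame `wide` and its values are made up for this example, and the use of tidyr's `pivot_longer` is an assumption about how one might do the reshaping, not something the slides prescribe.

```r
library(tidyr)

# Hypothetical wide table: one row per state, one column per year.
wide <- data.frame(
  state = c("A", "B"),
  `2009` = c(10, 20),
  `2010` = c(12, 18),
  check.names = FALSE
)

# Reshape so that each row is one observation (a state in a year)
# and each column is one variable (state, year, total).
tidy <- pivot_longer(wide, cols = -state,
                     names_to = "year", values_to = "total")
tidy
```

After reshaping, `year` and `total` are ordinary columns, so they can be used directly in aesthetic mappings.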

Cheat sheets

  • You should have the ggplot2 cheat sheet handy.

  • To find it, perform an internet search for "ggplot2 cheat sheet".

The components of a graph

Components

  • Data: The US murders data table is being summarized. We refer to this as the data component.

  • Geometry: The plot above is a scatterplot. This is referred to as the geometry component.

  • Aesthetic mapping: The plot uses several visual cues to represent the information provided by the dataset.

Other minor components

  • The points are labeled with the state abbreviations.
  • The range of the x-axis and y-axis appears to be defined by the range of the data. They are both on log-scales.
  • There are labels, a title, a legend, and we use the style of The Economist magazine.

Load data

library(dslabs)
data(murders)

ggplot objects

ggplot(data = murders)

ggplot objects

murders %>% ggplot()

ggplot objects

p <- ggplot(data = murders)
class(p)
## [1] "gg"     "ggplot"

Rendering a plot

print(p)
p
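Typing `p` renders the plot only because R auto-prints values at the top level. The sketch below, which assumes the dslabs `murders` data used in these slides, shows a case where the explicit `print(p)` is needed: inside a loop (or a function body) a bare `p` draws nothing.

```r
library(ggplot2)
library(dslabs)
data(murders)

p <- ggplot(data = murders)

# Inside a loop there is no auto-printing, so the plot must be
# rendered explicitly with print().
for (i in 1:2) {
  p          # evaluated, but nothing is drawn
  print(p)   # draws the (still empty) plot
}
```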

Geometries

DATA %>% ggplot() + LAYER 1 + LAYER 2 + … + LAYER N

Geometries

  • Geometry function names follow the pattern: geom_X
  • where X is the name of the geometry.
  • Examples include geom_point, geom_bar and geom_histogram.
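One way to see the naming pattern is to list the functions ggplot2 exports whose names start with `geom_`. This is only a quick exploration sketch; the variable name `geoms` is chosen here for illustration.

```r
library(ggplot2)

# List exported ggplot2 objects whose names start with "geom_".
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)

# Check that the three geometries named above are among them.
c("geom_point", "geom_bar", "geom_histogram") %in% geoms
```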

Aesthetics for geometries

  • Look at the help file. Here is the Aesthetics section for geom_point:
> Aesthetics
> 
> geom_point understands the following aesthetics (required aesthetics are in bold):
>
> x
>
> y
> 
> alpha
>
> colour

Aesthetic mappings

murders %>% ggplot() + 
  geom_point(aes(x = population/10^6, y = total))
  • We can drop x = and y = if we want to, since these are the first and second expected arguments of aes, as seen in the help page.
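A small check of the point about positional arguments: the two mappings below should end up with the same aesthetic names. The object names `a1` and `a2` are introduced only for this sketch.

```r
library(ggplot2)

# Named and positional forms of the same mapping.
a1 <- aes(x = population/10^6, y = total)
a2 <- aes(population/10^6, total)

# Both mappings carry the aesthetics x and y.
names(a1)
names(a2)
```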

We can add to an existing plot object

p + geom_point(aes(population/10^6, total))

Layers

p + geom_point(aes(population/10^6, total)) +
  geom_text(aes(population/10^6, total, label = abb))

Where are variables defined?

This is fine:

p_test <- p + geom_text(aes(population/10^6, total, label = abb))

Where are variables defined?

This is not, because abb is referenced outside aes() and is not an object in the workspace:

p_test <- p + geom_text(aes(population/10^6, total), label = abb) 
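To see the failure without stopping a script, the call can be wrapped in `tryCatch`. This sketch assumes a session in which no object named `abb` exists; the variable name `res` is chosen here for illustration, and the exact wording of the error message may differ between ggplot2 versions.

```r
library(ggplot2)
library(dslabs)
data(murders)

p <- ggplot(data = murders)

# Outside aes(), abb is looked up as an ordinary R object rather than
# as a column of murders, so the call fails if no such object exists.
res <- tryCatch(
  p + geom_text(aes(population/10^6, total), label = abb),
  error = function(e) e
)

inherits(res, "error")   # was an error raised?
conditionMessage(res)    # the message reported by R
```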

Tinkering with arguments

p + geom_point(aes(population/10^6, total), size = 3) +
  geom_text(aes(population/10^6, total, label = abb))

Tinkering with arguments

p + geom_point(aes(population/10^6, total), size = 3) +
  geom_text(aes(population/10^6, total, label = abb), nudge_x = 1)

Global versus local aesthetic mappings

args(ggplot)
## function (data = NULL, mapping = aes(), ..., environment = parent.frame()) 
## NULL

Global versus local aesthetic mappings

p <- murders %>% ggplot(aes(population/10^6, total, label = abb))

Global versus local aesthetic mappings

p + geom_point(size = 3) + 
  geom_text(nudge_x = 1.5)

Global versus local aesthetic mappings

p + geom_point(size = 3) +  
  geom_text(aes(x = 10, y = 800, label = "Hello there!"))
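A local mapping with constant values is still evaluated against every row of the data, so the label above is drawn once per state at the same position. As a possible alternative for a single label, ggplot2's `annotate` adds one text element that does not use the global mapping. This is a sketch of that alternative, not the approach taken in the slides; `q` is a name chosen for this example.

```r
library(ggplot2)
library(dslabs)
data(murders)

p <- ggplot(murders, aes(population/10^6, total, label = abb))

# annotate() places one text label at the given coordinates,
# independent of the global aesthetic mapping.
q <- p + geom_point(size = 3) +
  annotate("text", x = 10, y = 800, label = "Hello there!")
q
```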

Scales

p + geom_point(size = 3) + geom_text(nudge_x = 0.05) + 
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") 

Scales

p + geom_point(size = 3) +  
  geom_text(nudge_x = 0.05) + 
  scale_x_log10() +
  scale_y_log10() 

Labels and titles

p + geom_point(size = 3) +  
  geom_text(nudge_x = 0.05) + 
  scale_x_log10() +
  scale_y_log10() +
  xlab("Populations in millions (log scale)") + 
  ylab("Total number of murders (log scale)") +
  ggtitle("US Gun Murders in 2010")

Categories as colors

p <-  murders %>% ggplot(aes(population/10^6, total, label = abb)) +   
  geom_text(nudge_x = 0.05) + 
  scale_x_log10() +
  scale_y_log10() +
  xlab("Populations in millions (log scale)") + 
  ylab("Total number of murders (log scale)") +
  ggtitle("US Gun Murders in 2010")

Categories as colors

This won't do what we want: it sets every point to the same color instead of coloring by category

p + geom_point(size = 3, color ="blue")

Example: color as a mapping

p + geom_point(aes(col=region), size = 3)

Annotation, shapes, and adjustments

r <- murders %>% 
  summarize(rate = sum(total) /  sum(population) * 10^6) %>% 
  pull(rate)

Annotation, shapes, and adjustments

p + geom_point(aes(col=region), size = 3) + 
  geom_abline(intercept = log10(r))
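Why the intercept is log10(r): a state with exactly the national average rate has total = r × (population in millions). Taking log10 of both sides gives log10(total) = log10(r) + log10(population in millions), which on log-log axes is a line with slope 1 (the default slope of geom_abline) and intercept log10(r). The sketch below checks this numerically; the populations in `x` are made-up values for illustration.

```r
library(dslabs)
data(murders)

# National average rate per million people, as computed in the slides.
r <- sum(murders$total) / sum(murders$population) * 10^6

# Hypothetical populations, in millions, for states with the average rate.
x <- c(0.5, 1, 10, 30)
y <- r * x

# On log10 axes these points lie on a line with slope 1
# and intercept log10(r).
all.equal(log10(y), log10(r) + 1 * log10(x))
```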

Annotation, shapes, and adjustments

p <- p + geom_abline(intercept = log10(r), lty = 2, color = "darkgrey") +
  geom_point(aes(col=region), size = 3)  

Annotation, shapes, and adjustments

p <- p + scale_color_discrete(name = "Region") 

Add-on packages

library(ggthemes)
p + theme_economist()

Another example

library(ggthemes)
p + theme_fivethirtyeight()

Putting it all together

library(ggthemes)  # provides theme_economist()
library(ggrepel)   # provides geom_text_repel()

r <- murders %>% 
  summarize(rate = sum(total) /  sum(population) * 10^6) %>%
  pull(rate)

murders %>% ggplot(aes(population/10^6, total, label = abb)) +   
  geom_abline(intercept = log10(r), lty = 2, color = "darkgrey") +
  geom_point(aes(col=region), size = 3) +
  geom_text_repel() + 
  scale_x_log10() +
  scale_y_log10() +
  xlab("Populations in millions (log scale)") + 
  ylab("Total number of murders (log scale)") +
  ggtitle("US Gun Murders in 2010") + 
  scale_color_discrete(name = "Region") +
  theme_economist()

Quick plots with qplot

Make a quick scatterplot:

data(murders)
x <- log10(murders$population)
y <- murders$total
qplot(x, y)
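For comparison, here is a sketch of the same scatterplot written with the full ggplot syntax. `qplot` has been marked as deprecated since ggplot2 3.4.0, so this form may be the safer choice in new code. The names `df` and `p_quick` are chosen for this example.

```r
library(ggplot2)
library(dslabs)
data(murders)

# ggplot() expects a data frame, so the two vectors are combined first.
df <- data.frame(x = log10(murders$population), y = murders$total)

p_quick <- ggplot(df, aes(x, y)) + geom_point()
p_quick
```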

Grids of plots

There are often reasons to graph plots next to each other. The gridExtra package permits us to do that:

library(gridExtra)
p1 <- murders %>%
  mutate(rate = total/population*10^5) %>%
  filter(population < 2*10^6) %>%
  ggplot(aes(population/10^6, rate, label = abb)) +
  geom_text() +
  ggtitle("Small States")
p2 <- murders %>%
  mutate(rate = total/population*10^5) %>%
  filter(population > 10*10^6) %>%
  ggplot(aes(population/10^6, rate, label = abb)) +
  geom_text() +
  ggtitle("Large States")

Grids of plots

grid.arrange(p1, p2, ncol = 2)