Plotly for R - Multi-Layer Plots

Michael Battaglia

rstats plotly data visualization

If you are new to plotly, consider first reading our introductory post:
Introduction to Interactive Graphics in R with plotly
 

Data analysis often calls for complex plots built from several graphical layers. In plotly, a multi-layer plot can be written as a single pipeline that alternates data manipulations (using dplyr verbs) with visual mappings. This works because dplyr verbs can be applied to a plotly object to modify its underlying data. In programming, mutability refers to the ability of an object to be modified after its creation. Because plotly objects are mutable in this sense, a pipeline can add a graphical layer based on one version of the data, transform the data with dplyr, and then add a second layer based on the transformed data. This design makes complex plots flexible to build without sacrificing clarity: the resulting code is easy to read and fits naturally into a tidyverse workflow.

Mutability

To demonstrate how the underlying data of a plotly object can be manipulated, we’ll start with a simple example based on the mpg dataset from ggplot2.

library(tidyverse)
library(plotly)

mpg_plotly <- mpg %>%
  plot_ly()

plot_ly() initializes a plotly object from the R data we pass into it; that object is rendered in the browser by the plotly.js JavaScript library.

In the simplest case, we then pipe the plotly object into an add_*() function, such as add_markers(), to specify how the data should be mapped to a graphical layer.

mpg_plotly %>%
  add_markers(x = ~cty, y = ~hwy)
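Each add_*() call contributes a trace (plotly.js’s term for a graphical layer) to the figure. As a quick sketch, we can build the figure with plotly’s plotly_build() and inspect the trace that add_markers() created. This assumes the built object stores its traces in x$data, as current plotly versions do; the variable name built is our own.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset
library(plotly)

p <- mpg %>%
  plot_ly() %>%
  add_markers(x = ~cty, y = ~hwy)

# plotly_build() evaluates the layer specifications into plotly.js traces;
# `built` is just our own name for the result
built <- plotly_build(p)

length(built$x$data)    # number of traces (layers) in the figure
built$x$data[[1]]$type  # add_markers() creates a scatter trace...
built$x$data[[1]]$mode  # ...drawn with markers
```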

Unlike other plot objects (from base graphics, ggplot2, etc.), plotly objects are mutable: the data underlying the object can be manipulated with dplyr verbs. A useful function for inspecting the object’s current data is plotly_data().

mpg
## # A tibble: 234 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4      1.8  1999     4 auto~ f        18    29 p     comp~
##  2 audi         a4      1.8  1999     4 manu~ f        21    29 p     comp~
##  3 audi         a4      2    2008     4 manu~ f        20    31 p     comp~
##  4 audi         a4      2    2008     4 auto~ f        21    30 p     comp~
##  5 audi         a4      2.8  1999     6 auto~ f        16    26 p     comp~
##  6 audi         a4      2.8  1999     6 manu~ f        18    26 p     comp~
##  7 audi         a4      3.1  2008     6 auto~ f        18    27 p     comp~
##  8 audi         a4 q~   1.8  1999     4 manu~ 4        18    26 p     comp~
##  9 audi         a4 q~   1.8  1999     4 auto~ 4        16    25 p     comp~
## 10 audi         a4 q~   2    2008     4 manu~ 4        20    28 p     comp~
## # ... with 224 more rows
mpg_plotly %>%
  plotly_data()
## # A tibble: 234 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4      1.8  1999     4 auto~ f        18    29 p     comp~
##  2 audi         a4      1.8  1999     4 manu~ f        21    29 p     comp~
##  3 audi         a4      2    2008     4 manu~ f        20    31 p     comp~
##  4 audi         a4      2    2008     4 auto~ f        21    30 p     comp~
##  5 audi         a4      2.8  1999     6 auto~ f        16    26 p     comp~
##  6 audi         a4      2.8  1999     6 manu~ f        18    26 p     comp~
##  7 audi         a4      3.1  2008     6 auto~ f        18    27 p     comp~
##  8 audi         a4 q~   1.8  1999     4 manu~ 4        18    26 p     comp~
##  9 audi         a4 q~   1.8  1999     4 auto~ 4        16    25 p     comp~
## 10 audi         a4 q~   2    2008     4 manu~ 4        20    28 p     comp~
## # ... with 224 more rows

Since we haven’t manipulated the object in any way, plotly_data() returns the data that we passed in.

Let’s say that we only want to plot the miles-per-gallon data for pickup trucks. We can filter the plotly object directly and then add the marker layer.

pickup_plotly <- mpg_plotly %>%
  filter(class == "pickup") %>%
  add_markers(x = ~cty, y = ~hwy)

pickup_plotly
pickup_plotly %>%
  plotly_data()
## # A tibble: 33 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 dodge        dako~   3.7  2008     6 manu~ 4        15    19 r     pick~
##  2 dodge        dako~   3.7  2008     6 auto~ 4        14    18 r     pick~
##  3 dodge        dako~   3.9  1999     6 auto~ 4        13    17 r     pick~
##  4 dodge        dako~   3.9  1999     6 manu~ 4        14    17 r     pick~
##  5 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  6 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  7 dodge        dako~   4.7  2008     8 auto~ 4         9    12 e     pick~
##  8 dodge        dako~   5.2  1999     8 manu~ 4        11    17 r     pick~
##  9 dodge        dako~   5.2  1999     8 auto~ 4        11    15 r     pick~
## 10 dodge        ram ~   4.7  2008     8 manu~ 4        12    16 r     pick~
## # ... with 23 more rows
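It is worth being precise about what “mutable” means here. In this sketch, which uses only functions from the example above, filter() returns a new plotly object carrying the filtered data, while mpg_plotly itself still holds all 234 rows, because R leaves the original object untouched when a modified copy is returned.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset
library(plotly)

mpg_plotly <- mpg %>%
  plot_ly()

pickup_plotly <- mpg_plotly %>%
  filter(class == "pickup") %>%
  add_markers(x = ~cty, y = ~hwy)

# The dplyr verb returned a new plotly object with filtered data...
inherits(pickup_plotly, "plotly")
nrow(plotly_data(pickup_plotly))

# ...while the object we started from keeps its original data
nrow(plotly_data(mpg_plotly))
```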

The same data can also be obtained by filtering before passing it into plot_ly(); as the two versions below show, filtering before or after plot_ly() yields identical data. However, the ability to modify the plotly object itself will prove useful when creating more complex multi-layer plots.

plotly_pickup_1 <- mpg %>%
  filter(class == "pickup") %>%
  plot_ly()

plotly_pickup_2 <- mpg %>%
  plot_ly() %>%
  filter(class == "pickup")

plotly_pickup_1 %>%
  plotly_data()
## # A tibble: 33 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 dodge        dako~   3.7  2008     6 manu~ 4        15    19 r     pick~
##  2 dodge        dako~   3.7  2008     6 auto~ 4        14    18 r     pick~
##  3 dodge        dako~   3.9  1999     6 auto~ 4        13    17 r     pick~
##  4 dodge        dako~   3.9  1999     6 manu~ 4        14    17 r     pick~
##  5 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  6 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  7 dodge        dako~   4.7  2008     8 auto~ 4         9    12 e     pick~
##  8 dodge        dako~   5.2  1999     8 manu~ 4        11    17 r     pick~
##  9 dodge        dako~   5.2  1999     8 auto~ 4        11    15 r     pick~
## 10 dodge        ram ~   4.7  2008     8 manu~ 4        12    16 r     pick~
## # ... with 23 more rows
plotly_pickup_2 %>%
  plotly_data()
## # A tibble: 33 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 dodge        dako~   3.7  2008     6 manu~ 4        15    19 r     pick~
##  2 dodge        dako~   3.7  2008     6 auto~ 4        14    18 r     pick~
##  3 dodge        dako~   3.9  1999     6 auto~ 4        13    17 r     pick~
##  4 dodge        dako~   3.9  1999     6 manu~ 4        14    17 r     pick~
##  5 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  6 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  7 dodge        dako~   4.7  2008     8 auto~ 4         9    12 e     pick~
##  8 dodge        dako~   5.2  1999     8 manu~ 4        11    17 r     pick~
##  9 dodge        dako~   5.2  1999     8 auto~ 4        11    15 r     pick~
## 10 dodge        ram ~   4.7  2008     8 manu~ 4        12    16 r     pick~
## # ... with 23 more rows
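Rather than comparing the printed tibbles by eye, a small check can confirm that the two pipelines produce the same underlying data. This is a sketch using base R’s all.equal(); the variable name same_data is our own.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset
library(plotly)

# Filter the data first, then create the plotly object
plotly_pickup_1 <- mpg %>%
  filter(class == "pickup") %>%
  plot_ly()

# Create the plotly object first, then filter its data
plotly_pickup_2 <- mpg %>%
  plot_ly() %>%
  filter(class == "pickup")

# `same_data` is our own name for the comparison result
same_data <- isTRUE(all.equal(plotly_data(plotly_pickup_1),
                              plotly_data(plotly_pickup_2)))
same_data
```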

Multi-layer Example

Now that we’ve laid the foundation, we can look at a more complicated example.

We’ll use the txhousing dataset from ggplot2, which tracks monthly housing-market statistics, including the median sale price, for cities in Texas. Let’s start by plotting the median price over time for each city.

txhousing
## # A tibble: 8,602 x 9
##    city     year month sales   volume median listings inventory  date
##    <chr>   <int> <int> <dbl>    <dbl>  <dbl>    <dbl>     <dbl> <dbl>
##  1 Abilene  2000     1    72  5380000  71400      701       6.3 2000 
##  2 Abilene  2000     2    98  6505000  58700      746       6.6 2000.
##  3 Abilene  2000     3   130  9285000  58100      784       6.8 2000.
##  4 Abilene  2000     4    98  9730000  68600      785       6.9 2000.
##  5 Abilene  2000     5   141 10590000  67300      794       6.8 2000.
##  6 Abilene  2000     6   156 13910000  66900      780       6.6 2000.
##  7 Abilene  2000     7   152 12635000  73500      742       6.2 2000.
##  8 Abilene  2000     8   131 10710000  75000      765       6.4 2001.
##  9 Abilene  2000     9   104  7615000  64500      771       6.5 2001.
## 10 Abilene  2000    10   101  7040000  59300      764       6.6 2001.
## # ... with 8,592 more rows
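all_cities <- txhousing %>%
  group_by(city) %>%                  # draw a separate line for each city
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(
    name = "Texan Cities",
    line = list(width = 1.33),
    alpha = 0.2,                      # faint lines so later layers stand out
    hoverinfo = "none"                # no tooltips for the background lines
  ) %>%
  ungroup()                           # later dplyr steps start from ungrouped data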

all_cities
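Although the figure shows one line per city, our understanding is that group_by() does not create one layer per city: plotly draws the grouped data as a single trace, with missing values separating the cities, which is why the legend shows only the one “Texan Cities” entry. The sketch below checks this by building the figure with plotly_build(), assuming the built traces live in x$data; the variable name built is our own.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing dataset
library(plotly)

all_cities <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(
    name = "Texan Cities",
    line = list(width = 1.33),
    alpha = 0.2,
    hoverinfo = "none"
  ) %>%
  ungroup()

# `built` is our own name for the evaluated figure
built <- plotly_build(all_cities)

# Number of traces, and the legend name of the first one
length(built$x$data)
built$x$data[[1]]$name

# The underlying data still covers every city
n_distinct(plotly_data(all_cities)$city)
```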

Next, let’s add a line showing the average median price across all Texas cities. We’ll use dplyr::summarise() to average the cities’ median prices for each month. Once the data is summarized, we’ll add a new layer based on the updated data.

summarized_data <- all_cities %>%
  group_by(date) %>%
  summarise(median = mean(median, na.rm = TRUE)) %>%
  ungroup()

summarized_data %>%
  plotly_data()
## # A tibble: 187 x 2
##     date  median
##    <dbl>   <dbl>
##  1 2000   91622.
##  2 2000.  91342.
##  3 2000.  92703.
##  4 2000.  93934.
##  5 2000.  95038.
##  6 2000. 101051.
##  7 2000.  99757.
##  8 2001.  97439.
##  9 2001.  98213.
## 10 2001.  97755.
## # ... with 177 more rows
summarized_data %>%
  add_lines(name = "Average Median Price")
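To confirm that the second layer was built from the summarized data, the sketch below rebuilds the two-layer figure in one pipeline, then checks that the layer data has one row per month (187, matching the output above) and that the figure holds the two named traces in order. It assumes the built traces live in x$data; avg_plot and trace_names are our own variable names.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing dataset
library(plotly)

all_cities <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(name = "Texan Cities", line = list(width = 1.33),
            alpha = 0.2, hoverinfo = "none") %>%
  ungroup()

# Average the city medians for each month, then draw them as a new layer
avg_plot <- all_cities %>%
  group_by(date) %>%
  summarise(median = mean(median, na.rm = TRUE)) %>%
  ungroup() %>%
  add_lines(name = "Average Median Price")

nrow(plotly_data(avg_plot))  # one row per month

# Legend names of the traces, in the order the layers were added
trace_names <- vapply(plotly_build(avg_plot)$x$data,
                      function(tr) tr$name, character(1))
trace_names
```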

The mutability of the plotly object allowed us to use all of the data in the first layer and then add a second layer based on a summarized version of the data.

Suppose we also want to add lines for major Texan cities, to see how their housing prices compare with those of all cities. Let’s look at San Antonio and Austin.

all_cities %>%
  filter(city == "San Antonio") %>%
  plotly_data()
## # A tibble: 187 x 9
##    city         year month sales    volume median listings inventory  date
##    <chr>       <int> <int> <dbl>     <dbl>  <dbl>    <dbl>     <dbl> <dbl>
##  1 San Antonio  2000     1   820  98974924  90900     5866       4.7 2000 
##  2 San Antonio  2000     2  1075 120851076  86000     5933       4.7 2000.
##  3 San Antonio  2000     3  1433 167748201  87000     6187       4.9 2000.
##  4 San Antonio  2000     4  1263 145280248  90200     6339       5   2000.
##  5 San Antonio  2000     5  1574 183281564  91200     6454       5   2000.
##  6 San Antonio  2000     6  1666 210779154 100100     6471       5   2000.
##  7 San Antonio  2000     7  1508 185816640 100500     6328       4.9 2000.
##  8 San Antonio  2000     8  1626 195515195  93400     6764       5.2 2001.
##  9 San Antonio  2000     9  1300 156643797  94800     6761       5.2 2001.
## 10 San Antonio  2000    10  1192 141630200  93500     6850       5.2 2001.
## # ... with 177 more rows
all_cities %>%
  filter(city == "San Antonio") %>%
  add_lines(name = "San Antonio")
all_cities %>%
  filter(city == "Austin") %>%
  add_lines(name = "Austin")

add_fun()

Now we’d like to combine all of these layers into a single plot. A natural first attempt is to chain the city layers one after the other:

san_antonio <- all_cities %>%
  filter(city == "San Antonio") %>%
  add_lines(name = "San Antonio")

san_antonio %>%
  filter(city == "Austin") %>%
  add_lines(name = "Austin")
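The chain above does not give the intended plot. Each dplyr verb acts on the object’s current data, so by the time filter(city == "Austin") runs, the data has already been reduced to San Antonio and the Austin layer receives no rows. The sketch below confirms this and then shows one way around it with plotly’s add_fun(p, fun, ...), which applies a function to the plot without modifying the plot’s data, so each function starts from the full dataset. This is our own illustration of add_fun(); austin_rows and both_cities are our own variable names, and the trace check assumes the built traces live in x$data.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing dataset
library(plotly)

all_cities <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(name = "Texan Cities", line = list(width = 1.33),
            alpha = 0.2, hoverinfo = "none") %>%
  ungroup()

# The naive chain: the Austin filter runs on the San Antonio subset
san_antonio <- all_cities %>%
  filter(city == "San Antonio") %>%
  add_lines(name = "San Antonio")

austin_rows <- san_antonio %>%
  filter(city == "Austin") %>%
  plotly_data()
nrow(austin_rows)  # 0: no Austin rows survive

# With add_fun(), each function filters the full data, and the plot's
# current data is left as it was once the function returns
both_cities <- all_cities %>%
  add_fun(function(p) {
    p %>% filter(city == "San Antonio") %>% add_lines(name = "San Antonio")
  }) %>%
  add_fun(function(p) {
    p %>% filter(city == "Austin") %>% add_lines(name = "Austin")
  })

nrow(plotly_data(both_cities))  # the full dataset again
vapply(plotly_build(both_cities)$x$data, function(tr) tr$name, character(1))
```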