This article was first published on Kyle Walker and kindly contributed to R-bloggers.
Exploring flows between origins and destinations visually is a common task, but can be difficult to get right. In R, there are many tutorials on the web that show how to produce static flow maps.
Over the past couple of years, R developers have created an infrastructure to bridge R with JavaScript using the htmlwidgets package, allowing for the generation of interactive web visualizations straight from R. I’d like to demonstrate here a few examples of exploratory interactive flow graphics that use this infrastructure.
To start, let’s make a random dataset that links countries with US states.
library(dplyr)

set.seed(1983)

df <- tibble(origins = sample(c('Portugal', 'Romania', 'Nigeria', 'Peru'),
                              size = 100, replace = TRUE),
             destinations = sample(c('Texas', 'New Jersey', 'Colorado', 'Minnesota'),
                                   size = 100, replace = TRUE))

head(df)
## # A tibble: 6 × 2
##    origins destinations
##      <chr>        <chr>
## 1  Romania    Minnesota
## 2 Portugal        Texas
## 3 Portugal    Minnesota
## 4  Nigeria    Minnesota
## 5     Peru     Colorado
## 6 Portugal     Colorado
We can use dplyr to get counts of the unique origin-destination pairs as follows:
df2 <- df %>%
  group_by(origins, destinations) %>%
  summarize(counts = n()) %>%
  ungroup() %>%
  arrange(desc(counts))

df2
## # A tibble: 16 × 3
##     origins destinations counts
##       <chr>        <chr>  <int>
## 1  Portugal     Colorado      9
## 2   Romania   New Jersey      9
## 3   Romania    Minnesota      8
## 4   Nigeria     Colorado      7
## 5      Peru     Colorado      7
## 6      Peru    Minnesota      7
## 7  Portugal    Minnesota      7
## 8  Portugal        Texas      7
## 9      Peru   New Jersey      6
## 10  Romania        Texas      6
## 11  Nigeria    Minnesota      5
## 12  Nigeria   New Jersey      5
## 13     Peru        Texas      5
## 14  Romania     Colorado      5
## 15 Portugal   New Jersey      4
## 16  Nigeria        Texas      3
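The same tally can probably be written more compactly with dplyr's count(), which groups, counts, and ungroups in one call. The sketch below rebuilds the random data so that it runs on its own; the name df2_alt is only illustrative and not part of the original post.

```r
library(dplyr)

set.seed(1983)

df <- tibble(
  origins = sample(c('Portugal', 'Romania', 'Nigeria', 'Peru'),
                   size = 100, replace = TRUE),
  destinations = sample(c('Texas', 'New Jersey', 'Colorado', 'Minnesota'),
                        size = 100, replace = TRUE)
)

# count() groups by the listed columns, tallies the rows, and ungroups;
# `name` sets the count column and `sort = TRUE` orders by descending count.
df2_alt <- count(df, origins, destinations, name = 'counts', sort = TRUE)

# Sanity check: the pair counts should add up to the number of rows in df.
stopifnot(sum(df2_alt$counts) == nrow(df))
```

Checking that the counts sum back to the original number of rows is a cheap way to confirm that no records were dropped during aggregation.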
Now, we’ll want to plot the connections. While maps are often a first choice for visualizing geographic flows, they are not the only option. For example, with a little data formatting, the networkD3 package allows for network visualizations like the following:
library(networkD3)

# One node per distinct place; networkD3 expects node ids that start at 0
name_vec <- c(unique(df2$origins), unique(df2$destinations))

nodes <- data.frame(name = name_vec, id = 0:7, stringsAsFactors = FALSE)

# Attach the origin and destination ids to each flow
links <- df2 %>%
  left_join(nodes, by = c('origins' = 'name')) %>%
  rename(origin_id = id) %>%
  left_join(nodes, by = c('destinations' = 'name')) %>%
  rename(dest_id = id)

forceNetwork(Links = links, Nodes = nodes, Source = 'origin_id', Target = 'dest_id',
             Value = 'counts', NodeID = 'name', Group = 'id', zoom = TRUE)
Use the scroll wheel on your mouse to zoom in; the width of each link is proportional to the size of the flow. A more appropriate visualization in this circumstance, however, might be a Sankey diagram, which is also available in the networkD3 package:
sankeyNetwork(Links = links, Nodes = nodes, Source = 'origin_id', Target = 'dest_id',
              Value = 'counts', NodeID = 'name', fontSize = 16)
A similar representation is available in the parsetR package by Kenton Russell, available on GitHub.
library(parsetR) # devtools::install_github("timelyportfolio/parsetR")

parset(df2, dimensions = c('origins', 'destinations'),
       value = htmlwidgets::JS("function(d){return d.counts}"),
       tension = 0.5)
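The networkD3 calls above depend on node ids that start at zero, which the post hard-codes as 0:7 for its eight places. The base R sketch below shows one way the ids could be derived from the data instead, so the same steps work for any number of places. The toy data frame od and its values are made up for illustration.

```r
# Toy origin-destination counts standing in for df2 (values are illustrative).
od <- data.frame(
  origins = c('Portugal', 'Romania', 'Peru'),
  destinations = c('Colorado', 'New Jersey', 'Colorado'),
  counts = c(9L, 9L, 7L),
  stringsAsFactors = FALSE
)

# One row per distinct place. unique() over the combined vector also covers
# the case where a place appears as both an origin and a destination.
name_vec <- unique(c(od$origins, od$destinations))
nodes <- data.frame(name = name_vec,
                    id = seq_along(name_vec) - 1L,
                    stringsAsFactors = FALSE)

# match() returns 1-based positions in nodes$name, so subtract 1
# to get the zero-based ids that networkD3 expects.
links <- od
links$origin_id <- match(od$origins, nodes$name) - 1L
links$dest_id <- match(od$destinations, nodes$name) - 1L

links
```

The resulting links and nodes data frames have the same columns as those passed to forceNetwork() and sankeyNetwork() earlier, so they should be usable in the same way.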
Now, let’s create a couple of interactive flow maps. To do this, we need to know where the places are located in geographic space, which requires some spatial data; we’ll use the rnaturalearth package for this, available on GitHub.
library(rnaturalearth) # devtools::install_github('ropenscilabs/rnaturalearth')

countries <- ne_countries()

states <- ne_states(iso_a2 = 'US')
The states data have long/lat information already, but the countries data do not, so we’ll need to calculate it with the coordinates() function from the sp package, which is attached when rgdal is loaded.
library(rgdal) # attaches sp, which provides coordinates()

# coordinates() returns one longitude/latitude pair per country polygon
countries$longitude <- coordinates(countries)[,1]

countries$latitude <- coordinates(countries)[,2]

countries_xy <- countries@data %>%
  select(admin, longitude, latitude)

states_xy <- states@data %>%
  select(name, longitude, latitude)
Now that we have the XY data, we can merge it to our pre-existing data frame.
df3 <- df2 %>%
  left_join(countries_xy, by = c('origins' = 'admin')) %>%
  left_join(states_xy, by = c('destinations' = 'name'))

df3$longitude.y <- as.numeric(as.character(df3$longitude.y))

df3$latitude.y <- as.numeric(as.character(df3$latitude.y))

head(df3)
## # A tibble: 6 × 7
##    origins destinations counts longitude.x latitude.x longitude.y
##      <chr>        <chr>  <int>       <dbl>      <dbl>       <dbl>
## 1 Portugal     Colorado      9   -8.055766  39.634050   -105.5430
## 2  Romania   New Jersey      9   24.943252  45.857101    -74.4653
## 3  Romania    Minnesota      8   24.943252  45.857101    -93.3640
## 4  Nigeria     Colorado      7    7.995128   9.548318   -105.5430
## 5     Peru     Colorado      7  -74.391806  -9.191563   -105.5430
## 6     Peru    Minnesota      7  -74.391806  -9.191563    -93.3640
## # ... with 1 more variables: latitude.y <dbl>
Looks good. Now, we can use the gcIntermediate function in the geosphere package to calculate great circles.
library(geosphere)

flows <- gcIntermediate(df3[,4:5], df3[,6:7], sp = TRUE, addStartEnd = TRUE)

flows$counts <- df3$counts

flows$origins <- df3$origins

flows$destinations <- df3$destinations
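To give a rough idea of what a great-circle calculation involves, here is a base R sketch that interpolates points along the shortest path between two longitude/latitude pairs on a sphere. It is a hypothetical helper written for illustration, not geosphere's own implementation, and it assumes the two points are distinct and not exactly opposite each other on the globe. The destination coordinates in the example are approximate.

```r
# Hypothetical helper (not part of geosphere): returns n + 2 points,
# including both endpoints, along the great circle from p1 to p2.
# p1 and p2 are c(longitude, latitude) in degrees.
great_circle_points <- function(p1, p2, n = 50) {
  to_xyz <- function(p) {
    lon <- p[1] * pi / 180
    lat <- p[2] * pi / 180
    c(cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))
  }
  a <- to_xyz(p1)
  b <- to_xyz(p2)

  # Central angle between the two unit vectors, clamped for numerical safety.
  omega <- acos(max(-1, min(1, sum(a * b))))

  # Spherical linear interpolation at evenly spaced fractions of the arc.
  f <- seq(0, 1, length.out = n + 2)
  t(sapply(f, function(fi) {
    v <- (sin((1 - fi) * omega) * a + sin(fi * omega) * b) / sin(omega)
    c(lon = atan2(v[2], v[1]) * 180 / pi,
      lat = asin(max(-1, min(1, v[3]))) * 180 / pi)
  }))
}

# Portugal to (approximately) Colorado
pts <- great_circle_points(c(-8.06, 39.63), c(-105.54, 39.00))
head(pts)
```

Although both endpoints sit near 39 degrees north, the interpolated path bows well to the north of that latitude. This is why great circles look curved on a flat web map.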
For interactive web maps in R, the leaflet package is a great option. It’ll allow for some interactive exploration of the data, such as the ability to turn layers on and off to see specific flows more clearly.
library(leaflet)
library(RColorBrewer)

hover <- paste0(flows$origins, " to ",
                flows$destinations, ': ',
                as.character(flows$counts))

pal <- colorFactor(brewer.pal(4, 'Set2'), flows$origins)

leaflet() %>%
  addProviderTiles('CartoDB.Positron') %>%
  addPolylines(data = flows, weight = ~counts, label = hover,
               group = ~origins, color = ~pal(origins)) %>%
  addLayersControl(overlayGroups = unique(flows$origins),
                   options = layersControlOptions(collapsed = FALSE))
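In this example the counts are small enough to serve directly as line widths in pixels. With larger flows, some rescaling would likely be needed first. The following is a minimal sketch using a hypothetical helper, rescale_weights(), which is not part of leaflet; the default range of 1 to 10 pixels is an arbitrary assumption.

```r
# Hypothetical helper: linearly map flow sizes onto a range of line widths.
rescale_weights <- function(x, to = c(1, 10)) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # All flows are equal: use the midpoint width for every line.
    return(rep(mean(to), length(x)))
  }
  to[1] + (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1])
}

# Illustrative flow sizes spanning several orders of magnitude
big_counts <- c(3, 120, 4500, 98000)
rescale_weights(big_counts)
```

A linear mapping lets the largest flow dominate when values are this skewed, so taking the square root or logarithm of the counts before rescaling may give a more readable map.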
The default Mercator projection of most web maps is not ideal for visualizing great circles, however, especially for longer distances. As such, you might want to try an alternative representation of the Earth, such as a three-dimensional globe. This can be accomplished with the threejs package (available on GitHub), and doesn’t even require the great circle objects we created.
library(threejs) # devtools::install_github("bwlewis/rthreejs")

df4 <- arrange(df3, origins)

df4$colors <- rep(brewer.pal(4, 'Set2'), each = 4)

weights <- 1.5 * df4$counts

arcs <- data.frame(lat1 = df4$latitude.x, lon1 = df4$longitude.x,
                   lat2 = df4$latitude.y, lon2 = df4$longitude.y)

globejs(arcsLwd = weights, arcs = arcs, arcsColor = df4$colors)
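The color assignment with rep(..., each = 4) works here because the rows are sorted by origin and every origin has exactly four destinations. For data where the groups differ in size, a named lookup vector is one possible alternative. The sketch below uses a toy vector of origins, and the name origin_colors is illustrative.

```r
library(RColorBrewer)

# Toy stand-in for df3$origins; the groups are deliberately of unequal size.
origins <- c('Nigeria', 'Peru', 'Peru', 'Portugal',
             'Romania', 'Romania', 'Romania')

# One color per distinct origin, stored as a named vector.
origin_levels <- sort(unique(origins))
origin_colors <- setNames(brewer.pal(length(origin_levels), 'Set2'),
                          origin_levels)

# Each row looks up its color by name, whatever the row order or group size.
colors <- unname(origin_colors[origins])
colors
```

Note that brewer.pal() expects at least three colors, so this sketch assumes three or more distinct origins.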