Highlight all connected paths from start to end in Sankey graph using R

The implementation for this question is in this Shiny app:

https://setsna2.shinyapps.io/sankey-shinyforallcities/

I had to modify networkD3 internally. I installed it normally, then copied it into the directory that contains the Shiny app, putting the package inside R-lib.

I made some modifications to sankeyNetwork.js, the file that plots the Sankey graph. Here's a picture of the directory; it shows the folder structure you need to follow to reach sankeyNetwork.js and change it manually.

Please note that the version of sankeyNetwork.js I used and uploaded in this question is old (it's from 2 years ago), so you can download the newer version of networkD3 and just modify the part I mention next. What I changed in sankeyNetwork.js is to add:

    .on('mouseover', function(node) {
        Shiny.onInputChange("node_name", node.name);
    })

This means that when someone hovers over a node, I send the node's name to my R session as the "node_name" input using Shiny.onInputChange. You can read more about this Shiny function online.

Here's the sankeyNetwork.js I used, so you can see what I mean.

Now, when someone hovers over a node, I can get the name of that node and send it to R, and when they move the cursor away, I won't get any name. That's the core idea.
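The hover-to-R idea can be sketched as a pair of D3-style event handlers. This is a minimal sketch, not the modified sankeyNetwork.js itself: `makeHoverHandlers` and `sendInput` are hypothetical names, `sendInput` stands in for `Shiny.onInputChange` so the logic can run outside a browser, and clearing the value on `mouseout` by sending `null` is an assumption (the original change only adds a `mouseover` handler). A `null` value should arrive in R as `NULL`, which the `length(input$node_name) > 0` check in the reactive further down treats as "no node".

```javascript
// Hypothetical helper: builds hover handlers that report the hovered node's name.
// `sendInput(name, value)` stands in for Shiny.onInputChange in the browser.
function makeHoverHandlers(sendInput) {
  return {
    // On hover, publish the node name so R can read it as input$node_name.
    mouseover: function (node) {
      sendInput("node_name", node.name);
    },
    // Assumption: clear the value when the cursor leaves the node,
    // so R sees "no node" instead of the last hovered one.
    mouseout: function () {
      sendInput("node_name", null);
    }
  };
}

// Usage with a recording stub instead of the real Shiny object.
const sent = [];
const handlers = makeHoverHandlers(function (name, value) {
  sent.push([name, value]);
});
handlers.mouseover({ name: "Chicago" });
handlers.mouseout();
```

In the browser you would pass `function (name, value) { Shiny.onInputChange(name, value); }` as `sendInput`. Attaching the handlers with namespaced D3 events such as `.on("mouseover.shiny", handlers.mouseover)` avoids replacing the hover handlers networkD3 already registers on the nodes.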

You can check the code of my Shiny app by clicking here.

You can also see part of the Data0 variable here, and the Goals variable here.

In the R code, you'll find some comments saying "for debug use this code", as well as other comments within the code. If you run the commented-out code, you'll see what the data looks like before running the Shiny app, which helps you fully understand how the Sankey graph reads the data and what shape the data should have.

In the R code, you'll find this part, which reads the node_name sent from sankeyNetwork.js:

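        NodeName <- reactive({
                # input$node_name is sent from the browser by Shiny.onInputChange("node_name", ...)
                if (length(input$node_name) > 0) {
                        return(as.character(input$node_name))
                } else {
                        return(0)  # no node name received
                }
        })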

The next part of the code checks whether NodeName is in my Nodes data frame. If it is, I get all the nodes related to this node, and then the IDs of the links that connect these nodes with each other. Please note that the link IDs start from 0, not 1, because JavaScript indexes from 0 while R indexes from 1.
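The "get all related nodes, then their link IDs" step can be sketched as a breadth-first walk over the links, once downstream and once upstream from the hovered node. This is a hedged JavaScript sketch, not the R code from the app: `connectedLinkIds` is a hypothetical name, and it assumes links are objects with 0-based `source`/`target` node indices, as networkD3 uses. The example links come from the origin, layover and destination columns of the small example dataset later on this page (the name column is ignored).

```javascript
// Hypothetical helper: return the 0-based ids of every link lying on some path
// that passes through `nodeIndex`, from the start of the diagram to the end.
function connectedLinkIds(links, nodeIndex) {
  // Breadth-first walk: match `fromKey` of a link against the current node,
  // then continue from the node at `toKey`.
  function walk(fromKey, toKey) {
    const ids = new Set();
    const visited = new Set([nodeIndex]);
    const queue = [nodeIndex];
    while (queue.length > 0) {
      const current = queue.shift();
      links.forEach(function (link, id) {
        if (link[fromKey] === current) {
          ids.add(id);
          const next = link[toKey];
          if (!visited.has(next)) {
            visited.add(next);
            queue.push(next);
          }
        }
      });
    }
    return ids;
  }
  const downstream = walk("source", "target"); // follow links forward
  const upstream = walk("target", "source");   // follow links backward
  return Array.from(new Set([...upstream, ...downstream])).sort(function (a, b) {
    return a - b;
  });
}

// Nodes: 0 Baltimore, 1 New York, 2 Chicago, 3 St Louis,
//        4 Los Angeles, 5 Seattle, 6 Austin
const links = [
  { source: 0, target: 2 }, // 0: Baltimore -> Chicago
  { source: 1, target: 3 }, // 1: New York -> St Louis
  { source: 1, target: 2 }, // 2: New York -> Chicago
  { source: 2, target: 4 }, // 3: Chicago -> Los Angeles
  { source: 2, target: 5 }, // 4: Chicago -> Seattle
  { source: 3, target: 6 }  // 5: St Louis -> Austin
];
const chicagoLinks = connectedLinkIds(links, 2); // [0, 2, 3, 4]
```

Because this follows the graph structure only, it can mark combinations no single row actually traveled: for New York it returns `[1, 2, 3, 4, 5]`, including Chicago -> Los Angeles, although no New York row goes there. That is the motivation for tagging each link with its origin, as the answers below do.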

Now we have the NodeName the user is hovering over and the links related to that node, so we can build the Sankey graph and save it in sn. Then I remove the old tooltip and add a new one.

I used onRender to modify the Sankey graph while Shiny is running, and that is how I built the highlighting: when the user hovers over a node, I get the node's name, get the matching link IDs, search for those link IDs in the existing Sankey graph, and increase their opacity.
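The "increase the opacity of the matching links" step reduces to mapping a set of link IDs to one opacity value per link. This is a minimal sketch under stated assumptions: `linkOpacities` is a hypothetical name, the IDs are 0-based positions in the links array, and the 0.5 / 0.2 defaults are the values used in the onRender click examples on this page.

```javascript
// Hypothetical helper: one stroke-opacity value per link, in links-array order.
// Links whose 0-based id is in `highlightedIds` get `on`, all others get `off`.
function linkOpacities(linkCount, highlightedIds, on = 0.5, off = 0.2) {
  const highlighted = new Set(highlightedIds);
  const opacities = [];
  for (let i = 0; i < linkCount; i++) {
    opacities.push(highlighted.has(i) ? on : off);
  }
  return opacities;
}

// Usage: six links, of which ids 0, 2, 3 and 4 belong to the hovered node.
const opacities = linkOpacities(6, [0, 2, 3, 4]);
// [0.5, 0.2, 0.5, 0.5, 0.5, 0.2]
```

Inside onRender the same rule could be applied with `d3.selectAll(".link").style("stroke-opacity", function (d, i) { ... })`, assuming the drawn `.link` paths are in the same order as the links data. The IDs themselves could be handed from R to the JavaScript callback through the `data` argument of `htmlwidgets::onRender`.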

Please note that if you run the application, you'll get errors; you have to upload it to shinyapps.io to debug it. That was how I checked whether my application worked correctly, but maybe you can find another way to debug.

Given the R code data structure you provided...

First, sankeyNetwork expects data that lists edges/links and the nodes those links connect. Your data has a... let's call it a "traveler"-centric format, where each row of your data describes a specific "path". So first you need to convert that data into the form sankeyNetwork needs, while retaining the information needed to identify which path each link came from. Additionally, your data only has one city in it, so it will be hard to see the result unless there are at least two different origins for the paths in your data, so I'll duplicate it and attribute the second set to a different city. Here's an example of that...

library(tidyverse)

# duplicate the data for another city so we have more than 1 origin
links <-
  df %>%
  full_join(mutate(df, City = "Hong Kong")) %>%
  mutate(row = row_number()) %>%
  mutate(origin = .[[1]]) %>%
  gather("column", "source", -row, -origin) %>%
  mutate(column = match(column, names(df))) %>%
  arrange(row, column) %>%
  group_by(row) %>%
  mutate(target = lead(source)) %>%
  ungroup() %>%
  filter(!is.na(target)) %>%
  select(source, target, origin) %>%
  group_by(source, target, origin) %>%
  summarise(count = n()) %>%
  ungroup()

nodes <- data.frame(name = unique(c(links$source, links$target)))
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1
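For readers more at home in JavaScript, here is a hedged sketch of what the pipeline above computes, not a replacement for it. `pathsToSankey` is a hypothetical name; each row is assumed to be an array of node names whose first element identifies the path (as `origin = .[[1]]` does in R) and is also the first node of the path. Node order can differ from the R version, because `summarise()` sorts the groups; only the consistency between `nodes` and the link indices matters.

```javascript
// Hypothetical mirror of the R pipeline: consecutive pairs in each row become
// links, identical (source, target, origin) links are counted, and names are
// replaced by 0-based node indices.
function pathsToSankey(rows) {
  // Count each (source, target, origin) triple, like group_by() + summarise(count = n()).
  const counts = new Map();
  rows.forEach(function (row) {
    const origin = row[0];
    for (let k = 0; k + 1 < row.length; k++) {
      const key = JSON.stringify([row[k], row[k + 1], origin]);
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  });
  const links = [];
  counts.forEach(function (count, key) {
    const triple = JSON.parse(key);
    links.push({ source: triple[0], target: triple[1], origin: triple[2], count: count });
  });

  // Node list: unique sources first, then unique targets, like unique(c(source, target)).
  const nodes = [];
  links.forEach(function (l) { if (!nodes.includes(l.source)) nodes.push(l.source); });
  links.forEach(function (l) { if (!nodes.includes(l.target)) nodes.push(l.target); });

  // indexOf is already 0-based; R needs match(...) - 1 for the same result.
  links.forEach(function (l) {
    l.source = nodes.indexOf(l.source);
    l.target = nodes.indexOf(l.target);
  });
  return { nodes: nodes, links: links };
}

// Usage with two rows of the example data on this page (name, origin, layover, destination).
const result = pathsToSankey([
  ["Bob", "Baltimore", "Chicago", "Los Angeles"],
  ["Bob", "Baltimore", "Chicago", "Seattle"]
]);
```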

Now you have links and nodes data frames in the form that sankeyNetwork expects, and the links data frame has an extra column, origin, that identifies which city's path each link is on. You can now plot this with sankeyNetwork, add the origin data back in (since sankeyNetwork strips it out), and then use htmlwidgets::onRender to assign a click behavior that changes the opacity of every link whose origin is the city node that was clicked...

library(networkD3)
library(htmlwidgets)

sn <- sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
                    Target = 'target', Value = 'count', NodeID = 'name')

# add origin back into the links data because sankeyNetwork strips it out
sn$x$links$origin <- links$origin


# add onRender JavaScript to set the click behavior
htmlwidgets::onRender(
  sn,
  '
  function(el, x) {
    var nodes = d3.selectAll(".node");
    var links = d3.selectAll(".link");
    nodes.on("mousedown.drag", null); // remove the drag because it conflicts
    nodes.on("click", clicked);
    function clicked(d, i) {
      links
        .style("stroke-opacity", function(d1) {
            return d1.origin == d.name ? 0.5 : 0.2;
          });
    }
  }
  '
)

Here is a simplified version of the above answer (with a smaller example dataset) which keeps each "path" separate, rather than aggregating like paths and incrementing a count/Value variable.

library(dplyr)
library(tidyr)
library(networkD3)
library(htmlwidgets)

df <- read.csv(header = T, as.is = T, text = '
name,origin,layover,destination
Bob,Baltimore,Chicago,Los Angeles
Bob,Baltimore,Chicago,Seattle
Bob,New York,St Louis,Austin
Bob,New York,Chicago,Seattle
Tom,Baltimore,Chicago,Los Angeles
Tom,New York,St Louis,San Diego
Tom,New York,Chicago,Seattle
Tom,New York,New Orleans,Austin
')

links <-
  df %>%
  mutate(row = row_number()) %>%
  mutate(traveler = .[[1]]) %>%
  gather("column", "source", -row, -traveler) %>%
  mutate(column = match(column, names(df))) %>%
  arrange(row, column) %>%
  group_by(row) %>%
  mutate(target = lead(source)) %>%
  ungroup() %>%
  filter(!is.na(target)) %>%
  select(source, target, traveler) %>%
  group_by(source, target, traveler) %>%
  summarise(count = n()) %>%
  ungroup()

nodes <- data.frame(name = unique(c(links$source, links$target)))
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1

sn <- sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
                    Target = 'target', Value = 'count', NodeID = 'name')

# add traveler back into the links data because sankeyNetwork strips it out
sn$x$links$traveler <- links$traveler

# add onRender JavaScript to set the click behavior
htmlwidgets::onRender(
  sn,
  '
  function(el, x) {
    var nodes = d3.selectAll(".node");
    var links = d3.selectAll(".link");
    nodes.select("rect").style("cursor", "pointer");
    nodes.on("mousedown.drag", null); // remove the drag because it conflicts
    //nodes.on("mouseout", null);
    nodes.on("click", clicked);
    function clicked(d, i) {
      links
        .style("stroke-opacity", function(d1) {
            return d1.traveler == d.name ? 0.5 : 0.2;
          });
    }
  }
  '
)


Tags: R, Sankey Diagram, Htmlwidgets, Rcharts, Networkd3
