Skip to content

Latest commit

 

History

History
executable file
·
188 lines (115 loc) · 6.44 KB

README.md

File metadata and controls

executable file
·
188 lines (115 loc) · 6.44 KB

🌟DyGETViz

Our framework is DyGETViz, which stands for Dynamic Graph Embedding Trajectories Visualization.

[Project Page] [Demo] [Data]

Contents

Installation

Automatic Installation

conda create -n dygetviz python=3.9 -y
conda activate dygetviz
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Manual Installation

If you want to manually install the dependencies, run:

conda install scikit-learn pandas numpy matplotlib plotly
conda install -c conda-forge dash dash-daq dash-bootstrap-components biopython
pip install umap

Please refer to the homepage of PyTorch, PyTorch Geometric, and PyTorch Geometric Temporal to install these 3 packages, respectively.

Upgrade to latest code base

git pull
pip install -e .

Demo

Please check our demo at our website.

Download the data

  • Download all the data from Google Drive
  • Put both data/ and outputs/ under the root directory of this repo.

Getting Started

Procedures of Generating the Visualization

  • Step 1: Discrete-Time Dynamic Graph (DTDG) embedding training

    • We use the GConvGRU model from PyTorch Geometric Temporal to train embeddings of all datasets
    • We extended the dataloader so that we can use a wide variety of data input formats. The original dataloader only used static input at each snapshot.
    • Note: This part is not included in the code yet. For now, we directly provide the embeddings.
  • Output: DTDG embeddings of shape (T, N, D)

    • T: The number of timestamps / snapshots
    • N: The number of nodes
    • D: Embedding dimension

Step 2: Embedding Trajectories Generation

  • Input: DTDG embeddings of shape (T, N, D)

  • Output: JSON file that store the embedding trajectory for Dash

Step 3: Visualizing in a Dash app interactively using the JSON file

  • Users should be able to incrementally add node trajectories / all nodes under a certain category (e.g., normal users v.s. anomalous users) to the visualization

  • highlighted_nodes: List of nodes to be highlighted in the visualization. We need to specify these nodes because we only show the names of a small number of nodes in the plotly visualization. Otherwise, the generated plot will be too messy.

  • plot_dtdg.py: Script for generating the visualization

Generate the visualization using the command:

python dygetviz/plot_dtdg.py --dataset_name <DATASET_NAME> --model GConvGRU

Currently, DATASET_NAME can be selected from one of: Ant, Chickenpox, DGraphFin, Reddit

python dygetviz/plot_dtdg.py --dataset_name Chickenpox --model GConvGRU

python dygetviz/plot_dash.py --dataset_name Chickenpox --model GConvGRU

Data Format

dygetviz supports all temporal networks in [Stanford Large Network Dataset Collection] (https://snap.stanford.edu/data/index.html). Basically, each row is a tuple of (source, target, timestamp) representing an edge in the graph snapshot,

edges.tsv

SRC	DST	TIME
1	2	1082040961
3	4	1082155839
5	2	1082414391
6	7	1082439619
8	7	1082439756
9	10	1082440403
...

An optional nodes.tsv can be provided to indicate the node names. If not provided, the node names will be automatically generated as integers starting from 0.

ID  NAME
0   Anna
1   Bob
2   Charlie
3   David
4   Emma
...

You can also specify an additional column to indicate the node label, such as whether the user is a normal user or an anomalous user.

ID  NAME    LABEL
0   Anna    0
1   Bob     1
2   Charlie 0
3   David   0
4   Emma    1
...

Terminology

  • DG: Dynamic Graphs, which can be categorized into DTDG and CTDG
  • DTDG: Discrete-Time Dynamic Graphs (the type of graphs we are dealing with)
  • CTDG: Continuous-Time Dynamic Graphs
  • Embedding Trajectories: Please refer to the JODIE paper (KDD2019) for more details

Datasets

We provide the following dataset to be viewd in our visualization tool:

Explanation of Each Data File

  • node2idx: A dictionary that maps node names to node indices (usually starting from 0 to #nodes-1).
  • embeds_<DATASET_NAME>.npy: The node embeddings generated by DyGET. The shape of the embeddings is #nodes x #time_steps x #embedding_dim.

Note

  • The Reddit dataset is a bit special because it is the only dataset that describes a bipartite graph. The first 60 snapshots are for each of the 60 snapshots. The last snapshot is for the background nodes. The shape of the embeddings is ``

Acknowledgments

We thank members of the CLAWS Lab and SRI International for their feedback and support.