🌟DyGETViz

Our framework is DyGETViz, which stands for Dynamic Graph Embedding Trajectories Visualization.

Installation

Automatic Installation

conda create -n dygetviz python=3.9 -y
conda activate dygetviz
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Manual Installation

If you want to manually install the dependencies, run:

conda install scikit-learn pandas numpy matplotlib plotly
conda install -c conda-forge dash dash-daq dash-bootstrap-components biopython
pip install umap-learn

Please refer to the homepages of PyTorch, PyTorch Geometric, and PyTorch Geometric Temporal to install these three packages.

Upgrade to latest code base

git pull
pip install -e .

Demo

Please check our demo at our website.

Download the data

Download all the data from Google Drive
Put both data/ and outputs/ under the root directory of this repo.

Getting Started

Procedures of Generating the Visualization

Step 1: Discrete-Time Dynamic Graph (DTDG) embedding training
- We use the GConvGRU model from PyTorch Geometric Temporal to train embeddings for all datasets.
- We extended the dataloader so that we can use a wide variety of data input formats. The original dataloader only used static input at each snapshot.
- Note: This part is not included in the code yet. For now, we directly provide the embeddings.
Output: DTDG embeddings of shape (T, N, D)
- T: The number of timestamps / snapshots
- N: The number of nodes
- D: Embedding dimension

Step 2: Embedding Trajectories Generation

Input: DTDG embeddings of shape (T, N, D)
Output: A JSON file that stores the embedding trajectories for Dash
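
As a rough illustration of Step 2, the sketch below projects a (T, N, D) array to 2-D with a plain PCA and collects one trajectory per node in a JSON-serializable dictionary. The PCA projection, the function name `embeddings_to_trajectories`, and the JSON layout are all assumptions made for illustration; the projection and file schema actually used by `plot_dtdg.py` may differ.

```python
import json

import numpy as np


def embeddings_to_trajectories(embeds, node_names=None):
    """Project (T, N, D) embeddings to 2-D and group the points by node.

    Hypothetical helper: PCA is a stand-in projection, not necessarily
    the method DyGETViz uses.
    """
    T, N, D = embeds.shape
    flat = embeds.reshape(T * N, D)
    # Center the data and take the top-2 principal directions via SVD.
    centered = flat - flat.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = (centered @ vt[:2].T).reshape(T, N, 2)

    if node_names is None:
        node_names = [str(i) for i in range(N)]
    # One entry per node: its 2-D position at every snapshot, in time order.
    return {
        name: {"x": coords[:, i, 0].tolist(), "y": coords[:, i, 1].tolist()}
        for i, name in enumerate(node_names)
    }


# Toy input: 5 snapshots, 4 nodes, 8-dimensional embeddings.
rng = np.random.default_rng(0)
toy = rng.normal(size=(5, 4, 8))
trajectories = embeddings_to_trajectories(toy)
payload = json.dumps(trajectories)  # string that could be written to a .json file
```

Projecting all snapshots in one shared 2-D space keeps the points of a node comparable across time, which is what makes the per-node sequence readable as a trajectory.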

Step 3: Visualizing in a Dash app interactively using the JSON file

Users should be able to incrementally add individual node trajectories, or all nodes in a certain category (e.g., normal users vs. anomalous users), to the visualization.
highlighted_nodes: List of nodes to be highlighted in the visualization. We need to specify these nodes because we only show the names of a small number of nodes in the plotly visualization. Otherwise, the generated plot will be too messy.
plot_dtdg.py: Script for generating the visualization
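
The role of `highlighted_nodes` can be sketched as follows: every selected node contributes a trajectory, but only highlighted nodes receive a text label. The helper names and the `labels` mapping are hypothetical and are not taken from `plot_dtdg.py`.

```python
def nodes_in_category(labels, category):
    """Return the names of all nodes whose label equals `category`."""
    return [name for name, label in labels.items() if label == category]


def text_annotations(selected_nodes, highlighted_nodes):
    """Return one annotation per selected node; empty unless highlighted."""
    highlighted = set(highlighted_nodes)
    return [name if name in highlighted else "" for name in selected_nodes]


# Hypothetical labels: 0 = normal user, 1 = anomalous user.
labels = {"Anna": 0, "Bob": 1, "Charlie": 0, "David": 0, "Emma": 1}
anomalous = nodes_in_category(labels, 1)
annotations = text_annotations(anomalous, highlighted_nodes=["Emma"])
```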

Generate the visualization using the command:

python dygetviz/plot_dtdg.py --dataset_name <DATASET_NAME> --model GConvGRU

Currently, DATASET_NAME must be one of: Ant, Chickenpox, DGraphFin, Reddit. For example:

python dygetviz/plot_dtdg.py --dataset_name Chickenpox --model GConvGRU

python dygetviz/plot_dash.py --dataset_name Chickenpox --model GConvGRU

Data Format

dygetviz supports all temporal networks from the [Stanford Large Network Dataset Collection](https://snap.stanford.edu/data/index.html). Each row is a (source, target, timestamp) tuple representing one edge in a graph snapshot.

edges.tsv

SRC	DST	TIME
1	2	1082040961
3	4	1082155839
5	2	1082414391
6	7	1082439619
8	7	1082439756
9	10	1082440403
...
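
A minimal sketch of turning such an edge list into discrete snapshots, assuming a tab-separated file with the `SRC`, `DST`, `TIME` header shown above. Splitting the time range into equal-width bins is an assumption for illustration; the repository's own loader may discretize time differently.

```python
import csv
import io


def read_edges(handle):
    """Parse tab-separated rows into (source, target, timestamp) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["SRC"], row["DST"], int(row["TIME"])) for row in reader]


def split_into_snapshots(edges, num_snapshots):
    """Assign each edge to one of `num_snapshots` equal-width time bins."""
    times = [t for _, _, t in edges]
    t_min, t_max = min(times), max(times)
    width = (t_max - t_min) / num_snapshots or 1  # avoid division by zero
    snapshots = [[] for _ in range(num_snapshots)]
    for src, dst, t in edges:
        # The latest timestamp falls on the right edge; clamp it into the last bin.
        index = min(int((t - t_min) / width), num_snapshots - 1)
        snapshots[index].append((src, dst))
    return snapshots


sample = io.StringIO(
    "SRC\tDST\tTIME\n"
    "1\t2\t1082040961\n"
    "3\t4\t1082155839\n"
    "5\t2\t1082414391\n"
    "6\t7\t1082439619\n"
)
edges = read_edges(sample)
snapshots = split_into_snapshots(edges, num_snapshots=2)
```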

An optional nodes.tsv can be provided to indicate the node names. If not provided, the node names will be automatically generated as integers starting from 0.

ID  NAME
0   Anna
1   Bob
2   Charlie
3   David
4   Emma
...

You can also specify an additional column to indicate the node label, such as whether the user is a normal user or an anomalous user.

ID  NAME    LABEL
0   Anna    0
1   Bob     1
2   Charlie 0
3   David   0
4   Emma    1
...
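
A minimal sketch of reading this file, assuming tab-separated columns named `ID`, `NAME`, and an optional `LABEL`. The function names are illustrative, and the fallback follows the description above of generating integer names when no file is given.

```python
import csv
import io


def read_nodes(handle):
    """Return (names, labels) dicts keyed by integer node ID.

    `labels` stays empty when the file has no LABEL column.
    """
    names, labels = {}, {}
    for row in csv.DictReader(handle, delimiter="\t"):
        node_id = int(row["ID"])
        names[node_id] = row["NAME"]
        if row.get("LABEL") is not None:
            labels[node_id] = int(row["LABEL"])
    return names, labels


def default_names(num_nodes):
    """Fallback without nodes.tsv: integer names from 0, stored as strings."""
    return {i: str(i) for i in range(num_nodes)}


sample = io.StringIO("ID\tNAME\tLABEL\n0\tAnna\t0\n1\tBob\t1\n2\tCharlie\t0\n")
names, labels = read_nodes(sample)
```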

Terminology

DG: Dynamic Graphs, which can be categorized into DTDG and CTDG
DTDG: Discrete-Time Dynamic Graphs (the type of graphs we are dealing with)
CTDG: Continuous-Time Dynamic Graphs
Embedding Trajectories: Please refer to the JODIE paper (KDD2019) for more details

Datasets

We provide the following datasets to be viewed in our visualization tool:

Ant: The ant movement dataset from "Tracking individuals shows spatial fidelity is a key regulator of ant social organization" (Science 2013)
Chickenpox: The chickenpox dataset from the paper "Chickenpox Cases in Hungary: a Benchmark Dataset for Spatiotemporal Signal Processing with Graph Neural Networks"
HistWords: The historical word co-occurrence dataset from "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change" (GitHub) (Website)

Explanation of Each Data File

node2idx: A dictionary that maps node names to node indices (usually starting from 0 to #nodes-1).
embeds_<DATASET_NAME>.npy: The node embeddings generated by DyGET. The shape of the embeddings is #nodes x #time_steps x #embedding_dim.
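
The two files fit together as sketched below: `node2idx` gives the index of a node, and indexing the embedding array with it yields that node's trajectory. The sketch builds toy data in memory instead of reading real files, the helper `node_trajectory` is hypothetical, and it assumes the #nodes x #time_steps x #embedding_dim layout stated above.

```python
import numpy as np

# Toy stand-ins for the real files (values are made up for illustration).
# With real data, the array would come from np.load on the .npy file.
node2idx = {"Anna": 0, "Bob": 1, "Charlie": 2}
embeds = np.arange(3 * 4 * 2, dtype=float).reshape(3, 4, 2)  # (#nodes, #time_steps, #embedding_dim)


def node_trajectory(embeds, node2idx, name):
    """Return the (#time_steps, #embedding_dim) trajectory of one node."""
    return embeds[node2idx[name]]


bob = node_trajectory(embeds, node2idx, "Bob")
```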

Note

The Reddit dataset is a bit special because it is the only dataset that describes a bipartite graph. The first 60 snapshots are for each of the 60 snapshots. The last snapshot is for the background nodes. The shape of the embeddings is ``

Acknowledgments

We thank members of the CLAWS Lab and SRI International for their feedback and support.
