The GraphStorm V0.4 release contains several major feature enhancements. In this version, we introduced experimental support for using edge features in GNN message passing computation. Users can now use edge features by setting two new command-line interface (CLI) arguments, `--edge-feat-name` and `--edge-feat-mp-op`. The GraphStorm APIs were also updated to support using edge features in message passing. In addition, this version introduces support for DGL’s GraphBolt, a new data loading module for DGL that enables faster and more efficient graph sampling. For link prediction on Paper100M, enabling GraphBolt in GraphStorm gave a 1.4X speedup in training and a 3.6X speedup in inference. We also enhanced distributed graph processing (GSProcessing) to support hard negative sampling, multitask mask generation, and saving and loading numeric feature transformations, and we added RotatE and TransE score functions for link prediction. This version also adds a new GraphStorm example that predicts complex and dynamic network traffic, and an example that demonstrates how to use the super-node method to perform graph-level prediction tasks. Finally, we added a new example that demonstrates how to use SageMaker Pipelines with GraphStorm and how to run GraphBolt-enabled jobs.
Major features
- Support for using edge features in GNN message passing computation. Users only need to set two new CLI arguments, `--edge-feat-name` and `--edge-feat-mp-op`, to use this feature with an RGCN encoder. #1057, #1070, #1074, #1084, #1088, #1096, #1098, #1104.
- GraphBolt integration. Users can enable GraphBolt in graph processing, model training, and inference by setting a single argument, `--use-graphbolt`. #1001, #1011, #1024, #1025, #1029, #1083, #1116.
- GSProcessing enhancements: support for hard negative sampling, multitask mask generation, and saving numeric feature transformations. #994, #1050, #1073, #1085, #1091, #1076, #1117.
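As a rough illustration of what using edge features in message passing means, the NumPy sketch below combines each edge's feature vector with its source node's embedding to form a per-edge message, then averages the messages at each destination node. It is a sketch under stated assumptions, not GraphStorm's RGCN implementation: it omits the relation-specific weight matrices, and the operation names (`concat`, `add`, `sub`, `mul`, `div`) are illustrative; consult the GraphStorm documentation for the values `--edge-feat-mp-op` actually accepts. The helpers `edge_aware_messages` and `mean_aggregate` are hypothetical.

```python
import numpy as np

def edge_aware_messages(n_h, e_h, src, op="concat"):
    """Build one message per edge from the source-node embedding and the edge embedding.

    n_h: (num_nodes, d) node embeddings; e_h: (num_edges, d_e) edge embeddings;
    src: (num_edges,) source node id of each edge.
    """
    h_src = n_h[src]  # gather the source embedding for every edge
    if op == "concat":
        return np.concatenate([h_src, e_h], axis=1)  # message width becomes d + d_e
    if op == "add":
        return h_src + e_h  # element-wise ops require d_e == d
    if op == "sub":
        return h_src - e_h
    if op == "mul":
        return h_src * e_h
    if op == "div":
        return h_src / e_h
    raise ValueError(f"unsupported op: {op!r}")

def mean_aggregate(messages, dst, num_nodes):
    """Average the incoming messages of each destination node; nodes without in-edges get zeros."""
    out = np.zeros((num_nodes, messages.shape[1]))
    counts = np.zeros(num_nodes)
    np.add.at(out, dst, messages)  # unbuffered scatter-add, correct with repeated dst ids
    np.add.at(counts, dst, 1)
    has_in = counts > 0
    out[has_in] /= counts[has_in][:, None]
    return out

# Toy graph: edges 0->2 and 1->2 with 2-dimensional node and edge features.
n_h = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
e_h = np.array([[10.0, 10.0], [20.0, 20.0]])
src, dst = np.array([0, 1]), np.array([2, 2])
msgs = edge_aware_messages(n_h, e_h, src, op="add")  # [[11, 12], [23, 24]]
print(mean_aggregate(msgs, dst, num_nodes=3)[2])  # [17. 18.]
```

The concatenation option keeps node and edge information separate at the cost of a wider message, while the element-wise options keep the width fixed but require the edge features to match the node embedding size.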
New examples
- Network time series traffic prediction. This example demonstrates how to use GraphStorm to make time series predictions on synthetic air transportation traffic data. #1109.
- Graph-level prediction. This example demonstrates how to use the super-node method to perform graph-level prediction tasks with the GraphStorm CLIs and APIs. #1021, #1026.
- A new notebook example that demonstrates how to use customized models with the GraphStorm CLIs. #1049, #1087.
- A new notebook example that demonstrates how to run a distributed training pipeline on SageMaker. #1108, #1126.
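The super-node idea can be sketched in plain Python: each graph in a collection gets one extra node, every original node is linked to its graph's super node, and the graph-level label is attached to that super node, so graph prediction becomes node prediction on the super nodes. This is a minimal sketch assuming a homogeneous edge list and a per-node graph id; the function `add_super_nodes` and the labels are hypothetical and are not taken from the GraphStorm example, which may, for instance, place super nodes in their own node type.

```python
def add_super_nodes(graph_ids, edges):
    """Append one super node per graph and link every original node to its graph's super node.

    graph_ids[i] is the graph that original node i belongs to (node ids are 0..len-1);
    edges is a list of (src, dst) pairs among original nodes.
    Returns (total_num_nodes, new_edges, super_node_of_graph).
    """
    num_nodes = len(graph_ids)
    graphs = sorted(set(graph_ids))
    # Super-node ids start right after the last original node id.
    super_node_of_graph = {g: num_nodes + k for k, g in enumerate(graphs)}
    new_edges = list(edges)
    for node, g in enumerate(graph_ids):
        # Direction node -> super node so that messages flow into the super node.
        new_edges.append((node, super_node_of_graph[g]))
    return num_nodes + len(graphs), new_edges, super_node_of_graph

# Two small graphs: nodes 0-2 form graph "a", nodes 3-4 form graph "b".
total, edges, super_nodes = add_super_nodes(["a", "a", "a", "b", "b"], [(0, 1), (1, 2), (3, 4)])
graph_labels = {"a": 1, "b": 0}  # hypothetical graph-level labels
node_labels = {super_nodes[g]: y for g, y in graph_labels.items()}
print(total, node_labels)  # 7 {5: 1, 6: 0}
```

With edges pointing into the super node, a single GNN layer already lets the super node aggregate every node of its graph, while deeper layers let it also reflect each node's neighborhood; whether to add reverse edges is a modeling choice.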
Minor features
- Link prediction enhancements: added the RotatE and TransE score functions and the adjusted mean rank index (AMRI) link prediction metric. #986, #991, #1031, #1042, #1046, #1061, #1075.
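For reference, the TransE and RotatE score functions and the adjusted mean rank index can be written in a few lines of NumPy. This is a sketch of the commonly published formulations, not GraphStorm's code: the margin term that many implementations add, the choice of norm, and GraphStorm's exact convention for counting candidates in AMRI are assumptions or omissions here, and the function names are hypothetical.

```python
import numpy as np

def transe_score(h, r, t, p=2):
    """TransE: the relation is a translation, score = -||h + r - t||_p (higher is better)."""
    return -np.linalg.norm(h + r - t, ord=p, axis=-1)

def rotate_score(h, phase, t):
    """RotatE: h and t are complex vectors; the relation rotates each coordinate of h
    by `phase` radians, score = -sum_i |h_i * r_i - t_i| (higher is better)."""
    r = np.exp(1j * phase)  # unit-modulus complex relation embedding
    return -np.sum(np.abs(h * r - t), axis=-1)

def amri(ranks, num_candidates):
    """Adjusted mean rank index: 1 - (MR - 1) / E[MR - 1] under uniformly random ranking.

    ranks[i] is the 1-based rank of the true edge among num_candidates[i] candidates
    (the true edge included). 1.0 is perfect, about 0.0 is random, -1.0 is worst.
    """
    ranks = np.asarray(ranks, dtype=float)
    n = np.asarray(num_candidates, dtype=float)
    expected = np.mean((n - 1.0) / 2.0)  # E[rank - 1] for a rank uniform on 1..n
    return 1.0 - (ranks.mean() - 1.0) / expected

# Two test edges, each ranked among 5 candidates: one ranked first, one ranked third.
print(amri([1, 3], [5, 5]))  # 0.5
```

Unlike the raw mean rank, AMRI normalizes by the rank expected from random scoring, so results stay comparable when the number of negative candidates per edge changes.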
Breaking changes
- API changes:
  - `RelGraphConvLayer` adds two new arguments, `edge_feat_name` and `edge_feat_mp_op`, to support using edge features. In its `forward` function, the input argument `inputs` is replaced by two arguments, `n_h` and `e_h`, for node embeddings and edge embeddings, respectively.
  - `RelationalGCNEncoder` also adds the two new arguments, `edge_feat_name` and `edge_feat_mp_op`. Its `forward` function likewise replaces the input argument `h` with `n_h` and `e_hs`. #1074.
  - Decoders, including `EntityClassifier`, `EntityRegression`, `DenseBiDecoder`, `EdgeRegression`, `MLPEdgeDecoder`, and `MLPEFeatEdgeDecoder`, have a new argument, `use_bias`, that lets users control whether these decoders use a bias term. #1111, #1125.
- The GSProcessing configuration parser was modified to be equivalent to GConstruct. #1117.
Contributors
- Xiang Song from AWS
- Jian Zhang from AWS
- Theodore Vasiloudis from AWS
- Runjie Ma from AWS
- Han Xie from AWS
- Ronald Xu from AWS
- Yuke Wang from Rice University