
Top1 Solution on OGB Challenge (Graph Property Prediction on HIV dataset)


yzhuoning/DeepAUC_OGB_Challenge


Deep AUC Maximization on Graph Property Prediction

This repo contains our code submission for the OGB challenge. We focus on ogbg-molhiv, a binary classification task that predicts a target molecular property, i.e., whether or not a molecule inhibits HIV replication. The evaluation metric is AUROC. To the best of our knowledge, this is the first solution to directly optimize the AUC score on this task. Our AUC-Margin loss improves the baseline (DeepGCN) to 0.8159 test AUROC and achieves SOTA performance of 0.8352 when jointly trained with Neural FingerPrints. Our approach is implemented in LibAUC, an ML library for AUC optimization.
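Because the task is scored by AUROC, it may help to recall what the metric measures: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The following is a minimal pure-Python sketch of that pairwise definition. The function name `pairwise_auroc` and the toy data are illustrative and not part of this repo; reported numbers should come from the official OGB evaluator.

```python
def pairwise_auroc(labels, scores):
    """AUROC as the fraction of correctly ordered (positive, negative) pairs.

    labels: iterable of 0/1 ground-truth labels.
    scores: iterable of real-valued model scores (higher = more positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four positive/negative pairs are ordered correctly.
    print(pairwise_auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This quadratic-time version is only meant to make the definition concrete; it is too slow for large evaluation sets.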

Results on ogbg-molhiv

Our method ranks 1st on the leaderboard as of 10/11/2021! We present our results on the ogbg-molhiv dataset alongside some strong baselines below:

| Method | Test AUROC | Validation AUROC | Parameters | Hardware |
|---|---|---|---|---|
| DeepGCN | 0.7858±0.0117 | 0.8427±0.0063 | 531,976 | Tesla V100 (32GB) |
| DeeperGCN+FLAG | 0.7942±0.0120 | 0.8425±0.0061 | 531,976 | Tesla V100 (32GB) |
| Neural FingerPrints | 0.8232±0.0047 | 0.8331±0.0054 | 2,425,102 | Tesla V100 (32GB) |
| Graphormer | 0.8051±0.0053 | 0.8310±0.0089 | 47,183,040 | Tesla V100 (16GB) |
| DeepAUC (Ours) | 0.8159±0.0059 | 0.8054±0.0080 | 1,019,407 | Tesla V100 (32GB) |
| DeepAUC+FPs (Ours) | 0.8352±0.0054 | 0.8238±0.0061 | 1,019,407** | Tesla V100 (32GB) |

  • Note that the number marked ** does not count the parameters of the Random Forest model.
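The ± entries in the table are presumably the mean and standard deviation over repeated runs with different seeds. A minimal sketch of producing such an entry from per-seed scores is shown here. The helper `summarize` and the example scores are hypothetical, and whether the leaderboard numbers use the population or the sample standard deviation is an assumption left as a flag.

```python
from statistics import mean, pstdev, stdev


def summarize(scores, sample=False):
    """Format per-seed scores as 'mean±std' with four decimals.

    sample=False uses the population standard deviation (divide by N);
    sample=True uses the sample standard deviation (divide by N - 1).
    Which one the reported numbers use is an assumption, not stated in the repo.
    """
    spread = stdev(scores) if sample else pstdev(scores)
    return f"{mean(scores):.4f}±{spread:.4f}"


if __name__ == "__main__":
    # Hypothetical test AUROC values from a few seeds, for illustration only.
    example_scores = [0.80, 0.82]
    print(summarize(example_scores))
```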

Requirements

  1. Install base packages:
    Python>=3.7
    Pytorch>=1.9.0
    tensorflow>=2.0.0
    pytorch_geometric>=1.6.0
    ogb>=1.3.2 
    dgl>=0.5.3 
    numpy==1.20.3
    pandas==1.2.5
    scikit-learn==0.24.2
    deep_gcns_torch
  2. Install LibAUC (provides the AUC-Margin loss and PESG optimizer):
    pip install LibAUC
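Before training, it can be useful to confirm which of these packages are actually installed. The sketch below only reports installed versions next to the requirements listed above; it does not enforce them. The distribution names in `REQUIRED` (e.g. `torch`, `torch-geometric`, `libauc`) are our assumption about how the packages are named by pip, `deep_gcns_torch` is omitted because it is not assumed to be a pip package, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version

# Requirement strings copied from the list above; the keys are assumed
# pip distribution names, not something this repo specifies.
REQUIRED = {
    "torch": ">=1.9.0",
    "tensorflow": ">=2.0.0",
    "torch-geometric": ">=1.6.0",
    "ogb": ">=1.3.2",
    "dgl": ">=0.5.3",
    "numpy": "==1.20.3",
    "pandas": "==1.2.5",
    "scikit-learn": "==0.24.2",
    "libauc": "(any)",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(required=REQUIRED):
    """Build one human-readable line per required package."""
    lines = []
    for name, spec in required.items():
        found = installed_version(name) or "MISSING"
        lines.append(f"{name:16s} required {spec:10s} installed {found}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```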

Training

Training has two steps: (1) we train a DeepGCN model from scratch using our AUC-margin loss; (2) we jointly fine-tune the pretrained model from (1) with FingerPrints models.

Training from scratch using AUC-margin loss:

  • Train the DeepGCN model with the AUC-Margin loss and PESG optimizer using the default parameters
python main.py --use_gpu --conv_encode_edge --num_layers 14 --block res+ --gcn_aggr softmax --t 1.0 --learn_t --dropout 0.2 \
    --dataset ogbg-molhiv \
    --loss auroc \
    --optimizer pesg \
    --batch_size 512 \
    --lr 0.1 \
    --gamma 500 \
    --margin 1.0 \
    --weight_decay 1e-5 \
    --random_seed 0 \
    --epochs 300
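To give some intuition for what the AUC-margin loss optimizes, here is a small pure-Python sketch of the min-max objective as we read it from the paper cited in the Citation section: squared deviations of positive and negative scores from reference values a and b, plus a term weighted by alpha that penalizes a gap between the mean positive and mean negative score smaller than the margin m (presumably what `--margin` sets). This is not LibAUC's code. The library's implementation may differ in details such as class-prior weighting and minibatch estimation, and in practice a, b and alpha are updated by the PESG optimizer. The function names are ours.

```python
def _split_scores(labels, scores):
    """Separate scores of positives (label 1) from negatives (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    return pos, neg


def auc_margin_objective(labels, scores, a, b, alpha, margin=1.0):
    """Value of the assumed AUC-margin objective for fixed a, b, alpha.

    E[(h - a)^2 | y=1] + E[(h - b)^2 | y=0]
      + 2*alpha*(margin - E[h | y=1] + E[h | y=0]) - alpha^2
    """
    pos, neg = _split_scores(labels, scores)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    dev_pos = sum((s - a) ** 2 for s in pos) / len(pos)
    dev_neg = sum((s - b) ** 2 for s in neg) / len(neg)
    return dev_pos + dev_neg + 2 * alpha * (margin - mean_pos + mean_neg) - alpha ** 2


def best_alpha(labels, scores, margin=1.0):
    """Maximizer of the objective over alpha >= 0 (concave quadratic, clipped at 0)."""
    pos, neg = _split_scores(labels, scores)
    gap = margin - sum(pos) / len(pos) + sum(neg) / len(neg)
    return max(0.0, gap)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    separated = [1.0, 1.0, 0.0, 0.0]    # positives exceed negatives by the margin
    collapsed = [0.5, 0.5, 0.5, 0.5]    # model cannot tell the classes apart
    for scores in (separated, collapsed):
        alpha = best_alpha(labels, scores)
        a = sum(s for y, s in zip(labels, scores) if y == 1) / 2
        b = sum(s for y, s in zip(labels, scores) if y == 0) / 2
        print(alpha, auc_margin_objective(labels, scores, a, b, alpha))
```

In this sketch, well-separated scores drive the objective to zero, while identical scores for both classes incur a positive penalty.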

Jointly training with FingerPrints model

  • Extract fingerprints and train Random Forest by following PaddleHelix
python extract_fingerprint.py
python random_forest.py
  • Fine-tune the pretrained model with the FingerPrints model using the AUC-margin loss and the default parameters
python finetune.py --use_gpu --conv_encode_edge --num_layers 14 --block res+ --gcn_aggr softmax --t 1.0 --learn_t --dropout 0.2 \
    --dataset ogbg-molhiv \
    --loss auroc \
    --optimizer pesg \
    --batch_size 512 \
    --lr 0.01 \
    --gamma 300 \
    --margin 1.0 \
    --weight_decay 1e-5 \
    --random_seed 0 \
    --epochs 100

Results

Stage (1) improves the original baseline (DeepGCN) from 0.7858 to 0.8159 test AUROC, an absolute gain of about 3 points. Stage (2) reaches a new SOTA of 0.8352, about 1 point above the strongest previous baseline (Neural FingerPrints, 0.8232). For each stage, we train the model 10 times using different random seeds, e.g., 0 to 9.
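One way to reproduce the 10-seed protocol is to generate the training command for each seed. The sketch below reuses the flags from the two commands in the Training section and only prints the resulting command lines. The helper `build_command` is hypothetical, and how `finetune.py` locates the pretrained model for a given seed is not covered here.

```python
import shlex

# Flags shared by both stages, copied from the commands in the Training section.
BASE_ARGS = [
    "--use_gpu", "--conv_encode_edge",
    "--num_layers", "14", "--block", "res+", "--gcn_aggr", "softmax",
    "--t", "1.0", "--learn_t", "--dropout", "0.2",
    "--dataset", "ogbg-molhiv", "--loss", "auroc", "--optimizer", "pesg",
    "--batch_size", "512", "--margin", "1.0", "--weight_decay", "1e-5",
]

# Stage-specific flags: training from scratch vs. joint fine-tuning.
STAGE_ARGS = {
    "main.py": ["--lr", "0.1", "--gamma", "500", "--epochs", "300"],
    "finetune.py": ["--lr", "0.01", "--gamma", "300", "--epochs", "100"],
}


def build_command(script, seed):
    """Return the argument list for one run of `script` with the given seed."""
    return ["python", script, *BASE_ARGS, *STAGE_ARGS[script],
            "--random_seed", str(seed)]


if __name__ == "__main__":
    for script in STAGE_ARGS:
        for seed in range(10):  # seeds 0 to 9
            # Print only; pass the list to subprocess.run(...) to launch a run.
            print(shlex.join(build_command(script, seed)))
```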

Citation

If you have any questions, please open a new issue in this repo or contact Zhuoning Yuan [yzhuoning@gmail.com]. If you find this work useful, please cite the following paper for our method and library:

@inproceedings{yuan2021robust,
	title={Large-scale Robust Deep AUC Maximization: A New Surrogate Loss and Empirical Studies on Medical Image Classification},
	author={Yuan, Zhuoning and Yan, Yan and Sonka, Milan and Yang, Tianbao},
	booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
	year={2021}
	}
