Benchmark and Model Zoo

Environment

We use the following environment to run all the experiments on this page (a quick way to check these versions locally is sketched after the list).

  • Python 3.6
  • PyTorch 0.4.1
  • CUDA 9.0.176
  • CUDNN 7.0.4
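
The following minimal sketch uses plain PyTorch introspection (nothing specific to this repository) to print the versions relevant to the list above:

```python
# Print the local tool versions to compare against the environment listed above.
import sys
import torch

print("Python :", sys.version.split()[0])           # expected: 3.6.x
print("PyTorch:", torch.__version__)                 # expected: 0.4.1
print("CUDA   :", torch.version.cuda)                # expected: 9.0
print("cuDNN  :", torch.backends.cudnn.version())    # expected: 7004, i.e. 7.0.4
print("GPU available:", torch.cuda.is_available())
```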

VQA-v2

We provide three groups of results (the Overall, Yes/No, Number, and Other accuracies) for each model on VQA-v2, corresponding to the training schemes listed below. Pre-trained models are provided for the latter two schemes.

  • Train -> Val: trained on the train split and evaluated on the val split.
  • Train+val -> Test-dev: trained on the train+val splits and evaluated on the test-dev split.
  • Train+val+vg -> Test-dev: trained on the train+val+vg splits and evaluated on the test-dev split.

Note that for a given model, the base learning rate may differ across training schemes; to reproduce the results, modify this setting in the config file accordingly.
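
As an illustration, the sketch below edits the learning rate in a YAML config with PyYAML. The config path and the key name LR_BASE are assumptions made for this example; check the actual config file of the model you are training and use the base learning rate listed in the corresponding table.

```python
# Hedged sketch: override the base learning rate in a model's YAML config.
# "configs/vqa/mcan_small.yml" and the key "LR_BASE" are hypothetical names --
# adjust them to match the real config of the model you are training.
import yaml

cfg_path = "configs/vqa/mcan_small.yml"

with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["LR_BASE"] = 1e-4  # base lr from the table for the chosen training scheme

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, default_flow_style=False)
```

Editing the value directly in the file works just as well; a script like this is only convenient when switching between schemes repeatedly.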

Train -> Val

| Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) |
|-------|---------|-------------|------------|------------|-----------|
| BUTD | 2e-3 | 63.84 | 81.40 | 43.81 | 55.78 |
| MFB | 7e-4 | 65.35 | 83.23 | 45.31 | 57.05 |
| MFH | 7e-4 | 66.18 | 84.07 | 46.55 | 57.78 |
| BAN-4 | 2e-3 | 65.86 | 83.53 | 46.36 | 57.56 |
| BAN-8 | 2e-3 | 66.00 | 83.61 | 47.04 | 57.62 |
| MCAN-small | 1e-4 | 67.17 | 84.82 | 49.31 | 58.48 |
| MCAN-large | 7e-5 | 67.50 | 85.14 | 49.66 | 58.80 |
| MMNasNet-small | 1.2e-4 | 67.79 | 85.02 | 52.25 | 58.80 |
| MMNasNet-large | 7e-5 | 67.98 | 85.22 | 52.04 | 59.09 |

Train+val -> Test-dev

| Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) | Download |
|-------|---------|-------------|------------|------------|-----------|----------|
| BUTD | 2e-3 | 66.98 | 83.28 | 46.19 | 57.85 | model |
| MFB | 7e-4 | 68.29 | 84.64 | 48.29 | 58.89 | model |
| MFH | 7e-4 | 69.11 | 85.56 | 48.81 | 59.69 | model |
| BAN-4 | 1.4e-3 | 68.9 | 85.0 | 49.5 | 59.56 | model |
| BAN-8 | 1.4e-3 | 69.07 | 85.2 | 49.63 | 59.71 | model |
| MCAN-small | 1e-4 | 70.33 | 86.77 | 52.14 | 60.40 | model |
| MCAN-large | 5e-5 | 70.48 | 86.90 | 52.11 | 60.63 | model |

Train+val+vg -> Test-dev

| Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) | Download |
|-------|---------|-------------|------------|------------|-----------|----------|
| BUTD | 2e-3 | 67.54 | 83.48 | 46.97 | 58.62 | model |
| MFB | 7e-4 | 68.25 | 84.79 | 48.24 | 58.68 | model |
| MFH | 7e-4 | 68.86 | 85.38 | 49.27 | 59.21 | model |
| BAN-4 | 1.4e-3 | 69.31 | 85.42 | 50.15 | 59.91 | model |
| BAN-8 | 1.4e-3 | 69.48 | 85.40 | 50.82 | 60.14 | model |
| MCAN-small | 1e-4 | 70.69 | 87.08 | 53.16 | 60.66 | model |
| MCAN-large | 5e-5 | 70.82 | 87.19 | 52.56 | 60.98 | model |
| MMNasNet-small | 1e-4 | 71.24 | 87.11 | 56.15 | 61.08 | model |
| MMNasNet-large | 5e-5 | 71.45 | 87.29 | 55.71 | 61.45 | model |

GQA

We provide a group of results (Accuracy, Binary, Open, Validity, Plausibility, Consistency, and Distribution) for each model on GQA, obtained with the following training scheme.

  • Train+val -> Test-dev: trained on the train (balanced) + val (balanced) splits and evaluated on the test-dev (balanced) split.

The results shown below were obtained from the online evaluation server. Note that evaluating the Test-dev split offline with the provided official script yields slightly different numbers than the online server, for reasons that remain unclear.

Train+val -> Test-dev

| Model | Base lr | Accuracy (%) | Binary (%) | Open (%) | Validity (%) | Plausibility (%) | Consistency (%) | Distribution | Download |
|-------|---------|--------------|------------|----------|--------------|------------------|-----------------|--------------|----------|
| BUTD (frcn+bbox) | 2e-3 | 53.38 | 67.78 | 40.72 | 96.62 | 84.81 | 77.62 | 1.26 | model |
| BAN-4 (frcn+bbox) | 2e-3 | 55.01 | 72.02 | 40.06 | 96.94 | 85.67 | 81.85 | 1.04 | model |
| BAN-8 (frcn+bbox) | 1e-3 | 56.19 | 73.31 | 41.13 | 96.77 | 85.58 | 84.64 | 1.09 | model |
| MCAN-small (frcn) | 1e-4 | 53.41 | 70.29 | 38.56 | 96.77 | 85.32 | 82.29 | 1.40 | model |
| MCAN-small (frcn+grid) | 1e-4 | 54.28 | 71.68 | 38.97 | 96.79 | 85.11 | 84.49 | 1.20 | model |
| MCAN-small (frcn+bbox) | 1e-4 | 58.20 | 75.87 | 42.66 | 97.01 | 85.41 | 87.99 | 1.25 | model |
| MCAN-small (frcn+bbox+grid) | 1e-4 | 58.38 | 76.49 | 42.45 | 96.98 | 84.47 | 87.36 | 1.29 | model |
| MCAN-large (frcn+bbox+grid) | 5e-5 | 58.10 | 76.98 | 41.50 | 97.01 | 85.43 | 87.34 | 1.20 | model |

CLEVR

We provide a group of results (the Overall, Count, Exist, Compare Numbers, Query Attribute, and Compare Attribute accuracies) for each model on CLEVR, obtained with the following training scheme.

  • Train -> Val: trained on the train split and evaluated on the val split.

Train -> Val

| Model | Base lr | Overall (%) | Count (%) | Exist (%) | Compare Numbers (%) | Query Attribute (%) | Compare Attribute (%) | Download |
|-------|---------|-------------|-----------|-----------|---------------------|---------------------|-----------------------|----------|
| MCAN-small | 4e-5 | 98.74 | 96.81 | 99.27 | 98.89 | 99.53 | 99.19 | model |