Benchmark and Model Zoo¶
Environment¶
We use the following environment to run all the experiments on this page.
- Python 3.6
- PyTorch 0.4.1
- CUDA 9.0.176
- CUDNN 7.0.4
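To compare a local setup against the versions listed above, a small check along the following lines may help. It is only a sketch: the helper names (`matches`, `format_cudnn`, `collect_versions`) are illustrative and not part of this project, and the integer-to-string conversion assumes the cuDNN 7.x encoding (major*1000 + minor*100 + patch). If PyTorch is not installed, the PyTorch, CUDA and cuDNN entries are reported as `None`.

```python
import platform

# Versions listed in the Environment section of this page.
EXPECTED = {
    "python": "3.6",
    "torch": "0.4.1",
    "cuda": "9.0.176",
    "cudnn": "7.0.4",
}


def matches(expected, actual):
    """Return True if `actual` starts with the dotted components of `expected`."""
    if actual is None:
        return False
    exp = expected.split(".")
    act = str(actual).split(".")
    return act[:len(exp)] == exp


def format_cudnn(version_int):
    """Convert a cuDNN 7.x integer version (e.g. 7004) to a dotted string ("7.0.4")."""
    major, rest = divmod(version_int, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def collect_versions():
    """Gather local versions; PyTorch-related entries stay None if torch is missing."""
    versions = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "cudnn": None,
    }
    try:
        import torch
    except ImportError:
        return versions
    versions["torch"] = str(torch.__version__)
    versions["cuda"] = torch.version.cuda  # None for CPU-only builds
    cudnn = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    if cudnn is not None:
        versions["cudnn"] = format_cudnn(cudnn)
    return versions


if __name__ == "__main__":
    for name, actual in collect_versions().items():
        status = "ok" if matches(EXPECTED[name], actual) else "differs"
        print(f"{name}: expected {EXPECTED[name]}, found {actual} ({status})")
```

A mismatch does not necessarily prevent the models from running; it only indicates that the numbers on this page were obtained under a different setup.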
VQA-v2¶
We provide three groups of results (including the accuracies of Overall, Yes/No, Number and Other) for each model on VQA-v2 using different training schemes as follows. We provide pre-trained models for the latter two schemes.
- Train -> Val: trained on the `train` split and evaluated on the `val` split.
- Train+val -> Test-dev: trained on the `train+val` splits and evaluated on the `test-dev` split.
- Train+val+vg -> Test-dev: trained on the `train+val+vg` splits and evaluated on the `test-dev` split.
Note that the base learning rate used for a given model may differ between training schemes. To reproduce a result, set the base learning rate in the model's config file to the value listed for that model in the corresponding table.
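As an illustration of why the learning rate needs checking per scheme, the sketch below records the base learning rates from the VQA-v2 tables on this page and looks them up by model and training scheme. The scheme labels and function names are chosen for this example only. The name of the learning-rate key in the config file is not given on this page, so the sketch only returns the value to set.

```python
# Base learning rates copied from the VQA-v2 tables on this page.
# Scheme labels (illustrative): "train" (Train -> Val),
# "train+val" and "train+val+vg" (both evaluated on Test-dev).
BASE_LR = {
    "BUTD": {"train": 2e-3, "train+val": 2e-3, "train+val+vg": 2e-3},
    "MFB": {"train": 7e-4, "train+val": 7e-4, "train+val+vg": 7e-4},
    "MFH": {"train": 7e-4, "train+val": 7e-4, "train+val+vg": 7e-4},
    "BAN-4": {"train": 2e-3, "train+val": 1.4e-3, "train+val+vg": 1.4e-3},
    "BAN-8": {"train": 2e-3, "train+val": 1.4e-3, "train+val+vg": 1.4e-3},
    "MCAN-small": {"train": 1e-4, "train+val": 1e-4, "train+val+vg": 1e-4},
    "MCAN-large": {"train": 7e-5, "train+val": 5e-5, "train+val+vg": 5e-5},
    # No Train+val -> Test-dev results are listed for MMNasNet.
    "MMNasNet-small": {"train": 1.2e-4, "train+val+vg": 1e-4},
    "MMNasNet-large": {"train": 7e-5, "train+val+vg": 5e-5},
}


def base_lr(model, scheme):
    """Return the base learning rate reported for `model` under `scheme`."""
    try:
        return BASE_LR[model][scheme]
    except KeyError:
        raise KeyError(
            f"no result reported for {model!r} with scheme {scheme!r}"
        ) from None


def models_with_scheme_dependent_lr():
    """List the models whose base learning rate is not the same in every scheme."""
    return sorted(
        model for model, rates in BASE_LR.items() if len(set(rates.values())) > 1
    )


if __name__ == "__main__":
    print(base_lr("BAN-4", "train"))       # value used for Train -> Val
    print(base_lr("BAN-4", "train+val"))   # value used for Train+val -> Test-dev
    print(models_with_scheme_dependent_lr())
```

For the models returned by `models_with_scheme_dependent_lr`, a config left at the Train -> Val value would not match the settings behind the Test-dev results.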
Train -> Val¶
Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) |
---|---|---|---|---|---|
BUTD | 2e-3 | 63.84 | 81.40 | 43.81 | 55.78 |
MFB | 7e-4 | 65.35 | 83.23 | 45.31 | 57.05 |
MFH | 7e-4 | 66.18 | 84.07 | 46.55 | 57.78 |
BAN-4 | 2e-3 | 65.86 | 83.53 | 46.36 | 57.56 |
BAN-8 | 2e-3 | 66.00 | 83.61 | 47.04 | 57.62 |
MCAN-small | 1e-4 | 67.17 | 84.82 | 49.31 | 58.48 |
MCAN-large | 7e-5 | 67.50 | 85.14 | 49.66 | 58.80 |
MMNasNet-small | 1.2e-4 | 67.79 | 85.02 | 52.25 | 58.80 |
MMNasNet-large | 7e-5 | 67.98 | 85.22 | 52.04 | 59.09 |
Train+val -> Test-dev¶
Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) | Download |
---|---|---|---|---|---|---|
BUTD | 2e-3 | 66.98 | 83.28 | 46.19 | 57.85 | model |
MFB | 7e-4 | 68.29 | 84.64 | 48.29 | 58.89 | model |
MFH | 7e-4 | 69.11 | 85.56 | 48.81 | 59.69 | model |
BAN-4 | 1.4e-3 | 68.9 | 85.0 | 49.5 | 59.56 | model |
BAN-8 | 1.4e-3 | 69.07 | 85.2 | 49.63 | 59.71 | model |
MCAN-small | 1e-4 | 70.33 | 86.77 | 52.14 | 60.40 | model |
MCAN-large | 5e-5 | 70.48 | 86.90 | 52.11 | 60.63 | model |
Train+val+vg -> Test-dev¶
Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) | Download |
---|---|---|---|---|---|---|
BUTD | 2e-3 | 67.54 | 83.48 | 46.97 | 58.62 | model |
MFB | 7e-4 | 68.25 | 84.79 | 48.24 | 58.68 | model |
MFH | 7e-4 | 68.86 | 85.38 | 49.27 | 59.21 | model |
BAN-4 | 1.4e-3 | 69.31 | 85.42 | 50.15 | 59.91 | model |
BAN-8 | 1.4e-3 | 69.48 | 85.40 | 50.82 | 60.14 | model |
MCAN-small | 1e-4 | 70.69 | 87.08 | 53.16 | 60.66 | model |
MCAN-large | 5e-5 | 70.82 | 87.19 | 52.56 | 60.98 | model |
MMNasNet-small | 1e-4 | 71.24 | 87.11 | 56.15 | 61.08 | model |
MMNasNet-large | 5e-5 | 71.45 | 87.29 | 55.71 | 61.45 | model |
GQA¶
We provide a group of results (including Accuracy, Binary, Open, Validity, Plausibility, Consistency, Distribution) for each model on GQA as follows.
- Train+val -> Test-dev: trained on the `train(balance) + val(balance)` splits and evaluated on the `test-dev(balance)` split.
The results below are obtained from the online evaluation server. Note that offline Test-dev results computed with the provided official script differ slightly from the online results; the cause of this difference is unknown.
Train+val -> Test-dev¶
Model | Base lr | Accuracy (%) | Binary (%) | Open (%) | Validity (%) | Plausibility (%) | Consistency (%) | Distribution | Download |
---|---|---|---|---|---|---|---|---|---|
BUTD (frcn+bbox) | 2e-3 | 53.38 | 67.78 | 40.72 | 96.62 | 84.81 | 77.62 | 1.26 | model |
BAN-4 (frcn+bbox) | 2e-3 | 55.01 | 72.02 | 40.06 | 96.94 | 85.67 | 81.85 | 1.04 | model |
BAN-8 (frcn+bbox) | 1e-3 | 56.19 | 73.31 | 41.13 | 96.77 | 85.58 | 84.64 | 1.09 | model |
MCAN-small (frcn) | 1e-4 | 53.41 | 70.29 | 38.56 | 96.77 | 85.32 | 82.29 | 1.40 | model |
MCAN-small (frcn+grid) | 1e-4 | 54.28 | 71.68 | 38.97 | 96.79 | 85.11 | 84.49 | 1.20 | model |
MCAN-small (frcn+bbox) | 1e-4 | 58.20 | 75.87 | 42.66 | 97.01 | 85.41 | 87.99 | 1.25 | model |
MCAN-small (frcn+bbox+grid) | 1e-4 | 58.38 | 76.49 | 42.45 | 96.98 | 84.47 | 87.36 | 1.29 | model |
MCAN-large (frcn+bbox+grid) | 5e-5 | 58.10 | 76.98 | 41.50 | 97.01 | 85.43 | 87.34 | 1.20 | model |
CLEVR¶
We provide a group of results (including Overall, Count, Exist, Compare Numbers, Query Attribute, Compare Attribute) for each model on CLEVR as follows.
- Train -> Val: trained on the `train` split and evaluated on the `val` split.
Train -> Val¶
Model | Base lr | Overall (%) | Count (%) | Exist (%) | Compare Numbers (%) | Query Attribute (%) | Compare Attribute (%) | Download |
---|---|---|---|---|---|---|---|---|
MCAN-small | 4e-5 | 98.74 | 96.81 | 99.27 | 98.89 | 99.53 | 99.19 | model |