Benchmark and Model Zoo

Environment

We use the following environment to run all the experiments on this page (a quick way to check these versions locally is sketched after the list).

  • Python 3.6
  • PyTorch 0.4.1
  • CUDA 9.0.176
  • CUDNN 7.0.4
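
The following minimal sketch uses plain PyTorch introspection (nothing specific to this repository) to print the versions relevant to the list above:

```python
# Print the local tool versions to compare against the environment listed above.
import sys
import torch

print("Python :", sys.version.split()[0])           # expected: 3.6.x
print("PyTorch:", torch.__version__)                 # expected: 0.4.1
print("CUDA   :", torch.version.cuda)                # expected: 9.0
print("cuDNN  :", torch.backends.cudnn.version())    # expected: 7004, i.e. 7.0.4
print("GPU available:", torch.cuda.is_available())
```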

VQA-v2

We provide three groups of results (the Overall, Yes/No, Number, and Other accuracies) for each model on VQA-v2, corresponding to the training schemes listed below. Pre-trained models are provided for the latter two schemes.

  • Train -> Val: trained on the train split and evaluated on the val split.
  • Train+val -> Test-dev: trained on the train+val splits and evaluated on the test-dev split.
  • Train+val+vg -> Test-dev: trained on the train+val+vg splits and evaluated on the test-dev split.

Note that for a given model, the base learning rate may differ across training schemes; to reproduce the results, modify this setting in the config file accordingly.
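
As an illustration, the sketch below edits the learning rate in a YAML config with PyYAML. The config path and the key name LR_BASE are assumptions made for this example; check the actual config file of the model you are training and use the base learning rate listed in the corresponding table.

```python
# Hedged sketch: override the base learning rate in a model's YAML config.
# "configs/vqa/mcan_small.yml" and the key "LR_BASE" are hypothetical names --
# adjust them to match the real config of the model you are training.
import yaml

cfg_path = "configs/vqa/mcan_small.yml"

with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["LR_BASE"] = 1e-4  # base lr from the table for the chosen training scheme

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, default_flow_style=False)
```

Editing the value directly in the file works just as well; a script like this is only convenient when switching between schemes repeatedly.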

Train -> Val

| Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) |
|-------|---------|-------------|------------|------------|-----------|
| BUTD | 2e-3 | 63.84 | 81.40 | 43.81 | 55.78 |
| MFB | 7e-4 | 65.35 | 83.23 | 45.31 | 57.05 |
| MFH | 7e-4 | 66.18 | 84.07 | 46.55 | 57.78 |
| BAN-4 | 2e-3 | 65.86 | 83.53 | 46.36 | 57.56 |
| BAN-8 | 2e-3 | 66.00 | 83.61 | 47.04 | 57.62 |
| MCAN-small | 1e-4 | 67.17 | 84.82 | 49.31 | 58.48 |
| MCAN-large | 7e-5 | 67.50 | 85.14 | 49.66 | 58.80 |
| MMNasNet-small | 1.2e-4 | 67.79 | 85.02 | 52.25 | 58.80 |
| MMNasNet-large | 7e-5 | 67.98 | 85.22 | 52.04 | 59.09 |

Train+val -> Test-dev

| Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) | Download |
|-------|---------|-------------|------------|------------|-----------|----------|
| BUTD | 2e-3 | 66.98 | 83.28 | 46.19 | 57.85 | model |
| MFB | 7e-4 | 68.29 | 84.64 | 48.29 | 58.89 | model |
| MFH | 7e-4 | 69.11 | 85.56 | 48.81 | 59.69 | model |
| BAN-4 | 1.4e-3 | 68.9 | 85.0 | 49.5 | 59.56 | model |
| BAN-8 | 1.4e-3 | 69.07 | 85.2 | 49.63 | 59.71 | model |
| MCAN-small | 1e-4 | 70.33 | 86.77 | 52.14 | 60.40 | model |
| MCAN-large | 5e-5 | 70.48 | 86.90 | 52.11 | 60.63 | model |

Train+val+vg -> Test-dev

| Model | Base lr | Overall (%) | Yes/No (%) | Number (%) | Other (%) | Download |
|-------|---------|-------------|------------|------------|-----------|----------|
| BUTD | 2e-3 | 67.54 | 83.48 | 46.97 | 58.62 | model |
| MFB | 7e-4 | 68.25 | 84.79 | 48.24 | 58.68 | model |
| MFH | 7e-4 | 68.86 | 85.38 | 49.27 | 59.21 | model |
| BAN-4 | 1.4e-3 | 69.31 | 85.42 | 50.15 | 59.91 | model |
| BAN-8 | 1.4e-3 | 69.48 | 85.40 | 50.82 | 60.14 | model |
| MCAN-small | 1e-4 | 70.69 | 87.08 | 53.16 | 60.66 | model |
| MCAN-large | 5e-5 | 70.82 | 87.19 | 52.56 | 60.98 | model |
| MMNasNet-small | 1e-4 | 71.24 | 87.11 | 56.15 | 61.08 | model |
| MMNasNet-large | 5e-5 | 71.45 | 87.29 | 55.71 | 61.45 | model |

GQA

We provide a group of results (Accuracy, Binary, Open, Validity, Plausibility, Consistency, and Distribution) for each model on GQA, obtained with the following training scheme.

  • Train+val -> Test-dev: trained on the train (balanced) + val (balanced) splits and evaluated on the test-dev (balanced) split.

The results shown below were obtained from the online evaluation server. Note that evaluating the Test-dev split offline with the provided official script yields slightly different numbers than the online server, for reasons that remain unclear.

Train+val -> Test-dev

| Model | Base lr | Accuracy (%) | Binary (%) | Open (%) | Validity (%) | Plausibility (%) | Consistency (%) | Distribution | Download |
|-------|---------|--------------|------------|----------|--------------|------------------|-----------------|--------------|----------|
| BUTD (frcn+bbox) | 2e-3 | 53.38 | 67.78 | 40.72 | 96.62 | 84.81 | 77.62 | 1.26 | model |
| BAN-4 (frcn+bbox) | 2e-3 | 55.01 | 72.02 | 40.06 | 96.94 | 85.67 | 81.85 | 1.04 | model |
| BAN-8 (frcn+bbox) | 1e-3 | 56.19 | 73.31 | 41.13 | 96.77 | 85.58 | 84.64 | 1.09 | model |
| MCAN-small (frcn) | 1e-4 | 53.41 | 70.29 | 38.56 | 96.77 | 85.32 | 82.29 | 1.40 | model |
| MCAN-small (frcn+grid) | 1e-4 | 54.28 | 71.68 | 38.97 | 96.79 | 85.11 | 84.49 | 1.20 | model |
| MCAN-small (frcn+bbox) | 1e-4 | 58.20 | 75.87 | 42.66 | 97.01 | 85.41 | 87.99 | 1.25 | model |
| MCAN-small (frcn+bbox+grid) | 1e-4 | 58.38 | 76.49 | 42.45 | 96.98 | 84.47 | 87.36 | 1.29 | model |
| MCAN-large (frcn+bbox+grid) | 5e-5 | 58.10 | 76.98 | 41.50 | 97.01 | 85.43 | 87.34 | 1.20 | model |

CLEVR

We provide a group of results (the Overall, Count, Exist, Compare Numbers, Query Attribute, and Compare Attribute accuracies) for each model on CLEVR, obtained with the following training scheme.

  • Train -> Val: trained on the train split and evaluated on the val split.

Train -> Val

| Model | Base lr | Overall (%) | Count (%) | Exist (%) | Compare Numbers (%) | Query Attribute (%) | Compare Attribute (%) | Download |
|-------|---------|-------------|-----------|-----------|---------------------|---------------------|-----------------------|----------|
| MCAN-small | 4e-5 | 98.74 | 96.81 | 99.27 | 98.89 | 99.53 | 99.19 | model |