Abstract
Visual relationship detection, i.e., discovering the interactions between pairs of objects in an image, plays a significant role in image understanding. However, most recent works consider only visual features, ignoring the implicit effect of common sense. Motivated by iterative visual reasoning in image recognition, we propose a novel model, named Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC), that takes advantage of common sense in the form of a knowledge graph for visual relationship detection. Our model consists of two modules: a feature module that predicts predicates from visual and semantic features with a bi-directional RNN, and a commonsense knowledge module that constructs a specific commonsense knowledge graph for predicate prediction. IVRDC iteratively combines the predictions from both modules and updates the memory and the commonsense knowledge graph. The final predictions are made by taking the result of each iteration into account with an attention mechanism. Our experiments on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that our proposed model is competitive.
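The final step described in the abstract, weighting the result of each iteration with attention, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The way the two modules are combined (an element-wise sum of their predicate logits), the per-iteration attention scores, and the helper names `combine_modules` and `fuse_iterations` are all hypothetical, and the memory and knowledge-graph updates are omitted entirely. The sketch only shows how attention weights over iterations can turn several per-iteration predicate distributions into one final distribution.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def combine_modules(feature_logits: List[float], knowledge_logits: List[float]) -> List[float]:
    """Hypothetical fusion of the feature module and the commonsense module:
    a simple element-wise sum of their predicate logits (an assumption)."""
    if len(feature_logits) != len(knowledge_logits):
        raise ValueError("both modules must score the same predicate set")
    return [f + k for f, k in zip(feature_logits, knowledge_logits)]


def fuse_iterations(iter_logits: List[List[float]], iter_scores: List[float]) -> List[float]:
    """Attention-weighted average of per-iteration predicate distributions.

    iter_logits: one list of predicate logits per iteration.
    iter_scores: one scalar relevance score per iteration (hypothetical; a
    trained model would compute these with a learned attention layer).
    """
    if not iter_logits or len(iter_logits) != len(iter_scores):
        raise ValueError("need one attention score per iteration")
    weights = softmax(iter_scores)                  # attention over iterations
    probs = [softmax(logits) for logits in iter_logits]
    num_predicates = len(probs[0])
    return [sum(w * p[k] for w, p in zip(weights, probs)) for k in range(num_predicates)]


if __name__ == "__main__":
    # Toy example: 2 iterations, 3 candidate predicates (made-up numbers).
    feature = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2]]
    knowledge = [[0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    per_iteration = [combine_modules(f, k) for f, k in zip(feature, knowledge)]
    print(fuse_iterations(per_iteration, [0.2, 1.0]))
```

Because every per-iteration output is a probability distribution and the attention weights sum to one, the fused result is again a distribution over predicates; later iterations can dominate simply by receiving higher scores.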
Original language | English |
---|---|
Title of host publication | Semantic Technology |
Subtitle of host publication | 9th Joint International Conference, JIST 2019, Hangzhou, China, November 25–27, 2019, Proceedings |
Editors | Xin Wang, Francesca Alessandra Lisi, Guohui Xiao, Elena Botoeva |
Place of Publication | Switzerland |
Publisher | Springer |
Pages | 210-225 |
Number of pages | 16 |
ISBN (Electronic) | 978-3-030-41407-8 |
ISBN (Print) | 978-3-030-41406-1 |
Publication status | Published - 2020 |
Event | Joint International Semantic Technology Conference 2019 - Hangzhou, China; Duration: 25 Nov 2019 → 27 Nov 2019; Conference number: 9; https://link.springer.com/book/10.1007/978-981-15-3412-6 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12032 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | Joint International Semantic Technology Conference 2019 |
---|---|
Abbreviated title | JIST 2019 |
Country/Territory | China |
City | Hangzhou |
Period | 25/11/19 → 27/11/19 |
Keywords
- Commonsense knowledge graph
- Visual Genome
- Visual relationship detection